Disclaimer: This is an example of a student written essay.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UKEssays.com.

Development of Automated Information Extraction for Legal Domain Requests

Paper Type: Free Essay Subject: Information Systems
Wordcount: 2242 words Published: 28th Jul 2021

Anyone who wants to work in the legal domain has to do a large amount of reading; it is probably the most common phrase heard in the first semester of a law degree. The massive number of documents in the legal domain therefore calls for computational tools that support efficient search and filtering of information. Increasingly machine-learning-oriented research in information retrieval and document classification has produced several systems capable of structured content management. These systems help users automatically identify relevant structural portions of legal texts, such as paragraphs, chapters or intertextual references.[1] The purpose of this article is to explain what automated information extraction is and how it works, with the help of an example, and to show how it offers a significant gain for the field of law. People are busy, and efficiency is therefore more important than ever. It is crucial to engage with and understand technology that can save time and extend capability.

Definition of automated information extraction

It is essential to engage with and understand this technology because lawyers need a great deal of personal information and case details from their clients. For this reason, law firms hold a massive number of documents and databases full of personal information. Information extraction is the process of extracting pre-specified information from text documents, databases, websites or other sources. As an illustration, consider the following paragraph:
“John Doe offered to sell his house to Jane Doe and promised to keep the offer open until Friday. On Thursday John Doe accepted an offer from a third party to purchase the home. John Doe then asked a friend to tell Jane Doe that the proposal was withdrawn. On hearing the news, Jane Doe went to John Doe’s house Friday morning allegedly to accept the offer. She then brought an action seeking specific performance of the contract.”
With the help of information extraction, basic facts can be pulled out of the free-flowing text and arranged in a structured, machine-readable form. For example:

  • Persons: John Doe, Jane Doe
  • Location: John Doe’s House
  • Subject: House
  • Related mentions: Third Party, Offer, Proposal withdrawn
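
The structured form listed above can be sketched in Python. This is only a minimal illustration: the `ExtractedFacts` record, the `extract_persons` helper and the list of known surnames are assumptions made for this example, and the remaining fields are filled in by hand rather than extracted.

```python
import json
import re
from dataclasses import dataclass, field, asdict


@dataclass
class ExtractedFacts:
    """Hypothetical machine-readable record for the facts listed above."""
    persons: list = field(default_factory=list)
    location: str = ""
    subject: str = ""
    related_mentions: list = field(default_factory=list)


TEXT = (
    "John Doe offered to sell his house to Jane Doe and promised to keep "
    "the offer open until Friday. On Thursday John Doe accepted an offer "
    "from a third party to purchase the home."
)


def extract_persons(text, known_surnames=("Doe",)):
    """Toy heuristic: a capitalised word followed by a known surname."""
    surnames = "|".join(re.escape(name) for name in known_surnames)
    pattern = r"\b[A-Z][a-z]+ (?:%s)\b" % surnames
    persons = []
    for match in re.findall(pattern, text):
        if match not in persons:  # keep first-seen order, drop repeats
            persons.append(match)
    return persons


facts = ExtractedFacts(
    persons=extract_persons(TEXT),
    location="John Doe's House",   # filled in by hand for illustration
    subject="House",               # filled in by hand for illustration
    related_mentions=["Third Party", "Offer", "Proposal withdrawn"],
)

# Serialise the record so that other programs can consume it.
print(json.dumps(asdict(facts), indent=2))
```

Once the facts are in such a record, they can be stored, searched and compared like any other database entry, which is what makes the extraction useful in practice.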

Furthermore, information extraction is a sub-area of Natural Language Processing (NLP). Two crucial challenges exist in information extraction. One originates from the variety of ways of expressing the same fact. The other, shared by almost all NLP tasks, arises from the high expressiveness of natural languages, which can have ambiguous structure and meaning.[2]

How does automated information extraction work?

Information extraction involves many subtleties and sophisticated techniques.[3] The process usually begins by identifying grammatical features and associating them with natural language content, which would otherwise be nothing more than indistinguishable character strings. It consists of successive NLP steps, starting with making the content uniform and ending with identifying the roles of the words and how they are arranged. Ordinarily, the first steps are tokenisation and sentence boundary detection. The purpose of tokenisation is to define the limits of each token, for example a word, a punctuation mark, or another character cluster such as a currency amount. Tools for tokenising text are found in most NLP software suites. Tokenisers often rely on simple heuristics: all contiguous strings of alphabetic characters form one token, and the same applies to numbers. Tokens are separated by whitespace characters (space and line break) or by punctuation characters that are not part of an abbreviation. Sentence boundary detection breaks the content into sentences. Finding these boundaries is not a trivial task, since end-of-sentence punctuation marks are ambiguous in many languages; a full stop, for instance, can also mark an abbreviation. After that, morphological analysis makes tokens uniform by reducing words to a base form, either a dictionary lemma (lemmatisation) or a stem obtained by stripping suffixes (stemming). The advantage is a single form for words with closely related meanings. For example, the words “connect”, “connected”, “connecting”, “connection” and “connections” broadly refer to the same concept and reduce to the same stem, “connect”. This also reduces the total number of terms to handle, which is favourable from a computer processing point of view, as it reduces the size and complexity of the data in the system.[4] The final step is usually syntactic parsing, which can be done using significantly different formalisms.
Syntactic parsing is typically a computationally intensive task and is not used in information extraction (IE) systems as often as tokenisation or sentence boundary detection. It aims to analyse sentences and produce structures representing how words are arranged in them.[5] Structures are produced for a given formal grammar, and over the years different formalisms have been proposed that take both linguistic and computational concerns into account. These NLP steps prepare the textual content for the subsequent identification and extraction of relevant information.[6]
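
The earlier steps described above (sentence boundary detection, tokenisation and the reduction of words to a common form) can be sketched with a few heuristics. The code below is a simplified illustration under stated assumptions: the abbreviation list and the suffix list are invented for this example, and `strip_suffix` is a deliberately naive stand-in, not Porter's suffix-stripping algorithm.

```python
import re

# Illustrative abbreviation list (an assumption, far from exhaustive).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "no.", "art."}

# Letters form one token, numbers form one token, and every other
# non-space character is a token of its own.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+|\d+(?:[.,]\d+)*|[^\w\s]|_")

# Naive suffix list, longest first (an assumption for this sketch).
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def split_sentences(text):
    """End a sentence at . ! or ? unless the chunk is a known abbreviation."""
    sentences, current = [], []
    for chunk in text.split():
        current.append(chunk)
        if chunk[-1] in ".!?" and chunk.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def tokenize(sentence):
    """Split a sentence into tokens, keeping abbreviations in one piece."""
    tokens = []
    for chunk in sentence.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)
        else:
            tokens.extend(TOKEN_PATTERN.findall(chunk))
    return tokens


def strip_suffix(word):
    """Remove the first matching suffix if at least three letters remain."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


text = "Dr. Smith signed the contract on Friday. He paid $500."
for sentence in split_sentences(text):
    print(tokenize(sentence))

words = ["connect", "connected", "connecting", "connection", "connections"]
print({strip_suffix(word) for word in words})
```

The abbreviation check is what keeps “Dr.” from being read as the end of a sentence, which is exactly the ambiguity of end-of-sentence punctuation mentioned above. Real suffix strippers apply many more rules and conditions than this short list.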

After this preparation and enrichment, the document’s contents are characterised and suitable for processing by algorithms that locate and extract information.[7] Identifying entities mentioned in texts is a widespread task in IE and is known as Named Entity Recognition (NER). It seeks to locate and classify textual mentions that refer to specific types of entities, such as persons, organisations, addresses and dates. A named entity often consists of a sequence of nouns that refer to a single entity, for example “António Guterres” or “The Secretary-General of the United Nations”. NER is commonly an early step that prepares further processing, and it is also a relevant task in its own right, as many applications need to detect the entities referred to in documents. NER is frequently treated as a two-step procedure: first the boundaries of entities are detected, and then predefined categories are assigned, such as persons, organisations, locations or dates.
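
The two-step procedure can be illustrated with a small rule-based sketch. It assumes that a run of capitalised words marks the boundaries of a mention and that tiny hand-made word lists decide the category; both the lists and the function names are hypothetical, and real NER systems typically rely on trained statistical models rather than rules like these.

```python
import re

# Hypothetical word lists used only for this illustration.
PERSON_FIRST_NAMES = {"António", "John", "Jane"}
ORGANISATION_KEYWORDS = {"Nations", "Court", "University"}
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"}

# Step 1 pattern: a maximal run of capitalised words separated by spaces.
MENTION_PATTERN = re.compile(r"\b[A-Z]\w*(?: [A-Z]\w*)*")


def detect_boundaries(text):
    """Step 1: return (start, end, mention) for each capitalised run."""
    return [(m.start(), m.end(), m.group()) for m in MENTION_PATTERN.finditer(text)]


def classify(mention):
    """Step 2: assign a predefined category to a detected mention."""
    words = mention.split()
    if mention in WEEKDAYS:
        return "DATE"
    if words[0] in PERSON_FIRST_NAMES:
        return "PERSON"
    if any(word in ORGANISATION_KEYWORDS for word in words):
        return "ORGANISATION"
    return "UNKNOWN"


def recognise_entities(text):
    """Run both steps and return (mention, category) pairs."""
    return [(mention, classify(mention)) for _, _, mention in detect_boundaries(text)]


sentence = "António Guterres spoke at the United Nations on Friday."
print(recognise_entities(sentence))
```

Capitalisation alone is a weak boundary signal; a sentence-initial word such as “On” would also be picked up. This is one reason why practical systems learn boundaries and categories from annotated text.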

Moreover, ontology classes, properties and restrictions can be used to improve the performance of the information extraction process significantly. The benefit of this kind of approach led to the term Ontology-Based Information Extraction (OBIE). An ontology is defined as a formal specification of a shared conceptualisation.[8] It aims to capture knowledge about a domain by describing a hierarchy of concepts related by subsumption relationships. It can also include axioms that express other relationships between concepts and constrain their intended interpretation.[9]

A system is said to implement OBIE when its IE process draws on ontologies to improve the quality of the extraction. Typical OBIE systems use ontological properties to guide the IE process, whether by restricting the possible arguments of a relation or by inferring and confirming implicit information from the extracted facts.
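
How an ontology can restrict the possible arguments of a relation is sketched below. The class hierarchy, the relation name `offers_to_sell` and its constraints are all invented for this illustration and do not come from any particular OBIE system.

```python
# Hypothetical subsumption hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Seller": "Person",
    "Buyer": "Person",
    "Person": "LegalEntity",
    "Company": "LegalEntity",
    "House": "Property",
}

# Hypothetical restriction: relation -> (required subject class, required object class).
RELATION_CONSTRAINTS = {
    "offers_to_sell": ("LegalEntity", "Property"),
}


def is_a(cls, ancestor):
    """Follow subclass links upwards until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def filter_facts(candidates, entity_types):
    """Keep only candidate facts whose arguments satisfy the ontology."""
    accepted = []
    for subject, relation, obj in candidates:
        subject_class, object_class = RELATION_CONSTRAINTS[relation]
        if is_a(entity_types.get(subject), subject_class) and is_a(
            entity_types.get(obj), object_class
        ):
            accepted.append((subject, relation, obj))
    return accepted


entity_types = {"John Doe": "Seller", "house": "House", "Friday": "Date"}
candidates = [
    ("John Doe", "offers_to_sell", "house"),
    ("Friday", "offers_to_sell", "house"),  # implausible candidate
]
print(filter_facts(candidates, entity_types))
```

The second candidate is rejected because a date is not a legal entity in this hierarchy, so the ontology removes an extraction error that surface patterns alone would let through.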

How is it a significant gain for the legal area?

As mentioned previously, lawyers and law firms hold a huge mass of documents and databases full of personal information, together with many details about their cases and contracts. A lawyer who spent ten minutes thinking about the tasks that require searching through unstructured text and extracting useful data would find it impossible to count them all. To make the possible fields of application concrete, the following examples show where information extraction works or could work.

Information extraction can be useful for contract management departments, which can use NLP to discover and extract key terms from their contracts. It helps them generate reports that summarise terms across contracts and compare contractual terms with "standard" terms for risk assessment purposes. Such applications automatically extract dates, dollar amounts and other vital information for planning, budgeting and risk reduction purposes. For example, a law firm specialising in commercial law could use it to extract transaction dates from its contracts. Information extraction can also be useful for derivatives traders, because NLP-powered software enables them to analyse derivatives contracts. This software can extract interest rates, termination events and other relevant information. Once data has been extracted from the agreements, it is used to inform trading decisions, manage collateral and meet regulatory and compliance requirements. A finance lawyer can therefore focus on the trading decisions. Law firms in the energy industry can benefit as well, because they are exploring the use of NLP to speed up oil and gas title abstracting. For energy clients with hundreds of parcels of land, it can take months to identify and summarise all conveyances and encumbrances. Using NLP to identify and extract key information could reduce the abstracting process from months to weeks and thus improve the efficiency of law firms in the energy industry.

Another field of law that could benefit from information extraction is intellectual property (IP). Some attorneys already use the software “Lex Machina”, which provides legal analytics to law firms and companies. It helps them extract key data from public court records, specifically parties, patents, outcomes and other related data. The system links this data together and creates summary reports that help the user craft an IP strategy and win cases.[10] IE thereby improves the efficiency of IP lawyers, who can use the time saved for more demanding tasks. As stated above, IE can also be implemented in the legal domain more broadly. There is, for example, a system called “History Assistant”. This system extracts information from court opinions and uses it to suggest previous cases to which a new case should be linked. The natural language component relies on the court hierarchy to determine the relative authority of courts. Meanwhile, the prior case retrieval module treats the court system as a graph with states representing courts and transitions between those states representing appeals.[11]
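
The idea of a court system as a graph, with courts as states and appeals as transitions, can be sketched as follows. This is not the History Assistant implementation; the court names and the `appeal_path` function are hypothetical and serve only to show the data structure.

```python
from collections import deque

# Hypothetical court system: each court maps to the courts that hear its appeals.
APPEALS_TO = {
    "District Court": ["Court of Appeals"],
    "Court of Appeals": ["Supreme Court"],
    "Supreme Court": [],
}


def appeal_path(graph, start, target):
    """Breadth-first search for a chain of appeals from start to target."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        court = path[-1]
        if court == target:
            return path
        for next_court in graph.get(court, []):
            if next_court not in visited:
                visited.add(next_court)
                queue.append(path + [next_court])
    return None  # no chain of appeals connects the two courts


print(appeal_path(APPEALS_TO, "District Court", "Supreme Court"))
print(appeal_path(APPEALS_TO, "Supreme Court", "District Court"))
```

A retrieval module could use such a check to discard candidate prior cases decided by a court from which the new case could not have been appealed.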


This essay considered automated information extraction from legal databases and texts. It argued that information extraction offers a good opportunity to save lawyers and law firms a great deal of time and storage capacity, because many of their tasks involve searching through unstructured text and extracting useful data from it. Information extraction and NLP can take over much of this work. There are established systems such as “History Assistant” that make NLP usable and applicable to law. Lawyers and law firms are under clear pressure to improve their efficiency.

Notes and sources

  1. Claudia Soria, Roberto Bartolini, Alessandro Lenci, Simonetta Montemagni and Vito Pirrelli (2007), Automatic extraction of semantics in law documents, University of Pisa, Department of Linguistics, at https://www.academia.edu/2472854/Automatic_extraction_of_semantics_in_law_documents.
  2. Mario Rodrigues and António Joaquim da Silva Teixeira (2015), Advanced Applications of Natural Language Processing for Performing Information Extraction, pp. 13–50.
  3. Ontotext USA, Inc., What is Information Extraction?, at https://www.ontotext.com/knowledgehub/fundamentals/information-extraction/.
  4. Martin F. Porter (1980), An algorithm for suffix stripping. Program Electron Libr Inf Syst 14, pp. 130–137.
  5. Ronald Wayne Langacker (1997), Constituency, dependency, and conceptual grouping. Cogn Linguist 8, pp. 1–32.
  6. Mario Rodrigues and António Joaquim da Silva Teixeira (2015), Advanced Applications of Natural Language Processing for Performing Information Extraction, pp. 13–50.
  7. Lev Ratinov and Dan Roth (2009), Design challenges and misconceptions in Named Entity Recognition. In: Proceedings of the thirteenth conference on computational natural language learning (CoNLL), pp. 147–155.
  8. Thomas Robert Gruber (1993), A translation approach to portable ontology specifications. Knowl Acquis 5, pp. 199–220.
  9. Nicola Guarino (1998), Formal ontology and information systems. In: FOIS 98, Proceedings of the international conference on formal ontology in information systems. IOS Press, Amsterdam, pp. 3–15.
  10. Lars Mahler (2015), What Is NLP and Why Should Lawyers Care?, at https://www.lawpracticetoday.org/article/nlp-laywers/.
  11. Peter Jackson, Khalid Al-Kofahi, Alex Tyrell and Arun Vachher (2001), Information extraction from case law and retrieval of prior cases, at https://doi.org/10.1016/S0004-3702(03)00106-1.

