Legal Text Analysis

Natural Language Processing is an indispensable part of legal applications.

Because legal texts mostly consist of unstructured text, NLP enables either the automatic acquisition of legal knowledge or automated support for the close reading of legal texts.

By combining NLP with Semantic Web technologies, involving e.g. XML and Linked Open Data, novel methods can be developed to analyze the law, make it more accessible to the general public, and support automated reasoning.

The examples below involve varying combinations of expert qualitative work and automation.

Example 1: Conceptual Support for Focused Qualitative Text Analysis
In this work we address the role of language technology in scholarly research workflows, and present a focused close reading workflow model that combines manual close reading by experts with (semi-)automatic language analysis. Integrating manual and automatic interpretation and analysis of textual material aims to optimize the scholarly exploration of text by providing focused information for dynamically addressing research questions.

Reference: Peters, W., Parks, L. and Lennan, M. (2018). Integrating Language Technology into Scholarly Research Workflows. In: Pitcher, L. and Pidd, M. (eds.), Proceedings of the Digital Humanities Congress 2018. Studies in the Digital Humanities. Sheffield: The Digital Humanities Institute, 2019.

Example 2: Annotating Legal Documents

This paper illustrates manual annotation using an online annotation tool. A group of law school students annotated a corpus of legal cases for a variety of annotation types, e.g. citation indices, legal facts, rationale, judgement and cause of action. Disagreements between annotators were discussed and resolved within an educational setting and the resulting annotated corpus was curated, producing a gold standard corpus of annotated texts.
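Curating a gold standard from multiple annotators typically involves measuring inter-annotator agreement before disagreements are resolved. As an illustration (not the paper's actual method or data), Cohen's kappa over two annotators' labels can be computed as follows; the labels and example annotations below are hypothetical:

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected agreement under chance, from each annotator's label distribution.
    counts_a, counts_b = Counter(ann_a), Counter(ann_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical sentence-level annotation types from two annotators
a = ["fact", "rationale", "fact", "judgement", "fact"]
b = ["fact", "rationale", "citation", "judgement", "fact"]
print(round(cohens_kappa(a, b), 2))  # → 0.71
```

Pairs of items on which the annotators disagree (here the third sentence) would then be discussed and resolved, as in the educational setting described above.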

Reference: Wyner, A., Peters, W. and Katz, D. (2013). A Case Study in Legal Annotation. In: Proceedings of JURIX 2013, the 26th International Conference on Legal Knowledge and Information Systems, Bologna, Italy.

Example 3: Knowledge Extraction as Linked Open Data
This work concerns the enrichment of the Talk of Europe (ToE) data set, which makes the plenary debates of the European Parliament available as Linked Open Data (http://linkedpolitics.ops.few.vu.nl/).
Using automatic natural language processing techniques, we produced a ToE extension in the form of extracted and linked English terminology.
The work involved the following steps:

  • Linguistic pre-processing: tokenization, part-of-speech tagging and sentence detection.
  • Term extraction: determining important domain-specific vocabulary by assigning termhood scores to candidate terms.
  • Sentence-based sentiment analysis: providing insight into the attitudes of members of parliament and their political parties regarding certain issues.
  • Relation extraction: finding related terms internal to each data set using pointwise mutual information.
  • Semantic Web and Linked Open Data: RDF production using e.g. standard SKOS relations, enabling researchers to further explore the content of the parliamentary speeches by querying the semantic metadata (terms, relations between terms, sentiments) in the Semantic Web query language SPARQL.
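The relation-extraction step above relies on pointwise mutual information (PMI), which scores a term pair by how much more often the terms co-occur than chance predicts. A minimal sketch, using hypothetical debate terminology rather than the actual ToE data:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_pairs(sentences, min_count=2):
    """PMI scores for term pairs co-occurring in the same sentence.

    `sentences` is a list of term lists; pairs seen fewer than
    `min_count` times are discarded as unreliable.
    """
    term_counts, pair_counts = Counter(), Counter()
    n = len(sentences)
    for terms in sentences:
        uniq = set(terms)
        term_counts.update(uniq)
        pair_counts.update(frozenset(p) for p in combinations(sorted(uniq), 2))
    scores = {}
    for pair, c in pair_counts.items():
        if c < min_count:
            continue
        a, b = tuple(pair)
        p_ab = c / n
        p_a, p_b = term_counts[a] / n, term_counts[b] / n
        scores[tuple(sorted(pair))] = math.log2(p_ab / (p_a * p_b))
    return scores

# Hypothetical extracted terms, one list per sentence
sents = [
    ["fisheries", "quota", "council"],
    ["fisheries", "quota"],
    ["budget", "council"],
    ["fisheries", "quota", "budget"],
]
scores = pmi_pairs(sents)
print(scores)  # only ('fisheries', 'quota') clears min_count
```

Pairs with high PMI would then be emitted as RDF triples, e.g. linked via a SKOS `skos:related` property, so they can be queried with SPARQL.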

Example 4: Extraction and Modeling of Legal Semantic Relations
Legal experts need tools that help to extract and interpret large amounts of legal text in a uniform way. In particular, experts wish to access the norms, which express the duties, rights, etc. of the parties discussed in the law. At a more fine-grained level, it is important to identify who bears what role vis-à-vis the norm, e.g. who is the responsible agent or the receiving party of the action. Yet it is widely acknowledged that, given the complexity of legal language, this is a difficult task. One step towards facilitating it is to establish a semantic model of the norms, giving the structure of the parties, the roles they play, and the interconnections amongst the different forms of norms. Such a model provides a target to guide the identification and extraction of key textual components, moving us closer to the goal of making the contents of legal texts accessible in greater detail, variety, and volume.
We have created a semi-automatic methodology and application for identifying the Hohfeldian relation Duty in legal text, using the General Architecture for Text Engineering (GATE) tool for the automated extraction of Duty instances and their associated roles. The method is intended to support scholars incrementally in their interpretation.
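The extraction itself is implemented in GATE with hand-crafted rules; as a rough illustration of the underlying idea (not the actual GATE grammar), a deontic modal such as "shall" or "must" can anchor a pattern that links a duty bearer to the obligated action:

```python
import re

# A toy pattern in the spirit of a rule-based extractor: a capitalized
# noun phrase (the duty bearer), a deontic modal, and the obligated
# action up to the clause boundary. Hypothetical and far simpler than
# the paper's GATE pipeline.
DUTY = re.compile(r"(?P<bearer>[A-Z][\w ]*?)\s+(?:shall|must)\s+(?P<action>[^.;]+)")

def extract_duties(text):
    """Return (bearer, action) pairs for each Duty pattern match."""
    return [(m.group("bearer").strip(), m.group("action").strip())
            for m in DUTY.finditer(text)]

sample = ("The seller shall deliver the goods within 30 days. "
          "The buyer must pay the agreed price.")
print(extract_duties(sample))
# → [('The seller', 'deliver the goods within 30 days'),
#    ('The buyer', 'pay the agreed price')]
```

In the semi-automatic workflow described above, such candidate Duty instances would be presented to scholars for confirmation and interpretation rather than accepted automatically.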

Reference: Peters, W. and Wyner, A. (2016). Legal Text Interpretation: Identifying Hohfeldian Relations from Text. In: Proceedings of LREC 2016.