Open source and text-mining legal research

By Heather Morse on March 21, 2009

“There are pros and cons to each side of the closed versus open source issue in legal informatics, just as in other disciplines like software development. For instance, closed sources provide added value in editorial services, quality control and currency. Although these endeavors should be rewarded, one can argue that the legal system is publicly funded, its decisions are in the public interest, and hence, primary legal material ought to be publicly available. Furthermore, one can claim that free access to legal source material would provide fertile ground for the development of innovative tools to search, analyze and apply the law.” – Dr. Adam Zachary Wyner, “Text Mining Case Law,” Legal Technology, Law.com

It’s a discussion worth having, and not just because firms continue to search their budgets for line items they can trim. Legal research, as we know it, remains closed source, delivered through subscription services from two vendors, LexisNexis and Westlaw, with their proprietary tools: headnotes, commentary, and advisory services. The World Legal Information Institute (WorldLII) offers free, independent, non-profit access to law worldwide, with 894 databases from 123 countries. While the continued development of open corpora of case law is exciting indeed, there’s one huge problem: the parameters of search.

LexisNexis and Westlaw use Boolean search, a technology that is already 30 years old. Free-text search (commonly known as keyword search), as in Google, has its own limitation: you get more extraneous information than you need. Both types of search match strings or patterns of words, not necessarily meaning. What we really need to perfect search in legal databases is semantic search, which draws on syntax and contextual semantics.
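To make the point about strings versus meaning concrete, here is a minimal Python sketch over a hypothetical three-document toy corpus (the snippets are invented, not real cases). It is not how LexisNexis, Westlaw, or Google actually implement search; it only shows the two basic ideas. The Boolean query uses an inverted index with set intersection (AND) and set difference (NOT); the keyword search ranks documents by how many query words they share. Neither one knows that “eminent domain” and “condemnation” refer to the same thing, or that “judgments” and “judgment” are the same word.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus: invented snippets, not real case law.
DOCS = {
    "A": "Interest accrues on a condemnation judgment from the date of taking.",
    "B": "The court denied interest on the judgment in this contract dispute.",
    "C": "Under eminent domain, compensation bears interest from the taking.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Build an inverted index: each term maps to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms, exclude=()):
    """Boolean query: docs containing every term in `terms` (AND) and none in `exclude` (NOT).

    Requires at least one term. set.intersection returns a new set,
    so the index itself is never modified.
    """
    result = set.intersection(*(index.get(t, set()) for t in terms))
    for t in exclude:
        result -= index.get(t, set())
    return result

def keyword_rank(docs, query):
    """Free-text search: rank docs by the number of distinct query words they contain."""
    q = set(tokenize(query))
    scores = {d: len(q & set(tokenize(text))) for d, text in docs.items()}
    # Highest score first; ties broken by document ID. Docs with no overlap are dropped.
    return sorted((d for d, s in scores.items() if s > 0), key=lambda d: (-scores[d], d))

if __name__ == "__main__":
    index = build_index(DOCS)
    # Doc C (eminent domain) is relevant but never says "condemnation", so it is missed.
    print(boolean_and(index, "condemnation", "interest"))  # {'A'}
    # The irrelevant contract case B outranks the relevant eminent-domain case C.
    print(keyword_rank(DOCS, "interest on condemnation judgments"))  # ['A', 'B', 'C']
```

In both cases the machine is matching character strings, which is exactly the gap semantic search aims to close: a system that understood the question would return A and C and leave B out.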

I wrote here last year about using Powerset as a tool to search Wikipedia. Powerset has since been bought by Microsoft, and you can get a feel for linguistic search there, but only on Wikipedia. I experimented with disambiguation searches in Powerlabs while it was in beta, and passed along to a couple of the AI scientists the suggestion that this would be a very cool technology to apply to legal research. Wyner’s article references the General Architecture for Text Engineering (GATE), an open-source toolkit for text processing.

Imagine the time and money legal researchers would save if they could simply type a real question into a search engine over an open source corpus of case law, such as “What are court rules in all jurisdictions governing the accrual of interest in condemnation judgments?”, and get back precisely the relevant cases. When I discussed this with Dr. Wyner by email today, he responded that the research community knows very little about the questions lawyers would want to pose in natural language. He wrote, “The more such sample questions we could have as an open research community, the better able we would be to design tools to give the relevant answers.”

While open source and the perfection of text-mining tools through the Semantic Web will create further disintermediation in the legal industry, transparency and efficiency in legal knowledge systems are worthy goals.

Dr. Wyner’s blog is LanguageLogicLawSoftware.

The Legal Watercooler