Automating the Litigation Workflow
The Modern Litigation Workflow
Litigation is a complex process that is often characterized by a series of back-and-forth exchanges between parties and the court itself. These exchanges can be in the form of motions, demands, requests, or other types of communications. In many cases, the form and content of these exchanges are highly structured and predictable. However, the process of responding to these exchanges can be time-consuming and expensive, and many strategic elements of the process are often overlooked due to lack of time or data.
What if you could re-imagine the litigation process as a series of steps that could be automated with large language models and rich historical data?
The Kelvin Legal Data OS provides components that can be used to automate many of the steps in this re-imagined litigation process, including document classification, key fact extraction, data retrieval, and response generation.
In this first part of our series on litigation support, we'll imagine that we're working in a litigation team managing a portfolio of matters before the court. We'll show you how to use Kelvin to clean, classify, and summarize a batch of documents received from the court.
In the second part of the series, we'll show you how to use a "task router" to automatically draft an email, analyze historical docket data, and begin a responsive motion.
Step 1: Preprocessing Court Documents
In most courts, documents are filed or made available in PDF format. Frequently, these documents are scans of paper documents, which means they often begin their life as unsearchable image PDFs. To classify these documents, we first need to convert them to text. The Kelvin Document Index can automatically OCR and convert these documents, or you can use the Kelvin PDF API to convert them with more control over the process.
In other cases, documents may be received in Word or even WordPerfect formats from the court or opposing counsel. In these cases, the Kelvin Document Index again supports automatically converting them to text, or you can use Kelvin's Office API to convert them with more control over the process.
In this example, we'll show you how to use the low-level Kelvin PDF API client to first convert a PDF to searchable text.
We'll skip the step of monitoring a docket for new filings and downloading the PDF from PACER or a state court. See the Kelvin Research - Court Data example for more information on how to do this.
Let's assume we have a folder of PDFs that we've downloaded from the Western District of Michigan using kelvin.research.pacer or another commercial API. The lines of code below iterate through these files and convert them to text using the Kelvin PDF API client.
# imports
from kelvin.source.filesystem_source import FilesystemSource
from kelvin.api.pdf.sync_client import KelvinPDFSyncClient
# create the client
pdf_client = KelvinPDFSyncClient()
# iterate over files
for file in FilesystemSource("pacer/miwd/"):
    # send to the Kelvin PDF API
    pdf_response = pdf_client.extract_text(
        file.to_bytes(),
        allow_ocr=True
    )

    # print the file name and extracted text
    print(file.name, pdf_response['text'])
The Kelvin PDF API client returns a JSON object that contains the extracted text as well as other metadata about the document. In this example, we'll just print the text to the console, which should look something like this:
pacer/miwd/gov.uscourts.miwd.53299.188.0.pdf:
UNITED STATES DISTRICT COURT
FOR THE WESTERN DISTRICT OF MICHIGAN
SOUTHERN DIVISION
__________________________
KEVIN KING,
Petitioner,
v. Case No. 2:07-cv-133
PATRICIA CARUSO, et al.
...
Even the best OCR systems cannot guarantee 100% accuracy, so it is important to take steps to review the quality of outputs and, in some cases, automatically correct errors. Kelvin OCR uses Kelvin Speller to automatically score and correct OCR errors if configured to do so, but we'll show how to manage this manually in the code below:
# imports
from kelvin.speller.scoring.tokens import score_wordlist, score_simple_word
# print scores based on two quality methods.
# lower scores are better
print("Word classifier score:", score_simple_word(pdf_response['text']))
print("English wordlist score:", score_wordlist(pdf_response['text'], wordlist_type=('en/legal', 100000)))
The output of this code should look something like this:
Word classifier score: 0.08011869436201781
English wordlist score: 0.07195845697329377
The first method, score_simple_word, uses a classifier that models the "shape" of words based on the types of characters they contain. Words with abnormal shapes, e.g., apl7e or 3anana, are treated as more likely to be OCR errors. The second method, score_wordlist, leverages the large, high-quality wordlists provided with Kelvin to score the document based on the number of words that are not in the wordlist. Novel words, such as the unique names of individuals or companies in a document, may not be errors, but in general, higher scores indicate lower-quality documents or OCR parameters.
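To build intuition for the first method, here is a toy sketch of the word-shape idea: map each character to a class (letter, digit, or other) and flag tokens whose shape mixes letters with digits or symbols. The word_shape and shape_error_rate helpers below are illustrative only, not Kelvin's actual scoring model.

```python
import string

def word_shape(token: str) -> str:
    # map letters to "a", digits to "9", everything else to "#"
    return "".join(
        "a" if c.isalpha() else "9" if c.isdigit() else "#"
        for c in token
    )

def shape_error_rate(text: str) -> float:
    """Fraction of tokens whose shape mixes letters with digits/symbols."""
    tokens = [t.strip(string.punctuation) for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    suspicious = sum(
        1 for t in tokens
        if "a" in word_shape(t) and set(word_shape(t)) != {"a"}
    )
    return suspicious / len(tokens)

# "con+ract" has an abnormal shape; the other tokens look like real words
print(shape_error_rate("The partics singed the con+ract."))  # 0.2
```

Note that this toy heuristic happily accepts "partics" and "singed", which are real OCR errors with normal shapes; that gap is exactly why a wordlist-based score is a useful second signal.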
Should you need to correct OCR errors, Kelvin Speller provides a variety of tools for doing so, including both efficient dictionary-based methods and large language model (LLM) engines that can be used to correct OCR errors in context. In this example, we'll show how to use both BKTreeEngine and LLMEngine with gpt-3.5-turbo.
# imports
from kelvin.speller.engines.bktree_engine import BKTreeEngine
# create the engine and process the text
bktree_engine = BKTreeEngine()
print(bktree_engine.process("The partics singed the con+ract."))
The output of this code should look something like this, indicating the corrected text and the number of corrections made:
('The parties signed the contract.', 3)
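For intuition, the dictionary-based approach can be sketched as a BK-tree over a wordlist: the tree is keyed on edit distance, so a near-miss OCR token can be matched against known words without scanning the entire dictionary. The minimal BKTree class and tiny wordlist below are illustrative, not the Kelvin implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

class BKTree:
    """BK-tree keyed on edit distance for fast approximate lookup."""
    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})
        for w in it:
            self.add(w)

    def add(self, word):
        node, children = self.root
        while True:
            d = levenshtein(word, node)
            if d == 0:
                return
            if d in children:
                node, children = children[d]
            else:
                children[d] = (word, {})
                return

    def search(self, word, max_dist=2):
        """Return (distance, word) matches within max_dist."""
        results, stack = [], [self.root]
        while stack:
            node, children = stack.pop()
            d = levenshtein(word, node)
            if d <= max_dist:
                results.append((d, node))
            # the triangle inequality prunes entire subtrees
            stack.extend(children[k] for k in children
                         if d - max_dist <= k <= d + max_dist)
        return sorted(results)

tree = BKTree(["the", "parties", "signed", "contract", "court", "motion"])
print(tree.search("partics"))  # [(1, 'parties')]
```

A production engine would layer frequency information and context on top of this lookup; the tree only answers "which known words are close?"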
The LLMEngine interface is similar, but it requires a Kelvin NLP LLM engine to execute.
# imports
from kelvin.nlp.llm.engines.openai_engine import OpenAIEngine
from kelvin.speller.engines.llm_engine import LLMEngine
# setup LLM and engine
llm_gpt35 = OpenAIEngine("gpt-3.5-turbo")
llm_engine = LLMEngine(engine=llm_gpt35)
# process the text
print(llm_engine.process("The partics singed the con+ract."))
At this point, we've seen how to use Kelvin to work with real court documents, including both converting document formats to text and correcting OCR errors. In the next section, we'll show how to use Kelvin to classify documents and extract information from them.
Choose-your-own Large Language Model (LLM)
Note that we pass an engine parameter to the LLMEngine above. This parameter specifies which LLM to use, such as gpt-3.5-turbo, gpt-4, or another local or remote LLM. It's easy to configure Kelvin to use one or more local or remote LLMs via configuration or at runtime like this:
"LLM": [
{
"name": "gpt-3.5-turbo",
"provider": "openai",
"provider_model": "gpt-3.5-turbo",
"provider_auth": {
"org_id": "org-abc123...",
"api_key": "sk-abc123..."
}
},
{
"name": "gpt4all-j-1.3",
"provider": "transformers",
"provider_model": "gpt4all-j-1.3",
}
],
We support dozens of LLMs currently, including GPT-4 through OpenAI or Azure, Claude from Anthropic, or transformers models like MPT-7B, GPT4All, Dolly2, and more. It's easy to add your own LLM engine for any other text completion or chat assistant-style model.
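As a sketch of how such a configuration might be consumed at runtime, the snippet below resolves a configured model by name. The EngineRegistry class is hypothetical, written purely to illustrate the pattern; Kelvin's own configuration loading is handled internally.

```python
# hypothetical registry over "LLM" config entries like those shown above
LLM_CONFIG = [
    {"name": "gpt-3.5-turbo", "provider": "openai",
     "provider_model": "gpt-3.5-turbo"},
    {"name": "gpt4all-j-1.3", "provider": "transformers",
     "provider_model": "gpt4all-j-1.3"},
]

class EngineRegistry:
    """Resolve a configured LLM entry by its name (illustrative only)."""
    def __init__(self, config):
        self._by_name = {entry["name"]: entry for entry in config}

    def get(self, name):
        try:
            return self._by_name[name]
        except KeyError:
            raise ValueError(f"no LLM named {name!r} is configured") from None

registry = EngineRegistry(LLM_CONFIG)
print(registry.get("gpt4all-j-1.3")["provider"])  # transformers
```

The benefit of name-based resolution is that swapping a hosted model for a local one becomes a configuration change rather than a code change.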
Step 2: Triaging New Documents
In litigation, we often play a reactive role, finding out in the middle of a Friday afternoon that opposing counsel has filed a motion that requires a prompt response. In these situations, it is important to be able to quickly triage newly-filed documents to determine what type of document they are, produce a summary of their contents, and plan responsive activities.
From a technical perspective, this triage process starts with document classification. While Kelvin supports a variety of classification models, we will use the SALI LMSS ontology for this example. The SALI standard provides a hierarchical taxonomy of legal documents that can be used to classify materials filed in litigation, allowing us to quickly determine what type of document we are dealing with. Among other types of tags, the standard also provides a taxonomy of Areas of Law, which we can use to determine the subject matter(s) that a document relates to.
Working with the SALI LMSS ontology is straightforward with Kelvin. As part of Kelvin Graph's universe of supported ontologies, we provide first-class support for the top-level SALI classes like Document / Artifact and Area of Law. This makes it easy, for example, to get the list of all Areas of Law with their definitions like this:
# imports
from kelvin.graph.models.sali.lmss import LMSSGraph
# create the graph
lmss_graph = LMSSGraph()
# print the list with definitions
for iri in lmss_graph.get_areas_of_law():
    print(lmss_graph.concepts[iri]['label'],
          lmss_graph.concepts[iri]['definitions'])
This code produces the following output:
Juvenile Law ['Juvenile law and child protective proceedings.']
Fraud and Economic Torts Law ['The wrong done by another person or business that occurs in, or affects, commerce.']
...
As shown above, these SALI tags take the form of labels and definitions; it's important to note that they are not derived from the case documents we are working with. We'll send the complete list of SALI tags to the model to classify our documents in the steps below.
The SALI LMSS ontology covers multiple aspects of documents, including the type of document, document identifiers, and document metadata. For this example, we primarily care about the type of litigation document, which is represented by this LMSS class:
Document / Artifact > Litigation Document
Like many ontologies, SALI LMSS uses Internationalized Resource Identifiers (IRIs) to identify concepts in a way that is both human-readable and machine-processable. For example, the IRI for the Litigation Document class is easy to look up using Kelvin Graph's LMSS support like this: lmss_graph.label_to_iri["Litigation Document"] = 'https://...'
In the lines of code below, we'll load all of these concepts into a simple dictionary with concept labels and concept definitions. Then, we'll pass this data to a zero-shot large language model (LLM) classifier powered by GPT-4, using both the raw text and an N-gram representation of the text.
# imports
from kelvin.nlp.llm.engines.openai_engine import OpenAIEngine
from kelvin.nlp.llm.classify.text_classifier import TextClassifier
# get SALI LMSS concepts for area of law
area_of_law_data = {
    lmss_graph.concepts[iri]['label']: lmss_graph.concepts[iri]['definitions']
    for iri in lmss_graph.get_areas_of_law(max_depth=1)
}

# get SALI LMSS concepts for document type
doc_type_data = {
    lmss_graph.concepts[iri]['label']: lmss_graph.concepts[iri]['definitions']
    for iri in lmss_graph.get_children(lmss_graph.label_to_iri["Litigation Document"][0], max_depth=2)
}
# setup LLM engine
engine = OpenAIEngine(model="gpt-4")
# setup text classifier
doc_type_classifier = TextClassifier(engine, label_data=doc_type_data)
area_of_law_classifier = TextClassifier(engine, label_data=area_of_law_data)
# classify a document from the PDF API response above
doc_type_labels = doc_type_classifier.classify(pdf_response['text'])
aol_labels = area_of_law_classifier.classify(pdf_response['text'])
print('Document type:', doc_type_labels[0])
print('Area of Law:', aol_labels[0])
This code should produce output like this:
Document type: ('Opinion and Order', 1.0)
Area of Law: ('Personal Injury and Tort Law', 1.0)
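Under the hood, a zero-shot classifier of this kind builds a prompt from the label data and asks the model to pick a label. TextClassifier's actual prompting is internal to Kelvin; the build_prompt helper and the example definitions below are a hypothetical sketch of the general technique.

```python
def build_prompt(text: str, label_data: dict, max_chars: int = 2000) -> str:
    # render each label with its definitions as a candidate answer
    labels = "\n".join(
        f"- {label}: {'; '.join(defs)}"
        for label, defs in label_data.items()
    )
    return (
        "Classify the document into exactly one of the labels below.\n"
        f"{labels}\n\n"
        f"Document:\n{text[:max_chars]}\n\n"
        "Answer with the label only."
    )

# example label data in the same shape as doc_type_data above
# (these definitions are illustrative paraphrases, not SALI's text)
label_data = {
    "Opinion and Order": ["A court's ruling and its accompanying reasoning."],
    "Motion": ["A party's request for a court ruling or order."],
}
print(build_prompt("UNITED STATES DISTRICT COURT ...", label_data))
```

Truncating the document to max_chars is a crude stand-in for the chunking a production classifier would use to stay within the model's context window.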
Step 3: Summarizing and Interrogating Documents
Once we have classified a document, we can use Kelvin to summarize its contents to help us understand what it is about and what types of actions we need to take in response. For this example, we'll ask GPT-4 to provide additional context for the document by summarizing it in a few sentences.
# imports
from kelvin.nlp.llm.summarize.recursive_split_summarizer import RecursiveSplitSummarizer
# summarize
summarizer = RecursiveSplitSummarizer(engine=engine)
summary = summarizer.get_summary(pdf_response['text'])
print(summary)
This code should produce output like this:
In a case involving secondhand smoke exposure in prison, the U.S. District
Court for the Western District of Michigan partially adopted the Report and
Recommendation, granting summary judgment on future damages but denying
defendants' objection on the issue of qualified immunity, as it was a
question of fact for the jury.
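The recursive-split strategy behind this summarizer can be sketched as a map-reduce over chunks: summarize each chunk, then summarize the concatenated summaries until the text fits in a single pass. The summarize_chunk stub below stands in for an LLM call; this is an illustration of the technique, not Kelvin's implementation.

```python
def summarize_chunk(chunk: str) -> str:
    # placeholder for an LLM summarization call: keep the first sentence
    return chunk.split(". ")[0].strip().rstrip(".") + "."

def recursive_summary(text: str, chunk_size: int = 500) -> str:
    # base case: the text fits in a single model call
    if len(text) <= chunk_size:
        return summarize_chunk(text)
    # map: summarize each chunk; reduce: recurse on the joined summaries
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    combined = " ".join(summarize_chunk(c) for c in chunks)
    return recursive_summary(combined, chunk_size)

print(recursive_summary("The court denied the motion. " * 100))
```

A real implementation would split on sentence or section boundaries rather than raw character offsets, and would guard against the reduced text failing to shrink between passes.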
We can also easily ask the model to answer questions about the document, which can be useful in either an ad-hoc context or as part of a more regular workflow. For example, if we want to understand why the court denied the defendant's objection, we can ask the model to answer that question for us:
# imports
from kelvin.nlp.llm.qa.recursive_split_answerer import RecursiveSplitAnswerer
# setup the engine and ask the question
answerer = RecursiveSplitAnswerer(engine=engine)
question = 'List the denied motions and justification for each denial.'
rejection_list = answerer.get_answer(pdf_response['text'], question)
This code should produce output like this:
1. Plaintiff's objection: The court denied Plaintiff's objection without
providing any specific explanation for the rejection. However, it can be
inferred that the Court found no convincing argument or evidence from the
Plaintiff's objection that could potentially change the outcome of the
Magistrate Judge's Report and Recommendation.
2. Defendants' objections as to Issue I (Defendant Caruso’s Qualified
Immunity): The court denied Defendants' objection, agreeing with the
R & R's analysis and conclusion that Issue I is a question of fact for
the jury to decide. Defendant Caruso reiterated the same argument she
made to the Magistrate Judge without providing any new precedent or
argument, which did not persuade the court to change the decision of
the Magistrate Judge's Report and Recommendation.
Kelvin even makes it simple to create communications or other memo-like work product. Let's look at how we can combine the summary and answer above into a brief email update to a client:
# imports
from kelvin.nlp.llm.summarize.text_memoizer import TextMemoizer
# setup the memoizer
memoizer = TextMemoizer(
    engine=engine,
    context_type="litigation update",
    context_role="plaintiff's counsel writing to the client"
)
memo_input = f"""Summary: {summary}\n{rejection_list}"""
memo_output = memoizer.get_summary(memo_input)
print(memo_output)
[Your Name]
[Your Law Firm]
[Your Address]
[City, State, Zip Code]
[Date]
[Client's Name]
[Client's Address]
[City, State, Zip Code]
RE: LITIGATION UPDATE IN SECONDHAND SMOKE EXPOSURE CASE
Dear [Client's Name],
I am writing to update you on the recent developments in your case involving
secondhand smoke exposure in prison before the U.S. District Court for the
Western District of Michigan. The Court has issued an order addressing the
objections raised by both parties to the Magistrate Judge's Report and
Recommendation (R & R). Below is a summary of the Court's decisions on
the Plaintiff's and Defendants' objections:
1. Plaintiff's Objection: The Court denied our objection to the R & R
without providing a specific explanation for the rejection. It appears
the Court did not find our arguments or evidence sufficiently persuasive
to alter the outcome of the Magistrate Judge's analysis and conclusions.
2. Defendants' Objections Regarding Defendant Caruso's Qualified Immunity
(Issue I): The Court denied the Defendants' objection, siding with the
R & R's analysis and conclusion that whether Defendant Caruso is entitled
to qualified immunity is a question of fact for the jury to determine.
Defendant Caruso failed to provide any new legal precedent or argument to
support her position, which did not persuade the Court to deviate from
the Magistrate Judge's decision.
Additionally, the Court partially adopted the R & R, granting summary
judgment on future damages, but denied the Defendants' objection on the
issue of qualified immunity. This means that the issue of qualified
immunity for Defendant Caruso will proceed to trial, where the jury will
decide the matter.
We will continue to advocate diligently on your behalf and prepare for
trial. Please feel free to contact our office should you have any
further questions or require additional information.
Thank you for entrusting our firm with your case. We remain committed
to seeking justice on your behalf.
Sincerely,
[Your Name]
[Your Law Firm]
It's important to remember that this is a basic example using only the opinion and order itself. For real workflows, we would combine the opinion and order with other documents, such as the complaint, the Magistrate Judge's Report and Recommendation, or cited precedent from the Seventh Circuit retrieved via a legal research API. We could also provide additional information, examples, or instructions to our model to draft this communication in a specific style or template, or to include additional information.
Step 4: Building Custom Status Reports
Now let's look at how we can put these pieces together to create a table that summarizes any new Motions or Orders received today. We'll iterate over all the documents in our batch, classify them to determine whether they are a Motion or Order, and, if so, summarize them and present them in a table.
# imports
from kelvin.source.filesystem_source import FilesystemSource
from kelvin.api.pdf.sync_client import KelvinPDFSyncClient
from kelvin.graph.models.sali.lmss import LMSSGraph
from kelvin.nlp.llm.engines.openai_engine import OpenAIEngine
from kelvin.nlp.llm.classify.text_classifier import TextClassifier
from kelvin.nlp.llm.summarize.recursive_split_summarizer import RecursiveSplitSummarizer

# create the graph for SALI data
lmss_graph = LMSSGraph()

# get SALI LMSS concepts for document type
doc_type_data = {
    lmss_graph.concepts[iri]['label']: lmss_graph.concepts[iri]['definitions']
    for iri in lmss_graph.get_children(lmss_graph.label_to_iri["Litigation Document"][0], max_depth=2)
}

# setup LLM engine
engine = OpenAIEngine(model="gpt-4")

# setup text classifier
doc_type_classifier = TextClassifier(engine, label_data=doc_type_data)

# get the list of order types and their children
order_iri = lmss_graph.label_to_iri['Order'][0]
order_children_iri = lmss_graph.get_children(order_iri)
motion_iri = lmss_graph.label_to_iri['Motions'][0]
motion_children_iri = lmss_graph.get_children(motion_iri)

# setup a summarizer
summarizer = RecursiveSplitSummarizer(engine=engine)

# create the client
pdf_client = KelvinPDFSyncClient()

# iterate over files and find the Motions/Orders
summary_document_list = []
for file in FilesystemSource("pacer/miwd/"):
    # send to the Kelvin PDF API
    pdf_response = pdf_client.extract_text(
        file.to_bytes(),
        allow_ocr=True
    )

    # classify the document using the SALI Litigation Document concepts and GPT-4
    doc_classifications = doc_type_classifier.get_classifications(pdf_response['text'])

    # now, we'll check if the label is a Motion/Order or "child" concept
    for doc_label, doc_label_score in doc_classifications:
        # check if label is an order
        doc_label_iri = lmss_graph.label_to_iri[doc_label][0]
        if doc_label_iri == order_iri or doc_label_iri in order_children_iri:
            summary = summarizer.get_summary(pdf_response['text'])
            summary_document_list.append({
                "file": file,
                "text": pdf_response['text'],
                "type": "order",
                "classifications": doc_classifications,
                "summary": summary
            })
            break

        # check if label is a motion
        if doc_label_iri == motion_iri or doc_label_iri in motion_children_iri:
            summary = summarizer.get_summary(pdf_response['text'])
            summary_document_list.append({
                "file": file,
                "text": pdf_response['text'],
                "type": "motion",
                "classifications": doc_classifications,
                "summary": summary
            })
            break
The resulting list can be rendered as a table to quickly review the documents received today, like this:
| File | Type | Summary |
|---|---|---|
| gov.uscourts.miwd.53299.188.0.pdf | Order | In the case Kevin King v. Patricia Caruso et al., objections were raised to the Magistrate Judge's Report and Recommendation, which advised denying the Defendants' Motion for Summary Judgment. The Court partially adopted the R&R, partially granting the Motion for Summary Judgment. Defendant Caruso's qualified immunity was determined to be a question for the jury, but Plaintiff's expectation damages were denied due to insufficient evidence of medically certain future disease. |
| gov.uscourts.miwd.66565.67.0.pdf | Order | A scheduling conference for the case Georgia Pacific Consumer Products, LP, et al. v. NCR Corporation, et al. is set for June 27, 2011, in the United States District Court, Western District of Michigan. The conference aims to discuss case management, expediting disposition, and establishing discovery plans. The parties must submit a joint status report at least three business days before the event. |
| gov.uscourts.miwd.62920.46.0.pdf | Order | The US District Court of the Western District of Michigan approved a summary judgment in favor of defendant Nancy Lange in a case where plaintiff Fingal E. Johnson accused her of deliberate indifference to his medical needs. The case remains open against another defendant, Dr. Lacy. |
| ... | | |
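Producing the table itself from summary_document_list takes only a few lines of string formatting. In this sketch, sample dictionaries with plain string file names stand in for the real results (where "file" holds a file object):

```python
def to_markdown_table(docs: list) -> str:
    """Render the document list as a pipe-delimited Markdown table."""
    rows = ["| File | Type | Summary |", "|---|---|---|"]
    for doc in docs:
        rows.append(f"| {doc['file']} | {doc['type'].title()} | {doc['summary']} |")
    return "\n".join(rows)

# sample records in the same shape as summary_document_list above
sample_docs = [
    {"file": "gov.uscourts.miwd.53299.188.0.pdf", "type": "order",
     "summary": "Objections to the R&R were partially adopted."},
    {"file": "gov.uscourts.miwd.66565.67.0.pdf", "type": "order",
     "summary": "A scheduling conference is set for June 27, 2011."},
]
print(to_markdown_table(sample_docs))
```

From here the same rows could just as easily be written to CSV, HTML, or a docketing system.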
Step 5: Routing tasks to the right people with legal “AutoGPT”
This summary provides a quick overview of the document's contents, which we can use to send to a large language model or attorney to determine next steps. In the real world, we'd almost certainly want to put this in front of an internal expert to review and determine next steps. However, at large firms, finding the right person available at the right time can be a challenge.
What if you could combine rich expertise and experience data with the document summary to automatically route the document to the right person?
What if the system could automatically check the resource's calendar with Office 365 integration to see if they're available to take on the task today?
Even better, what if the system could automatically check on the resource's recent utilization in Aderant or query Elite, Foundation, or Intapp to see if the resource is conflicted out of the case?
Kelvin is designed to help you do all of this and more. Stay tuned for our next example in this series, which will show you how to use Kelvin to route a draft response to the best available attorney by reviewing Aderant and Office 365 information to rank the most appropriate resources.