Due Diligence/M&A Automation
- A Transactional Day in the Life
- Automated M&A with Kelvin Legal Data OS
- Step-by-step through the automation process
- Syncing the deal room
- Listing the deal room documents
- Summarizing the documents
- Choose-your-own Large Language Model (LLM)
- Answering questions about single documents
- Automating custom diligence checklists with Kelvin
- Retrieval-augmented techniques for diligence checklists with Kelvin Vector and Kelvin embeddings
- What’s next?
A Transactional Day in the Life
Attorneys who work in transactional practice areas are often faced with the challenge of reviewing a large number of documents in a short period of time. Whether they work for a law firm, a Fortune 500 company, or a private equity firm, the challenge is often the same.
The following is a typical day in the life of a transactional attorney:
- The team receives a request to support a transaction.
- A shared drive or deal room is shared by the client or counterparty.
- The team manually downloads the documents and then uploads them to an internal shared drive or DMS.
- The team performs a first pass to identify the documents that are relevant to the transaction or prior requests for information.
- If any documents are missing, the team requests them from the client or counterparty.
- For standardized transactions and checklists, the team follows a process to review the documents and identify the relevant information.
- For non-standardized transactions, the team performs an open-ended review of the documents to identify potential issues.
- The team prepares a communication for the client or counterparty summarizing their findings and additional requests for information.
The process is often repeated multiple times as the transaction progresses, strategic alternatives are considered, and new information is provided by the client or counterparty. To make matters even worse, the team is often working on multiple transactions at the same time - and, of course, they are all urgent.
The process is time-consuming, inefficient, and error-prone. There has to be a better way.
Automated M&A with Kelvin Legal Data OS
Kelvin Legal Data OS can help transactional attorneys and their teams work more efficiently and effectively. The following is a typical day in the life of a transactional attorney using Kelvin:
- The team receives a request to support a transaction.
- A shared drive or deal room is shared by the client or counterparty.
- The team uses Kelvin Source to automatically download the documents and upload them to the Kelvin Document Index or an internal shared drive or DMS.
- An automated process using the LLM summarization methods in Kelvin Document Index summarizes and reorganizes the documents based on the team's division of labor.
- An automated process using the Q&A methods in Kelvin Document Index identifies the answers to diligence checklist questions.
- The team reviews the results of these automated processes, confirming and correcting the results as necessary.
- An automated process using the LLM summarization and memo-writing methods in Kelvin Document Index and Kelvin NLP generates a report for the client or counterparty summarizing the team's findings and additional requests for information.
Step-by-step through the automation process
We'll walk through a simple example of how Kelvin can be used to support a transaction. In this example, an M&A team is supporting a transaction involving the Bluth company (yes, that Bluth company) and has been asked to review the documents in the Bluth company's Box.com data room. The team has been asked to summarize the documents and review the executive employment agreements for standard checklist items.
Kelvin is most frequently used through Jupyter notebooks in the Kelvin Data Lab, which makes it easy to collaborate and iterate. Like all Python code, Jupyter notebooks start with the imports necessary to run the code. The following code imports the Kelvin libraries that we'll need for this example:
Syncing the deal room
# import Kelvin Source and Kelvin API libraries
from kelvin.source.box_source import BoxSource
from kelvin.api.document_index.sync_client import KelvinDocumentIndexSyncClient
# pandas is used below to tabulate the document list
import pandas
The Bluth company's counsel has shared a link to the data room with the M&A team, providing one team member with access to the data room. Luckily, Kelvin Source makes it easy to download the documents from the data room for the team to collaborate on, no matter how big the deal room.
# set up the Box.com source object
data_room_url = "https://bluthcompany.box.com/folder/1234567890"
data_room_source = BoxSource(data_room_url, recursive=True)
# create a client for the Kelvin Document Index
doc_client = KelvinDocumentIndexSyncClient()
# download all documents and upload them to the Kelvin Document Index
for file in data_room_source:
    doc_client.upload_document(
        file_object=file.to_filelike(),
        file_name=file.name
    )
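In a large deal room, a single unreadable file should not stop the whole sync. The sketch below shows one way to wrap an upload loop so that failures are collected rather than raised. It is not part of Kelvin: `sync_documents` and `fake_upload` are hypothetical helpers, and the upload callable stands in for whatever client call you use.

```python
from typing import Callable, Iterable, List, Tuple


def sync_documents(
    files: Iterable[Tuple[str, bytes]],
    upload: Callable[[str, bytes], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Upload every (name, content) pair and report successes and failures."""
    uploaded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for name, content in files:
        try:
            upload(name, content)
            uploaded.append(name)
        except Exception as exc:
            # record the failure and continue with the remaining files
            failed.append((name, str(exc)))
    return uploaded, failed


def fake_upload(name: str, content: bytes) -> None:
    """Hypothetical stand-in for a real upload call; rejects empty files."""
    if not content:
        raise ValueError("empty file")


demo_files = [("List of Liabilities.docx", b"..."), ("Empty.docx", b"")]
uploaded, failed = sync_documents(demo_files, fake_upload)
print(uploaded)
print(failed)
```

The list of failures can then be used to request missing or corrupted documents from the client or counterparty.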
Listing the deal room documents
At this point, the documents - including their text, tables, images, and metadata - are all available in the Kelvin Document Index. Let's start by listing the documents in the data room:
# collect every document instance in the data room
document_data = []
documents = doc_client.get_documents()
for document in documents:
    for document_instance in doc_client.get_document_instances(document["id"]):
        document_data.append(document_instance)
document_df = pandas.DataFrame(document_data)
# show the first few documents
document_df.head()
The following table shows the first few documents in the data room, but you can see the full list of documents and read them here.
id | document_id | file_name | created_at | updated_at |
---|---|---|---|---|
0 | 1 | List of Liabilities.docx | 2023-04-21T18:08:53.569800 | 2023-04-21T18:08:53.569800 |
1 | 2 | Employee Stock Ownership Plan for the Bluth Co... | 2023-04-21T18:08:53.585452 | 2023-04-21T18:08:53.585452 |
2 | 3 | Bluth Banana Cloud - Open Source.docx | 2023-04-21T18:08:53.585917 | 2023-04-21T18:08:53.585917 |
3 | 4 | Cornballer Patent.docx | 2023-04-21T18:08:53.748395 | 2023-04-21T18:08:53.748395 |
4 | 5 | SWOT Analysis.docx | 2023-04-21T18:08:59.048763 | 2023-04-21T18:08:59.048763 |
... |
While we're only showing five records here, we've used Kelvin on projects with over 100,000 documents. It's easy to scale Kelvin to any size deal room.
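Once the listing is available, a quick first-pass triage is often just a grouping of file names. The sketch below groups the five file names shown above by extension using only the standard library; `group_by_extension` is a hypothetical helper, not a Kelvin method.

```python
from collections import defaultdict
from pathlib import PurePath

# file names taken from the example data room listing
file_names = [
    "List of Liabilities.docx",
    "Employee Stock Ownership Plan for the Bluth Company, Inc.pdf",
    "Bluth Banana Cloud - Open Source.docx",
    "Cornballer Patent.docx",
    "SWOT Analysis.docx",
]


def group_by_extension(names):
    """Map each lowercased file extension to the file names that use it."""
    groups = defaultdict(list)
    for name in names:
        groups[PurePath(name).suffix.lower()].append(name)
    return dict(groups)


groups = group_by_extension(file_names)
for extension, names in sorted(groups.items()):
    print(extension, len(names))
```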
Summarizing the documents
While it's nice to know that the documents are in the data room, it's more useful to know what's in the documents. Kelvin Document Index makes it trivial to summarize the documents for quick triage and review. The following code summarizes the first five documents in the data room:
# get the first five documents in the data room
for document in documents[:5]:
    # get the document instances (handling for multiple copies of the same document)
    document_instances = doc_client.get_document_instances(document["id"])
    file_name = document_instances[0]["file_name"]
    # use the LLM summarization methods in Kelvin Document Index to summarize the document
    summary = doc_client.get_document_summary(document["id"], engine="gpt-4")
    print(f"{file_name}: {summary}")
The summarization methods in Kelvin NLP and the Kelvin Document Index are highly configurable, allowing users to contextualize the summary, change the role or persona of the summarizer, or specify the desired length of the summary in words, sentences, or paragraphs.
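To make the idea of a configurable summary concrete, the sketch below shows how a persona, optional context, and a length limit might be combined into a single prompt. This is an illustration only and does not describe how Kelvin builds its prompts; `build_summary_prompt` and its parameters are hypothetical.

```python
def build_summary_prompt(text, persona="an M&A associate", context="", max_sentences=3):
    """Compose a summarization prompt from a persona, optional context, and a length limit."""
    parts = [f"You are {persona}."]
    if context:
        parts.append(f"Context: {context}")
    parts.append(
        f"Summarize the following document in at most {max_sentences} sentences."
    )
    parts.append(f"Document:\n{text}")
    return "\n\n".join(parts)


prompt = build_summary_prompt(
    "The Bluth Company offers an ESOP for employees over 21...",
    persona="a tax specialist",
    context="Buy-side diligence for an acquisition.",
    max_sentences=2,
)
print(prompt)
```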
The summaries for the first five documents in the data room are shown below:
- List of Liabilities.docx (id=1): The text lists different types of liabilities, including short and long-term debt, accounts payable, deferred revenue, employee benefits, lease obligations, taxes payable, and warranty liabilities.
- Employee Stock Ownership Plan for the Bluth Company, Inc.pdf (id=2): The Bluth Company offers an ESOP for employees over 21 who have worked for at least a year. The company contributes a percentage of its net income each year and employees become fully vested after two years or upon reaching age 65. Retirement, disability, death, or termination of employment results in a distribution of the vested account balance in cash or stock.
- Bluth Banana Cloud - Open Source.docx (id=3): The Bluth Banana Cloud platform utilizes several open source software tools for various purposes such as Apache Tomcat for web hosting, MySQL for database management, Redis for data storage, RabbitMQ for message brokering, Elasticsearch for search and analytics, Logstash for data processing, Kibana for data visualization, Docker for containerization, Kubernetes for container orchestration, and Grafana for data monitoring and visualization.
- Cornballer Patent.docx (id=4): The document describes the Cornballer, a kitchen appliance designed for quick and efficient cooking of corn balls. It has a non-stick surface, timer and temperature control, as well as a safety mechanism. The appliance is compact and believed to have commercial potential.
- SWOT Analysis.docx (id=5): The document lists the strengths of a company's unique offering, experienced team, and user-friendly platform, as well as weaknesses such as a limited customer base and marketing efforts. Opportunities for expansion and new features are identified, but competition from other SaaS providers and potential industry disruption or customer preferences pose threats, along with defects in the platform or services.
Choose-your-own Large Language Model (LLM)
Note that we pass an engine parameter to the get_document_summary method above. This parameter specifies which LLM to use, like gpt-3.5-turbo, gpt-4, or another local or remote LLM. It's easy to configure Kelvin to use one or more local or remote LLMs via configuration or at runtime, like this:
"LLM": [
    {
        "name": "gpt-3.5-turbo",
        "provider": "openai",
        "provider_model": "gpt-3.5-turbo",
        "provider_auth": {
            "org_id": "org-abc123...",
            "api_key": "sk-abc123..."
        }
    },
    {
        "name": "gpt4all-j-1.3",
        "provider": "transformers",
        "provider_model": "gpt4all-j-1.3"
    }
],
We support dozens of LLMs currently, including GPT-4 through OpenAI or Azure, Claude from Anthropic, or transformers models like MPT-7B, GPT4All, Dolly2, and more. It's easy to add your own LLM engine for any other text completion or chat assistant-style model.
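As a rough illustration of how an engine name maps to a configured model, the sketch below parses a configuration shaped like the one above and looks up an entry by name. The lookup logic and the `resolve_engine` helper are assumptions made for this example, not Kelvin's implementation, and the authentication fields are omitted.

```python
import json

# a configuration shaped like the example above, without credentials
config_text = """
{
  "LLM": [
    {"name": "gpt-3.5-turbo", "provider": "openai", "provider_model": "gpt-3.5-turbo"},
    {"name": "gpt4all-j-1.3", "provider": "transformers", "provider_model": "gpt4all-j-1.3"}
  ]
}
"""


def resolve_engine(config, engine):
    """Return the configuration entry whose name matches the requested engine."""
    for entry in config["LLM"]:
        if entry["name"] == engine:
            return entry
    known = ", ".join(e["name"] for e in config["LLM"])
    raise KeyError(f"unknown engine {engine!r}; configured engines: {known}")


config = json.loads(config_text)
print(resolve_engine(config, "gpt4all-j-1.3")["provider"])
```

Failing loudly on an unknown engine name, with the configured names in the message, makes typos in an engine argument easy to spot.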
Answering questions about single documents
In the summaries above, we see that one of the files lists the liabilities of the company. Lists of liabilities are often key documents in due diligence, as they outline financial obligations and additional sources of covenants or restrictions on the target company. Let's see how we can use Kelvin Document Index to answer questions about the first document in the data room, List of Liabilities.docx, just like a human would.
# setup the question and send it to the Kelvin API with gpt-3.5-turbo
question = "How many debt instruments does the company have?"
answer = doc_client.get_document_answer(1, "gpt-3.5-turbo", question)
The results are shown below with the correct answer:
- Question: How many debt instruments does the company have?
- Answer: Two
Depending on the LLM used, we can ask substantially more open-ended or complex questions, like the following:
question = "How do employees vest?"
answer = doc_client.get_document_answer(2, "gpt-4", question)
- Question: How do employees vest?
- Answer: Employees vest in their ESOP accounts after two years of continuous employment with the company or upon reaching their 65th birthday.
Automating custom diligence checklists with Kelvin
Basic summarization and question answering are incredibly useful, but for many organizations, diligence is a more structured process. Kelvin Document Index makes it easy to implement custom checklists for diligence, compliance, or other use cases. Let's see how we can implement a custom checklist for compensation in executive employment agreements.
Today, most diligence checklist processes are run almost entirely through manual review. Legal professionals and other experts review documents, flag relevant information, and typically summarize their findings in brief memos or Excel spreadsheets. This process is time-consuming, expensive, and error-prone, and it's difficult to scale across multiple documents, data rooms, or deals simultaneously.
One common type of checklist has to do with terms of employment and compensation for senior executives. These agreements are often complex, and it's important to ensure that key terms are consistent with the company's policies and market norms. While checklists may vary across industries and organizations, they almost always include basic information about the employee, their salary, and their bonus plans.
In this example, we'll implement these common elements of executive employment checklists using a retrieval-augmented workflow. We'll start by using the term salary to retrieve relevant segments from the corpus; the section that follows this example shows how to use embeddings and vector similarity instead.
We'll then extract the name of the employee, their salary, and their bonus plan using an LLM and structure this information with Kelvin NLP to create a useful spreadsheet summary.
# store results across all documents
salary_data = []
# nlp_client is a Kelvin NLP client; its setup is not shown in this example
# find all sentences or paragraphs containing salary
segments = doc_client.search_document_segment_contents("salary")
for segment in segments:
    # get the first unique file name
    document_instances = doc_client.get_document_instances(segment["document_id"])
    file_name = document_instances[0]["file_name"]
    # get the name of the employee
    employee_name = doc_client.get_document_answer(
        segment["document_id"],
        "gpt-3.5-turbo",
        "Respond with only the name of the employee."
    )
    # check whether the employee has a bonus plan
    employee_bonus = doc_client.get_document_answer(
        segment["document_id"],
        "gpt-3.5-turbo",
        "Does the employee have a bonus plan?"
    )
    # get the text and extract monetary amounts
    segment_text = doc_client.get_document_segment(segment["id"])
    money_values = nlp_client.get_money(segment_text['text'])
    for money in money_values['moneys']:
        salary_data.append({
            "file_name": file_name,
            "name": employee_name["answer"],
            "bonus": employee_bonus["answer"],
            "text": money['text'],
            "quantity": money['quantity'],
            "currency": money['currency'],
        })
The results are shown below:
file_name | name | bonus | text | quantity | currency |
---|---|---|---|---|---|
Employment Agreement - Lucille Bluth.docx | Lucille Bluth | Yes, the employee has a bonus plan based on th... | $500,000 | 500000.0 | USD |
Employment Agreement - Michael Bluth.docx | Michael Bluth | Yes, the employee has a bonus plan. | $450,000 | 450000.0 | USD |
Employment Agreement - Buster Bluth.docx | Buster Bluth | Yes, the employee is eligible to earn annual b... | $290,000 | 290000.0 | USD |
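Because checklist findings usually end up in a spreadsheet, it can help to see how rows like these might be written out as CSV. The sketch below uses only the standard library and a subset of the columns shown above; `to_csv` is a hypothetical helper, and the rows are copied from the results table rather than produced by a live query.

```python
import csv
import io

# rows copied from the results table above (bonus text omitted for brevity)
rows = [
    {"file_name": "Employment Agreement - Lucille Bluth.docx", "name": "Lucille Bluth", "quantity": 500000.0, "currency": "USD"},
    {"file_name": "Employment Agreement - Michael Bluth.docx", "name": "Michael Bluth", "quantity": 450000.0, "currency": "USD"},
    {"file_name": "Employment Agreement - Buster Bluth.docx", "name": "Buster Bluth", "quantity": 290000.0, "currency": "USD"},
]


def to_csv(records, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, ["file_name", "name", "quantity", "currency"])
print(csv_text)
```

The resulting text can be saved with a .csv extension and opened directly in Excel or another spreadsheet tool.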
Retrieval-augmented techniques for diligence checklists with Kelvin Vector and Kelvin embeddings
The above example is simple, but it demonstrates how Kelvin Document Index can be used to rapidly automate traditional diligence practices. However, the example is brittle, as it relies on the term salary to retrieve relevant documents. If the term salary is not used in a document, it will not be retrieved. To address this, we can use Kelvin Vector to retrieve documents or segments that are similar to a given query based on their embedding vectors.
(While Kelvin supports external embeddings from sources like OpenAI or transformers, we ship our own, legal-specific embeddings that are trained on a large corpus of legal and financial documents. We present results below with both text-embedding-ada-002 and Kelvin's smallest, fastest model, en-001-small, for comparison.)
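For readers new to vector retrieval, the sketch below shows the underlying idea: score each segment by the cosine similarity between its embedding and the query embedding, then keep the top k. The three-dimensional vectors and segment labels are made up for illustration, and this code makes no claim about how Kelvin Vector computes or indexes similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, segments, k=3):
    """Return the k (score, label) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in segments]
    scored.sort(reverse=True)
    return scored[:k]


# toy embeddings, invented for this example
segments = [
    ("compensation clause", [0.9, 0.1, 0.0]),
    ("tax opinion", [0.1, 0.9, 0.1]),
    ("patent description", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]
results = top_k(query_vec, segments, k=2)
for score, label in results:
    print(f"{label}: {score:.3f}")
```

Unlike the keyword search used earlier, this ranking does not depend on the query and the segment sharing any particular word.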
query = """Retention bonuses are taxable income to the employee and must be added to the employee's compensation
in the year in which they are awarded. In view of this, the company, as an additional retention incentive, will
provide a "gross up" to employee income by paying the taxes for retention bonuses so that employees will receive
the full amount indicated above "net of taxes."""
segments = doc_client.search_document_segment_vectors(
    query=query,
    vector_type="en-001-small",
    k=3,
    threshold=10
)
for segment in segments:
    # get the first unique file name
    document_instances = doc_client.get_document_instances(segment["document_segment"]["document_id"])
    file_name = document_instances[0]["file_name"]
    # get the text and extract monetary amounts
    segment_text = doc_client.get_document_segment(segment["document_segment"]["id"])
    # ...
The three highest-similarity document segments for Kelvin's embeddings are shown below:
- Employment Agreement - Lucille Bluth.docx: **4. Compensation** If there is a change in control of Bluth, as defined in the applicable agreements governing such change in control, and if the CFO's employment is terminated without cause or if the CFO resigns for good reason within 12 months following such change in control, then the CFO shall be entitled to receive (i) a lump sum severance payment equal to two times her base salary and target bonus in effect immediately prior to such termination or resignation, (ii) immediate vesting of all equity awards granted to her by the Company, and (iii) continuation of health benefits for a period of 12 months following such termination or resignation.
- Employment Agreement - Buster Bluth.docx: **3. Compensation** Bluth shall pay Employee an annual salary of $290,000, payable in accordance with the Company's regular payroll practices. Employee shall also be eligible to earn annual bonuses based on the Company's achievement of specific financial goals.
- Employment Agreement - Michael Bluth.docx: **3. Compensation** Bluth shall pay Employee an annual salary of $450,000, payable in accordance with the Company's regular payroll practices. Employee shall also be eligible to earn annual bonuses based on the Company's achievement of specific financial goals.
In contrast, OpenAI's state-of-the-art text-embedding-ada-002 results are shown below:
- Tax Opinion.docx: Generally, foreign nationals are subject to withholding if they are engaged in a US trade or business, and receive a US-source payment of income. Depending on the country of residence of the foreign contractor, certain tax treaty provisions may provide an exemption from withholding.
- Tax Opinion.docx: In conclusion, the Bluth Company should ensure that it complies with all federal and state laws and regulations regarding withholding of income from payments made to foreign contractors.
- Tax Opinion.docx: Subject: Tax opinion on withholding of payments made to foreign contractors Date: January 18, 2018 The Bluth Company should withhold income tax from payments made to foreign contractors.
For this query, the OpenAI embeddings return only segments from Tax Opinion.docx and none from the employment agreements, so they are a poor fit for this task.
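One simple way to quantify the difference between the two result lists is the share of retrieved segments that come from relevant documents. The sketch below assumes, for illustration only, that a segment is relevant when its file name starts with "Employment Agreement"; `hit_rate` is a hypothetical helper and the file names are copied from the two result lists above.

```python
def hit_rate(file_names, is_relevant):
    """Fraction of retrieved segments whose source file is considered relevant."""
    if not file_names:
        return 0.0
    return sum(1 for name in file_names if is_relevant(name)) / len(file_names)


def is_employment_agreement(name):
    # assumption for this example: relevance is judged by file name alone
    return name.startswith("Employment Agreement")


kelvin_results = [
    "Employment Agreement - Lucille Bluth.docx",
    "Employment Agreement - Buster Bluth.docx",
    "Employment Agreement - Michael Bluth.docx",
]
openai_results = ["Tax Opinion.docx", "Tax Opinion.docx", "Tax Opinion.docx"]

kelvin_rate = hit_rate(kelvin_results, is_employment_agreement)
openai_rate = hit_rate(openai_results, is_employment_agreement)
print(kelvin_rate, openai_rate)
```

A single query is only an anecdote; a fuller comparison would repeat this measurement over many checklist queries.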
What’s next?
In this first part of our M&A example, we demonstrated how to use Kelvin Source and Kelvin Document Index to rapidly ingest raw documents, summarize, triage, answer questions, and search. In the next part, we will demonstrate how to use Kelvin Document Index to synthesize these results into a single, cohesive memo or report that can be used to communicate within the team or with clients. Stay tuned for more!