Kelvin NLP
Kelvin NLP is a comprehensive natural language processing library designed to provide both low-level and high-level functionality via Python or API access. The core functionality of the library is implemented without dependence on external frameworks like spaCy, NLTK, or OpenNLP, allowing for high-performance, low-impact, permissive use at scale.
While not required, for convenience and compatibility, Kelvin NLP does provide an interface to libraries like spaCy or transformers and to external APIs like GPT from OpenAI/Azure Cognitive Services or Claude from Anthropic.
Kelvin NLP can be used as:
- A high-performance Python library with minimal external dependencies and no remote APIs
- A convenient Python library with integration into frameworks like spaCy or transformers
- A stateless, high-performance API with an OpenAPI schema compatible with nearly 50 languages
The following durable storage options are available:
- Filesystem
- Postgres
- SQL Server
Use Cases and Tasks
Kelvin NLP is designed to simplify the following common use cases:
- Document Search
- Document or Text Clustering (see the sketch after this list)
- Document or Text Classification
- Feature Engineering
- Feature Storage
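A generic open-source sketch of the clustering and feature engineering use cases is shown below. It does not use Kelvin NLP's own API; it only illustrates the underlying workflow (TF-IDF features plus k-means) with scikit-learn and made-up example documents.

```python
# Illustrative only: a generic document clustering workflow using scikit-learn,
# not Kelvin NLP's own API. The documents below are made-up examples.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "This agreement shall be governed by the laws of the State of New York.",
    "The term of this lease shall commence on January 1 and continue for two years.",
    "Either party may terminate this agreement upon thirty days written notice.",
    "Tenant shall pay rent on the first day of each calendar month.",
]

# Feature engineering: build a sparse TF-IDF matrix over word unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
features = vectorizer.fit_transform(documents)

# Cluster the documents into two groups and print the assignments.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(features)

for text, label in zip(documents, labels):
    print(label, text[:60])
```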
Kelvin NLP provides support for the following tasks:
Low-Level Tasks
- tokenization of legal documents
- segmentation of legal documents into sentences, paragraphs, clauses, sections, or subdocuments
- classification of characters or tokens into common types (e.g., punctuation, enumeration, Roman numerals)
- generation of token or character n-gram sequences (see the sketch after this list)
- efficient lookup structure construction (e.g., DAWGs, B-K trees)
- efficient similarity structure construction (e.g., locality-sensitive hashes or forests)
- sparse or dense frequency matrix construction
- dictionary or stopword construction
- JSONL or XML training sample construction
- custom NLP model training via gensim, scikit-learn, and transformers
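To make the n-gram and sparse-matrix primitives above concrete, the following sketch uses only the Python standard library; it illustrates the concepts rather than Kelvin NLP's actual functions.

```python
# Illustrative only: character/token n-gram generation and a sparse frequency
# count using the standard library, not Kelvin NLP's own API.
from collections import Counter

def char_ngrams(text: str, n: int):
    """Yield overlapping character n-grams from a string."""
    for i in range(len(text) - n + 1):
        yield text[i : i + n]

def token_ngrams(tokens, n):
    """Yield overlapping token n-grams from a token sequence."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i : i + n])

# A sparse representation: one Counter per document, storing only observed n-grams.
documents = ["force majeure", "force of law"]  # made-up examples
frequencies = [Counter(char_ngrams(doc, 3)) for doc in documents]

print(frequencies[0].most_common(3))
print(list(token_ngrams("the party of the first part".split(), 2)))
```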
High-Level Tasks
- word or phrase similarity (string or token distance metrics; see the sketch after this list)
- document or segment embedding construction (shallow via gensim, deep via transformers)
- clustering by term, n-gram, embeddings, or LLM
- classification by term, n-gram, embeddings, or LLM
- information extraction (built-in offline models or external LLM)
  - addresses
  - dates
  - durations
  - entities (persons, companies, etc.)
  - money or currency
  - numeric values
  - percentages or ratios
- summarization via LLM (e.g., GPT, T5, or other transformers models)
- knowledge graph extraction via LLM
- question answering via LLM
- generative patterns like drafting emails or memos via LLM
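As a rough illustration of the string-distance style of similarity listed above, the snippet below uses the standard library's difflib rather than Kelvin NLP's own metrics; embedding- or LLM-based approaches would swap in a different scoring function.

```python
# Illustrative only: string-distance similarity via the standard library's
# difflib, not Kelvin NLP's own similarity metrics.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

phrases = ["governing law", "governing laws", "termination for convenience"]
query = "Governing Law"

for phrase in phrases:
    print(f"{similarity(query, phrase):.2f}  {phrase}")
```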
LLM Support
For more information about which large language models can be used with Kelvin NLP, please see the Supported LLMs page.
API and Library Documentation
You can learn more about Kelvin NLP by reviewing the PyDoc documentation and OpenAPI schemas provided with the library. To access the PyDoc documentation, you can run the following command:
kelvin --docs kelvin.nlp
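If you are using the stateless API, one way to explore it is to read the bundled OpenAPI schema directly. The sketch below lists the declared endpoints; the schema path is a placeholder, so substitute the location of the schema file shipped with your installation.

```python
# Illustrative only: list the endpoints declared in an OpenAPI schema.
# "openapi.json" is a placeholder path, not the actual file name.
import json

with open("openapi.json", "r", encoding="utf-8") as schema_file:
    schema = json.load(schema_file)

for path, operations in schema.get("paths", {}).items():
    print(path, "->", ", ".join(op.upper() for op in operations))
```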
Examples
- Segmenting and tokenizing documents with Kelvin NLP
- Working with tokens and n-grams in Kelvin NLP
- Using Kelvin NLP embedding models
- Creating your own embedding models with Kelvin NLP
- Question answering with Kelvin NLP
- Summarizing documents with Kelvin NLP
- Finding named entities like people and companies with Kelvin NLP
- Finding similar clauses in contracts with Kelvin embeddings
- Summarizing invoices with Kelvin Billing and Kelvin NLP
- Summarize and Search Federal Register documents with Kelvin NLP
- Summarize and Search EDGAR Filings with Kelvin NLP
- M&A Deal Room with Kelvin Source and NLP