Kelvin NLP

Kelvin NLP is a comprehensive natural language processing library that provides both low-level and high-level functionality through Python or API access. The core functionality is implemented without depending on external frameworks like spaCy, NLTK, or OpenNLP, which allows high-performance, low-impact, permissively licensed use at scale.

Although none of them are required, Kelvin NLP provides interfaces to libraries like spaCy or transformers and to external APIs like GPT from OpenAI/Azure Cognitive Services or Claude from Anthropic, for convenience and compatibility.

Kelvin NLP can be used as:

  • A high-performance Python library with minimal external dependencies and no remote APIs
  • A convenient Python library with integration into frameworks like spaCy or transformers
  • A stateless, high-performance API with an OpenAPI schema that supports clients in nearly 50 programming languages

The following durable storage options are available:

  • Filesystem
  • Postgres
  • SQL Server

Use Cases and Tasks

Kelvin NLP is designed to simplify the following common use cases:

  • Document Search
  • Document or Text Clustering
  • Document or Text Classification
  • Feature Engineering
  • Feature Storage

Kelvin NLP provides support for the following tasks:

Low-Level Tasks

  • tokenization of legal documents
  • segmentation of legal documents into sentences, paragraphs, clauses, sections, or subdocuments
  • classification of characters or tokens into common types (e.g., punctuation, enumeration, roman numeral)
  • generation of token or character n-gram sequences
  • efficient lookup structure construction (e.g., DAWGs, BK-trees)
  • efficient similarity structure construction (e.g., locality-sensitive hashes or forests)
  • sparse or dense frequency matrix construction
  • dictionary or stopword construction
  • JSONL or XML training sample construction
  • custom NLP model training via gensim, scikit-learn, and transformers
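
To make the n-gram and frequency-counting tasks above concrete, here is a minimal sketch in plain Python. It does not use Kelvin NLP's own API: the function names token_ngrams and char_ngrams are hypothetical, and whitespace splitting stands in for a real legal-document tokenizer.

```python
from collections import Counter
from typing import Iterator, Sequence, Tuple


def token_ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield each run of n consecutive tokens as a tuple (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def char_ngrams(text: str, n: int) -> Iterator[str]:
    """Yield each run of n consecutive characters as a string (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


# Assumption: whitespace splitting is only a stand-in for real tokenization.
tokens = "the party of the first part".split()

# A Counter keyed by n-gram behaves like one sparse row of a frequency matrix:
# only n-grams that actually occur are stored.
bigram_counts = Counter(token_ngrams(tokens, 2))

trigrams = list(char_ngrams("party", 3))
```

Inputs shorter than n simply yield nothing, so callers do not need to special-case short segments.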

High-Level Tasks

  • word or phrase similarity (string or token distance metrics)
  • document or segment embedding construction (shallow via gensim, deep via transformers)
  • clustering by term, n-gram, embeddings, or LLM
  • classification by term, n-gram, embeddings, or LLM
  • information extraction (built-in offline models or external LLM)
    • addresses
    • dates
    • durations
    • entities (persons, companies, etc.)
    • money or currency
    • numeric values
    • percentages or ratios
  • summarization via LLM (e.g., GPT, T5 or other transformers models)
  • knowledge graph extraction via LLM
  • question answering via LLM
  • generative patterns like drafting emails or memos via LLM
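
As an illustration of the string distance metrics mentioned under word or phrase similarity, the sketch below computes Levenshtein edit distance and a normalized similarity score in plain Python. It is not Kelvin NLP's implementation; the names levenshtein and similarity are hypothetical, and Levenshtein is just one of several metrics such a library might offer.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (hypothetical helper)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


distance = levenshtein("kitten", "sitting")
score = similarity("lessor", "lessee")
```

The same distance function could serve as the metric for a BK-tree lookup structure, since Levenshtein distance satisfies the triangle inequality.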

LLM Support

For more information about which large language models can be used with Kelvin NLP, please see the Supported LLMs page.

API and Library Documentation

You can learn more about Kelvin NLP by reviewing the PyDoc documentation and OpenAPI schemas provided with the library. To access the PyDoc documentation, you can run the following command:

kelvin --docs kelvin.nlp