The Classical Language Toolkit

The Classical Language Toolkit (CLTK) is a Python library offering natural language processing (NLP) for pre-modern languages.

Installation

To install the latest version of the CLTK:

pip install cltk

Optional extras

  • GenAI (OpenAI-backed annotation):
pip install "cltk[openai]"
  • Stanza (discriminative NLP backends powered by Stanford Stanza):
pip install "cltk[stanza]"

  • Local LLMs via Ollama (requires an Ollama server running locally):
pip install "cltk[ollama]"

You can combine extras, for example:

pip install "cltk[openai,stanza]"

# or include local LLM support as well
pip install "cltk[openai,stanza,ollama]"

By default, with backend='ollama', CLTK uses the model llama3.1:8b. To choose a different local model, pass the model parameter to NLP(...), e.g. qwen2.5:14b, gemma2:27b, llama3.1:70b, or any other Ollama model string.
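Because the Ollama backend needs a server running locally, a quick reachability check before constructing NLP(...) can help. The sketch below is not part of CLTK. It assumes Ollama's default address http://localhost:11434 and its /api/tags endpoint (which lists locally available models). The helper name ollama_is_running is hypothetical.

```python
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url (assumed default address)."""
    try:
        # /api/tags lists locally available models; a 200 response means the server is up.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib.error.URLError (an OSError subclass).
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama server reachable")
    else:
        print("No Ollama server found; start one with `ollama serve`")
```

If the check fails, starting the server (for example with `ollama serve`) and pulling the desired model should be done before calling NLP(..., backend='ollama').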

Choosing a model

  • OpenAI backend (GenAI in the cloud):
from cltk import NLP

# Default model is "gpt-5-mini" when backend='openai'
nlp = NLP('lati1261', backend='openai')

# Choose a specific model
nlp_big = NLP('lati1261', backend='openai', model='gpt-5')

# Requires OPENAI_API_KEY to be set in the environment
# (e.g., via a .env file or shell env var)
  • Ollama backend (local LLMs):
from cltk import NLP

# Default model is "llama3.1:8b" when backend='ollama'
nlp_local = NLP('lati1261', backend='ollama')

# Choose a specific local model (any installed/pullable Ollama model)
nlp_qwen = NLP('lati1261', backend='ollama', model='qwen2.5:14b')

# To use the hosted Ollama Cloud, set OLLAMA_CLOUD_API_KEY
# and choose backend='ollama-cloud'. The same model strings apply.
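
The backend choices above can be tied to the environment variables they need. The following is a minimal sketch, not CLTK's own logic: the helper choose_backend and its priority order (OpenAI first, then Ollama Cloud, then local Ollama) are assumptions made for illustration. Only the backend names, environment variable names, and default models come from the text above.

```python
import os
from typing import Mapping, Optional, Tuple


def choose_backend(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Pick a backend name and its documented default model from environment variables."""
    if env.get("OPENAI_API_KEY"):
        return "openai", "gpt-5-mini"
    if env.get("OLLAMA_CLOUD_API_KEY"):
        # No default model is stated above for ollama-cloud, so none is assumed here.
        return "ollama-cloud", None
    # Fall back to a local Ollama server, which needs no API key.
    return "ollama", "llama3.1:8b"


backend, default_model = choose_backend()
print(f"backend={backend!r}, default model={default_model!r}")

# With CLTK installed, the result could be passed on like this:
# from cltk import NLP
# nlp = NLP('lati1261', backend=backend)
```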

For more information, see the Installation docs; to install from source, see Development.

Pre-1.0 software remains available on the v0.1.x branch, with documentation at https://legacy.cltk.org. Install it with pip install "cltk<1.0".
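
Since the pre-1.0 and current releases share the package name cltk, it can be useful to confirm which one an environment has. This is a small sketch using only the standard library. The helper names is_legacy and installed_cltk_version are hypothetical, and the check assumes plain dotted version strings.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def is_legacy(version_string: str) -> bool:
    """Return True for a pre-1.0 version string such as '0.1.0'."""
    major = int(version_string.split(".")[0])
    return major < 1


def installed_cltk_version() -> Optional[str]:
    """Return the installed cltk version, or None if cltk is not installed."""
    try:
        return version("cltk")
    except PackageNotFoundError:
        return None


found = installed_cltk_version()
if found is None:
    print("cltk is not installed")
else:
    label = "legacy" if is_legacy(found) else "current"
    print(f"cltk {found} ({label} API)")
```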

Documentation

Documentation is available at https://docs.cltk.org.

Citation

When using the CLTK, please cite the following publication, including the DOI:

Johnson, Kyle P., Patrick J. Burns, John Stewart, Todd Cook, Clément Besnier, and William J. B. Mattingly. "The Classical Language Toolkit: An NLP Framework for Pre-Modern Languages." In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pp. 20-29. 2021. 10.18653/v1/2021.acl-demo.3

The complete BibTeX entry:

@inproceedings{johnson-etal-2021-classical,
    title = "The {C}lassical {L}anguage {T}oolkit: {A}n {NLP} Framework for Pre-Modern Languages",
    author = "Johnson, Kyle P.  and
      Burns, Patrick J.  and
      Stewart, John  and
      Cook, Todd  and
      Besnier, Cl{\'e}ment  and
      Mattingly, William J. B.",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-demo.3",
    doi = "10.18653/v1/2021.acl-demo.3",
    pages = "20--29",
    abstract = "This paper announces version 1.0 of the Classical Language Toolkit (CLTK), an NLP framework for pre-modern languages. The vast majority of NLP, its algorithms and software, is created with assumptions particular to living languages, thus neglecting certain important characteristics of largely non-spoken historical languages. Further, scholars of pre-modern languages often have different goals than those of living-language researchers. To fill this void, the CLTK adapts ideas from several leading NLP frameworks to create a novel software architecture that satisfies the unique needs of pre-modern languages and their researchers. Its centerpiece is a modular processing pipeline that balances the competing demands of algorithmic diversity with pre-configured defaults. The CLTK currently provides pipelines, including models, for almost 20 languages.",
}

License

Copyright (c) 2014–present Kyle P. Johnson under the MIT License.

This version

2.5.0
