
A toolkit for Corpus Linguistics Analysis


Kitconc 3.4.3

Kitconc is a package for Corpus Linguistics and text analysis with Python. It contains, among other things, tools for creating:

  • Corpora;

  • Frequency wordlists;

  • Keywords (Log-Likelihood, Chi-Square, TF-IDF);

  • Concordance lines;

  • Collocates;

  • N-gram lists;

  • Dispersion plots;

  • Excel data files;

  • Semantic search with sentence embeddings.

Kitconc is built on established scientific Python packages: NumPy, pandas, NLTK, XlsxWriter and Matplotlib.
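As an illustration of the Log-Likelihood keyword statistic named in the list above, here is a minimal sketch of the standard two-corpus log-likelihood (G2) calculation. It is a generic textbook formulation, not Kitconc's own code; the function name `log_likelihood` and the counts are hypothetical.

```python
import math


def log_likelihood(freq_study: int, size_study: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of one word: study corpus vs. reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    # A zero observed frequency contributes nothing (x * ln(x) -> 0 as x -> 0).
    if freq_study > 0:
        g2 += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * g2


# Hypothetical counts: a word occurs 120 times in a 10,000-token study corpus
# and 30 times in a 50,000-token reference corpus.
score = log_likelihood(120, 10_000, 30, 50_000)
print(round(score, 2))
```

A word with the same relative frequency in both corpora scores 0; the further the observed counts drift from the expected ones, the higher the score.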

Requirements

Kitconc requires Python 3.10 or later.

Package dependencies (pip install kitconc):

numpy>=1.26.4,<2.0.0
pandas>=2.2.0,<3.0.0
matplotlib>=3.7.0,<4.0.0
xlsxwriter>=3.2.3,<4.0.0
ttkbootstrap>=1.12.0,<2.0.0
pillow>=11.2.0,<12.0.0
requests>=2.31.0,<3.0.0
nltk>=3.9.1,<4.0.0
chardet>=5.2.0,<6.0.0
pypdf>=4.0.0,<7.0.0
cryptography>=3.1,<47.0.0
mcp>=1.0.0,<2.0.0
setuptools>=70.0.0

Additional dependencies listed in requirements.txt (full local environment):

torch>=2.6,<2.10 (CPU wheels via --extra-index-url https://download.pytorch.org/whl/cpu)
transformers>=4.45,<6.0.0
sentence-transformers>=3.0,<6.0.0
sqlite-vec>=0.1.7,<1.0.0
fastapi>=0.110,<1.0.0
uvicorn[standard]>=0.27,<1.0.0
python-dotenv>=1.0.0,<2.0.0
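To see which of these requirements are already met in an environment, a small standard-library check along the following lines may help. It is only a sketch: the helper `installed_versions` is hypothetical, the `REQUIRED` list is a subset chosen for illustration, and the code reports installed versions without comparing them against the ranges above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# A few distribution names taken from the dependency list above.
REQUIRED = ["numpy", "pandas", "matplotlib", "xlsxwriter", "nltk"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Kitconc requires Python 3.10 or later.
python_ok = sys.version_info >= (3, 10)
print("Python 3.10+:", python_ok)
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'not installed'}")
```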

Installation

pip install kitconc

Kitconc App (graphical interface)

kitconc-app

Agent Layer (internal actions)

Kitconc now includes an internal action layer for agent/tool orchestration:

  • kitconc.agent.actions.KitconcActions

  • Full parity with shell commands from kit_cmd.py (do_*)

  • Typed schemas in kitconc.agent.schemas

  • Contract documentation in kitconc/agent/CONTRACT.md

  • Semantic retrieval action: semantic_search(…)

Basic usage:

from kitconc.agent import KitconcActions

actions = KitconcActions("kitconc_workspace")
actions.create("ads", "kitconc_corpora/ads", "english")
actions.use("ads")
rows = actions.keywords(limit=10)

MCP Server (for agent integrations)

kitconc-mcp --transport stdio

For HTTP clients (recommended):

kitconc-mcp --transport streamable-http --host 127.0.0.1 --port 8001

or (legacy SSE):

kitconc-mcp --transport sse --host 127.0.0.1 --port 8001

Includes semantic retrieval tool: semantic_search (query, top_k, db_path, model_name)

The MCP runtime is included in the package dependencies, so pip install kitconc is enough.
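For orientation, a client built on the mcp Python SDK could start the stdio server and call the semantic_search tool roughly as sketched below. The helper names `build_search_arguments` and `run_search` are hypothetical, and treating db_path and model_name as optional is an assumption; only the tool name and its four parameter names come from the description above.

```python
def build_search_arguments(query, top_k=5, db_path=None, model_name=None):
    """Collect the semantic_search parameters, leaving out any that are unset."""
    arguments = {"query": query, "top_k": top_k}
    if db_path is not None:
        arguments["db_path"] = db_path
    if model_name is not None:
        arguments["model_name"] = model_name
    return arguments


async def run_search(query, **options):
    """Start kitconc-mcp over stdio and call its semantic_search tool once."""
    # Imported here so the helper above stays usable without the mcp package.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(
        command="kitconc-mcp", args=["--transport", "stdio"]
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(
                "semantic_search",
                arguments=build_search_arguments(query, **options),
            )
```

With Kitconc installed, the coroutine can be driven with `asyncio.run(run_search("your query", top_k=3))`; the shape of the returned result depends on the server and is not shown here.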

What’s new in 3.2.0

  • Tkinter launcher command – start GUI with kitconc-app

  • Agent action layer – kitconc.agent.actions.KitconcActions with command parity from kit_cmd.py

  • Typed schemas – available in kitconc.agent.schemas

  • MCP server entrypoint – run with kitconc-mcp

  • Semantic search MCP tool – semantic_search for sqlite-vec indexes

  • Embedding index hardening – transactional writes and thread-safe SQLite access

  • Progress flag rename – use verbose=True (replacing show_progress=True)
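The general idea behind "transactional writes and thread-safe SQLite access" can be sketched with the standard library alone. The class below is a generic illustration under assumed names (`SafeStore`, table `items`), not Kitconc's embedding index: a lock serializes access to one shared connection, and each batch is written inside a transaction that rolls back on failure.

```python
import sqlite3
import threading


class SafeStore:
    """Tiny key/value store: one shared connection, guarded by a lock."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets several threads share the connection;
        # the lock below is what actually serializes access.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, value TEXT)"
            )

    def write_many(self, rows):
        # "with self._conn" commits on success and rolls back on any exception,
        # so a failed batch leaves no partial rows behind.
        with self._lock, self._conn:
            self._conn.executemany(
                "INSERT INTO items (key, value) VALUES (?, ?)", rows
            )

    def count(self):
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


store = SafeStore()
threads = [
    threading.Thread(
        target=store.write_many,
        args=([(f"t{i}-{j}", "x") for j in range(10)],),
    )
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.count())
```

A batch containing a duplicate key raises `sqlite3.IntegrityError` and leaves the table unchanged, which is the property a transactional write is meant to guarantee.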

What’s new in 3.1.0

  • TF-IDF keywords – third keyword extraction method alongside Log-Likelihood and Chi-Square

  • Keyword filters – ignore numbers, ignore words with strange characters, minimum word length

  • PDF support – add PDF files directly to a corpus

  • Embeddings module – semantic search with sentence-transformers and SQLite vector storage

  • Dialog improvements – dialog boxes now center correctly in fullscreen and large-window mode
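To make the TF-IDF and keyword-filter items above concrete, here is a minimal, generic sketch of TF-IDF scoring combined with the three filters. It is not Kitconc's implementation: the names `keep_word` and `tfidf` are hypothetical, and reading "strange characters" as anything other than letters, hyphens and apostrophes is an assumption.

```python
import math
import re
from collections import Counter

# Letters, optionally joined by single hyphens or apostrophes (assumed rule).
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def keep_word(word, min_length=3):
    """Apply the three filters: minimum length, no digits, no odd characters."""
    if len(word) < min_length:
        return False
    if any(ch.isdigit() for ch in word):
        return False
    return WORD_PATTERN.fullmatch(word) is not None


def tfidf(documents, min_length=3):
    """Return one {word: score} dict per tokenized document."""
    filtered = [
        [w.lower() for w in doc if keep_word(w, min_length)] for doc in documents
    ]
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in filtered for word in set(doc))
    n_docs = len(filtered)
    scores = []
    for doc in filtered:
        counts = Counter(doc)
        total = len(doc) or 1
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


docs = [
    "the corpus has 3 texts about corpus tools".split(),
    "the tools are free".split(),
]
top = sorted(tfidf(docs)[0].items(), key=lambda kv: -kv[1])[:3]
for word, score in top:
    print(word, round(score, 3))
```

Words that occur in every document, such as "the" here, score zero, while words concentrated in one document rise to the top.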

Language resources

Kitconc comes with built-in language resources for Portuguese and English corpora. It also provides functions for adding your own language resources.

Usage example

See how easy it is to use Kitconc:

https://ilexis.net.br/kitconc


