
A toolkit for Corpus Linguistics Analysis

Project description

Kitconc 3.4.3

Kitconc is a package for Corpus Linguistics and text analysis with Python. It contains, among other things, tools for creating:

  • Corpora;

  • Frequency wordlists;

  • Keywords (Log-Likelihood, Chi-Square, TF-IDF);

  • Concordance lines;

  • Collocates;

  • N-gram lists;

  • Dispersion plots;

  • Excel data files;

  • Semantic search with sentence embeddings.

Kitconc is built on established scientific Python packages: numpy, pandas, NLTK, XlsxWriter and matplotlib.
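As a rough illustration of the Log-Likelihood keyword measure mentioned in the list above, the sketch below scores each word of a study corpus against a reference corpus using the standard two-corpus log-likelihood formula. The function names `log_likelihood` and `rank_keywords` are hypothetical and are not part of Kitconc; this is not necessarily how Kitconc computes keyness.

```python
import math
from collections import Counter


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-corpus log-likelihood for one word (hypothetical helper, not Kitconc's API)."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * score


def rank_keywords(study_tokens, ref_tokens, limit=10):
    """Score every word type of the study corpus and return the highest scores first."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    scored = [
        (word, log_likelihood(freq, ref.get(word, 0), len(study_tokens), len(ref_tokens)))
        for word, freq in study.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


study = "price price price sale sale the the".split()
reference = "the the the the a a of".split()
for word, score in rank_keywords(study, reference, limit=3):
    print(f"{word}\t{score:.2f}")
```

The score is always non-negative, so it measures how strongly a word's frequency differs between the corpora, not the direction of the difference. A score above 3.84 is conventionally read as significant at p < 0.05.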

Requirements

Kitconc requires Python 3.10 or later.

Package dependencies (pip install kitconc):

numpy>=1.26.4,<2.0.0
pandas>=2.2.0,<3.0.0
matplotlib>=3.7.0,<4.0.0
xlsxwriter>=3.2.3,<4.0.0
ttkbootstrap>=1.12.0,<2.0.0
pillow>=11.2.0,<12.0.0
requests>=2.31.0,<3.0.0
nltk>=3.9.1,<4.0.0
chardet>=5.2.0,<6.0.0
pypdf>=4.0.0,<7.0.0
cryptography>=3.1,<47.0.0
mcp>=1.0.0,<2.0.0
setuptools>=70.0.0

Additional dependencies listed in requirements.txt (full local environment):

torch>=2.6,<2.10 (CPU wheels via --extra-index-url https://download.pytorch.org/whl/cpu)
transformers>=4.45,<6.0.0
sentence-transformers>=3.0,<6.0.0
sqlite-vec>=0.1.7,<1.0.0
fastapi>=0.110,<1.0.0
uvicorn[standard]>=0.27,<1.0.0
python-dotenv>=1.0.0,<2.0.0

Installation

pip install kitconc

Kitconc App (graphical interface)

kitconc-app

Agent Layer (internal actions)

Kitconc now includes an internal action layer for agent/tool orchestration:

  • kitconc.agent.actions.KitconcActions

  • Full parity with shell commands from kit_cmd.py (do_*)

  • Typed schemas in kitconc.agent.schemas

  • Contract documentation in kitconc/agent/CONTRACT.md

  • Semantic retrieval action: semantic_search(…)

Basic usage:

from kitconc.agent import KitconcActions

actions = KitconcActions("kitconc_workspace")
actions.create("ads", "kitconc_corpora/ads", "english")
actions.use("ads")
rows = actions.keywords(limit=10)

MCP Server (for agent integrations)

kitconc-mcp --transport stdio

For HTTP clients (recommended):

kitconc-mcp --transport streamable-http --host 127.0.0.1 --port 8001

or (legacy SSE):

kitconc-mcp --transport sse --host 127.0.0.1 --port 8001

The server includes a semantic retrieval tool: semantic_search(query, top_k, db_path, model_name)
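To give an idea of what a top_k semantic retrieval step does, here is a minimal sketch that ranks stored texts by cosine similarity to a query vector. It is not Kitconc's implementation: the helper names `cosine` and `top_k_similar` are hypothetical, the three-dimensional vectors are hand-made stand-ins for real sentence embeddings, and the db_path and model_name parameters are not modeled at all.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query_vec, indexed, top_k=3):
    """Return the top_k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: (text, embedding) pairs. A real index would store model-generated vectors.
INDEX = [
    ("cheap flights to Lisbon", [0.9, 0.1, 0.0]),
    ("discount airline tickets", [0.8, 0.2, 0.1]),
    ("how to bake sourdough", [0.0, 0.1, 0.9]),
]

hits = top_k_similar([1.0, 0.0, 0.0], INDEX, top_k=2)
for text, score in hits:
    print(f"{score:.3f}\t{text}")
```

In a real setup the query string would first be encoded with the same embedding model that produced the indexed vectors, so that both live in the same vector space.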

The MCP runtime is included in the package dependencies, so pip install kitconc is enough.

What’s new in 3.2.0

  • Tkinter launcher command – start GUI with kitconc-app

  • Agent action layer – kitconc.agent.actions.KitconcActions with command parity from kit_cmd.py

  • Typed schemas – available in kitconc.agent.schemas

  • MCP server entrypoint – run with kitconc-mcp

  • Semantic search MCP tool – semantic_search for sqlite-vec indexes

  • Embedding index hardening – transactional writes and thread-safe SQLite access

  • Progress flag rename – use verbose=True (replacing show_progress=True)
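The "transactional writes and thread-safe SQLite access" item above can be pictured with the standard-library sqlite3 module. The sketch below is only one common way to get those two properties, under the assumption of a single shared connection guarded by a lock. The `VectorStore` class and the `chunks` table are hypothetical and do not reflect Kitconc's actual index code or schema.

```python
import sqlite3
import threading


class VectorStore:
    """Hypothetical store: one shared SQLite connection guarded by a lock."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets several threads share the connection;
        # the lock serializes every access to it.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS chunks ("
                "id INTEGER PRIMARY KEY, text TEXT NOT NULL, vector BLOB NOT NULL)"
            )

    def add_batch(self, rows):
        """Insert all rows or none of them."""
        with self._lock:
            # The connection context manager commits on success
            # and rolls back if any statement raises.
            with self._conn:
                self._conn.executemany(
                    "INSERT INTO chunks (text, vector) VALUES (?, ?)", rows
                )

    def count(self):
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]


store = VectorStore()

# Four threads write 25 rows each through the same store.
batches = [[(f"doc{i}-{j}", b"\x00\x00") for j in range(25)] for i in range(4)]
workers = [threading.Thread(target=store.add_batch, args=(batch,)) for batch in batches]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# A batch with an invalid row (NULL text) is rolled back as a whole.
try:
    store.add_batch([("valid", b"\x00"), (None, b"\x00")])
except sqlite3.IntegrityError:
    pass

print(store.count())
```

A coarse lock like this trades write concurrency for simplicity. Per-thread connections are another option when the database lives in a file.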

What’s new in 3.1.0

  • TF-IDF keywords – third keyword extraction method alongside Log-Likelihood and Chi-Square

  • Keyword filters – ignore numbers, ignore words with strange characters, minimum word length

  • PDF support – add PDF files directly to a corpus

  • Embeddings module – semantic search with sentence-transformers and SQLite vector storage

  • Dialog improvements – dialog boxes now center correctly in fullscreen and large-window mode
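The TF-IDF keywords and keyword filters listed above can be sketched in a few lines of plain Python. This assumes a simple formulation (relative term frequency times the natural log of inverse document frequency) and approximates "strange characters" as any non-letter character. The names `keep` and `tfidf_keywords` are hypothetical, and Kitconc's own weighting and filter rules may differ.

```python
import math
from collections import Counter


def keep(word, min_length=3):
    """Filter: letters only (drops numbers and tokens with symbols) and a minimum length."""
    return word.isalpha() and len(word) >= min_length


def tfidf_keywords(documents, target_index, limit=5, min_length=3):
    """Rank the words of one document by TF-IDF against the whole collection."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    target = [w for w in documents[target_index] if keep(w, min_length)]
    counts = Counter(target)
    scores = {
        word: (count / len(target)) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:limit]


docs = [
    "the sale ends in 2024 with 50% off the sale price".split(),
    "the meeting is on monday".split(),
    "the price of oil rose on monday".split(),
]

for word, score in tfidf_keywords(docs, target_index=0):
    print(f"{word}\t{score:.3f}")
```

Words that occur in every document, such as "the" here, get a score of zero, which is why TF-IDF needs no separate stoplist for the most common function words.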

Language resources

Kitconc comes with built-in language resources for Portuguese and English corpora. It also provides functions for adding your own language resources.

Usage example

See how easy it is to use Kitconc:

https://ilexis.net.br/kitconc

Download files

Download the file for your platform.

Source Distribution

kitconc-3.4.4.tar.gz (6.4 MB)

Uploaded Source

Built Distribution


kitconc-3.4.4-py3-none-any.whl (6.5 MB)

Uploaded Python 3

File details

Details for the file kitconc-3.4.4.tar.gz.

File metadata

  • Download URL: kitconc-3.4.4.tar.gz
  • Upload date:
  • Size: 6.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kitconc-3.4.4.tar.gz
Algorithm Hash digest
SHA256 102506d6ae34e4582df698e6ff0ab06fd9f2e3d274c5b649427bcd46a0f0204a
MD5 2ed115c9400bc21ad096b17510dfc32e
BLAKE2b-256 3ead6923404f54e9728ed5be6c5d058cea6e9741fb5ca2b00f3f8cb49fd2d2ba


Provenance

The following attestation bundles were made for kitconc-3.4.4.tar.gz:

Publisher: fluxodetrabalho.yml on ilexistools/kitconc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kitconc-3.4.4-py3-none-any.whl.

File metadata

  • Download URL: kitconc-3.4.4-py3-none-any.whl
  • Upload date:
  • Size: 6.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kitconc-3.4.4-py3-none-any.whl
Algorithm Hash digest
SHA256 0ee86c1cb3c14695137214abad069904860bd1af992f6825356ed97eb1500614
MD5 f5aff3aadd26ec50081b704f49b94c35
BLAKE2b-256 36e13f4c641e7b89825dc5887aad5ac08edcc03affc4d937054eee0713588b03


Provenance

The following attestation bundles were made for kitconc-3.4.4-py3-none-any.whl:

Publisher: fluxodetrabalho.yml on ilexistools/kitconc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
