termflow

English text analysis for information retrieval workloads in C++ and Python.

termflow is a library-first analysis stack for search, indexing, tagging, and query normalization. It provides a built-in English analyzer, term extraction helpers, and a lightweight query rewrite layer without trying to be a full search engine.

Why termflow

  • C++20 core library with optional Python bindings
  • English analyzer with configurable stemming, stop words, possessive handling, and ASCII folding
  • Term extraction API for finalized search/index terms
  • Query parser and rewrite support for canonicalization, equivalents, and expansions
  • Installable Python wheels for Linux and macOS
  • CMake install flow for downstream C++ consumers
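
Two of the analyzer options listed above, possessive handling and ASCII folding, are easy to picture in isolation. The following is a minimal pure-Python sketch of the general techniques, not termflow's implementation; the function names `fold_ascii` and `strip_possessive` are hypothetical and exist only for this illustration.

```python
import unicodedata


def fold_ascii(text: str) -> str:
    """Replace accented Latin letters with their unaccented base letter."""
    # NFKD splits 'é' into 'e' plus a combining accent; dropping the
    # combining marks leaves the base letter behind.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_possessive(token: str) -> str:
    """Drop a trailing English possessive marker ('s or ’s)."""
    for suffix in ("'s", "\u2019s"):
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


print(fold_ascii("Café"))         # Cafe
print(strip_possessive("dog's"))  # dog
```

This approach only removes combining marks, so letters without a decomposition (for example "ß") pass through unchanged; a production folding table usually covers those cases explicitly.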

Install

Python package:

pip install termflow-ir

Python import:

import termflow

CLI quick check:

termflow analyze "The Running Cars"

For C++ installation and find_package(termflow) usage, see docs/installation.md.

Quick Start

Python:

import termflow

analyzer = termflow.EnglishAnalyzer()
terms = analyzer.analyze_terms("The Running Cars")
normalized = analyzer.normalize("Running Café")

print(terms)  # ['run', 'car']
print(normalized)  # running café

C++:

#include <iostream>
#include "termflow/analysis/english_analyzer.hpp"

int main() {
  termflow::EnglishAnalyzer analyzer;
  const auto terms = analyzer.analyze_terms("The Running Cars");

  for (const auto& term : terms) {
    std::cout << term << "\n";
  }
}
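
Both snippets above reduce "The Running Cars" to the terms run and car. As a rough picture of the stages that lead to that result (lowercasing, tokenizing, stop-word removal, stemming), here is a toy pipeline in plain Python. It is an illustration under simplifying assumptions, not termflow's algorithm: the stop list, the suffix rules, and the names `toy_stem` and `toy_analyze_terms` are all invented for this sketch.

```python
import re

# Tiny illustrative stop list; a real analyzer ships a much longer one.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def toy_stem(token: str) -> str:
    """Strip a few common suffixes. Far cruder than a real stemmer."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def toy_analyze_terms(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(toy_analyze_terms("The Running Cars"))  # ['run', 'car']
```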

Features

| Area | What it includes |
| --- | --- |
| Analysis | EnglishAnalyzer, token analysis, normalization, stemming, stop words, ASCII folding |
| Term extraction | TermExtractor with length, numeric, and character-policy filtering |
| Query processing | clause parsing, analyzed query terms, rewrite loading, validation, and alternatives |
| Python bindings | built-in analyzer, term extractor, and query module under termflow.query |
| CLI | termflow analyze, termflow extract, and termflow analyze-query for quick validation |
| C++ consumption | installable CMake package and external find_package example |
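
The term extraction entry mentions length, numeric, and character-policy filtering. A sketch of what such filtering might look like is shown below. The `TermFilter` class, its field names, and its defaults are assumptions made for this example and do not reflect TermExtractor's real options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermFilter:
    """Hypothetical filter settings; not termflow's actual options."""

    min_length: int = 2
    max_length: int = 32
    keep_numeric: bool = False
    ascii_only: bool = True

    def accepts(self, term: str) -> bool:
        # Length policy: reject terms that are too short or too long.
        if not (self.min_length <= len(term) <= self.max_length):
            return False
        # Numeric policy: optionally reject all-digit terms.
        if term.isdigit() and not self.keep_numeric:
            return False
        # Character policy: optionally reject non-ASCII terms.
        if self.ascii_only and not term.isascii():
            return False
        return True


def filter_terms(terms: list[str], policy: TermFilter) -> list[str]:
    """Keep only the terms the policy accepts, preserving order."""
    return [t for t in terms if policy.accepts(t)]


print(filter_terms(["run", "x", "2024", "café", "car"], TermFilter()))
# ['run', 'car']
```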

Documentation

Runnable examples:

Scope

termflow currently focuses on:

  • English text analysis
  • Batch-oriented APIs
  • Query parsing and rewrite preparation
  • Reusable components for embedding in larger applications

termflow does not currently provide:

  • indexing or retrieval
  • ranking or scoring
  • token graphs
  • phrase execution logic
  • multilingual analyzers
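
To make that boundary concrete: rewrite preparation can be thought of as producing a list of acceptable alternatives for each query term and stopping there, with retrieval left to the host application. The sketch below illustrates the idea with made-up rules; the rule dictionaries and the `rewrite_query` function are hypothetical, and termflow's rule format and API may look quite different.

```python
# Hypothetical rewrite rules; termflow's rule format may look different.
CANONICAL = {"tv": "television"}        # replace the term outright
EQUIVALENTS = {"laptop": ["notebook"]}  # alternatives that match equally


def rewrite_query(terms: list[str]) -> list[list[str]]:
    """Return one list of acceptable alternatives per query term."""
    rewritten = []
    for term in terms:
        canonical = CANONICAL.get(term, term)
        rewritten.append([canonical, *EQUIVALENTS.get(canonical, [])])
    return rewritten


print(rewrite_query(["tv", "laptop", "stand"]))
# [['television'], ['laptop', 'notebook'], ['stand']]
```

The output is plain data, so a caller can translate it into whatever query syntax its own search backend expects.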

Build From Source

Local build:

cmake -S . -B build -G Ninja
cmake --build build
ctest --test-dir build --output-on-failure

Build Python bindings from source:

cmake -S . -B build -G Ninja -DTERMFLOW_BUILD_PYTHON=ON
cmake --build build
PYTHONPATH=build/python python3 -c 'import termflow; print(termflow.EnglishAnalyzer().analyze_terms("Running Cars"))'

Build Python distributions:

python3 -m build --sdist --wheel
python3 -m twine check dist/*

Project Status

termflow is early-stage and intentionally narrow in scope. The current focus is making the built-in English analysis and packaging story solid before expanding into more languages or broader IR features.

Contributing

Issues and pull requests are welcome. If you want to make a larger API or packaging change, open an issue first so the direction is clear before implementation work starts.

License

This repository does not yet include a LICENSE file. Until that is added, do not assume open source usage terms.
