
C++ person concept extractor with Python bindings

Project description

Concept Extractor

C++ person-name concept extractor split out from RAG/ragz.

What Is Included

  • C++ graph extractor sources and headers in cpp/src/graph and cpp/include/graph
  • pybind11 binding in cpp/bindings/graph.cpp
  • Python wrapper in concept_extractor/graph
  • Russian Snowball stemmer sources in cpp/third_party/libstemmer_ru
  • Self-contained C++ rule test in cpp/tests
  • DVC pointers for OpenCorpora graph assets in assets/graph

Tokenization code, old wheel releases, generated pybuild outputs, and unrelated tests are intentionally not copied.

Build C++ Tests

cmake -S . -B build -DBUILD_PYTHON=OFF
cmake --build build --parallel
ctest --test-dir build --output-on-failure

Build Python Package

pip install -v --no-build-isolation .

Install From PyPI

pip install aryjikov-concept-extractor

Use From Python

from concept_extractor.graph import PersonExtractor

extractor = PersonExtractor("assets/graph/opcorpora-parsed")
concepts = extractor.extract_batch(["Встреча с Алексеем Черниковым."])
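For a large corpus it can help to feed extract_batch in fixed-size chunks instead of one huge list. The helper below is a minimal sketch and is not part of the package: iter_batches and extract_in_batches are hypothetical names. The only assumption about the extractor is that it has an extract_batch method taking a list of strings, as in the snippet above. Whatever extract_batch returns is passed through unchanged, because the README does not document its return type.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def iter_batches(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` texts."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(texts)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def extract_in_batches(extractor: Any, texts: Iterable[str],
                       batch_size: int = 256) -> Iterator[Any]:
    """Call extractor.extract_batch on each chunk and yield its raw result.

    Hypothetical helper: it assumes only that `extractor` exposes
    extract_batch(list_of_str), like PersonExtractor in the README.
    """
    for batch in iter_batches(texts, batch_size):
        yield extractor.extract_batch(batch)
```

With the extractor from the snippet above, a caller would loop over extract_in_batches(extractor, texts, batch_size=128) and handle one batch result at a time. The batch size of 256 is an arbitrary default, not a tuned value.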

Run Example

cd ..
python3 -m pip install -e ./concept_extractor
python3 examples/example.py
# or explicitly:
python3 examples/example.py --dict-dir ./assets/opcorpora-parsed
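The --dict-dir flag points the example at an unpacked OpenCorpora dictionary. A script of your own could expose the same option. The sketch below is hypothetical: it is not the contents of examples/example.py. It assumes assets/graph/opcorpora-parsed as the default (the path used in the PersonExtractor snippet) and only validates the directory before it would be handed to the extractor.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Assumed default, taken from the README's PersonExtractor example.
DEFAULT_DICT_DIR = Path("assets/graph/opcorpora-parsed")


def parse_dict_dir(argv: Optional[List[str]] = None) -> Path:
    """Parse a --dict-dir option and check that it names an existing directory."""
    parser = argparse.ArgumentParser(
        description="Run person extraction (hypothetical CLI sketch)")
    parser.add_argument("--dict-dir", type=Path, default=DEFAULT_DICT_DIR,
                        help="directory with the unpacked OpenCorpora assets")
    args = parser.parse_args(argv)
    if not args.dict_dir.is_dir():
        # parser.error prints the usage line and exits with status 2.
        parser.error(f"dictionary directory not found: {args.dict_dir}")
    return args.dict_dir
```

A script built this way would then call PersonExtractor(str(parse_dict_dir())). Failing early with a clear message avoids a less obvious error from the native code when the assets have not been pulled yet.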

Assets

Large OpenCorpora files are tracked via DVC pointers:

dvc pull assets/graph/opcorpora-parsed.tar.dvc
tar -xf assets/graph/opcorpora-parsed.tar -C assets/graph
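Where the tar command is not available (for example on some Windows setups), Python's standard tarfile module can do the same unpacking. This is a minimal sketch: unpack_assets is a hypothetical helper, and the paths in the usage note mirror the commands above. It uses the "data" extraction filter when the running Python provides it, which rejects absolute paths, links escaping the destination, and other unsafe members.

```python
import tarfile
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def unpack_assets(archive: PathLike, dest: PathLike) -> Path:
    """Extract `archive` into `dest`, like `tar -xf archive -C dest`."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Default mode "r" transparently handles uncompressed and compressed tars.
    with tarfile.open(archive) as tf:
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ and recent security backports: refuse unsafe members.
            tf.extractall(dest, filter="data")
        else:
            tf.extractall(dest)
    return dest
```

After dvc pull, the equivalent of the tar command above would be unpack_assets("assets/graph/opcorpora-parsed.tar", "assets/graph").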



Download files


Source Distribution

aryjikov_concept_extractor-0.1.0.tar.gz (37.2 kB)


File details

Details for the file aryjikov_concept_extractor-0.1.0.tar.gz.

File metadata

  • Download URL: aryjikov_concept_extractor-0.1.0.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.12.7 Linux/6.17.0-35-generic

File hashes

Hashes for aryjikov_concept_extractor-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       984320bad9dd2fb88f518ec0c40bc0dc3b60f3887b58084ae7bb1bdd1e16698f
MD5          ab1de3f1b8e2a6e5a64dc700892e2bb1
BLAKE2b-256  3815dbc94b51f212e6b8331a11c0e5ff184533300e7107892ff562b32c39a919
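To check a downloaded sdist against the published SHA256 digest, hash the file in chunks with the standard hashlib module. This is a minimal sketch: sha256_of and verify_sha256 are hypothetical helper names, and EXPECTED_SHA256 is copied from the SHA256 row of the hash table for version 0.1.0.

```python
import hashlib
from pathlib import Path
from typing import Union

# Published SHA256 for aryjikov_concept_extractor-0.1.0.tar.gz.
EXPECTED_SHA256 = "984320bad9dd2fb88f518ec0c40bc0dc3b60f3887b58084ae7bb1bdd1e16698f"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify_sha256("aryjikov_concept_extractor-0.1.0.tar.gz") should return True for an untampered download. Reading in chunks keeps memory use flat regardless of file size; pip can also enforce hashes itself with --require-hashes.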

