C++ person concept extractor with Python bindings

Project description

Concept Extractor

C++ person-name concept extractor split out from RAG/ragz.

What Is Included

  • C++ graph extractor sources and headers in cpp/src/graph and cpp/include/graph
  • pybind11 binding in cpp/bindings/graph.cpp
  • Python wrapper in concept_extractor/graph
  • Russian Snowball stemmer sources in cpp/third_party/libstemmer_ru
  • self-contained C++ rule test in cpp/tests
  • DVC pointers for OpenCorpora graph assets in assets/graph

Tokenization code, old wheel releases, generated pybuild outputs, and unrelated tests are intentionally not copied.

Build C++ Tests

cmake -S . -B build -DBUILD_PYTHON=OFF
cmake --build build --parallel
ctest --test-dir build --output-on-failure

Build Python Package

pip install -v --no-build-isolation .

Install From PyPI

pip install aryjikov-concept-extractor

Use From Python

from concept_extractor.graph import PersonExtractor

extractor = PersonExtractor("assets/graph/opcorpora-parsed")
concepts = extractor.extract_batch(["Встреча с Алексеем Черниковым."])

Run Example

cd ..
python3 -m pip install -e ./concept_extractor
python3 examples/example.py
# or explicitly:
python3 examples/example.py --dict-dir ./assets/opcorpora-parsed
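The contents of examples/example.py are not reproduced here. As a rough illustration of how a --dict-dir option could be wired to the extractor, a script of that shape might look like the sketch below. The default path is an assumption borrowed from the usage snippet, the positional texts argument is hypothetical, and the loop assumes extract_batch returns one result per input text.

```python
import argparse
import sys
from pathlib import Path

# Assumption: default borrowed from the "Use From Python" snippet.
DEFAULT_DICT_DIR = Path("assets/graph/opcorpora-parsed")
SAMPLE_TEXT = "Встреча с Алексеем Черниковым."


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the person extractor.")
    parser.add_argument("--dict-dir", type=Path, default=DEFAULT_DICT_DIR,
                        help="directory with the unpacked OpenCorpora assets")
    # Hypothetical positional argument; falls back to one sample sentence.
    parser.add_argument("texts", nargs="*", default=[SAMPLE_TEXT])
    return parser.parse_args(argv)


def main(argv=None) -> int:
    args = parse_args(argv)
    if not args.dict_dir.is_dir():
        print(f"dictionary directory not found: {args.dict_dir}", file=sys.stderr)
        return 1
    # Imported late so argument errors surface without the compiled module.
    from concept_extractor.graph import PersonExtractor

    extractor = PersonExtractor(str(args.dict_dir))
    for text, concepts in zip(args.texts, extractor.extract_batch(args.texts)):
        print(text, concepts)
    return 0
```

A script built this way would call main() from its __main__ guard and pass the return value to sys.exit.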

Assets

Large OpenCorpora files are tracked via DVC pointers:

dvc pull assets/graph/opcorpora-parsed.tar.dvc
tar -xf assets/graph/opcorpora-parsed.tar -C assets/graph
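The unpack step can also be done from Python, which makes it easy to check that the dictionary directory exists before constructing the extractor. This is a minimal sketch using the standard tarfile module. The function name unpack_assets is hypothetical, and the expected directory name opcorpora-parsed is an assumption inferred from the path used in the usage snippet.

```python
import tarfile
from pathlib import Path


def unpack_assets(archive, dest, expected_dir="opcorpora-parsed") -> Path:
    """Extract the asset archive into dest and return the dictionary directory."""
    archive, dest = Path(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse absolute paths and path traversal.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    target = dest / expected_dir
    if not target.is_dir():
        # Assumption: the archive contains a top-level directory with this name.
        raise FileNotFoundError(f"{target} not found after extracting {archive}")
    return target
```

After dvc pull, calling unpack_assets("assets/graph/opcorpora-parsed.tar", "assets/graph") would mirror the tar command above.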

Download files

Source Distribution

aryjikov_concept_extractor-0.1.1.tar.gz (8.5 MB)

Uploaded Source

File details

Details for the file aryjikov_concept_extractor-0.1.1.tar.gz.

File metadata

  • Download URL: aryjikov_concept_extractor-0.1.1.tar.gz
  • Upload date:
  • Size: 8.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.12.7 Linux/6.17.0-35-generic

File hashes

Hashes for aryjikov_concept_extractor-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       6d8ac59f420fb6810d119eb0209637dc9362be2c05691b4b2904823f28b49ab7
MD5          21ea9bce483fe1222196cf1dabb4530f
BLAKE2b-256  bde3538be41f6da6edadf7d0fa8673c2079d58a067d85ce00de1df8a6eb50332
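A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard hashlib module. The function names sha256_of and verify_sdist are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for aryjikov_concept_extractor-0.1.1.tar.gz.
EXPECTED_SHA256 = "6d8ac59f420fb6810d119eb0209637dc9362be2c05691b4b2904823f28b49ab7"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sdist(path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, verify_sdist("aryjikov_concept_extractor-0.1.1.tar.gz") returns True only when the local file matches the published digest.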
