C++ person concept extractor with Python bindings
Concept Extractor
C++ person-name concept extractor split out from RAG/ragz.
What Is Included
- C++ graph extractor sources and headers in cpp/src/graph and cpp/include/graph
- pybind11 binding in cpp/bindings/graph.cpp
- Python wrapper in concept_extractor/graph
- Russian Snowball stemmer sources in cpp/third_party/libstemmer_ru
- self-contained C++ rule test in cpp/tests
- DVC pointers for OpenCorpora graph assets in assets/graph
Tokenization code, old wheel releases, generated pybuild outputs, and unrelated tests are intentionally not copied.
Build C++ Tests
cmake -S . -B build -DBUILD_PYTHON=OFF
cmake --build build --parallel
ctest --test-dir build --output-on-failure
Build Python Package
pip install -v --no-build-isolation .
Install From PyPI
pip install concept-extractor
Use From Python
from concept_extractor.graph import PersonExtractor
extractor = PersonExtractor("assets/graph/opcorpora-parsed")
concepts = extractor.extract_batch(["Встреча с Алексеем Черниковым."])
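The snippet above assumes the dictionary directory already exists on disk. A minimal sketch of a more defensive wrapper follows; `resolve_dict_dir` and `extract_people` are hypothetical helper names, not part of the package, and the structure of the value returned by `extract_batch` is not documented here, so the sketch simply passes it through.

```python
from pathlib import Path


def resolve_dict_dir(path: str) -> Path:
    """Return the dictionary directory as a Path, or raise with a hint."""
    dict_dir = Path(path)
    if not dict_dir.is_dir():
        raise FileNotFoundError(
            f"Dictionary directory not found: {dict_dir}. "
            "Pull and unpack the OpenCorpora assets first (see Assets)."
        )
    return dict_dir


def extract_people(texts, dict_dir="assets/graph/opcorpora-parsed"):
    """Run PersonExtractor over a batch of texts (hypothetical helper)."""
    # Imported lazily so the path check can be used without the compiled extension.
    from concept_extractor.graph import PersonExtractor

    extractor = PersonExtractor(str(resolve_dict_dir(dict_dir)))
    # Result shape is not specified in this README; return it unchanged.
    return extractor.extract_batch(list(texts))
```

Calling `extract_people(["Встреча с Алексеем Черниковым."])` should behave like the snippet above, but fails early with a readable message when the assets have not been unpacked.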
Run Example
cd ..
python3 -m pip install -e ./concept_extractor
python3 examples/example.py
# or explicitly:
python3 examples/example.py --dict-dir ./assets/opcorpora-parsed
Assets
Large OpenCorpora files are tracked via DVC pointers:
dvc pull assets/graph/opcorpora-parsed.tar.dvc
tar -xf assets/graph/opcorpora-parsed.tar -C assets/graph
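Where the `tar` command is not available, the unpack step can be approximated with Python's standard `tarfile` module. This is a sketch only: `unpack_assets` is a hypothetical helper, and it assumes the archive has already been fetched with `dvc pull`.

```python
import tarfile
from pathlib import Path


def unpack_assets(archive: str, dest: str) -> list[str]:
    """Extract `archive` into `dest` and return the archive member names."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members such as "../" paths.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names
```

For the layout described above, the call would be `unpack_assets("assets/graph/opcorpora-parsed.tar", "assets/graph")`.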
Download files
Source Distribution
File details
Details for the file aryjikov_concept_extractor-0.1.0.tar.gz.
File metadata
- Download URL: aryjikov_concept_extractor-0.1.0.tar.gz
- Upload date:
- Size: 37.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.7 Linux/6.17.0-35-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 984320bad9dd2fb88f518ec0c40bc0dc3b60f3887b58084ae7bb1bdd1e16698f |
| MD5 | ab1de3f1b8e2a6e5a64dc700892e2bb1 |
| BLAKE2b-256 | 3815dbc94b51f212e6b8331a11c0e5ff184533300e7107892ff562b32c39a919 |
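A downloaded source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; `sha256_of` and `matches_sha256` are hypothetical helper names, and the file is assumed to have been downloaded to the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("aryjikov_concept_extractor-0.1.0.tar.gz", "984320bad9dd2fb88f518ec0c40bc0dc3b60f3887b58084ae7bb1bdd1e16698f")` returns `True` only if the local file is byte-for-byte identical to the published one.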