id-svo-extractor: Extract SVO triples from Indonesian text.
id-svo-extractor
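id-svo-extractor is a heuristic tool for extracting SVO (subject-verb-object) triples from Indonesian text. It relies on Stanza's Indonesian pipeline, used through spacy-stanza, for tokenization, multi-word token expansion, part-of-speech tagging, lemmatization, and dependency parsing.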
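The README does not spell out the heuristic itself. As a rough illustration only, here is one common dependency-based approach. For each word that governs an `obj`, pair it with its `nsubj`; a coordinated verb with no subject of its own borrows the subject of the verb it is conjoined to. Subjects and objects are then expanded along `conj` links. This is not the library's implementation. The `Token` class, the helper names, and the hand-written UD-style parse of the Usage example sentence (including the `compound` and `cc` labels) are assumptions, not real Stanza output.

```python
from dataclasses import dataclass


# Hypothetical token record: one row of a UD-style dependency parse.
# This is NOT Stanza's or spaCy's token class, just a minimal stand-in.
@dataclass
class Token:
    idx: int     # 1-based position in the sentence (tokens[idx - 1] is this token)
    text: str
    head: int    # idx of the head token, 0 for the root
    deprel: str  # Universal Dependencies relation label


def children(tokens, head_idx, rels):
    """Tokens attached to head_idx by one of the relations in rels."""
    return [t for t in tokens if t.head == head_idx and t.deprel in rels]


def with_conjuncts(tokens, tok):
    """The token itself followed by every token coordinated with it through 'conj'."""
    result = [tok]
    for c in children(tokens, tok.idx, {"conj"}):
        result.extend(with_conjuncts(tokens, c))
    return result


def phrase(tokens, tok):
    """Head word plus its compound/flat modifiers, joined in sentence order."""
    words = [tok] + children(tokens, tok.idx, {"compound", "flat"})
    return " ".join(t.text for t in sorted(words, key=lambda t: t.idx))


def subjects_of(tokens, verb):
    """nsubj dependents of verb, expanded along conj links."""
    subj = children(tokens, verb.idx, {"nsubj"})
    if not subj and verb.deprel == "conj":
        # Assumed rule: a coordinated verb without a subject shares its head verb's subject.
        return subjects_of(tokens, tokens[verb.head - 1])
    return [c for s in subj for c in with_conjuncts(tokens, s)]


def extract_svo(tokens):
    """Return (subject, verb, object) string triples for one parsed sentence."""
    triples = []
    for verb in tokens:
        objects = [c for o in children(tokens, verb.idx, {"obj"})
                   for c in with_conjuncts(tokens, o)]
        if not objects:
            continue  # only words governing an object are treated as SVO verbs
        for s in subjects_of(tokens, verb):
            for o in objects:
                triples.append((phrase(tokens, s), verb.text, phrase(tokens, o)))
    return triples


# Hand-written (assumed) parse of "Niko dan Okin mendesain brosur promosi dan mencetak poster iklan."
SENTENCE = [
    Token(1, "Niko", 4, "nsubj"),
    Token(2, "dan", 3, "cc"),
    Token(3, "Okin", 1, "conj"),
    Token(4, "mendesain", 0, "root"),
    Token(5, "brosur", 4, "obj"),
    Token(6, "promosi", 5, "compound"),
    Token(7, "dan", 8, "cc"),
    Token(8, "mencetak", 4, "conj"),
    Token(9, "poster", 8, "obj"),
    Token(10, "iklan", 9, "compound"),
    Token(11, ".", 4, "punct"),
]

if __name__ == "__main__":
    for triple in extract_svo(SENTENCE):
        print(triple)
```

On this hand-written parse the sketch yields the same four triples that the Usage section reports for `collect_svo_triples`. Real parser output will differ in places, which is exactly where a heuristic extractor can go wrong.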
Requirements
To use id-svo-extractor, you will need Python v3.10 or higher and the following Python package:
- spacy-stanza v1.0.4
You must also download Stanza's Indonesian models for the tokenize, mwt, pos, lemma, and depparse processors before initializing the pipeline.
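If you want to confirm that an environment meets these requirements before running anything, a quick check needs only the standard library (`importlib.metadata`). The `check_environment` helper below is hypothetical. It treats the listed spacy-stanza version as an exact pin, which is an assumption because the README does not say whether other versions work. It also does not check whether the Stanza models have been downloaded.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Requirements as listed in this README; the exact-version pin is an assumption.
MIN_PYTHON = (3, 10)
REQUIRED = {"spacy-stanza": "1.0.4"}


def check_environment():
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    for package, wanted in REQUIRED.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if found != wanted:
            problems.append(f"{package} {wanted} expected, found {found}")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["all requirements satisfied"]:
        print(problem)
```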
Installation
Install the package directly from PyPI:
pip install id-svo-extractor
Usage
Here's a basic example to get you started.
from id_svo_extractor import create_pipeline
from id_svo_extractor.utils import collect_svo_triples
from stanza import download
# Download Stanza's Indonesian models for tokenize, mwt, pos, lemma, and depparse processors.
# This step is mandatory before initializing the NLP pipeline.
download("id", processors="tokenize,mwt,pos,lemma,depparse")
# Initialize the NLP pipeline.
nlp = create_pipeline()
doc = nlp("Niko dan Okin mendesain brosur promosi dan mencetak poster iklan.")
for sentence in doc.sents:
    # Extracted triples for each sentence are stored in the `svo_triples` custom attribute.
    print(sentence._.svo_triples)
# Output:
# [ SVOTriple(s=[Niko], v=[mendesain], o=[brosur, promosi]),
# SVOTriple(s=[Okin], v=[mendesain], o=[brosur, promosi]),
# SVOTriple(s=[Niko], v=[mencetak], o=[poster, iklan]),
# SVOTriple(s=[Okin], v=[mencetak], o=[poster, iklan]) ]
print(collect_svo_triples(doc))
# Output:
# [ ('Niko', 'mendesain', 'brosur promosi'),
# ('Okin', 'mendesain', 'brosur promosi'),
# ('Niko', 'mencetak', 'poster iklan'),
# ('Okin', 'mencetak', 'poster iklan') ]
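The flat tuples from `collect_svo_triples` are easy to post-process in plain Python. The sketch below groups them by verb and drops duplicate pairs. Based on the documented output above, it assumes that the function returns a list of `(subject, verb, object)` string tuples. The `group_by_verb` helper is hypothetical. The triples are copied from that output so the snippet runs without downloading Stanza models.

```python
import json

# Copied from the documented output of collect_svo_triples(doc) above.
triples = [
    ("Niko", "mendesain", "brosur promosi"),
    ("Okin", "mendesain", "brosur promosi"),
    ("Niko", "mencetak", "poster iklan"),
    ("Okin", "mencetak", "poster iklan"),
]


def group_by_verb(triples):
    """Map each verb to its (subject, object) pairs, keeping first-seen order and skipping duplicates."""
    grouped = {}
    for subject, verb, obj in triples:
        pairs = grouped.setdefault(verb, [])
        if (subject, obj) not in pairs:
            pairs.append((subject, obj))
    return grouped


# json serializes the tuples as lists; ensure_ascii=False keeps non-ASCII text readable.
print(json.dumps(group_by_verb(triples), ensure_ascii=False, indent=2))
```

Because dictionaries preserve insertion order, verbs appear in the order they were first seen in the text.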
License
This project is licensed under the Apache License 2.0.
Download files
Source distribution: id_svo_extractor-0.3.0.tar.gz (13.9 kB)
Built distribution: id_svo_extractor-0.3.0-py3-none-any.whl
Hashes for id_svo_extractor-0.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 848fa1f6648afe5f7c95894be2301adde6a226c390a5c3c7fc72c892ba0d3d5e
MD5 | 1ae75bceb42615123041b3223d8b7f93
BLAKE2b-256 | d99f88a54bb2933c3debce9d22bf170b525a12972f106e52d93674ede4b1f8b5