id-svo-extractor: Extract SVO triples from Indonesian text.
id-svo-extractor
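id-svo-extractor is a heuristic tool for extracting SVO (Subject-Verb-Object) triples from Indonesian text. It relies on Stanza's Indonesian language pipeline for the underlying NLP: tokenization, multi-word token expansion, part-of-speech tagging, lemmatization, and dependency parsing.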
Requirements
To use id-svo-extractor, you will need Python v3.10 or higher and the following Python package:
- spacy-stanza v1.0.4
You must also download Stanza's Indonesian models for the tokenize, mwt, pos, lemma, and depparse processors before initializing the pipeline.
Installation
Install the package directly from PyPI:
pip install id-svo-extractor
Usage
Here's a basic example to get you started.
from id_svo_extractor import create_pipeline
from id_svo_extractor.utils import collect_svo_triples
from stanza import download
# Download Stanza's Indonesian models for tokenize, mwt, pos, lemma, and depparse processors.
# This step is mandatory before initializing the NLP pipeline.
download("id", processors="tokenize,mwt,pos,lemma,depparse")
# Initialize the NLP pipeline.
nlp = create_pipeline()
doc = nlp("Niko dan Okin mendesain brosur promosi dan mencetak poster iklan.")
for sentence in doc.sents:
    # Extracted triples for each sentence are stored in the `svo_triples` custom attribute.
    print(sentence._.svo_triples)
# Output:
# [ SVOTriple(s=[Niko], v=[mendesain], o=[brosur, promosi]),
# SVOTriple(s=[Okin], v=[mendesain], o=[brosur, promosi]),
# SVOTriple(s=[Niko], v=[mencetak], o=[poster, iklan]),
# SVOTriple(s=[Okin], v=[mencetak], o=[poster, iklan]) ]
print(collect_svo_triples(doc))
# Output:
# [ ('Niko', 'mendesain', 'brosur promosi'),
# ('Okin', 'mendesain', 'brosur promosi'),
# ('Niko', 'mencetak', 'poster iklan'),
# ('Okin', 'mencetak', 'poster iklan') ]
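The tuples returned by collect_svo_triples are plain Python data, so they can be post-processed without any further NLP calls. The following is a minimal sketch, assuming the output has exactly the shape shown above (a list of 3-tuples of strings). The helper name group_by_subject is hypothetical and not part of id-svo-extractor. The triples are hard-coded here so the snippet runs on its own.

```python
from collections import defaultdict

# Triples copied from the example output above; in practice they would
# come from collect_svo_triples(doc).
triples = [
    ("Niko", "mendesain", "brosur promosi"),
    ("Okin", "mendesain", "brosur promosi"),
    ("Niko", "mencetak", "poster iklan"),
    ("Okin", "mencetak", "poster iklan"),
]


def group_by_subject(triples):
    """Hypothetical helper: map each subject to its (verb, object) pairs."""
    grouped = defaultdict(list)
    for subject, verb, obj in triples:
        grouped[subject].append((verb, obj))
    return dict(grouped)


actions = group_by_subject(triples)
for subject, pairs in actions.items():
    print(subject, "->", pairs)
```

Grouping by subject is only one option. Unpacking in a different order inside the loop would key the result by verb or by object instead.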
License
This project is licensed under the Apache License 2.0.