id-svo-extractor: Extract SVO triples from Indonesian text.

id-svo-extractor

id-svo-extractor is a heuristic tool that extracts SVO (Subject-Verb-Object) triples from Indonesian text. It builds on Stanza's Indonesian pipeline (tokenization, multi-word token expansion, POS tagging, lemmatization, and dependency parsing).
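
To give a feel for what a dependency-based SVO heuristic can look like, here is a minimal sketch. It is not the package's actual implementation: the `Token` class, the hand-written parse, and the `extract_svo` helper are all hypothetical, and the Universal Dependencies labels (`nsubj`, `obj`, `conj`, `nmod`) are assumed for illustration rather than taken from real Stanza output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for one parsed word (1-based id, head 0 = root)."""
    id: int
    text: str
    upos: str
    head: int
    deprel: str


# Hand-written parse (an assumption, not real parser output) of:
# "Niko dan Okin mendesain brosur promosi dan mencetak poster iklan."
SENTENCE = [
    Token(1, "Niko", "PROPN", 4, "nsubj"),
    Token(2, "dan", "CCONJ", 3, "cc"),
    Token(3, "Okin", "PROPN", 1, "conj"),
    Token(4, "mendesain", "VERB", 0, "root"),
    Token(5, "brosur", "NOUN", 4, "obj"),
    Token(6, "promosi", "NOUN", 5, "nmod"),
    Token(7, "dan", "CCONJ", 8, "cc"),
    Token(8, "mencetak", "VERB", 4, "conj"),
    Token(9, "poster", "NOUN", 8, "obj"),
    Token(10, "iklan", "NOUN", 9, "nmod"),
    Token(11, ".", "PUNCT", 4, "punct"),
]


def children(tokens, head_id, deprels):
    """Return the dependents of `head_id` whose relation is in `deprels`."""
    return [t for t in tokens if t.head == head_id and t.deprel in deprels]


def with_conjuncts(tokens, token):
    """Return the token followed by anything coordinated with it."""
    return [token] + children(tokens, token.id, {"conj"})


def phrase(tokens, head):
    """Join a noun head with its nominal modifiers in surface order."""
    mods = children(tokens, head.id, {"compound", "nmod", "flat"})
    return " ".join(t.text for t in sorted([head] + mods, key=lambda t: t.id))


def subjects_of(tokens, verb):
    """Find a verb's subjects; a coordinated verb inherits them from its head."""
    subjects = [
        s
        for n in children(tokens, verb.id, {"nsubj"})
        for s in with_conjuncts(tokens, n)
    ]
    if not subjects and verb.deprel == "conj":
        head = next(t for t in tokens if t.id == verb.head)
        return subjects_of(tokens, head)
    return subjects


def extract_svo(tokens):
    """Pair every subject of every verb with each of that verb's objects."""
    triples = []
    for verb in (t for t in tokens if t.upos == "VERB"):
        for subject in subjects_of(tokens, verb):
            for obj_head in children(tokens, verb.id, {"obj"}):
                for obj in with_conjuncts(tokens, obj_head):
                    triples.append((subject.text, verb.text, phrase(tokens, obj)))
    return triples


print(extract_svo(SENTENCE))
```

On this toy parse the sketch distributes both coordinated subjects over both coordinated verbs, which is the same shape of result the usage example reports for this sentence. A real heuristic has to handle many more constructions (passives, copulas, clausal objects), which this sketch ignores.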

Requirements

To use id-svo-extractor, you will need Python v3.10 or higher and the following Python package:

You must also download Stanza's Indonesian models for tokenize, mwt, pos, lemma, and depparse processors before initializing the pipeline.

Installation

Install the package directly from PyPI:

pip install id-svo-extractor

Usage

Here's a basic example to get you started.

from id_svo_extractor import create_pipeline
from id_svo_extractor.utils import collect_svo_triples
from stanza import download


# Download Stanza's Indonesian models for tokenize, mwt, pos, lemma, and depparse processors.
# This step is mandatory before initializing the NLP pipeline.
download("id", processors="tokenize,mwt,pos,lemma,depparse")

# Initialize the NLP pipeline.
nlp = create_pipeline()

doc = nlp("Niko dan Okin mendesain brosur promosi dan mencetak poster iklan.")

for sentence in doc.sents:
    # The triples extracted from each sentence are stored in the `svo_triples` custom attribute.
    print(sentence._.svo_triples)
    # Output:
    # [ SVOTriple(s=[Niko], v=[mendesain], o=[brosur, promosi]),
    #   SVOTriple(s=[Okin], v=[mendesain], o=[brosur, promosi]),
    #   SVOTriple(s=[Niko], v=[mencetak], o=[poster, iklan]),
    #   SVOTriple(s=[Okin], v=[mencetak], o=[poster, iklan]) ]

print(collect_svo_triples(doc))
# Output:
# [ ('Niko', 'mendesain', 'brosur promosi'),
#   ('Okin', 'mendesain', 'brosur promosi'),
#   ('Niko', 'mencetak', 'poster iklan'),
#   ('Okin', 'mencetak', 'poster iklan') ]
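
Because `collect_svo_triples` is shown returning plain `(subject, verb, object)` string tuples, the result can be post-processed with ordinary Python. The following is a small sketch that groups triples by verb; `group_by_verb` is a hypothetical helper, not part of the package, and the `triples` list is simply copied from the output above so the snippet runs without any models installed.

```python
from collections import defaultdict

# Copied from the example output above; in practice this would be
# the return value of collect_svo_triples(doc).
triples = [
    ("Niko", "mendesain", "brosur promosi"),
    ("Okin", "mendesain", "brosur promosi"),
    ("Niko", "mencetak", "poster iklan"),
    ("Okin", "mencetak", "poster iklan"),
]


def group_by_verb(triples):
    """Map each verb to the (subject, object) pairs it appears with."""
    grouped = defaultdict(list)
    for subject, verb, obj in triples:
        grouped[verb].append((subject, obj))
    return dict(grouped)


for verb, pairs in group_by_verb(triples).items():
    print(verb, pairs)
```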

License

This project is licensed under the Apache License 2.0.

Download files

Source Distribution

id_svo_extractor-0.3.0.tar.gz (13.9 kB)

Built Distribution

id_svo_extractor-0.3.0-py3-none-any.whl (13.0 kB, Python 3)