
id-svo-extractor: Extract SVO triples from Indonesian text.



id-svo-extractor

id-svo-extractor is a heuristic tool for extracting SVO (Subject-Verb-Object) triples from Indonesian text. It relies on Stanza's Indonesian pipeline (the tokenize, mwt, pos, lemma, and depparse processors) for linguistic annotation.
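The heuristics themselves are not spelled out here. As a rough illustration of how SVO triples can be read off a Universal Dependencies parse like the one Stanza's depparse processor produces, here is a minimal sketch. It is not id-svo-extractor's implementation: the `Word` class and the `extract_svo` function are hypothetical, the parse of the example sentence is written by hand (a real Stanza parse may attach words differently), and it assumes a few simple rules: subjects come from `nsubj`, objects from `obj`, `compound` modifiers extend a phrase, and verbs coordinated with the root via `conj` inherit the root's subjects when they have none of their own.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Word:
    """One token of a UD-style dependency parse (1-based ids; head 0 marks the root)."""
    id: int
    text: str
    head: int
    deprel: str


def dependents(words: list[Word], head_id: int, *deprels: str) -> list[Word]:
    """Words attached to `head_id` by any of the given dependency relations."""
    return [w for w in words if w.head == head_id and w.deprel in deprels]


def with_conjuncts(words: list[Word], word: Word) -> list[Word]:
    """`word` followed by the words coordinated with it via `conj`."""
    return [word] + dependents(words, word.id, "conj")


def phrase(words: list[Word], head: Word) -> list[str]:
    """Texts of `head` and its `compound` modifiers, in sentence order."""
    parts = [head] + dependents(words, head.id, "compound")
    return [w.text for w in sorted(parts, key=lambda w: w.id)]


def arguments(words: list[Word], verb: Word, deprel: str) -> list[Word]:
    """Arguments of `verb` with the given relation, expanded over their conjuncts."""
    return [c for arg in dependents(words, verb.id, deprel) for c in with_conjuncts(words, arg)]


def extract_svo(words: list[Word]) -> list[tuple[list[str], list[str], list[str]]]:
    """Hypothetical heuristic: (subject, verb, object) word lists for one sentence."""
    root = next((w for w in words if w.deprel == "root"), None)
    if root is None:
        return []
    root_subjects = arguments(words, root, "nsubj")
    triples = []
    # The root and every word coordinated with it via `conj` are treated as predicates.
    for verb in with_conjuncts(words, root):
        # A coordinated verb without its own subject shares the root's subjects.
        subjects = arguments(words, verb, "nsubj") or root_subjects
        objects = arguments(words, verb, "obj")
        for subj, obj in product(subjects, objects):
            triples.append((phrase(words, subj), [verb.text], phrase(words, obj)))
    return triples


# Hand-written (assumed) parse of the example sentence used in the Usage section.
SENTENCE = [
    Word(1, "Niko", 4, "nsubj"),
    Word(2, "dan", 3, "cc"),
    Word(3, "Okin", 1, "conj"),
    Word(4, "mendesain", 0, "root"),
    Word(5, "brosur", 4, "obj"),
    Word(6, "promosi", 5, "compound"),
    Word(7, "dan", 8, "cc"),
    Word(8, "mencetak", 4, "conj"),
    Word(9, "poster", 8, "obj"),
    Word(10, "iklan", 9, "compound"),
    Word(11, ".", 4, "punct"),
]

for s, v, o in extract_svo(SENTENCE):
    print(s, v, o)  # first line: ['Niko'] ['mendesain'] ['brosur', 'promosi']
```

Under these rules the coordinated subject "Niko dan Okin" is distributed over both verb phrases, giving four triples in the same order as the example output in the Usage section.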

Requirements

To use id-svo-extractor, you will need Python v3.10 or higher and the following Python package:

You must also download Stanza's Indonesian models for the tokenize, mwt, pos, lemma, and depparse processors before initializing the pipeline.

Installation

Install the package directly from PyPI:

pip install id-svo-extractor

Usage

Here's a basic example to get you started.

from id_svo_extractor import create_pipeline
from id_svo_extractor.utils import collect_svo_triples
from stanza import download


# Download Stanza's Indonesian models for tokenize, mwt, pos, lemma, and depparse processors.
# This step is mandatory before initializing the NLP pipeline.
download("id", processors="tokenize,mwt,pos,lemma,depparse")

# Initialize the NLP pipeline.
nlp = create_pipeline()

doc = nlp("Niko dan Okin mendesain brosur promosi dan mencetak poster iklan.")

for sentence in doc.sents:
    # Extracted triples for each sentence are stored in the `svo_triples` custom attribute.
    print(sentence._.svo_triples)
    # Output:
    # [ SVOTriple(s=[Niko], v=[mendesain], o=[brosur, promosi]),
    #   SVOTriple(s=[Okin], v=[mendesain], o=[brosur, promosi]),
    #   SVOTriple(s=[Niko], v=[mencetak], o=[poster, iklan]),
    #   SVOTriple(s=[Okin], v=[mencetak], o=[poster, iklan]) ]

print(collect_svo_triples(doc))
# Output:
# [ ('Niko', 'mendesain', 'brosur promosi'),
#   ('Okin', 'mendesain', 'brosur promosi'),
#   ('Niko', 'mencetak', 'poster iklan'),
#   ('Okin', 'mencetak', 'poster iklan') ]
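The two outputs differ only in representation: each SVOTriple holds a list of tokens per slot, while collect_svo_triples returns one flat list of (subject, verb, object) strings for the whole document. The sketch below reproduces that conversion under two assumptions: slot tokens are joined with single spaces (consistent with the sample output), and a hypothetical `Triple` named tuple of plain strings stands in for the library's token-based SVOTriple. It is not the library's own code.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical stand-in for SVOTriple: each slot holds the words of one phrase."""
    s: list[str]
    v: list[str]
    o: list[str]


def flatten_triples(per_sentence: list[list[Triple]]) -> list[tuple[str, str, str]]:
    """Join each slot's words with spaces and concatenate the triples of all sentences."""
    return [
        (" ".join(t.s), " ".join(t.v), " ".join(t.o))
        for sentence_triples in per_sentence
        for t in sentence_triples
    ]


# One sentence's worth of triples, mirroring the per-sentence output above.
sentences = [
    [
        Triple(["Niko"], ["mendesain"], ["brosur", "promosi"]),
        Triple(["Okin"], ["mendesain"], ["brosur", "promosi"]),
        Triple(["Niko"], ["mencetak"], ["poster", "iklan"]),
        Triple(["Okin"], ["mencetak"], ["poster", "iklan"]),
    ]
]
print(flatten_triples(sentences))
# [('Niko', 'mendesain', 'brosur promosi'), ('Okin', 'mendesain', 'brosur promosi'), ('Niko', 'mencetak', 'poster iklan'), ('Okin', 'mencetak', 'poster iklan')]
```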

License

This project is licensed under the Apache License 2.0.



Download files


Source Distribution

id_svo_extractor-0.3.0.tar.gz (13.9 kB)

Built Distribution

id_svo_extractor-0.3.0-py3-none-any.whl (13.0 kB, Python 3)

File details

Details for the file id_svo_extractor-0.3.0.tar.gz.

File metadata

  • Download URL: id_svo_extractor-0.3.0.tar.gz
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.0

File hashes

Hashes for id_svo_extractor-0.3.0.tar.gz
Algorithm Hash digest
SHA256 de13bf0f7e05115431486bbfcf6868958de1b8e143ead2004cbdacaddc634fc9
MD5 44bd8ea7cf23ac374b976d45056af58c
BLAKE2b-256 daf12baa14920391f0126ee46378d9bc248ff6948c860b850ee6b49648ca961a
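To check that a downloaded file matches the digests published here, you can recompute its SHA256 hash locally. This is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify` are illustrative, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


sdist = Path("id_svo_extractor-0.3.0.tar.gz")
expected = "de13bf0f7e05115431486bbfcf6868958de1b8e143ead2004cbdacaddc634fc9"
if sdist.exists():
    print("OK" if verify(sdist, expected) else "MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```

pip can also enforce this during installation in hash-checking mode: list the requirement as id-svo-extractor==0.3.0 --hash=sha256:<digest> in a requirements file and run pip install --require-hashes -r requirements.txt. In that mode every dependency must be pinned with a hash as well.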


File details

Details for the file id_svo_extractor-0.3.0-py3-none-any.whl.


File hashes

Hashes for id_svo_extractor-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 848fa1f6648afe5f7c95894be2301adde6a226c390a5c3c7fc72c892ba0d3d5e
MD5 1ae75bceb42615123041b3223d8b7f93
BLAKE2b-256 d99f88a54bb2933c3debce9d22bf170b525a12972f106e52d93674ede4b1f8b5

