Pure Python triplet extraction based on Stanford OpenIE

triplet-extract

Pure Python triplet extraction - Extract (subject, relation, object) triples from text

Example

from triplet_extract import extract

text = "95.6% of people don't know what GraphRAG is for"
triplets = extract(text)

for t in triplets:
    print(f"({t.subject}, {t.relation}, {t.object})")

Output:

(95.6% of people, don't know, what GraphRAG is for)

What makes this different:

Natural formatting with proper contraction spacing
Good contradiction handling
Quantifiers preserved and normalized (percentages, scientific units) so "95.6% of people" is kept intact
Stanford OpenIE doesn't extract any triplets from this sentence
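
To make the spacing point concrete: tokenizers such as spaCy's typically split "don't" into "do" + "n't" and "95.6%" into "95.6" + "%", so a naive space-join would produce "do n't" and "95.6 %". The sketch below shows one way to rejoin such tokens. The `join_tokens` helper and its `ATTACH_LEFT` list are hypothetical and are not taken from the library's code.

```python
# Illustrative only: not triplet-extract's actual formatting code.
# Tokens assumed to attach to the previous token without a space.
ATTACH_LEFT = {"n't", "'s", "'re", "'ve", "'ll", "'d", "'m", "%"}

def join_tokens(tokens):
    """Join tokens with spaces, gluing clitics and '%' to the previous token."""
    out = ""
    for tok in tokens:
        # Add a separating space unless this token attaches to the left.
        if out and tok.lower() not in ATTACH_LEFT:
            out += " "
        out += tok
    return out

print(join_tokens(["95.6", "%", "of", "people"]))  # 95.6% of people
print(join_tokens(["do", "n't", "know"]))          # don't know
```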

About

This is a Python port of Stanford OpenIE, a system for extracting relation triples from natural language text. The implementation follows the same three-stage pipeline as the original and uses the trained models from the Stanford NLP Group's research.

Reference: Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. "Leveraging Linguistic Structure For Open Domain Information Extraction." Association for Computational Linguistics (ACL), 2015.

Links: Paper | Stanford OpenIE | CoreNLP GitHub

This port uses spaCy for dependency parsing instead of Stanford CoreNLP, providing a pure Python alternative that works without Java dependencies. I'm grateful to the Stanford NLP Group for their groundbreaking research and for making their models available.

Note: This implementation supports English text only. The trained models and natural logic rules are language-specific.

Design Philosophy

This implementation prioritizes preserving rich semantic context in extracted triples. Unlike some ports that simplify subjects and relations, this port retains qualifiers, quantifiers, and contextual information (e.g., "The U.S. president Barack Obama" rather than just "Barack Obama", or "25% of people" rather than just "people"). This makes the output particularly well-suited for knowledge graph construction, GraphRAG applications, and other systems that benefit from semantically rich representations.
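
As a rough illustration of the knowledge graph use case, the sketch below groups triples into a simple adjacency map keyed by subject. The `Triple` tuple is a stand-in with the same three field names as the library's triplet objects, `build_graph` is a hypothetical helper, and the example triples are made up.

```python
from collections import defaultdict, namedtuple

# Stand-in for extracted triplets (made-up data, not real extractor output).
Triple = namedtuple("Triple", ["subject", "relation", "object"])

def build_graph(triples):
    """Map each subject to a list of (relation, object) edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.relation, t.object))
    return dict(graph)

triples = [
    Triple("25% of people", "prefer", "tea"),
    Triple("25% of people", "live in", "cities"),
]
graph = build_graph(triples)
print(graph)
```

Because the quantifier stays in the subject, the node reads "25% of people" rather than the much broader "people".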

Installation

pip install triplet-extract
python -m spacy download en_core_web_sm

For local development with uv:

git clone https://github.com/adlumal/triplet-extract.git
cd triplet-extract
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"
uv run spacy download en_core_web_sm

Usage

Basic Extraction

from triplet_extract import extract

text = "Cats love milk and mice."
triplets = extract(text)

for t in triplets:
    print(f"({t.subject}, {t.relation}, {t.object})")

Using the Extractor Class

The OpenIEExtractor class provides more control over the extraction pipeline:

from triplet_extract import OpenIEExtractor

extractor = OpenIEExtractor(
    enable_clause_split=True,    # Split complex sentences into clauses
    enable_entailment=True,      # Generate entailed shorter forms
    min_confidence=0.5           # Filter low-confidence triplets
)

text = "Cats love milk and mice."
triplets = extractor.extract_triplet_objects(text)

for t in triplets:
    print(f"Subject: {t.subject}")
    print(f"Relation: {t.relation}")
    print(f"Object: {t.object}")
    print(f"Confidence: {t.confidence}")
    print()
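
If the triplets need to be stored or passed to another system, the four attributes printed above can be copied into plain dictionaries. This is a minimal sketch: `triplet_to_dict` is a hypothetical helper, the sample object is a stand-in with made-up values, and whether the library ships its own serialization method is not covered here.

```python
import json
from types import SimpleNamespace

def triplet_to_dict(t):
    """Copy the four triplet attributes into a plain dict."""
    return {
        "subject": t.subject,
        "relation": t.relation,
        "object": t.object,
        "confidence": t.confidence,
    }

# Stand-in object; real triplets would come from the extractor.
sample = SimpleNamespace(subject="Cats", relation="love", object="milk", confidence=1.0)
line = json.dumps(triplet_to_dict(sample))
print(line)
```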

Pipeline Options

The extractor exposes options for the first two pipeline stages, plus a confidence threshold:

Stage 1: Clause Splitting (enable_clause_split)
Breaks complex sentences into simpler clauses using beam search. For example, "Obama, born in Hawaii, is president" becomes ["Obama is president", "Obama born in Hawaii"].

Stage 2: Forward Entailment (enable_entailment)
Generates shorter entailed forms using natural logic. For example, "Blue cats play" produces ["Blue cats play", "cats play"]. This applies to all fragments, including those from clause splitting.

Confidence Threshold (min_confidence)
Filters triplets below the specified confidence score (0.0 to 1.0). Higher values give fewer but higher-quality results.

# Fast extraction without variations
extractor = OpenIEExtractor(
    enable_clause_split=False,
    enable_entailment=False
)

# High-precision extraction
extractor = OpenIEExtractor(
    min_confidence=0.7
)
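
The effect of min_confidence can be pictured as a simple filter over scored triplets. The sketch below is only a model of that behavior, using a stand-in tuple type and made-up scores; it is not the library's implementation.

```python
from collections import namedtuple

# Stand-in for scored triplets; the confidence values are invented.
Scored = namedtuple("Scored", ["subject", "relation", "object", "confidence"])

def filter_by_confidence(triplets, min_confidence):
    """Keep triplets whose confidence is at or above the threshold."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    return [t for t in triplets if t.confidence >= min_confidence]

candidates = [
    Scored("Cats", "love", "milk", 0.9),
    Scored("Cats", "love", "mice", 0.6),
    Scored("milk", "love", "mice", 0.2),
]
print(len(filter_by_confidence(candidates, 0.5)))  # 2
print(len(filter_by_confidence(candidates, 0.7)))  # 1
```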

Batch Processing

For processing multiple texts efficiently:

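from triplet_extract import OpenIEExtractor

texts = [
    "First sentence to process.",
    "Second sentence to process.",
    "Third sentence to process."
]

# Create the extractor once and reuse it for the whole batch
extractor = OpenIEExtractor(min_confidence=0.5)
results = extractor.extract_batch(texts, batch_size=32, progress=True)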

for text, triplets in zip(texts, results):
    print(f"\n{text}")
    print(f"  {len(triplets)} triplets extracted")
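
Conceptually, batch_size controls how many texts are processed together. The sketch below shows plain fixed-size chunking with a hypothetical `chunked` helper. How extract_batch actually groups its inputs internally is an assumption here, not something documented above.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"Sentence number {i}." for i in range(5)]
batches = list(chunked(texts, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```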

Performance Tips

Reuse extractor instances when processing multiple texts:

# Good: Reuse the same extractor
extractor = OpenIEExtractor(min_confidence=0.5)
for text in texts:
    triplets = extractor.extract_triplet_objects(text)

# Avoid: Creates a new extractor (and reloads the models) on every call
for text in texts:
    triplets = extract(text, min_confidence=0.5)

Use batch processing for best performance:

results = extractor.extract_batch(texts, batch_size=32)
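
In larger applications, one way to guarantee reuse is a cached factory function. The sketch below uses a stand-in class that counts how often it is constructed; `ExpensiveExtractor` and `get_extractor` are hypothetical names, and the same pattern would apply to a real extractor only under the assumption that it is safe to share between callers.

```python
from functools import lru_cache

class ExpensiveExtractor:
    """Stand-in for an object that loads models when created."""
    loads = 0  # counts how many times the "models" were loaded

    def __init__(self, min_confidence):
        ExpensiveExtractor.loads += 1
        self.min_confidence = min_confidence

@lru_cache(maxsize=None)
def get_extractor(min_confidence=0.5):
    """Return one shared instance per distinct min_confidence value."""
    return ExpensiveExtractor(min_confidence)

for _ in range(3):
    extractor = get_extractor(0.5)

print(ExpensiveExtractor.loads)  # 1
```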

Verbose Logging

The library is silent by default. Enable logging to see internal operations:

import logging

# Pick one: basicConfig only takes effect the first time it is called
logging.basicConfig(level=logging.DEBUG)  # Show all details
# or, to show major steps only:
# logging.basicConfig(level=logging.INFO)

from triplet_extract import extract
triplets = extract("Your text here")
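
Setting the root level to DEBUG also turns on debug output from every other library in the process. A narrower option is to raise the level for a single logger only. The sketch below assumes the library logs under a logger named "triplet_extract" (the usual module-name convention); that name is an assumption and is not confirmed above.

```python
import logging

# Keep other libraries quiet at WARNING and above.
logging.basicConfig(level=logging.WARNING)

# Assumed logger name: loggers are conventionally named after the package.
logging.getLogger("triplet_extract").setLevel(logging.DEBUG)
```

Child loggers such as "triplet_extract.something" inherit this level unless they set their own.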

How It Works

The system implements the three-stage pipeline from the Stanford OpenIE paper:

Stage 1: Clause Splitting
Uses a pre-trained linear classifier to break complex sentences into independent clauses. The classifier was trained on the LSOIE dataset and considers dependency parse structure to make splitting decisions.
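
To give a feel for how a linear classifier can score a splitting decision, the sketch below sums hypothetical feature weights and maps the score to a probability with a logistic function. The feature names, weights, and the binary split/no-split framing are all invented for illustration; the real model's features and decision space are not described here.

```python
import math

# Hypothetical weights for features of a dependency edge (invented values).
WEIGHTS = {"edge=acl": 1.2, "edge=amod": -2.0, "has_subject": 0.8}
BIAS = -0.5

def split_probability(features):
    """Score active features linearly, then squash to a probability."""
    score = BIAS + sum(WEIGHTS.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))

p_split = split_probability(["edge=acl", "has_subject"])
p_keep = split_probability(["edge=amod"])
print(p_split > 0.5, p_keep > 0.5)  # True False
```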

Stage 2: Forward Entailment
Applies natural logic deletion rules to generate shorter entailed forms. Uses prepositional phrase attachment affinities to determine which constituents can be safely deleted while preserving truth.
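
The idea of generating shorter forms by deletion can be sketched as enumerating every subset of deletable tokens. This toy version takes the deletable positions as an input; deciding which positions are actually safe to delete is the job of the natural logic rules and PP attachment affinities, which are not modeled here. The `shorter_forms` helper is hypothetical.

```python
from itertools import combinations

def shorter_forms(tokens, deletable):
    """Return the sentence plus each variant with some deletable tokens removed."""
    forms = []
    positions = sorted(deletable)
    for r in range(len(positions) + 1):
        for drop in combinations(positions, r):
            kept = [tok for i, tok in enumerate(tokens) if i not in drop]
            forms.append(" ".join(kept))
    return forms

print(shorter_forms(["Blue", "cats", "play"], deletable={0}))
# ['Blue cats play', 'cats play']
```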

Stage 3: Pattern Matching
Extracts (subject, relation, object) triples from sentence fragments using dependency patterns. Handles various syntactic constructions including copular sentences, prepositional phrases, and clausal complements.
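
As a simplified picture of dependency pattern matching, the sketch below applies a single subject-verb-object pattern to a hand-written parse, so no parser is required to run it. The `match_svo` helper is hypothetical and covers only one pattern; the library's actual pattern set is much broader.

```python
# Hand-written dependency parse of "Cats love milk".
# Each token: (text, dependency label, index of its head token).
PARSE = [
    ("Cats", "nsubj", 1),
    ("love", "ROOT", 1),
    ("milk", "dobj", 1),
]

def match_svo(parse):
    """Return (subject, verb, object) for heads with both nsubj and dobj children."""
    triples = []
    for head_idx, (head_text, _, _) in enumerate(parse):
        subjects = [t for t, dep, h in parse if dep == "nsubj" and h == head_idx]
        objects = [t for t, dep, h in parse if dep == "dobj" and h == head_idx]
        triples.extend((s, head_text, o) for s in subjects for o in objects)
    return triples

print(match_svo(PARSE))  # [('Cats', 'love', 'milk')]
```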

The trained models (clause splitting classifier and PP attachment affinities) are from the original Stanford implementation and are included in this package.

Implementation Notes

This implementation uses spaCy for dependency parsing instead of Stanford CoreNLP. While the algorithm and models are the same, the parsers may produce different dependency trees for the same sentence. Differences in tokenization, POS tagging, and dependency labels mean that extraction results won't be identical to the original Java implementation.

In practice, core extractions remain highly compatible with Stanford OpenIE, though edge cases may differ, particularly with unusual capitalization or complex grammatical constructions. If you require exact compatibility with Stanford OpenIE output, please use the original Java implementation.
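
To gauge how closely two sets of extractions agree, one simple measure is the Jaccard overlap of normalized triples. The sketch below is a hypothetical comparison helper with made-up example data; it is not part of the library, and the two lists do not represent real output from either implementation.

```python
def normalize(triple):
    """Lowercase each part and collapse internal whitespace."""
    return tuple(" ".join(part.lower().split()) for part in triple)

def jaccard(a, b):
    """Share of distinct normalized triples that appear in both lists."""
    set_a = {normalize(t) for t in a}
    set_b = {normalize(t) for t in b}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Made-up outputs for illustration only.
port_output = [("Cats", "love", "milk"), ("Cats", "love", "mice")]
java_output = [("cats", "love", "milk")]
print(jaccard(port_output, java_output))  # 0.5
```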

Citation

If you use this library in research, kindly cite the original paper:

@inproceedings{angeli2015openie,
  title={Leveraging Linguistic Structure For Open Domain Information Extraction},
  author={Angeli, Gabor and Johnson Premkumar, Melvin Jose and Manning, Christopher D},
  booktitle={Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015)},
  year={2015}
}

Contributing

Bug reports and feature requests are welcome. Please open an issue on GitHub if you encounter problems or have suggestions for improvements.

License

GPL-3.0-or-later

This is a derivative work of Stanford OpenIE, which is licensed under GPL-3.0. The trained models included in this package are from the original Stanford implementation and remain under their GPL-3.0 license.

See LICENSE for details.

Related packages

stanford-openie-python

Release history

0.2.0: Nov 4, 2025
0.1.0 (yanked): Nov 3, 2025 (this version)
