Open-domain text-to-graph extractor with entities, relations, schema inference, and Neo4j export.

rapidGraph

rapidGraph is a local-first, open-domain text-to-graph extractor for arbitrary text. It turns raw text files or inline text into structured JSON containing:

entities
relations
potential_schema
expanded_schema
provenance-aware documents, chunks, and relation_support

It is designed for:

general entity and relation extraction across business, technical, scientific, and mixed-topic text
CPU-friendly local runs with selectable quality modes
provenance-aware graph building for future RAG or GraphRAG pipelines
optional direct Neo4j ingestion

The public distribution name is rapidGraph, the Python import package is rapidgraph, and the installed CLI command is rapidgraph.

What It Does

At a high level, rapidGraph:

normalizes raw text
splits it into chunked spans
extracts entity candidates
extracts relation candidates
canonicalizes duplicate or near-duplicate entity mentions
links relation endpoints back to canonical entities
infers schema patterns from the final graph
preserves chunk/document provenance for downstream graph and retrieval use

The extractor is open-domain and best-effort: it does not enforce a fixed ontology, and it keeps the Unknown type when typing confidence is weak.

Core Features

Open-domain entity extraction
Open-domain relation extraction
Schema inference from observed graph edges
Provenance-aware output with documents, chunks, and relation support records
Multi-file corpus ingestion in one run
Two canonicalization scopes:
- document: keep each file independent
- corpus: merge compatible entities across files
Three CPU-aware execution modes:
- fast
- balanced
- quality
Optional embedding-assisted canonicalization and linking
Optional Neo4j export

Install

Install from source:

pip install .

Install with optional extras:

pip install ".[neo4j]"
pip install ".[embeddings]"
pip install ".[dev]"
pip install ".[neo4j,embeddings,dev]"

Install from PyPI:

pip install rapidGraph

Extras work the same way:

pip install "rapidGraph[neo4j]"
pip install "rapidGraph[embeddings]"
pip install "rapidGraph[dev]"

CLI Quick Start

Show help:

rapidgraph --help

Process inline text:

rapidgraph --text "Google is based in California." --pretty

Process one file:

rapidgraph --input input.txt --pretty

Process multiple files:

rapidgraph --input input.txt input2.txt --pretty

Write output to JSON:

rapidgraph --input input.txt --output graph.json --pretty

The repo-root compatibility command still works:

python extract_graph.py --input input.txt --pretty

Execution Modes

rapidGraph supports three relation extraction modes.

`fast`

Best for:

CPU-only quick passes
bulk experiments
basic graph drafts

Behavior:

uses GLiNER and heuristics
does not run REBEL
fastest startup and lowest CPU cost

`balanced`

This is the default mode.

Best for:

normal CPU usage
better relation quality without full model cost

Behavior:

runs heuristics everywhere
runs REBEL only on shortlisted high-value spans
usually the best tradeoff

`quality`

Best for:

maximum relation recall
slower offline analysis
smaller corpora where quality matters more than throughput

Behavior:

runs REBEL across all chunks
highest model cost

Input Model

The CLI accepts either:

--text "..." for inline text
--input file1.txt [file2.txt ...] for one or more text files

--text and --input are mutually exclusive.
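
The mutual-exclusion rule can be illustrated with a small argparse sketch. This is not rapidGraph's actual parser: the `build_parser` helper and the `rapidgraph-sketch` program name are hypothetical, and making the group required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser that mirrors only the two input flags described above."""
    parser = argparse.ArgumentParser(prog="rapidgraph-sketch")
    # Assumption: exactly one of --text / --input must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="inline text")
    source.add_argument("--input", nargs="+", help="one or more text files")
    return parser


args = build_parser().parse_args(["--input", "input.txt", "input2.txt"])
print(args.input)
```

Passing both flags to this parser makes argparse exit with a usage error, which is the behavior a mutually exclusive pair implies.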

Output Model

The extractor returns one combined JSON object with these top-level fields.

`entities`

Each entity includes:

id
text
canonical
type
confidence
mentions

Each mention includes:

text
start
end
chunk_index
document_id
chunk_id

`relations`

Each relation includes:

source_id
target_id
relation
confidence
evidence
chunk_ids
document_ids
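
Because relations refer to entities by ID, a common first step is to resolve those IDs into readable triples. The sketch below works on a plain dict in the documented shape (for example, the parsed JSON output). The `readable_triples` helper, the ID format, and the sample values are made up for illustration.

```python
def readable_triples(graph: dict) -> list[tuple[str, str, str]]:
    """Resolve source_id/target_id to canonical entity names."""
    names = {entity["id"]: entity["canonical"] for entity in graph["entities"]}
    triples = []
    for rel in graph["relations"]:
        source = names.get(rel["source_id"])
        target = names.get(rel["target_id"])
        if source is None or target is None:
            continue  # skip edges whose endpoints are not in the entity list
        triples.append((source, rel["relation"], target))
    return triples


# Hand-written sample; IDs, types, and relation names are illustrative only.
sample = {
    "entities": [
        {"id": "e1", "canonical": "Google", "type": "Organization"},
        {"id": "e2", "canonical": "California", "type": "Location"},
    ],
    "relations": [
        {"source_id": "e1", "target_id": "e2",
         "relation": "based_in", "confidence": 0.9},
    ],
}
print(readable_triples(sample))
```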

`potential_schema`

Strict schema aggregation using:

(source_type, relation, target_type)

This is the backward-compatible schema view.
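
The triple-based aggregation can be approximated from the `entities` and `relations` arrays alone. This is a minimal sketch of the idea, not the extractor's implementation: the `aggregate_schema` helper and the sample type names are hypothetical, and the real `potential_schema` rows may carry more fields than a count.

```python
from collections import Counter


def aggregate_schema(graph: dict) -> Counter:
    """Count (source_type, relation, target_type) patterns in a graph dict."""
    types = {e["id"]: e.get("type") or "Unknown" for e in graph["entities"]}
    counts: Counter = Counter()
    for rel in graph["relations"]:
        key = (
            types.get(rel["source_id"], "Unknown"),
            rel["relation"],
            types.get(rel["target_id"], "Unknown"),
        )
        counts[key] += 1
    return counts


# Illustrative data only; type and relation names are assumptions.
sample = {
    "entities": [
        {"id": "e1", "type": "Organization"},
        {"id": "e2", "type": "Location"},
        {"id": "e3", "type": "Person"},
    ],
    "relations": [
        {"source_id": "e1", "relation": "based_in", "target_id": "e2"},
        {"source_id": "e1", "relation": "hired", "target_id": "e3"},
    ],
}
for pattern, count in aggregate_schema(sample).most_common():
    print(pattern, count)
```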

`expanded_schema`

Richer schema aggregation using finer-grained normalized types and more examples.

`documents`

One document row per input source:

id
source
title
text_hash
char_count

`chunks`

Each chunk includes:

id
document_id
index
text unless omitted
start
end
block_index
overlap_sentences

`relation_support`

One row per final relation edge with merged provenance:

source_id
relation
target_id
chunk_ids
document_ids
evidence
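
One way to picture "merged provenance" is to fold several observations of the same edge into a single row. The sketch below assumes that `chunk_ids`, `document_ids`, and `evidence` are lists; the `merge_support` helper and the sample IDs are hypothetical and may not match the extractor's internals.

```python
def merge_support(observations: list[dict]) -> list[dict]:
    """Fold observations of the same (source, relation, target) edge into one row."""
    merged: dict[tuple, dict] = {}
    for obs in observations:
        key = (obs["source_id"], obs["relation"], obs["target_id"])
        row = merged.setdefault(key, {
            "source_id": obs["source_id"],
            "relation": obs["relation"],
            "target_id": obs["target_id"],
            "chunk_ids": [],
            "document_ids": [],
            "evidence": [],
        })
        # Union each provenance list while keeping first-seen order.
        for field in ("chunk_ids", "document_ids", "evidence"):
            for value in obs[field]:
                if value not in row[field]:
                    row[field].append(value)
    return list(merged.values())


observations = [
    {"source_id": "e1", "relation": "based_in", "target_id": "e2",
     "chunk_ids": ["c1"], "document_ids": ["d1"],
     "evidence": ["Google is based in California."]},
    {"source_id": "e1", "relation": "based_in", "target_id": "e2",
     "chunk_ids": ["c2"], "document_ids": ["d1"],
     "evidence": ["Google, based in California, hired Sundar Pichai."]},
]
print(merge_support(observations))
```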

`meta`

Includes model names, thresholds, chunk counts, mode, embedding stats, relation backend stats, warnings, and processing time.

Flag Reference

Input and Output Flags

`--text TEXT`

Inline text input.

Example:

rapidgraph --text "Transformer uses self-attention." --pretty

`--input INPUT [INPUT ...]`

One or more UTF-8 text files.

Examples:

rapidgraph --input input.txt
rapidgraph --input input.txt input2.txt

`--output OUTPUT`

Write JSON to a file instead of stdout.

Example:

rapidgraph --input input.txt --output graph.json --pretty

`--pretty`

Pretty-print JSON output.

Quality and Runtime Flags

`--mode {fast,balanced,quality}`

Controls the CPU and quality tradeoff.

Examples:

rapidgraph --input input.txt --mode fast
rapidgraph --input input.txt --mode balanced
rapidgraph --input input.txt --mode quality

`--disable-rebel`

Forces heuristic-only relation extraction even if the mode would otherwise use REBEL.

Example:

rapidgraph --input input.txt --mode quality --disable-rebel

`--max-model-spans MAX_MODEL_SPANS`

Takes effect only in balanced mode, where it caps the number of shortlisted spans sent to REBEL.

Example:

rapidgraph --input input.txt --mode balanced --max-model-spans 6
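
How spans are scored for the shortlist is not documented here, so the following is only a sketch of the capping step. It assumes each span already carries a numeric `score`; that field and the `shortlist_spans` helper are hypothetical.

```python
import heapq


def shortlist_spans(spans: list[dict], max_model_spans: int) -> list[dict]:
    """Keep the highest-scoring spans, returned in original chunk order."""
    if max_model_spans <= 0:
        return []
    top = heapq.nlargest(max_model_spans, spans, key=lambda s: s["score"])
    return sorted(top, key=lambda s: s["chunk_index"])


# Illustrative scores; the real shortlist criteria are an open assumption.
spans = [
    {"chunk_index": 0, "score": 0.2},
    {"chunk_index": 1, "score": 0.9},
    {"chunk_index": 2, "score": 0.5},
    {"chunk_index": 3, "score": 0.7},
]
print([s["chunk_index"] for s in shortlist_spans(spans, 2)])  # [1, 3]
```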

Extraction Threshold Flags

`--entity-threshold ENTITY_THRESHOLD`

Minimum confidence used to keep entity candidates.

Example:

rapidgraph --input input.txt --entity-threshold 0.45

`--relation-threshold RELATION_THRESHOLD`

Minimum confidence used to keep relations.

Example:

rapidgraph --input input.txt --relation-threshold 0.3
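
Thresholds can also be tightened after the fact on an existing output, which avoids rerunning the models. The sketch below filters a graph dict by the documented `confidence` fields. The `filter_graph` helper is hypothetical and does not reproduce how the extractor applies thresholds internally.

```python
def filter_graph(graph: dict, entity_threshold: float,
                 relation_threshold: float) -> dict:
    """Drop low-confidence entities, then relations that lost an endpoint."""
    entities = [e for e in graph["entities"]
                if e["confidence"] >= entity_threshold]
    kept_ids = {e["id"] for e in entities}
    relations = [
        r for r in graph["relations"]
        if r["confidence"] >= relation_threshold
        and r["source_id"] in kept_ids
        and r["target_id"] in kept_ids
    ]
    return {**graph, "entities": entities, "relations": relations}


# Illustrative values only.
sample = {
    "entities": [
        {"id": "e1", "confidence": 0.9},
        {"id": "e2", "confidence": 0.4},
        {"id": "e3", "confidence": 0.8},
    ],
    "relations": [
        {"source_id": "e1", "target_id": "e2", "confidence": 0.7},
        {"source_id": "e1", "target_id": "e3", "confidence": 0.6},
        {"source_id": "e3", "target_id": "e1", "confidence": 0.2},
    ],
}
filtered = filter_graph(sample, entity_threshold=0.45, relation_threshold=0.3)
print(len(filtered["entities"]), len(filtered["relations"]))  # 2 1
```

Note that this can only remove items: candidates already discarded by a stricter original run cannot be recovered this way.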

`--max-chars MAX_CHARS`

Chunk size budget. Larger values preserve more context but cost more runtime.

Example:

rapidgraph --input input.txt --max-chars 1400

Chunking Flags

`--chunk-mode {paragraph,sentence}`

Controls chunk construction.

paragraph: structure-aware paragraph-first chunking
sentence: simpler sentence packing

Examples:

rapidgraph --input input.txt --chunk-mode paragraph
rapidgraph --input input.txt --chunk-mode sentence

`--chunk-overlap CHUNK_OVERLAP`

Sentence overlap between neighboring chunks. Higher values preserve context across chunk boundaries but increase redundancy.

Example:

rapidgraph --input input.txt --chunk-overlap 2
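
To make the overlap tradeoff concrete, here is a simplified sentence packer. It is a sketch under stated assumptions, not rapidGraph's chunker: the `pack_sentences` helper is hypothetical, sentences are assumed to be pre-split, and carried-over sentences may push a chunk slightly past the character budget.

```python
def pack_sentences(sentences: list[str], max_chars: int,
                   overlap: int) -> list[list[str]]:
    """Greedily pack sentences into chunks, repeating the last `overlap`
    sentences of each chunk at the start of the next one."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for sentence in sentences:
        candidate_len = len(" ".join(current + [sentence]))
        if current and candidate_len > max_chars:
            chunks.append(current)
            # Carry trailing sentences forward so context spans the boundary.
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(current)
    return chunks


sentences = ["A one.", "B two.", "C three.", "D four."]
print(pack_sentences(sentences, max_chars=16, overlap=0))
print(pack_sentences(sentences, max_chars=16, overlap=1))
```

With `overlap=1` the same four sentences produce three chunks instead of two, which shows the redundancy cost of preserving boundary context.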

Multi-File and Canonicalization Flags

`--entity-scope {document,corpus}`

Controls how entities are canonicalized across multiple files.

document: identical entities in different files stay separate
corpus: compatible entities can merge across files

Examples:

rapidgraph --input input.txt input2.txt --entity-scope document
rapidgraph --input input.txt input2.txt --entity-scope corpus

Use document when:

document-local provenance matters most
names are ambiguous across files
you want a safer default

Use corpus when:

the files are about a shared topic
you want a consolidated graph across the corpus
you plan to export one merged graph to Neo4j
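
The difference between the two scopes comes down to whether the document ID is part of the merge key. The sketch below shows only that idea, using exact case-insensitive name matching. The `group_mentions` helper is hypothetical, and the real canonicalization also handles near-duplicates and type compatibility.

```python
from collections import defaultdict


def group_mentions(mentions: list[dict], scope: str) -> dict[tuple, list[dict]]:
    """Group mentions into entity buckets under the given scope."""
    if scope not in ("document", "corpus"):
        raise ValueError(f"unknown scope: {scope!r}")
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for mention in mentions:
        name = mention["text"].strip().casefold()
        # document scope keeps files apart; corpus scope merges across files.
        key = (mention["document_id"], name) if scope == "document" else (name,)
        groups[key].append(mention)
    return dict(groups)


# Document IDs are illustrative.
mentions = [
    {"text": "Google", "document_id": "d1"},
    {"text": "google", "document_id": "d2"},
    {"text": "California", "document_id": "d1"},
]
print(len(group_mentions(mentions, "document")))  # 3
print(len(group_mentions(mentions, "corpus")))    # 2
```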

Provenance Flags

`--include-chunk-text`

Include full chunk text in the chunks array. This is the default.

`--no-include-chunk-text`

Keep chunk records but omit chunk text.

`--omit-provenance-text`

Alias for omitting chunk text while preserving chunk IDs and metadata.

Examples:

rapidgraph --input input.txt --no-include-chunk-text
rapidgraph --input input.txt --omit-provenance-text

Embedding-Assisted Linking Flags

These are opt-in. They are not enabled by default.

`--embedding-linking`

Enable embedding-assisted rescue for ambiguous entity merges and unresolved relation endpoints.

`--embedding-model EMBEDDING_MODEL`

Sentence embedding model to use. Default:

sentence-transformers/all-MiniLM-L6-v2

`--embedding-threshold EMBEDDING_THRESHOLD`

Cosine similarity threshold for accepting embedding-based merges or links.
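
As a reminder of what the threshold compares against, cosine similarity can be computed as follows. This is a plain-Python sketch with hypothetical helper names. The value 0.84 is borrowed from the example invocation in this section and is not a documented default.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def accept_link(a: list[float], b: list[float], threshold: float = 0.84) -> bool:
    """Accept a merge or link only when similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


print(accept_link([1.0, 0.0], [1.0, 0.0]))  # True
print(accept_link([1.0, 0.0], [1.0, 1.0]))  # False, similarity is about 0.707
```

Raising the threshold makes merges more conservative; lowering it rescues more ambiguous pairs at the risk of wrong merges.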

`--embedding-cache-dir EMBEDDING_CACHE_DIR`

Local cache directory for embedding vectors.

`--embedding-max-candidates EMBEDDING_MAX_CANDIDATES`

Caps the candidate pool used during embedding-assisted linking.

Examples:

rapidgraph \
  --input input.txt \
  --embedding-linking \
  --embedding-threshold 0.84 \
  --embedding-cache-dir .cache/extract_graph_embeddings

rapidgraph \
  --input input.txt input2.txt \
  --entity-scope corpus \
  --embedding-linking \
  --embedding-max-candidates 8

Neo4j Flags

These flags are optional. If omitted, the extractor only emits JSON.

`--neo4j-uri NEO4J_URI`

Neo4j URI such as:

neo4j://127.0.0.1:7687

`--neo4j-user NEO4J_USER`

Neo4j username.

`--neo4j-password NEO4J_PASSWORD`

Neo4j password.

`--neo4j-database NEO4J_DATABASE`

Target Neo4j database name.

`--neo4j-clean-document`

Delete matching document subgraphs before re-ingesting them. Useful when rerunning the same document set.

Example:

rapidgraph \
  --input input.txt input2.txt \
  --mode quality \
  --entity-scope corpus \
  --neo4j-uri neo4j://127.0.0.1:7687 \
  --neo4j-user neo4j \
  --neo4j-password 12345678 \
  --neo4j-database neo4j \
  --neo4j-clean-document
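
When the CLI is driven from a script, the credentials can be read from the environment instead of being typed into the command. The sketch below only builds the argument list from flags documented above. The `build_neo4j_command` helper and the `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` variable names are conventions of this wrapper, not something rapidgraph is known to read itself.

```python
import os


def build_neo4j_command(inputs: list[str], env=None) -> list[str]:
    """Build a rapidgraph argv list with Neo4j settings taken from `env`."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("NEO4J_PASSWORD is not set")
    return [
        "rapidgraph",
        "--input", *inputs,
        "--neo4j-uri", env.get("NEO4J_URI", "neo4j://127.0.0.1:7687"),
        "--neo4j-user", env.get("NEO4J_USER", "neo4j"),
        "--neo4j-password", password,
    ]


# Explicit env dict for demonstration; omit it to read os.environ.
cmd = build_neo4j_command(["input.txt"], env={"NEO4J_PASSWORD": "example"})
print(cmd[:3])
# To run it: subprocess.run(cmd, check=True)
```

This keeps the password out of shell history and scripts, although it is still passed as a process argument.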

Logging Flag

`--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}`

Controls CLI log verbosity.

Example:

rapidgraph --input input.txt --log-level DEBUG

Recommended Flag Combinations

Quick CPU pass

rapidgraph --input input.txt --mode fast --pretty

Best default for most users

rapidgraph --input input.txt --mode balanced --pretty

Higher recall on one document

rapidgraph --input input.txt --mode quality --chunk-overlap 2 --pretty

Multi-file corpus graph

rapidgraph \
  --input input.txt input2.txt \
  --mode balanced \
  --entity-scope corpus \
  --pretty

Multi-file corpus with stronger cross-file merging

rapidgraph \
  --input input.txt input2.txt \
  --mode balanced \
  --entity-scope corpus \
  --embedding-linking \
  --pretty

Lean provenance payload

rapidgraph \
  --input input.txt \
  --omit-provenance-text \
  --pretty

Neo4j export with replacement of existing document graph

rapidgraph \
  --input input.txt input2.txt \
  --mode quality \
  --entity-scope corpus \
  --neo4j-uri neo4j://127.0.0.1:7687 \
  --neo4j-user neo4j \
  --neo4j-password 12345678 \
  --neo4j-database neo4j \
  --neo4j-clean-document

Python Library Usage

Basic usage:

from rapidgraph import DocumentInput, build_default_extractor

extractor = build_default_extractor(mode="balanced")
result = extractor.extract_documents(
    [
        DocumentInput(
            text="Google is based in California.",
            source="one.txt",
            title="one.txt",
        ),
        DocumentInput(
            text="Google hired Sundar Pichai.",
            source="two.txt",
            title="two.txt",
        ),
    ],
    entity_scope="corpus",
)

print(result.model_dump())

Neo4j Graph Shape

When Neo4j export is enabled, the graph is designed to remain compatible with future GraphRAG workflows.

Current node labels:

Document
Chunk
Entity

Current relationship types:

HAS_CHUNK
MENTIONS
RELATES_TO

The semantic relation name is stored as a property on RELATES_TO. Neo4j Browser therefore shows a single relationship type, while the relation semantics remain available in the relationship's properties.
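
Since every semantic edge shares the RELATES_TO type, queries filter on the property rather than on the relationship type. The sketch below builds a parameterized Cypher query for that. The property name `relation`, the edge direction, and the `relation_query` helper are assumptions, so check the exported graph before relying on them.

```python
def relation_query(relation_name: str, limit: int = 25) -> tuple[str, dict]:
    """Return a Cypher query and its parameters for one semantic relation."""
    query = (
        "MATCH (s:Entity)-[r:RELATES_TO]->(t:Entity) "
        "WHERE r.relation = $relation "  # assumed property name
        "RETURN s, r, t LIMIT $limit"
    )
    return query, {"relation": relation_name, "limit": limit}


query, params = relation_query("based_in")
print(query)
print(params)
```

The returned pair can be passed to a Neo4j driver session, for example `session.run(query, params)` with the official Python driver.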

Packaging

Build distributions:

python -m build

Validate package metadata:

python -m twine check dist/*

Install from a built wheel:

pip install dist/rapidgraph-0.1.0-py3-none-any.whl

Publishing to PyPI

Create a PyPI account, generate an API token, then upload:

python -m twine upload dist/*

After a successful upload, users can install with:

pip install rapidGraph

Development

Install dev dependencies:

pip install ".[dev]"

Run tests:

pytest -q tests/test_extract_graph.py

Build the package:

python -m build

License

MIT

Project details

Release history

0.2.1: Apr 26, 2026
0.2.0: Apr 26, 2026
0.1.0: Apr 25, 2026 (this version)

Download files

Source Distribution

rapidgraph-0.1.0.tar.gz (37.0 kB)

Uploaded Apr 25, 2026 Source

Built Distribution

rapidgraph-0.1.0-py3-none-any.whl (27.9 kB)

Uploaded Apr 25, 2026 Python 3

File details

Details for the file rapidgraph-0.1.0.tar.gz.

File metadata

Download URL: rapidgraph-0.1.0.tar.gz
Upload date: Apr 25, 2026
Size: 37.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rapidgraph-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ad11ba1606ee59dd7fb251a14e6d70f88bb1b28a95f403867c36f1f241ebd393`
MD5	`d3ee730a94084cae3769882d77015fa5`
BLAKE2b-256	`140ef3f6db7fc31a9d7cd38ef2f1380ce3318d1426ff11f694c87b3d76543b9a`

Provenance

The following attestation bundles were made for rapidgraph-0.1.0.tar.gz:

Publisher: publish.yml on Chillthrower/rapidGraph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rapidgraph-0.1.0.tar.gz
- Subject digest: ad11ba1606ee59dd7fb251a14e6d70f88bb1b28a95f403867c36f1f241ebd393
- Sigstore transparency entry: 1382926479
- Sigstore integration time: Apr 25, 2026
Source repository:
- Permalink: Chillthrower/rapidGraph@6e4756195587f18821bc92574545b338ff298f8a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Chillthrower
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6e4756195587f18821bc92574545b338ff298f8a
- Trigger Event: release

File details

Details for the file rapidgraph-0.1.0-py3-none-any.whl.

File metadata

Download URL: rapidgraph-0.1.0-py3-none-any.whl
Upload date: Apr 25, 2026
Size: 27.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rapidgraph-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03e12128c5bb1e485c6773621ef3b231d0cb97861810b63d74d56b2839e5dcd9`
MD5	`4823473f221f5ca2e7205c4d8ce36c58`
BLAKE2b-256	`d8b932734779e17f37392c6625cbcb6624b6cc94edf9e6964816185e6d96ae7a`

Provenance

The following attestation bundles were made for rapidgraph-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Chillthrower/rapidGraph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rapidgraph-0.1.0-py3-none-any.whl
- Subject digest: 03e12128c5bb1e485c6773621ef3b231d0cb97861810b63d74d56b2839e5dcd9
- Sigstore transparency entry: 1382926516
- Sigstore integration time: Apr 25, 2026
Source repository:
- Permalink: Chillthrower/rapidGraph@6e4756195587f18821bc92574545b338ff298f8a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Chillthrower
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6e4756195587f18821bc92574545b338ff298f8a
- Trigger Event: release
