AWS GraphRAG Toolkit, lexical graph
Lexical Graph
The lexical-graph package provides a framework for automating the construction of a hierarchical lexical graph from unstructured data, and composing question-answering strategies that query this graph when answering user questions.
Features
- Built-in graph store support for Amazon Neptune Analytics, Amazon Neptune Database, and Neo4j.
- Built-in vector store support for Neptune Analytics, Amazon OpenSearch Serverless, Amazon S3 Vectors and Postgres with the pgvector extension.
- Built-in support for foundation models (LLMs and embedding models) on Amazon Bedrock.
- Easily extended to support additional graph and vector stores and model backends.
- Multi-tenancy – multiple separate lexical graphs in the same underlying graph and vector stores.
- Continuous ingest and batch extraction (using Bedrock batch inference) modes.
- Versioned updates: update source documents and query the state of the graph and vector stores at a point in time.
- Quickstart AWS CloudFormation templates for Neptune Database, OpenSearch Serverless, and Amazon Aurora Postgres.
Installation
The lexical-graph package requires Python 3.10 or greater and pip.
Install from the latest release tag:
$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/tags/v3.16.2.zip#subdirectory=lexical-graph
Or install from the main branch to get the latest changes:
$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/heads/main.zip#subdirectory=lexical-graph
If you're running on AWS, you must run your application in an AWS Region that offers the Amazon Bedrock foundation models used by the lexical graph (see the configuration section in the documentation for the default models). You must also enable access to these models before running any part of the solution.
Additional dependencies
You will need to install additional dependencies for specific graph and vector store backends:
Amazon OpenSearch Serverless
$ pip install opensearch-py llama-index-vector-stores-opensearch
Postgres with pgvector
$ pip install psycopg2-binary pgvector
Neo4j
$ pip install neo4j
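Before selecting a backend, it can help to confirm that its optional dependencies are importable. The following is a minimal sketch using only the standard library. The `OPTIONAL_BACKENDS` mapping and the `is_importable` and `missing_modules` helpers are hypothetical names introduced for illustration and are not part of the toolkit. The import names are assumed to be the usual ones for the pip packages listed above.

```python
from importlib.util import find_spec

# Assumed import names for the pip packages listed above (illustrative mapping).
OPTIONAL_BACKENDS = {
    'Amazon OpenSearch Serverless': ['opensearchpy', 'llama_index.vector_stores.opensearch'],
    'Postgres with pgvector': ['psycopg2', 'pgvector'],
    'Neo4j': ['neo4j'],
}


def is_importable(module_name):
    """Return True if the module can be located without importing it fully."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is missing or malformed.
        return False


def missing_modules():
    """Map each backend to the list of its modules that are not installed."""
    return {
        backend: [name for name in modules if not is_importable(name)]
        for backend, modules in OPTIONAL_BACKENDS.items()
    }


if __name__ == '__main__':
    for backend, missing in missing_modules().items():
        status = 'ready' if not missing else 'missing: ' + ', '.join(missing)
        print(f'{backend}: {status}')
```

A backend reported as missing modules needs the corresponding `pip install` command from this section before its store can be created.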
Connection strings
Pass a connection string to GraphStoreFactory.for_graph_store() or VectorStoreFactory.for_vector_store() to select a backend:
| Store | Connection string |
|---|---|
| Neptune Analytics (graph) | neptune-graph://<graph-id> |
| Neptune Database (graph) | neptune-db://<hostname> or any hostname ending .neptune.amazonaws.com |
| Neo4j (graph) | bolt://, bolt+ssc://, bolt+s://, neo4j://, neo4j+ssc://, or neo4j+s:// URLs |
| OpenSearch Serverless (vector) | aoss://<url> |
| Neptune Analytics (vector) | neptune-graph://<graph-id> |
| pgvector (vector) | constructed via PGVectorIndexFactory |
| S3 Vectors (vector) | constructed via S3VectorIndexFactory |
| Dummy / no-op | None or any unrecognised string — falls back to DummyGraphStore / DummyVectorIndex |
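The table's selection rules can be read as a simple prefix match. The sketch below restates them in plain Python so you can check which backend a given string would select. It is an illustration of the table only, not the toolkit's actual matching code, and `describe_graph_store` and `describe_vector_store` are hypothetical helper names. pgvector and S3 Vectors are left out because the table says they are constructed through their own factories.

```python
from typing import Optional

# Neo4j URL schemes listed in the table above.
NEO4J_SCHEMES = (
    'bolt://', 'bolt+ssc://', 'bolt+s://',
    'neo4j://', 'neo4j+ssc://', 'neo4j+s://',
)


def describe_graph_store(connection: Optional[str]) -> str:
    """Name the graph backend the table associates with a connection string."""
    if not connection:
        return 'dummy'
    if connection.startswith('neptune-graph://'):
        return 'neptune-analytics'
    if connection.startswith('neptune-db://') or connection.endswith('.neptune.amazonaws.com'):
        return 'neptune-database'
    if connection.startswith(NEO4J_SCHEMES):
        return 'neo4j'
    return 'dummy'  # unrecognised strings fall back to the no-op store


def describe_vector_store(connection: Optional[str]) -> str:
    """Name the vector backend the table associates with a connection string."""
    if not connection:
        return 'dummy'
    if connection.startswith('aoss://'):
        return 'opensearch-serverless'
    if connection.startswith('neptune-graph://'):
        return 'neptune-analytics'
    return 'dummy'


if __name__ == '__main__':
    print(describe_graph_store('neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'))
    print(describe_vector_store('aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'))
```

Because an unrecognised string silently selects the no-op store, a check like this can catch a mistyped scheme before an indexing run appears to succeed without writing anything.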
Example of use
Indexing
from graphrag_toolkit.lexical_graph import LexicalGraphIndex
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory

# requires pip install llama-index-readers-web
from llama_index.readers.web import SimpleWebPageReader

def run_extract_and_build():

    with (
        GraphStoreFactory.for_graph_store(
            'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
        ) as graph_store,
        VectorStoreFactory.for_vector_store(
            'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
        ) as vector_store
    ):

        graph_index = LexicalGraphIndex(
            graph_store,
            vector_store
        )

        doc_urls = [
            'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-features.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-vs-neptune-database.html'
        ]

        docs = SimpleWebPageReader(
            html_to_text=True,
            metadata_fn=lambda url: {'url': url}
        ).load_data(doc_urls)

        graph_index.extract_and_build(docs, show_progress=True)

if __name__ == '__main__':
    run_extract_and_build()
Querying
from graphrag_toolkit.lexical_graph import LexicalGraphQueryEngine
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory

def run_query():

    with (
        GraphStoreFactory.for_graph_store(
            'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
        ) as graph_store,
        VectorStoreFactory.for_vector_store(
            'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
        ) as vector_store
    ):

        query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
            graph_store,
            vector_store
        )

        response = query_engine.query('''What are the differences between Neptune Database
                                         and Neptune Analytics?''')

        print(response.response)

if __name__ == '__main__':
    run_query()
Documentation
Release
Release instructions are in RELEASE.md.
License
This project is licensed under the Apache-2.0 License.