A package to build chemical knowledge graphs using data from PubChem and Neo4j

Project description

ChemGraphBuilder

chemgraphbuilder is a Python package for turning chemical data into knowledge graphs.
It extracts data from PubChem and builds the graph in Neo4j, so researchers can collect, process, and visualize chemical relationships in a single workflow.
The package is designed so that other data sources can be added in future releases.



Neo4j Requirements

chemgraphbuilder requires a running Neo4j database that is accessible via Bolt URI, username, and password.

You can run Neo4j:

  • Locally (Neo4j Desktop or Docker)
  • Remotely (Neo4j Aura Cloud)

Default Bolt port: 7687
Default Web UI port: 7474
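
The connection details above can be checked before they are handed to the package. The following is a minimal sketch, not part of chemgraphbuilder: the helper name `parse_bolt_uri` is hypothetical, and it only splits a URI into host and port, falling back to the default Bolt port when none is given.

```python
from urllib.parse import urlsplit

DEFAULT_BOLT_PORT = 7687  # default Bolt port listed above


def parse_bolt_uri(uri):
    """Return (host, port) for a Bolt-style URI, defaulting the port to 7687."""
    parts = urlsplit(uri)
    if not parts.hostname:
        # e.g. "localhost:7687" without a scheme has no recognizable host part
        raise ValueError(f"no host found in URI: {uri!r}")
    return parts.hostname, parts.port or DEFAULT_BOLT_PORT


print(parse_bolt_uri("bolt://localhost:7687"))
print(parse_bolt_uri("bolt://example.org"))
```

A URI written without a scheme (for example `localhost:7687`) raises `ValueError` here, which catches a common configuration slip early.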


🚀 Quick Start

Follow these steps to get up and running with chemgraphbuilder and Neo4j in under 5 minutes.

1️⃣ Install and Run Neo4j

Option A – Docker (fastest)

docker run \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/testpassword \
  neo4j:5.14
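
The container needs a few seconds to start, so a script that connects immediately after `docker run` may fail. A minimal sketch of a wait loop, using only the standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that Neo4j has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once a connection is accepted, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (7687 is the Bolt port published by the docker command above):
#   wait_for_port("localhost", 7687, timeout=60.0)
```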

Option B – Neo4j Desktop

  1. Download from: https://neo4j.com/download/
  2. Create a new project and database.
  3. Note the Bolt URI, username, and password.

2️⃣ Install chemgraphbuilder

pip install chemgraphbuilder

Or visit the PyPI Project Page for the latest release.


3️⃣ Connect to Neo4j in Python

from chemgraphbuilder import Neo4jBase

# Connection details
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "testpassword"

# Connect and test
db = Neo4jBase(uri=NEO4J_URI, user=NEO4J_USER, password=NEO4J_PASSWORD)
db.test_connection()
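
Hardcoding the password is fine for a local test, but for anything shared it is usually better to read the settings from the environment. A minimal sketch, assuming the environment variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` (these names are a convention chosen here, not something chemgraphbuilder reads on its own):

```python
import os


def load_neo4j_settings(env=None):
    """Collect connection settings from environment variables, with local defaults."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USER", "neo4j"),
        # No default password: a missing variable raises KeyError instead of guessing.
        "password": env["NEO4J_PASSWORD"],
    }


# A plain dict stands in for os.environ so the example runs anywhere.
settings = load_neo4j_settings({"NEO4J_PASSWORD": "testpassword"})
print(settings["uri"], settings["user"])
```

The dictionary keys match the keyword arguments used in the connection example above, so it could be passed along as `Neo4jBase(**settings)`.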

4️⃣ First Example

# Create a simple test node
db.run_query("CREATE (:Test {name: 'Hello Neo4j'})")
print("Node created!")

Check the result in the Neo4j Browser (by default at http://localhost:7474):

MATCH (n) RETURN n;

Usage

From Python

from chemgraphbuilder.setup_data_folder import SetupDataFolder
from chemgraphbuilder.node_collector_processor import NodesCollectorProcessor
from chemgraphbuilder.relationship_collector_processor import RelationshipsCollectorProcessor
from chemgraphbuilder.graph_nodes_loader import GraphNodesLoader
from chemgraphbuilder.graph_relationships_loader import GraphRelationshipsLoader

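# Placeholder inputs; replace with your own values
node_type = "Compound"
enzyme_list = ["gene1", "gene2"]  # assumed to be a list of gene names
relationship_type = "Assay_Compound"
label = "Compound"
uri = "bolt://localhost:7687"
username = "neo4j"
password = "testpassword"

# Setup data folder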
setup_folder = SetupDataFolder()
setup_folder.setup()

# Collect nodes
collector = NodesCollectorProcessor(node_type=node_type, enzyme_list=enzyme_list, start_chunk=0)
collector.collect_and_process_data()

# Collect relationships
collector = RelationshipsCollectorProcessor(relationship_type=relationship_type, start_chunk=0)
collector.collect_relationship_data()

# Load nodes into Neo4j
graph_nodes_loader = GraphNodesLoader(uri, username, password)
graph_nodes_loader.load_data_for_node_type(label)
graph_nodes_loader.close()

# Load relationships into Neo4j
graph_relationships_loader = GraphRelationshipsLoader(uri, username, password)
graph_relationships_loader.add_relationships(relationship_type)
graph_relationships_loader.close()
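
In the example above, `close()` is skipped if loading raises an exception. One way to guarantee cleanup is `contextlib.closing`, which works with any object that has a `close()` method. The sketch below uses a hypothetical `StandInLoader` class in place of the real loaders so that it runs without a database; whether the package's loaders also support the `with` statement directly is not assumed here.

```python
from contextlib import closing


class StandInLoader:
    """Hypothetical stand-in for a loader object that exposes close()."""

    def __init__(self):
        self.closed = False

    def load(self, label):
        if not label:
            raise ValueError("label must not be empty")
        return f"loaded {label}"

    def close(self):
        self.closed = True


loader = StandInLoader()
try:
    with closing(loader) as active:
        active.load("")  # simulate a failure part-way through loading
except ValueError as exc:
    print(f"load failed: {exc}")

print(loader.closed)  # close() ran despite the exception
```

The same pattern would apply to `GraphNodesLoader` and `GraphRelationshipsLoader`, since both are closed with `close()` in the example above.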

From Command Line

setup-data-folder
collect-process-nodes --node_type Compound --enzyme_list gene1,gene2 --start_chunk 0
collect-process-relationships --relationship_type Assay_Compound --start_chunk 0
load-graph-nodes --uri bolt://localhost:7687 --username neo4j --password password --label Compound
load-graph-relationships --uri bolt://localhost:7687 --username neo4j --password password --relationship_type Assay_Gene
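
When the commands are driven from a script, building each one as an argument list avoids shell quoting problems. A minimal sketch, assuming only the command names and flags shown above; `build_command` is a hypothetical helper, and the commands are printed rather than executed.

```python
import shlex


def build_command(name, **options):
    """Turn keyword options into an argv list such as ['cmd', '--key', 'value']."""
    argv = [name]
    for key, value in options.items():
        argv += [f"--{key}", str(value)]
    return argv


steps = [
    build_command("setup-data-folder"),
    build_command("collect-process-nodes", node_type="Compound",
                  enzyme_list="gene1,gene2", start_chunk=0),
    build_command("load-graph-nodes", uri="bolt://localhost:7687",
                  username="neo4j", password="password", label="Compound"),
]

for argv in steps:
    # shlex.join renders the argv list as a copy-pasteable shell line.
    print(shlex.join(argv))
    # To execute for real: subprocess.run(argv, check=True)
```

Running each step with `check=True` would stop the pipeline at the first failing command instead of loading partial data.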

More examples: Usage Examples.


Features

  • Node Representation: Compounds, genes, proteins, bioassays.
  • Comprehensive Relationships: Includes assay-compound, assay-gene, compound similarity, co-occurrence, inhibitor/activator/ligand, etc.
  • Data Integration: Schema supports adding new sources.
  • Flexible Access: Command line & Python API.

Documentation

Full docs: ChemGraphBuilder Documentation


Contributing

Issues: GitHub Issues. Pull requests are welcome.


License

GPL-3.0 – see LICENSE.


Contact

Asmaa A. Abdelwahab (asmaa.a.abdelwahab@gmail.com)


Acknowledgments

  • PubChem – for chemical and bioassay data.
  • Neo4j – for graph database capabilities.
