A package to build chemical knowledge graphs in Neo4j using data from PubChem
ChemGraphBuilder 
chemgraphbuilder is a Python package for transforming chemical data into knowledge graphs. It extracts data from PubChem and loads it into a Neo4j graph database, so researchers can extract, process, and visualize complex chemical relationships. The package is designed to be extended with other data sources in future releases.
Installation
To install ChemGraphBuilder, use pip:
pip install chemgraphbuilder
The installation command is also listed on the PyPI Project Page.
Usage
From Python
from chemgraphbuilder.setup_data_folder import SetupDataFolder
from chemgraphbuilder.node_collector_processor import NodesCollectorProcessor
from chemgraphbuilder.relationship_collector_processor import RelationshipsCollectorProcessor
from chemgraphbuilder.graph_nodes_loader import GraphNodesLoader
from chemgraphbuilder.graph_relationships_loader import GraphRelationshipsLoader
# Example values (placeholders taken from the command-line examples below; adjust for your setup)
node_type = "Compound"
enzyme_list = ["gene1", "gene2"]  # assumed here to be a list of gene names
relationship_type = "Assay_Compound"
label = "Compound"
uri, username, password = "bolt://localhost:7687", "neo4j", "password"
# Set up the data directory before collecting any data
setup_folder = SetupDataFolder()
setup_folder.setup()
# Collect and process the node data
node_collector = NodesCollectorProcessor(node_type=node_type, enzyme_list=enzyme_list, start_chunk=0)
node_collector.collect_and_process_data()
# Collect and process the relationship data
relationship_collector = RelationshipsCollectorProcessor(relationship_type=relationship_type, start_chunk=0)
relationship_collector.collect_relationship_data()
# Load the nodes into the Neo4j database, then close the connection
graph_nodes_loader = GraphNodesLoader(uri, username, password)
graph_nodes_loader.load_data_for_node_type(label)
graph_nodes_loader.close()
# Load the relationships into the Neo4j database, then close the connection
graph_relationships_loader = GraphRelationshipsLoader(uri, username, password)
graph_relationships_loader.add_relationships(relationship_type)
graph_relationships_loader.close()
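The loaders above take the Neo4j URI, username, and password as plain arguments. One way to keep credentials out of scripts is to read them from the environment first. The following is a minimal sketch under stated assumptions: `Neo4jSettings`, `settings_from_env`, and the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` are hypothetical and chosen for this example; chemgraphbuilder itself is not known to read them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Connection settings for the loaders (hypothetical helper, not part of chemgraphbuilder)."""
    uri: str
    username: str
    password: str


def settings_from_env(env=None):
    """Build settings from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail early instead of connecting with an empty password.
        raise ValueError("NEO4J_PASSWORD is not set")
    return Neo4jSettings(
        uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        username=env.get("NEO4J_USERNAME", "neo4j"),
        password=password,
    )


# Example with an explicit mapping instead of the real environment.
settings = settings_from_env({"NEO4J_PASSWORD": "password"})
print(settings.uri, settings.username)
```

The resulting values can then be passed positionally, for example `GraphNodesLoader(settings.uri, settings.username, settings.password)`.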
From Command Line
setup-data-folder
collect-process-nodes --node_type Compound --enzyme_list gene1,gene2 --start_chunk 0 # --start_chunk defaults to 0
collect-process-relationships --relationship_type Assay_Compound --start_chunk 0
load-graph-nodes --uri bolt://localhost:7687 --username neo4j --password password --label Compound
load-graph-relationships --uri bolt://localhost:7687 --username neo4j --password password --relationship_type Assay_Gene
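When the command sequence above has to be repeated for several node or relationship types, it can help to generate the argument lists in a script. The sketch below only builds and prints the commands (a dry run); it does not execute anything. `build_pipeline_commands` is a hypothetical helper. It assumes the last command uses the hyphenated name `load-graph-relationships`, like the other commands, and that the `--label` value matches the node type, as in the example.

```python
import shlex


def build_pipeline_commands(node_type, enzyme_list, relationship_type,
                            uri, username, password, start_chunk=0):
    """Return the five commands as argument lists, in pipeline order."""
    auth = ["--uri", uri, "--username", username, "--password", password]
    return [
        ["setup-data-folder"],
        ["collect-process-nodes", "--node_type", node_type,
         "--enzyme_list", ",".join(enzyme_list),
         "--start_chunk", str(start_chunk)],
        ["collect-process-relationships",
         "--relationship_type", relationship_type,
         "--start_chunk", str(start_chunk)],
        # Assumption: the label is the same as the node type.
        ["load-graph-nodes", *auth, "--label", node_type],
        # Assumption: hyphenated command name, consistent with the others.
        ["load-graph-relationships", *auth,
         "--relationship_type", relationship_type],
    ]


commands = build_pipeline_commands(
    node_type="Compound",
    enzyme_list=["gene1", "gene2"],
    relationship_type="Assay_Compound",
    uri="bolt://localhost:7687",
    username="neo4j",
    password="password",
)
for argv in commands:
    print(shlex.join(argv))  # dry run: print only, nothing is executed
```

To actually run the steps, each list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence.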
For more detailed examples, visit the Usage Examples.
Features
- Node Representation: Incorporates diverse nodes such as compounds, genes, proteins, and bioassays.
- Comprehensive Relationships: Maps out various interactions, including gene-protein relationships, bioassay-gene relationships, bioassay-compound relationships, compound similarities, compound co-occurrences in literature, and more nuanced interactions like inhibitor, activator, ligand, and other roles between compounds and genes.
- Data Integration: The knowledge graph schema is designed to support the integration of additional data sources, enhancing the depth and accuracy of the knowledge graph.
- Command Line and Programmatic Access: Provides flexibility in usage, allowing for integration into larger workflows or standalone analyses.
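To make the node and relationship model above concrete, here is a small in-memory sketch of a property graph in plain Python. It is an illustration only and does not reflect how chemgraphbuilder or Neo4j store data. The class `TinyGraph`, the node ids, and the relationship type names (`ENCODES`, `TESTS`, `TARGETS`, `INHIBITS`) are hypothetical and chosen for this example.

```python
class TinyGraph:
    """Minimal in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}   # node id -> (label, properties)
        self.edges = {}   # source id -> list of (relationship type, target id)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = (label, properties)

    def add_relationship(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.setdefault(source, []).append((rel_type, target))

    def neighbors(self, source, rel_type):
        """Ids of nodes reachable from `source` via `rel_type`, in insertion order."""
        return [t for r, t in self.edges.get(source, []) if r == rel_type]


graph = TinyGraph()
# One node per kind listed in the features (placeholder ids, no real data).
graph.add_node("compound:1", "Compound")
graph.add_node("gene:1", "Gene")
graph.add_node("protein:1", "Protein")
graph.add_node("assay:1", "BioAssay")

# Hypothetical relationship types mirroring the interactions described above.
graph.add_relationship("gene:1", "ENCODES", "protein:1")      # gene-protein
graph.add_relationship("assay:1", "TESTS", "compound:1")      # bioassay-compound
graph.add_relationship("assay:1", "TARGETS", "gene:1")        # bioassay-gene
graph.add_relationship("compound:1", "INHIBITS", "gene:1")    # compound role

print(graph.neighbors("compound:1", "INHIBITS"))
```

In a graph database the same idea applies at scale: nodes carry a label and properties, and typed relationships connect them, so a question such as "which genes does this compound inhibit" becomes a one-hop traversal.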
Documentation
Full documentation is available at ChemGraphBuilder Documentation.
Contributing
Contributions are welcome! Report issues or suggest improvements on the GitHub Issues page. Pull requests to the codebase are also encouraged.
License
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Contact
For questions or support, please contact Asmaa A. Abdelwahab.
Acknowledgments
This project utilizes the PubChem Database and its API for accessing chemical and bioassay data. We acknowledge the efforts of the PubChem team for maintaining such a valuable resource.