
A package for building chemical knowledge graphs in Neo4j using data from PubChem

ChemGraphBuilder

chemgraphbuilder is a Python package for transforming chemical data into knowledge graphs. It extracts data from PubChem and builds graph databases in Neo4j, so researchers can extract, process, and visualize complex chemical relationships. The package is designed so that other data sources can be added in future releases.

Table of Contents

  1. Installation
  2. Usage
  3. Features
  4. Documentation
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

Installation

To install ChemGraphBuilder, use pip:

pip install chemgraphbuilder

The installation command is also listed on the PyPI Project Page.

Usage

From Python

from chemgraphbuilder.setup_data_folder import SetupDataFolder
from chemgraphbuilder.node_collector_processor import NodesCollectorProcessor
from chemgraphbuilder.relationship_collector_processor import RelationshipsCollectorProcessor
from chemgraphbuilder.graph_nodes_loader import GraphNodesLoader
from chemgraphbuilder.graph_relationships_loader import GraphRelationshipsLoader

# Placeholder inputs mirroring the command-line example; replace with your own values.
# Assumption: enzyme_list is a list of gene names.
node_type = "Compound"
enzyme_list = ["gene1", "gene2"]
relationship_type = "Assay_Compound"
label = "Compound"
uri = "bolt://localhost:7687"
username = "neo4j"
password = "password"

# Set up the data directory before collecting any data
setup_folder = SetupDataFolder()
setup_folder.setup()

# Collect and process the node data
nodes_collector = NodesCollectorProcessor(node_type=node_type, enzyme_list=enzyme_list, start_chunk=0)
nodes_collector.collect_and_process_data()

# Collect and process the relationship data
relationships_collector = RelationshipsCollectorProcessor(relationship_type=relationship_type, start_chunk=0)
relationships_collector.collect_relationship_data()

# Load the nodes into the Neo4j database, then close the connection
graph_nodes_loader = GraphNodesLoader(uri, username, password)
graph_nodes_loader.load_data_for_node_type(label)
graph_nodes_loader.close()

# Load the relationships into the Neo4j database, then close the connection
graph_relationships_loader = GraphRelationshipsLoader(uri, username, password)
graph_relationships_loader.add_relationships(relationship_type)
graph_relationships_loader.close()
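
If a load step raises, the `close()` call after it is never reached. The sketch below shows one way to guarantee cleanup with `contextlib.closing`. `DemoLoader` and `load_node_types` are hypothetical names introduced only for illustration; the sketch assumes nothing about the real loaders beyond the two methods used above, `load_data_for_node_type` and `close`.

```python
from contextlib import closing


class DemoLoader:
    """Hypothetical stand-in exposing the same two methods as the loaders above."""

    def __init__(self):
        self.loaded = []
        self.closed = False

    def load_data_for_node_type(self, label):
        if not label:
            raise ValueError("label must be non-empty")
        self.loaded.append(label)

    def close(self):
        self.closed = True


def load_node_types(loader, labels):
    """Load each label in order; close the loader even if a load raises."""
    with closing(loader):
        for label in labels:
            loader.load_data_for_node_type(label)


demo = DemoLoader()
load_node_types(demo, ["Compound"])
print(demo.loaded, demo.closed)
```

With the real package, passing `GraphNodesLoader(uri, username, password)` in place of `DemoLoader()` should behave the same way, since only those two methods are called.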

From Command Line

setup-data-folder
collect-process-nodes --node_type Compound --enzyme_list gene1,gene2 --start_chunk 0  # --start_chunk defaults to 0
collect-process-relationships --relationship_type Assay_Compound --start_chunk 0
load-graph-nodes --uri bolt://localhost:7687 --username neo4j --password password --label Compound
load-graph-relationships --uri bolt://localhost:7687 --username neo4j --password password --relationship_type Assay_Gene
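
For scripted runs, the node-related commands above can be assembled as argument lists instead of typed by hand. The following is a minimal sketch, not part of the package: `node_pipeline_commands` and the `NEO4J_PASSWORD` environment variable are hypothetical, and it assumes the `--label` value matches the `--node_type` value, as it does in the example above. Only commands and flags shown above are used.

```python
import os
import shlex


def node_pipeline_commands(node_type, enzymes, uri, username, password, start_chunk=0):
    """Return the three node-related commands as argument lists."""
    return [
        ["setup-data-folder"],
        ["collect-process-nodes", "--node_type", node_type,
         "--enzyme_list", ",".join(enzymes), "--start_chunk", str(start_chunk)],
        ["load-graph-nodes", "--uri", uri, "--username", username,
         "--password", password, "--label", node_type],
    ]


if __name__ == "__main__":
    # Hypothetical variable name; keeps the password out of scripts and shell history.
    password = os.environ.get("NEO4J_PASSWORD", "password")
    commands = node_pipeline_commands(
        "Compound", ["gene1", "gene2"], "bolt://localhost:7687", "neo4j", password
    )
    for argv in commands:
        # Print only; swap for subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```

Building argument lists rather than a single shell string avoids quoting problems if a value contains spaces or shell metacharacters.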

For more detailed examples, visit the Usage Examples.

Features

  • Node Representation: Incorporates diverse nodes such as compounds, genes, proteins, and bioassays.
  • Comprehensive Relationships: Maps out various interactions, including gene-protein relationships, bioassay-gene relationships, bioassay-compound relationships, compound similarities, compound co-occurrences in literature, and more nuanced interactions like inhibitor, activator, ligand, and other roles between compounds and genes.
  • Data Integration: The knowledge graph schema is designed to support the integration of additional data sources, enhancing the depth and accuracy of the knowledge graph.
  • Command Line and Programmatic Access: Provides flexibility in usage, allowing for integration into larger workflows or standalone analyses.

Documentation

Full documentation is available at ChemGraphBuilder Documentation.

Contributing

Contributions are welcome! Please report issues or suggest improvements on the GitHub Issues page. Pull requests to the codebase are also encouraged.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

Contact

For questions or support, please contact Asmaa A. Abdelwahab.

Acknowledgments

This project utilizes the PubChem Database and its API for accessing chemical and bioassay data. We acknowledge the efforts of the PubChem team for maintaining such a valuable resource.
