
A Python package for integrating data from multiple resources


pyBioDataFuse


💪 Getting Started

We introduce BioDataFuse, a query-based Python tool for integrating biomedical databases. BioDataFuse provides a modular framework for data wrangling that supports building context-specific knowledge graphs and running graph-based analyses on them. Through a user-friendly interface, users can dynamically create knowledge graphs from their own input data. The underlying Python package, pyBiodatafuse, harmonizes data by aggregating results from diverse sources through modular queries. BioDataFuse can also export graphs to Cytoscape and Neo4j, so they can be hosted locally. Ongoing work extends what the graphs can be used for, for example through link prediction.
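The idea of "aggregating diverse sources through modular queries" can be illustrated with a toy sketch. This is not pyBiodatafuse's API: the function `fuse`, the two annotators, and all identifiers are hypothetical, and the real package queries actual databases and produces much richer tables. Each annotator stands in for one source query; the results are merged per input identifier and turned into (identifier, source, annotation) edges of a small graph.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical "modular query": takes identifiers, returns {identifier: [annotations]}
Annotator = Callable[[List[str]], Dict[str, List[str]]]
Edge = Tuple[str, str, str]


def fuse(
    identifiers: List[str], annotators: Dict[str, Annotator]
) -> Tuple[Dict[str, Dict[str, List[str]]], List[Edge]]:
    """Run every annotator, merge results per identifier, and derive graph edges."""
    combined: Dict[str, Dict[str, List[str]]] = {i: {} for i in identifiers}
    edges: List[Edge] = []
    for source, annotate in annotators.items():
        results = annotate(identifiers)
        for identifier in identifiers:
            hits = results.get(identifier, [])  # missing identifiers get an empty list
            combined[identifier][source] = hits
            edges.extend((identifier, source, hit) for hit in hits)
    return combined, edges


# Toy annotators standing in for real database queries (hypothetical data)
def toy_pathways(ids: List[str]) -> Dict[str, List[str]]:
    return {"GENE_A": ["PATHWAY_1"], "GENE_B": ["PATHWAY_1", "PATHWAY_2"]}


def toy_diseases(ids: List[str]) -> Dict[str, List[str]]:
    return {"GENE_A": ["DISEASE_X"]}


combined, edges = fuse(["GENE_A", "GENE_B"], {"pathways": toy_pathways, "diseases": toy_diseases})
print(combined)
print(edges)
```

Keying everything on the input identifier is what lets independent queries be added or removed without touching the others, which is the design point the modular approach relies on.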

To learn more about the package, read the documentation.

Creating your own graph

To generate your own graph, see the tutorial notebook in the examples folder.

Graphs can be exported to Cytoscape, Neo4j, and GraphDB (as RDF Turtle files) with the following functions:

# Neo4j (requires a Neo4j server reachable at the given bolt URI)
neo4j.load_graph(pygraph, uri="bolt://localhost:7687", username="YOUR_USERNAME", password="YOUR_PASSWORD")  # replace with your Neo4j credentials

# Cytoscape (requires Cytoscape Desktop to be running)
cytoscape.load_graph(pygraph, network_name="YOUR_CUSTOM_NAME")

# RDF Turtle (.ttl) files, e.g. for loading into GraphDB
bdf = BDFGraph(
    base_uri="https://biodatafuse.org/YOUR_CUSTOM_NAME/",
    version_iri="https://biodatafuse.org/example/YOUR_CUSTOM_NAME.ttl",
    orcid="YOUR_ORCID",
    author="YOUR_NAME",
)

bdf.generate_rdf(combined_df, combined_metadata)  # Build the RDF graph from the combined data and metadata produced in the example runs
bdf.serialize(
    "YOUR_CUSTOM_NAME.ttl",
    format="ttl",
)
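`BDFGraph` takes an `orcid` argument, and a mistyped iD would presumably be carried into the generated RDF, so it can be worth checking the value first. The sketch below is a hypothetical helper, not part of pyBiodatafuse: it validates the ORCID check character using the ISO 7064 MOD 11-2 scheme that ORCID uses. Whether `BDFGraph` expects the bare iD or the full https://orcid.org/ URL is not stated here, so the helper accepts both.

```python
import re

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_char(base_digits: str) -> str:
    """Compute the ORCID check character (ISO 7064 MOD 11-2) from the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Validate a bare ORCID iD or an https://orcid.org/ URL (hypothetical helper)."""
    orcid = value.strip()
    if orcid.startswith(ORCID_URL_PREFIX):
        orcid = orcid[len(ORCID_URL_PREFIX):]
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # an example iD from ORCID's documentation
```

Running the check before constructing `BDFGraph` catches placeholder values such as `YOUR_ORCID` as well as single-digit typos.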

🚀 Installation

The most recent release can be installed from PyPI with:

$ pip install pyBiodatafuse

The most recent code and data can be installed directly from GitHub with:

$ pip install git+https://github.com/BioDataFuse/pyBiodatafuse.git
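To confirm which release ended up in your environment after either installation route, a standard-library sketch using `importlib.metadata` (Python 3.8+) is enough. It assumes the distribution is registered under the name used in the pip command above, `pyBiodatafuse`; the helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pyBiodatafuse") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version() or "pyBiodatafuse is not installed")
```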

👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

⚖️ License

The code in this package is licensed under the MIT License.

📖 Citation

This work started at the ELIXIR BioHackathon Europe 2023 as a project to integrate multiple Core Data Resources.

Gadiya, Y., Ammar, A., Willighagen, E., Martinat, D., Sima, A. C., Balci, H., & Abbassi Daloii, T. (2023). BioHackEU23 report: Extending interoperability of experimental data using modular queries across biomedical resources. BioHackrXiv Preprints. https://doi.org/10.37044/osf.io/mhsqp

🍪 Cookiecutter

This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.

🛠️ For Developers


This final section of the README is for anyone who wants to contribute code.

Development Installation

To install in development mode, use the following:

$ git clone https://github.com/BioDataFuse/pyBiodatafuse.git
$ cd pyBiodatafuse
$ pip install -e .

🥼 Testing

After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:

$ tox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

$ git clone https://github.com/BioDataFuse/pyBiodatafuse.git
$ cd pyBiodatafuse
$ tox -e docs
$ open docs/build/html/index.html  # macOS; on Linux use xdg-open

Building the documentation automatically installs the package together with the docs extra specified in setup.cfg. Sphinx plugins such as texext can be added to that extra; they also need to be added to the extensions list in docs/source/conf.py.
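As a sketch, registering a texext extension in docs/source/conf.py might look like the excerpt below. The module name `texext.math_dollar` is an assumption here; check texext's own documentation for the exact extension names. The `sphinx.ext.autodoc` entry only stands in for whatever the project's conf.py already lists, and texext itself must also appear in the docs extra in setup.cfg so that `tox -e docs` installs it.

```python
# docs/source/conf.py (excerpt): keep the entries the project already has
# and append the plugin's extension module.
extensions = [
    "sphinx.ext.autodoc",  # placeholder for the existing entries
    "texext.math_dollar",  # assumed texext module name; verify against texext docs
]
```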

📦 Making a Release

After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:

$ tox -e finish

This script does the following:

  1. Uses Bump2Version to remove the -dev suffix from the version number in setup.cfg, src/pyBiodatafuse/version.py, and docs/source/conf.py
  2. Packages the code in both a tar archive and a wheel using build
  3. Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
  4. Pushes to GitHub. You will then need to create a GitHub release for the commit in which the version was bumped.
  5. Bumps the version to the next patch. If you made larger changes and want a minor version bump instead, run tox -e bumpversion -- minor afterwards.
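The version arithmetic in steps 1 and 5 can be sketched in a few lines. This only illustrates the resulting version strings; it is not how Bump2Version works internally, and the exact suffix handling is governed by the project's bump2version configuration, which is not shown here. The function names are hypothetical.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH[-dev]' into integers, ignoring any suffix."""
    core = version.partition("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def finish(version: str) -> str:
    """Step 1: drop the -dev suffix to get the release version."""
    major, minor, patch = parse(version)
    return f"{major}.{minor}.{patch}"


def bump(version: str, part: str = "patch") -> str:
    """Step 5: move to the next development version (patch by default)."""
    major, minor, patch = parse(version)
    if part == "major":
        return f"{major + 1}.0.0-dev"
    if part == "minor":
        return f"{major}.{minor + 1}.0-dev"
    return f"{major}.{minor}.{patch + 1}-dev"


release = finish("1.2.0-dev")
print(release, bump(release))
```

Under these assumptions a release of 1.2.0 is followed by 1.2.1-dev, and a subsequent minor bump would give 1.3.0-dev.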


Download files


Source Distribution

pybiodatafuse-1.2.0.tar.gz (1.8 MB)

Uploaded Source

Built Distribution


pyBiodatafuse-1.2.0-py3-none-any.whl (174.4 kB)

Uploaded Python 3

File details

Details for the file pybiodatafuse-1.2.0.tar.gz.

File metadata

  • Download URL: pybiodatafuse-1.2.0.tar.gz
  • Upload date:
  • Size: 1.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for pybiodatafuse-1.2.0.tar.gz
Algorithm Hash digest
SHA256 439a78f16a015968ca0915d10bdc905b109084aecbba00b9d5547d208e9720a2
MD5 88dbffe06a0f63d205c51986c4f6a747
BLAKE2b-256 a16acadc7c6416eaa1ca9c46fd0b9a3b0dd0e00344511a85ed29d06fa49c3b64
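pip can enforce hashes on its own (for example with --require-hashes and a requirements file), but to check a manually downloaded file against the SHA256 digest listed above, a small standard-library sketch suffices. It assumes the sdist was saved under its original file name; the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 for pybiodatafuse-1.2.0.tar.gz, copied from the table above
EXPECTED_SHA256 = "439a78f16a015968ca0915d10bdc905b109084aecbba00b9d5547d208e9720a2"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


archive = Path("pybiodatafuse-1.2.0.tar.gz")
if archive.exists():
    print("OK" if verify(archive) else "HASH MISMATCH")
```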


File details

Details for the file pyBiodatafuse-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: pyBiodatafuse-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 174.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for pyBiodatafuse-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 156c7c2624464b672e9affebd3a29a2080b09550e1d30ee3cc4807176275bd99
MD5 a91ea02b24c6f6d7eb7360f09d52007b
BLAKE2b-256 216d53353f117f1f4d13c48da74d1eb69e6295c7d8aba119f035d224d4e008c4

