
ontology_loader

A suite of tools to configure and load an ontology from the OBO Foundry into OntologyClass data objects, as specified by the NMDC schema.

Development Environment

Prerequisites

  • Python >= 3.9
  • Poetry
  • Docker
  • MongoDB
  • NMDC materialized schema
  • MONGO_PASSWORD environment variable (or pass the password directly via the CLI/runner)

To run MongoDB locally in Docker, mapping host port 27018 to the container's port 27017:

% docker pull mongo
% docker run -d --name mongodb-container -p 27018:27017 mongo

MongoDB Connection Settings

When connecting to MongoDB, you need to set the correct environment variables depending on where your code is running:

  1. When running from your local machine (CLI or tests):

    export MONGO_HOST=localhost
    export MONGO_PORT=27018
    export ENABLE_DB_TESTS=true
    export MONGO_PASSWORD="your_valid_password"
    
  2. When running inside Docker containers:

    export MONGO_HOST=mongo
    export MONGO_PORT=27017
    

Inside Docker, containers on the same network reach MongoDB by its container or service name (like 'mongo') on the internal port 27017, whereas your host machine must use 'localhost' with the mapped port (27018).
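The settings above can be combined into a single connection string. The sketch below is a minimal illustration, assuming the `admin` username, the `nmdc` database and `authSource=admin` used in the mongosh examples; `build_mongo_uri` is a hypothetical helper, not part of ontology_loader.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(env=os.environ, user="admin", db_name="nmdc"):
    """Build a MongoDB URI from MONGO_HOST, MONGO_PORT and MONGO_PASSWORD.

    Hypothetical helper: the "admin" user and "nmdc" database mirror the
    mongosh examples in this README and are assumptions, not loader defaults.
    """
    host = env.get("MONGO_HOST", "localhost")
    port = env.get("MONGO_PORT", "27018")
    # Percent-encode the password so characters like '@' or ':' don't break the URI
    password = quote_plus(env["MONGO_PASSWORD"])
    return f"mongodb://{user}:{password}@{host}:{port}/{db_name}?authSource=admin"


if __name__ == "__main__":
    # Local-machine settings (host port 27018)
    print(build_mongo_uri({"MONGO_HOST": "localhost", "MONGO_PORT": "27018", "MONGO_PASSWORD": "p@ss"}))
```

Inside a container you would instead set MONGO_HOST=mongo and MONGO_PORT=27017, and the same function yields the internal address.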

Basic mongosh commands

% docker ps
% docker exec -it [mongodb-container-id] bash
% mongosh mongodb://admin:root@mongo:27017/nmdc?authSource=admin
% show dbs
% use nmdc
% db.ontology_class_set.find().pretty()
% db.ontology_relation_set.find().pretty()
% db.ontology_class_set.find( { id: { $regex: /^PO/ } } ).pretty()
% db.ontology_class_set.find( { id: { $regex: /^UBERON/ } } ).pretty()
% db.ontology_class_set.find( { id: { $regex: /^ENVO/ } } ).pretty()
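The same prefix queries can be issued from Python, for example to check how many terms each load produced. A minimal sketch, assuming a pymongo `Collection` (or any object with a `count_documents` method) is passed in; `prefix_filter` and `count_by_prefix` are hypothetical helpers.

```python
import re


def prefix_filter(prefix):
    """Return a MongoDB filter matching ids that start with the given prefix."""
    # Anchored at the start and escaped, equivalent to /^ENVO/ in mongosh
    return {"id": {"$regex": "^" + re.escape(prefix)}}


def count_by_prefix(collection, prefixes=("PO", "UBERON", "ENVO")):
    """Count documents per id prefix in a collection such as ontology_class_set."""
    return {p: collection.count_documents(prefix_filter(p)) for p in prefixes}
```

With pymongo, pass `MongoClient(uri)["nmdc"]["ontology_class_set"]` as `collection`.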

Command line

% poetry install
% poetry run ontology_loader --help
% poetry run ontology_loader --source-ontology "envo"
% poetry run ontology_loader --source-ontology "uberon"
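To load several ontologies in one go, the CLI can be invoked once per ontology. A minimal sketch using Python's `subprocess`, assuming `poetry` is on the PATH and the script runs from the project root; `loader_command` and `load_ontologies` are hypothetical wrappers, and only the `--source-ontology` flag shown above is used.

```python
import subprocess


def loader_command(ontology):
    """Build the CLI invocation shown above for one ontology."""
    return ["poetry", "run", "ontology_loader", "--source-ontology", ontology]


def load_ontologies(ontologies=("envo", "uberon")):
    """Run the loader for each ontology in turn, stopping at the first failure."""
    for ontology in ontologies:
        # check=True raises CalledProcessError if the loader exits non-zero
        subprocess.run(loader_command(ontology), check=True)
```

Calling `load_ontologies()` runs the two commands above sequentially; running them one at a time avoids concurrent writes to the same collections.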

Running the tests

% make test

Running the linter

% make lint

Python example usage

pip install nmdc-ontology-loader
import tempfile

from ontology_loader.ontology_load_controller import OntologyLoaderController


def load_ontology():
    """Load an ontology using the default MongoDB connection."""
    loader = OntologyLoaderController(
        source_ontology="envo",
        output_directory=tempfile.gettempdir(),
        generate_reports=True,
    )
    loader.run_ontology_loader()


if __name__ == "__main__":
    load_ontology()

Using with an existing MongoDB connection

If you already have a MongoDB connection established (e.g., in a Dagster/Dagit job), you can pass it directly to the OntologyLoaderController:

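import os
import tempfile
from urllib.parse import quote_plus

from pymongo import MongoClient
from ontology_loader.ontology_load_controller import OntologyLoaderController

# Use an existing MongoDB client; the password is read from MONGO_PASSWORD
# instead of being hardcoded in the connection string
password = quote_plus(os.environ["MONGO_PASSWORD"])
mongo_client = MongoClient(f"mongodb://admin:{password}@localhost:27018/nmdc?authSource=admin")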

# Pass the client and database name to OntologyLoaderController
loader = OntologyLoaderController(
    source_ontology="envo",
    output_directory=tempfile.gettempdir(),
    generate_reports=True,
    mongo_client=mongo_client,  # Pass the existing client
    db_name="nmdc",  # Required when passing an existing client
)

# The loader will use the provided client instead of creating a new connection
loader.run_ontology_loader()

This approach is particularly useful when:

  • You're running in a job scheduler like Dagster/Dagit
  • You want to reuse an existing connection pool
  • You have custom MongoDB connection settings that are managed externally
  • You need to use a connection with specific authentication or configuration

Note: When passing an existing MongoDB client, you must also provide the db_name parameter to specify which database to use. A MongoClient is not bound to a single database, so the loader cannot reliably determine which one to use from the client instance alone.
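If the client is created from a connection string that already names a database (as `nmdc` in the example above), one way to avoid repeating it is to derive `db_name` from the URI. A minimal sketch using only the standard library; `db_name_from_uri` is a hypothetical helper, not part of ontology_loader. pymongo's `MongoClient.get_default_database()` offers a similar lookup on the client itself.

```python
from urllib.parse import urlsplit


def db_name_from_uri(uri, default=None):
    """Return the database named in a mongodb:// URI path, or `default` if none."""
    # For "mongodb://user:pw@host:27018/nmdc?authSource=admin" the path is "/nmdc";
    # urlsplit keeps the query string (authSource=admin) separate
    name = urlsplit(uri).path.lstrip("/")
    if name:
        return name
    if default is None:
        raise ValueError(f"No database name in URI and no default given: {uri!r}")
    return default
```

For example, `db_name=db_name_from_uri(uri)` can be passed alongside `mongo_client=MongoClient(uri)`. This assumes the password in the URI is already percent-encoded.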

Testing CRUD operations in a live MongoDB

If you want to test the CRUD operations against a live MongoDB instance, set two environment variables:

    export MONGO_PASSWORD="your_valid_password"
    export ENABLE_DB_TESTS=true

With these set, the tests actually insert, update, and delete records in your MongoDB test instance instead of mocking the calls. Run them with:

% make test

Without these environment variables, the same command still runs, but all database calls are mocked. This helps prevent accidental data loss or corruption in a live database and keeps MONGO_PASSWORD out of the codebase.
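This opt-in gating can be expressed with the standard library's `unittest.skipUnless`. The following is a minimal sketch of the pattern, not the project's actual test code; `db_tests_enabled` and `LiveCrudTest` are hypothetical names.

```python
import os
import unittest


def db_tests_enabled(env=os.environ):
    """True only when ENABLE_DB_TESTS is 'true' and a MONGO_PASSWORD is set."""
    return env.get("ENABLE_DB_TESTS", "").lower() == "true" and bool(env.get("MONGO_PASSWORD"))


@unittest.skipUnless(db_tests_enabled(), "set ENABLE_DB_TESTS=true and MONGO_PASSWORD to run live DB tests")
class LiveCrudTest(unittest.TestCase):
    def test_insert_update_delete(self):
        # Live insert/update/delete checks against the test database would go here;
        # without the environment variables the whole class is reported as skipped
        self.assertTrue(db_tests_enabled())
```

Because the check requires both variables, a stray ENABLE_DB_TESTS=true on its own never triggers writes.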

Reset collections in dev

docker exec -it nmdc-runtime-test-mongo-1 bash
mongosh mongodb://admin:root@mongo:27017/nmdc?authSource=admin
db.ontology_class_set.find({}).pretty()
db.ontology_relation_set.find({}).pretty()
db.biosample_set.find({}).pretty()
db.ontology_class_set.drop()
db.ontology_relation_set.drop()
db.ontology_class_set.countDocuments()
db.ontology_relation_set.countDocuments()
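The drop-and-verify steps can also be scripted, for example in a CI job. A minimal sketch, assuming `db` is a pymongo `Database` (e.g. `MongoClient(uri)["nmdc"]`); `reset_ontology_collections` is a hypothetical helper and should only ever be pointed at a dev database.

```python
ONTOLOGY_COLLECTIONS = ("ontology_class_set", "ontology_relation_set")


def reset_ontology_collections(db, collections=ONTOLOGY_COLLECTIONS):
    """Drop the ontology collections and return their post-drop document counts."""
    counts = {}
    for name in collections:
        db[name].drop()
        # A dropped collection reports 0 documents, like countDocuments() in mongosh
        counts[name] = db[name].count_documents({})
    return counts
```

Unlike the mongosh session, biosample_set is deliberately left out, since only the ontology collections are reset.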

Download files

Source Distribution

ontology_loader-0.2.2.tar.gz (12.7 kB)

Uploaded Source

Built Distribution

ontology_loader-0.2.2-py3-none-any.whl (14.4 kB)

Uploaded Python 3

File details

Details for the file ontology_loader-0.2.2.tar.gz.

File metadata

  • Download URL: ontology_loader-0.2.2.tar.gz
  • Size: 12.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for ontology_loader-0.2.2.tar.gz
Algorithm Hash digest
SHA256 4e15b0f077ab55c8ca717fa9588a778bbbcd4ed97bb9600a7cec0095a57da94c
MD5 dcd8ca8646445ce539beaac8438e7f58
BLAKE2b-256 5c0804695a2d243adbc4094e414c8684bf8a0d35e901e9a412504f4c85bd81dc

File details

Details for the file ontology_loader-0.2.2-py3-none-any.whl.

File hashes

Hashes for ontology_loader-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 86f47c2e55d8cde5365d2a081f912bd8b7beea2239fdec8943102831990ea865
MD5 cae44e66906e226d2898dcf9c889acc4
BLAKE2b-256 8a870ef1934bafff1eb03df9f5d81db7e76295911c55c92efa00a59db7d19963