NMDC Submission portal metadata suggestor tool, powered by AI

nmdc-metadata-suggestor-ai-tool

A Python application for the NMDC Submission portal metadata suggestor tool, powered by AI. This project uses modern Python tooling with uv for dependency management and Docker for containerization.

Prerequisites

  • Python 3.12 or higher
  • uv (or use Docker)
  • Docker and Docker Compose (for containerized development)

Quick Start

LLM Configuration:

Create a .env file by copying the example file:

cp .env.example .env

For access via PNNL's AI Incubator, set the following:

  • AI_INCUBATOR_KEY
  • AI_INCUBATOR_BASE_URL

For access via GCP, set the following (the value is the path to the service-account.json file, which Sierra Moxon can provide):

  • GOOGLE_APPLICATION_CREDENTIALS

LLMClient reads the variables that match the provider it is created with, either access_provider="pnnl" or access_provider="gcp".
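
Before constructing the client, it can help to confirm that the variables for the chosen provider are actually set. The following is a minimal sketch and not part of the package: the helper name missing_env_vars and the REQUIRED_VARS mapping are hypothetical, built only from the variable names listed above.

```python
import os

# Variable names come from the lists above; the mapping itself is an
# illustrative assumption, not part of the package's API.
REQUIRED_VARS = {
    "pnnl": ["AI_INCUBATOR_KEY", "AI_INCUBATOR_BASE_URL"],
    "gcp": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env_vars(access_provider, environ=None):
    """Return the required variables that are unset or empty for a provider."""
    env = os.environ if environ is None else environ
    if access_provider not in REQUIRED_VARS:
        raise ValueError(f"Unknown access provider: {access_provider!r}")
    return [name for name in REQUIRED_VARS[access_provider] if not env.get(name)]


if __name__ == "__main__":
    # Report the status of each provider's variables.
    for provider in REQUIRED_VARS:
        missing = missing_env_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

This check reads the process environment, so it only sees values from .env if they have already been exported or loaded.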

Option 1: Using uv (Local Development)

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
    # or
    pip install uv
    
  2. Clone and setup:

    git clone https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool.git
    cd nmdc-metadata-suggestor-ai-tool
    
  3. Install dependencies:

    uv sync
    
  4. Configure environment:

    cp .env.example .env
    # Edit .env and add your API keys
    
  5. Use the package in Python:

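    # Start a Python interpreter in the project environment
    uv run python
    
    # Then enter the following at the Python prompt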
    from nmdc_metadata_suggestor_ai_tool.llm_client import LLMClient
    from nmdc_metadata_suggestor_ai_tool.recommendation_pipeline import run_recommendation_pipeline
    
    submission_object = {
        # NMDC submission JSON payload
    }
    
    client = LLMClient(access_provider="gcp")
    result = run_recommendation_pipeline(submission_object, client)
    print(result.model_dump())
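
The submission_object placeholder above stands for an NMDC submission JSON payload. One way to fill it is to read the payload from a file. This is a minimal sketch under assumptions: the helper load_submission and the file name submission.json are hypothetical, and the payload is assumed to be a JSON object at the top level, as the dict literal above suggests.

```python
import json
from pathlib import Path


def load_submission(path):
    """Read a submission JSON file and return it as a dict."""
    with Path(path).open(encoding="utf-8") as handle:
        payload = json.load(handle)
    # The pipeline example above passes a dict, so reject other JSON types early.
    if not isinstance(payload, dict):
        raise ValueError("Expected a JSON object at the top level of the submission file")
    return payload


# Usage (hypothetical file name), continuing the interpreter session above:
# submission_object = load_submission("submission.json")
# result = run_recommendation_pipeline(submission_object, client)
```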
    

Option 2: Using Docker

  1. Clone the repository:

    git clone https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool.git
    cd nmdc-metadata-suggestor-ai-tool
    
  2. Configure environment:

    cp .env.example .env
    # Edit .env and add your API keys
    
  3. Run with Docker Compose (development):

    docker-compose up
    
  4. Or build and run production image:

    docker build -t nmdc-suggestor .
    docker run --env-file .env nmdc-suggestor
    

Development

Project Structure

nmdc-metadata-suggestor-ai-tool/
├── src/
│   └── nmdc_metadata_suggestor_ai_tool/
│       ├── __init__.py
│       ├── recommendation_pipeline.py       # Pipeline orchestration
│       ├── llm_client.py                    # LLM client for AI interactions
│       ├── cli/
│       │   ├── __init__.py
│       │   └── doi_cli.py                   # DOI operations CLI
│       ├── models/
│       │   ├── __init__.py
│       │   ├── doi.py                       # DOI data models
│       │   └── llm_output.py                # LLM output model
│       └── publication_ingestion/
│           ├── __init__.py
│           ├── download_pdf.py              # PDF retrieval logic
│           └── retreive_pdf_link.py         # PDF link discovery
├── tests/                                    # Test files
├── scripts/                                  # Vertex AI test scripts
├── docs/                                     # Documentation
├── pyproject.toml                            # Project dependencies and metadata
├── Dockerfile                                # Production Docker image
├── Dockerfile.dev                            # Development Docker image
├── docker-compose.yml                        # Docker Compose configuration
├── .env.example                              # Example environment variables
└── README.md                                 # This file

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/nmdc_metadata_suggestor_ai_tool

# Run specific test file
uv run pytest tests/test_example.py

Code Quality

# Format code with Ruff
uv run ruff format

# Lint with Ruff
uv run ruff check

# Type check with MyPy
uv run mypy src

Adding Dependencies

# Add a production dependency
uv add package-name

# Add a development dependency
uv add --dev package-name

# Sync the environment after dependency changes
uv sync

Configuration

Configuration is managed through environment variables or a .env file. See .env.example for available options:

  • DEFAULT_MODEL: Default LLM model to use
  • MAX_TOKENS: Maximum tokens for LLM responses
  • TEMPERATURE: Temperature for LLM responses (0.0-1.0)
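
As an illustration of how these three options could be read and validated, here is a minimal sketch. It is not the package's own configuration code: the function name read_llm_settings is hypothetical, and the fallback values are placeholders rather than the project's real defaults.

```python
import os


def read_llm_settings(environ=None):
    """Collect the three documented settings from environment variables."""
    env = os.environ if environ is None else environ
    # Fallback values are placeholders for illustration only,
    # not the project's real defaults.
    model = env.get("DEFAULT_MODEL", "example-model")
    max_tokens = int(env.get("MAX_TOKENS", "1024"))
    temperature = float(env.get("TEMPERATURE", "0.0"))
    if max_tokens <= 0:
        raise ValueError("MAX_TOKENS must be a positive integer")
    # The documented range for TEMPERATURE is 0.0-1.0.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("TEMPERATURE must be between 0.0 and 1.0")
    return {"model": model, "max_tokens": max_tokens, "temperature": temperature}
```

Environment variables are always strings, so the numeric settings are converted explicitly and checked before use.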

Docker Development Workflow

Interactive Development

For interactive development with hot-reload:

# Start container in background
docker-compose up -d

# Execute commands in the container
docker-compose exec app uv run pytest
docker-compose exec app uv run ruff format

# Access shell
docker-compose exec app bash

# Stop container
docker-compose down

Production Build

# Build production image
docker build -t nmdc-suggestor:latest .

# Run production container
docker run --env-file .env nmdc-suggestor:latest

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests and quality checks
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

See LICENSE for licensing terms.
