
nmdc-metadata-suggestor-ai-tool

A Python application for the NMDC Submission portal metadata suggestor tool, powered by AI. This project uses modern Python tooling with uv for dependency management and Docker for containerization.

Prerequisites

  • Python 3.12 or higher
  • uv (or use Docker)
  • Docker and Docker Compose (for containerized development)

Quick Start

LLM Configuration:

You will need to set up a .env file. Copy the example file first:

cp .env.example .env

Environment variables used by LLMClient and ConversationManager:

  • AI_INCUBATOR_KEY: API key for PNNL AI Incubator (when using access_provider=pnnl).
  • AI_INCUBATOR_BASE_URL: Base URL for the PNNL AI Incubator API.
  • GOOGLE_APPLICATION_CREDENTIALS: Path to a GCP service account JSON file (for Vertex AI).
  • VERTEX_PROJECT_ID: (Optional) GCP project ID for Vertex AI. If not provided, the SDK attempts to infer it from the credentials.
  • GEMINI_REGION: (Optional) GCP region for Gemini/Vertex (defaults to us-east5 or CLOUD_ML_REGION).
  • CBORG_KEY: API key for CBORG (when using access_provider=cborg).
  • CBORG_BASE_URL: Base URL for the CBORG API.

The LLMClient will read the appropriate variables depending on access_provider (set to pnnl, cborg, or gcp).

Environment variables are loaded from a .env file in the project root via python-dotenv. Variables already set in your shell take precedence over .env values (override=False is the default).
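As a rough illustration of these two rules (provider-dependent variable selection and shell-over-.env precedence), here is a minimal sketch. The `resolve_credentials` helper is hypothetical, not part of the package; the variable names mirror the list above, and the real selection logic lives inside LLMClient.

```python
import os

# Hypothetical mapping: which variables matter for each access_provider
# value (pnnl, cborg, or gcp), per the list above.
PROVIDER_VARS = {
    "pnnl": ("AI_INCUBATOR_KEY", "AI_INCUBATOR_BASE_URL"),
    "cborg": ("CBORG_KEY", "CBORG_BASE_URL"),
    "gcp": ("GOOGLE_APPLICATION_CREDENTIALS",),
}

def resolve_credentials(access_provider: str, dotenv_values: dict) -> dict:
    """Return the relevant variables, letting shell values win over .env
    (the override=False behaviour described above)."""
    resolved = {}
    for name in PROVIDER_VARS[access_provider]:
        # os.environ (the shell) takes precedence; .env fills the gaps.
        resolved[name] = os.environ.get(name, dotenv_values.get(name))
    return resolved

# Example: a CBORG_KEY set in the shell beats the .env value.
os.environ["CBORG_KEY"] = "from-shell"
creds = resolve_credentials("cborg", {"CBORG_KEY": "from-dotenv",
                                      "CBORG_BASE_URL": "https://example.invalid"})
print(creds["CBORG_KEY"])       # from-shell
print(creds["CBORG_BASE_URL"])  # https://example.invalid
```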

Option 1: Using uv (Local Development)

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
    # or
    pip install uv
    
  2. Clone and setup:

    git clone https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool.git
    cd nmdc-metadata-suggestor-ai-tool
    
  3. Install dependencies:

    uv sync
    
  4. Configure environment:

    cp .env.example .env
    # Edit .env and add your API keys
    
  5. Use the package in Python. Start a REPL in the project environment:

    uv run python
    
    Then, from the REPL:

    from nmdc_metadata_suggestor_ai_tool.llm_client import LLMClient
    from nmdc_metadata_suggestor_ai_tool.recommendation_pipeline import run_recommendation_pipeline
    
    submission_object = {
        # NMDC submission JSON payload
    }
    
    client = LLMClient(access_provider="gcp")
    result = run_recommendation_pipeline(submission_object, client)
    print(result.model_dump())
    

Advanced: direct ConversationManager usage (optional)

from nmdc_metadata_suggestor_ai_tool.llm_client import LLMClient, ConversationManager

client = LLMClient(access_provider="gcp")
conversation = ConversationManager(llm_client=client)
# Add plain text context (pdf_files may be a list of local PDF paths)
conversation.add_message(text="Please summarize the submission.", pdf_files=None)
# Add any schema context to guide the model
conversation.add_schema_context("<schema description here>")
response = conversation.generate(model="gemini-2.5-flash", max_tokens=1024, gemini_temperature=0.2)
print(response)

Option 2: Using Docker

  1. Clone the repository:

    git clone https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool.git
    cd nmdc-metadata-suggestor-ai-tool
    
  2. Configure environment:

    cp .env.example .env
    # Edit .env and add your API keys
    
  3. Run with Docker Compose (development):

    docker-compose up
    
  4. Or build and run production image:

    docker build -t nmdc-suggestor .
    docker run --env-file .env nmdc-suggestor
    

Development

Project Structure

nmdc-metadata-suggestor-ai-tool/
├── src/
│   └── nmdc_metadata_suggestor_ai_tool/
│       ├── __init__.py
│       ├── recommendation_pipeline.py       # Pipeline orchestration
│       ├── llm_client.py                    # LLM client for AI interactions
│       ├── cli/
│       │   ├── __init__.py
│       │   └── doi_cli.py                   # DOI operations CLI
│       ├── models/
│       │   ├── __init__.py
│       │   ├── doi.py                       # DOI data models
│       │   └── llm_output.py                # LLM output model
│       └── publication_ingestion/
│           ├── __init__.py
│           ├── download_pdf.py              # PDF retrieval logic
│           └── retreive_pdf_link.py         # PDF link discovery
├── tests/                                    # Test files
├── scripts/                                  # Vertex AI test scripts
├── docs/                                     # Documentation
├── pyproject.toml                            # Project dependencies and metadata
├── Dockerfile                                # Production Docker image
├── Dockerfile.dev                            # Development Docker image
├── docker-compose.yml                        # Docker Compose configuration
├── .env.example                              # Example environment variables
└── README.md                                 # This file

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/nmdc_metadata_suggestor_ai_tool

# Run specific test file
uv run pytest tests/test_example.py

Code Quality

# Format code with Ruff
uv run ruff format

# Lint with Ruff
uv run ruff check

# Type check with MyPy
uv run mypy src

Adding Dependencies

# Add a production dependency
uv add package-name

# Add a development dependency
uv add --dev package-name

# Update dependencies
uv sync

Configuration

Configuration is managed through environment variables or a .env file. See .env.example for available options:

  • DEFAULT_MODEL: Default LLM model to use
  • MAX_TOKENS: Maximum tokens for LLM responses
  • TEMPERATURE: Temperature for LLM responses (0.0-1.0)
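A minimal sketch of how an application might consume these three variables, with type coercion and range clamping. The fallback values here are illustrative placeholders, not the project's actual defaults; see .env.example for those.

```python
import os

# Read the documented configuration variables, falling back to
# illustrative defaults when they are unset.
DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "gemini-2.5-flash")
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "1024"))
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.2"))

# Clamp temperature to the documented 0.0-1.0 range.
TEMPERATURE = max(0.0, min(1.0, TEMPERATURE))

print(DEFAULT_MODEL, MAX_TOKENS, TEMPERATURE)
```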

Docker Development Workflow

Interactive Development

For interactive development with hot-reload:

# Start container in background
docker-compose up -d

# Execute commands in the container
docker-compose exec app uv run pytest
docker-compose exec app uv run ruff format

# Access shell
docker-compose exec app bash

# Stop container
docker-compose down

Production Build

# Build production image
docker build -t nmdc-suggestor:latest .

# Run production container
docker run --env-file .env nmdc-suggestor:latest

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests and quality checks
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

See LICENSE for licensing terms.

Distribution Files

Source distribution: nmdc_metadata_suggestor_ai_tool-1.1.1.tar.gz (78.6 kB)

  • SHA256: 96f4f64cec5e562b869bba5e4dd779ebd04bc38d5faa2d64d06fb0f914fb6bec
  • MD5: 57233d6f77fd1eb6e7f99e0f0c1166ce
  • BLAKE2b-256: 658843e6b121c872ca811ade5a0443812bda9a9ef64128595e6a015eff9bfe06

Built distribution (Python 3 wheel): nmdc_metadata_suggestor_ai_tool-1.1.1-py3-none-any.whl (63.6 kB)

  • SHA256: 03fbf7b50fe6e7e698ef8252b56e418142da72aa11758d0d2fad54b03437816d
  • MD5: 92ac677c14b24d3fcbe4260ed9f3f35a
  • BLAKE2b-256: 35a1152654a397908345c64100ba2f0da67c4d2456655ccb43f75e31a5209ff0