Unified Python client for Alliance of Genome Resources (AGR) curation APIs
Project description
AGR Curation API Client
A unified Python client for Alliance of Genome Resources (AGR) curation APIs.
Features
- Unified Interface: Single client for all AGR curation API endpoints
- Multiple Data Sources: Supports REST API, GraphQL, and direct database access
- Type Safety: Full type hints and Pydantic models for request/response validation
- Fuzzy Search: Partial matching and synonym search for genes, alleles, and other entities
- Retry Logic: Automatic retry with exponential backoff for transient failures
- Authentication: Support for API key and Okta token authentication
- Async Support: Built on httpx for both sync and async operations
- Comprehensive Error Handling: Detailed exceptions for different error scenarios
Installation
pip install agr-curation-api-client
For development:
git clone https://github.com/alliance-genome/agr_curation_api_client.git
cd agr_curation_api_client
make install-dev
Authentication
The client supports automatic Okta token generation using the same environment variables as other AGR services:
export OKTA_DOMAIN="your-okta-domain"
export OKTA_API_AUDIENCE="your-api-audience"
export OKTA_CLIENT_ID="your-client-id"
export OKTA_CLIENT_SECRET="your-client-secret"
With these environment variables set, the client will automatically obtain an authentication token when initialized.
Quick Start
Basic Usage
from agr_curation_api import AGRCurationAPIClient, APIConfig
# Option 1: Automatic authentication (requires OKTA env vars)
client = AGRCurationAPIClient()
# Option 2: Manual token configuration
config = APIConfig(
base_url="https://curation.alliancegenome.org/api",
okta_token="your-okta-token" # Optional - will auto-retrieve if not provided
)
client = AGRCurationAPIClient(config)
# Use the client
with client:
# Get genes from WormBase
genes = client.get_genes(data_provider="WB", limit=10)
for gene in genes:
symbol = gene.gene_symbol.get("displayText", "") if gene.gene_symbol else ""
print(f"{gene.curie}: {symbol}")
Data Source Selection
The client supports multiple data access methods with automatic fallback. By default, it tries database access first (fastest), then GraphQL, then REST API.
# Automatic data source selection (default behavior)
client = AGRCurationAPIClient()
genes = client.get_genes(taxon="NCBITaxon:6239", limit=100) # Uses database if available
# Force specific data source
client = AGRCurationAPIClient(data_source="db") # Database only
client = AGRCurationAPIClient(data_source="graphql") # GraphQL only
client = AGRCurationAPIClient(data_source="api") # REST API only
# Direct database access for advanced queries
gene = client.db.get_gene("WB:WBGene00001234")
allele = client.db.get_allele("WB:WBVar00001234")
Working with Genes
from agr_curation_api import AGRCurationAPIClient, Gene
# Use default configuration
client = AGRCurationAPIClient()
# Get genes from a specific data provider
wb_genes = client.get_genes(data_provider="WB", limit=100)
print(f"Found {len(wb_genes)} WormBase genes")
# Get a specific gene by ID (works with database, GraphQL, or API)
gene = client.get_gene("WB:WBGene00001234")
if gene:
print(f"Gene: {gene.gene_symbol}")
print(f"Full name: {gene.gene_full_name}")
print(f"Species: {gene.taxon}")
# Get all genes (paginated)
all_genes = client.get_genes(limit=5000, page=0)
Working with Species
# Get all species
species_list = client.get_species()
for species in species_list:
print(f"{species.abbreviation}: {species.display_name}")
# Find a specific species
wb_species = [s for s in species_list if s.abbreviation == "WB"]
if wb_species:
print(f"WormBase: {wb_species[0].full_name}")
Working with Ontology Terms
# Get GO term root nodes
go_roots = client.get_ontology_root_nodes("goterm")
print(f"Found {len(go_roots)} GO root terms")
# Get children of a specific GO term
children = client.get_ontology_node_children("GO:0008150", "goterm") # biological_process
for child in children:
print(f"{child.curie}: {child.name}")
# Get disease ontology terms
disease_roots = client.get_ontology_root_nodes("doterm")
# Get anatomical terms
anatomy_roots = client.get_ontology_root_nodes("anatomicalterm")
Working with Expression Annotations
# Get expression annotations for WormBase
wb_expressions = client.get_expression_annotations(
data_provider="WB",
limit=100
)
for expr in wb_expressions:
if expr.expression_annotation_subject:
gene_id = expr.expression_annotation_subject.get("primaryExternalId")
gene_symbol = expr.expression_annotation_subject.get("geneSymbol", {}).get("displayText")
print(f"Gene: {gene_id} ({gene_symbol})")
if expr.expression_pattern:
anatomy = expr.expression_pattern.get("whereExpressed", {}).get("anatomicalStructure", {}).get("curie")
print(f" Expressed in: {anatomy}")
Working with Alleles
# Get alleles from a specific data provider
wb_alleles = client.get_alleles(data_provider="WB", limit=50)
for allele in wb_alleles:
symbol = allele.allele_symbol.get("displayText", "") if allele.allele_symbol else ""
print(f"{allele.curie}: {symbol}")
# Get a specific allele by ID (works with database, GraphQL, or API)
allele = client.get_allele("WB:WBVar00001234")
if allele:
print(f"Allele: {allele.allele_symbol}")
print(f"Full name: {allele.allele_full_name}")
print(f"Extinction status: {allele.is_extinct}")
Searching Entities
The client provides two search methods with different strengths:
Fuzzy Search (search_entities_fuzzy): Best for user-friendly, autocomplete-style searching
- Database-only (direct SQL queries)
- Partial text matching: "rut" finds "rutabaga", "RUT", "rut-1"
- Automatically searches symbols, full names, and synonyms
- Returns results with relevance scoring (exact > starts-with > contains)
- Use when: searching by partial names, building autocomplete, finding entities by common/historical names
- Supported types: gene, allele, agm, strain, genotype, fish
Generic Search (search_entities): Best for programmatic queries with known field structures
- REST API-based with structured filters
- Exact field matching with precise field paths (e.g., "geneSymbol.displayText")
- Supports complex boolean filter logic
- Returns complete entity objects
- Use when: you know exact field names, need complex filters, require full entity data
- Supports all entity types available in the API
Fuzzy Search
# Search for genes with partial matching
# Example: Find genes containing "rut" in Drosophila
results = client.db.search_entities_fuzzy(
entity_type='gene',
search_pattern='rut',
taxon_curie='NCBITaxon:7227',
include_synonyms=True,
limit=10
)
for result in results:
print(f"{result['entity_curie']}: {result['entity']}")
print(f" Match type: {result['match_type']}") # exact, starts_with, or contains
print(f" Relevance: {result['relevance']}") # 1 (best) to 3 (least)
# Search for alleles without synonyms
allele_results = client.db.search_entities_fuzzy(
entity_type='allele',
search_pattern='daf',
taxon_curie='NCBITaxon:6239',
include_synonyms=False,
limit=20
)
# Supported entity types: 'gene', 'allele', 'agm', 'strain', 'genotype', 'fish'
# Results are ordered by relevance (exact matches first, then starts-with, then contains)
Generic Search
# Generic entity search
search_filters = {
"dataProvider.abbreviation": "WB",
"geneSymbol.displayText": "daf-16"
}
results = client.search_entities(
entity_type="gene",
search_filters=search_filters,
limit=10
)
print(f"Total results: {results.total_results}")
print(f"Returned: {results.returned_records}")
for gene_data in results.results:
print(f"Found gene: {gene_data}")
Error Handling
from agr_curation_api import (
AGRAPIError,
AGRAuthenticationError,
AGRConnectionError,
AGRTimeoutError,
AGRValidationError
)
try:
reference = client.get_reference("invalid-id")
except AGRAuthenticationError:
print("Authentication failed - check your credentials")
except AGRValidationError as e:
print(f"Invalid data: {e}")
except AGRTimeoutError:
print("Request timed out - try again later")
except AGRConnectionError:
print("Connection failed - check network")
except AGRAPIError as e:
print(f"API error: {e}")
if e.status_code:
print(f"Status code: {e.status_code}")
Configuration Options
The APIConfig class supports the following options:
base_url: Base URL for the A-Team Curation API (default: "https://curation.alliancegenome.org/api")okta_token: Okta bearer token for authentication (auto-retrieved if not provided)timeout: Request timeout in seconds (default: 30)max_retries: Maximum retry attempts (default: 3)retry_delay: Initial delay between retries in seconds (default: 1)verify_ssl: Whether to verify SSL certificates (default: True)headers: Additional headers to include in requests
Environment Variables
The client uses the following environment variables for configuration:
ATEAM_API: Override the default A-Team API URL (default: uses production curation API)OKTA_DOMAIN: Your Okta domain (required for automatic authentication)OKTA_API_AUDIENCE: Your API audience (required for automatic authentication)OKTA_CLIENT_ID: Your Okta client ID (required for automatic authentication)OKTA_CLIENT_SECRET: Your Okta client secret (required for automatic authentication)
Development
Running Tests
make test
Code Quality
# Run linting
make lint
# Run type checking
make type-check
# Format code
make format
# Run all checks
make check
Building Documentation
cd docs
make html
Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'feat: add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Issues: GitHub Issues
- Documentation: API Documentation
- Contact: software@alliancegenome.org
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file agr_curation_api_client-0.7.5.tar.gz.
File metadata
- Download URL: agr_curation_api_client-0.7.5.tar.gz
- Upload date:
- Size: 48.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3459e3015165752520f73a3dcf552627b653a465d3b10d85244e1a84649b54bd
|
|
| MD5 |
7cf427ef91ae07208e7ecb9904980b4f
|
|
| BLAKE2b-256 |
c99b5bfaa1d71a70173d517ac4b5875c522b13e02914b129f5c104b18486c04a
|
File details
Details for the file agr_curation_api_client-0.7.5-py3-none-any.whl.
File metadata
- Download URL: agr_curation_api_client-0.7.5-py3-none-any.whl
- Upload date:
- Size: 39.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b957a6767d42c12adec185aaa82a3121ae2ec5e8e074a8a62e64690a10eccbf5
|
|
| MD5 |
8811a998187ed77658b3ed95d48e5866
|
|
| BLAKE2b-256 |
b19ceddd37abbf4ba07101a6c2b213a6b3f3b25020d4a317834c398664378059
|