Python SDK for DATAQUERY Data API - Query, download, and check availability of economic data files

DataQuery SDK

A Python SDK for the DataQuery API, providing parallel file downloads, time series queries, and OAuth 2.0 authentication with automatic token management.

Python 3.10+ | License: MIT | Code style: black

Features

  • High-Performance Downloads: Parallel file downloads with automatic retry and progress tracking
  • Time Series Queries: Query data by expressions, instruments, or groups with flexible filtering
  • OAuth 2.0 Authentication: Automatic token management and refresh
  • Connection Pooling: Optimized HTTP connections with configurable rate limiting
  • Pandas Integration: Direct conversion to DataFrames for analysis
  • Async & Sync APIs: Use async/await or synchronous methods based on your needs

Installation

pip install dataquery-sdk

Quick Start

1. Configure Credentials

Set your API credentials as environment variables:

export DATAQUERY_CLIENT_ID="your_client_id"
export DATAQUERY_CLIENT_SECRET="your_client_secret"

Or create a .env file in your project directory:

DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret
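
Before creating a client, it can help to confirm that both variables are actually visible to the process. The following is a minimal sketch using only the standard library; the helper name `missing_credentials` is hypothetical and not part of the SDK, and it inspects only the process environment (or a mapping you pass in), not a .env file that has not been loaded yet.

```python
import os

# The two variable names documented above.
REQUIRED_VARS = ("DATAQUERY_CLIENT_ID", "DATAQUERY_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; not part of the SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("Both credentials are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.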

2. Download Files

Synchronous (Python Scripts)

from dataquery import DataQuery

# Download all files for a date range
with DataQuery() as dq:
    results = dq.run_group_download(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")

Asynchronous (Jupyter Notebooks)

from dataquery import DataQuery

# Download all files for a date range
async with DataQuery() as dq:
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")
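
A top-level `async with` works in Jupyter because the notebook supports top-level await; in a plain Python script it is a syntax error unless it sits inside a coroutine. The sketch below shows the usual wrapper with `asyncio.run`. `FakeClient` and its `fetch` method are hypothetical stand-ins so the example runs without credentials; they are not SDK classes.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an async client; not part of the SDK."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions

    async def fetch(self):
        await asyncio.sleep(0)  # placeholder for real network I/O
        return {"status": "ok"}


async def main():
    # Same `async with` shape as the notebook example, inside a coroutine.
    async with FakeClient() as client:
        return await client.fetch()


def run():
    """Synchronous entry point suitable for a script."""
    return asyncio.run(main())
```

In a script, replacing `FakeClient()` with the real client and calling `run()` from an `if __name__ == "__main__":` block gives the same flow as the notebook cell.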

3. Query Time Series Data

from dataquery import DataQuery

async with DataQuery() as dq:
    # Query by expression
    result = await dq.get_expressions_time_series_async(
        expressions=["DB(MTE,IRISH EUR 1.100 15-May-2029 LON,,IE00BH3SQ895,MIDPRC)"],
        start_date="20240101",
        end_date="20240131"
    )
    
    # Convert to pandas DataFrame
    df = dq.to_dataframe(result)
    print(df.head())

4. Discover Available Data

from dataquery import DataQuery

async with DataQuery() as dq:
    # List all available groups
    groups = await dq.list_groups_async(limit=100)
    
    # Convert to DataFrame for easy viewing
    groups_df = dq.to_dataframe(groups)
    print(groups_df[['group_id', 'group_name', 'description']])

Common Use Cases

Download Single File

from dataquery import DataQuery
from pathlib import Path

async with DataQuery() as dq:
    result = await dq.download_file_async(
        file_group_id="JPMAQS_GENERIC_RETURNS",
        file_datetime="20250115",
        destination_path=Path("./downloads")
    )
    print(f"Downloaded: {result.local_path}")

Query with Filters

from dataquery import DataQuery

async with DataQuery() as dq:
    # Get time series for Irish bonds only
    result = await dq.get_group_time_series_async(
        group_id="FI_GO_BO_EA",
        attributes=["MIDPRC", "REPO_1M"],
        filter="country(IRL)",
        start_date="20240101",
        end_date="20240131"
    )
    
    df = dq.to_dataframe(result)

Search for Instruments

from dataquery import DataQuery

async with DataQuery() as dq:
    # Search for instruments by keywords
    results = await dq.search_instruments_async(
        group_id="FI_GO_BO_EA",
        keywords="irish"
    )
    
    # Use the results to query time series
    instrument_ids = [inst.instrument_id for inst in results.instruments[:5]]
    data = await dq.get_instrument_time_series_async(
        instruments=instrument_ids,
        attributes=["MIDPRC"],
        start_date="20240101",
        end_date="20240131"
    )

Performance Optimization

Parallel Downloads

from dataquery import DataQuery

async with DataQuery() as dq:
    # Download multiple files concurrently with parallel chunks
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data",
        max_concurrent=5,  # Download 5 files simultaneously
        num_parts=4        # Split each file into 4 parallel chunks
    )
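
To illustrate what a cap such as `max_concurrent` means in practice, here is a generic sketch of bounded concurrency with `asyncio.Semaphore`. It is not the SDK's implementation; `run_bounded`, `demo`, and `fake_download` are hypothetical names, and the sleep stands in for network I/O.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent):
    """Run worker(item) for every item, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo(max_concurrent=5, total=12):
    """Run fake downloads and report the peak number running at once."""
    active = 0
    peak = 0

    async def fake_download(n):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for network I/O
        active -= 1
        return n

    results = await run_bounded(range(total), fake_download, max_concurrent)
    return results, peak
```

Calling `asyncio.run(demo())` returns the results together with the peak concurrency, which never exceeds the cap.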

Recommended Settings:

  • max_concurrent: 3-5 (concurrent file downloads)
  • num_parts: 2-8 (parallel chunks per file)
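
Splitting a file into `num_parts` chunks amounts to dividing its byte length into contiguous ranges. The sketch below shows one way to compute such ranges and format them as HTTP `Range` header values. Whether the SDK uses range requests internally is an assumption, and `split_ranges` and `range_header` are hypothetical helpers.

```python
def split_ranges(total_size, num_parts):
    """Split [0, total_size) into up to num_parts inclusive (start, end) ranges."""
    if total_size <= 0 or num_parts <= 0:
        return []
    num_parts = min(num_parts, total_size)  # never create empty parts
    base, extra = divmod(total_size, num_parts)
    ranges = []
    start = 0
    for i in range(num_parts):
        # The first `extra` parts get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start, end):
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

Very small files yield fewer ranges than requested, which is one reason a higher `num_parts` does not always help.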

Rate Limiting

Configure rate limits to avoid API throttling:

from dataquery import DataQuery, ClientConfig

config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    rate_limit_rpm=300,  # Requests per minute
    max_retries=3,
    timeout=60.0
)

async with DataQuery(config=config) as dq:
    # Your code here
    pass
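
A requests-per-minute limit translates into a minimum spacing of 60 / rpm seconds between calls (0.2 s at 300 rpm). The class below is a generic sketch of that idea, not the SDK's limiter; `MinIntervalLimiter` is a hypothetical name, and the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 60 / rpm seconds apart (illustrative only)."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        if self._next_allowed is None or now >= self._next_allowed:
            delay = 0.0
            start = now
        else:
            delay = self._next_allowed - now
            self._sleep(delay)
            start = self._next_allowed
        self._next_allowed = start + self.interval
        return delay
```

Calling `wait()` before each request keeps a single-threaded caller under the configured rate.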

Configuration

Environment Variables

# Required
DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret

# Optional - API Endpoints
DATAQUERY_BASE_URL=https://api-developer.jpmorgan.com
DATAQUERY_FILES_BASE_URL=https://api-dataquery.jpmchase.com

# Optional - Performance
DATAQUERY_MAX_RETRIES=3
DATAQUERY_TIMEOUT=60
DATAQUERY_RATE_LIMIT_RPM=300
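
Environment variables always arrive as strings, so numeric settings need parsing and a fallback. The sketch below reads the three performance variables listed above. The fallback values mirror the sample values shown here; whether the SDK uses the same defaults internally is an assumption, and `read_performance_settings` is a hypothetical helper.

```python
import os

# Fallbacks mirror the sample values listed above (assumed, not confirmed
# to be the SDK's internal defaults).
FALLBACKS = {
    "DATAQUERY_MAX_RETRIES": 3,
    "DATAQUERY_TIMEOUT": 60.0,
    "DATAQUERY_RATE_LIMIT_RPM": 300,
}


def read_performance_settings(env=None):
    """Parse the optional performance variables, using fallbacks when unset."""
    env = os.environ if env is None else env
    settings = {}
    for name, fallback in FALLBACKS.items():
        raw = env.get(name)
        if raw is None or not raw.strip():
            settings[name] = fallback
            continue
        try:
            # Convert to the fallback's type (int or float).
            settings[name] = type(fallback)(raw)
        except ValueError:
            raise ValueError(f"{name} must be a number, got {raw!r}") from None
    return settings
```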

Programmatic Configuration

from dataquery import DataQuery, ClientConfig

config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    base_url="https://api-developer.jpmorgan.com",
    max_retries=3,
    timeout=60.0,
    rate_limit_rpm=300
)

async with DataQuery(config=config) as dq:
    # Your code here
    pass

Error Handling

from dataquery import DataQuery
from dataquery.exceptions import (
    DataQueryError,
    AuthenticationError,
    NotFoundError,
    RateLimitError
)

async def safe_query():
    try:
        async with DataQuery() as dq:
            result = await dq.get_expressions_time_series_async(
                expressions=["DB(...)"],
                start_date="20240101",
                end_date="20240131"
            )
            return result
    except AuthenticationError as e:
        print(f"Authentication failed: {e}")
    except NotFoundError as e:
        print(f"Resource not found: {e}")
    except RateLimitError as e:
        print(f"Rate limit exceeded: {e}")
    except DataQueryError as e:
        print(f"API error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
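
Printing the error is enough for a demo, but rate-limit failures are usually worth retrying after a pause. The sketch below shows application-level exponential backoff around any coroutine. `RetryableError` is a hypothetical stand-in for a transient exception such as the SDK's `RateLimitError`, and `call_with_backoff` is not an SDK function; the SDK's own `max_retries` setting may already cover some of this.

```python
import asyncio
import random


class RetryableError(Exception):
    """Hypothetical stand-in for a transient failure such as a rate limit."""


async def call_with_backoff(operation, max_attempts=4, base_delay=1.0,
                            sleep=asyncio.sleep):
    """Await operation(); on RetryableError, wait and retry with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return await operation()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Up to 10% jitter so parallel clients do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults, the waits are roughly 1, 2, and 4 seconds before the fourth and final attempt.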

Date Formats

Absolute Dates

start_date="20240101"  # YYYYMMDD format
end_date="20241231"

Relative Dates

start_date="TODAY"      # Today
start_date="TODAY-1D"   # Yesterday
start_date="TODAY-1W"   # 1 week ago
start_date="TODAY-1M"   # 1 month ago
start_date="TODAY-1Y"   # 1 year ago
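
A client-side check can catch malformed dates before a request is sent. The sketch below recognizes only the two forms shown above: an eight-digit YYYYMMDD date, and TODAY optionally followed by a minus sign, a count, and one of D, W, M, or Y. The pattern is inferred from these examples, so the API may accept forms it rejects; `classify_date` is a hypothetical helper.

```python
import re
from datetime import datetime

# Inferred from the examples above: TODAY, optionally minus N units.
_RELATIVE = re.compile(r"TODAY(-\d+[DWMY])?")


def classify_date(value):
    """Return 'absolute' or 'relative', or raise ValueError if neither fits."""
    if _RELATIVE.fullmatch(value):
        return "relative"
    if len(value) == 8 and value.isascii() and value.isdigit():
        try:
            # Rejects impossible dates such as month 13.
            datetime.strptime(value, "%Y%m%d")
        except ValueError:
            pass
        else:
            return "absolute"
    raise ValueError(f"Unrecognized date: {value!r}")
```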

Calendar Conventions

Calendar              | Description     | Use Case
CAL_WEEKDAYS          | Monday-Friday   | International data (recommended)
CAL_USBANK            | US banking days | US-only data (default)
CAL_WEEKDAY_NOHOLIDAY | All weekdays    | Generic business days
CAL_DEFAULT           | Calendar day    | Include weekends

Examples

The examples/ directory contains comprehensive examples:

  • File Downloads: Single file, batch downloads, availability checks
  • Time Series: Expressions, instruments, groups with filters
  • Discovery: Search instruments, list groups, get attributes
  • Advanced: Grid data, auto-download, custom progress tracking

Run an example:

python examples/files/download_file.py
python examples/expressions/get_expressions_time_series.py

CLI Usage

The SDK includes a command-line interface:

# Download files
dataquery download --group-id JPMAQS_GENERIC_RETURNS \
                   --start-date 20250101 \
                   --end-date 20250131 \
                   --destination ./data

# List groups
dataquery list-groups --limit 100

# Check file availability
dataquery check-availability --file-group-id JPMAQS_GENERIC_RETURNS \
                             --date 20250115

API Reference

Core Methods

File Downloads

  • download_file_async() - Download a single file
  • run_group_download_async() - Download all files in a group for a date range
  • list_available_files_async() - Check file availability

Time Series Queries

  • get_expressions_time_series_async() - Query by expression
  • get_instrument_time_series_async() - Query by instrument ID
  • get_group_time_series_async() - Query entire group with filters

Discovery

  • list_groups_async() - List available data groups
  • search_instruments_async() - Search for instruments
  • list_instruments_async() - List all instruments in a group
  • get_group_attributes_async() - Get available attributes
  • get_group_filters_async() - Get available filters

Utilities

  • to_dataframe() - Convert any response to a pandas DataFrame
  • health_check_async() - Check API health
  • get_stats() - Get connection and rate limit statistics

For detailed API documentation, see the API Reference.

Requirements

  • Python 3.10 or higher
  • Dependencies:
    • aiohttp>=3.8.0 - Async HTTP client
    • pydantic>=2.0.0 - Data validation
    • structlog>=23.0.0 - Structured logging
    • python-dotenv>=1.0.0 - Environment variable management

Optional:

  • pandas>=2.0.0 - For DataFrame conversion

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/dataquery/dataquery-sdk.git
cd dataquery-sdk

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Run Tests

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=dataquery --cov-report=html

# Run specific test file
pytest tests/test_client.py -v

Code Quality

# Format code
black dataquery/ tests/

# Check linting
flake8 dataquery/ tests/ examples/

# Type checking
mypy dataquery/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

Changelog

See CHANGELOG.md for version history and release notes.
