Python SDK for DATAQUERY Data API - Query, download, and check availability of economic data files

DataQuery SDK

Professional Python SDK for the DataQuery API - High-performance data access with parallel downloads, time series queries, and seamless OAuth 2.0 authentication.

Python 3.10+ | License: MIT | Code style: black

Features

  • High-Performance Downloads: Parallel file downloads with automatic retry and progress tracking
  • Time Series Queries: Query data by expressions, instruments, or groups with flexible filtering
  • OAuth 2.0 Authentication: Automatic token management and refresh
  • Connection Pooling: Optimized HTTP connections with configurable rate limiting
  • Pandas Integration: Direct conversion to DataFrames for analysis
  • Async & Sync APIs: Use async/await or synchronous methods based on your needs

Installation

pip install dataquery-sdk

Quick Start

1. Configure Credentials

Set your API credentials as environment variables:

export DATAQUERY_CLIENT_ID="your_client_id"
export DATAQUERY_CLIENT_SECRET="your_client_secret"

Or create a .env file in your project directory:

DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret
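
Before constructing a client, it can help to confirm that both variables are actually visible to the process. The following is a minimal sketch using only the standard library; `missing_credentials` is a hypothetical helper, not part of the SDK. Note that values in a .env file only appear in `os.environ` once something has loaded the file.

```python
import os

REQUIRED_VARS = ("DATAQUERY_CLIENT_ID", "DATAQUERY_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; not part of the SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("Both credentials are set.")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests.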

2. Download Files

Synchronous (Python Scripts)

from dataquery import DataQuery

# Download all files for a date range
with DataQuery() as dq:
    results = dq.run_group_download(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")

Asynchronous (Jupyter Notebooks)

from dataquery import DataQuery

# Download all files for a date range
async with DataQuery() as dq:
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")
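
Top-level `async with` works in Jupyter because the notebook already runs an event loop. In a plain Python script the same code has to live inside a coroutine that is started with `asyncio.run`. The sketch below shows that pattern with a stand-in class: `FakeClient` and its `fetch` method are hypothetical and only mimic an async context manager. In real code the `async with DataQuery() as dq:` block would take the place of the stand-in.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an async client; not the SDK's class."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    async def fetch(self):
        await asyncio.sleep(0)  # placeholder for real network I/O
        return {"successful_downloads": 0}


async def main():
    # In a script, 'async with' is only valid inside a coroutine.
    async with FakeClient() as client:
        return await client.fetch()


if __name__ == "__main__":
    results = asyncio.run(main())  # creates and closes the event loop
    print(f"Downloaded {results['successful_downloads']} files")
```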

3. Query Time Series Data

from dataquery import DataQuery

async with DataQuery() as dq:
    # Query by expression
    result = await dq.get_expressions_time_series_async(
        expressions=["DB(MTE,IRISH EUR 1.100 15-May-2029 LON,,IE00BH3SQ895,MIDPRC)"],
        start_date="20240101",
        end_date="20240131"
    )
    
    # Convert to pandas DataFrame
    df = dq.to_dataframe(result)
    print(df.head())

4. Discover Available Data

from dataquery import DataQuery

async with DataQuery() as dq:
    # List all available groups
    groups = await dq.list_groups_async(limit=100)
    
    # Convert to DataFrame for easy viewing
    groups_df = dq.to_dataframe(groups)
    print(groups_df[['group_id', 'group_name', 'description']])

Common Use Cases

Download Single File

from dataquery import DataQuery
from pathlib import Path

async with DataQuery() as dq:
    result = await dq.download_file_async(
        file_group_id="JPMAQS_GENERIC_RETURNS",
        file_datetime="20250115",
        destination_path=Path("./downloads")
    )
    print(f"Downloaded: {result.local_path}")

Query with Filters

from dataquery import DataQuery

async with DataQuery() as dq:
    # Get time series for Ireland bonds only
    result = await dq.get_group_time_series_async(
        group_id="FI_GO_BO_EA",
        attributes=["MIDPRC", "REPO_1M"],
        filter="country(IRL)",
        start_date="20240101",
        end_date="20240131"
    )
    
    df = dq.to_dataframe(result)

Search for Instruments

from dataquery import DataQuery

async with DataQuery() as dq:
    # Search for instruments by keywords
    results = await dq.search_instruments_async(
        group_id="FI_GO_BO_EA",
        keywords="irish"
    )
    
    # Use the results to query time series
    instrument_ids = [inst.instrument_id for inst in results.instruments[:5]]
    data = await dq.get_instrument_time_series_async(
        instruments=instrument_ids,
        attributes=["MIDPRC"],
        start_date="20240101",
        end_date="20240131"
    )

Performance Optimization

Parallel Downloads

from dataquery import DataQuery

async with DataQuery() as dq:
    # Download multiple files concurrently with parallel chunks
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data",
        max_concurrent=5,  # Download 5 files simultaneously
        num_parts=4        # Split each file into 4 parallel chunks
    )

Recommended Settings:

  • max_concurrent: 3-5 (concurrent file downloads)
  • num_parts: 2-8 (parallel chunks per file)
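
To build intuition for how the two settings interact, the sketch below splits a file size into contiguous byte ranges and estimates the peak number of simultaneous requests. It is an illustration only: `split_ranges` and `peak_requests` are hypothetical helpers, and how the SDK actually chunks files or whether each chunk uses its own request is an assumption.

```python
def split_ranges(size, num_parts):
    """Split `size` bytes into contiguous, inclusive (start, end) ranges."""
    if size <= 0 or num_parts <= 0:
        raise ValueError("size and num_parts must be positive")
    num_parts = min(num_parts, size)  # never create empty ranges
    base, extra = divmod(size, num_parts)
    ranges, start = [], 0
    for i in range(num_parts):
        # The first `extra` parts absorb the remainder, one byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def peak_requests(max_concurrent, num_parts):
    """Upper bound on in-flight requests, assuming one request per chunk."""
    return max_concurrent * num_parts


print(split_ranges(10, 4))   # [(0, 2), (3, 5), (6, 7), (8, 9)]
print(peak_requests(5, 4))   # 20
```

Under that assumption, raising both settings together multiplies the load on the API, which is one reason to stay within the recommended ranges.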

Rate Limiting

Configure rate limits to avoid API throttling:

from dataquery import DataQuery, ClientConfig

config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    rate_limit_rpm=300,  # Requests per minute
    max_retries=3,
    timeout=60.0
)

async with DataQuery(config=config) as dq:
    # Your code here
    pass
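
As a rough guide to what a requests-per-minute limit means, 300 requests per minute corresponds to one request every 0.2 seconds when requests are spaced evenly. The sketch below works that out and shows a simple evenly spaced throttle. `SimpleThrottle` is hypothetical; the SDK's own limiter may use a different strategy (for example, allowing bursts).

```python
import time


def min_interval_seconds(rate_limit_rpm):
    """Seconds between requests if they are spaced evenly."""
    if rate_limit_rpm <= 0:
        raise ValueError("rate_limit_rpm must be positive")
    return 60.0 / rate_limit_rpm


class SimpleThrottle:
    """Hypothetical evenly spaced throttle; not the SDK's implementation."""

    def __init__(self, rate_limit_rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(rate_limit_rpm)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next slot is free and return the seconds waited."""
        now = self._clock()
        if self._next_allowed is None or now >= self._next_allowed:
            waited = 0.0
        else:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return waited


print(min_interval_seconds(300))  # 0.2
```

The clock and sleep functions are injectable so the throttle can be tested without real delays.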

Configuration

Environment Variables

# Required
DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret

# Optional - API Endpoints
DATAQUERY_BASE_URL=https://api-developer.jpmorgan.com
DATAQUERY_FILES_BASE_URL=https://api-strm-gw01.jpmchase.com

# Optional - Performance
DATAQUERY_MAX_RETRIES=3
DATAQUERY_TIMEOUT=60
DATAQUERY_RATE_LIMIT_RPM=300
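
Environment variables are always strings, so numeric settings need to be converted before use. The sketch below shows one way to read the optional performance variables with type conversion and fallbacks. `read_performance_settings` is a hypothetical helper, and the fallback values simply mirror the example values above; whether the SDK uses the same defaults internally is an assumption.

```python
import os

# Fallbacks mirror the example values shown above (an assumption, not
# confirmed SDK defaults). Each entry is (converter, fallback).
PERFORMANCE_VARS = {
    "DATAQUERY_MAX_RETRIES": (int, 3),
    "DATAQUERY_TIMEOUT": (float, 60.0),
    "DATAQUERY_RATE_LIMIT_RPM": (int, 300),
}


def read_performance_settings(env=None):
    """Return the optional settings as typed values, using fallbacks if unset."""
    env = os.environ if env is None else env
    settings = {}
    for name, (convert, fallback) in PERFORMANCE_VARS.items():
        raw = env.get(name)
        if raw is None or raw == "":
            settings[name] = fallback
            continue
        try:
            settings[name] = convert(raw)
        except ValueError:
            raise ValueError(
                f"{name} must be a valid {convert.__name__}, got {raw!r}"
            ) from None
    return settings


print(read_performance_settings({"DATAQUERY_TIMEOUT": "30"}))
```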

Programmatic Configuration

from dataquery import DataQuery, ClientConfig

config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    base_url="https://api-developer.jpmorgan.com",
    max_retries=3,
    timeout=60.0,
    rate_limit_rpm=300
)

async with DataQuery(config=config) as dq:
    # Your code here
    pass

Error Handling

from dataquery import DataQuery
from dataquery.exceptions import (
    DataQueryError,
    AuthenticationError,
    NotFoundError,
    RateLimitError
)

async def safe_query():
    try:
        async with DataQuery() as dq:
            result = await dq.get_expressions_time_series_async(
                expressions=["DB(...)"],
                start_date="20240101",
                end_date="20240131"
            )
            return result
    except AuthenticationError as e:
        print(f"Authentication failed: {e}")
    except NotFoundError as e:
        print(f"Resource not found: {e}")
    except RateLimitError as e:
        print(f"Rate limit exceeded: {e}")
    except DataQueryError as e:
        print(f"API error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
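
Printing a rate-limit error is often not enough; callers may want to wait and try again. The sketch below shows a generic retry loop with exponential backoff. `RateLimited` is a local stand-in so the example runs without the SDK; real code would catch `RateLimitError` from `dataquery.exceptions` instead. `call_with_backoff` is hypothetical, and since the client already accepts a `max_retries` setting, a wrapper like this would only matter for retries at the caller's level.

```python
import time


class RateLimited(Exception):
    """Stand-in for a rate-limit error (hypothetical, for illustration)."""


def call_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimited, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: let the caller handle it
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky():
        # Fails twice, then succeeds, to exercise the retry path.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RateLimited("slow down")
        return "ok"

    print(call_with_backoff(flaky, sleep=lambda seconds: None))
```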

Date Formats

Absolute Dates

start_date="20240101"  # YYYYMMDD format
end_date="20241231"
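
When dates come from `datetime` objects rather than literals, they need to be formatted as YYYYMMDD strings. A minimal sketch with the standard library follows; `to_api_date` and `parse_api_date` are hypothetical helper names, not SDK functions.

```python
from datetime import date, datetime


def to_api_date(value):
    """Format a date as a YYYYMMDD string."""
    return value.strftime("%Y%m%d")


def parse_api_date(text):
    """Parse a YYYYMMDD string into a date, rejecting other formats."""
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected YYYYMMDD, got {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


print(to_api_date(date(2024, 1, 1)))   # 20240101
print(parse_api_date("20241231"))      # 2024-12-31
```

Validating the string locally catches mistakes such as `2024-01-01` before a request is sent.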

Relative Dates

start_date="TODAY"      # Today
start_date="TODAY-1D"   # Yesterday
start_date="TODAY-1W"   # 1 week ago
start_date="TODAY-1M"   # 1 month ago
start_date="TODAY-1Y"   # 1 year ago
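
To preview which calendar date a relative expression roughly refers to, the sketch below resolves the forms shown above on the client side. It is only an approximation: `resolve_relative` is a hypothetical helper, clamping to the end of the month for `M` and `Y` offsets is an assumption, and how the API itself resolves these expressions (including any calendar or time-zone handling) may differ.

```python
import calendar
import re
from datetime import date, timedelta

_RELATIVE = re.compile(r"^TODAY(?:-(\d+)([DWMY]))?$")


def resolve_relative(expr, today):
    """Resolve 'TODAY' or 'TODAY-<n><D|W|M|Y>' relative to `today`."""
    match = _RELATIVE.match(expr)
    if match is None:
        raise ValueError(f"unsupported relative date: {expr!r}")
    if match.group(1) is None:
        return today
    count, unit = int(match.group(1)), match.group(2)
    if unit == "D":
        return today - timedelta(days=count)
    if unit == "W":
        return today - timedelta(weeks=count)
    # Months and years: step back by whole months, then clamp the day
    # to the last day of the target month (assumed behaviour).
    months = count if unit == "M" else 12 * count
    index = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(today.day, last_day))


print(resolve_relative("TODAY-1M", date(2024, 3, 31)))  # 2024-02-29
```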

Calendar Conventions

Calendar               Description      Use Case
CAL_WEEKDAYS           Monday-Friday    International data (recommended)
CAL_USBANK             US banking days  US-only data (default)
CAL_WEEKDAY_NOHOLIDAY  All weekdays     Generic business days
CAL_DEFAULT            Calendar day     Include weekends
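
The choice of calendar determines how many observations to expect for a date range. As a rough illustration of the Monday-Friday description, the sketch below counts weekdays in January 2025. `weekdays_between` is a hypothetical helper; it ignores holidays and does not reproduce the API's own calendar logic.

```python
from datetime import date, timedelta


def weekdays_between(start, end):
    """Return all Monday-Friday dates from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(current)
        current += timedelta(days=1)
    return days


january = weekdays_between(date(2025, 1, 1), date(2025, 1, 31))
print(len(january))  # 23 weekdays out of 31 calendar days
```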

Examples

The examples/ directory contains example scripts covering:

  • File Downloads: Single file, batch downloads, availability checks
  • Time Series: Expressions, instruments, groups with filters
  • Discovery: Search instruments, list groups, get attributes
  • Advanced: Grid data, auto-download, custom progress tracking

Run an example:

python examples/files/download_file.py
python examples/expressions/get_expressions_time_series.py

CLI Usage

The SDK includes a command-line interface:

# Download files
dataquery download --group-id JPMAQS_GENERIC_RETURNS \
                   --start-date 20250101 \
                   --end-date 20250131 \
                   --destination ./data

# List groups
dataquery list-groups --limit 100

# Check file availability
dataquery check-availability --file-group-id JPMAQS_GENERIC_RETURNS \
                             --date 20250115
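
When the CLI is driven from another Python program, building the argument list explicitly avoids shell-quoting problems. The sketch below assembles the download command using only the flags shown above. `build_download_command` is a hypothetical helper, and the example only prints the command rather than running it.

```python
import shlex


def build_download_command(group_id, start_date, end_date, destination):
    """Build the argv list for the 'dataquery download' command shown above."""
    return [
        "dataquery", "download",
        "--group-id", group_id,
        "--start-date", start_date,
        "--end-date", end_date,
        "--destination", destination,
    ]


cmd = build_download_command(
    "JPMAQS_GENERIC_RETURNS", "20250101", "20250131", "./data"
)
print(shlex.join(cmd))  # shell-safe rendering of the argument list
# To execute it: subprocess.run(cmd, check=True)
```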

API Reference

Core Methods

File Downloads

  • download_file_async() - Download a single file
  • run_group_download_async() - Download all files in a date range
  • list_available_files_async() - Check file availability

Time Series Queries

  • get_expressions_time_series_async() - Query by expression
  • get_instrument_time_series_async() - Query by instrument ID
  • get_group_time_series_async() - Query entire group with filters

Discovery

  • list_groups_async() - List available data groups
  • search_instruments_async() - Search for instruments
  • list_instruments_async() - List all instruments in a group
  • get_group_attributes_async() - Get available attributes
  • get_group_filters_async() - Get available filters

Utilities

  • to_dataframe() - Convert any response to pandas DataFrame
  • health_check_async() - Check API health
  • get_stats() - Get connection and rate limit statistics

For detailed API documentation, see the API Reference.

Requirements

  • Python 3.10 or higher
  • Dependencies:
    • aiohttp>=3.8.0 - Async HTTP client
    • pydantic>=2.0.0 - Data validation
    • structlog>=23.0.0 - Structured logging
    • python-dotenv>=1.0.0 - Environment variable management

Optional:

  • pandas>=2.0.0 - For DataFrame conversion

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/dataquery/dataquery-sdk.git
cd dataquery-sdk

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Run Tests

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=dataquery --cov-report=html

# Run specific test file
pytest tests/test_client.py -v

Code Quality

# Format code
black dataquery/ tests/

# Check linting
flake8 dataquery/ tests/ examples/

# Type checking
mypy dataquery/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

Changelog

See CHANGELOG.md for version history and release notes.
