Python SDK for DATAQUERY Data API - Query, download, and check availability of economic data files
DataQuery SDK
Professional Python SDK for the DataQuery API - High-performance data access with parallel downloads, time series queries, and seamless OAuth 2.0 authentication.
Features
- High-Performance Downloads: Parallel file downloads with automatic retry and progress tracking
- Time Series Queries: Query data by expressions, instruments, or groups with flexible filtering
- OAuth 2.0 Authentication: Automatic token management and refresh
- Connection Pooling: Optimized HTTP connections with configurable rate limiting
- Pandas Integration: Direct conversion to DataFrames for analysis
- Async & Sync APIs: Use async/await or synchronous methods based on your needs
Installation
pip install dataquery-sdk
Quick Start
1. Configure Credentials
Set your API credentials as environment variables:
export DATAQUERY_CLIENT_ID="your_client_id"
export DATAQUERY_CLIENT_SECRET="your_client_secret"
Or create a .env file in your project directory:
DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret
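Before constructing a client, it can help to fail fast when a credential is missing. The sketch below checks the two variable names listed above using only the standard library. It inspects the process environment only and does not parse a .env file; the helper name `missing_credentials` is hypothetical and not part of the SDK.

```python
import os

# Variable names taken from the configuration step above.
REQUIRED_VARS = ("DATAQUERY_CLIENT_ID", "DATAQUERY_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("Both credentials are set")
```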
2. Download Files
Synchronous (Python Scripts)
from dataquery import DataQuery
# Download all files for a date range
with DataQuery() as dq:
    results = dq.run_group_download(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")
Asynchronous (Jupyter Notebooks)
from dataquery import DataQuery
# Download all files for a date range
async with DataQuery() as dq:
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data"
    )
    print(f"Downloaded {results['successful_downloads']} files")
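The `async with` form above relies on the top-level `await` support that Jupyter provides. In a plain Python script the same call has to live inside a coroutine started with `asyncio.run`. The sketch below shows that wrapping pattern with a hypothetical stand-in class, `FakeClient`, so that it runs without credentials or the SDK installed; in practice `DataQuery` would take its place.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for DataQuery; not part of the SDK."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False

    async def run_group_download_async(self, **kwargs):
        # Pretend the download finished; the real SDK returns its own result.
        await asyncio.sleep(0)
        return {"successful_downloads": 0, "request": kwargs}


async def main():
    # Inside a coroutine, `async with` and `await` are valid in any script.
    async with FakeClient() as dq:
        return await dq.run_group_download_async(
            group_id="JPMAQS_GENERIC_RETURNS",
            start_date="20250101",
            end_date="20250131",
            destination_dir="./data",
        )


if __name__ == "__main__":
    results = asyncio.run(main())
    print(f"Downloaded {results['successful_downloads']} files")
```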
3. Query Time Series Data
from dataquery import DataQuery
async with DataQuery() as dq:
    # Query by expression
    result = await dq.get_expressions_time_series_async(
        expressions=["DB(MTE,IRISH EUR 1.100 15-May-2029 LON,,IE00BH3SQ895,MIDPRC)"],
        start_date="20240101",
        end_date="20240131"
    )
    # Convert to pandas DataFrame
    df = dq.to_dataframe(result)
    print(df.head())
4. Discover Available Data
from dataquery import DataQuery
async with DataQuery() as dq:
    # List all available groups
    groups = await dq.list_groups_async(limit=100)
    # Convert to DataFrame for easy viewing
    groups_df = dq.to_dataframe(groups)
    print(groups_df[['group_id', 'group_name', 'description']])
Common Use Cases
Download Single File
from dataquery import DataQuery
from pathlib import Path
async with DataQuery() as dq:
    result = await dq.download_file_async(
        file_group_id="JPMAQS_GENERIC_RETURNS",
        file_datetime="20250115",
        destination_path=Path("./downloads")
    )
    print(f"Downloaded: {result.local_path}")
Query with Filters
async with DataQuery() as dq:
    # Get time series for Irish bonds only
    result = await dq.get_group_time_series_async(
        group_id="FI_GO_BO_EA",
        attributes=["MIDPRC", "REPO_1M"],
        filter="country(IRL)",
        start_date="20240101",
        end_date="20240131"
    )
    df = dq.to_dataframe(result)
Search for Instruments
async with DataQuery() as dq:
    # Search for instruments by keywords
    results = await dq.search_instruments_async(
        group_id="FI_GO_BO_EA",
        keywords="irish"
    )
    # Use the results to query time series
    instrument_ids = [inst.instrument_id for inst in results.instruments[:5]]
    data = await dq.get_instrument_time_series_async(
        instruments=instrument_ids,
        attributes=["MIDPRC"],
        start_date="20240101",
        end_date="20240131"
    )
Performance Optimization
Parallel Downloads
async with DataQuery() as dq:
    # Download multiple files concurrently with parallel chunks
    results = await dq.run_group_download_async(
        group_id="JPMAQS_GENERIC_RETURNS",
        start_date="20250101",
        end_date="20250131",
        destination_dir="./data",
        max_concurrent=5,  # Download 5 files simultaneously
        num_parts=4  # Split each file into 4 parallel chunks
    )
Recommended Settings:
- max_concurrent: 3-5 (concurrent file downloads)
- num_parts: 2-8 (parallel chunks per file)
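To make `num_parts` concrete, the sketch below shows one way a file of known size could be divided into contiguous byte ranges, one per chunk. It illustrates the general technique only; how the SDK actually splits files is not documented here, and `split_byte_ranges` is a hypothetical helper.

```python
def split_byte_ranges(total_size, num_parts):
    """Split [0, total_size) into up to num_parts inclusive (start, end) ranges."""
    if total_size <= 0 or num_parts <= 0:
        raise ValueError("total_size and num_parts must be positive")
    parts = min(num_parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


if __name__ == "__main__":
    for start, end in split_byte_ranges(10, 4):
        print(f"bytes={start}-{end}")
```

Each pair maps naturally onto an HTTP `Range: bytes=start-end` header, which is why the end offsets are inclusive.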
Rate Limiting
Configure rate limits to avoid API throttling:
from dataquery import DataQuery, ClientConfig
config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    rate_limit_rpm=300,  # Requests per minute
    max_retries=3,
    timeout=60.0
)
async with DataQuery(config=config) as dq:
    # Your code here
    pass
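As a rough picture of what a requests-per-minute cap implies, the sketch below implements a sliding-window limiter with an injectable clock. It is a generic illustration and not the SDK's internal limiter; the class name `SlidingWindowLimiter` and its methods are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `rpm` calls in any rolling 60-second window."""

    def __init__(self, rpm, clock=time.monotonic):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.rpm = rpm
        self.clock = clock
        self.calls = deque()  # timestamps of recorded calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) < self.rpm:
            return 0.0
        return 60.0 - (now - self.calls[0])

    def record(self):
        """Register that a call was made now."""
        self.calls.append(self.clock())


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(rpm=300)
    print(limiter.wait_time())
    limiter.record()
```

Passing the clock in as a parameter keeps the limiter testable without real sleeping.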
Configuration
Environment Variables
# Required
DATAQUERY_CLIENT_ID=your_client_id
DATAQUERY_CLIENT_SECRET=your_client_secret
# Optional - API Endpoints
DATAQUERY_BASE_URL=https://api-developer.jpmorgan.com
DATAQUERY_FILES_BASE_URL=https://api-dataquery.jpmchase.com
# Optional - Performance
DATAQUERY_MAX_RETRIES=3
DATAQUERY_TIMEOUT=60
DATAQUERY_RATE_LIMIT_RPM=300
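Environment variables arrive as strings, so numeric settings need converting before use. The sketch below reads the three optional performance variables with fallbacks equal to the example values above. Whether those values are the SDK's built-in defaults is an assumption, and `load_performance_settings` is a hypothetical helper rather than an SDK function.

```python
import os


def load_performance_settings(env=None):
    """Parse the optional performance variables into typed values."""
    env = os.environ if env is None else env
    return {
        # Fallbacks mirror the example values, not confirmed SDK defaults.
        "max_retries": int(env.get("DATAQUERY_MAX_RETRIES", "3")),
        "timeout": float(env.get("DATAQUERY_TIMEOUT", "60")),
        "rate_limit_rpm": int(env.get("DATAQUERY_RATE_LIMIT_RPM", "300")),
    }


if __name__ == "__main__":
    print(load_performance_settings())
```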
Programmatic Configuration
from dataquery import DataQuery, ClientConfig
config = ClientConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    base_url="https://api-developer.jpmorgan.com",
    max_retries=3,
    timeout=60.0,
    rate_limit_rpm=300
)
async with DataQuery(config=config) as dq:
    # Your code here
    pass
Error Handling
from dataquery import DataQuery
from dataquery.exceptions import (
    DataQueryError,
    AuthenticationError,
    NotFoundError,
    RateLimitError
)
async def safe_query():
    try:
        async with DataQuery() as dq:
            result = await dq.get_expressions_time_series_async(
                expressions=["DB(...)"],
                start_date="20240101",
                end_date="20240131"
            )
            return result
    except AuthenticationError as e:
        print(f"Authentication failed: {e}")
    except NotFoundError as e:
        print(f"Resource not found: {e}")
    except RateLimitError as e:
        print(f"Rate limit exceeded: {e}")
    except DataQueryError as e:
        print(f"API error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
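A rate-limit error is often worth retrying after a pause rather than only logging. The sketch below shows a generic exponential-backoff wrapper. It uses a local stand-in exception, `RateLimited`, so that it runs without the SDK, and `retry_async` with all of its parameters is hypothetical. Since the client already exposes a `max_retries` setting, an outer retry like this may be unnecessary in practice.

```python
import asyncio


class RateLimited(Exception):
    """Local stand-in for dataquery.exceptions.RateLimitError."""


async def retry_async(operation, retry_on=(RateLimited,), attempts=3,
                      base_delay=1.0, sleep=asyncio.sleep):
    """Await operation(), retrying on the given exceptions with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            await sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    async def always_ok():
        return "ok"

    print(asyncio.run(retry_async(always_ok)))
```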
Date Formats
Absolute Dates
start_date="20240101" # YYYYMMDD format
end_date="20241231"
Relative Dates
start_date="TODAY" # Today
start_date="TODAY-1D" # Yesterday
start_date="TODAY-1W" # 1 week ago
start_date="TODAY-1M" # 1 month ago
start_date="TODAY-1Y" # 1 year ago
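A malformed date string is cheaper to catch locally than through a failed request. The sketch below validates the two styles shown above: YYYYMMDD calendar dates and the `TODAY` forms. Allowing any positive count before the D/W/M/Y unit is an assumption generalized from the examples, the API may accept other forms not covered here, and `is_valid_date_token` is a hypothetical helper.

```python
import re
from datetime import datetime

# TODAY, optionally followed by -<count><unit>, e.g. TODAY-1W.
RELATIVE_DATE = re.compile(r"TODAY(-\d+[DWMY])?")


def is_valid_date_token(value):
    """Return True for YYYYMMDD dates or the TODAY[-N(D|W|M|Y)] forms."""
    if RELATIVE_DATE.fullmatch(value):
        return True
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")  # rejects impossible dates
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for token in ("20240101", "20240230", "TODAY-1W", "2024-01-01"):
        print(token, is_valid_date_token(token))
```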
Calendar Conventions
| Calendar | Description | Use Case |
|---|---|---|
| CAL_WEEKDAYS | Monday-Friday | International data (recommended) |
| CAL_USBANK | US banking days | US-only data (default) |
| CAL_WEEKDAY_NOHOLIDAY | All weekdays | Generic business days |
| CAL_DEFAULT | Calendar day | Include weekends |
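The calendar choice changes how many observations a date range yields. The sketch below counts calendar days against Monday-Friday weekdays for an inclusive range using only the standard library. It does not model holidays, so it approximates the difference between a weekday calendar and a calendar-day one, not the exact behavior of any `CAL_*` code; `count_days` is a hypothetical helper.

```python
from datetime import date, timedelta


def count_days(start, end):
    """Return (calendar_days, weekdays) for the inclusive range start..end."""
    calendar_days = 0
    weekdays = 0
    current = start
    while current <= end:
        calendar_days += 1
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            weekdays += 1
        current += timedelta(days=1)
    return calendar_days, weekdays


if __name__ == "__main__":
    # January 2024 starts on a Monday.
    print(count_days(date(2024, 1, 1), date(2024, 1, 31)))  # (31, 23)
```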
Examples
The examples/ directory contains comprehensive examples:
- File Downloads: Single file, batch downloads, availability checks
- Time Series: Expressions, instruments, groups with filters
- Discovery: Search instruments, list groups, get attributes
- Advanced: Grid data, auto-download, custom progress tracking
Run an example:
python examples/files/download_file.py
python examples/expressions/get_expressions_time_series.py
CLI Usage
The SDK includes a command-line interface:
# Download files
dataquery download --group-id JPMAQS_GENERIC_RETURNS \
    --start-date 20250101 \
    --end-date 20250131 \
    --destination ./data
# List groups
dataquery list-groups --limit 100
# Check file availability
dataquery check-availability --file-group-id JPMAQS_GENERIC_RETURNS \
    --date 20250115
API Reference
Core Methods
File Downloads
- download_file_async() - Download a single file
- run_group_download_async() - Download all files in a date range
- list_available_files_async() - Check file availability
Time Series Queries
- get_expressions_time_series_async() - Query by expression
- get_instrument_time_series_async() - Query by instrument ID
- get_group_time_series_async() - Query entire group with filters
Discovery
- list_groups_async() - List available data groups
- search_instruments_async() - Search for instruments
- list_instruments_async() - List all instruments in a group
- get_group_attributes_async() - Get available attributes
- get_group_filters_async() - Get available filters
Utilities
- to_dataframe() - Convert any response to pandas DataFrame
- health_check_async() - Check API health
- get_stats() - Get connection and rate limit statistics
For detailed API documentation, see the API Reference.
Requirements
- Python 3.10 or higher
- Dependencies:
  - aiohttp>=3.8.0 - Async HTTP client
  - pydantic>=2.0.0 - Data validation
  - structlog>=23.0.0 - Structured logging
  - python-dotenv>=1.0.0 - Environment variable management
- Optional:
  - pandas>=2.0.0 - For DataFrame conversion
Development
Setup Development Environment
# Clone the repository
git clone https://github.com/dataquery/dataquery-sdk.git
cd dataquery-sdk
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install development dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
Run Tests
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=dataquery --cov-report=html
# Run specific test file
pytest tests/test_client.py -v
Code Quality
# Format code
black dataquery/ tests/
# Check linting
flake8 dataquery/ tests/ examples/
# Type checking
mypy dataquery/
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For issues and questions:
- GitHub Issues: Report a bug
- Documentation: Read the docs
- Email: support@dataquery.com
Changelog
See CHANGELOG.md for version history and release notes.
File details
Details for the file dataquery_sdk-0.1.2.tar.gz.
File metadata
- Download URL: dataquery_sdk-0.1.2.tar.gz
- Upload date:
- Size: 96.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b18163c14a3bb682228c54789b21e7b24daceab20ff68aee41752e14b5f15560 |
| MD5 | 1e3128fd357d8154938e85f53a593b6e |
| BLAKE2b-256 | 92b03c6aea9a0b088e947658b199538ae23ccc3834ddec9cbd6483a2b27e9a9f |
File details
Details for the file dataquery_sdk-0.1.2-py3-none-any.whl.
File metadata
- Download URL: dataquery_sdk-0.1.2-py3-none-any.whl
- Upload date:
- Size: 95.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a99300bdf607d30482142612628211726b94d24fc5e0a80192d4257625db6a12 |
| MD5 | 2755fc15547995935729a64f64f87ca5 |
| BLAKE2b-256 | a80c73fd46836067701f53da1e3801aecfe363a947efe5875474a112778dc333 |
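A manually downloaded distribution file can be checked against the published SHA256 digest. The sketch below computes the digest with the standard library and compares it with an expected value; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage lines is a placeholder for wherever the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <path-to-file> <expected-sha256>
    if len(sys.argv) == 3:
        print(matches_published_hash(sys.argv[1], sys.argv[2]))
```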