Efficiently query large Upstox instruments JSON files using SQLite
Project description
Upstox Instrument Query
A Python package to efficiently query large Upstox instruments JSON files (~60MB) using SQLite for optimal performance.
Features
- Memory Efficient: Streams JSON parsing for minimal memory footprint
- High Performance: Uses SQLite with optimized indexes
- Flexible: Query by instrument key, exchange, instrument type, or custom criteria
- Caching: Implements LRU caching for frequently accessed queries
- CLI Support: Command-line tools for database initialization and updates
- URL Support: Direct loading from Upstox API URLs with gzip handling
Installation
pip install upstox-instrument-query
Usage
Initialize the Database
From a local JSON file:
upstox-query init /path/to/instruments.json /path/to/database.db
From the Upstox URL:
upstox-query init https://assets.upstox.com/market-quote/instruments/exchange/complete.json.gz /path/to/database.db --url
Query the Data
from upstox_instrument_query import InstrumentQuery
# Initialize query interface
query = InstrumentQuery('/path/to/database.db')
# Get instrument by key
instrument = query.get_by_instrument_key('NSE_EQ|INE002A01018')
print(instrument)
# Filter by exchange
nse_instruments = query.filter_by_exchange('NSE')
print(f"Found {len(nse_instruments)} NSE instruments")
# Filter by instrument type
equity_instruments = query.filter_by_instrument_type('EQUITY')
print(f"Found {len(equity_instruments)} EQUITY instruments")
# Search by name (regex)
reliance_instruments = query.search_by_name('RELIANCE')
print(f"Found {len(reliance_instruments)} RELIANCE instruments:")
for instr in reliance_instruments[:3]: # Print first 3
print(f"- {instr['trading_symbol']} ({instr['instrument_type']})")
# Custom query
futures = query.custom_query('instrument_type = ? AND expiry > ?', ('FUTURES', '2025-01-01'))
print(f"Found {len(futures)} futures expiring after 2025-01-01")
CLI Commands
The package provides a command-line interface for common operations:
# Initialize database from file
upstox-query init /path/to/instruments.json /path/to/database.db
# Initialize database from URL
upstox-query init https://assets.upstox.com/market-quote/instruments/exchange/complete.json.gz /path/to/database.db --url
# Update existing database from file
upstox-query update /path/to/instruments.json /path/to/database.db
# Update existing database from URL
upstox-query update https://assets.upstox.com/market-quote/instruments/exchange/complete.json.gz /path/to/database.db --url
Advanced Usage
Case-Sensitive Name Searching
# Default is case-insensitive
case_insensitive = query.search_by_name('reliance')
# Enable case sensitivity
case_sensitive = query.search_by_name('RELIANCE', case_sensitive=True)
Complex Custom Queries
# Find all NSE futures expiring in 2025 with a lot size greater than 500
complex_query = query.custom_query(
'exchange = ? AND instrument_type = ? AND expiry LIKE ? AND lot_size > ?',
('NSE', 'FUTURES', '2025-%', 500)
)
Performance Notes
- SQLite Storage: Uses disk-based SQLite database, minimizing memory usage
- Optimized Indexes: Includes indexes on
instrument_key,exchange,instrument_type, andnamefor fast queries - Result Caching: Caches results for repeated queries using
lru_cache - Streaming Parser: Streams JSON parsing from both local files and URLs to handle large files efficiently
- Compression Support: Handles gzip-compressed JSON from URLs for direct processing
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
-
Clone the repository:
git clone https://github.com/jinto-ag/upstox_instrument_query.git cd upstox_instrument_query
-
Install development dependencies:
pip install -e ".[dev]" # or pip install -e . pytest pytest-cov pre-commit
-
Install pre-commit hooks:
pre-commit install --hook-type pre-commit --hook-type commit-msg
Running Tests
Run the test suite using pytest:
pytest
To generate a coverage report:
pytest --cov=upstox_instrument_query --cov-report=html
The coverage report will be available in the htmlcov directory. Open htmlcov/index.html in your browser to view it.
Test-Driven Development (TDD)
This project follows Test-Driven Development practices for all new features. When contributing or adding new functionality, please follow these TDD principles:
- Write tests first: Before implementing any new feature, write tests that define the expected behavior
- Run the tests: Verify that the tests fail (since the feature doesn't exist yet)
- Implement the feature: Write the minimal code needed to make the tests pass
- Refactor: Clean up the code while ensuring tests continue to pass
- Repeat: Continue this cycle for each new piece of functionality
Example TDD workflow:
# 1. Create a test file for the new feature
touch tests/test_new_feature.py
# 2. Write tests for the expected behavior
# Edit tests/test_new_feature.py with test cases
# 3. Run tests to verify they fail appropriately
pytest tests/test_new_feature.py -v
# 4. Implement the feature
# Edit the relevant files in upstox_instrument_query/
# 5. Run tests again to see if they pass
pytest tests/test_new_feature.py -v
# 6. Run coverage to ensure proper test coverage
pytest --cov=upstox_instrument_query --cov-report=term-missing
Always aim to maintain high test coverage (90%+) when adding new features. The project CI pipeline will automatically check test coverage on pull requests.
Conventional Commits
This project follows the Conventional Commits specification for commit messages. This leads to more readable messages that are easy to follow when looking through the project history.
Commit messages should be structured as follows:
<type>[optional scope]: <description>
[optional body]
[optional footer(s)]
Types include:
- feat: A new feature
- fix: A bug fix
- docs: Documentation only changes
- style: Changes that do not affect the meaning of the code
- refactor: A code change that neither fixes a bug nor adds a feature
- perf: A code change that improves performance
- test: Adding missing tests or correcting existing tests
- chore: Changes to the build process or auxiliary tools
Examples:
feat: add option chain retrieval functionality
fix: correct database connection leak
docs: update README with new API methods
The pre-commit hooks enforce this convention when committing changes.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file upstox_instrument_query-0.1.0.tar.gz.
File metadata
- Download URL: upstox_instrument_query-0.1.0.tar.gz
- Upload date:
- Size: 24.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
714ec0e4ea3c0b1bee79f0631b7386c1749ec8885ff8d951c8d2d1e3d1566b0c
|
|
| MD5 |
f06c1fd944e15e100822c518b28dbdbf
|
|
| BLAKE2b-256 |
69f9b8609dcd71acdaae92d3814753f9d6b4817643902d5f1225fe6c347e06dc
|
File details
Details for the file upstox_instrument_query-0.1.0-py3-none-any.whl.
File metadata
- Download URL: upstox_instrument_query-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b107f1d3da9d9603826333015d791a0c37a501d01cef50ba65ee87e080e1f198
|
|
| MD5 |
e80eb1e202f69f5974c2317e067f60c9
|
|
| BLAKE2b-256 |
a1f8872af869b5425cae1ad0afa523ea346a02f23efb07bcdeb98dc7cd65d905
|