Data aggregation for forex market data

FOREX DATA

View Full Documentation | Quick Start | Examples

The forex_data package offers ways to aggregate data from the Forex market into a dataframe holding the essential OHLC information, so the output will always have the columns:

  • timestamp
  • open
  • high
  • low
  • close

The purpose is to aggregate OHLC data from multiple sources, optimize data caching, and provide an interface for easy access to the data. The main output of the interface is a dataframe.
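To make the idea of OHLC aggregation concrete, here is a minimal sketch in plain Python. It is independent of the package internals, which are not shown here: the function name `aggregate_ohlc`, the tuple layout, and the sample values are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical bar layout for this sketch: (timestamp, open, high, low, close)
def aggregate_ohlc(bars, bucket):
    """Merge time-sorted bars into coarser bars of width `bucket` (a timedelta)."""
    epoch = datetime(1970, 1, 1)
    merged = {}
    for ts, o, h, l, c in bars:
        # Floor the timestamp to the start of its bucket
        start = ts - ((ts - epoch) % bucket)
        if start not in merged:
            merged[start] = [start, o, h, l, c]
        else:
            row = merged[start]
            row[2] = max(row[2], h)  # highest high in the bucket
            row[3] = min(row[3], l)  # lowest low in the bucket
            row[4] = c               # the last close wins
    return [tuple(row) for row in merged.values()]

# Made-up 1-minute bars, for illustration only
minute_bars = [
    (datetime(2024, 1, 2, 10, 0), 1.10, 1.12, 1.09, 1.11),
    (datetime(2024, 1, 2, 10, 1), 1.11, 1.15, 1.10, 1.14),
    (datetime(2024, 1, 2, 10, 5), 1.14, 1.16, 1.13, 1.15),
]
five_minute_bars = aggregate_ohlc(minute_bars, timedelta(minutes=5))
```

The first two bars fall into the 10:00 bucket and the third opens the 10:05 bucket. The open comes from the first bar of a bucket, the close from the last, and high and low are the extremes across the bucket.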

At the moment, sources are divided into historical sources and real-time sources.

SOURCES

HISTORICAL SOURCE

A historical source provides data typically starting from the early 2000s, and the free tier is sufficient for the purposes of the package. The update rate is slow, but data can be retrieved at a fine resolution, down to the 1-minute timeframe.

The historical source used in the package is histdata.com, whose work is genuine and much appreciated.

In summary, a historical source can provide large amounts of data, even from many years ago and without limits, at the cost of a slow update rate. For example, histdata updates its data on a monthly basis.

REAL-TIME SOURCE

A real-time source is what is more typically known as a source of forex or stock market data. It offers APIs through dedicated clients, or sometimes only minimal documentation describing how to build the API call as an HTTP request. A minimal free or trial tier is usually offered, but providers rely on premium subscriptions priced according to:

  • real time performance
  • size of tickers list available
  • how much history is available for a ticker
  • and many other parameters ...

As of now, only alpha-vantage and polygon-io are supported. The intention is to make the most of them and of their free-tier access to data. A Twelve Data interface is under development.
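As an illustration of an API call in HTTP request format, the sketch below builds a request URL for the FX_DAILY endpoint that Alpha Vantage documents publicly. How forex_data builds its own requests is not shown here, so the helper name `build_fx_daily_url` is hypothetical and the key `demo` is a placeholder. No network call is made.

```python
from urllib.parse import urlencode

def build_fx_daily_url(ticker, api_key):
    """Build an Alpha Vantage FX_DAILY request URL for a 6-letter pair such as 'EURUSD'."""
    if len(ticker) != 6:
        raise ValueError(f"expected a 6-letter pair like 'EURUSD', got {ticker!r}")
    params = {
        "function": "FX_DAILY",
        "from_symbol": ticker[:3].upper(),  # base currency
        "to_symbol": ticker[3:].upper(),    # quote currency
        "apikey": api_key,
    }
    return "https://www.alphavantage.co/query?" + urlencode(params)

url = build_fx_daily_url("EURCAD", "demo")
print(url)
```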

INSTALLATION

From PyPI (Recommended)

The easiest way to install forex_data is via pip:

pip install forex-data-aggregator

Or with Poetry:

poetry add forex-data-aggregator

From Source

If you want to install from source or contribute to development:

  1. Ensure you have Poetry installed
  2. Clone the repository:
git clone https://github.com/nikfio/forex_data.git -b master forex-data
cd forex-data
  3. Install dependencies:
poetry install
  4. Run tests to verify installation:
poetry run pytest
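After either installation route, one quick way to check that the distribution is visible to the interpreter is to query its metadata with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the distribution name is the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="forex-data-aggregator"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version())
```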

DOCUMENTATION

Comprehensive documentation is available at nikfio.github.io/forex_data

The full documentation includes:

CONFIGURATION FILE

A configuration file can be passed to group fixed parameter values. In your clone of the repository, look in the appconfig folder for an example template file.

When instantiating a data manager, you can pass either the absolute path to the YAML file or a folder. In the second case, the manager looks in the specified folder for a configuration file whose name ends with data_config.yaml. Furthermore, any parameter value can be overridden by explicit assignment at object instantiation. The examples section shows this in practice.
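The lookup and override rules described above can be sketched in a few lines. This is not the package's implementation: the helper names `resolve_config_path` and `merge_settings` are hypothetical, and picking the first match in sorted order is an assumption for the case where a folder holds several candidate files.

```python
from pathlib import Path

CONFIG_SUFFIX = "data_config.yaml"  # suffix named in the text above

def resolve_config_path(location):
    """Return the YAML file itself, or the first matching file inside a folder."""
    path = Path(location).expanduser()
    if path.is_file():
        return path
    if path.is_dir():
        matches = sorted(p for p in path.iterdir() if p.name.endswith(CONFIG_SUFFIX))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no configuration file found at {path}")

def merge_settings(file_settings, **overrides):
    """Explicit keyword arguments take precedence over values read from the file."""
    merged = dict(file_settings)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged

settings = merge_settings(
    {"ENGINE": "pandas", "DATA_FILETYPE": "csv"},
    ENGINE="polars_lazy",
)
print(settings)
```

Here the file supplies both values and the explicit ENGINE argument replaces only the one it names.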

ENGINE

Available options:

  • pandas
  • pyarrow
  • polars
  • polars_lazy

DATA_FILETYPE

Available options:

  • csv
  • parquet

The parquet filetype is strongly suggested for its read/write speed and smaller disk footprint. However, if you run analysis applications outside the Python environment, they are more likely to accept csv files than parquet, so csv could be the better choice for its broader acceptance.

DATA_PATH

Specifies the absolute directory path where the downloaded data files will be stored. If not provided, a default location is used (~/.database/).

PROVIDERS_KEY

To use real-time sources you need to provide an API key.

To register and create a key for the Alpha-Vantage provider, see the Alpha-Vantage free API registration page.

To register and create a key for the Polygon-IO provider, see the Polygon-IO home page.

LOGGING

Logging is provided by the loguru library. By design, the log is written to a file whose location is built with pathlib. A general-purpose folder for the package, named .database, is created in the current user's home folder. The log is written there to a file called forexdata.log, so the complete location of the log file is:

~/.database/forexdata.log
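For reference, this is roughly how such a location can be built with pathlib. It is a sketch under assumptions, not the package's code: the helper name `default_log_path` and its optional `home` argument (handy for testing against a temporary folder) are hypothetical.

```python
from pathlib import Path

def default_log_path(home=None):
    """Build <home>/.database/forexdata.log, creating the folder if needed."""
    base = Path(home) if home is not None else Path.home()
    folder = base / ".database"
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder / "forexdata.log"
```

Calling `default_log_path()` with no argument resolves against the current user's home folder, which matches the location shown above.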

EXAMPLES

You can find complete working examples in the examples folder showing the various modules and functionalities the package offers.

To run the examples:

# Historical data example
poetry run python examples/histdata_db_manager.py

# Real-time data example (requires API keys as environment variables)
export ALPHA_VANTAGE_API_KEY="your_key_here"
export POLYGON_IO_API_KEY="your_key_here"
poetry run python examples/realtime_data_manager.py

Historical data

Let's walk through the example for historical data source:

  1. Configuration setup
    # Use a runtime defined config yaml file
    test_config_yaml = '''
    DATA_FILETYPE: 'parquet'
    
    ENGINE: 'polars_lazy'
    
    DATA_PATH: 'ABSOLUTE-PATH-TO-DATA-DIRECTORY'
    '''
    
    You can define configuration inline or use a file. The configuration can override specific settings.

  2. Data manager instance
    from forex_data import HistoricalManagerDB
    
    histmanager = HistoricalManagerDB(
        config=test_config_yaml
    )
    
    Create an instance of the historical data manager with your configuration.

  3. Get data

    ex_ticker = 'EURUSD'
    ex_timeframe = '1d'
    ex_start_date = '2018-10-03 10:00:00'
    ex_end_date = '2018-12-03 10:00:00'
    
    yeardata = histmanager.get_data(
        ticker=ex_ticker,
        timeframe=ex_timeframe,
        start=ex_start_date,
        end=ex_end_date
    )
    

    The call returns a dataframe with data having the timeframe, start, and end specified by the inputs. The output dataframe type depends on the engine selected (polars_lazy, polars, pandas, pyarrow).

    With polars_lazy as ENGINE option, the output dataframe:

    ┌─────────────────────┬─────────┬─────────┬─────────┬─────────┐
    │ timestamp           ┆ open    ┆ high    ┆ low     ┆ close   │
    │ ---                 ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
    │ datetime[ms]        ┆ f32     ┆ f32     ┆ f32     ┆ f32     │
    ╞═════════════════════╪═════════╪═════════╪═════════╪═════════╡
    │ 2018-10-03 21:00:00 ┆ 1.1523  ┆ 1.1528  ┆ 1.1512  ┆ 1.1516  │
    │ 2018-10-04 21:00:00 ┆ 1.1516  ┆ 1.1539  ┆ 1.1485  ┆ 1.1498  │
    │ 2018-10-05 21:00:00 ┆ 1.1498  ┆ 1.1534  ┆ 1.1486  ┆ 1.1514  │
    │ ...                 ┆ ...     ┆ ...     ┆ ...     ┆ ...     │
    └─────────────────────┴─────────┴─────────┴─────────┴─────────┘
    

  4. Add a timeframe
    histmanager.add_timeframe('1W')
    
    Add a new timeframe. The data manager will create and cache the new timeframe data if not already present.

  5. Plot data
    histmanager.plot(
        ticker=ex_ticker,
        timeframe='1D',
        start_date='2016-02-02 18:00:00',
        end_date='2016-06-23 23:00:00'
    )
    
    Generate a candlestick chart for the specified ticker and date range.

output chart


  6. Conditional Data Retrieval

    You can filter data directly during retrieval using SQL-like conditions.

    from forex_data import (
        HistoricalManagerDB, 
        BASE_DATA_COLUMN_NAME, 
        SQL_COMPARISON_OPERATORS
    )
    
    # 1. Simple condition: OPEN < 1.13
    data = histmanager.get_data(
        ticker='EURUSD',
        timeframe='1D',
        start='2018-01-01',
        end='2018-12-31',
        comparison_column_name=BASE_DATA_COLUMN_NAME.OPEN,
        check_level=1.13,
        comparison_operator=SQL_COMPARISON_OPERATORS.LESS_THAN
    )
    
    # 2. Multiple conditions (OR): HIGH > 1.145 OR LOW < 1.12
    from forex_data import SQL_CONDITION_AGGREGATION_MODES
    
    data = histmanager.get_data(
        ticker='EURUSD',
        timeframe='1D',
        start='2019-01-01',
        end='2019-12-31',
        comparison_column_name=[
            BASE_DATA_COLUMN_NAME.HIGH, 
            BASE_DATA_COLUMN_NAME.LOW
        ],
        check_level=[1.145, 1.12],
        comparison_operator=[
            SQL_COMPARISON_OPERATORS.GREATER_THAN, 
            SQL_COMPARISON_OPERATORS.LESS_THAN
        ],
        aggregation_mode=SQL_CONDITION_AGGREGATION_MODES.OR
    )
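To clarify how several conditions combine in the conditional retrieval step, the sketch below applies the same OR logic to plain dictionaries. It does not use the package: the `OPERATORS` table and `row_matches` helper are hypothetical stand-ins for the package's enums, and the rows are made-up values chosen only to exercise the filter.

```python
import operator

# Hypothetical stand-ins for the package's comparison operators
OPERATORS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
             ">=": operator.ge, "==": operator.eq}

def row_matches(row, columns, operators, levels, mode="AND"):
    """Check one OHLC row (a dict) against several column/operator/level triples."""
    checks = (OPERATORS[op](row[col], level)
              for col, op, level in zip(columns, operators, levels))
    return any(checks) if mode == "OR" else all(checks)

# Made-up rows, for illustration only
rows = [
    {"timestamp": "2019-01-10", "open": 1.150, "high": 1.157, "low": 1.148, "close": 1.150},
    {"timestamp": "2019-03-07", "open": 1.130, "high": 1.132, "low": 1.118, "close": 1.119},
    {"timestamp": "2019-06-03", "open": 1.125, "high": 1.130, "low": 1.122, "close": 1.128},
]

# HIGH > 1.145 OR LOW < 1.12
or_hits = [r["timestamp"] for r in rows
           if row_matches(r, ["high", "low"], [">", "<"], [1.145, 1.12], mode="OR")]

# HIGH > 1.145 AND LOW < 1.12
and_hits = [r["timestamp"] for r in rows
            if row_matches(r, ["high", "low"], [">", "<"], [1.145, 1.12], mode="AND")]
```

With OR, the first row qualifies through its high and the second through its low. With AND, no row satisfies both conditions at once.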
    

Real-Time data

Let's walk through the example for real-time data source:

Important: This example requires API keys set as environment variables:

export ALPHA_VANTAGE_API_KEY="your_alphavantage_key"
export POLYGON_IO_API_KEY="your_polygon_io_key"
  1. Configuration with API keys
    from os import getenv
    
    alpha_vantage_key = getenv('ALPHA_VANTAGE_API_KEY')
    polygon_io_key = getenv('POLYGON_IO_API_KEY')
    
    test_config_yaml = f'''
    DATA_FILETYPE: 'parquet'
    
    ENGINE: 'polars_lazy'
    
    DATA_PATH: 'ABSOLUTE-PATH-TO-DATA-DIRECTORY'
    
    PROVIDERS_KEY:
        ALPHA_VANTAGE_API_KEY : {alpha_vantage_key}
        POLYGON_IO_API_KEY    : {polygon_io_key}
    '''
    
    Configuration includes API keys for real-time data providers.

  2. Data manager instance
    from forex_data import RealtimeManager
    
    realtimedata_manager = RealtimeManager(
        config=test_config_yaml
    )
    

  3. Get last daily close

    ex_ticker = 'EURCAD'
    
    dayclose_quote = realtimedata_manager.get_daily_close(
        ticker=ex_ticker,
        last_close=True
    )
    

    Output:

    ┌─────────────────────┬─────────┬─────────┬─────────┬────────┐
    │ timestamp           ┆ open    ┆ high    ┆ low     ┆ close  │
    │ ---                 ┆ ---     ┆ ---     ┆ ---     ┆ ---    │
    │ datetime[ms]        ┆ f32     ┆ f32     ┆ f32     ┆ f32    │
    ╞═════════════════════╪═════════╪═════════╪═════════╪════════╡
    │ 2025-01-23 00:00:00 ┆ 1.4123  ┆ 1.4156  ┆ 1.4098  ┆ 1.4125 │
    └─────────────────────┴─────────┴─────────┴─────────┴────────┘
    
  4. Get daily close for last N days

    ex_n_days = 13
    
    window_daily_ohlc = realtimedata_manager.get_daily_close(
        ticker=ex_ticker,
        recent_days_window=ex_n_days
    )
    

    Returns the last 13 days of daily OHLC data.

  5. Get daily close for specific date range

    ex_start_date = '2025-01-15'
    ex_end_date = '2025-01-23'
    
    window_limits_daily_ohlc = realtimedata_manager.get_daily_close(
        ticker=ex_ticker,
        day_start=ex_start_date,
        day_end=ex_end_date
    )
    

    Output:

    ┌─────────────────────┬────────┬────────┬────────┬────────┐
    │ timestamp           ┆ open   ┆ high   ┆ low    ┆ close  │
    │ ---                 ┆ ---    ┆ ---    ┆ ---    ┆ ---    │
    │ datetime[ms]        ┆ f32    ┆ f32    ┆ f32    ┆ f32    │
    ╞═════════════════════╪════════╪════════╪════════╪════════╡
    │ 2025-01-23 00:00:00 ┆ 1.4125 ┆ 1.4156 ┆ 1.4098 ┆ 1.4132 │
    │ 2025-01-22 00:00:00 ┆ 1.4089 ┆ 1.4147 ┆ 1.4072 ┆ 1.4125 │
    │ 2025-01-21 00:00:00 ┆ 1.4112 ┆ 1.4134 ┆ 1.4063 ┆ 1.4089 │
    │ ...                 ┆ ...    ┆ ...    ┆ ...    ┆ ...    │
    └─────────────────────┴────────┴────────┴────────┴────────┘
    
  6. Get OHLC data with custom timeframe

    ex_start_date = '2024-04-10'
    ex_end_date = '2024-04-15'
    ex_timeframe = '1h'
    
    window_data_ohlc = realtimedata_manager.get_data(
        ticker=ex_ticker,
        start=ex_start_date,
        end=ex_end_date,
        timeframe=ex_timeframe
    )
    

    Output:

    Real time 1h window data: shape: (72, 5)
    ┌─────────────────────┬─────────┬─────────┬─────────┬─────────┐
    │ timestamp           ┆ open    ┆ high    ┆ low     ┆ close   │
    │ ---                 ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
    │ datetime[ms]        ┆ f32     ┆ f32     ┆ f32     ┆ f32     │
    ╞═════════════════════╪═════════╪═════════╪═════════╪═════════╡
    │ 2024-04-10 00:00:00 ┆ 1.4765  ┆ 1.4768  ┆ 1.4752  ┆ 1.4761  │
    │ 2024-04-10 01:00:00 ┆ 1.4761  ┆ 1.4768  ┆ 1.4755  ┆ 1.4762  │
    │ 2024-04-10 02:00:00 ┆ 1.4762  ┆ 1.4778  ┆ 1.4751  ┆ 1.4771  │
    │ ...                 ┆ ...     ┆ ...     ┆ ...     ┆ ...     │
    └─────────────────────┴─────────┴─────────┴─────────┴─────────┘
    
  7. Intraday data with dynamic dates

    from pandas import Timestamp, Timedelta
    
    ex_start_date = Timestamp.now() - Timedelta('10D')
    ex_end_date = Timestamp.now() - Timedelta('8D')
    ex_timeframe = '5m'
    
    window_data_ohlc = realtimedata_manager.get_data(
        ticker='EURUSD',
        start=ex_start_date,
        end=ex_end_date,
        timeframe=ex_timeframe
    )
    

    Get 5-minute data for recent days using dynamic date calculations.
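The first step of the walkthrough above reads the keys with getenv, which returns None when a variable is unset, so the literal text None would end up in the YAML. A small guard can fail early instead. This is a sketch under assumptions: the helper names are hypothetical, and requiring both keys at once is a choice made for this example, not a documented requirement of the package.

```python
import os

REQUIRED_KEYS = ("ALPHA_VANTAGE_API_KEY", "POLYGON_IO_API_KEY")

def read_provider_keys(env=None):
    """Return the provider keys, failing early if any variable is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}

def providers_yaml(keys, indent="    "):
    """Render the PROVIDERS_KEY section as block-style YAML, one key per line."""
    lines = ["PROVIDERS_KEY:"]
    lines += [f"{indent}{name}: '{value}'" for name, value in keys.items()]
    return "\n".join(lines)

# Demonstration with a plain dict instead of the real environment
demo_keys = read_provider_keys({
    "ALPHA_VANTAGE_API_KEY": "your_alphavantage_key",
    "POLYGON_IO_API_KEY": "your_polygon_io_key",
})
print(providers_yaml(demo_keys))
```

In real use, call `read_provider_keys()` with no argument so it reads os.environ, then splice the rendered section into the configuration string.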

PYTEST and pipeline implementation

The project uses pytest for testing and CircleCI for continuous integration. The pipeline automatically runs on every commit to ensure code quality and functionality.

Testing with Pytest

To run tests locally:

# Run all tests
poetry run pytest

# Run tests with flake8 linting (same as CI)
poetry run pytest --flake8

# Run tests with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_file.py

CircleCI Pipeline

The CI/CD pipeline is configured via .circleci/config.yml and automatically runs on every push to the repository.

Pipeline Configuration

Version: CircleCI 2.1

Docker Image: cimg/python:3.12.12

Workflow: unit-tests

Pipeline Steps

The pipeline executes the following steps for Python 3.12:

  1. Checkout: Clone the repository code
  2. Install Poetry: Install the Poetry package manager (pip install poetry)
  3. Restore Cache: Restore dependencies from cache if available (cache key based on poetry.lock checksum)
  4. Install Dependencies: Install project dependencies using poetry install
  5. Save Cache: Cache the installed dependencies for faster future builds
  6. Run Tests: Execute tests with flake8 linting using poetry run pytest --flake8
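The cache steps rely on a key derived from the poetry.lock checksum, so the cache is reused only while the lock file is unchanged. CircleCI computes this checksum itself through its config templates. The sketch below only illustrates the idea in Python, and the helper name `lockfile_cache_key` and the `deps` prefix are hypothetical.

```python
import hashlib
from pathlib import Path

def lockfile_cache_key(lock_path, prefix="deps"):
    """Derive a cache key from the SHA-256 digest of a lock file's bytes."""
    digest = hashlib.sha256(Path(lock_path).read_bytes()).hexdigest()
    return f"{prefix}-{digest}"
```

Two builds with identical lock files produce the same key and can share cached dependencies. Any change to the lock file yields a new key and forces a fresh install.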

Environment Variables

The pipeline supports the following environment variables (configured in CircleCI project settings):

  • DATABASE_URL: Database connection string (if needed)
  • API_KEY: API keys for external services (if needed for integration tests)

Jobs

  • py312: Runs the complete test suite on Python 3.12

Workflow

The unit-tests workflow triggers on every commit and runs the py312 job to validate:

  • Code functionality through pytest
  • Code quality and style through flake8 integration
