Skip to main content

Tool for retrieving and combining financial and related data for informing security investments

Project description

KaxaNuk Data Curator

Python PyPI - License
Build Status Read the Docs
PyPI - Version PyPI Downloads Powered by KaxaNuk

Component library for downloading, validating, homogenizing, and combining financial stocks' data from different data providers. Can be run in standalone mode, configurable in Excel, or as a component of a larger Python-based system.

Features:

  • Configurable from an Excel file, or directly in a Python script. Docker image also available.
  • Fully readable and specific tag names, homogenized between data providers, based on the US GAAP taxonomy. Switch between data providers without changing your code.
  • Automatically validates market and fundamental data, discarding datasets that make no sense (like high price below low, etc.) or can't guarantee point-in-time validity (like amended statements).
  • Easily create your own calculated feature functions without need for Numpy or Pandas (though you can also use those if you want to).
  • Output to CSV or Parquet files, or to in-memory Pandas Dataframes for further processing.
  • Completely extensible architecture: implement your own data providers, feature combinations, and output handlers on top of clear, stable interfaces.
  • Readable, well-documented, and tested code.

Documentation

Full documentation is available at kaxanuk-data-curator.readthedocs.io.

Requirements

The system can run either on your local Python (versions 3.12 or 3.13) or on Docker.

Supported Data Providers

  • Financial Modeling Prep (free and discounted plans available through our referral link)
  • Yahoo Finance (requires installing a separate extension package, and doesn't support most data types)

Running on Local Python

Installation

  1. Make sure you're running the required version of Python, preferably in its own virtual environment.

  2. Open a terminal and run:

    pip install --upgrade pip
    pip install kaxanuk.data_curator
    
  3. If you want to use the Yahoo Finance data provider, install the extension package:

    pip install kaxanuk.data_curator_extensions.yahoo_finance
    

Configuration

  1. Open a terminal in any directory and run the following command:
    kaxanuk.data_curator init excel
    
    This should create 2 subdirectories, Config and Output, as well as the entry script __main__.py in the current directory.
  2. Open the Config/parameters_datacurator.xlsx file in Excel, fill out the fields in all the sheets, save the file and close it.
  3. If your data provider requires an API key, open the Config/.env file in a text editor, and paste the key after the = sign of the provider's corresponding API_KEY variable. Don't add any quotes or spaces before or after the key.

*If on MacOS, the .env file will be hidden in Finder by default. Just use the keys Command + Shift + . to toggle the visibility of hidden files.

Usage

Now you can run the entry script with either:

kaxanuk.data_curator run

or by executing the __main__.py script directly with Python:

python __main__.py

The system will download the data for the tickers configured in the file, and save the data to the Output folder.

Running on Docker

Pull the Docker image:

docker pull ghcr.io/kaxanuk/data-curator:latest

Docker Configuration

Volumes

You need to mount the following volume to the container:

  • Path on the host: (select the directory on your PC where you want the Data Curator configuration and output files to be created)
  • Path inside the container: /app

Environment Variables

If your data provider requires an API key, you need to pass it as an environment variable when running the container.

  • Name: KNDC_API_KEY_FMP
  • Value: API key for the Financial Modeling Prep data provider, as a string.

Running the Container

  1. On the first run, the container will create the Config and Output subdirectories in the mounted volume, as well as the entry script __main__.py.
  2. Open the Config/parameters_datacurator.xlsx file in Excel, fill out the fields in all the sheets, save the file and close it.

Now that the configuration is set up, each time you run the container again, it will download the data for the tickers/identifiers as configured in the parameters file, and save it to the Output folder.

Customization

The __main__.py entry script is customizable, so you can implement your own data providers and configuration and output handlers, and inject them from there.

You can also create your own calculated feature functions by adding them to the Config/custom_calculations.py file, and adding their function name to the Columns sheet in the Config/parameters_datacurator.xlsx file. As long as the names start with the c_ prefix, the system will use them as any other feature.

Check the API Reference to learn how to easily implement your own calculated features.

The Road to v1.0

We believe in the need for a stable API, and have expended considerable effort into finalizing the API as much as possible before the first public release. We plan to avoid any changes that severely break backwards compatibility before version 1.0, with one major exception: The Data Blocks functionality.

Data Blocks will generalize the link between the data providers and the feature column prefixes, which will allow users to create their own data providers and feature columns for any type of data from any source without having to modify the core code of the Data Curator. This will open the door to calculated features that incorporate all kinds of data, like economic indicators, alternative data, financial indices and benchmarks, etc.

Once Data Blocks are implemented, we will rapidly make any necessary adjustments to the public API, and when we're happy with it, we will work on finalizing the version 1.0 release.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kaxanuk_data_curator-0.43.0.tar.gz (622.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

kaxanuk_data_curator-0.43.0-py3-none-any.whl (110.7 kB view details)

Uploaded Python 3

File details

Details for the file kaxanuk_data_curator-0.43.0.tar.gz.

File metadata

  • Download URL: kaxanuk_data_curator-0.43.0.tar.gz
  • Upload date:
  • Size: 622.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for kaxanuk_data_curator-0.43.0.tar.gz
Algorithm Hash digest
SHA256 f903c4e9f180753bd206d6b0b50b0052a1a7c5b90d3f9442f5f496707d18ff16
MD5 6d8967934f803285e6004b14522eadf4
BLAKE2b-256 4e767a9021e4751a0ca2d09ff0d304e2960ad7172f6bb6c38043ac3477b4047c

See more details on using hashes here.

File details

Details for the file kaxanuk_data_curator-0.43.0-py3-none-any.whl.

File metadata

File hashes

Hashes for kaxanuk_data_curator-0.43.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0241b4ffd55749686c4fde1c92898deeb4a4aa9d29f6943d5f641d5587c795a8
MD5 921af69366c332796a36fae30fadc324
BLAKE2b-256 c21fad26d916132b04fa237ea4dab8a047dd434a563556a4e9e03e139c95e1b0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page