Tool for retrieving and combining financial and related data for informing security investments
KaxaNuk Data Curator
Component library for downloading, validating, homogenizing, and combining financial data on stocks from different data providers. It can run in standalone mode (configured through an Excel file) or as a component of a larger Python-based system.
Features:
- Configurable from an Excel file, or directly in a Python script. Docker image also available.
- Fully readable, specific tag names, homogenized across data providers and based on the US GAAP taxonomy. Switch between data providers without changing your code.
- Automatically validates market and fundamental data, discarding datasets that are internally inconsistent (e.g., a high price below the low) or whose point-in-time validity can't be guaranteed (e.g., amended statements).
- Easily create your own calculated feature functions without needing NumPy or pandas (though you can use them if you want to).
- Output to CSV or Parquet files, or to in-memory pandas DataFrames for further processing.
- Completely extensible architecture: implement your own data providers, feature combinations, and output handlers on top of clear, stable interfaces.
- Readable, well-documented, and tested code.
Documentation
Full documentation is available at kaxanuk-data-curator.readthedocs.io.
Requirements
The system can run either on your local Python installation (version 3.12 or 3.13) or in Docker.
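To confirm that the interpreter in your virtual environment is one of the versions listed above, a small check like the following can help. The helper name is hypothetical and not part of the package; the supported set simply restates the requirement above.

```python
import sys

# Versions listed in the Requirements section (major, minor).
SUPPORTED = {(3, 12), (3, 13)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor version is supported."""
    return (version_info[0], version_info[1]) in SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported_python() else "NOT supported"
    print(f"Python {major}.{minor} is {status}")
```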
Supported Data Providers
- Financial Modeling Prep (free and discounted plans available through our referral link)
- LSEG Workspace (API key required)
- Yahoo Finance (requires installing a separate extension package, and doesn't support most data types)
Running on Local Python
Installation
- Make sure you're running the required version of Python, preferably in its own virtual environment.
- Open a terminal and run:
  pip install --upgrade pip
  pip install kaxanuk.data_curator
- If you want to use the Yahoo Finance data provider, install the extension package:
  pip install kaxanuk.data_curator_extensions.yahoo_finance
Configuration
- Open a terminal in any directory and run the following command:
  kaxanuk.data_curator init excel
  This should create 2 subdirectories, Config and Output, as well as the entry script __main__.py in the current directory.
- Open the Config/parameters_datacurator.xlsx file in Excel, fill out the fields in all the sheets, save the file and close it.
- If your data provider requires an API key, open the Config/.env file in a text editor, and paste the key after the = sign of the provider's corresponding API_KEY variable. Don't add any quotes or spaces before or after the key.
*If on macOS, the .env file is hidden in Finder by default. Press Command + Shift + . to toggle the visibility of hidden files.
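Because a stray quote or space around the key is an easy mistake to miss, here is a minimal sketch of a checker for the .env rules above. It is not part of the Data Curator; it assumes the file holds simple NAME=value lines, that unused providers may be left empty, and that lines starting with # are comments. KNDC_API_KEY_FMP is used only as an example variable name.

```python
from pathlib import Path


def check_env_text(text: str) -> list[str]:
    """Flag NAME=value lines that break the 'no quotes, no spaces' rule.

    Assumption: empty values are allowed (providers you don't use).
    """
    problems: list[str] = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # ignore blank lines and comments
        if "=" not in line:
            problems.append(f"line {number}: missing '=' sign")
            continue
        name, value = line.split("=", 1)
        if value == "":
            continue  # unused provider, nothing pasted yet
        if value != value.strip():
            problems.append(f"line {number} ({name.strip()}): spaces around the key")
        stripped = value.strip()
        if stripped[:1] in ("'", '"') or stripped[-1:] in ("'", '"'):
            problems.append(f"line {number} ({name.strip()}): key wrapped in quotes")
    return problems


if __name__ == "__main__":
    env_path = Path("Config/.env")
    if env_path.exists():
        for problem in check_env_text(env_path.read_text()):
            print(problem)
```

Run it from the directory where you ran the init command; an empty output means no formatting problems were detected.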
Usage
Now you can run the Data Curator with the command:
kaxanuk.data_curator run
or by executing the __main__.py entry script directly with Python:
python __main__.py
The system will download the data for the tickers configured in the parameters file and save it to the Output folder.
Running on Docker
Pull the Docker image:
docker pull ghcr.io/kaxanuk/data-curator:latest
Docker Configuration
Volumes
You need to mount the following volume to the container:
- Path on the host: (select the directory on your PC where you want the Data Curator configuration and output files to be created)
- Path inside the container: /app
Environment Variables
If your data provider requires an API key, you need to pass it as an environment variable when running the container.
- Name: KNDC_API_KEY_FMP
- Value: API key for the Financial Modeling Prep data provider, as a string.
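The volume and environment variable settings above translate into a single docker run invocation. The sketch below assembles it from Python so it can be launched with subprocess; the host directory is a placeholder, and the --rm flag (remove the container after each run, since all files live in the mounted volume) is a choice made for this example rather than a requirement of the image.

```python
import shlex

IMAGE = "ghcr.io/kaxanuk/data-curator:latest"


def build_docker_run(host_dir: str, pass_fmp_key: bool = True) -> list[str]:
    """Assemble a `docker run` command that mounts host_dir at /app."""
    cmd = ["docker", "run", "--rm", "-v", f"{host_dir}:/app"]
    if pass_fmp_key:
        # "-e NAME" without a value makes Docker copy NAME from the calling
        # environment, so the key never appears in the command line itself.
        cmd += ["-e", "KNDC_API_KEY_FMP"]
    cmd.append(IMAGE)
    return cmd


if __name__ == "__main__":
    # Placeholder path: replace with the directory you chose on your PC.
    print(shlex.join(build_docker_run("/path/on/host/data_curator")))
```

With KNDC_API_KEY_FMP exported in your shell, passing the returned list to subprocess.run(cmd, check=True) starts the container.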
Running the Container
- On the first run, the container will create the Config and Output subdirectories in the mounted volume, as well as the entry script __main__.py.
- Open the Config/parameters_datacurator.xlsx file in Excel, fill out the fields in all the sheets, save the file and close it.
Now that the configuration is set up, each time you run the container again, it will download the data for the tickers/identifiers
as configured in the parameters file, and save it to the Output folder.
Customization
The __main__.py entry script is customizable, so you can implement your own data providers and configuration and output
handlers, and inject them from there.
You can also create your own calculated feature functions by adding them to the Config/custom_calculations.py file,
and adding their function name to the Columns sheet in the Config/parameters_datacurator.xlsx file.
As long as the function names start with the c_ prefix, the system will treat them like any other feature.
Check the API Reference to learn how to easily implement your own calculated features.
The Road to v1.0
We believe in the need for a stable API, and have put considerable effort into finalizing the API as much as possible before the first public release. We plan to avoid any changes that severely break backwards compatibility before version 1.0, with one major exception: the Data Blocks functionality.
Data Blocks will generalize the link between the data providers and the feature column prefixes, which will allow users to create their own data providers and feature columns for any type of data from any source without having to modify the core code of the Data Curator. This will open the door to calculated features that incorporate all kinds of data, like economic indicators, alternative data, financial indices and benchmarks, etc.
Once Data Blocks are implemented, we will rapidly make any necessary adjustments to the public API, and when we're happy with it, we will work on finalizing the version 1.0 release.
Download files
File details
Details for the file kaxanuk_data_curator-0.46.1.tar.gz.
File metadata
- Download URL: kaxanuk_data_curator-0.46.1.tar.gz
- Upload date:
- Size: 834.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21da8fc8623dbfe0291dc9c7b4fb2945459087f5700083009cce0bd7fb94d845 |
| MD5 | 4b914dbb051048184cb954d3bfd22c0e |
| BLAKE2b-256 | 02c489dd7f34b750c9e4b4ba63480e514a892ba95d9b381aff17fca8197ab5c9 |
File details
Details for the file kaxanuk_data_curator-0.46.1-py3-none-any.whl.
File metadata
- Download URL: kaxanuk_data_curator-0.46.1-py3-none-any.whl
- Upload date:
- Size: 167.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b10e3138ad504fe1ea493435bad2b7c732bb521805a76d4c508c6394a15c061 |
| MD5 | 273881902b12522f1cd837e1a04a31e1 |
| BLAKE2b-256 | b94898496994c2279120f3ff0d8d280ac5ed3d2ef02f247674505b8dce2b6869 |
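If you download one of these files manually, you can compare it against the published digests with the standard library's hashlib. This is a minimal sketch; the function names are illustrative, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: Path, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as f:
        # Read in chunks so large files don't need to fit in memory.
        while chunk := f.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def matches_sha256(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return file_digests(path)["SHA256"] == expected.strip().lower()
```

For example, matches_sha256(Path("kaxanuk_data_curator-0.46.1-py3-none-any.whl"), "4b10e3138ad504fe1ea493435bad2b7c732bb521805a76d4c508c6394a15c061") should return True for an intact copy of the wheel.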