Tool for retrieving and combining financial and related data for informing security investments

KaxaNuk Data Curator

A tool for building a structured database of market, fundamental, and alternative data obtained from the web services of different financial data providers.

It also makes it easy to add your own calculated feature functions.

Requirements

The system can run either on your local Python, or on Docker.

Requirements for Running on Local Python

  • Python 3.12 or 3.13
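
As a quick way to confirm the version requirement above before installing, the following is a minimal sketch that checks the running interpreter against the two listed versions. The names `SUPPORTED_VERSIONS` and `is_supported_python` are illustrative and not part of the Data Curator package.

```python
import sys

# Versions listed under "Requirements for Running on Local Python".
SUPPORTED_VERSIONS = {(3, 12), (3, 13)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the major.minor version is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "supported" if is_supported_python() else "not supported"
    print(f"Python {current} is {verdict}")
```

Running the script inside your virtual environment reports whether that environment's interpreter matches the requirement.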

Supported Data Providers

  • Financial Modeling Prep
  • Yahoo Finance (requires installing a separate extension package, and doesn't support most data types)

Running on Python

Installation

  1. Make sure you're running the required version of Python, preferably in its own virtual environment.

  2. Open a terminal and run:

    pip install --upgrade pip
    pip install kaxanuk.data_curator
    
  3. If you want to use the Yahoo Finance data provider, install the extension package:

    pip install kaxanuk.data_curator_extensions.yahoo_finance
    

Configuration

  1. Open a terminal in any directory and run the following command:
    kaxanuk.data_curator init excel
    
    This should create two subdirectories, Config and Output, as well as the entry script __main__.py, in the current directory.
  2. Open the Config/parameters_datacurator.xlsx file in Excel, fill out the fields in all the sheets, save the file and close it.
  3. If your data provider requires an API key, open the Config/.env file in a text editor, and paste the key after the = sign of the provider's corresponding API_KEY variable. Don't add any quotes or spaces before or after the key.

*On macOS, the .env file is hidden in Finder by default. Press Command + Shift + . to toggle the visibility of hidden files.
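
The "no quotes or spaces" rule for the API key is easy to get wrong when pasting. The sketch below checks a single NAME=value line for those mistakes. It is only an illustration: the helper `find_env_line_problems` is hypothetical, the key `abc123` is a placeholder, and using the variable name KNDC_API_KEY_FMP inside the .env file is an assumption based on the name given in the Docker section.

```python
def find_env_line_problems(line):
    """Return a list of formatting problems in one NAME=value line of a .env file."""
    name, separator, value = line.rstrip("\n").partition("=")
    if not separator:
        return ["missing '=' sign"]
    problems = []
    if name != name.strip():
        problems.append("space around the variable name")
    if value != value.strip():
        problems.append("space before or after the key")
    stripped = value.strip()
    if stripped[:1] in ("'", '"') or stripped[-1:] in ("'", '"'):
        problems.append("quotes around the key")
    return problems


if __name__ == "__main__":
    # Placeholder lines; the variable name is assumed, the key is not a real one.
    examples = [
        "KNDC_API_KEY_FMP=abc123",
        "KNDC_API_KEY_FMP= abc123",
        'KNDC_API_KEY_FMP="abc123"',
    ]
    for example in examples:
        print(repr(example), find_env_line_problems(example))
```

An empty list means the line follows the format described above.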

Usage

Now you can run the entry script, either with the command:

kaxanuk.data_curator run

or by executing the __main__.py script directly with Python:

python __main__.py

The system will download the data for the tickers configured in the parameters file, and save it to the Output folder.

Running on Docker

Pull the Docker image:

docker pull ghcr.io/kaxanuk/data-curator:latest

Docker Configuration

Volumes

You need to mount the following volume to the container:

  • Path on the host: (select the directory on your PC where you want the Data Curator configuration and output files to be created)
  • Path inside the container: /app

Environment Variables

If your data provider requires an API key, you need to pass it as an environment variable when running the container.

  • Name: KNDC_API_KEY_FMP
  • Value: API key for the Financial Modeling Prep data provider, as a string.

Running the Container

  1. On the first run, the container will create the Config and Output subdirectories in the mounted volume, as well as the entry script __main__.py.
  2. Open the Config/parameters_datacurator.xlsx file in Excel, fill out the fields in all the sheets, save the file and close it.

Now that the configuration is set up, each time you run the container again, it will download the data for the tickers/identifiers as configured in the parameters file, and save it to the Output folder.
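
The volume and environment variable described above can be combined into a single `docker run` invocation. The following is a minimal sketch that assembles that command from Python; it assumes the standard Docker flags `-v` and `-e`, adds `--rm` as an optional convenience that the source does not mention, and uses a hypothetical host directory and a placeholder API key. The helper `build_docker_command` is illustrative and not part of the Data Curator package.

```python
import shlex
from pathlib import Path

# Image name taken from the pull command above.
IMAGE = "ghcr.io/kaxanuk/data-curator:latest"


def build_docker_command(host_dir, fmp_api_key=None):
    """Build the `docker run` argument list for the Data Curator image."""
    command = [
        "docker", "run",
        "--rm",  # assumption: remove the stopped container after each run
        "-v", f"{Path(host_dir).resolve()}:/app",  # mount the host directory at /app
    ]
    if fmp_api_key:
        # Pass the Financial Modeling Prep key as an environment variable.
        command += ["-e", f"KNDC_API_KEY_FMP={fmp_api_key}"]
    command.append(IMAGE)
    return command


if __name__ == "__main__":
    # Hypothetical directory and placeholder key; replace both with your own.
    command = build_docker_command("./data_curator", fmp_api_key="your_api_key")
    # Print the command for review; to launch the container, pass the list
    # to subprocess.run(command, check=True) instead.
    print(shlex.join(command))
```

Building the command as a list keeps the host path and the key as separate arguments, so paths containing spaces do not need manual quoting.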

Customization

The __main__.py entry script is customizable, so you can implement your own data providers and configuration and output handlers, and inject them from there.

You can also create your own calculated feature functions by adding them to the Config/custom_calculations.py file, and adding each function's name (which must start with the c_ prefix) to the Columns sheet in the Config/parameters_datacurator.xlsx file.

Check the API Reference to learn how to easily implement your own calculated features.
