
Financial datasets for LLMs


Financial Datasets 🧪

Financial Datasets is an open-source Python library that lets you create question-and-answer financial datasets using Large Language Models (LLMs). With this library, you can generate realistic financial datasets from 10-K and 10-Q filings, PDFs, and other financial texts.


Usage

Example generated dataset:

[
  {
    "question": "What was Airbnb's revenue in 2023?",
    "answer": "$9.9 billion",
    "context": "In 2023, revenue increased by 18% to $9.9 billion compared to 2022, primarily due to a 14% increase in Nights and Experiences Booked of 54.5 million combined with higher average daily rates driving a 16% increase in Gross Booking Value of $10.0 billion."
  },
  {
    "question": "By what percentage did Airbnb's net income increase in 2023 compared to the prior year?",
    "answer": "153%",
    "context": "Net income in 2023 increased by 153% to $4.8 billion, compared to the prior year, driven by our revenue growth, increased interest income, discipline in managing our cost structure, and the release of a portion of our valuation allowance on deferred tax assets of $2.9 billion."
  }
]
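Each record above has the same shape: a `question`, an `answer`, and the `context` passage the answer was drawn from. If you save a generated dataset as JSON in this shape, a quick sanity check can catch empty fields or answers that do not appear in their context. The sketch below assumes exactly that on-disk format (the object the library actually returns may differ), and `parse_dataset` and `find_problems` are hypothetical helpers, not part of Financial Datasets. The verbatim-answer test is only a heuristic, since a valid answer may be paraphrased.

```python
import json

# Fields every record carries in the example dataset (assumed format).
REQUIRED_KEYS = ("question", "answer", "context")


def parse_dataset(raw: str) -> list[dict]:
    """Parse a JSON array of Q&A records."""
    records = json.loads(raw)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    return records


def find_problems(records: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means every record passed."""
    problems: list[str] = []
    for i, record in enumerate(records):
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty {key!r}")
        answer, context = record.get("answer"), record.get("context")
        # Heuristic: an extractive answer should appear verbatim in its context.
        if isinstance(answer, str) and isinstance(context, str) and answer.strip() and answer not in context:
            problems.append(f"record {i}: answer not found verbatim in context")
    return problems


sample = [
    {
        "question": "What was Airbnb's revenue in 2023?",
        "answer": "$9.9 billion",
        "context": "In 2023, revenue increased by 18% to $9.9 billion compared to 2022, ...",
    },
    {
        "question": "By what percentage did Airbnb's net income increase in 2023 compared to the prior year?",
        "answer": "153%",
        "context": "Net income in 2023 increased by 153% to $4.8 billion, ...",
    },
]
print(find_problems(parse_dataset(json.dumps(sample))))  # → []
```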

Example #1 - generate from any text

This is the most flexible option: it generates a dataset from a list of text strings. Colab code example here.

from financial_datasets.generator import DatasetGenerator

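# Your list of texts, e.g. passages from a filing or report
texts = [
    "In 2023, revenue increased by 18% to $9.9 billion compared to 2022, ...",
    "Net income in 2023 increased by 153% to $4.8 billion, compared to the prior year, ...",
]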

# Create dataset generator
generator = DatasetGenerator(model="gpt-4-turbo", api_key="your-openai-key")

# Generate dataset from texts
dataset = generator.generate_from_texts(
    texts=texts,
    max_questions=100,
)
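`generate_from_texts` takes a list of strings, so a long filing has to be split into passages first. Whether the library also chunks its inputs internally is not stated here. The helper below, `split_into_chunks` (hypothetical, not part of the library), is one simple approach: it assumes paragraphs are separated by blank lines and uses a character budget as a rough stand-in for a token limit.

```python
def split_into_chunks(document: str, max_chars: int = 4000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most
    max_chars characters; a paragraph longer than max_chars is hard-split."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for paragraph in document.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Cut oversized paragraphs into pieces that each fit the budget.
        for start in range(0, len(paragraph), max_chars):
            piece = paragraph[start:start + max_chars]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece fits in the current chunk
            else:
                chunks.append(current)  # close the full chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "Revenue section...\n\nNet income section...\n\n" + "x" * 50
texts = split_into_chunks(sample, max_chars=40)
print(len(texts))
```

The resulting list can be passed as `texts=` to `generate_from_texts`. Packing whole paragraphs keeps each context passage readable; a token-based budget would track model limits more closely.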

Example #2 - generate from PDF

Generate a dataset from a PDF URL alone. Colab code example here.

from financial_datasets.generator import DatasetGenerator

# Create dataset generator
generator = DatasetGenerator(model="gpt-4-turbo", api_key="your-openai-key")

# Generate dataset from PDF url
dataset = generator.generate_from_pdf(
    url="https://www.berkshirehathaway.com/letters/2023ltr.pdf",
    max_questions=100,
)

Example #3 - generate from 10-K

Generate a dataset using a ticker and year. Colab code example here.

from financial_datasets.generator import DatasetGenerator

# Create dataset generator
generator = DatasetGenerator(model="gpt-4-turbo", api_key="your-openai-key")

# Generate dataset from 10-K
dataset = generator.generate_from_10K(
    ticker="AAPL",
    year=2023,
    max_questions=100,
    item_names=["Item 1", "Item 7"],  # optional - specify Item names to use
)
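The `item_names` values in the example follow the "Item N" spelling of 10-K section headings. Assuming the library matches that exact form, a small helper can normalize loose input such as `"item 7a"` and reject numbers that are not 10-K items. `normalize_item_names` and `TEN_K_ITEMS` are hypothetical; the item list reflects the current Form 10-K (including Items 1C and 9C), and which items the library can actually extract is an assumption.

```python
import re

# Item numbers on the current Form 10-K, Parts I-IV. Assumption: the library
# identifies sections by these "Item N" labels.
TEN_K_ITEMS = {
    "1", "1A", "1B", "1C", "2", "3", "4",            # Part I
    "5", "6", "7", "7A", "8", "9", "9A", "9B", "9C",  # Part II
    "10", "11", "12", "13", "14",                     # Part III
    "15", "16",                                       # Part IV
}

# Accepts "Item 7", "item 7a", "ITEM 7A.", "Item7", etc.
_ITEM_RE = re.compile(r"^\s*item\s*(\d{1,2}[a-c]?)\s*\.?\s*$", re.IGNORECASE)


def normalize_item_names(names: list[str]) -> list[str]:
    """Map loose spellings to "Item N" labels, dropping duplicates in order."""
    normalized: list[str] = []
    for name in names:
        match = _ITEM_RE.match(name)
        if match is None:
            raise ValueError(f"not an item name: {name!r}")
        number = match.group(1).upper()
        if number not in TEN_K_ITEMS:
            raise ValueError(f"not a Form 10-K item: {name!r}")
        label = f"Item {number}"
        if label not in normalized:
            normalized.append(label)
    return normalized


print(normalize_item_names(["item 1", "ITEM 7a", "Item 7", "Item 1"]))
# → ['Item 1', 'Item 7A', 'Item 7']
```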

Installation

Using pip

You can install the Financial Datasets library using pip:

pip install financial-datasets

Using Poetry

If you prefer to use Poetry for dependency management, you can add Financial Datasets to your project:

poetry add financial-datasets

From the Repository

If you want to install the library directly from the repository, follow these steps:

  1. Clone the repository:

    git clone https://github.com/virattt/financial-datasets.git
    
  2. Navigate to the project directory:

    cd financial-datasets
    
  3. Install the dependencies using Poetry:

    poetry install
    
  4. You can now use the library in your Python projects.
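After installing by any of these methods, a quick check confirms the package is importable. This sketch assumes the distribution name `financial-datasets` (from the install commands) and the module name `financial_datasets` (from the import lines in the usage examples); `installed_version` is a hypothetical helper built only on the standard library.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(
    dist_name: str = "financial-datasets",
    module_name: str = "financial_datasets",
) -> Optional[str]:
    """Return the installed distribution version, or None if unavailable."""
    # find_spec returns None when the top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Importable (e.g. from the source tree) but no installed metadata.
        return None


print(installed_version())
```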

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.
