llama-index readers sec_filings integration

SEC DATA DOWNLOADER

pip install llama-index-readers-sec-filings

Please check out SEC-QA, a repository I am building for an SEC question-answering agent.

This repository downloads the full text of SEC documents (10-K and 10-Q). Amended documents are not currently supported, but support will be added in the near future.

Install the required dependencies

pip install -r requirements.txt

The SEC Downloader expects 5 attributes

  • tickers: A list of valid ticker symbols
  • amount: The number of documents to download
  • filing_type: The filing type, either 10-K or 10-Q
  • num_workers: The number of workers used for parallelism. Multi-threading is applied at the ticker level and multi-processing at the year level for a given ticker
  • include_amends: Whether to include amendments
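
The five attributes can be sanity-checked before any download starts. The sketch below is a hypothetical pre-flight check and is not part of the package: the `LoaderArgs` name, the default values, and the validation rules are assumptions for illustration, based only on the descriptions above.

```python
from dataclasses import dataclass
from typing import List

# The two filing types named in the attribute list above.
ALLOWED_FILING_TYPES = {"10-K", "10-Q"}


@dataclass
class LoaderArgs:
    """Hypothetical container mirroring the five attributes listed above."""

    tickers: List[str]
    amount: int
    filing_type: str
    num_workers: int = 2  # assumed default, not taken from the package
    include_amends: bool = False  # assumed default, not taken from the package

    def validate(self) -> None:
        """Raise ValueError if any attribute is obviously malformed."""
        if not self.tickers or not all(
            isinstance(t, str) and t.strip() for t in self.tickers
        ):
            raise ValueError("tickers must be a non-empty list of non-empty strings")
        if self.amount < 1:
            raise ValueError("amount must be at least 1")
        if self.filing_type not in ALLOWED_FILING_TYPES:
            raise ValueError("filing_type must be '10-K' or '10-Q'")
        if self.num_workers < 1:
            raise ValueError("num_workers must be at least 1")


args = LoaderArgs(tickers=["TSLA"], amount=3, filing_type="10-K")
args.validate()  # passes silently for well-formed arguments
```

Once validated, the same values should map onto the keyword arguments of SECFilingsLoader shown in the Usage section.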

Usage

from llama_index.readers.sec_filings import SECFilingsLoader

loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

It will download the data in the following directories and sub-directories

- AAPL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_12.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_12.json
  - 2023
    - 10-Q_04.json
- GOOGL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
- TSLA
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-KA.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json

Each ticker has its own folder. 10-K data is saved under the respective year, and 10-Q data is saved under the respective year with the month appended to the file name; for example, 10-Q_03.json holds the 10-Q data for March. Amended documents are also stored under their respective year.
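
The layout above can be walked programmatically. The following is a minimal sketch, assuming the downloads live under a root folder such as `data` (as the examples further down suggest) and that file names follow the pattern shown in the listing. The `iter_filings` helper and its record keys are hypothetical, and `10-QA` is accepted only by analogy with `10-KA`.

```python
import re
from pathlib import Path
from typing import Dict, Iterator, Optional

# Matches "10-K.json", "10-KA.json", and "10-Q_03.json" as shown in the listing.
FILE_PATTERN = re.compile(r"^(?P<form>10-[KQ]A?)(?:_(?P<month>\d{2}))?\.json$")


def iter_filings(root: Path) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one record per <ticker>/<year>/<file>.json found under root."""
    if not root.is_dir():
        return  # nothing downloaded yet
    for path in sorted(root.glob("*/*/*.json")):
        match = FILE_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        yield {
            "ticker": path.parent.parent.name,
            "year": path.parent.name,
            "form": match.group("form"),
            "month": match.group("month"),  # None when the name has no month suffix
            "path": str(path),
        }


# Assumed root folder name; adjust to wherever the loader wrote its output.
for record in iter_filings(Path("data")):
    print(record["ticker"], record["year"], record["form"], record["month"])
```

Each record carries the ticker, year, form type, and month (if any), which makes it straightforward to select, say, only the 10-Q files for one year.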

EXAMPLES

This loader can be used with both LangChain and LlamaIndex.

LlamaIndex

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.readers.sec_filings import SECFilingsLoader

# Download three 10-K filings for TSLA
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Forward slashes avoid backslash escape sequences and work on every platform
documents = SimpleDirectoryReader("data/TSLA/2022").load_data()
index = VectorStoreIndex.from_documents(documents)

# Queries go through a query engine built from the index
query_engine = index.as_query_engine()
response = query_engine.query("What are the risk factors of Tesla for the year 2022?")
print(response)

LangChain

from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator

from llama_index.readers.sec_filings import SECFilingsLoader

loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Forward slashes avoid backslash escape sequences and work on every platform
dir_loader = DirectoryLoader("data/TSLA/2022")

index = VectorstoreIndexCreator().from_loaders([dir_loader])
retriever = index.vectorstore.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=retriever
)

query = "What are the risk factors of Tesla for the year 2022?"
qa.run(query)
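
Both examples pass the year directory as a string. When a Windows-style path with backslashes is written in a regular Python string literal, sequences such as `\202` are interpreted as escapes, so it may be safer to build the path from its parts. A minimal sketch, assuming the `data/<ticker>/<year>` layout implied by the examples; the `year_dir` helper name is hypothetical:

```python
from pathlib import Path


def year_dir(ticker: str, year: int, root: str = "data") -> Path:
    """Build <root>/<ticker>/<year> without hand-written separators."""
    return Path(root) / ticker / str(year)


tsla_2022 = year_dir("TSLA", 2022)
print(tsla_2022.as_posix())  # prints "data/TSLA/2022"
```

Passing `str(tsla_2022)` to SimpleDirectoryReader or DirectoryLoader keeps the examples portable across operating systems.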

REFERENCES

  1. Unstructured SEC Filings API: repo link
  2. SEC Edgar Downloader: repo link
