
llama-index readers sec_filings integration


SEC DATA DOWNLOADER

pip install llama-index-readers-sec-filings

Please check out SEC-QA, a repo I am building for an SEC question-answering agent.

This repository downloads the full text of SEC filings (10-K and 10-Q). Amended documents are not supported yet, but support will be added in the near future.

Install the required dependencies:

pip install -r requirements.txt

The SEC downloader accepts five parameters:

  • tickers: a list of valid ticker symbols.
  • amount: the number of documents to download.
  • filing_type: the filing type, either 10-K or 10-Q.
  • num_workers: the number of workers used for parallelism. Downloads use multithreading at the ticker level and multiprocessing at the year level for a given ticker.
  • include_amends: whether to include amended filings.
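The num_workers description implies two levels of parallelism: one task per ticker, and inside each ticker one task per year. The following is a minimal standard-library sketch of that pattern, not the loader's actual implementation. The names download_all, download_ticker and fetch_filing_text are hypothetical, fetch_filing_text returns a placeholder string instead of downloading anything, and the sketch takes an explicit list of years for simplicity. The year-level executor is a parameter: it defaults to threads so the sketch runs anywhere, and passing ProcessPoolExecutor mirrors the multiprocessing described above (in that case, run it under an `if __name__ == "__main__":` guard).

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, List, Tuple, Type


def fetch_filing_text(job: Tuple[str, int]) -> str:
    """Hypothetical placeholder for downloading one filing; returns a dummy string."""
    ticker, year = job
    return f"{ticker}-{year}"


def download_ticker(
    ticker: str,
    years: List[int],
    num_workers: int,
    year_executor_cls: Type[Executor] = ThreadPoolExecutor,
) -> Dict[int, str]:
    """Inner level: fetch every year of one ticker in parallel."""
    jobs = [(ticker, year) for year in years]
    # fetch_filing_text is a top-level function, so it can also be pickled
    # and sent to worker processes if a ProcessPoolExecutor is passed in.
    with year_executor_cls(max_workers=num_workers) as pool:
        return dict(zip(years, pool.map(fetch_filing_text, jobs)))


def download_all(
    tickers: List[str],
    years: List[int],
    num_workers: int = 2,
    year_executor_cls: Type[Executor] = ThreadPoolExecutor,
) -> Dict[str, Dict[int, str]]:
    """Outer level: one thread per ticker, capped at num_workers."""
    thread_workers = max(1, min(len(tickers), num_workers))
    with ThreadPoolExecutor(max_workers=thread_workers) as pool:
        results = pool.map(
            lambda t: download_ticker(t, years, num_workers, year_executor_cls),
            tickers,
        )
        return dict(zip(tickers, results))
```

Threads suit the ticker level because each task mostly waits on network I/O, while separate processes at the year level can help when extracting text from a filing is CPU-bound.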

Usage

from llama_index.readers.sec_filings import SECFilingsLoader

# Download 3 TSLA 10-K filings into data/TSLA/<year>/
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

The filings are saved in the following directories and sub-directories of the data folder (the example below shows AAPL, GOOGL and TSLA):

- AAPL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_12.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_12.json
  - 2023
    - 10-Q_04.json
- GOOGL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
- TSLA
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-KA.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json

Each ticker has its own folder. 10-K filings are saved in the folder for their year, and 10-Q filings are saved in their year's folder with the month appended to the file name: for example, 10-Q_03.json is the March 10-Q filing. Amended documents (such as 10-KA.json) are also stored in their respective year.
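The naming scheme described above can be captured in a pair of small helpers. This is a sketch based only on the description and the directory tree, not code from the package: filing_path and parse_filing_name are hypothetical names, the data root matches the paths used in the examples, and the naming of amended 10-Q files (for example 10-QA_03.json) is an assumption made by analogy with 10-KA.json.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Matches "10-K.json", "10-KA.json", "10-Q_03.json" and (assumed) "10-QA_03.json".
_NAME_RE = re.compile(r"^(10-[KQ])(A?)(?:_(\d{2}))?\.json$")


def filing_path(ticker: str, year: int, filing_type: str, month: Optional[int] = None) -> str:
    """Build the relative path of a filing following the described layout."""
    # Amended forms such as "10-K/A" are stored without the slash: "10-KA.json".
    name = filing_type.replace("/", "")
    if name.startswith("10-Q"):
        if month is None:
            raise ValueError("10-Q filings need a month")
        # The month is zero-padded, so March becomes "10-Q_03.json".
        name = f"{name}_{month:02d}"
    return str(PurePosixPath("data", ticker, str(year), f"{name}.json"))


def parse_filing_name(file_name: str) -> Tuple[str, bool, Optional[int]]:
    """Split a file name into (form, is_amended, month or None)."""
    match = _NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized file name: {file_name}")
    form, amended, month = match.groups()
    return form, amended == "A", int(month) if month else None
```

For example, parse_filing_name("10-Q_03.json") yields the form "10-Q", not amended, month 3.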

EXAMPLES

This loader can be used with both LangChain and LlamaIndex.

LlamaIndex

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

from llama_index.readers.sec_filings import SECFilingsLoader

# Download the filings into data/TSLA/<year>/
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Index the 2022 filings (forward slashes work on every OS and avoid backslash escapes)
documents = SimpleDirectoryReader("data/TSLA/2022").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What are the risk factors of Tesla for the year 2022?"))
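The example above indexes a single year folder. To gather one filing type across every downloaded year instead, a small standard-library helper can glob the ticker's year folders. This is a minimal sketch assuming the data/<ticker>/<year>/ layout described earlier; collect_filings is a hypothetical name, not part of the package.

```python
from pathlib import Path
from typing import List


def collect_filings(root: str, ticker: str, pattern: str = "10-K*.json") -> List[Path]:
    """Return matching filings for `ticker` across all year folders, sorted by year then name."""
    ticker_dir = Path(root) / ticker
    # Each year is a sub-folder such as data/TSLA/2022, so glob one level below the ticker.
    # The default pattern matches both "10-K.json" and amended "10-KA.json".
    return sorted(ticker_dir.glob(f"*/{pattern}"))
```

The resulting paths could then be passed to SimpleDirectoryReader(input_files=[str(p) for p in collect_filings("data", "TSLA")]) to index all downloaded TSLA 10-K filings at once.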

LangChain

from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator

from llama_index.readers.sec_filings import SECFilingsLoader

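# Download the filings into data/TSLA/<year>/
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Load every file in the 2022 folder (forward slashes work on every OS and avoid backslash escapes)
dir_loader = DirectoryLoader("data/TSLA/2022")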

index = VectorstoreIndexCreator().from_loaders([dir_loader])
retriever = index.vectorstore.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=retriever
)

query = "What are the risk factors of Tesla for the year 2022?"
qa.run(query)


Download files

Source Distribution

llama_index_readers_sec_filings-0.2.0.tar.gz (22.4 kB)

Built Distribution

llama_index_readers_sec_filings-0.2.0-py3-none-any.whl

File hashes

Hashes for llama_index_readers_sec_filings-0.2.0.tar.gz:

  • SHA256: 6dc20cfa282cbaad876b8be265807465278fd8be3164f0ea0b3498bb487eb808
  • MD5: f9a1bb4ad7cd642ce2e3cc138869e814
  • BLAKE2b-256: 2f1d4807dd7712bd745ca6fc1ee405a6e6f05a14c9013f001157523ea5611bba

Hashes for llama_index_readers_sec_filings-0.2.0-py3-none-any.whl:

  • SHA256: 0f4c09084ae8e41fc15fcbe9f500ba4ded7f539ce2b4392dffe62f0cde7b8171
  • MD5: d8e821e60a3e4ecfa24784f34e1aae2f
  • BLAKE2b-256: f11ab1b5d1e169e7a03ab2340b65753339ae490e049af2d69471d13dbcd9ca53
