llama-index readers sec_filings integration

SEC DATA DOWNLOADER

pip install llama-index-readers-sec-filings

Please check out SEC-QA, an SEC Question Answering Agent repo that I am building.

This repository downloads the full text of SEC filings (10-K and 10-Q). Amended documents are not currently supported, but support will be added in the near future.

Install the required dependencies

pip install -r requirements.txt

The SEC downloader, SECFilingsLoader, accepts five attributes:

  • tickers: a list of valid ticker symbols
  • amount: the number of documents to download
  • filing_type: the filing type, either 10-K or 10-Q
  • num_workers: the number of workers used for concurrency. Downloads use multithreading at the ticker level and multiprocessing at the year level for a given ticker
  • include_amends: whether to include amended filings
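The two-level concurrency described for num_workers can be sketched with Python's concurrent.futures: a thread pool fans out over tickers, and each ticker fans out over years in a process pool. This is a minimal sketch under stated assumptions, not the loader's actual code: fetch_filing, fetch_ticker, and fetch_all are hypothetical names, fetch_filing is a stand-in for the real download step, and sizing both pools with num_workers is an assumption.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Dict, List, Type


def fetch_filing(ticker: str, year: int) -> str:
    # Hypothetical stand-in for the real per-year download and parse step.
    return f"{ticker}-{year}"


def fetch_ticker(
    ticker: str,
    years: List[int],
    num_workers: int,
    year_pool: Type[Executor] = ProcessPoolExecutor,
) -> List[str]:
    # Inner level: one task per year for a single ticker (processes by default).
    with year_pool(max_workers=num_workers) as pool:
        return list(pool.map(fetch_filing, [ticker] * len(years), years))


def fetch_all(
    tickers: List[str],
    years: List[int],
    num_workers: int = 2,
    year_pool: Type[Executor] = ProcessPoolExecutor,
) -> Dict[str, List[str]]:
    # Outer level: one thread per ticker, capped at num_workers (assumption).
    thread_workers = max(1, min(len(tickers), num_workers))
    with ThreadPoolExecutor(max_workers=thread_workers) as pool:
        results = pool.map(
            lambda ticker: fetch_ticker(ticker, years, num_workers, year_pool),
            tickers,
        )
        # map() preserves input order, so results line up with tickers.
        return dict(zip(tickers, results))
```

Because the inner pool defaults to processes, call fetch_all from inside an if __name__ == "__main__": block, which is required on platforms that spawn worker processes. Passing year_pool=ThreadPoolExecutor keeps everything in one process for quick tests; with either pool, fetch_all(["AAPL", "TSLA"], [2021, 2022]) returns {"AAPL": ["AAPL-2021", "AAPL-2022"], "TSLA": ["TSLA-2021", "TSLA-2022"]}. A plausible reason for the split is that downloads are I/O-bound (threads are enough), while parsing each year's document is CPU-bound (processes avoid the GIL).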

Usage

from llama_index.readers.sec_filings import SECFilingsLoader

loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

The loader saves the downloaded filings into directories organized by ticker and year, as in this example layout for three tickers:

- AAPL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_12.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_12.json
  - 2023
    - 10-Q_04.json
- GOOGL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
- TSLA
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-KA.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json

Each ticker has its own folder. 10-K data is saved in the folder for its year, and 10-Q data is saved in the folder for its year with the month added to the file name: 10-Q_03.json, for example, is the 10-Q document for March. Amended documents, such as TSLA's 2021 10-KA.json, are also stored in the folder for their year.
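The naming rule above can be written as a small function. This is a sketch that reconstructs paths from the example layout, not code from the loader: filing_path is a hypothetical helper, the data root directory is taken from the examples below, and the naming of amended 10-Q files (for example 10-QA_03.json) is an assumption, since the layout only shows an amended 10-K.

```python
from pathlib import PurePosixPath
from typing import Optional


def filing_path(
    ticker: str,
    year: int,
    filing_type: str,
    month: Optional[int] = None,
    amended: bool = False,
    root: str = "data",
) -> PurePosixPath:
    """Hypothetical helper: build the path a filing would be stored at."""
    if filing_type not in ("10-K", "10-Q"):
        raise ValueError("filing_type must be '10-K' or '10-Q'")
    # Amended filings drop the slash from "10-K/A", giving e.g. "10-KA".
    name = filing_type + ("A" if amended else "")
    if filing_type == "10-Q":
        if month is None or not 1 <= month <= 12:
            raise ValueError("10-Q files need a month between 1 and 12")
        # The month is zero-padded: March -> "_03". Ignored for 10-K files.
        name += f"_{month:02d}"
    # PurePosixPath keeps forward slashes regardless of the host OS.
    return PurePosixPath(root, ticker, str(year), f"{name}.json")
```

For example, filing_path("TSLA", 2022, "10-Q", month=3) gives data/TSLA/2022/10-Q_03.json, and filing_path("TSLA", 2021, "10-K", amended=True) gives data/TSLA/2021/10-KA.json.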

EXAMPLES

This loader can be used with both LangChain and LlamaIndex.

LlamaIndex

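from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.readers.sec_filings import SECFilingsLoader

# Download three 10-K filings for Tesla; files are saved under data/TSLA/<year>/
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Use forward slashes: in "data\TSLA\2022", Python reads "\202" as an octal escape
documents = SimpleDirectoryReader("data/TSLA/2022").load_data()
index = VectorStoreIndex.from_documents(documents)

# Current LlamaIndex versions query through a query engine, not the index itself
query_engine = index.as_query_engine()
response = query_engine.query("What are the risk factors of Tesla for the year 2022?")
print(response)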
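The example above indexes a single year folder. To index several years at once, SimpleDirectoryReader can read the ticker folder recursively and attach per-file metadata through its file_metadata callback. The sketch below derives that metadata from the directory layout; sec_file_metadata and its metadata keys are hypothetical choices, not part of the loader.

```python
from pathlib import Path
from typing import Dict


def sec_file_metadata(file_path: str) -> Dict[str, str]:
    """Hypothetical: derive ticker, year, and form from data/<TICKER>/<YEAR>/<FORM>.json."""
    path = Path(file_path)
    ticker = path.parent.parent.name
    year = path.parent.name
    # "10-Q_03" -> form "10-Q", month "03"; "10-K" and "10-KA" have no month suffix.
    form, _, month = path.stem.partition("_")
    metadata = {"file_path": str(path), "ticker": ticker, "year": year, "form": form}
    if month:
        metadata["month"] = month
    return metadata
```

It can then be passed as SimpleDirectoryReader("data/TSLA", recursive=True, file_metadata=sec_file_metadata). Supplying file_metadata is expected to replace the reader's default metadata function, so the sketch keeps file_path explicitly; the ticker, year, and form fields can then be used to filter or cite retrieved chunks.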

LangChain

from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator

from llama_index.readers.sec_filings import SECFilingsLoader

# Download three 10-K filings for Tesla; files are saved under data/TSLA/<year>/
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Use forward slashes: in "data\TSLA\2022", Python reads "\202" as an octal escape
dir_loader = DirectoryLoader("data/TSLA/2022")

# Build a vector store over the filings and answer questions with retrieval QA
index = VectorstoreIndexCreator().from_loaders([dir_loader])
retriever = index.vectorstore.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=retriever
)

query = "What are the risk factors of Tesla for the year 2022?"
print(qa.run(query))

REFERENCES

  1. Unstructured SEC Filings API: repo link
  2. SEC Edgar Downloader: repo link

Download files

Source Distribution

llama_index_readers_sec_filings-0.4.1.tar.gz (22.2 kB)

Built Distribution

llama_index_readers_sec_filings-0.4.1-py3-none-any.whl

File details

Details for the file llama_index_readers_sec_filings-0.4.1.tar.gz.

File hashes

Hashes for llama_index_readers_sec_filings-0.4.1.tar.gz
Algorithm Hash digest
SHA256 0fa9bad02f9defe66324e5a83663a8a3a10179df6374ac2c4709f36d62ace9bf
MD5 b2aab0216e6be817df86d670f439117b
BLAKE2b-256 12e47cdeea83d9b592692c05f7845cc54a6f1c266ee282872553451c57668f00

File details

Details for the file llama_index_readers_sec_filings-0.4.1-py3-none-any.whl.

File hashes

Hashes for llama_index_readers_sec_filings-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2933ad5cb033c072304b0d112a381fc76f557c3ed343a7ce7a46c7cefa46e02e
MD5 fc5509a327b779c2a849c728ea7af516
BLAKE2b-256 895c8892aa403e1213d1348f44765cf2655746be434bb49a55f909f148986308
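To check a downloaded archive against the SHA256 digests listed above, Python's standard hashlib module is enough. A minimal sketch; sha256_of and verify are hypothetical helper names:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase; normalize the expected value before comparing.
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify(Path("llama_index_readers_sec_filings-0.4.1.tar.gz"), "0fa9bad02f9defe66324e5a83663a8a3a10179df6374ac2c4709f36d62ace9bf") should return True for an intact download. pip can enforce the same check at install time through its hash-checking mode (--require-hashes, with --hash=sha256 entries in a requirements file).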
