llama-index readers sec_filings integration
SEC DATA DOWNLOADER
pip install llama-index-readers-sec-filings
Please check out SEC-QA, a repository I am building for an SEC question-answering agent.
This repository downloads the full text of SEC filings (10-K and 10-Q). Amended filings are not currently supported, but support will be added in the near future.
Install the required dependencies
pip install -r requirements.txt
The SEC Downloader expects 5 attributes:
- tickers: A list of valid ticker symbols
- amount: The number of documents to download
- filing_type: The filing type, either 10-K or 10-Q
- num_workers: The number of workers used for multithreading and multiprocessing. Multithreading is applied at the ticker level, and multiprocessing at the year level for a given ticker
- include_amends: Whether to include amended filings
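The five attributes can be checked before the loader is constructed. The following is a minimal sketch: `validate_loader_args` is a hypothetical helper that is not part of this package, and the defaults shown for `filing_type`, `num_workers`, and `include_amends` are assumptions rather than documented values.

```python
from typing import Any, Dict, List

# Filing types named in this README.
SUPPORTED_FILING_TYPES = ("10-K", "10-Q")


def validate_loader_args(
    tickers: List[str],
    amount: int,
    filing_type: str = "10-K",     # assumed default
    num_workers: int = 2,          # assumed default
    include_amends: bool = False,  # assumed default
) -> Dict[str, Any]:
    """Hypothetical helper: check the five attributes and return them as kwargs."""
    if not tickers or not all(isinstance(t, str) and t.strip() for t in tickers):
        raise ValueError("tickers must be a non-empty list of ticker symbols")
    if amount < 1:
        raise ValueError("amount must be at least 1")
    if filing_type not in SUPPORTED_FILING_TYPES:
        raise ValueError(f"filing_type must be one of {SUPPORTED_FILING_TYPES}")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        "tickers": [t.strip().upper() for t in tickers],  # normalize symbols
        "amount": amount,
        "filing_type": filing_type,
        "num_workers": num_workers,
        "include_amends": bool(include_amends),
    }


kwargs = validate_loader_args(["tsla", "AAPL"], amount=3, filing_type="10-Q")
print(kwargs["tickers"])
```

The resulting dictionary could then be passed on as `SECFilingsLoader(**kwargs)`, assuming the constructor accepts the five attribute names listed above as keyword arguments.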
Usage
from llama_index.readers.sec_filings import SECFilingsLoader
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()
It will download the data in the following directories and sub-directories
- AAPL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_12.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_12.json
  - 2023
    - 10-Q_04.json
- GOOGL
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
- TSLA
  - 2018
    - 10-K.json
  - 2019
    - 10-K.json
  - 2020
    - 10-K.json
  - 2021
    - 10-K.json
    - 10-KA.json
    - 10-Q_09.json
  - 2022
    - 10-K.json
    - 10-Q_03.json
    - 10-Q_06.json
    - 10-Q_09.json
  - 2023
    - 10-Q_03.json
Each ticker has its own folder. 10-K data is saved inside the folder for its year, and 10-Q data is saved in the folder for its year with the month appended to the file name: 10-Q_03.json holds the 10-Q data for March. Amended documents (for example, 10-KA.json) are also stored in their respective year.
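The layout described above can be read back programmatically. This is a minimal sketch that assumes file names follow the `10-K.json`, `10-KA.json`, and `10-Q_MM.json` patterns shown in the listing. `FilingFile` and `index_filings` are hypothetical names, not part of this package, and the root directory passed in is up to the caller.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

# Matches 10-K.json, 10-KA.json, 10-Q_03.json; a trailing "A" marks an amendment.
FILENAME_RE = re.compile(r"^(10-[KQ])(A)?(?:_(\d{2}))?\.json$")


@dataclass
class FilingFile:
    ticker: str
    year: int
    filing_type: str
    amended: bool
    month: Optional[int]  # set only when the file name carries a month suffix
    path: Path


def index_filings(root: Path) -> List[FilingFile]:
    """Walk root/<ticker>/<year>/<file>.json and describe each file found."""
    found = []
    for path in sorted(root.glob("*/*/*.json")):
        match = FILENAME_RE.match(path.name)
        year_dir = path.parent.name
        if match is None or not year_dir.isdigit():
            continue  # skip anything that does not fit the layout
        filing_type, amended, month = match.groups()
        found.append(
            FilingFile(
                ticker=path.parent.parent.name,
                year=int(year_dir),
                filing_type=filing_type,
                amended=amended is not None,
                month=int(month) if month else None,
                path=path,
            )
        )
    return found
```

Calling `index_filings(Path("data"))` would return one `FilingFile` per downloaded JSON file, which makes it straightforward to select, say, only the 10-Q files for a given year before indexing them.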
EXAMPLES
This loader can be used with both Langchain and LlamaIndex.
LlamaIndex
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.readers.sec_filings import SECFilingsLoader

# Download three 10-K filings for Tesla
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()

# Forward slashes avoid accidental backslash escapes in the path string
documents = SimpleDirectoryReader("data/TSLA/2022").load_data()
index = VectorStoreIndex.from_documents(documents)

# A VectorStoreIndex is queried through a query engine
query_engine = index.as_query_engine()
response = query_engine.query("What are the risk factors of Tesla for the year 2022?")
print(response)
Langchain
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator
from llama_index.readers.sec_filings import SECFilingsLoader
loader = SECFilingsLoader(tickers=["TSLA"], amount=3, filing_type="10-K")
loader.load_data()
dir_loader = DirectoryLoader("data/TSLA/2022")
index = VectorstoreIndexCreator().from_loaders([dir_loader])
retriever = index.vectorstore.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=retriever
)
query = "What are the risk factors of Tesla for the year 2022?"
qa.run(query)