Making it easier to use SEC filings.
datamule
A Python package to work with SEC filings at scale. Also includes Mulebot, an open-source chatbot for SEC data that does not require storage. Integrated with datamule's APIs and datasets.
Articles: How to deploy a financial chatbot to the internet in 5 minutes
Features
- Monitor EDGAR for new filings
- Parse textual filings into simplified HTML, interactive HTML, or structured JSON
- Download SEC filings quickly and easily
- Access datasets such as every 10-K, SIC codes, etc.
- Interact with SEC data using MuleBot
Table of Contents
- Installation
- Quick Start
- Usage
- Examples
- Known Issues
- Roadmap
- Contributing
- License
- Change Log
- Other Useful SEC Packages
Installation
Basic installation:
pip install datamule
Installation with additional features:
pip install datamule[filing_viewer] # Install with filing viewer module
pip install datamule[mulebot] # Install with MuleBot
pip install datamule[all] # Install all extras
Available extras:
- filing_viewer: Includes dependencies for the filing viewer module
- mulebot: Includes MuleBot for interacting with SEC data
- mulebot_server: Includes Flask server for running MuleBot
- all: Installs all available extras
Quick Start
import datamule as dm
downloader = dm.Downloader()
downloader.download(form='10-K', ticker='AAPL')
Package Data CSVs
- company_former_names.csv - former names of companies
- company_metadata.csv - metadata including sic classification
- company_tickers.csv - cik, ticker, name
- sec-glossary.csv - form and description
- xbrl_descriptions.csv - category fact description
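The column layout listed for company_tickers.csv (cik, ticker, name) lends itself to ticker-to-CIK lookups. The following is a minimal sketch using only the standard library. The helper name `build_ticker_index`, the presence of a header row, and the sample rows are assumptions for illustration; the file's location inside the installed package is not covered here.

```python
import csv
import io

# Illustrative sample in the documented column order (cik, ticker, name).
# The header row and these rows are assumptions, not copied from the package file.
SAMPLE_CSV = """cik,ticker,name
1318605,TSLA,Tesla Inc
320193,AAPL,Apple Inc
"""


def build_ticker_index(csv_text):
    """Map upper-cased ticker -> CIK string from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ticker"].upper(): row["cik"] for row in reader}


ticker_to_cik = build_ticker_index(SAMPLE_CSV)
print(ticker_to_cik["TSLA"])
```

CIKs are kept as strings so they can be passed straight to arguments such as cik='1318605'.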
Updating Package Data
downloader.update_company_tickers()
downloader.update_metadata()
Usage
Downloader
downloader = dm.Downloader()
Downloading Filings
Uses the EFTS API to retrieve filing locations, and the SEC API to download the filings.
download(self, output_dir='filings', return_urls=False, cik=None, ticker=None, form=None, date=None, sics=None, items=None, file_types=None)
# Download all 10-K filings for Tesla using CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')
# Download 10-K filings for multiple companies using tickers
downloader.download(form='10-K', ticker=['TSLA', 'META'], output_dir='filings')
# Download every form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')
# Download filing attachments such as information tables
downloader.download(form='13F-HR',file_types=['INFORMATION TABLE'],date=('2024-09-14','2024-09-16'))
# Download based on items
downloader.download(form='8-K',items=['8.01'])
View the SEC Filing Glossary here or download the json file here.
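Since the date argument accepts a (start, end) tuple as in the 13F-HR example, a long period can be split into smaller windows and downloaded one window at a time. This is a rough sketch, not part of datamule: `date_windows` is a hypothetical helper, and it assumes both ends of the tuple are inclusive, which the text above does not confirm.

```python
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Split [start, end] into consecutive (start, end) ISO-string tuples of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        windows.append((current.isoformat(), window_end.isoformat()))
        current = window_end + timedelta(days=1)
    return windows


windows = date_windows(date(2024, 9, 1), date(2024, 9, 16), days=7)
print(windows)

# Possible usage with an existing downloader (not executed here):
# for window in windows:
#     downloader.download(form='13F-HR', date=window, output_dir='filings')
```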
Downloading Company Concepts XBRL
Uses the Company Concepts API to retrieve XBRL.
download_company_concepts(self, output_dir = 'company_concepts',cik=None, ticker=None)
View the XBRL Fact Glossary here or as a csv file here.
Changing Rate Limits
SEC.gov officially supports 10 requests / second, but in practice downloads at that rate can still be rate limited. After heavy experimentation, the downloader's default rate limit for sec.gov has been set to 7 requests / second. If you intend to download fewer than 1,000 filings at a time, setting the rate limit to 10 should be fine. If you need to download more than 10,000 filings, setting the rate limit to 5 will likely avoid rate limiting. Downloading at off-peak times will also likely let you set higher rate limits. Experiment Details
downloader.set_limiter('www.sec.gov', 10)
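The guidance above can be encoded as a small helper that picks a rate from the expected number of filings. This is only a sketch of the rule of thumb described in this section; `suggested_rate_limit` is a hypothetical name, not a datamule function, and the middle case simply falls back to the documented default of 7.

```python
def suggested_rate_limit(n_filings):
    """Return a requests-per-second value following the README's rule of thumb."""
    if n_filings < 1000:
        return 10  # small jobs: the official limit should be fine
    if n_filings > 10000:
        return 5   # very large jobs: slow down to avoid rate limiting
    return 7       # otherwise keep the downloader's default


for n in (500, 5000, 25000):
    print(n, suggested_rate_limit(n))

# Possible usage with an existing downloader (not executed here):
# downloader.set_limiter('www.sec.gov', suggested_rate_limit(25000))
```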
Datasets
Available datasets:
- Every FTD since 2004. ftd (1.3gb, ~60s to download)
- Every 10-Q since 2001. (500mb-3gb per year, ~5 minutes to download)
- Every 10-K from 2001 to September 2024. 10k_{year} e.g. 10k_2002.
- Every 13F-HR Information Table since 2013. Up to the current date.
downloader.download_dataset(dataset='ftd')
downloader.download_dataset(dataset='10q_2023')
downloader.download_dataset(dataset='13f_information_table')
Note: Bulk datasets may become out of date. If this is the case, use download_dataset() + download() to fill the gaps.
Note: 13f_information_table will always be up to date, as it automatically implements this.
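The yearly datasets are addressed by a prefix plus a year (for example 10k_2002 and 10q_2023), so a range of years can be generated rather than typed out. A minimal sketch, assuming every year in the range exists as a dataset; `yearly_dataset_names` is a hypothetical helper.

```python
def yearly_dataset_names(prefix, first_year, last_year):
    """Build dataset names such as '10k_2002' for each year in an inclusive range."""
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    return [f"{prefix}_{year}" for year in range(first_year, last_year + 1)]


names = yearly_dataset_names("10k", 2001, 2003)
print(names)

# Possible usage with an existing downloader (not executed here):
# for name in names:
#     downloader.download_dataset(dataset=name)
```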
Monitoring for New Filings
Monitor for new filings by form, cik, or ticker, and pass in a callback function to handle them.
downloader.watch(self, interval=1, silent=True, form=None, cik=None, ticker=None, callback=None)
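The arguments that watch() passes to the callback are not documented here, so the sketch below accepts anything and simply records it. The callback name, the stored structure, and the simulated payload are all assumptions for illustration.

```python
from datetime import datetime, timezone

seen_events = []


def record_new_filings(*args, **kwargs):
    """Hypothetical callback: store whatever it is called with, plus a UTC timestamp."""
    seen_events.append({
        "received_at": datetime.now(timezone.utc).isoformat(),
        "args": args,
        "kwargs": kwargs,
    })


# Simulated invocation with made-up data, since no live monitor runs in this example.
record_new_filings({"form": "8-K", "cik": "1318605"})
print(len(seen_events))

# Possible usage with an existing downloader (not executed here):
# downloader.watch(interval=1, form='8-K', ticker='TSLA', callback=record_new_filings)
```

Accepting `*args, **kwargs` keeps the callback usable whatever the actual call shape turns out to be; tighten the signature once it is known.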
Parsing
Parse SEC XBRL
Parses SEC XBRL in JSON format into tables. See Parse every SEC XBRL to csv in ten minutes
from datamule import parse_company_concepts
table_dict_list = parse_company_concepts(company_concepts) # Returns a list of tables with labels
Parse Textual Filings into structured data
Parse textual filings into different formats. Uses the datamule parser endpoint. If it is too slow for your use case, let me know. A faster endpoint is coming soon.
# Simplified HTML
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='simplify')
# Interactive HTML
interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='interactive')
# JSON
json_data = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
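Because parsing goes through a remote endpoint, it can be worth saving each result to disk rather than requesting it again. The helper below is a sketch only: `save_parsed` is a hypothetical name, and it assumes the JSON return type arrives as a dict or list while the HTML return types arrive as text, which is not confirmed above.

```python
import json
import tempfile
from pathlib import Path


def save_parsed(result, path):
    """Write a parsed result to `path`: JSON for dicts/lists, plain text otherwise."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if isinstance(result, (dict, list)):
        path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    else:
        path.write_text(str(result), encoding="utf-8")
    return path


# Demonstration with stand-in data instead of a real parser response.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_parsed({"title": "example"}, Path(tmp) / "parsed" / "filing.json")
    print(saved.read_text(encoding="utf-8"))
```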
Filing Viewer
Convert parsed filing JSON into HTML with features like a table of contents sidebar:
from datamule import parse_textual_filing
from datamule.filing_viewer import create_interactive_filing
data = parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
create_interactive_filing(data)
Try out the Filing Viewer here. Note: This is an older version with bugs that will be updated with the next release of the Parser API.
Mulebot
Interact with SEC data using MuleBot. Mulebot uses tool calling to interface with SEC and datamule endpoints.
from datamule.mulebot import MuleBot
mulebot = MuleBot(openai_api_key)
mulebot.run()
To use Mulebot you will need an OpenAI API Key.
Mulebot Server
Mulebot server is a customizable front-end for Mulebot. Example
Artifacts:
- Filing Viewer
- Company Facts Viewer
- List Viewer
Quickstart
from datamule.mulebot.mulebot_server import MuleBotServer
def main():
    server = MuleBotServer()
    # Your OpenAI API key
    api_key = "sk-<YOUR_API_KEY>"
    server.set_api_key(api_key)
    # Run the server
    print("Starting MuleBotServer...")
    server.run(debug=True, host='0.0.0.0', port=5000)

if __name__ == "__main__":
    main()
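Rather than pasting the key into the script, it can be read from the environment. This is a minimal sketch: `load_openai_key` is a hypothetical helper, and the variable name OPENAI_API_KEY is a common convention assumed here, not something datamule requires.

```python
import os


def load_openai_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from an environment mapping, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before starting the server")
    return key


# Demonstration with an explicit mapping so the example does not depend on your shell.
print(load_openai_key({"OPENAI_API_KEY": "sk-example"}))

# Possible usage with the server (not executed here):
# server.set_api_key(load_openai_key())
```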
Known Issues
- Some SEC files are malformed, which can cause parsing errors. For example, this Tesla Form D HTML from 2009 is missing a closing </meta> tag. Workaround:
from lxml import etree
with open('filings/000131860509000005primary_doc.xml', 'r', encoding='utf-8') as file:
    html = etree.parse(file, etree.HTMLParser())
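If lxml is not available, the standard library's html.parser is similarly tolerant of unclosed tags. The sketch below swaps in that stdlib parser for the lxml one in the workaround; the sample markup and the `MetaCollector` class are made up for illustration and are not taken from the Tesla filing.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> tag without requiring closing tags."""

    def __init__(self):
        super().__init__()
        self.meta_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.meta_tags.append(dict(attrs))


# Made-up markup with an unclosed <meta> tag, standing in for a malformed filing.
malformed = (
    '<html><head><meta name="form" content="D"><title>Form D</title></head>'
    "<body><p>Notice</p></body></html>"
)

collector = MetaCollector()
collector.feed(malformed)
collector.close()
print(collector.meta_tags)
```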
Roadmap
- add documentation for filing and parser modules
- add current names to former names
- Need to make conductor more robust. We have new options now including desc / asc
- add facet filters for forms etc
- sec search engine
- mulebot add method to use custom html templates
- mulebot - look at adding summarization. Add some protections to too many tokens being used + add options to allow summarization etc.
- Paths may be messed up on non-Windows devices. Need to verify.
- Analytics?
- downloader successful downloads message may be slightly off.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Change Log
Other Useful SEC Packages