Making it easier to use SEC filings.
datamule
A Python package to simplify working with SEC filings. It also includes Mulebot, an open-source chatbot for SEC data that does not require storage and is integrated with datamule's APIs and datasets.
Features
- Monitor EDGAR for new filings
- Parse textual filings into simplified HTML, interactive HTML, or structured JSON
- Download SEC filings quickly and easily
- Access datasets such as every MD&A from 2024 or every 2024 10-K converted to structured JSON
- Interact with SEC data using MuleBot (coming soon)
Table of Contents
- Features
- Installation
- Quick Start
- Usage
- Datasets
- Examples
- Known Issues
- Roadmap
- Contributing
- License
- Change Log
- Other Useful SEC Packages
Installation
Basic installation:
pip install datamule
Installation with additional features:
pip install datamule[filing_viewer] # Install with filing viewer module
pip install datamule[mulebot] # Install with MuleBot (coming soon)
pip install datamule[all] # Install all extras
Available extras:
- filing_viewer: Includes dependencies for the filing viewer module
- mulebot: Includes MuleBot for interacting with SEC data (coming soon)
- mulebot_server: Includes Flask server for running MuleBot (coming soon)
- all: Installs all available extras
Quick Start
import datamule as dm
downloader = dm.Downloader()
downloader.download(form='10-K', ticker='AAPL')
Usage
Downloader
The Downloader class uses the EFTS API to retrieve filings.
Downloading Filings
downloader = dm.Downloader()
# Download all 10-K filings for Tesla using CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')
# Download 10-K filings for multiple companies using tickers
downloader.download(form='10-K', ticker=['TSLA', 'META'], output_dir='filings')
# Download every form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')
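The examples in this README pass CIKs both unpadded ('1318605') and zero-padded to ten digits ('0001267602'). Whether the Downloader accepts both forms is not stated here, so the sketch below shows one way to normalize a CIK before passing it along. The helper normalize_cik is hypothetical and not part of datamule.

```python
def normalize_cik(cik):
    """Return a CIK as a 10-character, zero-padded string.

    Hypothetical helper for illustration; not part of datamule.
    """
    digits = str(cik).strip()
    if not digits.isdigit():
        raise ValueError(f"CIK must contain only digits, got {cik!r}")
    if len(digits) > 10:
        raise ValueError(f"CIK has more than 10 digits: {cik!r}")
    return digits.zfill(10)


print(normalize_cik("1318605"))  # 0001318605
print(normalize_cik(1267602))    # 0001267602
```

The normalized string can then be passed as the cik argument, assuming the padded form is accepted there as it is in the watch example.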
Monitoring for New Filings
print("Monitoring SEC EDGAR for changes...")
changed_bool = downloader.watch(1, silent=False, cik=['0001267602', '0001318605'], form=['3', 'S-8 POS'])
if changed_bool:
    print("New filing detected!")
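The snippet above does not say whether watch blocks until a filing arrives or returns after a single check. As a rough sketch, assuming a check that returns a truthy value when something changed, a bounded polling loop could look like the following. The function poll_until_changed and the stand-in check are hypothetical; in real use the check might wrap a call such as downloader.watch(...).

```python
import time


def poll_until_changed(check, max_attempts=3, delay_seconds=0.0):
    """Call check() until it returns a truthy value or attempts run out.

    Returns the 1-based attempt number on success, or None if no change
    was seen. Hypothetical helper; not part of datamule.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # wait before the next check
    return None


# Stand-in for a real check: reports a change on the third call.
results = iter([False, False, True])
attempt = poll_until_changed(lambda: next(results))
print(f"Change detected on attempt {attempt}")
```

Capping the number of attempts keeps the loop from running forever if no filing ever appears.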
Parsing
Parse textual filings into different formats using the datamule parser endpoint. If it is too slow for your use case, let me know; a faster endpoint is coming soon.
# Simplified HTML
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='simplify')
# Interactive HTML
interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='interactive')
# JSON
json_data = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
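The url argument in these examples follows the usual EDGAR archive layout: the CIK without leading zeros, the accession number without dashes, then the document file name. A minimal sketch for building such a URL is shown below. The helper edgar_document_url is hypothetical and not part of datamule, and the dashed accession number is inferred from the URL used above.

```python
def edgar_document_url(cik, accession_number, filename):
    """Build an EDGAR archive URL for one document in a filing.

    Hypothetical helper for illustration; not part of datamule.
    """
    cik_part = str(int(cik))                            # drop leading zeros
    accession_part = accession_number.replace("-", "")  # drop dashes
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{cik_part}/{accession_part}/{filename}"
    )


url = edgar_document_url("0001318605", "0000950170-22-000796", "tsla-20211231.htm")
print(url)
```

The resulting string can be passed as url to parse_textual_filing instead of repeating the full literal in each call.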
Filing Viewer
Convert parsed filing JSON into HTML with features like a table of contents sidebar:
from datamule import parse_textual_filing
from datamule.filing_viewer import create_interactive_html
data = parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
create_interactive_html(data, 'output_path.html')
Mulebot
Interact with SEC data using MuleBot (coming soon). Mulebot uses tool calling to interface with SEC and datamule endpoints.
Features (coming soon):
- Interface with XBRL company facts
- Interface with 10-Ks and other filings, and summarize sections.
To use Mulebot you will need an OpenAI API Key.
Mulebot Server
Mulebot server is a customizable front-end for Mulebot.
Features (coming soon):
- Display XBRL tables with download / copy button.
- Download XBRL tables for a specific company in ZIP.
- Display sections of filings, like MD&A with links to filing viewer and original.
Quickstart
from datamule.mulebot.mulebot_server import server

def main():
    # Your OpenAI API key
    api_key = "sk-<YOUR_API_KEY>"
    server.set_api_key(api_key)
    # Run the server
    # Note: debug=True with host='0.0.0.0' listens on all interfaces in debug mode;
    # use these settings only on a trusted network.
    print("Starting MuleBotServer...")
    server.run(debug=True, host='0.0.0.0', port=5000)

if __name__ == "__main__":
    main()
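The quickstart hardcodes the API key for brevity. One common alternative, sketched below, is to read it from an environment variable so the key stays out of source control. The helper load_api_key and the variable name OPENAI_API_KEY are assumptions for illustration, not part of datamule.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Read an API key from the environment, failing loudly if it is missing.

    Hypothetical helper; `env` can be any mapping, which makes it easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Example with an explicit mapping instead of the real environment.
example_key = load_api_key({"OPENAI_API_KEY": " sk-example "})
print(example_key)
```

In the quickstart, the result would then be passed along as server.set_api_key(load_api_key()).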
Datasets
Access parsed datasets:
downloader = dm.Downloader()
# Download all 2024 10-K filings converted to JSON
downloader.download_dataset('10K')
# Download all MD&As extracted from 2024 10-K filings
downloader.download_dataset('MDA')
Known Issues
- Some SEC files are malformed, which can cause parsing errors. For example, this Tesla Form D HTML from 2009 is missing a closing </meta> tag. Workaround:
from lxml import etree
with open('filings/000131860509000005primary_doc.xml', 'r', encoding='utf-8') as file:
    html = etree.parse(file, etree.HTMLParser())
Roadmap
- Add Mulebot
- Downloader refactor and integrate XBRL downloads
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Change Log
Other Useful SEC Packages