Making it easier to use SEC filings.

datamule

A Python package for working with SEC filings at scale. Also includes MuleBot, an open-source chatbot for SEC data that requires no storage, integrated with datamule's APIs and datasets.

Articles: How to deploy a financial chatbot to the internet in 5 minutes

Features

  • Monitor EDGAR for new filings
  • Parse textual filings into simplified HTML, interactive HTML, or structured JSON
  • Download SEC filings quickly and easily
  • Access datasets such as every 10-K since 2001, 2024 MD&A sections, 2024 10-Ks converted to structured JSON, and more
  • Interact with SEC data using MuleBot

Installation

Basic installation:

pip install datamule

Installation with additional features:

pip install datamule[filing_viewer]  # Install with filing viewer module
pip install datamule[mulebot]  # Install with MuleBot
pip install datamule[all]  # Install all extras

(On zsh, quote the extras, e.g. pip install 'datamule[all]'.)

Available extras:

  • filing_viewer: Includes dependencies for the filing viewer module
  • mulebot: Includes MuleBot for interacting with SEC data
  • mulebot_server: Includes Flask server for running MuleBot
  • all: Installs all available extras

Quick Start

import datamule as dm

downloader = dm.Downloader()
downloader.download(form='10-K', ticker='AAPL')

Usage

Downloader

downloader = dm.Downloader()

Downloading Filings

Uses the EFTS API to retrieve filing locations, and the SEC API to download the filings.

download(self, output_dir='filings', return_urls=False, cik=None, ticker=None, form=None, date=None, sics=None, items=None)

# Download all 10-K filings for Tesla using its CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')

# Download 10-K filings for multiple companies using tickers
downloader.download(form='10-K', ticker=['TSLA', 'META'], output_dir='filings')

# Download every form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')

View the SEC Filing Glossary here or download the JSON file here.

Downloading Company Concepts XBRL

Uses the Company Concepts API to retrieve XBRL.

download_company_concepts(self, output_dir='company_concepts', cik=None, ticker=None)

View the XBRL Fact Glossary here or as a CSV file here.

Changing Rate Limits

SEC.gov officially supports 10 requests per second, but in practice this limit does not hold. After heavy experimentation, the downloader's default rate limit for sec.gov has been set to 7 requests per second. If you intend to download fewer than 1,000 filings at a time, setting the rate limit to 10 should be fine; if you need to download more than 10,000 filings, setting it to 5 will likely avoid rate limiting. Downloading at off-peak times will also likely let you set higher rate limits. Experiment Details

downloader.set_limiter('www.sec.gov', 10)
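The 7 requests/second default amounts to spacing requests roughly 143 ms apart. The pacing can be sketched in plain Python (a minimal illustration only, not datamule's internal rate limiter):

```python
class RateLimiter:
    """Minimal fixed-rate pacer: at most `rate` requests per second.
    Illustration only -- not datamule's internal implementation."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.next_allowed = 0.0

    def wait_time(self, now):
        """Return how long to sleep before issuing the next request at time `now`."""
        delay = max(0.0, self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay


limiter = RateLimiter(rate=7)            # default sec.gov rate
print(limiter.wait_time(0.0))            # 0.0 -- first request goes immediately
print(round(limiter.wait_time(0.0), 3))  # 0.143 -- next request waits ~1/7 s
```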

Datasets

Available datasets:

  • Every FTD since 2004: ftd (1.3 GB, ~60 s to download)
  • Every 10-K from 2001 to September 2024: 10k_{year}, e.g. 10k_2002 (speed depends on Zenodo's servers; working on finding a better host)

downloader.download_dataset(dataset='ftd', dataset_path='datasets')

Note: I've signed up for a Dropbox Plus account and will be migrating the Zenodo datasets there. The target is for downloading every 10-K to take less than five minutes.

Monitoring for New Filings

print("Monitoring SEC EDGAR for changes...")
changed_bool = downloader.watch(1, silent=False, cik=['0001267602', '0001318605'], form=['3', 'S-8 POS'])
if changed_bool:
    print("New filing detected!")
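watch() returns once, so continuous monitoring means wrapping it in a loop. The helper below is a generic polling sketch (not part of datamule) with the check function injected, so it can be demonstrated without network access:

```python
import time


def poll(check, interval=60, max_polls=None):
    """Call `check()` every `interval` seconds until it returns True.
    Returns the number of polls performed. Generic sketch of the loop
    you might wrap around downloader.watch()."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if check():
            return polls
        time.sleep(interval)
    return polls


# Deterministic demo: a "new filing" appears on the third check.
results = iter([False, False, True])
print(poll(lambda: next(results), interval=0))  # 3
```

In real use, `check` would be a call like `lambda: downloader.watch(1, silent=True, form=['3'])`.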

Parsing

Parse SEC XBRL

Parses SEC XBRL in JSON format into tables. See Parse every SEC XBRL to CSV in ten minutes.

from datamule import parse_company_concepts
table_dict_list = parse_company_concepts(company_concepts) # Returns a list of tables with labels
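Each parsed table can then be written out with the standard csv module. The `columns`/`rows` keys below are illustrative assumptions about the table shape, not datamule's documented schema:

```python
import csv
import io

# Hypothetical parsed table -- field names are assumptions for illustration only.
table = {
    "label": "EntityCommonStockSharesOutstanding",
    "columns": ["end", "val"],
    "rows": [["2023-12-31", 3178921825], ["2024-06-30", 3168000000]],
}

# Serialize the table to CSV text (write to a file in real use).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(table["columns"])
writer.writerows(table["rows"])
csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # end,val
```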

Parse Textual Filings into structured data

Parse textual filings into different formats using the datamule parser endpoint. If it is too slow for your use case, let me know; a faster endpoint is coming soon.

# Simplified HTML
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='simplify')

# Interactive HTML
interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='interactive')

# JSON
json_data = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')

Filing Viewer

Convert parsed filing JSON into HTML with features like a table of contents sidebar:

from datamule import parse_textual_filing
from datamule.filing_viewer import create_interactive_filing

data = parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
create_interactive_filing(data)

Try out the Filing Viewer here. Note: this is an older version with bugs that will be fixed in the next release of the Parser API.

Mulebot

Interact with SEC data using MuleBot. MuleBot uses tool calling to interface with SEC and datamule endpoints.

from datamule.mulebot import MuleBot

mulebot = MuleBot(openai_api_key)  # openai_api_key is your OpenAI API key string
mulebot.run()

To use MuleBot, you will need an OpenAI API key.

Mulebot Server

MuleBot Server is a customizable front-end for MuleBot. Example

Artifacts:

  • Filing Viewer
  • Company Facts Viewer
  • List Viewer

Quickstart

from datamule.mulebot.mulebot_server import MuleBotServer

def main():
    server = MuleBotServer()

    # Your OpenAI API key
    api_key = "sk-<YOUR_API_KEY>"
    server.set_api_key(api_key)

    # Run the server
    print("Starting MuleBotServer...")
    server.run(debug=True, host='0.0.0.0', port=5000)

if __name__ == "__main__":
    main()

Known Issues

This is currently a low-priority issue. Let me know if you need the data, and I'll move it up the priority list.

Roadmap

  • Zenodo partitioning for better download speeds
  • Enable searching across downloaded filings
  • MuleBot: add a method for using custom HTML templates
  • MuleBot: look at adding summarization, with protections against excessive token use and options to enable or disable it
  • Verify that file paths work correctly on non-Windows devices
  • Analytics?
  • The downloader's successful-downloads message may be slightly off

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

Change Log



Other Useful SEC Packages
