Making it easier to use SEC filings.


datamule


A Python package for working with SEC filings at scale. Also includes MuleBot, an open-source chatbot for SEC data that requires no local storage. Integrated with datamule's APIs and datasets.

Features

  • Monitor EDGAR for new filings
  • Parse textual filings into simplified HTML, interactive HTML, or structured JSON
  • Download SEC filings quickly and easily
  • Access datasets such as every MD&A from 2024 or every 2024 10-K converted to structured JSON
  • Interact with SEC data using MuleBot (coming soon)

Installation

Basic installation:

pip install datamule

Installation with additional features:

pip install datamule[filing_viewer]  # Install with filing viewer module
pip install datamule[mulebot]  # Install with MuleBot (coming soon)
pip install datamule[all]  # Install all extras

Available extras:

  • filing_viewer: Includes dependencies for the filing viewer module
  • mulebot: Includes MuleBot for interacting with SEC data (coming soon)
  • mulebot_server: Includes Flask server for running MuleBot (coming soon)
  • all: Installs all available extras

Quick Start

import datamule as dm

downloader = dm.Downloader()
downloader.download(form='10-K', ticker='AAPL')

Usage

Downloader

Download speed is close to the theoretical maximum set by SEC rate limits.

downloader = dm.Downloader()
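
For context, the SEC's fair-access guidelines cap automated clients at roughly 10 requests per second, which is the ceiling the claim above refers to. Below is a minimal, illustrative pacing sketch (not datamule's actual implementation) showing how a client could stay under such a cap:

```python
import time

class RateLimiter:
    """Spaces out calls so they never exceed `rate` calls per second."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.last_call = 0.0

    def acquire(self):
        # Sleep just long enough to keep the gap between calls >= min_interval
        now = time.monotonic()
        wait = self.min_interval - (now - self.last_call)
        if wait > 0:
            time.sleep(wait)
        self.last_call = time.monotonic()

limiter = RateLimiter(rate=10)  # the SEC's ~10 requests/second ceiling
```

Calling `limiter.acquire()` before each HTTP request keeps total throughput at or below the cap.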

Downloading Filings

Uses the EFTS API to retrieve filings. I am considering renaming `download` to `download_filings`.

download(self, output_dir='filings', return_urls=False, cik=None, ticker=None, form=None, date=None)
# Download all 10-K filings for Tesla using CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')

# Download 10-K filings for multiple companies using tickers
downloader.download(form='10-K', ticker=['TSLA', 'META'], output_dir='filings')

# Download every form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')
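
For orientation, EDGAR's full-text search backend lives at `https://efts.sec.gov/LATEST/search-index`. The sketch below builds a query URL with the parameter names exposed by the public search UI (`forms`, `startdt`, `enddt`); datamule's internal request construction may differ:

```python
from urllib.parse import urlencode

# Endpoint behind EDGAR full-text search; parameter names follow the public UI.
EFTS_BASE = 'https://efts.sec.gov/LATEST/search-index'

def build_efts_url(form=None, startdt=None, enddt=None):
    """Assemble an EFTS query URL from optional form-type and date filters."""
    params = {}
    if form:
        params['forms'] = form
    if startdt:
        params['startdt'] = startdt
    if enddt:
        params['enddt'] = enddt
    return f'{EFTS_BASE}?{urlencode(params)}'

# All form 3 filings on a single date
url = build_efts_url(form='3', startdt='2024-05-21', enddt='2024-05-21')
```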

Downloading Company Concepts XBRL

Uses the Company Concepts API to retrieve XBRL.

download_company_concepts(self, output_dir='company_concepts', cik=None, ticker=None)
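
The SEC endpoints behind this keying scheme use a zero-padded 10-digit CIK. As a sketch (ticker-to-CIK resolution omitted; the URL formats below are the SEC's documented Company Facts and Company Concept endpoints, not necessarily datamule's exact code path):

```python
def company_facts_url(cik) -> str:
    """SEC Company Facts URL for a CIK, zero-padded to 10 digits."""
    return f'https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json'

def company_concept_url(cik, taxonomy: str, tag: str) -> str:
    """SEC Company Concept URL for a single XBRL tag, e.g. us-gaap/Assets."""
    return (f'https://data.sec.gov/api/xbrl/companyconcept/'
            f'CIK{int(cik):010d}/{taxonomy}/{tag}.json')

print(company_facts_url('1318605'))
# https://data.sec.gov/api/xbrl/companyfacts/CIK0001318605.json
```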

Datasets

Available datasets:

  • 2024 10-K filings converted to JSON (10K)
  • Management's Discussion and Analysis (MD&A) sections extracted from 2024 10-K filings (MDA)
  • Every company's Company Concepts XBRL (XBRL)

Also available on Dropbox.

# Download all 2024 10-K filings converted to JSON
downloader.download_dataset('10K')

Monitoring for New Filings

print("Monitoring SEC EDGAR for changes...")
# First argument is the polling interval in seconds
changed_bool = downloader.watch(1, silent=False, cik=['0001267602', '0001318605'], form=['3', 'S-8 POS'])
if changed_bool:
    print("New filing detected!")

Parsing

Parse SEC XBRL

Parses SEC XBRL in JSON format into tables. See "Parse every SEC XBRL to CSV in ten minutes."

from datamule import parse_company_concepts
table_dict_list = parse_company_concepts(company_concepts) # Returns a list of tables with labels
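To illustrate the kind of flattening this involves, here is a rough, standalone sketch (not datamule's implementation) that turns the nested shape of SEC company-facts JSON into flat rows; the sample data is made up:

```python
# Trimmed sample in the shape of SEC company-facts JSON; values are invented.
sample = {
    "entityName": "Example Corp",
    "facts": {
        "us-gaap": {
            "Assets": {
                "label": "Assets",
                "units": {"USD": [
                    {"end": "2023-12-31", "val": 1000},
                    {"end": "2024-12-31", "val": 1200},
                ]},
            }
        }
    },
}

def flatten_facts(facts_json):
    """Flatten nested company facts into (taxonomy, tag, unit, end, val) rows."""
    rows = []
    for taxonomy, tags in facts_json["facts"].items():
        for tag, detail in tags.items():
            for unit, observations in detail["units"].items():
                for obs in observations:
                    rows.append((taxonomy, tag, unit, obs["end"], obs["val"]))
    return rows

rows = flatten_facts(sample)
# rows[0] == ("us-gaap", "Assets", "USD", "2023-12-31", 1000)
```

Each row maps straight onto a `csv.writer` call, which is essentially the XBRL-to-CSV workflow the article above describes.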

Parse Textual Filings into Structured Data

Parse textual filings into different formats. Uses the datamule parser endpoint. If it is too slow for your use case, let me know; a faster endpoint is coming soon.

# Simplified HTML
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='simplify')

# Interactive HTML
interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='interactive')

# JSON
json_data = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')

Filing Viewer

Convert parsed filing JSON into HTML with features like a table of contents sidebar:

from datamule import parse_textual_filing
from datamule.filing_viewer import create_interactive_html

data = parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
create_interactive_html(data, 'output_path.html')

(Screenshot: interactive filing viewer)

MuleBot

Interact with SEC data using MuleBot (coming soon). MuleBot uses tool calling to interface with SEC and datamule endpoints.

Features (coming soon):

  • Interface with XBRL company facts
  • Interface with 10-Ks and other filings, and summarize sections

To use MuleBot, you will need an OpenAI API key.

MuleBot Server

The MuleBot server is a customizable front end for MuleBot.

Features (coming soon):

  • Display XBRL tables with download and copy buttons.
  • Download XBRL tables for a specific company as a ZIP.
  • Display sections of filings, such as the MD&A, with links to the filing viewer and the original filing.

Quickstart

from datamule.mulebot.mulebot_server import server

def main():
    # Your OpenAI API key
    api_key = "sk-<YOUR_API_KEY>"
    server.set_api_key(api_key)

    # Run the server
    print("Starting MuleBotServer...")
    server.run(debug=True, host='0.0.0.0', port=5000)

if __name__ == "__main__":
    main()

Known Issues

This is currently a low-priority issue. Let me know if you need the data, and I'll move it up the priority list.

Roadmap

  • Add MuleBot: add a table artifact (the current issue is bad data input), plus section and filing viewers
  • Refactor the Downloader and integrate XBRL downloads
  • Add sec_fetch (either urllib or requests should be fastest) and integrate it into MuleBot
  • Verify that paths work correctly on non-Windows devices
  • Analytics?

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

Change Log

See the Change Log in the repository.



Download files

Download the file for your platform.

Source Distribution

datamule-0.314.tar.gz (188.6 kB)


Built Distribution


datamule-0.314-py3-none-any.whl (189.2 kB)


File details

Details for the file datamule-0.314.tar.gz.

File metadata

  • Download URL: datamule-0.314.tar.gz
  • Upload date:
  • Size: 188.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.5

File hashes

Hashes for datamule-0.314.tar.gz

  • SHA256: daef17094213f4104c475f3fdd1912e73a17b033f500024508e96faa92fbd6bf
  • MD5: 3773f3f1236a9febb297446c972e2953
  • BLAKE2b-256: b72bfcea665a20fd56a175abce54bb50efadb63c5245395fc8658b6df89f34cd


File details

Details for the file datamule-0.314-py3-none-any.whl.

File metadata

  • Download URL: datamule-0.314-py3-none-any.whl
  • Upload date:
  • Size: 189.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.5

File hashes

Hashes for datamule-0.314-py3-none-any.whl

  • SHA256: c76e7b51373f279d9a3969323c2ec6ac5e3338521e4f83a261152a5ba9b143a3
  • MD5: a55ff73b7500715c16ae689f293356c7
  • BLAKE2b-256: e4ec105e0e192db045e2084f1130203958a8d3e7653d59749588a8a27cc8f5b4

