
Making it easier to use SEC filings.


datamule


A Python package that simplifies working with SEC filings. It also includes MuleBot, an open-source chatbot for SEC data that requires no local storage and is integrated with datamule's APIs and datasets.

Features

  • Monitor EDGAR for new filings
  • Parse textual filings into simplified HTML, interactive HTML, or structured JSON
  • Download SEC filings quickly and easily
  • Access datasets such as every MD&A from 2024 or every 2024 10-K converted to structured JSON
  • Interact with SEC data using MuleBot (coming soon)


Installation

Basic installation:

pip install datamule

Installation with additional features:

pip install datamule[filing_viewer]  # Install with filing viewer module
pip install datamule[mulebot]  # Install with MuleBot (coming soon)
pip install datamule[all]  # Install all extras

Available extras:

  • filing_viewer: Includes dependencies for the filing viewer module
  • mulebot: Includes MuleBot for interacting with SEC data (coming soon)
  • mulebot_server: Includes Flask server for running MuleBot (coming soon)
  • all: Installs all available extras

Quick Start

import datamule as dm

downloader = dm.Downloader()
downloader.download(form='10-K', ticker='AAPL')

Usage

Downloader

The Downloader class uses the SEC's EDGAR full-text search (EFTS) API to locate and retrieve filings.
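Under the hood, queries against that API look like ordinary HTTP requests to efts.sec.gov. A minimal sketch of composing such a query — the parameter names here mirror the public EDGAR full-text search UI and are assumptions, not datamule internals:

```python
from urllib.parse import urlencode

# Endpoint used by EDGAR full-text search (the service the Downloader
# is described as using).
EFTS_URL = "https://efts.sec.gov/LATEST/search-index"

def build_efts_query(form=None, ciks=None, date=None):
    """Compose a query URL for an EDGAR full-text search request."""
    params = {}
    if form:
        params["forms"] = form
    if ciks:
        params["ciks"] = ",".join(ciks) if isinstance(ciks, list) else ciks
    if date:
        # A single-day query is expressed as a custom range.
        params["dateRange"] = "custom"
        params["startdt"] = date
        params["enddt"] = date
    return f"{EFTS_URL}?{urlencode(params)}"

print(build_efts_query(form="10-K", ciks="1318605"))
```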

Downloading Filings

downloader = dm.Downloader()

# Download all 10-K filings for Tesla using CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')

# Download 10-K filings for multiple companies using tickers
downloader.download(form='10-K', ticker=['TSLA', 'META'], output_dir='filings')

# Download every form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')

Monitoring for New Filings

print("Monitoring SEC EDGAR for changes...")
# watch() returns True once a new filing matching the filters is detected
changed_bool = downloader.watch(1, silent=False, cik=['0001267602', '0001318605'], form=['3', 'S-8 POS'])
if changed_bool:
    print("New filing detected!")

Parsing

Parse textual filings into different formats. Parsing uses the datamule parser endpoint; if it is too slow for your use case, let me know. A faster endpoint is coming soon.

# Simplified HTML
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='simplify')

# Interactive HTML
interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='interactive')

# JSON
json_data = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')

Filing Viewer

Convert parsed filing JSON into HTML with features like a table of contents sidebar:

from datamule import parse_textual_filing
from datamule.filing_viewer import create_interactive_html

data = parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
create_interactive_html(data, 'output_path.html')

Mulebot

Interact with SEC data using MuleBot (coming soon). MuleBot uses tool calling to interface with SEC and datamule endpoints.

Features (coming soon):

  • Interface with XBRL company facts
  • Interface with filings such as 10-Ks and summarize their sections

To use MuleBot you will need an OpenAI API key.
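Tool calling generally works by advertising a JSON schema for each function to the model, then dispatching on the tool name and arguments the model returns. A hypothetical sketch of what such a tool table could look like — get_company_facts and its parameters are illustrative, not MuleBot's actual tools:

```python
# Hypothetical tool table in the OpenAI tool-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_company_facts",
            "description": "Fetch XBRL company facts for a CIK.",
            "parameters": {
                "type": "object",
                "properties": {"cik": {"type": "string"}},
                "required": ["cik"],
            },
        },
    }
]

def dispatch(tool_name, arguments, registry):
    """Route a model-requested tool call to a local Python function."""
    return registry[tool_name](**arguments)

# Illustrative local implementation a model's tool call would hit.
registry = {"get_company_facts": lambda cik: {"cik": cik, "facts": {}}}
```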

Mulebot Server

MuleBot Server is a customizable web front-end for MuleBot.

Features (coming soon):

  • Display XBRL tables with a download/copy button.
  • Download all XBRL tables for a specific company as a ZIP archive.
  • Display sections of filings, such as the MD&A, with links to the filing viewer and the original filing.

Quickstart

from datamule.mulebot.mulebot_server import server

def main():
    # Your OpenAI API key
    api_key = "sk-<YOUR_API_KEY>"
    server.set_api_key(api_key)

    # Run the server
    print("Starting MuleBotServer...")
    server.run(debug=True, host='0.0.0.0', port=5000)

if __name__ == "__main__":
    main()

Datasets

Access parsed datasets:

downloader = dm.Downloader()

# Download all 2024 10-K filings converted to JSON
downloader.download_dataset('10K')

# Download all MD&As extracted from 2024 10-K filings
downloader.download_dataset('MDA')
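Once downloaded, a dataset can be iterated like any directory of JSON documents. A sketch, assuming the dataset lands as .json files in a local folder (the path and document structure here are assumptions, not datamule's guaranteed layout):

```python
import json
from pathlib import Path

def iter_dataset(directory):
    """Yield (filename, parsed JSON) for every .json file in a
    downloaded dataset directory."""
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            yield path.name, json.load(f)

# for name, doc in iter_dataset("datasets/10K"):
#     print(name, list(doc)[:3])  # peek at each document's top-level keys
```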

Known Issues

  • Some SEC files are malformed, which can cause parsing errors. For example, a 2009 Tesla Form D filing is missing a closing </meta> tag.

    Workaround:

    from lxml import etree
    
    # lxml's HTMLParser is forgiving and repairs malformed markup,
    # so the missing closing tag does not raise an error
    with open('filings/000131860509000005primary_doc.xml', 'r', encoding='utf-8') as file:
        html = etree.parse(file, etree.HTMLParser())
    

Roadmap

  • Add Mulebot
  • Refactor the Downloader and integrate XBRL downloads

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

Change Log



