Making it easier to use SEC filings.
datamule
A Python package for working with SEC filings at scale. Also includes MuleBot, an open-source chatbot for SEC data that does not require storage, integrated with datamule's APIs and datasets.
Features
- Monitor EDGAR for new filings
- Parse textual filings into simplified HTML, interactive HTML, or structured JSON
- Download SEC filings quickly and easily
- Access datasets such as every MD&A from 2024 or every 2024 10-K converted to structured JSON
- Interact with SEC data using MuleBot (coming soon)
Table of Contents
- Features
- Installation
- Quick Start
- Usage
- Examples
- Known Issues
- Roadmap
- Contributing
- License
- Change Log
- Other Useful SEC Packages
Installation
Basic installation:
```bash
pip install datamule
```
Installation with additional features:
```bash
pip install datamule[filing_viewer]  # Install with filing viewer module
pip install datamule[mulebot]        # Install with MuleBot (coming soon)
pip install datamule[all]            # Install all extras
```
Available extras:
- filing_viewer: includes dependencies for the filing viewer module
- mulebot: includes MuleBot for interacting with SEC data (coming soon)
- mulebot_server: includes Flask server for running MuleBot (coming soon)
- all: installs all available extras
Quick Start
```python
import datamule as dm

downloader = dm.Downloader()
downloader.download(form='10-K', ticker='AAPL')
```
Usage
Downloader
Download speed is close to the theoretical maximum set by SEC rate limits.

```python
downloader = dm.Downloader()
```
Downloading Filings
Uses the EFTS API to retrieve filings. I am considering renaming download to download_filings.
```python
download(self, output_dir='filings', return_urls=False, cik=None, ticker=None, form=None, date=None)
```
```python
# Download all 10-K filings for Tesla using CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')

# Download 10-K filings for multiple companies using tickers
downloader.download(form='10-K', ticker=['TSLA', 'META'], output_dir='filings')

# Download every form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')
```
Downloading Company Concepts XBRL
Uses the Company Concepts API to retrieve XBRL.
```python
download_company_concepts(self, output_dir='company_concepts', cik=None, ticker=None)
```
Datasets
Available datasets:
- 10K - 2024 10-K filings converted to JSON
- MDA - Management's Discussion and Analysis (MD&A) sections extracted from 2024 10-K filings
- XBRL - every Company Concepts XBRL

Also available on Dropbox.
```python
# Download all 2024 10-K filings converted to JSON
downloader.download_dataset('10K')
```
Monitoring for New Filings
```python
print("Monitoring SEC EDGAR for changes...")
changed_bool = downloader.watch(1, silent=False, cik=['0001267602', '0001318605'], form=['3', 'S-8 POS'])
if changed_bool:
    print("New filing detected!")
```
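For continuous monitoring, the watch call above can be wrapped in a simple polling loop. This is a hypothetical helper, not part of datamule; the interval handling and keyword passthrough are assumptions about how you'd want to use it:

```python
import time

def poll_for_filings(downloader, interval_seconds=60, **watch_kwargs):
    """Hypothetical wrapper: repeatedly call downloader.watch and return
    once it reports a change, sleeping between checks."""
    while True:
        if downloader.watch(1, silent=True, **watch_kwargs):
            return True
        time.sleep(interval_seconds)

# poll_for_filings(downloader, cik=['0001318605'], form=['3'])
```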
Parsing
Parse SEC XBRL
Parses SEC XBRL in JSON format into tables. See Parse every SEC XBRL to CSV in ten minutes.
```python
from datamule import parse_company_concepts

table_dict_list = parse_company_concepts(company_concepts)  # Returns a list of tables with labels
```
Parse Textual Filings into structured data
Parse textual filings into different formats using the datamule parser endpoint. If it is too slow for your use case, let me know; a faster endpoint is coming soon.
```python
# Simplified HTML
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='simplify')

# Interactive HTML
interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='interactive')

# JSON
json_data = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
```
Filing Viewer
Convert parsed filing JSON into HTML with features like a table of contents sidebar:
```python
from datamule import parse_textual_filing
from datamule.filing_viewer import create_interactive_html

data = parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
create_interactive_html(data, 'output_path.html')
```
Mulebot
Interact with SEC data using MuleBot (coming soon). MuleBot uses tool calling to interface with SEC and datamule endpoints.
Features (coming soon):
- Interface with XBRL company facts
- Interface with 10-Ks and other filings, and summarize sections
To use Mulebot you will need an OpenAI API Key.
Mulebot Server
Mulebot server is a customizable front-end for Mulebot.
Features (coming soon):
- Display XBRL tables with download / copy button.
- Download XBRL tables for a specific company in ZIP.
- Display sections of filings, like MD&A with links to filing viewer and original.
Quickstart
```python
from datamule.mulebot.mulebot_server import server

def main():
    # Your OpenAI API key
    api_key = "sk-<YOUR_API_KEY>"
    server.set_api_key(api_key)

    # Run the server
    print("Starting MuleBotServer...")
    server.run(debug=True, host='0.0.0.0', port=5000)

if __name__ == "__main__":
    main()
```
Known Issues
- Some SEC files are malformed, which can cause parsing errors. For example, this Tesla Form D HTML from 2009 is missing a closing </meta> tag. Workaround:

```python
from lxml import etree

with open('filings/000131860509000005primary_doc.xml', 'r', encoding='utf-8') as file:
    html = etree.parse(file, etree.HTMLParser())
```
- SEC endpoints have issues. For example, the EFTS search returns the primary doc URL for https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/ as https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/0001.txt, when it should send you to https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/0000950116-01-000004.txt. This is currently a low-priority issue; let me know if you need the data, and I'll move it up the priority list.
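Until that's fixed, the full submission text file can be reconstructed from the CIK and the accession number alone. A workaround sketch based on EDGAR's standard URL layout, which the two URLs above follow:

```python
def full_submission_url(cik, accession_number):
    """Build the URL of a filing's full submission text file.

    EDGAR's directory name is the accession number with dashes removed;
    the file itself keeps the dashed form.
    """
    nodash = accession_number.replace('-', '')
    return (f"https://www.sec.gov/Archives/edgar/data/"
            f"{int(cik)}/{nodash}/{accession_number}.txt")

# full_submission_url('1036804', '0000950116-01-000004')
# -> https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/0000950116-01-000004.txt
```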
Roadmap
- Add MuleBot: table artifact (current issue is with bad data input), plus section and filing viewer integration
- Refactor the downloader and integrate XBRL downloads
- Add sec_fetch; either urllib or requests should be fastest; integrate into MuleBot
- Verify that paths work correctly on non-Windows devices
- Analytics?
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT LICENSE.
Change Log
Other Useful SEC Packages