Making it easier to use SEC filings.

datamule

A Python package that makes SEC filings easier to use. Integrated with datamule's APIs and datasets.

features

current:

  • parse textual filings into simplified html, interactive html, or structured json
  • download SEC filings quickly and easily
  • download datasets such as every MD&A from 2024 or every 2024 10-K converted to structured json

Installation

pip install datamule

quickstart:

parsing

Uses the endpoint https://jgfriedman99.pythonanywhere.com/parse_url with the parameters url and return_type. The current endpoint can be slow; if it is too slow for your use case, please contact me.
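For readers who want to call the endpoint directly, here is a minimal sketch of how the request URL could be assembled. It assumes the endpoint takes url and return_type as query-string parameters of a GET request, which the description suggests but does not state outright. `build_parse_request` is a hypothetical helper and not part of datamule.

```python
from urllib.parse import urlencode

PARSE_ENDPOINT = "https://jgfriedman99.pythonanywhere.com/parse_url"

def build_parse_request(filing_url, return_type="json"):
    """Return the full request URL for the parser endpoint.

    Assumption: `url` and `return_type` are sent as query-string parameters.
    """
    # The three return types used in this README.
    allowed = {"simplify", "interactive", "json"}
    if return_type not in allowed:
        raise ValueError(f"return_type must be one of {sorted(allowed)}")
    # urlencode percent-encodes the filing URL so it survives as one parameter.
    query = urlencode({"url": filing_url, "return_type": return_type})
    return f"{PARSE_ENDPOINT}?{query}"
```

The resulting string can be handed to any HTTP client, for example `urllib.request.urlopen`.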

simplified html

import datamule as dm
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm',return_type='simplify')

interactive html

interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm',return_type='interactive')

json

d = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm',return_type='json')
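
A sketch of one way to save the parsed output to disk. It assumes the 'simplify' and 'interactive' return types yield an HTML string and the 'json' return type yields a dict or list; none of that is confirmed here. `save_parsed` is a hypothetical helper, not a datamule function.

```python
import json
from pathlib import Path

def save_parsed(result, path):
    """Write parsed output to disk: str as text, dict/list as JSON."""
    path = Path(path)
    if isinstance(result, str):
        # Assumed case: simplified or interactive html returned as a string.
        path.write_text(result, encoding="utf-8")
    elif isinstance(result, (dict, list)):
        # Assumed case: structured json returned as Python objects.
        path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    else:
        raise TypeError(f"unsupported result type: {type(result).__name__}")
    return path
```

For example, `save_parsed(d, 'tsla-10k.json')` would write the structured output next to the script, if `d` is a dict as assumed.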

downloading filings using the indices api

Limited to 10,000 results per query. Uses the endpoint https://api.datamule.xyz/submissions. A full list of parameters can be found in the SEC Router documentation.

from datamule import Downloader
downloader = Downloader()
downloader.download_using_api(form='10-K',ticker='AAPL')

downloading filings without the indices api

Either download the pre-built indices from the links in the readme and set indices_path to the folder that contains them:

from datamule import Downloader
indices_path = 'path/to/indices'  # placeholder: folder containing the downloaded indices
downloader = Downloader()
downloader.set_indices_path(indices_path)

Or run the indexer. If download=True, it downloads the most recently uploaded indices (9/14/24). If download=False, it re-runs the indexer to rebuild them.

from datamule import Indexer
indexer = Indexer()
indexer.run(download=False)

Example Downloads

# Example 1: Download all 10-K filings for Tesla using CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')

# Example 2: Download 10-K filings for Tesla and META using CIK
downloader.download(form='10-K', cik=['1318605','1326801'], output_dir='filings')

# Example 3: Download 10-K filings for Tesla using ticker
downloader.download(form='10-K', ticker='TSLA', output_dir='filings')

# Example 4: Download 10-K filings for Tesla and META using ticker
downloader.download(form='10-K', ticker=['TSLA','META'], output_dir='filings')

# Example 5: Download every form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')

# Example 6: Download every 10-K for a year
downloader.download(form='10-K', date=('2024-01-01', '2024-12-31'), output_dir='filings')

# Example 7: Download every form 4 for a list of dates
downloader.download(form='4', date=['2024-01-01', '2024-12-31'], output_dir='filings')
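
Examples 5 to 7 pass the date in three shapes: a string for a single day, a tuple for a range, and a list for specific days. The sketch below spells out that reading in plain Python. It is an interpretation of the examples, not datamule's own code, and `expand_dates` is a hypothetical name.

```python
from datetime import date, timedelta

def expand_dates(spec):
    """Turn a date spec into a list of datetime.date objects.

    str   -> that single day
    tuple -> every day from start to end, inclusive
    list  -> exactly the listed days
    """
    if isinstance(spec, str):
        return [date.fromisoformat(spec)]
    if isinstance(spec, tuple):
        start, end = (date.fromisoformat(s) for s in spec)
        if end < start:
            raise ValueError("range end is before range start")
        return [start + timedelta(days=i) for i in range((end - start).days + 1)]
    if isinstance(spec, list):
        return [date.fromisoformat(s) for s in spec]
    raise TypeError("spec must be a str, tuple, or list")
```

Under this reading, the tuple in Example 6 covers all 366 days of 2024, while the list in Example 7 covers only January 1 and December 31.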

datasets

Need a better way to store datasets, as I'm running out of storage. They are currently stored on Dropbox's 2 GB free tier.

downloader.download_dataset('10K')
downloader.download_dataset('MDA')

TODO

  • standardize accession number to not include '-'. Currently db does not have '-' but submissions_index.csv does.
  • add code to convert parsed json to interactive html
  • add mulebot
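
Until the first TODO item lands, accession numbers from the two sources can be compared by stripping the dashes on the fly. This is a minimal sketch, assuming the standard 18-digit EDGAR format (10-digit filer ID, 2-digit year, 6-digit sequence); `normalize_accession` is a hypothetical helper, not part of datamule.

```python
def normalize_accession(accession):
    """Return the accession number as 18 digits with dashes removed."""
    digits = accession.strip().replace("-", "")
    # Reject anything that is not exactly 18 digits after stripping dashes.
    if len(digits) != 18 or not digits.isdigit():
        raise ValueError(f"not a valid accession number: {accession!r}")
    return digits
```

For example, `normalize_accession('0000950170-22-000796')` gives `'000095017022000796'`, the form that appears in the Tesla filing URL used in the parsing section. Input that is already dash-free passes through unchanged.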

Update Log

9/15/24

  • fixed downloading filings overwriting each other due to same name.

9/14/24

  • added support for parser API

9/13/24

  • added download_datasets
  • added option to download indices
  • added support for jupyter notebooks

9/9/24

  • added download_using_api(self, output_dir, **kwargs). No indices required.

9/8/24

  • Added integration with datamule's SEC Router API

9/7/24

  • Simplified indices approach
  • Switched from pandas to polars. Loading indices now takes under 500 milliseconds.

This version

0.23
