sec-downloader

Useful extensions for sec-edgar-downloader. Built with nbdev.

Install

pip install sec_downloader

Features

  • Files are downloaded to a temporary folder, immediately read into memory, and then deleted.
  • Use a glob pattern to select which files are read into memory.
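
The two features above can be illustrated with the standard library alone. The following is a minimal sketch, not the package's implementation: `read_matching_files` and `fake_download` are hypothetical names introduced here, and the folder layout and accession number are placeholders. It writes files into a temporary folder, reads only those matching a glob pattern into memory, and lets the folder be deleted afterwards.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def read_matching_files(
    populate: Callable[[Path], None], pattern: str = "**/*"
) -> Dict[str, str]:
    """Run `populate` in a temporary folder and return the text of matching files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        populate(root)  # stands in for the real download step
        contents = {
            p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
            for p in sorted(root.glob(pattern))
            if p.is_file()
        }
    # The temporary folder no longer exists here; only `contents` remains.
    return contents


def fake_download(root: Path) -> None:
    """Hypothetical stand-in for a downloader: writes two small files."""
    filing = root / "sec-edgar-filings" / "AAPL" / "10-Q" / "0000000000-00-000000"
    filing.mkdir(parents=True)
    (filing / "primary-document.html").write_text("<html>demo</html>", encoding="utf-8")
    (filing / "full-submission.txt").write_text("<SEC-DOCUMENT>demo", encoding="utf-8")


# Keep only HTML files, using the same kind of pattern as ONLY_HTML in this README.
files = read_matching_files(fake_download, "**/*.htm*")
```

With the `"**/*.htm*"` pattern only the `.html` file is kept; with the default pattern both files are returned.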

How to use

Option 1: Wrapper of sec-edgar-downloader

The following example downloads a single file (the latest 10-Q filing details, in HTML format) into memory.

from sec_downloader import Downloader

dl = Downloader("MyCompanyName", "email@example.com")
html = dl.get_latest_html("10-Q", "AAPL")
# Use dl.get_latest_n_html("10-Q", "AAPL", n=5) to get the latest 5 10-Qs
print(f"{html[:50]}...")
<?xml version="1.0" ?><!--XBRL Document Created wi...

Note: The company name and email address are used to form a user-agent string that complies with SEC EDGAR’s fair access policy for programmatic downloading.

The get_latest_html call above is implemented approximately as follows:

from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader import DownloadStorage

ONLY_HTML = "**/*.htm*"

storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# The files on disk are now deleted; their contents are kept only in memory

content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
<?xml version="1.0" ?><!--XBRL Document Created wi...

Downloading multiple documents:

storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-K", "GOOG", limit=2)
# The files on disk are now deleted; their contents are kept only in memory

for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
Path: sec-edgar-filings/GOOG/10-K/0001652044-22-000019/full-submission.txt
Content [len=15044932]: <SEC-DOCUMENT>0001652044-22-00...

Path: sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt
Content [len=15264470]: <SEC-DOCUMENT>0001652044-23-00...
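
The paths printed above appear to follow the layout `sec-edgar-filings/<ticker or CIK>/<form type>/<accession number>/<file name>`. Assuming that layout (it is inferred from this output only and may differ between versions), a small helper can split a path into its parts. `FilingPath` and `parse_filing_path` are hypothetical names, not part of either package.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FilingPath(NamedTuple):
    ticker_or_cik: str
    form_type: str
    accession_number: str
    file_name: str


def parse_filing_path(path: str) -> FilingPath:
    """Split a relative download path into its components (assumed layout)."""
    parts = PurePosixPath(path).parts
    if len(parts) != 5 or parts[0] != "sec-edgar-filings":
        raise ValueError(f"unexpected path layout: {path!r}")
    return FilingPath(*parts[1:])


info = parse_filing_path(
    "sec-edgar-filings/GOOG/10-K/0001652044-22-000019/full-submission.txt"
)
```

If the paths are `Path` objects rather than strings, convert them to POSIX-style strings before passing them in.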

Option 2: Forked implementation of sec-edgar-downloader

Download the metadata

dl = Downloader("MyCompanyName", "email@example.com")
dl.get_filing_metadata(accession_number="0000320193-23-000077")
FilingMetadata(accession_number='0000320193-23-000077', form_type='10-Q', primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm', items='', primary_doc_description='10-Q', filing_date='2023-08-04', report_date='2023-07-01', company_name='Apple Inc.', tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])
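
The metadata above shows how an accession number relates to the document URL: the URL path contains the CIK (320193) followed by the accession number with its dashes removed (000032019323000077). The sketch below illustrates that relationship in both directions. `filing_folder_url` and `accession_from_url` are hypothetical helpers written for this README, not functions of the package, and the CIK has to be supplied separately because it cannot in general be derived from the accession number.

```python
import re

ACCESSION_RE = re.compile(r"\d{10}-\d{2}-\d{6}")


def filing_folder_url(cik: int, accession_number: str) -> str:
    """Build the EDGAR archive folder URL for a filing."""
    if not ACCESSION_RE.fullmatch(accession_number):
        raise ValueError(f"not an accession number: {accession_number!r}")
    folder = accession_number.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/"


def accession_from_url(url: str) -> str:
    """Recover the dashed accession number from an EDGAR filing URL."""
    match = re.search(r"/data/\d+/(\d{10})(\d{2})(\d{6})/", url)
    if match is None:
        raise ValueError(f"no accession number found in: {url!r}")
    return "-".join(match.groups())


folder_url = filing_folder_url(320193, "0000320193-23-000077")
accession = accession_from_url(
    "https://www.sec.gov/ix?doc=/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm"
)
```
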
Metadata for several filings can be requested in a single call:

dl = Downloader("MyCompanyName", "email@example.com")
metadatas = dl.get_filing_metadatas(
    [
        # Here you can provide any number of these:
        # -----------------------------------------
        # EXAMPLE 1: Accession Number
        "0000320193-23-000077",
        # -----------------------------------------
        # EXAMPLE 2: SEC EDGAR Filing URL
        "https://www.sec.gov/ix?doc=/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm",
        # -----------------------------------------
        # EXAMPLE 3: Latest 10-Q filing from Netflix
        # Note: Use a Ticker or CIK. Format: [amount=1]/ticker_or_cik/[form_type=10-Q]
        "NFLX",
        # -----------------------------------------
        # EXAMPLE 4: Two latest 10-K filings from Microsoft
        # Note: Equivalent to RequestedFilings(limit=2, ticker_or_cik="MSFT", form_type="10-K")
        "2/MSFT/10-K",
    ]
)

# The code below is for demonstration only: it displays the results as a table
import pandas as pd
from dataclasses import asdict

r = pd.DataFrame([asdict(metadata) for metadata in metadatas])
r = r[["company_name"] + [col for col in r.columns if col != "company_name"]]
r
   company_name    accession_number      form_type  primary_doc_url                                    items  primary_doc_description  filing_date  report_date  tickers
0  Apple Inc.      0000320193-23-000077  10-Q       https://www.sec.gov/Archives/edgar/data/320193...         10-Q                     2023-08-04   2023-07-01   [{'symbol': 'AAPL', 'exchange': 'Nasdaq'}]
1  Apple Inc.      0000320193-23-000077  10-Q       https://www.sec.gov/Archives/edgar/data/320193...         10-Q                     2023-08-04   2023-07-01   [{'symbol': 'AAPL', 'exchange': 'Nasdaq'}]
2  NETFLIX INC     0001065280-23-000273  10-Q       https://www.sec.gov/Archives/edgar/data/106528...         10-Q                     2023-10-20   2023-09-30   [{'symbol': 'NFLX', 'exchange': 'Nasdaq'}]
3  MICROSOFT CORP  0000950170-23-035122  10-K       https://www.sec.gov/Archives/edgar/data/789019...         10-K                     2023-07-27   2023-06-30   [{'symbol': 'MSFT', 'exchange': 'Nasdaq'}]
4  MICROSOFT CORP  0001564590-22-026876  10-K       https://www.sec.gov/Archives/edgar/data/789019...         10-K                     2022-07-28   2022-06-30   [{'symbol': 'MSFT', 'exchange': 'Nasdaq'}]
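
The shorthand request format used in the list above, `[amount=1]/ticker_or_cik/[form_type=10-Q]`, can be illustrated with a small parser. This is a hypothetical sketch and not the package's own parsing code: `ParsedRequest` and `parse_request` are names introduced here, and the sketch accepts only the two shapes shown in the examples (a bare ticker or CIK, or all three parts), since a two-part string would be ambiguous when the CIK is numeric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedRequest:
    limit: int
    ticker_or_cik: str
    form_type: str


def parse_request(text: str) -> ParsedRequest:
    """Parse 'TICKER' or 'N/TICKER/FORM' into its three components."""
    parts = text.strip().split("/")
    if len(parts) == 1 and parts[0]:
        # Only a ticker or CIK: fall back to the documented defaults.
        return ParsedRequest(limit=1, ticker_or_cik=parts[0], form_type="10-Q")
    if len(parts) == 3 and all(parts):
        limit, ticker_or_cik, form_type = parts
        return ParsedRequest(int(limit), ticker_or_cik, form_type)
    raise ValueError(f"unsupported request string: {text!r}")


latest_netflix_10q = parse_request("NFLX")
two_microsoft_10ks = parse_request("2/MSFT/10-K")
```
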

Download the HTML files

You can download the HTML for any of the filings:

for filing in dl.download_filings(metadatas):
    html = filing.primary_document.decode()
    print(html[:50])
    break  # same for all filings, let's just print the first one
<?xml version="1.0" ?><!--XBRL Document Created wi

Contributing

Follow these steps to install the project locally for development:

  1. Install the project with the command pip install -e ".[dev]".

Note: We highly recommend using virtual environments for Python development. To use a virtual environment, follow these steps instead:

  • Create a virtual environment: python3 -m venv .venv
  • Activate the virtual environment: source .venv/bin/activate
  • Install the project: pip install -e ".[dev]"
