
Useful extensions for sec-edgar-downloader.


sec-downloader


A better version of sec-edgar-downloader. Includes an alternative implementation (a wrapper instead of a fork), to keep compatibility with new sec-edgar-downloader releases. This library partially uses nbdev.

Features

Advantages over sec-edgar-downloader:

Flexibility in Download Process

  • Tailored for choosing what, where, and how to download.
  • Files stored in memory for faster operations and no unnecessary disk clutter.

Separate Metadata and File Downloads

  • Easily skip unneeded files.
  • Download metadata first, then selectively download files.
  • Option to save metadata for better organization.
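The metadata-first workflow can be sketched without any network access. The following is a minimal illustration, not the library's code: `FilingRecord` and `select_urls` are hypothetical names, and the record is a stand-in that mirrors only three of the metadata fields printed later in this document (`form_type`, `items`, `primary_doc_url`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingRecord:
    """Hypothetical stand-in carrying a few of the metadata fields."""
    form_type: str
    items: str  # comma-separated item codes, e.g. "2.02,9.01"
    primary_doc_url: str


def select_urls(records, form_type, required_item=None):
    """Return the primary document URLs of records that match the filter."""
    urls = []
    for record in records:
        if record.form_type != form_type:
            continue
        reported = [item for item in record.items.split(",") if item]
        if required_item is not None and required_item not in reported:
            continue
        urls.append(record.primary_doc_url)
    return urls


records = [
    FilingRecord(
        "10-Q", "",
        "https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm",
    ),
    FilingRecord(
        "8-K", "2.02,9.01",
        "https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm",
    ),
]
print(select_urls(records, "8-K", required_item="2.02"))
```

Only the URLs that survive the filter would then be passed on to the download step, which is what makes it easy to skip unneeded files.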

More Input Options

  • Ticker or CIK (e.g., AAPL, 0000320193) for latest filings.
  • Accession Number (e.g., 0000320193-23-000077). Not supported in sec-edgar-downloader.
  • SEC EDGAR URL (e.g., https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm). Not supported in sec-edgar-downloader.

Install

pip install sec-downloader

How to use

Download the metadata

Note: The company name and email address are used to form a user-agent string that adheres to SEC EDGAR's fair access policy for programmatic downloading.

from sec_downloader import Downloader

dl = Downloader("MyCompanyName", "email@example.com")
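For illustration, here is one way the two arguments could be combined into a request header. This is a sketch under assumptions: `build_user_agent` is a hypothetical helper, and the "name, space, email" layout follows the sample format in the SEC's guidance rather than anything confirmed about this library's internals.

```python
def build_user_agent(company_name: str, email_address: str) -> str:
    """Combine a company name and contact email into one User-Agent value.

    Assumption: the layout "<company name> <email>" mirrors the SEC's sample
    format; the string the library actually sends may differ.
    """
    company_name = company_name.strip()
    email_address = email_address.strip()
    if not company_name or "@" not in email_address:
        raise ValueError("a company name and a contact email are both required")
    return f"{company_name} {email_address}"


headers = {"User-Agent": build_user_agent("MyCompanyName", "email@example.com")}
print(headers)
```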

Find a filing with an Accession Number

metadatas = dl.get_filing_metadatas("AAPL/0000320193-23-000077")
print(metadatas)
[FilingMetadata(accession_number='0000320193-23-000077',
                form_type='10-Q',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm',
                items='',
                primary_doc_description='10-Q',
                filing_date='2023-08-04',
                report_date='2023-07-01',
                cik='0000320193',
                company_name='Apple Inc.',
                tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])]

Any of the following calls returns the same result:

from sec_downloader.types import CompanyAndAccessionNumber

metadatas = dl.get_filing_metadatas("aapl/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/0000320193-23-000077")
metadatas = dl.get_filing_metadatas("0000320193/0000320193-23-000077")
metadatas = dl.get_filing_metadatas(CompanyAndAccessionNumber(ticker_or_cik="320193", accession_number="0000320193-23-000077"))
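The equivalence of these spellings suggests a normalization step. The sketch below shows one plausible version of it; the function names are hypothetical and this is not taken from the library's source. It assumes an accession number is 18 digits grouped as 10-2-6 and that a CIK is zero-padded to 10 digits, which matches the values printed above.

```python
import re


def normalize_accession_number(raw: str) -> str:
    """Return the dashed 10-2-6 form of an accession number."""
    digits = raw.replace("-", "")
    if not re.fullmatch(r"\d{18}", digits):
        raise ValueError(f"not an 18-digit accession number: {raw!r}")
    return f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"


def normalize_company(raw: str) -> str:
    """Zero-pad numeric CIKs to 10 digits; upper-case anything else as a ticker."""
    return raw.zfill(10) if raw.isdigit() else raw.upper()


def split_identifier(identifier: str):
    """Split "<ticker or CIK>/<accession number>" into normalized parts."""
    company, accession = identifier.split("/")
    return normalize_company(company), normalize_accession_number(accession)


print(split_identifier("aapl/000032019323000077"))
print(split_identifier("320193/0000320193-23-000077"))
```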

Find the filing matching a SEC EDGAR Filing URL. Only CIK and Accession Number are used from the URL:

metadatas = dl.get_filing_metadatas(
    "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
)
print(metadatas)
[FilingMetadata(accession_number='0001193125-23-272204',
                form_type='8-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm',
                items='2.02,9.01',
                primary_doc_description='8-K',
                filing_date='2023-11-07',
                report_date='2023-11-04',
                cik='0001067983',
                company_name='BERKSHIRE HATHAWAY INC',
                tickers=[Ticker(symbol='BRK-B', exchange='NYSE'),
                         Ticker(symbol='BRK-A', exchange='NYSE')])]

URLs in other formats also work and return the same result:

metadatas = dl.get_filing_metadatas("https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm")
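Since only the CIK and Accession Number are taken from the URL, both URL styles reduce to the same pair of values. A minimal sketch of such an extraction follows; the regular expression and the name `parse_edgar_url` are assumptions for illustration, not the library's implementation.

```python
import re

# Assumed path layout: /Archives/edgar/data/<CIK>/<18-digit accession>/<file>
_EDGAR_PATH = re.compile(r"/Archives/edgar/data/(?P<cik>\d+)/(?P<accession>\d{18})/")


def parse_edgar_url(url: str):
    """Return (zero-padded CIK, dashed accession number) found in an EDGAR URL."""
    match = _EDGAR_PATH.search(url)
    if match is None:
        raise ValueError(f"no CIK/accession number found in {url!r}")
    cik = match.group("cik").zfill(10)
    digits = match.group("accession")
    accession = f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"
    return cik, accession


viewer_url = "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
archive_url = "https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm"
print(parse_edgar_url(viewer_url))
print(parse_edgar_url(archive_url))
```

Because the pattern is searched anywhere in the string, it matches both the `ix?doc=` viewer form and the plain archive form.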

Find latest filings by company ticker or CIK:

from sec_downloader.types import RequestedFilings

metadatas = dl.get_filing_metadatas(
    RequestedFilings(ticker_or_cik="MSFT", form_type="10-K", limit=2)
)
print(metadatas)
[FilingMetadata(accession_number='0000950170-23-035122',
                form_type='10-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000095017023035122/msft-20230630.htm',
                items='',
                primary_doc_description='10-K',
                filing_date='2023-07-27',
                report_date='2023-06-30',
                cik='0000789019',
                company_name='MICROSOFT CORP',
                tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')]),
 FilingMetadata(accession_number='0001564590-22-026876',
                form_type='10-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/msft-10k_20220630.htm',
                items='',
                primary_doc_description='10-K',
                filing_date='2022-07-28',
                report_date='2022-06-30',
                cik='0000789019',
                company_name='MICROSOFT CORP',
                tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')])]

Any of the following calls returns the same result:

metadatas = dl.get_filing_metadatas("2/msft/10-K")
metadatas = dl.get_filing_metadatas("2/789019/10-K")
metadatas = dl.get_filing_metadatas("2/0000789019/10-K")

The parameters limit and form_type are optional. If omitted, limit defaults to 1, and form_type defaults to ‘10-Q’.

metadatas = dl.get_filing_metadatas("NFLX")
print(metadatas)
[FilingMetadata(accession_number='0001065280-23-000273',
                form_type='10-Q',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/1065280/000106528023000273/nflx-20230930.htm',
                items='',
                primary_doc_description='10-Q',
                filing_date='2023-10-20',
                report_date='2023-09-30',
                cik='0001065280',
                company_name='NETFLIX INC',
                tickers=[Ticker(symbol='NFLX', exchange='Nasdaq')])]

Any of the following calls returns the same result:

metadatas = dl.get_filing_metadatas("nflx")
metadatas = dl.get_filing_metadatas("1/NFLX")
metadatas = dl.get_filing_metadatas("NFLX/10-Q")
metadatas = dl.get_filing_metadatas("1/NFLX/10-Q")
metadatas = dl.get_filing_metadatas(RequestedFilings(ticker_or_cik="NFLX"))
metadatas = dl.get_filing_metadatas(RequestedFilings(limit=1, ticker_or_cik="NFLX", form_type="10-Q"))
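To make the shorthand strings concrete, here is a rough sketch of how "limit/company/form" queries could be parsed with the defaults described above. Everything in it is hypothetical (`ParsedQuery`, `parse_query`, and the small `KNOWN_FORMS` set used to tell "NFLX/10-Q" apart from "1/NFLX"); the library's real parser may work differently, and this sketch does not handle the "company/accession number" form.

```python
from typing import NamedTuple

# Illustrative subset only; used to recognize a trailing form type.
KNOWN_FORMS = {"10-K", "10-Q", "8-K"}


class ParsedQuery(NamedTuple):
    limit: int
    ticker_or_cik: str
    form_type: str


def parse_query(query: str, default_limit: int = 1, default_form: str = "10-Q") -> ParsedQuery:
    """Parse "[limit/]company[/form]" into its three parts, applying defaults."""
    parts = query.split("/")
    limit, form_type = default_limit, default_form
    if len(parts) == 3:
        limit, company, form_type = int(parts[0]), parts[1], parts[2]
    elif len(parts) == 2 and parts[1].upper() in KNOWN_FORMS:
        company, form_type = parts
    elif len(parts) == 2 and parts[0].isdigit():
        limit, company = int(parts[0]), parts[1]
    elif len(parts) == 1:
        company = parts[0]
    else:
        raise ValueError(f"unrecognized query: {query!r}")
    return ParsedQuery(limit, company.upper(), form_type.upper())


for query in ("nflx", "1/NFLX", "NFLX/10-Q", "1/NFLX/10-Q", "2/msft/10-K"):
    print(query, "->", parse_query(query))
```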

Download the HTML files

After obtaining the Primary Document URL, for example from the metadata, you can proceed to download the HTML using this URL.

for metadata in metadatas:
    html = dl.download_filing(url=metadata.primary_doc_url).decode()
    print(html[:50])
    break  # same for all filings, let's just print the first one
'<?xml version="1.0" ?><!--XBRL Document Created wi'
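Once the bytes are in memory, they can be processed without touching the disk. As one possible follow-up step, the sketch below reduces HTML bytes to plain text using only the standard library. The helper names and the sample bytes are made up for illustration; real filings are much larger, and this naive approach also keeps the contents of any script or style elements.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_bytes_to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode HTML bytes and join their text nodes with single spaces."""
    parser = _TextCollector()
    parser.feed(raw.decode(encoding, errors="replace"))
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample standing in for the bytes returned by a download.
sample = b"<html><body><h1>Form 10-Q</h1><p>Quarterly report</p></body></html>"
print(html_bytes_to_text(sample))  # Form 10-Q Quarterly report
```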

Alternative implementation: Wrapper

Files are downloaded to a temporary folder, immediately read into memory, and then deleted. The example below downloads a single file (the latest 10-Q filing details in HTML format) into memory. The glob pattern selects which files are read into memory.

from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader.download_storage import DownloadStorage

ONLY_HTML = "**/*.htm*"

storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# all files are now deleted and only stored in memory

content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
"<?xml version='1.0' encoding='ASCII'?>\n<html xmlns..."

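The download-read-delete cycle described above can be illustrated with the standard library alone. This is a simplified sketch of the idea, not the code behind `DownloadStorage`: `read_into_memory` and `fake_download` are hypothetical names, and the fake download just writes two small files instead of contacting EDGAR.

```python
import tempfile
from pathlib import Path


def read_into_memory(write_files, pattern="**/*"):
    """Run write_files(folder) in a temporary folder; return {relative path: bytes}.

    Only files matching the glob pattern are kept. The folder and everything
    in it are deleted when the with-block exits.
    """
    contents = {}
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        write_files(root)
        for file_path in sorted(root.glob(pattern)):
            if file_path.is_file():
                contents[file_path.relative_to(root).as_posix()] = file_path.read_bytes()
    return contents


def fake_download(root: Path) -> None:
    """Stand-in for a real download: writes one HTML file and one text file."""
    target = root / "AAPL" / "10-Q"
    target.mkdir(parents=True)
    (target / "primary-document.html").write_text("<html></html>")
    (target / "full-submission.txt").write_text("<SEC-DOCUMENT>")


files = read_into_memory(fake_download, pattern="**/*.htm*")
print(list(files))  # ['AAPL/10-Q/primary-document.html']
```

The text file is written to disk but never read, because it does not match the `**/*.htm*` pattern.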
Downloading multiple documents:

storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-K", "GOOG", limit=2)
# all files are now deleted and only stored in memory

for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
('Path: sec-edgar-filings/GOOG/10-K/0001652044-24-000022/full-submission.txt\n'
 'Content [len=13927595]: <SEC-DOCUMENT>0001652044-24-00...\n')
('Path: sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt\n'
 'Content [len=15264470]: <SEC-DOCUMENT>0001652044-23-00...\n')
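The paths printed above encode the ticker, form type, and accession number, so the in-memory results can be grouped by filing. The sketch below assumes the layout `<root>/<ticker>/<form>/<accession number>/<file>` seen in that output; `group_by_accession` is a hypothetical helper, not part of either library.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_accession(paths):
    """Map (ticker, form type, accession number) to the file names under it."""
    grouped = defaultdict(list)
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Assumed layout: <root>/<ticker>/<form>/<accession number>/<file>
        ticker, form_type, accession = parts[-4], parts[-3], parts[-2]
        grouped[(ticker, form_type, accession)].append(parts[-1])
    return dict(grouped)


paths = [
    "sec-edgar-filings/GOOG/10-K/0001652044-24-000022/full-submission.txt",
    "sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt",
]
for key, names in group_by_accession(paths).items():
    print(key, names)
```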

Contributing

Follow these steps to install the project locally for development:

  1. Install the project with the command pip install -e ".[dev]".

Note: We highly recommend using a virtual environment for Python development. To use one, follow these steps instead:

  • Create a virtual environment: python3 -m venv .venv
  • Activate the virtual environment: source .venv/bin/activate
  • Install the project: pip install -e ".[dev]"
