sec-downloader
Useful extensions for sec-edgar-downloader. Built with nbdev.
Install
pip install sec_downloader
Features
- Files are downloaded to a temporary folder, immediately read into memory, and then deleted.
- Use a “glob” pattern to select which files are read into memory.
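The two features above can be illustrated with a minimal standard-library sketch. It is not the package's actual implementation: `read_then_delete` and `fake_download` are hypothetical names, and the files written are made up purely to exercise the helper.

```python
import tempfile
from pathlib import Path


def read_then_delete(write_files, pattern="**/*"):
    """Run write_files(folder) in a temp folder and return {relative_path: text}."""
    contents = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_files(root)  # stand-in for the real download step
        for file in sorted(root.glob(pattern)):
            if file.is_file():
                contents[file.relative_to(root).as_posix()] = file.read_text()
    # The temporary folder and everything in it has been deleted by now.
    return contents


def fake_download(root):
    # Hypothetical files, only here to demonstrate the glob filter.
    filing = root / "AAPL" / "10-Q"
    filing.mkdir(parents=True)
    (filing / "primary-document.html").write_text("<html>demo</html>")
    (filing / "full-submission.txt").write_text("raw text")


files = read_then_delete(fake_download, pattern="**/*.htm*")
print(list(files))
```

With the `**/*.htm*` pattern only the HTML file is kept in memory; with the default pattern both files would be returned.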
How to use
Option 1: Wrapper of sec-edgar-downloader
Let’s demonstrate how to download a single file (latest 10-Q filing details in HTML format) to memory.
from sec_downloader import Downloader
dl = Downloader("MyCompanyName", "email@example.com")
html = dl.get_latest_html("10-Q", "AAPL")
# Use dl.get_latest_n_html("10-Q", "AAPL", n=5) to get the latest 5 10-Qs
print(f"{html[:50]}...")
<?xml version="1.0" ?><!--XBRL Document Created wi...
Note: The company name and email address are used to form a user-agent string that adheres to SEC EDGAR’s fair access policy for programmatic downloading.
The get_latest_html call shown earlier is implemented approximately as follows:
from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader import DownloadStorage
ONLY_HTML = "**/*.htm*"
storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# all files are now deleted and only stored in memory
content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
<?xml version="1.0" ?><!--XBRL Document Created wi...
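The company name and email address passed to the downloader end up in the request's user-agent. A minimal sketch of that idea follows, assuming the user-agent is simply the company name followed by the email address; `build_headers` is a hypothetical helper and the exact string the libraries send may differ.

```python
def build_headers(company_name: str, email_address: str) -> dict:
    """Build request headers with a declared user-agent (assumed format)."""
    if "@" not in email_address:
        raise ValueError("email_address should look like an email address")
    # Assumption: "<company name> <email address>" is the expected shape.
    return {"User-Agent": f"{company_name.strip()} {email_address.strip()}"}


headers = build_headers("MyCompanyName", "email@example.com")
print(headers["User-Agent"])
```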
Downloading multiple documents:
storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-K", "GOOG", limit=2)
# all files are now deleted and only stored in memory
for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
Path: sec-edgar-filings/GOOG/10-K/0001652044-22-000019/full-submission.txt
Content [len=15044932]: <SEC-DOCUMENT>0001652044-22-00...
Path: sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt
Content [len=15264470]: <SEC-DOCUMENT>0001652044-23-00...
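The paths printed above encode the ticker, form type, and accession number. As a sketch, they could be split apart like this, assuming the folder layout matches the output shown; `FilingLocation` and `parse_filing_path` are hypothetical names, not part of the package.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FilingLocation(NamedTuple):
    ticker_or_cik: str
    form_type: str
    accession_number: str
    file_name: str


def parse_filing_path(path) -> FilingLocation:
    """Split '<root>/<ticker>/<form>/<accession>/<file>' into its parts."""
    parts = PurePosixPath(str(path)).parts
    if len(parts) < 4:
        raise ValueError(f"unexpected path layout: {path}")
    # Only the last four components matter; the root folder name is ignored.
    return FilingLocation(*parts[-4:])


location = parse_filing_path(
    "sec-edgar-filings/GOOG/10-K/0001652044-22-000019/full-submission.txt"
)
print(location)
```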
Option 2: Fork implementation of sec-edgar-downloader
Download the metadata
dl = Downloader("MyCompanyName", "email@example.com")
dl.get_filing_metadata(accession_number="0000320193-23-000077")
FilingMetadata(accession_number='0000320193-23-000077', form_type='10-Q', primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm', items='', primary_doc_description='10-Q', filing_date='2023-08-04', report_date='2023-07-01', company_name='Apple Inc.', tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])
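In the output above, the primary_doc_url contains the accession number with its dashes removed. The following sketch rebuilds that URL from its parts; `build_primary_doc_url` is a hypothetical helper, and the CIK (320193) and file name are read off the URL in the example output rather than looked up.

```python
def build_primary_doc_url(cik: int, accession_number: str, file_name: str) -> str:
    """Compose an EDGAR archive URL from a CIK, accession number, and file name."""
    folder = accession_number.replace("-", "")  # the folder name drops the dashes
    return f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/{file_name}"


url = build_primary_doc_url(320193, "0000320193-23-000077", "aapl-20230701.htm")
print(url)
```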
dl = Downloader("MyCompanyName", "email@example.com")
metadatas = dl.get_filing_metadatas(
    [
        # Here you can provide any number of these:
        # -----------------------------------------
        # EXAMPLE 1: Accession Number
        "0000320193-23-000077",
        # -----------------------------------------
        # EXAMPLE 2: SEC EDGAR Filing URL
        "https://www.sec.gov/ix?doc=/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm",
        # -----------------------------------------
        # EXAMPLE 3: Latest 10-Q filing from Netflix
        # Note: Use a Ticker or CIK. Format: [amount=1]/ticker_or_cik/[form_type=10-Q]
        "NFLX",
        # -----------------------------------------
        # EXAMPLE 4: Two latest 10-K filings from Microsoft
        # Note: Equivalent to RequestedFilings(limit=2, ticker_or_cik="MSFT", form_type="10-K")
        "2/MSFT/10-K",
    ]
)
# Below is just for demo purposes to view the values in the result
import pandas as pd
from dataclasses import asdict
r = pd.DataFrame([asdict(metadata) for metadata in metadatas])
r = r[["company_name"] + [col for col in r.columns if col != "company_name"]]
r
| | company_name | accession_number | form_type | primary_doc_url | items | primary_doc_description | filing_date | report_date | tickers |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple Inc. | 0000320193-23-000077 | 10-Q | https://www.sec.gov/Archives/edgar/data/320193... | | 10-Q | 2023-08-04 | 2023-07-01 | [{'symbol': 'AAPL', 'exchange': 'Nasdaq'}] |
| 1 | Apple Inc. | 0000320193-23-000077 | 10-Q | https://www.sec.gov/Archives/edgar/data/320193... | | 10-Q | 2023-08-04 | 2023-07-01 | [{'symbol': 'AAPL', 'exchange': 'Nasdaq'}] |
| 2 | NETFLIX INC | 0001065280-23-000273 | 10-Q | https://www.sec.gov/Archives/edgar/data/106528... | | 10-Q | 2023-10-20 | 2023-09-30 | [{'symbol': 'NFLX', 'exchange': 'Nasdaq'}] |
| 3 | MICROSOFT CORP | 0000950170-23-035122 | 10-K | https://www.sec.gov/Archives/edgar/data/789019... | | 10-K | 2023-07-27 | 2023-06-30 | [{'symbol': 'MSFT', 'exchange': 'Nasdaq'}] |
| 4 | MICROSOFT CORP | 0001564590-22-026876 | 10-K | https://www.sec.gov/Archives/edgar/data/789019... | | 10-K | 2022-07-28 | 2022-06-30 | [{'symbol': 'MSFT', 'exchange': 'Nasdaq'}] |
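The shorthand used in examples 3 and 4 follows the format [amount=1]/ticker_or_cik/[form_type=10-Q]. Below is a rough sketch of how such strings could be told apart and parsed. It is not the package's own parser: `Query`, `parse_query`, and `classify` are hypothetical names, the accession-number pattern is inferred from the examples, and only the one-part and three-part shapes shown above are handled.

```python
import re
from typing import NamedTuple

# Assumed shape of an accession number, based on "0000320193-23-000077".
ACCESSION_RE = re.compile(r"\d{10}-\d{2}-\d{6}")


class Query(NamedTuple):
    limit: int
    ticker_or_cik: str
    form_type: str


def parse_query(text: str, default_limit: int = 1, default_form: str = "10-Q") -> Query:
    """Parse 'TICKER' or 'amount/TICKER/FORM' into a Query."""
    # maxsplit=2 keeps form types that contain a slash (such as amendments) intact.
    parts = text.strip().split("/", 2)
    if len(parts) == 1 and parts[0]:
        return Query(default_limit, parts[0], default_form)
    if len(parts) == 3 and parts[0].isdigit() and all(parts):
        return Query(int(parts[0]), parts[1], parts[2])
    raise ValueError(f"unsupported query in this sketch: {text!r}")


def classify(item: str) -> str:
    """Label an input as an accession number, a URL, or a shorthand query."""
    if ACCESSION_RE.fullmatch(item):
        return "accession_number"
    if item.startswith(("http://", "https://")):
        return "url"
    parse_query(item)  # raises ValueError if the shorthand is malformed
    return "query"


for item in ["0000320193-23-000077", "NFLX", "2/MSFT/10-K"]:
    print(item, "->", classify(item))
```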
Download the HTML files
You can download the HTML for any of the filings:
for filing in dl.download_filings(metadatas):
    html = filing.primary_document.decode()
    print(html[:50])
    break  # same for all filings, let's just print the first one
<?xml version="1.0" ?><!--XBRL Document Created wi
Contributing
Follow these steps to install the project locally for development:
- Install the project with the command
pip install -e ".[dev]".
Note: We highly recommend using virtual environments for Python development. If you’d like to use virtual environments, follow these steps instead:
- Create a virtual environment
python3 -m venv .venv
- Activate the virtual environment
source .venv/bin/activate
- Install the project with the command
pip install -e ".[dev]"