Useful extensions for sec-edgar-downloader.
sec-downloader
Useful extensions for sec-edgar-downloader. Built with nbdev.
Install
pip install sec_downloader
Features
- Files are downloaded to a temporary folder, immediately read into memory, and then deleted.
- Use a glob pattern to select which files are read into memory.
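The two features above can be illustrated with the standard library alone. This is a minimal sketch, not the package's implementation: `fake_download` and `read_then_delete` are hypothetical helpers, and the file names are made up for the demonstration. The glob pattern is the same `**/*.htm*` used in the usage section.

```python
import tempfile
from pathlib import Path

ONLY_HTML = "**/*.htm*"  # matches .htm and .html files at any depth


def fake_download(root: Path) -> None:
    """Hypothetical stand-in for a real download: writes two files under root."""
    filing_dir = root / "AAPL" / "10-Q"
    filing_dir.mkdir(parents=True)
    (filing_dir / "primary-document.html").write_text("<html>10-Q</html>")
    (filing_dir / "full-submission.txt").write_text("<SEC-DOCUMENT>")


def read_then_delete(pattern):
    """Write into a temporary folder, read the matching files, then drop the folder."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        fake_download(root)
        contents = {
            p.relative_to(root).as_posix(): p.read_text()
            for p in sorted(root.glob(pattern))
            if p.is_file()
        }
    # The temporary folder is removed when the with-block exits.
    return contents, root


contents, tmp_root = read_then_delete(ONLY_HTML)
print(contents)           # only the .html file was read into memory
print(tmp_root.exists())  # the temporary folder no longer exists
```

Only the file matching the pattern ends up in memory, and nothing is left on disk afterwards.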
How to use
Option 1: Wrapper of sec-edgar-downloader
Let’s demonstrate how to download a single file (latest 10-Q filing details in HTML format) to memory.
from sec_downloader import Downloader
dl = Downloader("MyCompanyName", "email@example.com")
html = dl.get_latest_html("10-Q", "AAPL")
# Use dl.get_latest_n_html("10-Q", "AAPL", n=5) to get the latest 5 10-Qs
print(f"{html[:50]}...")
Note: The company name and email address form a user-agent string that complies with SEC EDGAR's fair access policy for programmatic downloading.
Under the hood, get_latest_html is implemented approximately as follows:
from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader import DownloadStorage
ONLY_HTML = "**/*.htm*"
storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# all files are now deleted and only stored in memory
content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
<?xml version="1.0" ?><!--XBRL Document Created wi...
Downloading multiple documents:
storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-K", "GOOG", limit=2)
# all files are now deleted and only stored in memory
for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
Path: sec-edgar-filings/GOOG/10-K/0001652044-22-000019/full-submission.txt
Content [len=15044932]: <SEC-DOCUMENT>0001652044-22-00...
Path: sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt
Content [len=15264470]: <SEC-DOCUMENT>0001652044-23-00...
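The examples above access results both by attribute (`.content`) and by tuple unpacking (`for path, content in ...`). Both styles work on a named tuple. The sketch below assumes a hypothetical `FileContent` type with placeholder values; the class actually returned by `get_file_contents()` may be defined differently.

```python
from typing import NamedTuple


class FileContent(NamedTuple):
    """Hypothetical result item: a relative path plus the file's text."""
    path: str
    content: str


# Placeholder values, not real filing data.
items = [FileContent(path="example/full-submission.txt", content="<SEC-DOCUMENT>...")]

# Attribute access, as in storage.get_file_contents()[0].content
first_content = items[0].content

# Tuple unpacking, as in the for-loop above
for path, content in items:
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...")
```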
Option 2: Fork implementation of sec-edgar-downloader
Download the metadata
dl = Downloader("MyCompanyName", "email@example.com")
dl.get_filing_metadata(accession_number="0000320193-23-000077")
FilingMetadata(accession_number='0000320193-23-000077', form_type='10-Q', primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm', items='', primary_doc_description='10-Q', filing_date='2023-08-04', report_date='2023-07-01', company_name='Apple Inc.', tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])
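In the output above, the accession number `0000320193-23-000077` appears in `primary_doc_url` with its dashes removed, right after the CIK. The sketch below relies only on that observed pattern; `accession_from_url` and `archive_folder` are hypothetical helpers, not part of the package.

```python
import re

# An accession number is 10 + 2 + 6 digits; in archive URLs the dashes are dropped.
ACCESSION_IN_URL = re.compile(r"/(\d{10})(\d{2})(\d{6})/")


def accession_from_url(url: str) -> str:
    """Recover the dashed accession number from an EDGAR archive URL."""
    match = ACCESSION_IN_URL.search(url)
    if match is None:
        raise ValueError(f"no accession number found in {url!r}")
    return "-".join(match.groups())


def archive_folder(cik: str, accession_number: str) -> str:
    """Build the folder URL for a filing; leading zeros are stripped from the CIK."""
    folder = accession_number.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/"


url = "https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm"
print(accession_from_url(url))
print(archive_folder("0000320193", "0000320193-23-000077"))
```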
dl = Downloader("MyCompanyName", "email@example.com")
metadatas = dl.get_filing_metadatas(
    [
        # Here you can provide any number of these:
        # -----------------------------------------
        # EXAMPLE 1: Accession Number
        "0000320193-23-000077",
        # -----------------------------------------
        # EXAMPLE 2: SEC EDGAR Filing URL
        "https://www.sec.gov/ix?doc=/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm",
        # -----------------------------------------
        # EXAMPLE 3: Latest 10-Q filing from Netflix
        # Note: Use a Ticker or CIK. Format: [amount=1]/ticker_or_cik/[form_type=10-Q]
        "NFLX",
        # -----------------------------------------
        # EXAMPLE 4: Two latest 10-K filings from Microsoft
        # Note: Equivalent to RequestedFilings(limit=2, ticker_or_cik="MSFT", form_type="10-K")
        "2/MSFT/10-K",
    ]
)
# Below is just for demo purposes to view the values in the result
import pandas as pd
from dataclasses import asdict
r = pd.DataFrame([asdict(metadata) for metadata in metadatas])
r = r[["company_name"] + [col for col in r.columns if col != "company_name"]]
r
| | company_name | accession_number | form_type | primary_doc_url | items | primary_doc_description | filing_date | report_date | tickers |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple Inc. | 0000320193-23-000077 | 10-Q | https://www.sec.gov/Archives/edgar/data/320193... | | 10-Q | 2023-08-04 | 2023-07-01 | [{'symbol': 'AAPL', 'exchange': 'Nasdaq'}] |
| 1 | Apple Inc. | 0000320193-23-000077 | 10-Q | https://www.sec.gov/Archives/edgar/data/320193... | | 10-Q | 2023-08-04 | 2023-07-01 | [{'symbol': 'AAPL', 'exchange': 'Nasdaq'}] |
| 2 | NETFLIX INC | 0001065280-23-000273 | 10-Q | https://www.sec.gov/Archives/edgar/data/106528... | | 10-Q | 2023-10-20 | 2023-09-30 | [{'symbol': 'NFLX', 'exchange': 'Nasdaq'}] |
| 3 | MICROSOFT CORP | 0000950170-23-035122 | 10-K | https://www.sec.gov/Archives/edgar/data/789019... | | 10-K | 2023-07-27 | 2023-06-30 | [{'symbol': 'MSFT', 'exchange': 'Nasdaq'}] |
| 4 | MICROSOFT CORP | 0001564590-22-026876 | 10-K | https://www.sec.gov/Archives/edgar/data/789019... | | 10-K | 2022-07-28 | 2022-06-30 | [{'symbol': 'MSFT', 'exchange': 'Nasdaq'}] |
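The shorthand strings in the request list follow the format `[amount=1]/ticker_or_cik/[form_type=10-Q]`. The sketch below shows one way such a string could be parsed. It is an illustration under stated assumptions, not the package's parser: `ParsedRequest` is a hypothetical stand-in for `RequestedFilings`, and only the two forms shown in the example (a bare ticker or CIK, and the full three-part form) are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedRequest:
    """Hypothetical stand-in for RequestedFilings, with the documented defaults."""
    ticker_or_cik: str
    form_type: str = "10-Q"
    limit: int = 1


def parse_request(spec: str) -> ParsedRequest:
    """Parse 'TICKER' or 'N/TICKER/FORM' into a ParsedRequest."""
    # maxsplit=2 keeps form types that contain a slash (e.g. "10-K/A") intact.
    parts = spec.strip().split("/", 2)
    if len(parts) == 1 and parts[0]:
        return ParsedRequest(ticker_or_cik=parts[0])
    if len(parts) == 3 and all(parts) and parts[0].isdigit():
        return ParsedRequest(
            ticker_or_cik=parts[1], form_type=parts[2], limit=int(parts[0])
        )
    raise ValueError(f"unsupported request string in this sketch: {spec!r}")


print(parse_request("NFLX"))
print(parse_request("2/MSFT/10-K"))
```

Two-part strings are rejected here on purpose: a leading number could be either an amount or a CIK, and this sketch does not try to guess.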
Download the HTML files
You can download the HTML for any of the filings:
for metadata in metadatas:
    html = dl.download_filing(url=metadata.primary_doc_url).decode()
    print(html[:50])
    break  # same for all filings, let's just print the first one
<?xml version="1.0" ?><!--XBRL Document Created wi
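For comparison, a similar download can be sketched with the standard library. This assumes the filing is fetched with a plain HTTPS GET that carries the company name and email as the User-Agent header; `build_request` and `fetch_html` are hypothetical helpers and do not reflect how `download_filing` is actually written.

```python
import urllib.request


def build_request(url: str, company: str, email: str) -> urllib.request.Request:
    """Prepare a GET request that identifies the caller via the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": f"{company} {email}"})


def fetch_html(request: urllib.request.Request) -> str:
    """Send the request and decode the response bytes as UTF-8 text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


request = build_request(
    "https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm",
    "MyCompanyName",
    "email@example.com",
)
# Nothing is sent until fetch_html(request) is called.
print(request.get_header("User-agent"))
```

As with `download_filing(...).decode()` above, the response arrives as bytes and is decoded to a string before use.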
Contributing
Follow these steps to install the project locally for development:
- Install the project with the command: pip install -e ".[dev]"
Note: We highly recommend using virtual environments for Python development. If you'd like to use virtual environments, follow these steps instead:
- Create a virtual environment: python3 -m venv .venv
- Activate the virtual environment: source .venv/bin/activate
- Install the project with the command: pip install -e ".[dev]"