Useful extensions for sec-edgar-downloader.
sec-downloader
A better version of sec-edgar-downloader. It includes an alternative implementation (a wrapper instead of a fork) to stay compatible with new sec-edgar-downloader releases. This library partially uses nbdev.
Features
Advantages over sec-edgar-downloader:
Flexibility in Download Process
- Tailored for choosing what, where, and how to download.
- Files stored in memory for faster operations and no unnecessary disk clutter.
Separate Metadata and File Downloads
- Easily skip unneeded files.
- Download metadata first, then selectively download files.
- Option to save metadata for better organization.
More Input Options
- Ticker or CIK (e.g., AAPL, 0000320193) for the latest filings.
- Accession Number (e.g., 0000320193-23-000077). Not supported in sec-edgar-downloader.
- SEC EDGAR URL (e.g., https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm). Not supported in sec-edgar-downloader.
Install
pip install sec-downloader
How to use
Download the metadata
Note: The company name and email address are used to form a user-agent string that complies with SEC EDGAR's fair access policy for programmatic downloading.
from sec_downloader import Downloader
dl = Downloader("MyCompanyName", "email@example.com")
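For illustration, here is a minimal sketch of how a company name and an email address can be combined into a User-Agent header, following the "Company Name contact@example.com" pattern the SEC asks automated clients to declare. The helper name build_user_agent is hypothetical, and the exact string sec-downloader builds internally is an assumption. The sketch only prepares a request with urllib and does not send it.

```python
from urllib.request import Request


def build_user_agent(company_name: str, email_address: str) -> str:
    """Hypothetical helper: join a company name and contact email into one User-Agent value."""
    if "@" not in email_address:
        raise ValueError(f"expected an email address, got {email_address!r}")
    return f"{company_name} {email_address}"


user_agent = build_user_agent("MyCompanyName", "email@example.com")

# Prepare (but do not send) a request that declares the User-Agent.
request = Request(
    "https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm",
    headers={"User-Agent": user_agent},
)
# urllib stores header names via str.capitalize(), hence the "User-agent" key.
print(request.get_header("User-agent"))  # MyCompanyName email@example.com
```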
Find a filing with an Accession Number
metadatas = dl.get_filing_metadatas("AAPL/0000320193-23-000077")
print(metadatas)
[FilingMetadata(accession_number='0000320193-23-000077',
form_type='10-Q',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm',
items='',
primary_doc_description='10-Q',
filing_date='2023-08-04',
report_date='2023-07-01',
cik='0000320193',
company_name='Apple Inc.',
tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])]
Alternatively, any of these calls returns the same result:
metadatas = dl.get_filing_metadatas("aapl/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/0000320193-23-000077")
metadatas = dl.get_filing_metadatas("0000320193/0000320193-23-000077")
from sec_downloader.types import CompanyAndAccessionNumber
metadatas = dl.get_filing_metadatas(CompanyAndAccessionNumber(ticker_or_cik="320193", accession_number="0000320193-23-000077"))
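All of these queries combine a company identifier with an accession number that may or may not contain dashes. The sketch below shows one way such a string could be split and normalized. It is not sec-downloader's parser: CompanyAndAccession and parse_company_and_accession are hypothetical names, and resolving a ticker such as AAPL to its CIK (which needs the SEC's ticker mapping) is left out.

```python
import re
from typing import NamedTuple


class CompanyAndAccession(NamedTuple):
    """Hypothetical stand-in for sec-downloader's CompanyAndAccessionNumber."""

    ticker_or_cik: str
    accession_number: str


def parse_company_and_accession(query: str) -> CompanyAndAccession:
    """Split 'TICKER_OR_CIK/ACCESSION' and normalize both parts."""
    company, sep, accession = query.strip().partition("/")
    if not sep or not company:
        raise ValueError(f"expected 'TICKER_OR_CIK/ACCESSION', got {query!r}")

    # Accept the accession number with or without dashes (10 + 2 + 6 digits).
    digits = accession.replace("-", "")
    if not re.fullmatch(r"\d{18}", digits):
        raise ValueError(f"not an accession number: {accession!r}")
    dashed = f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"

    # Zero-pad numeric CIKs to 10 digits; upper-case tickers.
    company = company.zfill(10) if company.isdigit() else company.upper()
    return CompanyAndAccession(company, dashed)


for query in [
    "AAPL/0000320193-23-000077",
    "aapl/000032019323000077",
    "320193/000032019323000077",
    "0000320193/0000320193-23-000077",
]:
    print(parse_company_and_accession(query))
```

In this sketch the two ticker-based queries normalize to ('AAPL', '0000320193-23-000077') and the two CIK-based ones to ('0000320193', '0000320193-23-000077'); a ticker-to-CIK lookup would be the remaining step to make all four identical.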
Find the filing that matches an SEC EDGAR filing URL. Only the CIK and Accession Number are taken from the URL:
metadatas = dl.get_filing_metadatas(
"https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
)
print(metadatas)
[FilingMetadata(accession_number='0001193125-23-272204',
form_type='8-K',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm',
items='2.02,9.01',
primary_doc_description='8-K',
filing_date='2023-11-07',
report_date='2023-11-04',
cik='0001067983',
company_name='BERKSHIRE HATHAWAY INC',
tickers=[Ticker(symbol='BRK-B', exchange='NYSE'),
Ticker(symbol='BRK-A', exchange='NYSE')])]
Alternatively, you can pass the URL in other formats and get the same result:
metadatas = dl.get_filing_metadatas("https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm")
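Both URL forms contain the same /Archives/edgar/data/<CIK>/<accession number without dashes>/ segment, which is why they resolve to the same filing. The sketch below extracts those two values with a regular expression; cik_and_accession_from_url is a hypothetical name, and this is not necessarily how sec-downloader parses URLs.

```python
import re
from typing import Tuple

# Matches ".../Archives/edgar/data/<CIK>/<18-digit accession number>/..."
_ARCHIVE_PATH = re.compile(r"/Archives/edgar/data/(\d+)/(\d{18})/")


def cik_and_accession_from_url(url: str) -> Tuple[str, str]:
    """Return (10-digit CIK, dashed accession number) found in an EDGAR URL."""
    match = _ARCHIVE_PATH.search(url)
    if match is None:
        raise ValueError(f"no EDGAR archive path in {url!r}")
    cik, digits = match.groups()
    return cik.zfill(10), f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"


inline_xbrl = "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
archive = "https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm"
print(cik_and_accession_from_url(inline_xbrl))  # ('0001067983', '0001193125-23-272204')
print(cik_and_accession_from_url(archive))      # ('0001067983', '0001193125-23-272204')
```

Zero-padding the CIK makes the two URL styles compare equal even though one omits the leading zeros.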
Find the latest filings by company ticker or CIK:
from sec_downloader.types import RequestedFilings
metadatas = dl.get_filing_metadatas(
RequestedFilings(ticker_or_cik="MSFT", form_type="10-K", limit=2)
)
print(metadatas)
[FilingMetadata(accession_number='0000950170-23-035122',
form_type='10-K',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000095017023035122/msft-20230630.htm',
items='',
primary_doc_description='10-K',
filing_date='2023-07-27',
report_date='2023-06-30',
cik='0000789019',
company_name='MICROSOFT CORP',
tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')]),
FilingMetadata(accession_number='0001564590-22-026876',
form_type='10-K',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/msft-10k_20220630.htm',
items='',
primary_doc_description='10-K',
filing_date='2022-07-28',
report_date='2022-06-30',
cik='0000789019',
company_name='MICROSOFT CORP',
tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')])]
Alternatively, any of these calls returns the same result:
metadatas = dl.get_filing_metadatas("2/msft/10-K")
metadatas = dl.get_filing_metadatas("2/789019/10-K")
metadatas = dl.get_filing_metadatas("2/0000789019/10-K")
The parameters limit and form_type are optional. If omitted, limit defaults to 1 and form_type defaults to '10-Q'.
metadatas = dl.get_filing_metadatas("NFLX")
print(metadatas)
[FilingMetadata(accession_number='0001065280-23-000273',
form_type='10-Q',
primary_doc_url='https://www.sec.gov/Archives/edgar/data/1065280/000106528023000273/nflx-20230930.htm',
items='',
primary_doc_description='10-Q',
filing_date='2023-10-20',
report_date='2023-09-30',
cik='0001065280',
company_name='NETFLIX INC',
tickers=[Ticker(symbol='NFLX', exchange='Nasdaq')])]
Alternatively, any of these calls returns the same result:
metadatas = dl.get_filing_metadatas("nflx")
metadatas = dl.get_filing_metadatas("1/NFLX")
metadatas = dl.get_filing_metadatas("NFLX/10-Q")
metadatas = dl.get_filing_metadatas("1/NFLX/10-Q")
metadatas = dl.get_filing_metadatas(RequestedFilings(ticker_or_cik="NFLX"))
metadatas = dl.get_filing_metadatas(RequestedFilings(limit=1, ticker_or_cik="NFLX", form_type="10-Q"))
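The string shortcuts above follow a LIMIT/TICKER_OR_CIK/FORM_TYPE pattern in which the limit and the form type may be omitted. The sketch below shows one way to parse them and fill in the documented defaults (limit 1, form type 10-Q). FilingQuery and parse_filing_query are hypothetical names, and the rule used to tell a form type from a ticker in two-part strings is a simplifying assumption that only covers forms such as 10-K, 10-Q, and 8-K.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingQuery:
    """Hypothetical stand-in for RequestedFilings with the documented defaults."""

    ticker_or_cik: str
    form_type: str = "10-Q"
    limit: int = 1


def _normalize_company(value: str) -> str:
    # Zero-pad numeric CIKs to 10 digits; upper-case tickers.
    return value.zfill(10) if value.isdigit() else value.upper()


def _looks_like_form_type(value: str) -> bool:
    # Simplifying assumption: form types such as "10-K", "10-Q", and "8-K"
    # start with a digit and contain a hyphen; tickers and CIKs do not.
    return value[:1].isdigit() and "-" in value


def parse_filing_query(query: str) -> FilingQuery:
    parts = query.strip().split("/")
    if len(parts) == 3:  # LIMIT/TICKER_OR_CIK/FORM_TYPE
        limit, company, form_type = parts
        return FilingQuery(_normalize_company(company), form_type.upper(), int(limit))
    if len(parts) == 2:
        first, second = parts
        if _looks_like_form_type(second):  # TICKER_OR_CIK/FORM_TYPE
            return FilingQuery(_normalize_company(first), second.upper())
        # LIMIT/TICKER_OR_CIK
        return FilingQuery(_normalize_company(second), limit=int(first))
    if len(parts) == 1 and parts[0]:  # TICKER_OR_CIK only
        return FilingQuery(_normalize_company(parts[0]))
    raise ValueError(f"unsupported query: {query!r}")


print(parse_filing_query("2/msft/10-K"))  # FilingQuery(ticker_or_cik='MSFT', form_type='10-K', limit=2)
print(parse_filing_query("NFLX"))         # FilingQuery(ticker_or_cik='NFLX', form_type='10-Q', limit=1)
```

A more robust parser would likely check against a list of known form types rather than this heuristic, since forms such as S-1 begin with a letter.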
Download the HTML files
Once you have the Primary Document URL (for example, from the metadata), pass it to download_filing to download the HTML. The method returns bytes, so the example calls .decode() to get a string.
for metadata in metadatas:
    html = dl.download_filing(url=metadata.primary_doc_url).decode()
    print(html[:50])
    break  # the output looks the same for every filing, so print only the first
'<?xml version="1.0" ?><!--XBRL Document Created wi'
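The primary document URLs in the metadata above follow EDGAR's archive layout: the CIK without leading zeros, then the accession number without dashes, then the document's file name. A minimal sketch of that layout, using a hypothetical helper name, can be checked against the Apple 10-Q metadata shown earlier:

```python
def build_primary_doc_url(cik: str, accession_number: str, filename: str) -> str:
    """Hypothetical helper: rebuild an EDGAR archive URL from metadata fields."""
    # int() drops the CIK's leading zeros; the accession number loses its dashes.
    folder = f"{int(cik)}/{accession_number.replace('-', '')}"
    return f"https://www.sec.gov/Archives/edgar/data/{folder}/{filename}"


url = build_primary_doc_url("0000320193", "0000320193-23-000077", "aapl-20230701.htm")
print(url)
# https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm
```

The file name (here aapl-20230701.htm) cannot be derived from the CIK and accession number alone, which is one reason to fetch the metadata before downloading.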
Alternative implementation: Wrapper
Files are downloaded to a temporary folder, immediately read into memory, and then deleted. The example below downloads a single file (the latest 10-Q filing details in HTML format) into memory. The glob pattern passed as filter_pattern selects which files are read into memory.
from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader.download_storage import DownloadStorage
ONLY_HTML = "**/*.htm*"
storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# all files are now deleted and only stored in memory
content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
"<?xml version='1.0' encoding='ASCII'?>\n<html xmlns..."
Downloading multiple documents:
storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)
    dl.get("10-K", "GOOG", limit=2)
# all files are now deleted and only stored in memory
for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
('Path: sec-edgar-filings/GOOG/10-K/0001652044-24-000022/full-submission.txt\n'
'Content [len=13927595]: <SEC-DOCUMENT>0001652044-24-00...\n')
('Path: sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt\n'
'Content [len=15264470]: <SEC-DOCUMENT>0001652044-23-00...\n')
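Both wrapper examples rely on the same idea, which can be sketched with the standard library alone: create a TemporaryDirectory, let the downloader write into it, read the files that match a glob pattern into memory, and delete the folder on exit. InMemoryStorage and StoredFile are hypothetical names; this is not the DownloadStorage source, and the usage writes dummy files instead of calling sec-edgar-downloader so that it runs offline.

```python
import tempfile
from pathlib import Path
from typing import List, NamedTuple, Optional


class StoredFile(NamedTuple):
    path: str  # path relative to the temporary folder, with "/" separators
    content: bytes


class InMemoryStorage:
    """Hypothetical sketch of the temporary-folder-to-memory idea."""

    def __init__(self, filter_pattern: str = "**/*") -> None:
        self.filter_pattern = filter_pattern
        self._files: List[StoredFile] = []
        self._tmpdir: Optional[tempfile.TemporaryDirectory] = None

    def __enter__(self) -> Path:
        self._tmpdir = tempfile.TemporaryDirectory()
        return Path(self._tmpdir.name)

    def __exit__(self, exc_type, exc, tb) -> None:
        assert self._tmpdir is not None
        root = Path(self._tmpdir.name)
        try:
            if exc_type is None:
                # Read only regular files that match the glob pattern.
                self._files = [
                    StoredFile(p.relative_to(root).as_posix(), p.read_bytes())
                    for p in sorted(root.glob(self.filter_pattern))
                    if p.is_file()
                ]
        finally:
            # Always delete the folder, even if the download raised.
            self._tmpdir.cleanup()
            self._tmpdir = None

    def get_file_contents(self) -> List[StoredFile]:
        return list(self._files)


storage = InMemoryStorage(filter_pattern="**/*.htm*")
with storage as path:
    # Stand-in for a downloader writing into `path`: two dummy files.
    (path / "dummy").mkdir()
    (path / "dummy" / "doc.html").write_bytes(b"<html></html>")
    (path / "dummy" / "full-submission.txt").write_bytes(b"<SEC-DOCUMENT>")

# The temporary folder is gone; only the HTML file was kept in memory.
stored = storage.get_file_contents()
print([f.path for f in stored])  # ['dummy/doc.html']
```

Because StoredFile is a NamedTuple, the results can be unpacked as `for path, content in storage.get_file_contents()`, matching the loop in the multi-document example.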
Contributing
Follow these steps to install the project locally for development:
- Install the project with the command:
pip install -e ".[dev]"
Note: We highly recommend using virtual environments for Python development. To use one, follow these steps instead:
- Create a virtual environment
python3 -m venv .venv
- Activate the virtual environment
source .venv/bin/activate
- Install the project with the command
pip install -e ".[dev]"
File details
Details for the file sec-downloader-0.11.1.tar.gz.
File metadata
- Download URL: sec-downloader-0.11.1.tar.gz
- Upload date:
- Size: 13.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 48ff5199b91d0f5393650e028bfefbb9f2f8e33665014cd75e8ed688339519c2
MD5 | d0603360e14cb264499f86a74551ba07
BLAKE2b-256 | f33e3804ce8afeeb6155f415d0902d37154243a7955ebae1851a540162ea42b8
File details
Details for the file sec_downloader-0.11.1-py3-none-any.whl.
File metadata
- Download URL: sec_downloader-0.11.1-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 57b09dcc1286ef2e357da2f90b6baf2ecb959a64140fdb8e7fcfd3301be74bdb
MD5 | f10112e00b3c653c262ba5e189868cdd
BLAKE2b-256 | a579ad90ebabbc3d4f8c2fed8b7900f661340089eaa6540265a56785970767cf