A tool that replicates the quarterly Financial Statement Datasets from the SEC (https://www.sec.gov/dera/data/financial-statement-data-sets), but on a daily basis.
SEC Financial Statement Data Set Daily Processing
Purpose
The secdaily package replicates the quarterly Financial Statement Datasets from the SEC, but on a daily basis. While the SEC only provides these datasets once per quarter, this tool allows you to:
- Add daily updates by processing new 10-K and 10-Q filings as they become available
- Generate daily zip files in the same format as the official quarterly datasets
This enables financial analysts, researchers, and developers to access structured financial statement data without waiting for the quarterly releases.
Installation
The package requires Python 3.10 or higher. Install using pip:
```bash
pip install secdaily
```
Usage
The main entry point is the SecDailyOrchestrator class. Here's a basic example:
```python
from secdaily.SecDaily import SecDailyOrchestrator

# Initialize the orchestrator
orchestrator = SecDailyOrchestrator(
    workdir="/path/to/your/data/directory/",
    user_agent_def="Your Company Name yourname@example.com",
    start_year=2024,  # Optional: specify starting year
    start_qrtr=1,  # Optional: specify starting quarter
)

# Run the full process
orchestrator.process()
```
Parameters
- workdir: Directory where all data will be stored (including the SQLite database)
- user_agent_def: Required - Your user agent string for SEC.gov requests. It must follow the format specified in SEC's EDGAR access requirements: "Company Name contact@company.com"
- start_year: Optional - Year to start processing from (defaults to the current year)
- start_qrtr: Optional - Quarter (1 to 4) to start processing from (defaults to the current quarter)
Individual Process Steps
You can also run individual parts of the process:
```python
# Only process index data
orchestrator.process_index_data()

# Only process XML data
orchestrator.process_xml_data()

# Only create SEC-style formatted files
orchestrator.create_sec_style()

# Only create daily zip files
orchestrator.create_daily_zip()
```
Directory Structure of the Created Data
The tool creates the following directory structure in your specified workdir:
```
workdir/
├── sec_processing.db      # SQLite database for tracking processing
├── _1_xml/                # Downloaded XML files
│   ├── 2024q4/
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_htm.xml.zip
│   │   │   ├── xyz_pre.xml.zip
│   │   │   ├── xyz_lab.xml.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── _2_csv/                # Parsed CSV files
│   ├── 2024q4/
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_num.csv.zip
│   │   │   ├── xyz_pre.csv.zip
│   │   │   ├── xyz_lab.csv.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── _3_secstyle/           # SEC-style formatted files
│   ├── 2024q4/
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_num.csv.zip
│   │   │   ├── xyz_pre.csv.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...
└── _4_daily/              # Daily zip files
    ├── 2024q4/
    │   ├── 20241001.zip
    │   ├── 20241002.zip
    │   └── ...
    └── ...
```
Each daily zip file contains:
- sub.txt - Submission information
- pre.txt - Presentation information
- num.txt - Numeric data
Limitations
- num.txt doesn't contain content for the segments column
- XBRL data embedded in HTML files (approximately 20% of reports) is not processed yet
- Numbering of the columns "report" and "line" in pre.txt may not be the same as in the quarterly files, but the order should be the same
- The tool throttles requests to SEC.gov to comply with their limit of 10 requests per second
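Staying under a fixed request rate can be done by spacing requests at least 1/limit seconds apart. This is a minimal sketch of that idea, not the throttling code secdaily actually uses; `RequestThrottle` is a hypothetical class, and the limit of 10 per second is taken from the text above.

```python
import threading
import time


class RequestThrottle:
    """Space calls to wait() at least 1/max_per_second seconds apart.

    This keeps the request rate at or below max_per_second, even when
    several threads share one throttle instance.
    """

    def __init__(self, max_per_second: int = 10) -> None:
        self.min_interval = 1.0 / max_per_second
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve the following slot."""
        with self._lock:
            now = time.monotonic()
            if now < self._next_allowed:
                time.sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.min_interval
```

A caller would invoke `throttle.wait()` immediately before each HTTP request to SEC.gov. Holding the lock while sleeping is deliberate: it serializes the slots so parallel workers cannot jointly exceed the limit.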
High-level Process Description
- Index Processing: Parse SEC's index.json to identify new filings
- XML Processing: Download and extract necessary XML files
- Data Parsing: Process the XML files into CSV format (creating initial versions of num.txt, pre.txt, lab.txt)
- SEC-style Formatting: Format the data to match the official SEC dataset structure
- Daily Zip Creation: Package the formatted data into daily zip files
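The index step only has to pick out filings that are relevant (10-K and 10-Q) and not yet known. The sketch below illustrates that selection under loud assumptions: index entries are modeled as dicts with hypothetical `form` and `adsh` (accession number) keys, which is not necessarily how secdaily represents the parsed index.json.

```python
from collections.abc import Iterable

# Form types the tool processes, per the project description
RELEVANT_FORMS = {"10-K", "10-Q"}


def select_new_filings(
    entries: Iterable[dict[str, str]], known_adsh: set[str]
) -> list[dict[str, str]]:
    """Keep 10-K/10-Q entries whose accession number has not been seen before.

    The 'form' and 'adsh' keys are hypothetical field names for this sketch.
    """
    return [
        entry
        for entry in entries
        if entry["form"] in RELEVANT_FORMS and entry["adsh"] not in known_adsh
    ]
```

Keeping the "already known" check separate from the form filter makes it easy to back `known_adsh` with a persistent store, so a rerun skips filings that an earlier run already picked up.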
Robustness Features
- Implements retry mechanisms for failed downloads
- Uses a SQLite database to track processing state, allowing for safe restarts
- Throttles requests to comply with SEC.gov's rate limits
- Stores downloaded and created files in a compressed format to conserve disk space
- Uses parallel processing where appropriate for improved performance
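Safe restarts work by recording the state of each item in a database, so a new run only picks up unfinished work. The following sketch shows that pattern with Python's built-in sqlite3 module; the `filings` table and its columns are a hypothetical schema for illustration, not the actual layout of sec_processing.db.

```python
import sqlite3
from collections.abc import Iterable

# Hypothetical schema: one row per filing, keyed by accession number
SCHEMA = """
CREATE TABLE IF NOT EXISTS filings (
    adsh   TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""


def open_state_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def register_filings(conn: sqlite3.Connection, adsh_list: Iterable[str]) -> None:
    # INSERT OR IGNORE leaves already-known filings (and their status) untouched
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO filings (adsh) VALUES (?)", [(a,) for a in adsh_list]
        )


def pending_filings(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT adsh FROM filings WHERE status = 'pending' ORDER BY adsh")
    return [row[0] for row in rows]


def mark_done(conn: sqlite3.Connection, adsh: str) -> None:
    with conn:
        conn.execute("UPDATE filings SET status = 'done' WHERE adsh = ?", (adsh,))
```

If a run is interrupted, filings that were never marked done stay pending and are picked up again on the next run, while completed ones are skipped.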
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
SEC Financial Statement Data Sets Tools (secfsdstools)
Also check out the SEC Financial Statement Data Sets Tools project.
Download files
File details
Details for the file secdaily-0.1.0.tar.gz.
File metadata
- Download URL: secdaily-0.1.0.tar.gz
- Upload date:
- Size: 2.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 94afe2c670e2857798e6c56af2a787c548e09c034658dea41d814e849f97209c |
| MD5 | b52ba5950131f8c2b7c7b4b03005de8f |
| BLAKE2b-256 | ca8b76a4e51c844fe84508a4cfd53ea6387a051088937948e95b2e48af2ae81b |
File details
Details for the file secdaily-0.1.0-py3-none-any.whl.
File metadata
- Download URL: secdaily-0.1.0-py3-none-any.whl
- Upload date:
- Size: 2.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 93a9687836ff871be20d1686b1ebb73a396c01081f220b8819d92ab22fe504a4 |
| MD5 | a3c2fbb034e29afe4fd032fc50b93185 |
| BLAKE2b-256 | ee2531bbd7a3357b4a2dbecc3d72e131cd8c7d03ca2930fde195c70835fc3803 |