A tool that replicates the quarterly Financial Statement Datasets from the SEC (https://www.sec.gov/dera/data/financial-statement-data-sets), but on a daily basis.
Project description
Purpose
The purpose of this project is to download new 10-K and 10-Q reports from EDGAR at sec.gov and to parse and preprocess these XML files so that the structure of the resulting CSV files is similar to that of the "Financial Statement Data Sets" from sec.gov. While the "Financial Statement Data Sets" are only published once per quarter, this project aims to provide the same data on a daily basis.
High-level Process Description
The implementation is designed to be robust: it uses several fail-over and retry measures so that the code can run automatically without manual restarts. Should a restart become necessary, the process can simply be restarted manually. Access to sec.gov is throttled (there is a limit of 10 requests per second), and the logic uses parallel processing where it is meaningful.
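The throttling and retry behaviour described above could look roughly like the following sketch. It is not the project's actual code: the names `Throttle` and `with_retry`, the sliding-window approach, and the exponential back-off delays are all assumptions chosen for illustration. Only the limit of 10 requests per second comes from the text.

```python
import threading
import time


class Throttle:
    """Allow at most `max_calls` calls per `period` seconds (sliding window).

    Hypothetical helper, not part of the project's API.
    """

    def __init__(self, max_calls=10, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._stamps = []  # times of the calls inside the current window

    def wait(self):
        """Block until another call is allowed, then record it."""
        with self._lock:
            while True:
                now = self._clock()
                # Forget calls that have left the sliding window.
                self._stamps = [t for t in self._stamps
                                if now - t < self.period]
                if len(self._stamps) < self.max_calls:
                    break
                # Sleep until the oldest recorded call leaves the window.
                self._sleep(self.period - (now - self._stamps[0]))
            self._stamps.append(now)


def with_retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `func`, retrying on any exception with exponential back-off."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

A download routine would call `throttle.wait()` before every request to sec.gov and wrap the request itself in `with_retry`. Because the lock is held while sleeping, worker threads that share one `Throttle` instance are slowed down together, which keeps the overall request rate below the limit.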
In order to keep track of the different steps of the process, a simple SQLite database is used.
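As a minimal sketch of how such step tracking might work with SQLite, assume a single hypothetical table `report_processing` in which a NULL column means "this step has not been done yet". The table name, column names, and helper functions below are illustrative assumptions; the project's real schema is defined by the Flyway scripts in the ddl folder.

```python
import sqlite3

# Hypothetical schema: one row per report, one column per processing step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS report_processing (
    accession_number TEXT PRIMARY KEY,
    form_type        TEXT NOT NULL,
    filing_date      TEXT NOT NULL,
    xml_file         TEXT,  -- set once the XML file has been downloaded
    parsed_file      TEXT   -- set once the XML file has been parsed
)
"""


def open_db(path=":memory:"):
    """Open the database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_report(conn, accession_number, form_type, filing_date):
    """Register a report; re-adding a known report is a no-op."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO report_processing "
            "(accession_number, form_type, filing_date) VALUES (?, ?, ?)",
            (accession_number, form_type, filing_date),
        )


def mark_downloaded(conn, accession_number, xml_file):
    """Record where the downloaded XML file was stored."""
    with conn:
        conn.execute(
            "UPDATE report_processing SET xml_file = ? "
            "WHERE accession_number = ?",
            (xml_file, accession_number),
        )


def pending_downloads(conn):
    """Reports whose XML file has not been downloaded yet."""
    rows = conn.execute(
        "SELECT accession_number FROM report_processing "
        "WHERE xml_file IS NULL ORDER BY accession_number"
    )
    return [row[0] for row in rows]


def pending_parsing(conn):
    """Reports that were downloaded but not parsed yet."""
    rows = conn.execute(
        "SELECT accession_number FROM report_processing "
        "WHERE xml_file IS NOT NULL AND parsed_file IS NULL "
        "ORDER BY accession_number"
    )
    return [row[0] for row in rows]
```

Because every step only selects rows whose result column is still empty, a restarted run picks up exactly where the previous one stopped.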
The main steps of the process are as follows:
- check https://www.sec.gov/Archives/edgar/monthly/index.json for a new monthly file or an update on an existing monthly file
- if there are new or updated monthly files, download and parse them
- add the meta information for new 10-K and 10-Q reports to the appropriate table
- select unprocessed reports and create appropriate entries in the processing table
- select reports for which the xml-files have not been downloaded and download these files
- select reports for which the downloaded xml-files have not yet been parsed and parse them
- for every filing day, create a new zip file containing all the information for all reports that were filed on that day, using the same structure as the "Financial Statement Data Sets"
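The last step, grouping parsed reports by filing day and writing one zip file per day, could be sketched as follows. This uses only the standard library and is not the project's implementation: the function names, the input layout (a list of dicts with `filing_date`, `num` and `pre` keys), and the `<filing date>.zip` naming are assumptions. The tab-separated `num.txt` and `pre.txt` members follow the layout of the quarterly "Financial Statement Data Sets".

```python
import csv
import io
import zipfile
from collections import defaultdict
from pathlib import Path


def rows_to_tsv(rows):
    """Serialise a list of row dicts (all with the same keys) as TSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_daily_zips(reports, out_dir):
    """Write one zip file per filing day and return the created paths.

    `reports` is an iterable of dicts with the (assumed) keys
    'filing_date' (YYYY-MM-DD), 'num' and 'pre' (lists of row dicts).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect the rows of all reports that were filed on the same day.
    by_day = defaultdict(lambda: {"num.txt": [], "pre.txt": []})
    for report in reports:
        tables = by_day[report["filing_date"]]
        tables["num.txt"].extend(report["num"])
        tables["pre.txt"].extend(report["pre"])

    created = []
    for filing_date, tables in sorted(by_day.items()):
        zip_path = out_dir / f"{filing_date}.zip"
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for member_name, rows in tables.items():
                archive.writestr(member_name, rows_to_tsv(rows))
        created.append(zip_path)
    return created
```

Keeping the member names and the tab-separated format identical to the quarterly data sets means that code written for the quarterly files can read the daily files without changes.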
Folder content of the Project
- ddl
  This folder contains the Flyway scripts to set up the SQLite DB.
- doc
  This folder contains the documentation of the project.
- src
  The source code of the project.
- test
  Unit tests.
- test_ext
  "Extended" testing; contains three subfolders:
  - testintegration
    Mainly contains "mass-testing" code, which is used to compare the parsed content with the original content of the "Financial Statement Data Sets" zip files.
  - trials
    A sandbox for trying out different things, with code that might be worth keeping.
  - utils_debug
    Some code that helps to simplify debugging of parsing issues.
Setup and first run
Setup the Python environment
The simplest way to set up the environment is to use the conda environment.yml file, provided that you have Miniconda or Anaconda installed. Just execute
conda env create --file environment.yml
This will create a new conda python environment based on Python 3.7 with the name "sec_processing".
If you want to set up your environment manually, create a new Python 3.7 environment and install the following packages:
- pandas
- lxml
- requests
- pytest
First run
In order to execute the download and the parsing of the reports, just instantiate the SecDataOrchestrator from the SecData module and call its process method. Note: when creating an instance of the SecDataOrchestrator, you have to provide the folder in which the SQLite DB file was created. If you don't provide any additional information, the SecDataOrchestrator will start to download and parse the reports from the following and the 3 previous months.