A tool that replicates the quarterly Financial Statement Data Sets from the SEC (https://www.sec.gov/dera/data/financial-statement-data-sets), but on a daily basis.

Purpose

The purpose of this project is to download new 10-K and 10-Q reports from EDGAR at sec.gov and to parse and preprocess their XML files so that the structure of the resulting CSV files is similar to that of the "Financial Statement Data Sets" published by the SEC. While the "Financial Statement Data Sets" are provided only once per quarter, this project aims to provide the same data on a daily basis.

High-level Process Description

The implementation is designed to be robust: it uses several fail-over and retry measures so that the code can run automatically without manual restarts. Should a restart become necessary, the process can simply be restarted manually. Access to the sec.gov site is throttled (there is a limit of 10 requests per second), and the logic uses parallel processing where it is meaningful.
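The throttling and retry ideas can be illustrated with a minimal sketch. It is not the project's actual code: the names Throttle and with_retry, the even spacing of calls, and the exponential backoff are all assumptions chosen for illustration.

```python
import threading
import time


class Throttle:
    """Hypothetical helper: space calls about period/max_calls seconds apart."""

    def __init__(self, max_calls=10, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._interval = period / max_calls
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def wait(self):
        # Reserve the next free time slot under the lock so that parallel
        # workers get distinct slots, then sleep outside the lock.
        with self._lock:
            now = self._clock()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)


def with_retry(func, attempts=3, backoff=0.5, sleep=time.sleep):
    """Hypothetical helper: call func() and retry on any exception.

    Waits backoff, 2*backoff, 4*backoff, ... seconds between attempts and
    re-raises the last exception once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            sleep(backoff * 2 ** (attempt - 1))
```

A download worker would call wait() on a shared Throttle before each request and wrap the request itself in with_retry. Because sleeping is not exact, the spacing is only approximate, so a real setup might want to stay somewhat below the limit.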

In order to keep track of the different steps of the process, a simple SQLite database is used.
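As a rough illustration of how a SQLite database can track the processing state, the sketch below uses Python's built-in sqlite3 module. The table name report_status, its columns, and the helper functions are hypothetical and are not taken from the project's ddl scripts.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) status table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS report_status (
               accession_number TEXT PRIMARY KEY,
               filing_date      TEXT NOT NULL,
               downloaded       INTEGER NOT NULL DEFAULT 0,
               parsed           INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def add_report(conn, accession_number, filing_date):
    # INSERT OR IGNORE makes a rerun harmless: known reports are skipped.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO report_status (accession_number, filing_date) "
            "VALUES (?, ?)",
            (accession_number, filing_date),
        )


def pending_download(conn):
    rows = conn.execute(
        "SELECT accession_number FROM report_status "
        "WHERE downloaded = 0 ORDER BY accession_number"
    )
    return [row[0] for row in rows]


def pending_parse(conn):
    rows = conn.execute(
        "SELECT accession_number FROM report_status "
        "WHERE downloaded = 1 AND parsed = 0 ORDER BY accession_number"
    )
    return [row[0] for row in rows]


def mark(conn, accession_number, column):
    # Only the two known flag columns may be interpolated into the SQL text.
    if column not in ("downloaded", "parsed"):
        raise ValueError(f"unknown status column: {column}")
    with conn:
        conn.execute(
            f"UPDATE report_status SET {column} = 1 WHERE accession_number = ?",
            (accession_number,),
        )
```

Because every step only selects rows whose flag is still 0, an interrupted run can be restarted without redoing finished work.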

The main steps of the process are as follows:

  1. Check https://www.sec.gov/Archives/edgar/monthly/index.json for a new monthly file or an update to an existing monthly file.
  2. If there are new or updated monthly files, download and parse them.
  3. Add the meta information for new 10-K and 10-Q reports to the appropriate table.
  4. Select unprocessed reports and create appropriate entries in the processing table.
  5. Select reports for which the XML files have not been downloaded and download these files.
  6. Select reports for which the downloaded XML files have not yet been parsed and parse them.
  7. For every filing day, create a new zip file containing all the information for all reports filed on that day, using the same structure as the "Financial Statement Data Sets".
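The last step, one zip file per filing day, can be sketched as follows. This is an illustrative example only: the function write_daily_zips, the reduced column list, and the zip naming scheme are assumptions. It writes just a tab-separated sub.txt, which is one of the files in the SEC's quarterly data sets; the other tables would be written the same way.

```python
import csv
import io
import zipfile
from collections import defaultdict
from pathlib import Path

# Assumed subset of the submission columns, for illustration only.
SUB_COLUMNS = ["adsh", "cik", "form", "filed"]


def write_daily_zips(sub_rows, out_dir):
    """Group rows by their 'filed' day and write one <day>.zip per group."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    by_day = defaultdict(list)
    for row in sub_rows:
        by_day[row["filed"]].append(row)

    written = []
    for day in sorted(by_day):
        # Build the tab-separated table in memory, header line first.
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=SUB_COLUMNS,
            delimiter="\t",
            lineterminator="\n",
            extrasaction="ignore",
        )
        writer.writeheader()
        writer.writerows(by_day[day])

        zip_path = out_dir / f"{day}.zip"
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.writestr("sub.txt", buffer.getvalue())
        written.append(zip_path)
    return written
```

Calling write_daily_zips(rows, "daily") with rows for two different filing days would produce two zip files in the daily folder, each holding the reports of one day.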

Folder content of the Project

  1. ddl
    This folder contains the Flyway scripts to set up the SQLite DB.
  2. doc
    This folder contains the documentation of the project.
  3. src
    The source code of the project.
  4. test
    Unit Tests
  5. test_ext
    "Extended" testing, contains three subfolders:
    1. testintegration
      Mainly contains "mass-testing" code. This code is used to compare the parsed content with the original content of the "Financial Statement Data Sets" zip files.
    2. trials
      A sandbox to try out different things, with code that might be worth keeping.
    3. utils_debug
      Some code that helps to simplify debugging of parsing issues.

Setup and first run

Setup the Python environment

The simplest way to set up the environment is to use the conda environment.yml file, provided that you have Miniconda or Anaconda installed. Just execute

conda env create --file environment.yml

This will create a new conda python environment based on Python 3.7 with the name "sec_processing".

If you want to set up your environment manually, create a new Python 3.7 environment and install the following packages:

  • pandas
  • lxml
  • requests
  • pytest

First run

To execute the download and parsing of the reports, instantiate the SecDataOrchestrator from the SecData module and call its process method. Note: when creating an instance of the SecDataOrchestrator, you have to provide the folder in which the SQLite DB file was created. If you don't provide any additional information, the SecDataOrchestrator will start to download and parse the reports from the current and the 3 previous months.
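To make the default time window concrete, here is a small sketch of how such a month range could be computed. The function months_to_process is hypothetical, and it assumes that the default window means the current month plus the three months before it. It does not show the SecDataOrchestrator API itself.

```python
from datetime import date


def months_to_process(today, previous=3):
    """Return (year, month) pairs for today's month and the `previous`
    months before it, oldest first."""
    # Count months on a single axis so that year boundaries need no
    # special handling.
    index = today.year * 12 + (today.month - 1)
    result = []
    for i in range(index - previous, index + 1):
        year, month0 = divmod(i, 12)
        result.append((year, month0 + 1))
    return result


print(months_to_process(date(2024, 2, 15)))
# prints [(2023, 11), (2023, 12), (2024, 1), (2024, 2)]
```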
