Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual structure of the document.

sec-parser

Overview

The sec-parser project simplifies extracting meaningful information from SEC EDGAR HTML documents by organizing them into semantic elements and a tree structure. Semantic elements might include section titles, paragraphs, and tables, each classified for easier data manipulation. This forms a semantic tree that corresponds to the visual and informational structure of the document.

This tool is especially useful for Artificial Intelligence (AI) and Large Language Model (LLM) applications, where it streamlines data pre-processing and feature extraction.
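
To make the idea of a semantic tree concrete, the sketch below nests a flat list of already classified elements under their titles. It is a simplified illustration under stated assumptions, not sec-parser's actual algorithm: the Element class, its kind and level fields, and the build_tree function are hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical classified element; NOT a sec-parser class."""

    kind: str  # assumed categories: "title", "text", "table"
    text: str
    level: int = 0  # heading depth for titles (0 = outermost); unused otherwise
    children: list[Element] = field(default_factory=list)


def build_tree(elements: list[Element]) -> list[Element]:
    """Nest each element under the closest preceding title of a shallower level."""
    roots: list[Element] = []
    open_titles: list[Element] = []  # titles still accepting children, outermost first
    for element in elements:
        if element.kind == "title":
            # A new title closes every open title at the same or a deeper level.
            while open_titles and open_titles[-1].level >= element.level:
                open_titles.pop()
        siblings = open_titles[-1].children if open_titles else roots
        siblings.append(element)
        if element.kind == "title":
            open_titles.append(element)
    return roots


# Flat input in document order, as a classifier might produce it.
flat = [
    Element("title", "Item 1. Financial Statements", level=0),
    Element("title", "Statements of Operations", level=1),
    Element("text", "(In millions)"),
    Element("table", "..."),
    Element("title", "Item 2. Discussion", level=0),
    Element("text", "Overview"),
]
roots = build_tree(flat)
print([root.text for root in roots])
```

The stack of open titles keeps the pass linear in the number of elements. Note that build_tree fills in the children lists of the Element objects it is given rather than copying them.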

Installation

Open a terminal and run the following command to install sec-parser:

pip install sec-parser

Usage

To work with the most recent 10-Q SEC EDGAR document in HTML format for Apple, first retrieve the report's HTML (the snippet below expects it in a variable named html). Then parse the report into a collection of semantic elements extracted from the document.

The following code snippet demonstrates how to do this:

import sec_parser as sp

# html is expected to hold the HTML source of the 10-Q report
elements = sp.SecParser().parse(html)

Here is an example of the output you can expect:

TopLevelSectionStartMarker: PART I — FINANCIAL INFORMATION
├── TitleElement: Item 1. Financial Statements
│   ├── TitleElement: CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS (U...
│   │   ├── TextElement: (In millions, except number of shares which are re...
│   │   ├── TableElement: ...
│   ...
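
A tree in this style can be printed with a short recursive function. The sketch below is only an illustration of the output format: Node and render are hypothetical names defined for this example and are not part of sec-parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node used only for this example."""

    label: str  # element type name, e.g. "TitleElement"
    text: str  # text shown after the label
    children: list[Node] = field(default_factory=list)


def render(node: Node, prefix: str = "", connector: str = "") -> list[str]:
    """Return the lines of a box-drawing tree rooted at node."""
    lines = [f"{prefix}{connector}{node.label}: {node.text}"]
    # Children of a non-last node continue the vertical bar; children of a
    # last node get blank padding; the root adds no indentation of its own.
    if connector == "├── ":
        child_prefix = prefix + "│   "
    elif connector == "└── ":
        child_prefix = prefix + "    "
    else:
        child_prefix = prefix
    for index, child in enumerate(node.children):
        is_last = index == len(node.children) - 1
        lines.extend(render(child, child_prefix, "└── " if is_last else "├── "))
    return lines


tree = Node("TitleElement", "Item 1. Financial Statements", [
    Node("TitleElement", "CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS", [
        Node("TextElement", "(In millions)"),
        Node("TableElement", "..."),
    ]),
    Node("TitleElement", "Item 2"),
])
lines = render(tree)
print("\n".join(lines))
```

Unlike the truncated sample above, this sketch marks the last child of each node with └── so the end of every branch is visible.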

For more examples and advanced usage, see the Quickstart User Guide.

Contributing

Contributing to sec-parser is a rewarding way to improve this open-source project. Whether you are a user interested in expanding your knowledge or a developer who wants to dive deeper into the codebase, we have comprehensive guides to get you started.

  • User Guide: If you are new to sec-parser and would like to get started, please refer to the Quickstart User Guide.

  • Developer Guide: For those interested in contributing to sec-parser, the Comprehensive Developer Guide provides an in-depth walkthrough of the codebase and offers examples to help you contribute effectively.

Both guides are interactive and allow you to engage with the code and concepts as you learn. You can run and modify all the code examples for yourself by cloning the repository and running the respective notebooks in a Jupyter environment.

Alternatively, you can run the notebooks directly in your browser using Google Colab.

Note: Before contributing, we highly recommend familiarizing yourself with these guides. They will help you understand the structure and style of our codebase, enabling you to make effective contributions.

Best Practices

Importing modules

  1. Standard: import sec_parser as sp
  2. Package-Level: from sec_parser import SomeClass
  3. Submodule: from sec_parser import semantic_tree
  4. Submodule-Level: from sec_parser.semantic_tree import SomeClass

Note: The root-level package sec_parser contains only the most common symbols. For more specialized functionality, use submodule or submodule-level imports.

Warning: So that we can maintain backward compatibility with your code when refactoring the internal structure of sec-parser, avoid deep or chained imports such as from sec_parser.semantic_tree.internal_utils import SomeInternalClass.

License

This project is licensed under the MIT License - see the LICENSE file for details.
