Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual structure of the document.
sec-parser
Overview
The sec-parser project simplifies extracting meaningful information from SEC EDGAR HTML documents by organizing them into semantic elements and a tree structure. Semantic elements might include section titles, paragraphs, and tables, each classified for easier data manipulation. This forms a semantic tree that corresponds to the visual and informational structure of the document.
This tool is especially beneficial for Artificial Intelligence (AI) and Large Language Model (LLM) applications, where it streamlines data pre-processing and feature extraction.
- Explore the Demo
- Read the Documentation
- Ask questions in Discussions
- Report bugs in Issues
Installation
Open a terminal and run the following command to install sec-parser:
pip install sec-parser
Usage
import sec_parser as sp
# Fetch and parse the latest Apple 10-Q report
tree = sp.parse_latest("10-Q", ticker="AAPL")
# Display the tree structure of the parsed document
print(tree.render())
Console output:
RootSectionElement: PART I — FINANCIAL INFORMATION
├── TitleElement: Item 1. Financial Statements
│ ├── TitleElement: CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS (U...
│ │ ├── TextElement: (In millions, except number of shares which are re...
│ │ ├── TableElement: ...
│ ...
For more examples and advanced usage, see the Quickstart User Guide.
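To make the idea of a semantic tree concrete, here is a minimal, self-contained sketch that builds a small tree by hand and renders it in a layout similar to the console output above. It does not use sec-parser at all: the Node class and the render function are hypothetical names invented for this illustration, and the library's real element classes and rendering may work differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: an element kind, its text, and nested children."""

    kind: str
    text: str
    children: list[Node] = field(default_factory=list)


def render(node: Node, prefix: str = "", connector: str = "") -> list[str]:
    """Return one line per node, drawing branches with box characters."""
    lines = [f"{prefix}{connector}{node.kind}: {node.text}"]
    # Descendants of a non-last child keep a vertical bar in their prefix.
    if connector == "├── ":
        child_prefix = prefix + "│   "
    elif connector == "└── ":
        child_prefix = prefix + "    "
    else:
        child_prefix = prefix  # the root has no connector
    for index, child in enumerate(node.children):
        is_last = index == len(node.children) - 1
        lines.extend(render(child, child_prefix, "└── " if is_last else "├── "))
    return lines


# Illustrative data only, loosely modeled on the console output above.
root = Node(
    "RootSectionElement",
    "PART I",
    [
        Node(
            "TitleElement",
            "Item 1",
            [Node("TextElement", "(In millions)"), Node("TableElement", "...")],
        ),
        Node("TitleElement", "Item 2"),
    ],
)

lines = render(root)
print("\n".join(lines))
```

Nesting text and tables under the title that visually precedes them is what lets downstream code ask for everything in a given section by walking one subtree.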
Contributing
Contributing to sec-parser is a rewarding way to improve this open-source project. Whether you are a user interested in expanding your knowledge or a developer who wants to dive deeper into the codebase, we have comprehensive guides to get you started.
- User Guide: If you are new to sec-parser and would like to get started, please refer to the Quickstart User Guide.
- Developer Guide: For those interested in contributing to sec-parser, the Comprehensive Developer Guide provides an in-depth walkthrough of the codebase and offers examples to help you contribute effectively.
Both guides are interactive and allow you to engage with the code and concepts as you learn. You can run and modify all the code examples for yourself by cloning the repository and running the respective notebooks in a Jupyter environment.
Alternatively, you can run the notebooks directly in your browser using Google Colab.
Note: Before contributing, we highly recommend familiarizing yourself with these guides. They will help you understand the structure and style of our codebase, enabling you to make effective contributions.
Best Practices
Importing modules
- Standard: import sec_parser as sp
- Package-Level: from sec_parser import SomeClass
- Submodule: from sec_parser import semantic_tree
- Submodule-Level: from sec_parser.semantic_tree import SomeClass
Note: The root-level package sec_parser contains only the most common symbols. For more specialized functionality, use submodule or submodule-level imports.
Warning: To allow us to maintain backward compatibility with your code while the internal structure of sec-parser is refactored, avoid deep or chained imports such as from sec_parser.semantic_tree.internal_utils import SomeInternalClass.
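The reasoning behind this warning can be shown with a toy experiment that uses only the standard library. The sketch below writes two versions of a throwaway package called toy_pkg (a hypothetical name, unrelated to sec-parser) into temporary directories. Both versions re-export Node from the package root, but the second one has moved the class to a different internal module, as a refactoring might. This is an illustration of the general principle, not a description of how sec-parser is organized internally.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_toy_package(root: Path, internal_name: str) -> None:
    """Write a hypothetical package `toy_pkg` that defines Node in `internal_name`."""
    pkg = root / "toy_pkg"
    pkg.mkdir()
    (pkg / f"{internal_name}.py").write_text("class Node:\n    pass\n")
    # The package root re-exports the public symbol.
    (pkg / "__init__.py").write_text(f"from toy_pkg.{internal_name} import Node\n")


def can_import(module: str, name: str) -> bool:
    """Return True if `module` can be imported and exposes `name`."""
    try:
        return hasattr(importlib.import_module(module), name)
    except ImportError:
        return False


def try_imports(root: Path) -> dict:
    """Check a package-level and a deep import against the package at `root`."""
    # Forget any previously imported copy of the toy package.
    for loaded in [m for m in sys.modules if m == "toy_pkg" or m.startswith("toy_pkg.")]:
        del sys.modules[loaded]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return {
            "package-level": can_import("toy_pkg", "Node"),
            "deep": can_import("toy_pkg._impl_v1", "Node"),
        }
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as v1, tempfile.TemporaryDirectory() as v2:
    make_toy_package(Path(v1), "_impl_v1")  # layout before the refactor
    make_toy_package(Path(v2), "_impl_v2")  # layout after the refactor
    before = try_imports(Path(v1))
    after = try_imports(Path(v2))

print("before refactor:", before)
print("after refactor: ", after)
```

The package-level import keeps working in both layouts, while the deep import that reached into the internal module fails once that module is renamed.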
License
This project is licensed under the MIT License - see the LICENSE file for details.
Hashes for sec_parser-0.15.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3a3223e1adfd6ffc24d5321d420c591f62c909a3dc1e857817c5a4a2cd3fcf87
MD5 | 1f98d4654abd363549eeae9d46eace67
BLAKE2b-256 | bee3d177b27e464b39c0c2591f92cb65631fdf1dc224acdc13285f8fa5c756af