Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual structure of the document.

sec-parser

Overview

The sec-parser project simplifies extracting meaningful information from SEC EDGAR HTML documents by organizing them into semantic elements and a tree structure. Semantic elements might include section titles, paragraphs, and tables, each classified for easier data manipulation. This forms a semantic tree that corresponds to the visual and informational structure of the document.

This tool is especially useful for Artificial Intelligence (AI), Machine Learning (ML), and Large Language Model (LLM) applications, because it streamlines data pre-processing and feature extraction.

Getting Started

To get started, first install the sec-parser package:

pip install sec-parser

As an example, let's extract the "Segment Operating Performance" section as a semantic tree from the latest Apple 10-Q filing.

First, we'll need to download the filing from the SEC EDGAR website.

# pip install sec-downloader
from sec_downloader import Downloader

dl = Downloader("MyCompanyName", "email@example.com")
html = dl.get_latest_html("10-Q", "AAPL")

Note: The company name and email address are used to form a user-agent string that adheres to SEC EDGAR's fair access policy for programmatic downloading.
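
For illustration, here is a minimal sketch of how such a user-agent string could be attached to a request using only the Python standard library. The helper name `build_edgar_request` and the "<company> <email>" format are assumptions made for this example; sec-downloader may assemble the string differently. The request is built but never sent.

```python
from urllib.request import Request


def build_edgar_request(url: str, company: str, email: str) -> Request:
    """Build (but do not send) a request that identifies the caller."""
    # Assumption: the user-agent is simply "<company> <email>".
    # The exact string produced by sec-downloader may differ.
    user_agent = f"{company} {email}"
    return Request(url, headers={"User-Agent": user_agent})


request = build_edgar_request(
    "https://www.sec.gov/",
    "MyCompanyName",
    "email@example.com",
)

# urllib stores header names via str.capitalize(), hence "User-agent"
print(request.get_header("User-agent"))
```

In practice you would rarely need this helper, since Downloader takes the same two values and handles the request for you.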

Now, we can parse the filing into semantic elements and arrange them into a tree structure:

import sec_parser as sp

# Parse the HTML into a list of semantic elements
elements = sp.Edgar10QParser().parse(html)

# Construct a semantic tree to allow for easy filtering by section
tree = sp.TreeBuilder().build(elements)

# Find section "Segment Operating Performance"
section = [n for n in tree.nodes if n.text.startswith("Segment")][0]

# Preview the tree
print("\n".join(sp.render(section).split("\n")[:13]) + "...")

TitleElement: Segment Operating Performance
├── TextElement: The following table sho... (dollars in millions):
├── TableElement: 414 characters.
├── TitleElement[L1]: Americas
│   └── TextElement: Americas net sales decr... net sales of Services.
├── TitleElement[L1]: Europe
│   └── TextElement: The weakness in foreign...er net sales of iPhone.
├── TitleElement[L1]: Greater China
│   └── TextElement: The weakness in the ren...er net sales of iPhone.
├── TitleElement[L1]: Japan
│   └── TextElement: The weakness in the yen..., Home and Accessories.
└── TitleElement[L1]: Rest of Asia Pacific
    ├── TextElement: The weakness in foreign...lower net sales of Mac....
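
As a rough mental model of how a flat list of elements can become a tree like the one above, the sketch below nests each element under the nearest preceding title of a lower level. It is not sec-parser's implementation: `Element`, `Node`, `build_tree`, and `render` are hypothetical names defined only for this example, and the element texts are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for a semantic element (not a sec-parser class)."""
    kind: str        # "Title" or "Text"
    text: str
    level: int = 0   # heading level; only meaningful for titles


@dataclass
class Node:
    element: Element
    children: List["Node"] = field(default_factory=list)


def build_tree(elements: List[Element]) -> List[Node]:
    """Nest each element under the nearest preceding title of a lower level."""
    roots: List[Node] = []
    open_titles: List[Node] = []  # stack of titles that can still take children
    for element in elements:
        node = Node(element)
        if element.kind == "Title":
            # A new title closes every open title at the same or a deeper level
            while open_titles and open_titles[-1].element.level >= element.level:
                open_titles.pop()
        siblings = open_titles[-1].children if open_titles else roots
        siblings.append(node)
        if element.kind == "Title":
            open_titles.append(node)
    return roots


def render(nodes: List[Node], indent: int = 0) -> List[str]:
    """Return one indented line per node, depth-first."""
    lines: List[str] = []
    for node in nodes:
        lines.append("  " * indent + f"{node.element.kind}: {node.element.text}")
        lines.extend(render(node.children, indent + 1))
    return lines


elements = [
    Element("Title", "Segment Operating Performance", level=0),
    Element("Text", "Intro paragraph."),
    Element("Title", "Americas", level=1),
    Element("Text", "Americas paragraph."),
    Element("Title", "Europe", level=1),
    Element("Text", "Europe paragraph."),
]

tree = build_tree(elements)
print("\n".join(render(tree)))
```

The stack of open titles is what lets a level-1 title such as "Europe" close "Americas" while remaining a child of the level-0 section title.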

For more examples and advanced usage, see the User Guide, Developer Guide, and Documentation.

What's Next?

You've successfully parsed an SEC document into semantic elements and arranged them into a tree structure. To further analyze this data with analytics or AI, you can use any tool of your choice.
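
One common next step is to flatten a tree into records that keep each paragraph together with the titles it sits under, which is a convenient shape for search indexes or LLM prompts. The sketch below shows the idea on a hypothetical `Node` class with placeholder texts; it does not use sec-parser's own node type, so you would need to adapt the attribute names to your data.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical tree node; sec-parser's own node type is not used here."""
    kind: str    # "Title" or "Text"
    text: str
    children: List["Node"] = field(default_factory=list)


def iter_records(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Dict[str, str]]:
    """Yield one record per text node, tagged with its enclosing titles."""
    if node.kind == "Title":
        # Titles extend the section path seen by everything beneath them
        path = path + (node.text,)
    else:
        yield {"section": " > ".join(path), "text": node.text}
    for child in node.children:
        yield from iter_records(child, path)


tree = Node("Title", "Segment Operating Performance", [
    Node("Text", "Intro paragraph."),
    Node("Title", "Americas", [Node("Text", "Americas paragraph.")]),
    Node("Title", "Europe", [Node("Text", "Europe paragraph.")]),
])

records = list(iter_records(tree))
for record in records:
    print(record)
```

Keeping the section path next to each paragraph means a downstream tool can filter or group by section without walking the tree again.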

For a tailored experience, consider using our free and open-source library for AI-powered financial analysis:

pip install sec-ai

Explore sec-ai on GitHub

Best Practices

Importing modules

  1. Standard: import sec_parser as sp
  2. Package-Level: from sec_parser import SomeClass
  3. Submodule: from sec_parser import semantic_tree
  4. Submodule-Level: from sec_parser.semantic_tree import SomeClass

Note: The root-level package sec_parser contains only the most common symbols. For more specialized functionality, use submodule or submodule-level imports.

Warning: So that we can maintain backward compatibility with your code when refactoring the internal structure of sec-parser, avoid deep or chained imports such as from sec_parser.semantic_tree.internal_utils import SomeInternalClass.
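
If you want to check a codebase against this guideline, a small script can flag imports that reach more than one submodule deep. The sketch below uses the standard library `ast` module; `find_deep_imports` is a hypothetical helper written for this example, not part of sec-parser, and the two-level threshold is an assumption based on the four import styles listed above. The sample source is only parsed, never executed.

```python
import ast
from typing import List


def find_deep_imports(source: str, package: str = "sec_parser", max_depth: int = 2) -> List[str]:
    """Return imported module paths under `package` with more than `max_depth` dotted parts."""
    deep: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            parts = module.split(".")
            if parts[0] == package and len(parts) > max_depth:
                deep.append(module)
    return deep


sample = """
import sec_parser as sp
from sec_parser import SomeClass
from sec_parser import semantic_tree
from sec_parser.semantic_tree import SomeClass
from sec_parser.semantic_tree.internal_utils import SomeInternalClass
"""

print(find_deep_imports(sample))
```

Only the last import in the sample is reported, because it is the only one that goes past the submodule level.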

Contributing

For information about setting up the development environment, coding standards, and contribution workflows, please refer to our CONTRIBUTING.md guide.

License

This project is licensed under the MIT License - see the LICENSE file for details.
