Skip to main content

Parse Securities and Exchange Commission Standard Generalized Markup Language (SEC SGML) files

Project description

SEC SGML

A python library to parse Securities and Exchange Commission Standardized Generalized Markup Language. Used to power the open-source datamule project.

Currently parses two types of files:

  1. Daily Archives
  2. Submissions

Will be expanded to also parse SGML Tables.

All Variations

Installation

pip install secsgml

Quickstart

Parse into memory

from secsgml import parse_sgml_submission_into_memory
metadata,results = parse_sgml_submission_into_memory(filepath="000000443897000001.sgml")

Parse to file

from secsgml import parse_sgml_submission
# from file
parse_sgml_submission(filepath='samples/0000891618-94-000021.txt',output_dir='results')

# from content
parse_sgml_submission(content=sgml_content,output_dir='results')

Note

Will be giving parse_sgml_submission_into_memory more love, will have to refactor parse_sgml_submission afterwards.

Future

  • SGML Table parsing
  • Optimization + refactor in Cython/ C bindings.
  • Standardize metadata for different file types. Keys and values vary across variations, e.g. 'CIK' vs 'CENTRAL INDEX KEY' as well as values such as '34' vs '1934'

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

secsgml-0.0.3.tar.gz (6.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

secsgml-0.0.3-py3-none-any.whl (8.9 kB view details)

Uploaded Python 3

File details

Details for the file secsgml-0.0.3.tar.gz.

File metadata

  • Download URL: secsgml-0.0.3.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for secsgml-0.0.3.tar.gz
Algorithm Hash digest
SHA256 fd9b811ccafc8709e0a278f2f88d4ea9ab0c2e43eed1e29b658ba1caf6aa4678
MD5 90dc680ddfb887f6d8da201a863930c2
BLAKE2b-256 6a2551a7bc3ce4504e96e5e6445f0a4c4cf4db39db11f227407c681e4f39ebf8

See more details on using hashes here.

File details

Details for the file secsgml-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: secsgml-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for secsgml-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 db7f2a4221f21701d985af912463769e9a20741213f0d0600439803b897a5c43
MD5 40369f4a80945cabfa5d42b632279c2a
BLAKE2b-256 e6f323b1e8df1b6b4ac901ce2a2d2ec17aeaf696ee33dacaf90ba704ac3de8f0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page