
10-K Report Item Segmentation with Line-based Attention (ISLA)


itemseg

10-K Item Segmentation with Line-based Attention (ISLA) is a tool to process EDGAR 10-K reports and extract item-specific text.




Installation

pip3 install itemseg

Download resource file

python3 -m itemseg --get_resource

Download nltk data

Launch a python3 console and run:

>>> import nltk
>>> nltk.download('punkt')

Segment items in a 10-K file

Using Apple 10-K (2023) as an example:

python3 -m itemseg --input https://www.sec.gov/Archives/edgar/data/320193/000032019323000106/0000320193-23-000106.txt

See the results in ./segout01/

The *.csv file contains line-by-line predictions for items as Begin-Inside-Outside (BIO) style tags. The other files contain item-specific text. Change the output file types via --outfn_type.
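To illustrate how line-level BIO tags map back to item boundaries, here is a minimal sketch that groups a sequence of tags into line ranges. The tag spellings used here (such as "B-item1" and "I-item1") and the function name bio_spans are assumptions for illustration; check the header and values of the generated csv file for the actual column names and tag format.

```python
from typing import List, Tuple


def bio_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group per-line BIO tags into (label, first_line, last_line) spans.

    Assumes tags look like "B-<label>", "I-<label>", or "O" (hypothetical
    spelling; the real csv may differ). Line indices are 0-based.
    """
    spans: List[Tuple[str, int, int]] = []
    label, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # An "I" tag continues the open span only if the label matches.
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.append((label, start, i - 1))
            label, start = None, None
        # "B" always opens a span; a stray "I" with no open span is
        # tolerated and treated as the start of a new span.
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = name, i
    if label is not None:
        spans.append((label, start, len(tags) - 1))
    return spans
```

Each returned tuple gives the item label and the first and last line index it covers, which is enough to slice the corresponding lines out of the formatted text.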

About 10-K files

A 10-K report is an annual report filed by publicly traded companies with the U.S. Securities and Exchange Commission (SEC). It provides a comprehensive overview of the company's financial performance and is more detailed than the annual report sent to shareholders. Key items of a 10-K report include:

  • Item 1 (Business): Describes the company's main operations, products, and services.
  • Item 1A (Risk Factors): Outlines risks that could affect the company's business, financial condition, or operating results.
  • Item 3 (Legal Proceedings): Discloses significant pending legal proceedings involving the company.
  • Item 7 (Management’s Discussion and Analysis of Financial Condition and Results of Operations; MD&A): Offers management's perspective on the financial results, including discussion of liquidity, capital resources, and results of operations.

You can search and read 10-K reports through the EDGAR web interface. The itemseg module takes the URL of the Complete submission text file, converts the HTML to a formatted text file, and segments the text file by item.

As an example, the Amazon 10-K report page for fiscal year 2022 shows the link to the HTML 10-K report and a Complete submission text file, 0001018724-23-000004.txt. Pass this link (https://www.sec.gov/Archives/edgar/data/1018724/000101872423000004/0001018724-23-000004.txt) to the itemseg module, and it will retrieve the file and segment the items for you.

python3 -m itemseg --input https://www.sec.gov/Archives/edgar/data/1018724/000101872423000004/0001018724-23-000004.txt
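Both example links in this document follow the same pattern: the CIK, then the accession number without dashes as a folder, then the accession number with dashes as the file name. A minimal sketch that builds such a link is shown below. The function name submission_url is hypothetical, and the pattern is inferred from the two examples here rather than taken from itemseg itself.

```python
def submission_url(cik, accession: str) -> str:
    """Build the link to a Complete submission text file on EDGAR.

    cik:       company CIK, e.g. 1018724 (leading zeros are dropped)
    accession: accession number with dashes, e.g. "0001018724-23-000004"
    """
    folder = accession.replace("-", "")  # folder name has no dashes
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{folder}/{accession}.txt"
    )
```

The resulting string can be passed directly to --input.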

By default, the module writes line-by-line tags (BIO style) to a csv file, together with Item 1, Item 1A, Item 3, and Item 7 in separate files (--outfn_type "csv,item1,item1a,item3,item7"). Use --outfn_type to change this combination of output files. For example, to output only Item 1A and Item 7, set --outfn_type "item1a,item7".
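If you drive itemseg from a Python script, one option is to assemble the same command line and run it as a subprocess. The sketch below only builds the argument list from the --input and --outfn_type options described above; the helper name itemseg_command is hypothetical and is not part of the itemseg package.

```python
import sys
from typing import List, Sequence


def itemseg_command(url: str, outputs: Sequence[str]) -> List[str]:
    """Return the argument list for `python -m itemseg` with chosen outputs.

    outputs: output types such as ["item1a", "item7"]; they are joined
    with commas to form the --outfn_type value.
    """
    return [
        sys.executable, "-m", "itemseg",
        "--input", url,
        "--outfn_type", ",".join(outputs),
    ]


# To actually run the segmentation (requires itemseg to be installed):
#   import subprocess
#   subprocess.run(itemseg_command(url, ["item1a", "item7"]), check=True)
```

Using sys.executable runs itemseg with the same interpreter as the calling script, which avoids picking up a different Python installation.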

If you need to process a large number of 10-K files, a good starting point is the master index (https://www.sec.gov/Archives/edgar/full-index/), which lists all available filings and offers a convenient way to build a comprehensive list of target files.
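As a rough sketch of how a target list might be built from the index, the function below filters index lines by form type and turns each file path into a full link. It assumes the pipe-delimited layout of the quarterly master.idx files (CIK|Company Name|Form Type|Date Filed|Filename, with paths relative to https://www.sec.gov/Archives/); verify this against a downloaded index before relying on it. The function name ten_k_urls is hypothetical.

```python
from typing import Iterable, Iterator

ARCHIVES = "https://www.sec.gov/Archives/"


def ten_k_urls(index_lines: Iterable[str], form_type: str = "10-K") -> Iterator[str]:
    """Yield submission text file links for filings of the given form type.

    Assumes each data row has five pipe-separated fields:
    CIK|Company Name|Form Type|Date Filed|Filename
    """
    for line in index_lines:
        fields = [f.strip() for f in line.split("|")]
        # Header text and separator lines do not have five fields.
        if len(fields) != 5:
            continue
        _cik, _company, form, _date_filed, filename = fields
        # The column-name row has five fields but its form is "Form Type",
        # so it is filtered out by the exact match below.
        if form == form_type and filename.endswith(".txt"):
            yield ARCHIVES + filename
```

The exact match on the form type excludes variants such as amended filings; pass a different form_type if you want those instead. Each yielded link can be given to itemseg via --input.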

The module also comes with a script file that allows you to run the module via the itemseg command. The default location (for Ubuntu) is ~/.local/bin. Add this location to your PATH to enable the itemseg command.

License

itemseg is distributed under the terms of the CC BY-NC license.
