10-K Report Item Segmentation with Line-based Attention (ISLA)
itemseg
10-K Item Segmentation with Line-based Attention (ISLA) is a tool to process EDGAR 10-K reports and extract item-specific text.
Installation
pip3 install itemseg
Download the resource file
python3 -m itemseg --get_resource
Download NLTK data
Launch a python3 console and run:
>>> import nltk
>>> nltk.download('punkt')
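If setup is scripted rather than typed into a console, the download can be skipped when the data is already installed. This is a minimal sketch that assumes only NLTK's standard `nltk.data.find` and `nltk.download` calls. The helper names `punkt_available` and `ensure_punkt` are hypothetical and are not part of itemseg.

```python
import importlib.util


def punkt_available() -> bool:
    """Return True if NLTK is installed and its 'punkt' data can be located."""
    if importlib.util.find_spec("nltk") is None:
        return False  # NLTK itself is not installed
    import nltk

    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk.data.find("tokenizers/punkt")
        return True
    except LookupError:
        return False


def ensure_punkt() -> bool:
    """Download 'punkt' only if it is missing; return True once it is available."""
    if punkt_available():
        return True
    import nltk

    nltk.download("punkt", quiet=True)
    return punkt_available()
```

Calling `ensure_punkt()` once before running itemseg has the same effect as the console commands, and does nothing when punkt is already present.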
Obtain a 10-K file and segment items
Using Apple's 2023 10-K as an example:
python3 -m itemseg --input https://www.sec.gov/Archives/edgar/data/320193/000032019323000106/0000320193-23-000106.txt
The results are written to ./segout01/.
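The same example can be driven from a Python script, which helps when processing many filings. This is a minimal sketch, not part of itemseg's API. The helper names (`edgar_filing_url`, `itemseg_command`, `run_itemseg`, `list_outputs`) are hypothetical. It uses only the documented `--input` flag, and the URL layout (CIK without leading zeros, then the accession number without dashes, then `<accession>.txt`) is inferred from the Apple URL above. It assumes the default output folder is ./segout01/ and does not assume anything about the names of the files inside it.

```python
import subprocess
import sys
from pathlib import Path

EDGAR_ARCHIVE = "https://www.sec.gov/Archives/edgar/data"


def edgar_filing_url(cik: int, accession: str) -> str:
    """Build a full-text submission URL in the layout used by the Apple example:
    <archive>/<CIK, no leading zeros>/<accession, no dashes>/<accession>.txt
    """
    folder = accession.replace("-", "")
    return f"{EDGAR_ARCHIVE}/{int(cik)}/{folder}/{accession}.txt"


def itemseg_command(url: str) -> list[str]:
    """Argument list equivalent to `python3 -m itemseg --input <url>`."""
    return [sys.executable, "-m", "itemseg", "--input", url]


def run_itemseg(url: str) -> None:
    """Run the CLI as a subprocess; raises CalledProcessError on a non-zero exit."""
    subprocess.run(itemseg_command(url), check=True)


def list_outputs(out_dir: str = "./segout01") -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each regular file in the output folder."""
    root = Path(out_dir)
    if not root.is_dir():
        return []  # nothing has been written yet
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


# Rebuilds the Apple 10-K (2023) URL from its CIK and accession number.
url = edgar_filing_url(320193, "0000320193-23-000106")
# run_itemseg(url)  # downloads the filing and segments it
# for name, size in list_outputs():
#     print(name, size)
```

Building the argument list separately from running it keeps the URL logic testable without network access. `sys.executable` ensures the module runs under the same interpreter where itemseg was installed.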
License
itemseg is distributed under the terms of the CC BY-NC license.
Download files
Source Distribution
itemseg-1.2.0.tar.gz (30.5 kB)
Built Distribution
itemseg-1.2.0-py3-none-any.whl (32.3 kB)