Changelog

All notable changes to this project will be documented in this file.

[0.3.12]

  • Ignore large exhibit files when identifying the main statement

[0.3.10]

  • Handle cases where a comment indicates the page break

[0.3.9]

  • In from_zip_to_json fix error where unique_anchor is None

[0.3.8]

  • In from_zip_to_json handle absence of the Metalinks.json file

[0.3.7]

  • In from_zip_to_json update to keep contextref and name attributes during merge

[0.3.6]

  • In from_zip_to_json update financial table detection with children elements

[0.3.5]

  • In from_zip_to_json add filtering of financial tables based on metalinks file

[0.3.4]

  • In from_zip_to_json fix merge issue in is_row_merge_case method

[0.3.3]

  • In from_zip_to_json handle row size mismatch in create_table_html_empty_cell_grid method

[0.3.2]

  • In xbrl_parser fix error when tr is empty in _is_anchor method
  • In xbrl_parser fix error when padding is missing from unique_paddings list

[0.3.1]

  • In xbrl_parser Add handling for paddings/margins given as integers in HTML
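
Padding and margin values can show up in HTML either as CSS strings such as `12px` or as bare numbers such as `12`. The sketch below shows one way to normalize both forms. The `to_pixels` name is hypothetical, and treating bare numbers as pixels is an assumption made for illustration, not domtag's documented behaviour.

```python
import re
from typing import Optional, Union

# Matches an optional sign, a number, and an optional "px" unit.
_LENGTH_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(px)?\s*$", re.IGNORECASE)


def to_pixels(value: Union[int, float, str, None]) -> Optional[float]:
    """Hypothetical helper: return a padding/margin value as a float, or None."""
    if value is None:
        return None
    # Integers and floats are assumed to already be pixel values.
    if isinstance(value, (int, float)):
        return float(value)
    match = _LENGTH_RE.match(value)
    # Units other than px (em, %, ...) are not converted in this sketch.
    return float(match.group(1)) if match else None


print(to_pixels(12), to_pixels("12px"), to_pixels("1em"))
```

Returning `None` for unsupported units keeps the caller in charge of how to treat values that cannot be compared in pixels.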

[0.3.0]

  • In xbrl_parser Save page breaks in the source.html file

[0.2.9]

  • In xbrl_parser Annotate page breaks

[0.2.8]

  • In xbrl_parser Fix border attribute error

[0.2.7]

  • Handle HtmlExtractor._merge_cells index error

[0.2.6]

  • In xbrl_parser Add tr and td ids in json data
  • In xbrl_parser Make cosmetic changes to html table extractor
  • In xbrl_parser replace uuid1 with uuid4
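
For context on the uuid1 to uuid4 change: in Python's standard `uuid` module, `uuid1()` builds the identifier from a timestamp and the host's node ID, while `uuid4()` is generated from random bits. A minimal sketch of the difference follows; the `new_tag_id` helper is hypothetical and is not part of the package's API.

```python
import uuid


def new_tag_id() -> str:
    """Hypothetical helper: return a random identifier for an HTML tag."""
    # uuid4() is random, so the id carries no host or time information.
    return str(uuid.uuid4())


time_based = uuid.uuid1()    # version 1: timestamp plus node ID
random_based = uuid.uuid4()  # version 4: random

print(time_based.version, random_based.version)
print(new_tag_id())
```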

[0.2.5]

  • In xbrl_parser Add random uuid to all html tags

[0.2.4]

  • In xbrl_parser Add table flip functionality

[0.2.3]

  • Add case handling for the numeric-only cells regex

[0.2.2]

  • Fix handling of tables that only contain non-numeric values

[0.2.1]

  • In xbrl_parser Update the heuristics for merging irregular cells

[0.2.0]

  • Add handling of indentations using empty td cells
  • Add handling of tag attributes with lxml parser

[0.1.10]

  • In xbrl_parser Remove hidden cells

[0.1.9]

  • In xbrl_parser Change html parser to lxml (from xml)

[0.1.8]

  • In xbrl_parser Handle cases where indent is given to child text block
  • In xbrl_parser Handle processing of tables that have at least one numeric
    value

[0.1.7]

  • In xbrl_parser Handle cases where border value is not identified

[0.1.6]

  • In xbrl_parser fix border attribute checks

[0.1.5]

  • In xbrl_parser add border-top and border-bottom information

[0.1.4]

  • In xbrl_parser activate removal of empty tables
  • In xbrl_parser Change some attributes of output json to camelCase

[0.1.3]

  • In xbrl_parser remove empty tables

[0.1.2]

  • In xbrl_parser add bold and italic information

[0.1.1]

  • In xbrl_parser merge tables using heuristics, add left padding

[0.1.0]

  • In xbrl_parser read zip from folder instead of full filepath and save outputs in the same folder
  • In xbrl_parser add table ids in output html and json files

[0.0.9]

  • In xbrl_parser skip merge logic if the table is empty or has inconsistent number of tds

[0.0.8]

  • In xbrl_parser merge th tags into one with the corresponding colspan value

[0.0.7]

  • In xbrl_parser fix tables with empty merges

[0.0.6]

  • Prevent taking the bold text as a title if it's inside a table

[0.0.5]

  • Take the first bold text above the table as title

[0.0.4]

  • Fix list index out of range error for table title extraction

[0.0.3]

  • Extract table titles and store in json output
  • Fix value extraction from table cells

[0.0.2]

  • Store thead trs in a list for table json output

[0.0.1] - Initial version of the package

  • Extract table information into a json file from an htm/html file or a zip of html files


Source Distribution: domtag-0.3.12.tar.gz (17.0 kB)

Built Distribution: domtag-0.3.12-py3-none-any.whl (17.7 kB, Python 3)
