Changelog

All notable changes to this project will be documented in this file.

[0.3.12]

  • Ignore large exhibit files when identifying the main statement

[0.3.10]

  • Handle cases where a comment indicates the page-break

[0.3.9]

  • In from_zip_to_json fix error where unique_anchor is None

[0.3.8]

  • In from_zip_to_json handle absence of Metalinks.json file
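
A minimal sketch of how a missing metadata file could be tolerated when reading a filing archive. The helper name `read_metalinks`, its signature, and the case-insensitive file-name match are assumptions made for illustration; they are not taken from the package itself.

```python
import io
import json
import zipfile


def read_metalinks(zip_file, target="metalinks.json"):
    """Return the parsed metalinks JSON from an archive, or None if absent.

    Hypothetical helper: the name, signature, and case-insensitive match
    are illustrative assumptions, not the package's API.
    """
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            # Compare only the base name, so files in subfolders also match.
            if name.rsplit("/", 1)[-1].lower() == target:
                with archive.open(name) as handle:
                    return json.load(handle)
    return None


# Build two small in-memory archives to exercise both paths.
with_meta = io.BytesIO()
with zipfile.ZipFile(with_meta, "w") as archive:
    archive.writestr("Metalinks.json", json.dumps({"version": "2.1"}))

without_meta = io.BytesIO()
with zipfile.ZipFile(without_meta, "w") as archive:
    archive.writestr("report.htm", "<html></html>")

found = read_metalinks(with_meta)
missing = read_metalinks(without_meta)
```

Returning None instead of raising lets the caller skip any metalinks-based filtering and continue with the remaining steps.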

[0.3.7]

  • In from_zip_to_json update to keep contextref and name attributes during merge

[0.3.6]

  • In from_zip_to_json update financial table detection with children elements

[0.3.5]

  • In from_zip_to_json add filtering of financial tables based on metalinks file

[0.3.4]

  • In from_zip_to_json fix merge issue in is_row_merge_case method

[0.3.3]

  • In from_zip_to_json add row size mismatch handling in create_table_html_empty_cell_grid method

[0.3.2]

  • In xbrl_parser fix error when tr is empty in _is_anchor method
  • In xbrl_parser fix error when padding is missing from unique_paddings list

[0.3.1]

  • In xbrl_parser Add handling for paddings/margins given as integers in HTML
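
As an illustration of the kind of normalization this entry describes, the sketch below accepts padding or margin values given either as bare integers or as strings with a unit. The function name `to_pixels`, the supported units, and the choice to read unitless numbers as pixels are all assumptions, not the parser's actual behavior.

```python
import re

# Number with an optional px/pt unit; anything else is treated as unparseable.
_LENGTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(px|pt)?\s*$", re.IGNORECASE)


def to_pixels(value):
    """Convert a padding/margin value to pixels, or None if unparseable.

    Hypothetical helper: unitless values are assumed to be pixels.
    """
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    match = _LENGTH.match(str(value))
    if match is None:
        return None
    number = float(match.group(1))
    unit = (match.group(2) or "px").lower()
    # CSS defines 1pt as 1/72 inch and 1px as 1/96 inch, so 1pt = 4/3 px.
    return number * 4 / 3 if unit == "pt" else number
```

With this shape, `to_pixels(12)`, `to_pixels("12")`, and `to_pixels("12px")` all yield the same value, so later comparisons between paddings do not depend on how the HTML happened to write them.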

[0.3.0]

  • In xbrl_parser Save page breaks in the source.html file

[0.2.9]

  • In xbrl_parser Annotate page breaks

[0.2.8]

  • In xbrl_parser Fix border attribute error

[0.2.7]

  • Handle HtmlExtractor._merge_cells index error

[0.2.6]

  • In xbrl_parser Add tr and td ids in json data
  • In xbrl_parser Make cosmetic changes to html table extractor
  • In xbrl_parser replace uuid1 with uuid4
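
For context on the uuid change: in Python's standard library, `uuid.uuid1()` builds the identifier from the host's node ID and the current time, while `uuid.uuid4()` is generated from random data. A small sketch of tagging with random ids follows; the helper name `new_tag_id` is hypothetical.

```python
import uuid


def new_tag_id():
    """Return a random (version 4) UUID as a string.

    Hypothetical helper name; only uuid.uuid4 is a real library call.
    """
    return str(uuid.uuid4())


# Generate a batch of ids, as one might when tagging many elements.
ids = [new_tag_id() for _ in range(1000)]
```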

[0.2.5]

  • In xbrl_parser Add random uuid to all html tags

[0.2.4]

  • In xbrl_parser Add table flip functionality

[0.2.3]

  • Add case handling for only numeric cells regex
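
The exact regex used by the package is not shown in this changelog. As a rough sketch of what a numeric-cell check for financial tables might look like, the pattern below accepts thousands separators, decimals, a currency symbol, a percent sign, and parentheses for negatives. The pattern and the names `is_numeric_cell` and `has_numeric_value` are assumptions.

```python
import re

# Illustrative pattern only; parentheses are not checked for balance.
NUMERIC_CELL = re.compile(
    r"""^\s*
        \(?\s*                          # optional "(" for accounting negatives
        [-+]?\s*                        # optional sign
        [$€£]?\s*                       # optional currency symbol
        (?:\d{1,3}(?:,\d{3})+|\d+)      # 1,234,567 or plain digits
        (?:\.\d+)?                      # optional decimal part
        \s*%?\s*                        # optional percent sign
        \)?\s*$                         # optional ")"
    """,
    re.VERBOSE,
)


def is_numeric_cell(text):
    """Return True if the cell text looks like a single numeric value."""
    return bool(NUMERIC_CELL.match(text))


def has_numeric_value(rows):
    """Return True if any cell in any row looks numeric."""
    return any(is_numeric_cell(cell) for row in rows for cell in row)
```

A check like `has_numeric_value` is one way to decide whether a table is worth processing as a financial table at all.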

[0.2.2]

  • Fix handling of tables that only contain non-numeric values

[0.2.1]

  • In xbrl_parser Update the heuristics for merging irregular cells

[0.2.0]

  • Add handling of indentations using empty td cells
  • Add handling of tag attributes with lxml parser
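
Some tables express indentation by placing empty cells before the label rather than by CSS padding. One possible way to read that layout is sketched below, assuming each row is already available as a list of cell strings; the helper name `split_indent` and the "one empty leading cell equals one indent level" rule are assumptions.

```python
def split_indent(cells):
    """Return (indent_level, remaining_cells) for one table row.

    Hypothetical helper: each leading empty (or whitespace-only) cell
    is counted as one level of indentation.
    """
    level = 0
    for cell in cells:
        if cell.strip():
            break
        level += 1
    return level, cells[level:]
```

For example, `split_indent(["", "", "Cash", "100"])` gives an indent level of 2 and leaves `["Cash", "100"]` as the row content.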

[0.1.10]

  • In xbrl_parser Remove hidden cells

[0.1.9]

  • In xbrl_parser Change html parser to lxml (from xml)

[0.1.8]

  • In xbrl_parser Handle cases where indent is given to child text block
  • In xbrl_parser Handle processing of tables that have at least one numeric
    value

[0.1.7]

  • In xbrl_parser Handle cases where border value is not identified

[0.1.6]

  • In xbrl_parser fix border attribute checks

[0.1.5]

  • In xbrl_parser add border-top and border-bottom information

[0.1.4]

  • In xbrl_parser activate remove empty tables
  • In xbrl_parser Change some attributes of output json to camelCase

[0.1.3]

  • In xbrl_parser remove empty tables

[0.1.2]

  • In xbrl_parser add bold and italic information

[0.1.1]

  • In xbrl_parser merge tables using heuristics, add left padding

[0.1.0]

  • In xbrl_parser read zip from folder instead of full filepath and save outputs in the same folder
  • In xbrl_parser add table ids in output html and json files

[0.0.9]

  • In xbrl_parser skip merge logic if the table is empty or has inconsistent number of tds

[0.0.8]

  • In xbrl_parser merge th tags into one with the corresponding colspan value

[0.0.7]

  • In xbrl_parser fix tables with empty merges

[0.0.6]

  • Prevent taking the bold text as a title if it's inside a table

[0.0.5]

  • Take the first bold text above the table as title
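
A minimal sketch of the title heuristic, combined with the later rule of ignoring bold text inside a table. It uses the standard library's `html.parser` and only recognizes `<b>` and `<strong>` tags; the class name `TitleFinder`, the choice of the nearest preceding bold text, and the omission of style-based bold detection are assumptions for illustration.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect, for each top-level table, the nearest bold text above it.

    Hypothetical class: handles only <b>/<strong>, not CSS font-weight.
    """

    def __init__(self):
        super().__init__()
        self.table_depth = 0
        self.bold_depth = 0
        self.last_bold = None
        self.titles = []  # one entry per top-level table, None if no title
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.table_depth == 0:
                self.titles.append(self.last_bold)
                self.last_bold = None  # a title is used for one table only
            self.table_depth += 1
        elif tag in ("b", "strong") and self.table_depth == 0:
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag in ("b", "strong") and self.bold_depth > 0:
            self.bold_depth -= 1
            if self.bold_depth == 0:
                text = "".join(self._buffer).strip()
                self._buffer = []
                if text:
                    self.last_bold = text

    def handle_data(self, data):
        # Bold text inside a table never raises bold_depth, so it is skipped.
        if self.bold_depth > 0:
            self._buffer.append(data)


sample = (
    "<p><b>Old heading</b></p><p><b>Balance Sheet</b></p>"
    "<table><tr><td><b>Cash</b></td></tr></table>"
    "<table><tr><td>1</td></tr></table>"
)
finder = TitleFinder()
finder.feed(sample)
titles = finder.titles
```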

[0.0.4]

  • Fix list index out of range error for table title extraction

[0.0.3]

  • Extract table titles and store in json output
  • Fix value extraction from table cells

[0.0.2]

  • Store thead trs in a list for table json output

[0.0.1] - Initial version of the package

  • Extract table information into a json file from an htm/html file or a zip of htmls

Download files

Source Distribution

domtag-0.3.12.tar.gz (17.0 kB)

Uploaded Source

Built Distribution

domtag-0.3.12-py3-none-any.whl (17.7 kB)

Uploaded Python 3

File details

Details for the file domtag-0.3.12.tar.gz.

File metadata

  • Download URL: domtag-0.3.12.tar.gz
  • Upload date:
  • Size: 17.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.18

File hashes

Hashes for domtag-0.3.12.tar.gz
Algorithm    Hash digest
SHA256       a51e9b3f5be442658ac6dc9979c3719a303cd33887820dc5b097e37ccd23b266
MD5          156060635e472bc5dced4fbae0459c74
BLAKE2b-256  e75801ab8298194502358179a45fa4be3db95bd76395736fca222b05a135fac9

File details

Details for the file domtag-0.3.12-py3-none-any.whl.

File metadata

  • Download URL: domtag-0.3.12-py3-none-any.whl
  • Upload date:
  • Size: 17.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.18

File hashes

Hashes for domtag-0.3.12-py3-none-any.whl
Algorithm    Hash digest
SHA256       f2bd14980b814ddbc5572f9081b1bd5ceadd35ab2d17fa789a0c75e4410fc44b
MD5          7bd1852a07c5bfc592b8325e71830e80
BLAKE2b-256  3b05243562c6ec92875b40e4158e4545370673f4e2719a3ea366f80de76e1e3b