
A collection of reusable methods.

KPCommons is a collection of methods which are regularly needed.

Installation

pip install kpcommons

This installs the library without pySBD. To use the SentenceChunker, install with the following command:

pip install kpcommons[chunk]

Chunker

SentenceChunker can be used to split a text into chunks which are roughly sentences or multiple sentences. BaseChunker can be used to implement other variants.

Util

Util.py contains the following methods:

  • calculate_overlap is a method to calculate the overlap between two ranges.

    overlap = Util.calculate_overlap(0, 10, 5, 10)
    

    The first two arguments are the start and end positions of the first range, and the last two arguments are the start and end positions of the second range. The result is an Overlap object with the start, end, and length of the overlap, and the two ratios between the overlap and each of the ranges.

    Note: In case of no overlap, overlap.length is the distance between the two ranges as a negative value.

  • get_namespace gets the namespace from the root tag of an XML file.

  • create_dated_folder creates a subfolder with a date as the name. By default, NowDateProvider is used.
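The overlap calculation described for calculate_overlap can be illustrated with a minimal, standalone sketch. This is not KPCommons' implementation: the names overlap_sketch and OverlapSketch and the field names are hypothetical, and the handling of the ratios when there is no overlap is an assumption.

```python
from dataclasses import dataclass


@dataclass
class OverlapSketch:
    """Hypothetical result type; the field names are assumptions."""
    start: int
    end: int
    length: int
    ratio_first: float
    ratio_second: float


def overlap_sketch(start1: int, end1: int, start2: int, end2: int) -> OverlapSketch:
    # The shared span starts at the later start and ends at the earlier end.
    start = max(start1, start2)
    end = min(end1, end2)
    # Positive: size of the shared span. Negative: gap between the two ranges.
    length = end - start

    def ratio(range_start: int, range_end: int) -> float:
        size = range_end - range_start
        # Assumption: ratios are reported as 0.0 when the ranges do not overlap.
        return length / size if length > 0 and size > 0 else 0.0

    return OverlapSketch(start, end, length, ratio(start1, end1), ratio(start2, end2))


full = overlap_sketch(0, 10, 5, 10)
print(full.length, full.ratio_first, full.ratio_second)  # 5 0.5 1.0
print(overlap_sketch(0, 5, 8, 12).length)  # -3
```

In the second call the ranges are three positions apart, so the length comes out as -3, matching the note about negative lengths above.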

Footnote

Footnote.py contains a collection of methods for working with footnotes.

  • get_footnotes_ranges takes a text and returns two lists of tuples of start and end character positions of footnote ranges, that is, text surrounded by '[[[' and ']]]'. The first list is without an offset, that is, the actual positions, and the second list is with an offset, that is, as if the footnotes were removed.
  • get_footnote_ranges_without_offset and get_footnote_ranges_with_offset are variants of get_footnotes_ranges which only return one of the lists.
  • is_position_in_ranges checks if a position is in one of the ranges.
  • is_range_in_ranges checks if a range given by a start and end position overlaps with one of the given ranges.
  • remove_footnotes removes footnotes from a text. Footnotes are marked by '[[[' and ']]]'.
  • map_to_real_pos maps start and end character positions of a text with footnotes removed to real positions, that is, positions before footnotes were removed.
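The relationship between footnote ranges, footnote removal, and position mapping can be sketched in plain Python. The functions below are hypothetical stand-ins, not the Footnote.py API: find_footnote_ranges, strip_footnotes, and to_real_index are names invented for this illustration, ranges are assumed to include the '[[[' and ']]]' markers with an exclusive end, and nested footnotes are not handled.

```python
import re

# Non-greedy match of one footnote, including its '[[[' and ']]]' markers.
FOOTNOTE_RE = re.compile(r"\[\[\[.*?\]\]\]", re.DOTALL)


def find_footnote_ranges(text):
    """Return (start, end) positions of footnotes in the original text."""
    return [(m.start(), m.end()) for m in FOOTNOTE_RE.finditer(text)]


def strip_footnotes(text):
    """Return the text with all footnotes, markers included, removed."""
    return FOOTNOTE_RE.sub("", text)


def to_real_index(index, ranges):
    """Map a character index in the stripped text back to the original text."""
    real = index
    for start, end in ranges:
        if start > real:
            break
        # A footnote at or before this position shifts it by the footnote's length.
        real += end - start
    return real


text = "Some running text [[[This is a footnote]]] and more running text..."
ranges = find_footnote_ranges(text)
stripped = strip_footnotes(text)

index = stripped.index("and")
real = to_real_index(index, ranges)
print(ranges)                # [(18, 42)]
print(text[real:real + 3])   # and
```

The mapping walks the footnotes in document order and adds the length of every footnote that starts at or before the position found so far, which is why the ranges must be sorted by start position.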

XML

get_text_from_element extracts the text from an (annotated) XML root element. If the XML file contains annotations for quotations or references, these are tagged in the resulting text as follows.

Footnotes

Footnotes are enclosed with triple brackets, for example:

Some running text [[[This is a footnote]]] and more running text...

Direct Quotations

A direct quotation falls into one of two groups: a quotation from the primary literary work or a quotation from some other source. Direct quotations from the primary literary work are enclosed with @@. An optional id of a corresponding reference is part of the starting tag, for example:

Some text with @id@a quote@@

Direct quotations from other sources are enclosed with .

References

References, for example a page reference for a quotation (S. 14), are enclosed with §µ§, with an id in the starting tag, for example:

Some text, @1@a quote@@ (§µ1§S.5§µ§)
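As an illustration of how the shared id ties a quotation to its reference, the tagged text can be read back with regular expressions. This is a sketch for consuming the output format only, not part of KPCommons; link_quotes and both patterns are hypothetical, and the patterns assume that ids consist of word characters and that tags are not nested.

```python
import re

# Assumed shape of a primary-work quotation: @id@quoted text@@ (the id may be empty).
QUOTE_RE = re.compile(r"@(\w*)@(.+?)@@")
# Assumed shape of a reference: §µid§reference text§µ§
REFERENCE_RE = re.compile(r"§µ(\w+)§(.*?)§µ§")


def link_quotes(tagged):
    """Return (id, quote, reference) triples; reference is None if no id matches."""
    references = {m.group(1): m.group(2) for m in REFERENCE_RE.finditer(tagged)}
    return [
        (m.group(1), m.group(2), references.get(m.group(1)))
        for m in QUOTE_RE.finditer(tagged)
    ]


tagged = "Some text, @1@a quote@@ (§µ1§S.5§µ§)"
print(link_quotes(tagged))  # [('1', 'a quote', 'S.5')]
```

A quotation without an id would start with @@, the same characters as the closing tag, so a pattern-based reader like this one can misalign on texts that mix quotations with and without ids.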

Indirect Quotations

Indirect quotations, i.e. summarizations and paraphrases, are enclosed with αα and the source of the quotation as part of the starting tag, for example:

Some text, αl_10αindirect quoteαα
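Indirect quotations can be read back from the tagged text in a similar way. The sketch below is not part of KPCommons; find_indirect_quotes and its pattern are hypothetical, and the pattern assumes that neither the source label nor the quoted text contains the character α.

```python
import re

# Assumed shape: αsourceαquoted textαα
INDIRECT_RE = re.compile(r"α([^α]+)α([^α]*)αα")


def find_indirect_quotes(tagged):
    """Return (source, text) pairs for each indirect quotation."""
    return [(m.group(1), m.group(2)) for m in INDIRECT_RE.finditer(tagged)]


print(find_indirect_quotes("Some text, αl_10αindirect quoteαα"))
# [('l_10', 'indirect quote')]
```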
