A collection of reusable methods.

Project description

KPCommons is a collection of frequently needed helper methods.

Installation

pip install kpcommons

This installs the library without pySBD, which SentenceChunker needs. To use SentenceChunker, install the chunk extra instead:

pip install kpcommons[chunk]

Chunker

SentenceChunker splits a text into chunks that each roughly correspond to one or more sentences. BaseChunker can serve as the basis for implementing other chunking variants.
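To illustrate the idea of a base class with interchangeable chunking variants, here is a minimal sketch. ChunkerSketch and GreedySentenceChunker are hypothetical names, not kpcommons classes, and their chunk() signature is an assumption. The regex sentence boundary is a naive stand-in for pySBD, which handles abbreviations and similar edge cases far better.

```python
import re
from abc import ABC, abstractmethod


class ChunkerSketch(ABC):
    """Hypothetical base-chunker interface (not the kpcommons API)."""

    @abstractmethod
    def chunk(self, text: str) -> list[str]:
        """Split text into a list of chunks."""


class GreedySentenceChunker(ChunkerSketch):
    """Greedily packs sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes a chunk of its own.
    """

    # Naive sentence boundary: whitespace following '.', '!' or '?'.
    _BOUNDARY = re.compile(r"(?<=[.!?])\s+")

    def __init__(self, max_chars: int = 200) -> None:
        self.max_chars = max_chars

    def chunk(self, text: str) -> list[str]:
        sentences = [s for s in self._BOUNDARY.split(text.strip()) if s]
        chunks: list[str] = []
        current = ""
        for sentence in sentences:
            candidate = f"{current} {sentence}" if current else sentence
            if current and len(candidate) > self.max_chars:
                # Adding this sentence would exceed the limit: close the chunk.
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
        return chunks
```

A new variant only has to implement chunk(), for example one that delegates sentence detection to pySBD instead of the regex.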

Util

Util.py contains the following methods:

  • calculate_overlap calculates the overlap between two ranges, for example:

    overlap = Util.calculate_overlap(0, 10, 5, 10)
    

    The first two arguments are the start and end positions of the first range; the last two are the start and end positions of the second range. The result is an Overlap object holding the start, end, and length of the overlap, plus the ratio between the overlap and each of the two ranges.

    Note: If the ranges do not overlap, overlap.length is the distance between them as a negative value.

  • get_namespace extracts the namespace from the root tag of an XML file.

  • create_dated_folder creates a subfolder named after a date. By default, the date is supplied by NowDateProvider.
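The three Util helpers above can be approximated with the standard library as follows. This is a minimal sketch, not the kpcommons code: the function names, the half-open [start, end) convention, the ratio definition (overlap length divided by each range's length), the returned namespace format, and the YYYY-MM-DD folder name are all assumptions.

```python
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Callable


@dataclass
class OverlapSketch:
    start: int
    end: int
    length: int  # negative distance between the ranges if they do not overlap
    ratio_first: float  # overlap length / length of the first range
    ratio_second: float  # overlap length / length of the second range


def overlap_sketch(start_1: int, end_1: int, start_2: int, end_2: int) -> OverlapSketch:
    """Overlap of two half-open ranges [start_1, end_1) and [start_2, end_2)."""
    start = max(start_1, start_2)
    end = min(end_1, end_2)
    length = end - start  # below zero: the size of the gap, negated
    covered = max(length, 0)
    len_1 = end_1 - start_1
    len_2 = end_2 - start_2
    return OverlapSketch(
        start,
        end,
        length,
        covered / len_1 if len_1 > 0 else 0.0,
        covered / len_2 if len_2 > 0 else 0.0,
    )


def namespace_sketch(root: ET.Element) -> str:
    """Namespace URI of the root tag, or "" if the tag is not namespaced."""
    # ElementTree stores qualified tags as "{namespace-uri}localname".
    tag = root.tag
    return tag[1 : tag.index("}")] if tag.startswith("{") else ""


def dated_folder_sketch(parent: Path, today: Callable[[], date] = date.today) -> Path:
    """Create (if needed) and return parent/YYYY-MM-DD."""
    # The injectable callable stands in for a date provider such as
    # NowDateProvider: tests can pass a fixed date instead of the current one.
    folder = Path(parent) / today().strftime("%Y-%m-%d")
    folder.mkdir(parents=True, exist_ok=True)
    return folder


if __name__ == "__main__":
    print(overlap_sketch(0, 10, 5, 10))
    with tempfile.TemporaryDirectory() as tmp:
        print(dated_folder_sketch(Path(tmp)))
```

For the call shown above, (0, 10, 5, 10), this sketch yields an overlap from 5 to 10 of length 5, covering half of the first range and all of the second.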

Footnote

Footnote.py contains a collection of methods for working with footnotes.

  • get_footnotes_ranges takes a text and returns two lists of (start, end) tuples with the character positions of footnote ranges, that is, text enclosed in '[[[' and ']]]'. The first list is without an offset, that is, the actual positions; the second list is with an offset, that is, the positions as if the footnotes were removed.
  • get_footnote_ranges_without_offset and get_footnote_ranges_with_offset are variants of get_footnotes_ranges that return only one of the two lists.
  • is_position_in_ranges checks whether a position lies within any of the given ranges.
  • is_range_in_ranges checks whether a range, given by a start and an end position, overlaps any of the given ranges.
  • remove_footnotes removes footnotes from a text. Footnotes are marked by '[[[' and ']]]'.
  • map_to_real_pos maps start and end character positions in a text with footnotes removed back to the real positions, that is, the positions before the footnotes were removed.
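The offset bookkeeping behind these functions can be sketched as below. These are standalone illustrations with hypothetical names, not the kpcommons functions, and they rest on assumptions: ranges are half-open, include the brackets, and the offset list shifts each footnote left by the total length of the footnotes before it. Nested footnotes are not handled.

```python
import re

# Assumed markup: a footnote is "[[[...]]]". Non-greedy, so adjacent
# footnotes are matched separately; nesting is not supported.
FOOTNOTE = re.compile(r"\[\[\[.*?\]\]\]", re.DOTALL)

Range = tuple[int, int]


def find_footnote_ranges(text: str) -> tuple[list[Range], list[Range]]:
    """Return (actual ranges, ranges shifted as if earlier footnotes were removed)."""
    actual: list[Range] = []
    shifted: list[Range] = []
    removed = 0  # total length of the footnotes seen so far
    for match in FOOTNOTE.finditer(text):
        start, end = match.span()
        actual.append((start, end))
        shifted.append((start - removed, end - removed))
        removed += end - start
    return actual, shifted


def strip_footnotes(text: str) -> str:
    # Surrounding whitespace is left as-is, so double spaces can remain.
    return FOOTNOTE.sub("", text)


def in_ranges(pos: int, ranges: list[Range]) -> bool:
    return any(start <= pos < end for start, end in ranges)


def overlaps_ranges(start: int, end: int, ranges: list[Range]) -> bool:
    return any(start < r_end and r_start < end for r_start, r_end in ranges)


def to_real_pos(start: int, end: int, shifted: list[Range]) -> Range:
    """Map a range in the footnote-free text back to the original text."""
    real_start, real_end = start, end
    for f_start, f_end in shifted:
        length = f_end - f_start
        if f_start <= start:  # footnote was removed at or before the start
            real_start += length
        if f_start < end:  # footnote was removed before the (exclusive) end
            real_end += length
    return real_start, real_end
```

For "ab[[[x]]]cd[[[yy]]]e", the footnote-free text is "abcde", and its range (3, 5) ("de") maps back to (10, 20), which spans "d[[[yy]]]e" in the original.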

XML

get_text_from_element extracts the text from an (annotated) XML root element. If the XML file contains annotations, for example for quotations or references, these are marked in the resulting text as follows.

Footnotes

Footnotes are enclosed in triple square brackets, for example:

Some running text [[[This is a footnote]]] and more running text...
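How an annotated element could be flattened into this triple-bracket form is sketched below. The README does not say which XML elements mark footnotes, so the sketch assumes a hypothetical, non-namespaced <note> element; real annotation schemes (and namespaced tags) will differ, and this is not the get_text_from_element implementation.

```python
import xml.etree.ElementTree as ET


def text_with_footnote_markers(element: ET.Element, note_tag: str = "note") -> str:
    """Flatten an element's text, wrapping hypothetical <note> children in [[[ ]]]."""
    parts = [element.text or ""]
    for child in element:
        inner = text_with_footnote_markers(child, note_tag)
        # Only the note element itself is wrapped; its tail is running text.
        parts.append(f"[[[{inner}]]]" if child.tag == note_tag else inner)
        parts.append(child.tail or "")
    return "".join(parts)
```

Applied to <p>Some running text <note>This is a footnote</note> and more running text...</p>, it produces the example line above.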

Direct Quotations

A direct quotation falls into one of two groups: a quotation from the primary literary work or a quotation from some other source. Direct quotations from the primary literary work are enclosed in @@. The starting tag can optionally contain the id of a corresponding reference, for example:

Some text with @id@a quote@@

Direct quotations from other sources are enclosed with .

References

References, for example a page reference for a quotation (S. 14), are enclosed in §µ§, with an id in the starting tag, for example:

Some text, @1@a quote@@ (§µ1§S.5§µ§)
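Because a quotation and its reference share an id, they can be paired after extraction. The sketch below is a hypothetical helper, not part of kpcommons; it assumes ids consist of word characters and that the text contains no other '@' characters (e-mail addresses, for example, would confuse it). Quotations from other sources are not handled.

```python
import re
from typing import Optional

# Delimiters as shown in the examples above:
#   quotation from the primary work: @id@text@@ (the id may be empty)
#   reference:                       §µid§text§µ§
QUOTE = re.compile(r"@(\w*)@(.*?)@@", re.DOTALL)
REFERENCE = re.compile(r"§µ(\w*)§(.*?)§µ§", re.DOTALL)


def link_quotes_to_references(text: str) -> list[tuple[str, Optional[str]]]:
    """Return (quotation text, reference text or None) pairs in document order."""
    references = {m.group(1): m.group(2) for m in REFERENCE.finditer(text)}
    return [
        # A quotation without an id, or with an unknown id, gets None.
        (m.group(2), references.get(m.group(1)) if m.group(1) else None)
        for m in QUOTE.finditer(text)
    ]
```

For the example line above, this returns a single pair linking "a quote" to "S.5".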

Indirect Quotations

TBD


Download files

Source Distribution

kpcommons-0.1.1.tar.gz (14.2 kB)

Uploaded Source

Built Distribution

KPCommons-0.1.1-py3-none-any.whl (17.2 kB)

Uploaded Python 3

File details

Details for the file kpcommons-0.1.1.tar.gz.

File metadata

  • Download URL: kpcommons-0.1.1.tar.gz
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for kpcommons-0.1.1.tar.gz
Algorithm Hash digest
SHA256 e10b919e842a918ee7b4a72831e26e3f5a87287e6e90c46731eed86e748cdd52
MD5 50e537966e609450279031f7662a0dcb
BLAKE2b-256 23817795293e519be237cedad14a22cc11f4bd4fa58984ed875b7ba2d21617c4

File details

Details for the file KPCommons-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: KPCommons-0.1.1-py3-none-any.whl
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for KPCommons-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8e34fca8bb2cb644f6d44c80bcbba5a1851e5397dd05a7e2774a71d3e0df5496
MD5 6db1395f10150a866704f4bfa7976a3a
BLAKE2b-256 7f4106d9f4ec3710bbeec8ceeab15e35cf3400326e3f19c86bcf950326f3d4b1