A collection of reusable methods.
Project description
Readme
KPCommons is a collection of methods which are regularly needed.
Installation
pip install kpcommons
This installs the library without pySBD. To use the SentenceChunker
,
install with the following command:
pip install kpcommons[chunk]
Chunker
SentenceChunker
can be used to split a text into chunks which are roughly sentences or multiple sentences.
BaseChunker
can be used to implement other variants.
Util
Util.py
contains the following methods:
-
calculate_overlap
is a method to calculate the overlap between two ranges.overlap = Util.calculate_overlap(0, 10, 5, 10)
The first two arguments are the start and end position of the first range, and the last two arguments are the positions of the second range. The result is an
Overlap
object with thestart
,end
,length
of the overlap, and the two ratios between the overlap and the ranges.Note: In case of no overlap,
overlap.length
is the distance between the two ranges as a negative value. -
get_namespace
gets the namespace from a root tag of a xml file. -
create_dated_folder
creates a subfolder with a date as the name. By default,NowDateProvider
is used.
Footnote
Footnote.py
contains a collection of methods for working with footnotes.
get_footnotes_ranges
takes a text and returns two list of tuples of start and end character positions of footnote ranges, that is, text surrounded by '[[[' and ']]]'. The first list is without an offset, that is, the actual positions, and the second list is with an offset, that is, as if the footnotes were removed.get_footnote_ranges_without_offset
andget_footnote_ranges_with_offset
are variants ofget_footnotes_ranges
which only return one of the lists.is_position_in_ranges
checks if a position is in one of the ranges.is_range_in_ranges
checks if a range given by a start and end position overlaps with one of the given ranges.remove_footnotes
removes footnotes from a text. Footnotes are marked by '[[[' and ']]]'.map_to_real_pos
maps start and end character positions of a text with footnotes removed to real positions, that is, positions before footnotes where removed.
XML
get_text_from_element
extracts the text from an (annotated) xml root element. If the xml file contains annotations for
quotations or references, these will be tagged in the resulting text in the following way.
Footnotes
Footnotes are enclosed with triple brackets, for example:
Some running text [[[This is a footnote]]] and more running text...
Direct Quotations
A direct quotation can fall into one of two groups. A quotation from the primary literary work or a quotation from some
other source.
Direct quotations from the primary literary work are enclosed with @@
. An optional id to a corresponding reference is
part of the starting tag, for example:
Some text with @id@a quote@@
Direct quotations from other sources are enclosed with €
.
References
References, for example, a page reference for a quotations (S. 14
) are enclosed with §µ§
and an id in the starting
tag, for example:
Some text, @1@a quote@@ (§µ1§S.5§µ§)
Indirect Quotations
TBD
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file kpcommons-0.1.1.tar.gz
.
File metadata
- Download URL: kpcommons-0.1.1.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e10b919e842a918ee7b4a72831e26e3f5a87287e6e90c46731eed86e748cdd52 |
|
MD5 | 50e537966e609450279031f7662a0dcb |
|
BLAKE2b-256 | 23817795293e519be237cedad14a22cc11f4bd4fa58984ed875b7ba2d21617c4 |
File details
Details for the file KPCommons-0.1.1-py3-none-any.whl
.
File metadata
- Download URL: KPCommons-0.1.1-py3-none-any.whl
- Upload date:
- Size: 17.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8e34fca8bb2cb644f6d44c80bcbba5a1851e5397dd05a7e2774a71d3e0df5496 |
|
MD5 | 6db1395f10150a866704f4bfa7976a3a |
|
BLAKE2b-256 | 7f4106d9f4ec3710bbeec8ceeab15e35cf3400326e3f19c86bcf950326f3d4b1 |