A collection of reusable methods.
KPCommons is a collection of methods which are regularly needed.
Installation
pip install kpcommons
This installs the library without pySBD. To use the SentenceChunker,
install with the following command:
pip install kpcommons[chunk]
Chunker
SentenceChunker can be used to split a text into chunks which are roughly sentences or multiple sentences.
BaseChunker can be used to implement other variants.
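To illustrate what "chunks which are roughly sentences or multiple sentences" can mean, here is a minimal standalone sketch. It does not use SentenceChunker, BaseChunker, or pySBD; the function names `split_sentences` and `chunk_sentences` and the `max_chars` parameter are hypothetical, and the regex-based sentence split is a naive stand-in for a real sentence boundary detector.

```python
import re


def split_sentences(text):
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(text, max_chars=200):
    """Group consecutive sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_sentences("One. Two. Three.", max_chars=9))
```

With `max_chars=9`, the first two sentences fit into one chunk and the third starts a new one.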
Util
Util.py contains the following methods:
- calculate_overlap is a method to calculate the overlap between two ranges.
  overlap = Util.calculate_overlap(0, 10, 5, 10)
  The first two arguments are the start and end position of the first range, and the last two arguments are the start and end position of the second range. The result is an Overlap object with the start, end, and length of the overlap, and the two ratios between the overlap and the ranges.
  Note: In case of no overlap, overlap.length is the distance between the two ranges as a negative value.
- get_namespace gets the namespace from the root tag of an XML file.
- create_dated_folder creates a subfolder with a date as the name. By default, NowDateProvider is used.
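The overlap calculation described above can be sketched as follows. This is a standalone re-implementation for illustration, not the library's code: the field names `ratio_first` and `ratio_second`, and the choice to report both ratios as 0.0 when the ranges do not overlap, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    start: int
    end: int
    length: int
    ratio_first: float   # assumed name: overlap length / length of first range
    ratio_second: float  # assumed name: overlap length / length of second range


def calculate_overlap(start1, end1, start2, end2):
    """Return the overlap of the ranges [start1, end1) and [start2, end2)."""
    start = max(start1, start2)
    end = min(end1, end2)
    # Negative when the ranges are disjoint: the gap between them, negated.
    length = end - start

    len_first = end1 - start1
    len_second = end2 - start2
    # Assumption: ratios are 0.0 when there is no overlap or a range is empty.
    ratio_first = length / len_first if length > 0 and len_first > 0 else 0.0
    ratio_second = length / len_second if length > 0 and len_second > 0 else 0.0
    return Overlap(start, end, length, ratio_first, ratio_second)


overlap = calculate_overlap(0, 10, 5, 10)
print(overlap.start, overlap.end, overlap.length)  # prints "5 10 5"

disjoint = calculate_overlap(0, 5, 8, 10)
print(disjoint.length)  # prints "-3"
```

In the second call the ranges are three positions apart, so the length is -3, matching the note about negative lengths.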
Footnote
Footnote.py contains a collection of methods for working with footnotes.
- get_footnotes_ranges takes a text and returns two lists of tuples of start and end character positions of footnote ranges, that is, text surrounded by '[[[' and ']]]'. The first list is without an offset, that is, the actual positions, and the second list is with an offset, that is, as if the footnotes were removed.
- get_footnote_ranges_without_offset and get_footnote_ranges_with_offset are variants of get_footnotes_ranges which only return one of the lists.
- is_position_in_ranges checks if a position is in one of the ranges.
- is_range_in_ranges checks if a range given by a start and end position overlaps with one of the given ranges.
- remove_footnotes removes footnotes from a text. Footnotes are marked by '[[[' and ']]]'.
- map_to_real_pos maps start and end character positions of a text with footnotes removed to real positions, that is, positions before footnotes were removed.
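The idea behind footnote ranges, footnote removal, and mapping back to real positions can be sketched with the standard library. The helper names `find_footnote_ranges`, `strip_footnotes`, and `to_real_pos` are hypothetical stand-ins, not the functions from Footnote.py, and the sketch assumes that a range includes the '[[[' and ']]]' markers and that footnotes are not nested.

```python
import re

# Assumption: a footnote is everything from '[[[' to the next ']]]', markers included.
FOOTNOTE_RE = re.compile(r"\[\[\[.*?\]\]\]", re.DOTALL)


def find_footnote_ranges(text):
    """Return (start, end) positions of footnotes in the original text."""
    return [m.span() for m in FOOTNOTE_RE.finditer(text)]


def strip_footnotes(text):
    """Return the text with all footnotes removed."""
    return FOOTNOTE_RE.sub("", text)


def to_real_pos(pos, ranges):
    """Map a position in the stripped text back to the original text.

    ranges must be sorted footnote ranges from the original text. Each
    footnote that starts at or before the running position shifts it by
    the footnote's length.
    """
    real = pos
    for start, end in ranges:
        if start <= real:
            real += end - start
        else:
            break
    return real


text = "Some running text [[[This is a footnote]]] and more running text..."
ranges = find_footnote_ranges(text)
stripped = strip_footnotes(text)

start = stripped.find("and")
end = start + len("and")
real_start = to_real_pos(start, ranges)
real_end = to_real_pos(end, ranges)
print(text[real_start:real_end])  # prints "and"
```

A position that falls exactly where a footnote was removed is mapped to the position after that footnote in this sketch; the library may resolve that edge case differently.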
XML
get_text_from_element extracts the text from an (annotated) xml root element. If the xml file contains annotations for
quotations or references, these will be tagged in the resulting text in the following way.
Footnotes
Footnotes are enclosed with triple brackets, for example:
Some running text [[[This is a footnote]]] and more running text...
Direct Quotations
A direct quotation falls into one of two groups: a quotation from the primary literary work or a quotation from some
other source.
Direct quotations from the primary literary work are enclosed with @@. An optional id to a corresponding reference is
part of the starting tag, for example:
Some text with @id@a quote@@
Direct quotations from other sources are enclosed with €.
References
References, for example a page reference for a quotation such as (S. 14), are enclosed with §µ§, with an id in the
starting tag, for example:
Some text, @1@a quote@@ (§µ1§S.5§µ§)
Indirect Quotations
Indirect quotations, i.e. summarizations and paraphrases, are enclosed with αα and the source of the quotation as
part of the starting tag, for example:
Some text, αl_10αindirect quoteαα
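Text tagged in this way can be read back with regular expressions. The following is a minimal sketch based only on the examples above; the function name `extract_annotations` and the exact patterns are assumptions, and it does not handle nested or malformed tags or quotations from other sources.

```python
import re

# Patterns derived from the examples above (assumptions, not the library's own).
DIRECT_RE = re.compile(r"@([^@]*)@(.*?)@@")         # @id@quote@@, id may be empty
REFERENCE_RE = re.compile(r"§µ([^§]+)§(.*?)§µ§")    # §µid§reference§µ§
INDIRECT_RE = re.compile(r"α([^α]+)α(.*?)αα")       # αsourceαquoteαα


def extract_annotations(text):
    """Return (id or source, content) pairs for each annotation type."""
    return {
        "direct": DIRECT_RE.findall(text),
        "references": REFERENCE_RE.findall(text),
        "indirect": INDIRECT_RE.findall(text),
    }


found = extract_annotations("Some text, @1@a quote@@ (§µ1§S.5§µ§)")
print(found["direct"])      # prints [('1', 'a quote')]
print(found["references"])  # prints [('1', 'S.5')]

found = extract_annotations("Some text, αl_10αindirect quoteαα")
print(found["indirect"])    # prints [('l_10', 'indirect quote')]
```

Matching the id in the quotation against the id in the reference links a quotation to its page reference.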