Utility functions for natural language processing (NLP)
Text processing utils
Small helpers for NLP tasks.
Installation
Create and activate a virtual environment.
Then, install the package via pip and download the spaCy pipeline.
pip install text-processing-utils
python3 -m spacy download en_core_web_md
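To confirm that the two commands above took effect, a quick check from Python can help. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of this package. It relies on the fact that spaCy pipelines such as `en_core_web_md` are installed as regular importable Python packages.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


# Modules expected after following the installation steps above.
REQUIRED = ["spacy", "en_core_web_md"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("spaCy and the en_core_web_md pipeline are available.")
```

Run it inside the activated virtual environment; if anything is reported as missing, repeat the corresponding installation command.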
Contents
- batches
  - get_batches_of_strict_size_with_remainder
  - get_n_batches
  - get_batches_of_roughly_equal_size
- boolean_checks
  - is_gibberish
  - is_plural
  - an_vs_a
- char_offsets
  - is_inside
  - get_span_distance_sorted
  - get_span_distance
  - remove_whitespace_from_annotation
  - merge_annotation_offsets
- bio_tags
  - bio_tags_to_spans
  - remove_overlapping_bio_tags
  - transform_into_char_offsets_and_readable_tag
  - token_spans_to_char_annotations
- locate
  - get_sent_idx
  - locate_span_in_context
- highlight_context
  - enclose_with_special_symbol
- sentences
  - lower_first_letter_if_sent_start
  - correct_sentence_boundary_detection
- regex
  - make_named_group_unique
- types
  - Offset
  - Annotations
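The list above gives names only, without signatures or behavior. To give a feel for the kind of helper involved, here is an independent sketch of two ideas the names suggest: splitting a sequence into batches of roughly equal size, and testing whether one character span lies inside another. The function names `split_roughly_equal` and `span_is_inside`, their parameters, and the `(start, end)` tuple representation are all assumptions made for illustration; they are not the package's API and may differ from its actual implementation.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_roughly_equal(items: Sequence[T], n_batches: int) -> List[List[T]]:
    """Split items into n_batches consecutive batches whose sizes differ by at most one."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    # Every batch gets `base` items; the first `extra` batches get one more.
    base, extra = divmod(len(items), n_batches)
    batches: List[List[T]] = []
    start = 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        batches.append(list(items[start:start + size]))
        start += size
    return batches


def span_is_inside(inner: Tuple[int, int], outer: Tuple[int, int]) -> bool:
    """Return True if the (start, end) span `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    print(split_roughly_equal(list(range(7)), 3))  # [[0, 1, 2], [3, 4], [5, 6]]
    print(span_is_inside((2, 5), (0, 10)))         # True
```

For the real signatures and return types, consult the package's source code or docstrings.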
About Us
We are the Institute of Climate and Energy Systems (ICE) - Jülich Systems Analysis, part of Forschungszentrum Jülich. Our interdisciplinary department's research focuses on energy-related process and systems analyses. We use data searches and system simulations to determine energy and mass balances and to evaluate the performance, emissions, and costs of energy systems. The results feed into comparative assessments of the various systems. Our current priorities include developing energy strategies in accordance with the German Federal Government’s greenhouse gas reduction targets, designing new infrastructures for sustainable and secure energy supply chains, and conducting cost analyses for integrating new technologies into future energy market frameworks.
Acknowledgements
The authors would like to thank the German Federal Government, the German state governments, and the Joint Science Conference (GWK) for their funding and support as part of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) – project number: 442146713. Furthermore, this work was supported by the Helmholtz Association under the program "Energy System Design".