text-selection
CLI to select lines of a text file.
Features
- dataset
  - create: create a dataset based on a text file
  - export-statistics: export statistics to a CSV
- subsets
  - add: add subsets
  - remove: remove subsets
  - rename: rename subset
  - select-all: select all lines
  - select-fifo: select lines FIFO-style
  - select-greedily: select lines greedily regarding units
  - select-greedily-ep: select lines greedily regarding units (epoch-based)
  - select-uniformly: select lines with units uniformly distributed
  - filter-duplicates: filter duplicate lines
  - filter-by-regex: filter lines by regex
  - filter-by-text: filter lines by text
  - filter-by-weight: filter lines by weight
  - filter-by-vocabulary: filter lines by unit vocabulary
  - filter-by-count: filter lines by global unit frequencies
  - filter-by-unit-freq: filter lines by unit frequencies per line
  - filter-by-line-nr: filter lines by line number
  - sort-by-line-nr: sort lines by line number
  - sort-by-text: sort lines by text
  - sort-by-weight: sort lines by weights
  - reverse: reverse lines
  - export: export lines
- weights
  - create-uniform: create uniform weights
  - create-from-count: create weights from unit count
  - divide: divide weights
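To illustrate what selecting lines "greedily regarding units" can mean, here is a minimal Python sketch. It is not the package's implementation. It assumes that a unit is a whitespace-separated word and that the goal is to cover as many distinct units as possible with a limited number of lines. The function name select_greedily and its parameters are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Set


def select_greedily(lines: Dict[int, str], limit: int) -> List[int]:
    """Return up to `limit` line numbers, always picking the line that adds
    the most units not covered by the lines selected so far."""
    # Assumption: a "unit" is a whitespace-separated word.
    remaining: Dict[int, Set[str]] = {
        nr: set(text.split()) for nr, text in lines.items()
    }
    covered: Set[str] = set()
    selected: List[int] = []
    while remaining and len(selected) < limit:
        # Highest gain wins; ties are broken by the lowest line number.
        best = max(remaining, key=lambda nr: (len(remaining[nr] - covered), -nr))
        if not remaining[best] - covered:
            break  # no remaining line contributes a new unit
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


if __name__ == "__main__":
    example = {1: "a b", 2: "a b c d", 3: "e", 4: "a c"}
    print(select_greedily(example, limit=2))
```

With the example input, line 2 is chosen first because it contributes four new units, then line 3 because it is the only remaining line that adds a unit.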
Roadmap
- select/sort randomly
- add tests
- refactor
- outsource the greedy and KLD iterators
Installation
pip install text-selection --user
Usage
text-selection-cli
Dependencies
- pandas
- tqdm
- scipy
- numpy
- ordered-set >=4.1.0
License
MIT License
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repo, you can use this BibTeX entry:
@misc{tsts22,
author = {Taubert, Stefan},
title = {text-selection},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/stefantaubert/text-selection}}
}