Fun Sentence Splitter
A fundamental sentence splitter based on spaCy.
Requirements
Python 3.10 or higher and Poetry.
Local Dev Setup
Download the spaCy language model used in the tests:
python -m spacy download de_core_news_sm
Run static checks and tests:
ruff .
mypy .
pytest --cov=fun_sentence_splitter
Run Evaluation
- Change the spacy dependency in the pyproject.toml to the version you want to evaluate and run:
  poetry lock --no-update
  poetry install
- Download the spaCy language model you want to evaluate, e.g.:
  python -m spacy download de_core_news_lg
Evaluate:
python -m tests.evaluate_sentence_splitter path/to/splits_dir [--spacy-model de_core_news_lg] [--max-len 47]
path/to/splits_dir: directory containing pairs of *.split and *.txt files. .split files contain the expected sentences, each on a separate line. .txt files contain the original text to split.
--spacy-model: name or location of the spaCy language model. Optional, defaults to de_core_news_sm.
--max-len: maximum line length before spaCy is used. Optional, defaults to 100.
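The input layout and the role of --max-len described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind tests.evaluate_sentence_splitter: the function names (load_pairs, split_lines, naive_split, exact_match_rate) are hypothetical, the exact-match score is an assumed metric, the reading of --max-len (lines up to that length are kept as they are, longer lines go to the model) is an interpretation of the option description, and a simple regex stands in for the spaCy model so the sketch runs without any download.

```python
import re
from pathlib import Path
from typing import Callable

# A splitter takes a text and returns a list of sentences.
Splitter = Callable[[str], list[str]]


def load_pairs(splits_dir: Path) -> list[tuple[str, list[str]]]:
    """Collect (original text, expected sentences) for each *.txt / *.split pair."""
    pairs = []
    for txt_file in sorted(splits_dir.glob("*.txt")):
        split_file = txt_file.with_suffix(".split")
        if not split_file.exists():
            continue  # skip texts that have no expected-output file
        text = txt_file.read_text(encoding="utf-8")
        expected = [
            line.strip()
            for line in split_file.read_text(encoding="utf-8").splitlines()
            if line.strip()
        ]
        pairs.append((text, expected))
    return pairs


def naive_split(text: str) -> list[str]:
    """Stand-in for the spaCy model: break after '.', '!' or '?' plus whitespace."""
    return [part.strip() for part in re.split(r"(?<=[.!?])\s+", text) if part.strip()]


def split_lines(text: str, max_len: int, split_long: Splitter) -> list[str]:
    """Keep short lines unchanged; pass lines longer than max_len to split_long.

    Assumption: this mirrors what --max-len controls in the real tool.
    """
    sentences: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if len(line) <= max_len:
            sentences.append(line)
        else:
            sentences.extend(split_long(line))
    return sentences


def exact_match_rate(pairs: list[tuple[str, list[str]]], splitter: Splitter) -> float:
    """Fraction of texts whose predicted sentences equal the expected ones exactly."""
    if not pairs:
        return 0.0
    hits = sum(1 for text, expected in pairs if splitter(text) == expected)
    return hits / len(pairs)
```

With a directory laid out as described, `exact_match_rate(load_pairs(Path("path/to/splits_dir")), lambda t: split_lines(t, 100, naive_split))` would score the stand-in splitter. To evaluate a real model, replace naive_split with a function that wraps the loaded spaCy pipeline.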
Hashes for fun_sentence_splitter-0.2.344.tar.gz
Algorithm | Hash digest
---|---
SHA256 | d3ff9e9c7ca7a2b1f1f11ae2f8a695ca91530225899c8d638e628d527782d09c
MD5 | 48fca46bb0fea2b5dcce34fa24cb7b66
BLAKE2b-256 | 05bcf507e7fa7f8e6500a5be50ba983c9ed7ab9a5be58e5d135abe9ffed1db85
Hashes for fun_sentence_splitter-0.2.344-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c9c12c8c572847e8f5b7a731a7dee74ba42de832e401d093ce53291a9aa2b813
MD5 | b5bd80802456895cc7929bf43ae38a4a
BLAKE2b-256 | d08ec47334f5cb1c1f00dbda6e0ca0d75471f73e7f05f0e8406eeec59c65464d