Fun Sentence Splitter

A fundamental sentence splitter based on spaCy.

Requirements

Python 3.10 or higher and Poetry.

Local Dev Setup

Download the spaCy language model used in the tests:

python -m spacy download de_core_news_sm

Run static checks and tests:

ruff .
mypy .
pytest --cov=fun_sentence_splitter

Run Evaluation

  1. Change the spaCy dependency in pyproject.toml to the version you want to evaluate, then run:

    poetry lock --no-update
    poetry install
    
  2. Download the spaCy language model you want to evaluate, e.g.:

    python -m spacy download de_core_news_lg
    

Evaluate:

python -m tests.evaluate_sentence_splitter path/to/splits_dir [--spacy-model de_core_news_lg] [--max-len 47]

path/to/splits_dir: directory containing pairs of *.split and *.txt files. .split files contain the expected sentences, each on a separate line. .txt files contain the original text to split.

--spacy-model: name or location of the spaCy language model. Optional, defaults to de_core_news_sm.

--max-len: maximum line length before spaCy is used. Optional, defaults to 100.
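
The argument list and directory layout described above can be sketched in Python. This is only an illustration, not the project's actual evaluation script: the function names build_parser and load_pairs are hypothetical, and the sketch assumes that each pair shares a file stem (e.g. a.txt and a.split), that files are UTF-8 encoded, and that blank lines in .split files are ignored.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented command line (hypothetical helper, not the project's parser)."""
    parser = argparse.ArgumentParser(prog="evaluate_sentence_splitter")
    parser.add_argument("splits_dir", type=Path,
                        help="directory with pairs of *.split and *.txt files")
    parser.add_argument("--spacy-model", default="de_core_news_sm",
                        help="name or location of the spaCy language model")
    parser.add_argument("--max-len", type=int, default=100,
                        help="maximum line length before spaCy is used")
    return parser


def load_pairs(splits_dir: Path) -> list[tuple[str, list[str]]]:
    """Return (original text, expected sentences) for each *.txt / *.split pair."""
    pairs = []
    for txt_path in sorted(splits_dir.glob("*.txt")):
        split_path = txt_path.with_suffix(".split")
        if not split_path.exists():
            continue  # assumption: a .txt file without a matching .split file is skipped
        text = txt_path.read_text(encoding="utf-8")
        # Each non-empty line of the .split file is one expected sentence.
        expected = [
            line
            for line in split_path.read_text(encoding="utf-8").splitlines()
            if line.strip()
        ]
        pairs.append((text, expected))
    return pairs
```

With these helpers, build_parser().parse_args(["path/to/splits_dir", "--max-len", "47"]) yields the documented defaults for any option left out, and load_pairs(args.splits_dir) gives the texts together with their expected sentences, ready to be compared against a splitter's output.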
