
Split German text into sentences! Uses a TRIE-Regex to keep ordinal numbers (23.04., 2.5. ...), Roman numerals (II., XI. ...), 15,000 abbreviations (z.B., Abk. ...), about 7,500 second-level domains (.com.br, ac.at ...) and 1,500 file name extensions (hallo.docx, tabelle.xlsx ...) from being treated as sentence ends.

Project description

Satzmetzger


Before writing this class, I tested about 20 classes, modules, functions, and methods for splitting German text into sentences. I wasn't happy with any of the results, so I wrote my own. It doesn't use any AI, just old-school regex!
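The summary names a TRIE-Regex. The general idea behind that technique: instead of joining thousands of abbreviations into one flat alternation, store them in a prefix tree (trie) and turn the tree into a single regex in which words with a common prefix share one branch. The following is a minimal, generic sketch of the technique, not Satzmetzger's own code; the functions `build_trie`, `trie_to_pattern` and `trie_regex` and the short abbreviation list are hypothetical.

```python
import re


def build_trie(words):
    """Store words in a nested-dict trie; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for char in word:
            node = node.setdefault(char, {})
        node[""] = True  # end-of-word marker
    return trie


def trie_to_pattern(node):
    """Turn a trie node into a regex fragment matching every word below it."""
    branches = [
        re.escape(char) + trie_to_pattern(child)
        for char, child in sorted(node.items())
        if char != ""
    ]
    ends_here = "" in node
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # If a word may also end at this node, the remaining suffix is optional
    return group + "?" if ends_here else group


def trie_regex(words):
    """Compile one regex that matches any of `words` as a standalone token."""
    words = [w for w in words if w]
    if not words:
        raise ValueError("need at least one non-empty word")
    pattern = trie_to_pattern(build_trie(words))
    # The match must not be glued to surrounding word characters
    return re.compile(r"(?<!\w)(?:" + pattern + r")(?!\w)")


abbreviations = ["Abk.", "Abs.", "bzw.", "ca.", "z.B.", "z. B."]
abbrev_re = trie_regex(abbreviations)
print(abbrev_re.findall("Das ist z. B. eine Abk. und ca. 5 Beispiele bzw. mehr."))
# ['z. B.', 'Abk.', 'ca.', 'bzw.']
```

Because "Abk." and "Abs." share the prefix "Ab", the generated pattern contains `Ab(?:k\.|s\.)` rather than two separate alternatives. With thousands of entries this keeps the pattern compact; the package's actual word lists and pattern construction may differ from this sketch.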

It is very simple to use! Here is everything you need to know:

from satzmetzger import Satzmetzger

# German sample text to split into sentences
textzumsplitten = '''Hallo, ich bin ein Text. Zerhack mich bitte! Ich halte es nicht mehr aus. Wenn du mich bis zum 23.04. nicht zerhackst, rufe ich Papst Hackerpeter X. an und schicke ihm das Dokument erhatmichnichtzerhackt.docx, er wird z. B. sehr böse auf dich sein! Darauf kannst du einen lassen!'''

losgehts = Satzmetzger()
# Split the text into sentences
textfertig = losgehts.zerhack_den_text(textzumsplitten, debug=False)
# Print each sentence with its index
for indi, zerhacktersatz in enumerate(textfertig):
    print(indi, end='\t\t')
    print(zerhacktersatz)
# Output:
#0		Hallo, ich bin ein Text.
#1		Zerhack mich bitte!
#2		Ich halte es nicht mehr aus.
#3		Wenn du mich bis zum 23.04. nicht zerhackst, rufe ich Papst Hackerpeter X. an und schicke ihm das Dokument erhatmichnichtzerhackt.docx, er wird z. B. sehr böse auf dich sein!
#4		Darauf kannst du einen lassen!
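How can the dots in 23.04., X. and z. B. survive while the real sentence ends are split? One common approach, assumed here and not taken from the package's source, is to mask the dots inside protected tokens, split wherever ".", "!" or "?" is followed by whitespace, and then restore the dots. This sketch uses a plain alternation and two small patterns in place of the package's large trie-based lists; `split_sentences`, `PROTECTED`, `DOT` and the tiny abbreviation list are hypothetical names for illustration. The file name erhatmichnichtzerhackt.docx needs no special handling here because its dot is not followed by whitespace.

```python
import re

# Hypothetical, tiny stand-in for the package's ~15,000 abbreviations
ABBREVIATIONS = ["z. B.", "z.B.", "bzw.", "ca.", "Abk.", "Dr."]

PROTECTED = re.compile(
    r"(?<!\w)(?:"
    # Longest abbreviations first so longer entries are tried before shorter ones
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r"|\d{1,2}\.(?:\d{1,2}\.)?"  # ordinal numbers and dates: 3., 2.5., 23.04.
    + r"|[IVXLCDM]+\."  # Roman numerals: II., X., XI.
    + r")(?!\w)"
)

DOT = "\x00"  # placeholder; assumes the input never contains NUL characters


def split_sentences(text):
    """Split German text into sentences (simplified sketch)."""
    # 1. Hide the dots of protected tokens so they cannot end a sentence
    masked = PROTECTED.sub(lambda m: m.group(0).replace(".", DOT), text)
    # 2. Split wherever ., ! or ? is followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", masked.strip())
    # 3. Put the hidden dots back and drop empty pieces
    return [part.replace(DOT, ".") for part in parts if part]


sample = ("Hallo, ich bin ein Text. Zerhack mich bitte! Ich halte es nicht mehr aus. "
          "Wenn du mich bis zum 23.04. nicht zerhackst, rufe ich Papst Hackerpeter X. an "
          "und schicke ihm das Dokument erhatmichnichtzerhackt.docx, er wird z. B. sehr "
          "böse auf dich sein! Darauf kannst du einen lassen!")
for i, sentence in enumerate(split_sentences(sample)):
    print(i, sentence)
```

On the sample text this yields the same five sentences as the output shown above. The trade-off of such context-free masking is that a sentence which really ends in "Teil 2." or "Kapitel IV." would not be split by this sketch; how Satzmetzger itself handles those cases is not described here.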



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

satzmetzger-0.0.1-py3-none-any.whl (85.7 kB)

Uploaded Python 3

File details

Details for the file satzmetzger-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: satzmetzger-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 85.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for satzmetzger-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d3d88cb7239bdb968e7f9cfb8e7aee99326cb2f3227294096db7d28d14ac5222
MD5 c16c9de624877a6e98d7900c53e2bdc3
BLAKE2b-256 e824efcecd840ad19b857a3bcf546447a89ec6dfd396c10b472a6e90a85e546b
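To check that a downloaded wheel matches the SHA256 digest listed above, you can hash the file yourself with Python's standard hashlib module. A minimal sketch, assuming the wheel has been saved in the current directory under its published name; the helpers `file_sha256` and `verify_wheel` are hypothetical:

```python
import hashlib
import os

EXPECTED_SHA256 = "d3d88cb7239bdb968e7f9cfb8e7aee99326cb2f3227294096db7d28d14ac5222"


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 digest equals the published digest."""
    return file_sha256(path) == expected.lower()


if __name__ == "__main__":
    wheel = "satzmetzger-0.0.1-py3-none-any.whl"  # assumed local download path
    if os.path.exists(wheel):
        print("hash OK" if verify_wheel(wheel) else "hash MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode: put a line such as `satzmetzger==0.0.1 --hash=sha256:d3d88cb7239bdb968e7f9cfb8e7aee99326cb2f3227294096db7d28d14ac5222` in a requirements file and install it with `pip install --require-hashes -r requirements.txt`.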

