An NLP Python library for analyzing the German language.
Wortsalat
Wortsalat is a Python library of linguistic analysis tools for German text. It provides functions for tokenizing text into words and sentences, identifying words that match a given word list or POS tag, counting words and sentences, computing the average word length and the average number of words per sentence, and calculating several readability scores.
Installation
You can install Wortsalat from PyPI:
pip install wortsalat
Usage
Here are some examples of how to use the functions provided by Wortsalat:
Tokenize Text into Words
from wortsalat.preprocess import tokenize_words
text = "Dies ist ein Beispieltext."
words = tokenize_words(text)
print(words)
This will output: ['Dies', 'ist', 'ein', 'Beispieltext', '.']
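To make the expected token list easier to reason about, here is a minimal sketch of word tokenization using only the standard library. It is not Wortsalat's implementation; the function name `sketch_tokenize_words` and the regular expression are assumptions chosen so that the result matches the sample output above.

```python
import re


def sketch_tokenize_words(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens.

    Illustrative only: this is NOT how wortsalat.preprocess works internally.
    """
    # \w+ matches runs of letters, digits and underscores (umlauts and ß included);
    # [^\w\s] matches any single character that is neither word nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


print(sketch_tokenize_words("Dies ist ein Beispieltext."))
# ['Dies', 'ist', 'ein', 'Beispieltext', '.']
```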
Split Text into Sentences
from wortsalat.preprocess import split_sentences
text = "Dies ist ein Beispieltext. Hier ist eine weitere Satz."
sentences = split_sentences(text)
print(sentences)
This will output: ['Dies ist ein Beispieltext.', 'Hier ist eine weitere Satz.']
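The same result can be approximated with a naive rule: break after sentence-final punctuation that is followed by whitespace. The sketch below assumes exactly that rule and uses a hypothetical helper name, `sketch_split_sentences`; Wortsalat may well use a more robust splitter.

```python
import re


def sketch_split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (illustrative, not Wortsalat's algorithm)."""
    # Split on whitespace that directly follows '.', '!' or '?'.
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string that re.split returns for empty input.
    return [part for part in parts if part]


print(sketch_split_sentences("Dies ist ein Beispieltext. Hier ist eine weitere Satz."))
```

A rule this simple splits after every period, so abbreviations such as "z. B." or ordinals such as "3. Mai" would be cut incorrectly. That is one reason to prefer a library function over a hand-written pattern.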
Identify Words That Match a Specific Word List
from wortsalat.identify_words import identify_words
text = "Dies ist ein Beispieltext. Hier ist eine weitere Satz."
words = identify_words(['Dies', 'ist', 'ein'], text)
print(words)
This will output: ['Dies', 'ist', 'ein']
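One plausible reading of this output is that each entry of the word list is reported once if it occurs in the text as a whole token. The sketch below implements that reading; the helper `sketch_identify_words` and its matching rule (exact, case-sensitive, whole-token) are assumptions, not a description of the library's internals.

```python
import re


def sketch_identify_words(word_list: list[str], text: str) -> list[str]:
    """Return the entries of word_list that occur in text as whole tokens.

    Illustrative only; assumes exact, case-sensitive matching.
    """
    # Collect the distinct word tokens of the text for fast membership tests.
    tokens = set(re.findall(r"\w+", text))
    # Keep word-list order and report each listed word at most once.
    return [word for word in word_list if word in tokens]


text = "Dies ist ein Beispieltext. Hier ist eine weitere Satz."
print(sketch_identify_words(["Dies", "ist", "ein"], text))
# ['Dies', 'ist', 'ein']
```

Because matching is done on whole tokens, 'ein' does not match 'eine', and the second occurrence of 'ist' does not produce a duplicate entry.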
Identify Words That Match a Specific POS Tag
from wortsalat.identify_tags import identify_tags
text = "Dies ist ein Beispieltext. Hier ist eine weitere Satz."
words = identify_tags('ART', text)
print(words)
This will output: ['Dies', 'ein']
Count the Total Number of Words
from wortsalat.count import count_total_words
text = "Dies ist ein Beispieltext. Hier ist eine weitere Satz."
num_total_words = count_total_words(text)
print(num_total_words)
This will output the total number of words in the text. The sample text contains nine words, not counting punctuation.
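The counting statistics mentioned in the introduction (total words, total sentences, average word length, words per sentence) follow directly from the token and sentence lists. Below is a minimal sketch under two stated assumptions: only alphanumeric tokens count as words, and sentences end at '.', '!' or '?'. The function `sketch_text_stats` is hypothetical, and the library's own counting rules may differ.

```python
import re


def sketch_text_stats(text: str) -> dict[str, float]:
    """Compute simple count-based statistics (illustrative only)."""
    # Words: alphanumeric tokens only, punctuation is ignored (assumption).
    words = re.findall(r"\w+", text)
    # Sentences: naive split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "total_words": n_words,
        "total_sentences": n_sentences,
        # Guard against division by zero for empty input.
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
    }


print(sketch_text_stats("Dies ist ein Beispieltext. Hier ist eine weitere Satz."))
```

Under this counting rule the sample text has nine words in two sentences, which gives 4.5 words per sentence.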
Calculate the Flesch Reading Ease Score
from wortsalat.wrapper import calculate_flesch_score
text = "Dies ist ein Beispieltext. Hier ist eine weitere Satz."
flesch_score = calculate_flesch_score(text)
print(flesch_score)
This will output the Flesch reading ease score of the text.
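The description does not say which variant of the formula is used. For German text, a common choice is Amstad's adaptation of the Flesch reading ease, FRE = 180 - ASL - 58.5 * ASW, where ASL is the average sentence length in words and ASW the average number of syllables per word. The sketch below assumes that variant and a rough vowel-group heuristic for syllables; both helper names are hypothetical, and the score returned by `calculate_flesch_score` may differ.

```python
import re

# One run of consecutive vowels is treated as one syllable (rough heuristic).
_VOWEL_GROUP = re.compile(r"[aeiouyäöü]+")


def sketch_count_syllables(word: str) -> int:
    """Estimate the syllable count of a German word (heuristic, assumption)."""
    return max(1, len(_VOWEL_GROUP.findall(word.lower())))


def sketch_flesch_german(text: str) -> float:
    """Amstad's German Flesch reading ease: 180 - ASL - 58.5 * ASW."""
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    asl = len(words) / len(sentences)  # average sentence length in words
    asw = sum(sketch_count_syllables(w) for w in words) / len(words)  # syllables per word
    return 180.0 - asl - 58.5 * asw


print(sketch_flesch_german("Das ist gut."))
# 118.5
```

Higher scores indicate easier text. Very short, simple sentences can score above 100, as in the example.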
Contributing - It takes a village!
If you find a bug or have a feature request, please open an issue on GitHub. If you want to contribute to the development of Wortsalat, please fork the repository, make your changes, and then open a pull request.
License
Wortsalat is released under the Apache 2.0 License.
File details
Details for the file wortsalat-0.0.1.tar.gz.
File metadata
- Download URL: wortsalat-0.0.1.tar.gz
- Upload date:
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9faef95ff4bed5551b6201939574a65e1414d5b14495f5a931d368624589ade7 |
| MD5 | 211957634ece1e22566feaec4c99bac0 |
| BLAKE2b-256 | ad3b91a30945f741062d023522a2198490cf516e5090512881a77963144ed4a7 |
File details
Details for the file wortsalat-0.0.1-py3-none-any.whl.
File metadata
- Download URL: wortsalat-0.0.1-py3-none-any.whl
- Upload date:
- Size: 16.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1dea4fb9338ae841b117e1cec4e752cb55655203790c84eb036f52daeb6a124a |
| MD5 | 80472f3b9e70006348a4971da9ae12b0 |
| BLAKE2b-256 | bca9e69d6d7ba361ce215bf8fa0c03c8c86d86165d716bf42cb42a9a4ccc829d |
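The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex-encoded SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the wheel was downloaded to the current directory:
# matches_published_hash(
#     "wortsalat-0.0.1-py3-none-any.whl",
#     "1dea4fb9338ae841b117e1cec4e752cb55655203790c84eb036f52daeb6a124a",
# )
```

When installing with pip, the same check can be enforced with a requirements file that pins hashes and the `--require-hashes` option.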