testing and debugging project
Project description
UPDATE 05.05.2024:
Due to hosting-related changes, it is recommended to upgrade the package to the latest version with the command:
pip install --upgrade speakleash
SpeakLeash is a lightweight library providing datasets for the Polish language and tools to make them useful:
- Website: https://speakleash.org/
- Datasets: https://speakleash.org/dashboard/
- Source code: https://github.com/speakleash/speakleash
- Data in action: https://github.com/speakleash/speakleash-examples
- Bug reports: https://github.com/speakleash/speakleash/issues
Installation
The SpeakLeash package can be installed from PyPI and has to be installed in a virtual environment:
pip install speakleash
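To confirm that the installation landed in the intended environment, a small check like the one below can help. It is a minimal sketch that uses only the Python standard library; the helper names `installed_version` and `in_virtualenv` are made up for this illustration and are not part of SpeakLeash.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="speakleash"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("speakleash is not installed in this environment")
    else:
        print(f"speakleash {found} is installed")
    print("virtual environment active:", in_virtualenv())
```

If the reported version is older than expected, rerunning the upgrade command from the update note should bring it up to date.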
Basic Usage
If you just want to see the details of the datasets:
from speakleash import Speakleash
import os
base_dir = os.path.join(os.path.dirname(__file__))
replicate_to = os.path.join(base_dir, "datasets")
sl = Speakleash(replicate_to)
for d in sl.datasets:
    size_mb = round(d.characters/1024/1024)
    print("Dataset: {0}, size: {1} MB, characters: {2}, documents: {3}".format(d.name, size_mb, d.characters, d.documents))
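The same `name`, `characters` and `documents` properties can be used to rank datasets before downloading anything large. The sketch below assumes only that each dataset object exposes those three attributes, as in the loop above; `largest_datasets` is a hypothetical helper, and the stand-in objects carry made-up numbers so the example runs without the library.

```python
from types import SimpleNamespace


def largest_datasets(datasets, top_n=5):
    """Return the `top_n` datasets with the most characters, largest first."""
    return sorted(datasets, key=lambda d: d.characters, reverse=True)[:top_n]


# Stand-in objects with made-up numbers; with the real library you would
# pass `sl.datasets` instead.
demo = [
    SimpleNamespace(name="a", characters=300, documents=3),
    SimpleNamespace(name="b", characters=1200, documents=7),
    SimpleNamespace(name="c", characters=50, documents=1),
]

for d in largest_datasets(demo, top_n=2):
    print(f"{d.name}: {d.characters} characters, {d.documents} documents")
```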
You can use individual properties (e.g. characters, documents), or display the entire manifest:
sl = Speakleash(replicate_to)
print(sl.get("plwiki").manifest)
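A raw manifest printed on one line can be hard to read. The sketch below assumes the manifest is a plain, JSON-serialisable mapping (this is an assumption, not something stated above) and renders it with indentation while keeping Polish characters intact. `format_manifest` and the example mapping are hypothetical.

```python
import json


def format_manifest(manifest):
    """Render a manifest mapping as indented JSON, keeping non-ASCII characters."""
    return json.dumps(manifest, indent=2, ensure_ascii=False, sort_keys=True)


# Hypothetical manifest used only to show the output shape; with the library
# you would pass sl.get("plwiki").manifest instead.
example = {"name": "przykład", "stats": {"documents": 2, "characters": 10}}
print(format_manifest(example))
```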
If you choose one of the datasets with .get(name of dataset), you get access to its text data:
from speakleash import Speakleash
import os
base_dir = os.path.join(os.path.dirname(__file__))
replicate_to = os.path.join(base_dir, "datasets")
sl = Speakleash(replicate_to)
wiki = sl.get("plwiki").data
for doc in wiki:
    print(doc[:40])
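Iterating over a whole dataset can take a while, so it is often enough to look at the first few documents. The sketch below assumes only that the data is an iterable of strings, as in the loop above; `preview` is a hypothetical helper and the generator is a stand-in for real data.

```python
from itertools import islice


def preview(docs, limit=3, width=40):
    """Return the first `width` characters of the first `limit` documents."""
    # islice stops after `limit` items, so the rest of the stream is never read.
    return [doc[:width] for doc in islice(docs, limit)]


# A generator standing in for sl.get("plwiki").data.
fake_docs = (f"Dokument numer {i} " * 5 for i in range(1000))
for line in preview(fake_docs, limit=2, width=20):
    print(line)
```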
If you also need metadata, use the ext_data property:
ds = sl.get("plwiki").ext_data
for doc in ds:
    print(doc)
    txt, meta = doc
    print(meta.get("title"))
    print(txt)
Popular metadata fields:
- title
- length
- sentences
- words
- verbs
- nouns
- symbols
- punctuations
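The fields listed above can be used to select documents before processing them. The sketch below assumes that each item is a `(txt, meta)` pair, as in the ext_data loop, and that count-like fields such as `words` hold numbers; `filter_by_meta` is a hypothetical helper and the records are made up for illustration.

```python
def filter_by_meta(pairs, key, minimum):
    """Yield (text, meta) pairs whose meta[key] is at least `minimum`."""
    for txt, meta in pairs:
        value = meta.get(key)
        # Skip documents that do not carry the requested field at all.
        if value is not None and value >= minimum:
            yield txt, meta


# Made-up records; with the library you would pass sl.get("plwiki").ext_data.
records = [
    ("Krótki tekst.", {"title": "A", "words": 2}),
    ("Nieco dłuższy tekst o czymś.", {"title": "B", "words": 5}),
    ("Brak statystyk.", {"title": "C"}),
]

for txt, meta in filter_by_meta(records, key="words", minimum=3):
    print(meta.get("title"), "-", txt)
```

Because the helper is a generator, it can be chained onto a large dataset without loading every document into memory first.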
Hashes for test_1_varify_ic-0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 931869f701a5fb3934dc3344b1fae50939f511dee187d89e28e89c8123643f97
MD5 | ac2c57c1a2498263e8c5aba0e596e137
BLAKE2b-256 | b656e8a0c32baa475aab9d54ab0a12541bf0938ed4a33e33a91dafb8885e430d