SpeakLeash agnostic dataset for Polish
Project description
UPDATE 05.05.2024:
Due to hosting changes, we recommend updating the package to the newest version with the command:
pip install --upgrade speakleash
SpeakLeash is a lightweight library providing datasets for the Polish language and tools to make them useful:
- Website: https://speakleash.org/
- Datasets: https://speakleash.org/dashboard/
- Source code: https://github.com/speakleash/speakleash
- Data in action: https://github.com/speakleash/speakleash-examples
- Bug reports: https://github.com/speakleash/speakleash/issues
Installation
The SpeakLeash package is available on PyPI and should be installed in a virtual environment:
pip install speakleash
Basic Usage
If you just want to see the details of the datasets:
from speakleash import Speakleash
import os
base_dir = os.path.join(os.path.dirname(__file__))
replicate_to = os.path.join(base_dir, "datasets")
sl = Speakleash(replicate_to)
for d in sl.datasets:
    size_mb = round(d.characters/1024/1024)
    print("Dataset: {0}, size: {1} MB, characters: {2}, documents: {3}".format(d.name, size_mb, d.characters, d.documents))
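The same properties can be used to rank datasets by size before downloading any text. The sketch below is only an illustration: `summarize` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins with made-up figures that carry only the `name`, `characters`, and `documents` attributes used above. With the real library you would pass `sl.datasets` instead.

```python
from types import SimpleNamespace


def summarize(datasets, top=3):
    """Return (name, size_mb, documents) for the `top` largest datasets."""
    ranked = sorted(datasets, key=lambda d: d.characters, reverse=True)
    return [
        (d.name, round(d.characters / 1024 / 1024), d.documents)
        for d in ranked[:top]
    ]


# Stand-in objects with made-up figures (not real SpeakLeash datasets).
fake_datasets = [
    SimpleNamespace(name="small", characters=2 * 1024 * 1024, documents=10),
    SimpleNamespace(name="large", characters=50 * 1024 * 1024, documents=500),
    SimpleNamespace(name="medium", characters=10 * 1024 * 1024, documents=100),
]

for name, size_mb, documents in summarize(fake_datasets, top=2):
    print("Dataset: {0}, size: {1} MB, documents: {2}".format(name, size_mb, documents))
```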
You can read individual properties (e.g. characters, documents) or display the entire manifest:
sl = Speakleash(replicate_to)
print(sl.get("plwiki").manifest)
If you choose one of them with .get(name of dataset), you can iterate over its text data, which can be large:
from speakleash import Speakleash
import os
base_dir = os.path.join(os.path.dirname(__file__))
replicate_to = os.path.join(base_dir, "datasets")
sl = Speakleash(replicate_to)
wiki = sl.get("plwiki").data
for doc in wiki:
    print(doc[:40])
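Because a dataset can hold a very large number of documents, it can help to look at only the first few. This is a minimal sketch, assuming `.data` behaves like an ordinary Python iterable of strings, as the loop above suggests. `preview` is a hypothetical helper and `fake_docs` is stand-in data.

```python
from itertools import islice


def preview(docs, n=3, width=40):
    """Return the first `width` characters of the first `n` documents."""
    return [doc[:width] for doc in islice(docs, n)]


# Stand-in generator; with the real library this would be sl.get("plwiki").data
def fake_docs():
    for i in range(1000):
        yield "Document number {0} with some longer text body".format(i)


for snippet in preview(fake_docs(), n=2, width=20):
    print(snippet)
```

`islice` stops after `n` items, so the rest of the iterable is never consumed.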
If you also need metadata, use the ext_data property:
ds = sl.get("plwiki").ext_data
for doc in ds:
    print(doc)
    # each item is a (text, metadata) pair
    txt, meta = doc
    print(meta.get("title"))
    print(txt)
Popular metadata fields:
- title
- length
- sentences
- words
- verbs
- nouns
- symbols
- punctuations
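The fields listed above can be used to filter documents while iterating. This is a hedged sketch, assuming `ext_data` yields `(text, meta)` pairs where `meta` supports `.get`, as in the example above. `filter_by_words` is a hypothetical helper, the threshold is arbitrary, and the sample pairs are made up. Since the fields are described as popular rather than guaranteed, a missing `words` value is treated as 0.

```python
def filter_by_words(pairs, min_words=100):
    """Yield (text, meta) pairs whose 'words' metadata is at least min_words."""
    for text, meta in pairs:
        if meta.get("words", 0) >= min_words:
            yield text, meta


# Stand-in data; with the real library this would be sl.get("plwiki").ext_data
fake_ext_data = [
    ("Short note.", {"title": "Note", "words": 2}),
    ("A much longer article ...", {"title": "Article", "words": 250}),
    ("No word count here.", {"title": "Unknown"}),
]

for text, meta in filter_by_words(fake_ext_data, min_words=100):
    print(meta.get("title"), meta.get("words"))
```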
File details
Details for the file speakleash-0.3.51.tar.gz.
File metadata
- Download URL: speakleash-0.3.51.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | b24bbcb20ade82ae4bc81c42ec6214f40fac6cded6dbd3d9491fecfaa01b1335
MD5 | 36e345aff5dd5fe39b94ad867e62a6ca
BLAKE2b-256 | bbaf0bf80fcd047e0d0f62843d83b858b74b43defb25bbacb266671a241bd0b6
File details
Details for the file speakleash-0.3.51-py3-none-any.whl.
File metadata
- Download URL: speakleash-0.3.51-py3-none-any.whl
- Upload date:
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c469f7f2a7466f4807b05b13beb7c34453b82f0556dd1cf7eee23061770beb8
MD5 | 919cbb3404480056c481ecd484ceba60
BLAKE2b-256 | 62d7b5891b38e57b89d6abe3d0938599b01adb0a357b6a5472280cef4b503e8d