SpeakLeash agnostic dataset for Polish

Project description


UPDATE 05.05.2024:

Because of changes to the dataset hosting, upgrading the package to the latest version is recommended:

pip install --upgrade speakleash
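To confirm that the upgrade took effect, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library; the helper name installed_version is illustrative and is not part of speakleash.

```python
from importlib import metadata


def installed_version(package: str = "speakleash"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version of speakleash found in the current environment, if any.
print(installed_version() or "speakleash is not installed")
```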

SpeakLeash is a lightweight library that provides datasets for the Polish language, along with tools for working with them.

Installation

The speakleash package is available on PyPI and has to be installed inside a virtual environment:

pip install speakleash
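Before installing, it may help to check that a virtual environment is actually active. The sketch below relies on the standard sys.prefix / sys.base_prefix comparison, which covers environments created with venv; the helper name in_virtualenv is illustrative, and other environment managers may need a different check.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```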

Basic Usage

If you just want to see the details of the datasets:

from speakleash import Speakleash
import os

# Keep the "datasets" directory next to this script.
base_dir = os.path.dirname(os.path.abspath(__file__))
replicate_to = os.path.join(base_dir, "datasets")

sl = Speakleash(replicate_to)

for d in sl.datasets:
    # Rough size estimate that treats one character as one byte.
    size_mb = round(d.characters / 1024 / 1024)
    print(f"Dataset: {d.name}, size: {size_mb} MB, characters: {d.characters}, documents: {d.documents}")
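The same properties can be used to rank datasets by size and compute totals. The sketch below is an illustration only: summarize is a hypothetical helper, and the demo objects with made-up names and numbers stand in for the entries of sl.datasets so that the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def summarize(datasets):
    """Return rows sorted by character count (largest first) plus totals."""
    rows = sorted(
        ((d.name, d.characters, d.documents) for d in datasets),
        key=lambda row: row[1],
        reverse=True,
    )
    total_chars = sum(row[1] for row in rows)
    total_docs = sum(row[2] for row in rows)
    return rows, total_chars, total_docs


# Stand-in objects with made-up values; with the library installed,
# pass sl.datasets instead.
demo = [
    SimpleNamespace(name="small", characters=1_000, documents=10),
    SimpleNamespace(name="large", characters=5_000, documents=3),
]

rows, total_chars, total_docs = summarize(demo)
for name, chars, docs in rows:
    print(f"{name}: {chars} characters, {docs} documents")
print(f"total: {total_chars} characters, {total_docs} documents")
```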

Each dataset exposes individual properties (e.g. characters, documents), and you can also display its entire manifest:

sl = Speakleash(replicate_to)
print(sl.get("plwiki").manifest)

Once you select a dataset with .get(<dataset name>), its data property lets you iterate over the documents, which can amount to a lot of text:

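from speakleash import Speakleash
import os

# Keep the "datasets" directory next to this script.
base_dir = os.path.dirname(os.path.abspath(__file__))
replicate_to = os.path.join(base_dir, "datasets")

sl = Speakleash(replicate_to)

# Iterate over the documents and print the first 40 characters of each.
wiki = sl.get("plwiki").data
for doc in wiki:
    print(doc[:40])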
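Because a dataset can hold a very large number of documents, it is often enough to look at the first few. A minimal sketch, assuming only that the data property is iterable as in the loop above; preview is a hypothetical helper and fake_docs is a stand-in generator used so the snippet runs on its own.

```python
from itertools import islice


def preview(docs, limit=3, width=40):
    """Return the first `width` characters of the first `limit` documents."""
    # islice stops after `limit` items, so the rest is never read.
    return [doc[:width] for doc in islice(docs, limit)]


def fake_docs():
    """Stand-in for sl.get("plwiki").data: yields many long strings."""
    for i in range(1000):
        yield f"Document number {i} " * 10


for snippet in preview(fake_docs(), limit=2, width=10):
    print(snippet)
```

With the library installed, preview(sl.get("plwiki").data) would be the equivalent call.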

If you also need metadata, use the ext_data property, which yields (text, metadata) pairs:

ds = sl.get("plwiki").ext_data
for doc in ds:
    print(doc)
    txt, meta = doc
    print(meta.get("title"))
    print(txt)

Commonly available metadata fields:

  • title
  • length
  • sentences
  • words
  • verbs
  • nouns
  • symbols
  • punctuations
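These fields make it possible to filter documents while iterating. The sketch below keeps only documents with at least a given number of words. It assumes that the words field holds a number, which is not confirmed here; filter_by_words is a hypothetical helper, and demo_pairs is made-up data shaped like the (text, metadata) pairs from ext_data.

```python
def filter_by_words(pairs, min_words=100):
    """Yield (text, meta) pairs whose "words" value is at least min_words."""
    for txt, meta in pairs:
        # A missing or empty "words" value is treated as zero.
        if (meta.get("words") or 0) >= min_words:
            yield txt, meta


# Made-up stand-in for sl.get("plwiki").ext_data.
demo_pairs = [
    ("short text", {"title": "Short note", "words": 5}),
    ("long text", {"title": "Long article", "words": 250}),
    ("no count", {"title": "Untitled"}),
]

titles = [meta.get("title") for _, meta in filter_by_words(demo_pairs, min_words=100)]
print(titles)
```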

Download files

Download the file for your platform.

Source Distribution

speakleash-0.3.51.tar.gz (12.5 kB)

Uploaded Source

Built Distribution

speakleash-0.3.51-py3-none-any.whl (14.6 kB)

Uploaded Python 3

File details

Details for the file speakleash-0.3.51.tar.gz.

File metadata

  • Download URL: speakleash-0.3.51.tar.gz
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.10

File hashes

Hashes for speakleash-0.3.51.tar.gz
Algorithm    Hash digest
SHA256       b24bbcb20ade82ae4bc81c42ec6214f40fac6cded6dbd3d9491fecfaa01b1335
MD5          36e345aff5dd5fe39b94ad867e62a6ca
BLAKE2b-256  bbaf0bf80fcd047e0d0f62843d83b858b74b43defb25bbacb266671a241bd0b6

File details

Details for the file speakleash-0.3.51-py3-none-any.whl.

File metadata

  • Download URL: speakleash-0.3.51-py3-none-any.whl
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.10

File hashes

Hashes for speakleash-0.3.51-py3-none-any.whl
Algorithm    Hash digest
SHA256       2c469f7f2a7466f4807b05b13beb7c34453b82f0556dd1cf7eee23061770beb8
MD5          919cbb3404480056c481ecd484ceba60
BLAKE2b-256  62d7b5891b38e57b89d6abe3d0938599b01adb0a357b6a5472280cef4b503e8d
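
A downloaded file can be checked against the digests listed here. The following is a minimal sketch using the standard hashlib module; sha256_of and matches are illustrative helper names, and the expected value is the SHA256 digest given for speakleash-0.3.51.tar.gz.

```python
import hashlib

# SHA256 digest listed for speakleash-0.3.51.tar.gz.
EXPECTED_SHA256 = "b24bbcb20ade82ae4bc81c42ec6214f40fac6cded6dbd3d9491fecfaa01b1335"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive was downloaded to the current directory:
# print(matches("speakleash-0.3.51.tar.gz"))
```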
