testing and debugging project

Project description

UPDATE 05.05.2024:

Due to hosting-related changes, it is recommended to upgrade the package to the newest version with the command:

pip install --upgrade speakleash
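
To confirm which version is installed before or after upgrading, the standard library's importlib.metadata can be queried. This is a minimal sketch; the helper name installed_version is illustrative and not part of SpeakLeash.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "speakleash") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "speakleash is not installed")
```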

SpeakLeash is a lightweight library providing datasets for the Polish language and tools to make them useful.

Installation

The speakleash package can be installed from PyPI and has to be installed in a virtual environment:

pip install speakleash

Basic Usage

If you just want to see the details of the datasets:

from speakleash import Speakleash
import os

# Store downloaded datasets in a "datasets" folder next to this script
base_dir = os.path.dirname(os.path.abspath(__file__))
replicate_to = os.path.join(base_dir, "datasets")

sl = Speakleash(replicate_to)

for d in sl.datasets:
    size_mb = round(d.characters/1024/1024)
    print("Dataset: {0}, size: {1} MB, characters: {2}, documents: {3}".format(d.name, size_mb, d.characters, d.documents))
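
Building on the loop above, the datasets can be ranked by size before printing. The sketch below relies only on the name, characters and documents attributes used above. The helper largest_datasets and the sample objects with made-up numbers are hypothetical stand-ins so the snippet runs without downloading anything; with the library installed, sl.datasets would be passed instead.

```python
from types import SimpleNamespace


def largest_datasets(datasets, top_n=3):
    """Return the top_n datasets ordered by character count, largest first."""
    return sorted(datasets, key=lambda d: d.characters, reverse=True)[:top_n]


# Stand-in objects with made-up numbers (not real SpeakLeash datasets).
sample = [
    SimpleNamespace(name="dataset_a", characters=5_000_000, documents=1_200),
    SimpleNamespace(name="dataset_b", characters=80_000_000, documents=40_000),
    SimpleNamespace(name="dataset_c", characters=12_000_000, documents=3_500),
]

for d in largest_datasets(sample, top_n=2):
    millions = d.characters / 1024 / 1024
    print(f"{d.name}: {millions:.1f} M characters, {d.documents} documents")
```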

You can use individual properties (e.g. characters, documents), or display the entire manifest:

sl = Speakleash(replicate_to)
print(sl.get("plwiki").manifest)

Once you select a dataset with .get(name of dataset), its data property gives you the text of every document, which can be a lot of data:

from speakleash import Speakleash
import os

# Store downloaded datasets in a "datasets" folder next to this script
base_dir = os.path.dirname(os.path.abspath(__file__))
replicate_to = os.path.join(base_dir, "datasets")

sl = Speakleash(replicate_to)

wiki = sl.get("plwiki").data
for doc in wiki:
    print(doc[:40])
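
Because a dataset can hold a very large number of documents, it is often enough to look at the first few. The sketch below uses itertools.islice, which works with any iterable. The preview helper and the fake_documents generator are hypothetical stand-ins, and whether the data property is evaluated lazily is an assumption here.

```python
from itertools import islice


def preview(docs, n=3, width=40):
    """Return the first `width` characters of the first `n` documents."""
    return [doc[:width] for doc in islice(docs, n)]


def fake_documents():
    """Stand-in generator; with the library installed, use sl.get("plwiki").data."""
    for i in range(1_000_000):
        yield f"Document number {i} with some placeholder text."


# Only the first two documents are ever generated.
for snippet in preview(fake_documents(), n=2, width=17):
    print(snippet)
```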

If you also need metadata, use the ext_data property:

ds = sl.get("plwiki").ext_data
for doc in ds:
    print(doc)
    txt, meta = doc
    print(meta.get("title"))
    print(txt)
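
Since each item is a (text, metadata) pair, documents can be filtered on a metadata field while iterating. This is a minimal sketch assuming the metadata behaves like a dict (as the meta.get call above suggests) and that "words" holds a numeric count. The helper long_documents and the sample pairs are hypothetical.

```python
def long_documents(pairs, min_words=100):
    """Yield (text, meta) pairs whose "words" metadata is at least min_words."""
    for txt, meta in pairs:
        # Documents without a word count are treated as having zero words.
        if meta.get("words", 0) >= min_words:
            yield txt, meta


# Made-up pairs; with the library installed, pass sl.get("plwiki").ext_data.
sample_pairs = [
    ("Short text.", {"title": "A", "words": 2}),
    ("A much longer text ...", {"title": "B", "words": 250}),
    ("No word count here.", {"title": "C"}),
]

for txt, meta in long_documents(sample_pairs):
    print(meta.get("title"))
```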

Commonly available metadata fields:

  • title
  • length
  • sentences
  • words
  • verbs
  • nouns
  • symbols
  • punctuations
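
Fields such as these can be aggregated across documents. The sketch below computes an average number of words per sentence, assuming that "words" and "sentences" are numeric counts (the source does not state their types). The function name and the sample metadata are hypothetical.

```python
def words_per_sentence(metas):
    """Return total words divided by total sentences, or None if no usable counts exist."""
    total_words = 0
    total_sentences = 0
    for meta in metas:
        words = meta.get("words")
        sentences = meta.get("sentences")
        # Skip records where either count is missing or not numeric.
        if isinstance(words, (int, float)) and isinstance(sentences, (int, float)):
            total_words += words
            total_sentences += sentences
    return total_words / total_sentences if total_sentences else None


# Made-up metadata records for illustration.
sample_metas = [
    {"title": "A", "words": 120, "sentences": 10},
    {"title": "B", "words": 80, "sentences": 10},
    {"title": "C"},
]

print(words_per_sentence(sample_metas))
```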

Download files

Source Distribution

test_1_varify_ic-0.2.tar.gz (2.0 kB view details)

Uploaded Source

Built Distribution

test_1_varify_ic-0.2-py3-none-any.whl (2.0 kB view details)

Uploaded Python 3

File details

Details for the file test_1_varify_ic-0.2.tar.gz.

File metadata

  • Download URL: test_1_varify_ic-0.2.tar.gz
  • Upload date:
  • Size: 2.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.10

File hashes

Hashes for test_1_varify_ic-0.2.tar.gz
Algorithm Hash digest
SHA256 1fdbe46077824ec1ca6f68c23ad8e4ac6bb3d0c61aa68a9f289f1a70c9ae1f63
MD5 a4af70aeea394915b36d4161fa133001
BLAKE2b-256 786ea02e94e91505a2436e6cd9a3941b5fd5a4eed962cf04057360799abf10f5

File details

Details for the file test_1_varify_ic-0.2-py3-none-any.whl.

File metadata

  • Download URL: test_1_varify_ic-0.2-py3-none-any.whl
  • Upload date:
  • Size: 2.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.10

File hashes

Hashes for test_1_varify_ic-0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 931869f701a5fb3934dc3344b1fae50939f511dee187d89e28e89c8123643f97
MD5 ac2c57c1a2498263e8c5aba0e596e137
BLAKE2b-256 b656e8a0c32baa475aab9d54ab0a12541bf0938ed4a33e33a91dafb8885e430d
