Text-processing utilities for the Karakalpak language

Project description

Tokens Example

from jaziw import tokenize

text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."
tokens = tokenize(text)
print(tokens)

# Result
# [Substring(start=0, stop=8, text='Asslawma'),
#  Substring(start=9, stop=16, text='áleykum'),
#  Substring(start=16, stop=17, text='!'),
#  Substring(start=18, stop=23, text='Aqılı'),
#  Substring(start=24, stop=29, text='pútin'),
#  Substring(start=31, stop=37, text='ziyalı'),
#  Substring(start=38, stop=43, text='qáwim'),
#  Substring(start=43, stop=44, text='.')]
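
The start and stop values in the result look like character offsets into the original string. The sketch below checks that reading without importing jaziw: the Substring class is a hypothetical stand-in that only mirrors the printed repr, and the offsets are copied from the result above. The helper uncovered() is also an illustration, not part of the package; it lists any non-whitespace text that no token covers.

```python
import unicodedata
from typing import NamedTuple


class Substring(NamedTuple):
    """Stand-in mirroring the repr printed above; not imported from jaziw."""
    start: int
    stop: int
    text: str


# NFC keeps each accented letter as one code point, so the offsets line up.
text = unicodedata.normalize("NFC", "Asslawma áleykum! Aqılı pútin, ziyalı qáwim.")

# Offsets copied from the tokenize() result shown above.
spans = [(0, 8), (9, 16), (16, 17), (18, 23), (24, 29), (31, 37), (38, 43), (43, 44)]
tokens = [Substring(start, stop, text[start:stop]) for start, stop in spans]


def uncovered(text, tokens):
    """Return non-whitespace stretches of text that no token covers."""
    gaps = []
    cursor = 0
    for tok in tokens:
        gap = text[cursor:tok.start].strip()
        if gap:
            gaps.append(gap)
        cursor = tok.stop
    tail = text[cursor:].strip()
    if tail:
        gaps.append(tail)
    return gaps


words = [tok.text for tok in tokens]
skipped = uncovered(text, tokens)
print(words)
print(skipped)
```

Slicing with the listed offsets reproduces each token's text, and the only uncovered character in the listed result is the comma at offsets 29 to 30.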

Sentences Example

from jaziw import sentenize

text = "Asslawma áleykum! Aqılı pútin, ziyalı qáwim."
sentences = sentenize(text)
print(sentences)

# Result
# [Substring(start=0, stop=17, text='Asslawma áleykum!'),
#  Substring(start=18, stop=44, text='Aqılı pútin, ziyalı qáwim.')]
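
Because tokens and sentences both carry offsets into the same string, tokens can be grouped by the sentence that contains them. This is a minimal sketch under that assumption. It does not import jaziw: the offsets are copied from the two results above, and group_by_sentence() is a hypothetical helper.

```python
import unicodedata

# NFC keeps each accented letter as one code point, so the offsets line up.
text = unicodedata.normalize("NFC", "Asslawma áleykum! Aqılı pútin, ziyalı qáwim.")

# (start, stop) pairs copied from the sentenize() and tokenize() results above.
sentence_spans = [(0, 17), (18, 44)]
token_spans = [(0, 8), (9, 16), (16, 17), (18, 23), (24, 29), (31, 37), (38, 43), (43, 44)]


def group_by_sentence(sentence_spans, token_spans):
    """For each sentence span, collect the token spans lying fully inside it."""
    groups = []
    for s_start, s_stop in sentence_spans:
        inside = [(a, b) for a, b in token_spans if s_start <= a and b <= s_stop]
        groups.append(inside)
    return groups


grouped = [
    [text[a:b] for a, b in group]
    for group in group_by_sentence(sentence_spans, token_spans)
]
for sentence_tokens in grouped:
    print(sentence_tokens)
```

With the real package, the same grouping should work on the start and stop attributes of the returned objects, assuming they are exposed as the repr suggests.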

Normalize Example

from jaziw import normalize

bad_text = """– Há, júwermek qatqır-aw, tursań bolmay ma, 
mollańa keshiktiń ǵoy! – degen anamnıń sesi tatlı uyqımdı bóldi. 

Ol bunday sózdi tek ashıwı kelgende ǵana aytatuǵın edi. 
                Anamnıń ǵarǵısınan beter «molla» degen sózin esitkennen-aq, quyqam juwlap, uyqım shayday ashıldı. Qorqınısh denemdi qaplap turdı. Bunnan 7-8 kún burın pallaqqa asılǵandaǵı tut shıbıqtan tilingen eki ayaǵım sol qálpinde matalıp jatqanday bolıp sezildi. Haqıyqatında da, ayaǵımnıń tilikleri pite qoyǵan joq edi. Aqsap zorǵa júretuǵın edim.
"""

print(normalize(bad_text))

# Result
# – Há, júwermek qatqır-aw, tursań bolmay ma, mollańa keshiktiń ǵoy! – degen anamnıń sesi tatlı uyqımdı bóldi. Ol bunday sózdi tek ashıwı kelgende ǵana aytatuǵın edi. Anamnıń ǵarǵısınan beter «molla» degen sózin esitkennen-aq, quyqam juwlap, uyqım shayday ashıldı. Qorqınısh denemdi qaplap turdı. Bunnan 7-8 kún burın pallaqqa asılǵandaǵı tut shıbıqtan tilingen eki ayaǵım sol qálpinde matalıp jatqanday bolıp sezildi. Haqıyqatında da, ayaǵımnıń tilikleri pite qoyǵan joq edi. Aqsap zorǵa júretuǵın edim.
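
Judging only from the input and output above, normalize joins broken lines and collapses runs of spaces into single spaces. The sketch below approximates that visible effect with a regular expression. It is not jaziw's implementation, which may do more, and collapse_whitespace() is a hypothetical name.

```python
import re


def collapse_whitespace(text):
    """Replace each run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", text).strip()


# A fragment of the example above, with a line break, a blank line and indentation.
sample = "tursań bolmay ma, \nmollańa keshiktiń ǵoy! \n\n                Anamnıń"
cleaned = collapse_whitespace(sample)
print(cleaned)
```

Applying the function twice gives the same result as applying it once, which is a useful property for a cleanup step that may run repeatedly over the same corpus.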

Recommended

from jaziw import normalize, tokenize, sentenize

filename = "a-shamuratov-lat.txt"

# Read the raw text.
with open(filename, "r", encoding="utf-8") as file:
    text = file.read()

# Normalize first, then tokenize and split sentences on the cleaned text.
normalized_text = normalize(text)
tokenized_text = tokenize(normalized_text)
sentenized_text = sentenize(normalized_text)

# Save the normalized text, not the raw input.
with open("normalized-" + filename, "w", encoding="utf-8") as new_file:
    new_file.write(normalized_text)
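
A common follow-up to this workflow is writing one sentence per line. The sketch below shows one way to do that. The function split_into_lines() is hypothetical and takes the normalize and sentenize callables as arguments. The two fake_ functions are crude stand-ins so the sketch runs without jaziw installed, and the code assumes each sentence object exposes a text attribute, as the repr above suggests.

```python
import re
import tempfile
from pathlib import Path
from types import SimpleNamespace


def split_into_lines(src, normalize, sentenize):
    """Normalize the file at src and write one sentence per line beside it."""
    src = Path(src)
    text = normalize(src.read_text(encoding="utf-8"))
    dst = src.with_name("sentences-" + src.name)
    dst.write_text("\n".join(s.text for s in sentenize(text)) + "\n", encoding="utf-8")
    return dst


# Stand-ins only; with the package installed, pass the functions imported
# from jaziw instead of these.
def fake_normalize(text):
    return re.sub(r"\s+", " ", text).strip()


def fake_sentenize(text):
    parts = re.findall(r"[^.!?]+[.!?]", text)
    return [SimpleNamespace(text=part.strip()) for part in parts]


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Asslawma  áleykum!\n\nAqılı pútin,   ziyalı qáwim.\n", encoding="utf-8")
    out = split_into_lines(sample, fake_normalize, fake_sentenize)
    out_name = out.name
    result = out.read_text(encoding="utf-8").splitlines()

print(result)
```

Passing the callables in keeps the file handling separate from the language-specific steps, so the real jaziw functions can replace the stand-ins without changing the helper.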

Download files

Source Distribution

jaziw-0.0.2.tar.gz (3.7 kB)

Uploaded Source

Built Distribution

jaziw-0.0.2-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file jaziw-0.0.2.tar.gz.

File metadata

  • Download URL: jaziw-0.0.2.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for jaziw-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       7228f877622a5f215f0a9da3de076a65f1850ad4431d14983bace06bef487e33
MD5          8555dcf612af25642a6bfe3c71eea6ca
BLAKE2b-256  285cc32f48f916872d5db0a0dec0ea821b4490f9e9b9f67f2eeaab9e78a43eb0

File details

Details for the file jaziw-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: jaziw-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for jaziw-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       dc5ab092a6e49df4e1e8f60a9c4a147cc9dc9d7dc24a08d8ef371842f0770e2e
MD5          edfea055c69d2e9f9de9b87e9525e045
BLAKE2b-256  58b47133290cb1863c5805d7fc27ccc6ae3a0372e7bf9c642c690f58fecede58
