
Helps you convert Polish text of unknown encoding into UTF-8

Project description

polishify

Setup

Simply run

pip install polishify

Usage

If you have some Polish text whose characters look garbled, it may be encoded as windows-1250 or iso-8859-2 rather than UTF-8. If your file is sometext.txt, run

polishify sometext.txt

and it will show you something like

detected encoding is:  windows-1250
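How the detection works internally is not documented here. As a rough illustration of one way such a guess could be made, the sketch below decodes the raw bytes with each candidate encoding and keeps the one that yields the most known Polish words. The names `guess_encoding`, `score`, `SAMPLE_WORDS` and `CANDIDATE_ENCODINGS` are hypothetical and are not part of polishify.

```python
# Hypothetical sketch only: this is NOT polishify's actual implementation.
CANDIDATE_ENCODINGS = ("utf-8", "windows-1250", "iso-8859-2")

# Tiny stand-in word list (an assumption); the real package ships its own dataset.
SAMPLE_WORDS = {"żółć", "gęś", "jaźń", "się", "że", "już", "są", "będzie"}


def score(text, words):
    """Count whitespace-separated tokens that appear in the word list."""
    tokens = (token.strip(".,;:!?\"'()").lower() for token in text.split())
    return sum(1 for token in tokens if token in words)


def guess_encoding(raw, words=SAMPLE_WORDS, candidates=CANDIDATE_ENCODINGS):
    """Return the candidate encoding whose decoding matches the most known words."""
    best_encoding, best_score = None, -1
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
        current = score(text, words)
        if current > best_score:
            best_encoding, best_score = encoding, current
    return best_encoding
```

The idea relies on windows-1250 and iso-8859-2 placing letters such as ą, ś and ź at different byte values, so a wrong guess turns real words into tokens that no longer match the list.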

To convert the file to UTF-8, pass an output path as the second argument:

polishify sometext.txt properly-encoded.txt
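Once the source encoding is known, the conversion itself is a decode followed by a UTF-8 encode. A minimal sketch in plain Python, assuming the encoding has already been detected (the helper name `convert_to_utf8` is hypothetical and not part of polishify):

```python
from pathlib import Path


def convert_to_utf8(source, target, encoding):
    """Re-encode the file at `source` from `encoding` into UTF-8 at `target`."""
    # Work on bytes so that line endings are preserved exactly as they were.
    text = Path(source).read_bytes().decode(encoding)
    Path(target).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("sometext.txt", "properly-encoded.txt", "windows-1250")` would write a UTF-8 copy of a windows-1250 file.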

When calling it from a bash script you may not want any output. The tool supports a silent mode:

polishify sometext.txt properly-encoded.txt --silent
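Silent mode is convenient for bulk conversion. The sketch below drives the command shown above from Python for every file in a directory. The helper names, the `*.srt` pattern and the directory layout are assumptions made for illustration, and `check=True` assumes the command exits with a non-zero status on failure, which is not confirmed here.

```python
import subprocess
from pathlib import Path


def build_command(source, target):
    """Build the argument list for one silent polishify conversion."""
    return ["polishify", str(source), str(target), "--silent"]


def convert_directory(input_dir, output_dir, pattern="*.srt"):
    """Convert every file matching `pattern` in input_dir into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for source in sorted(Path(input_dir).glob(pattern)):
        # Requires the polishify command to be installed and on PATH.
        subprocess.run(build_command(source, out / source.name), check=True)
```

Calling `convert_directory("subtitles", "subtitles-utf8")` would then write one converted file per input file, keeping the original file names.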

The package ships with a dataset of words containing Polish letters. If you prefer to use your own, pass a dataset.json file with the --dataset option:

polishify sometext.txt properly-encoded.txt --silent --dataset dataset.json

A companion tool generates such a dataset from a text file of known encoding:

polishify-extract sometext.txt dataset.json --encoding windows-1250
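The schema of dataset.json is not described here. Purely as an illustration of the idea of collecting words with Polish letters from a text, the sketch below gathers such words and serializes them as a JSON list. Both the list layout and the function names are hypothetical and may differ from what polishify-extract actually produces.

```python
import json
import re

POLISH_LETTERS = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"
WORD_PATTERN = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware


def extract_polish_words(text):
    """Return sorted unique lowercase words containing at least one Polish letter."""
    words = {word.lower() for word in WORD_PATTERN.findall(text)}
    return sorted(w for w in words if any(ch in POLISH_LETTERS for ch in w))


def dump_words(words):
    """Serialize the words as a JSON list (hypothetical layout, not the real schema)."""
    return json.dumps(words, ensure_ascii=False, indent=2)
```

The text passed to `extract_polish_words` must already be decoded correctly, which is presumably why the command above takes an explicit --encoding argument.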

Author

Made by Marek Narożniak, for the world and especially for people with family members who need subtitles in Polish and want to bulk convert their encodings. No warranty provided. Licensed under GPL-3.

Download files


Source Distribution

polishify-0.1.0.tar.gz (20.6 kB)

Uploaded Source

Built Distribution

polishify-0.1.0-py3-none-any.whl (21.0 kB)

Uploaded Python 3
