
Natural Language Processing (NLP) library for the Urdu language.


Urduhack: NLP library for the Urdu language 🇵🇰


Urduhack

As the name suggests, Urduhack is an NLP library for the Urdu language. It provides methods that make processing Urdu text as simple as possible, including support for removing unnecessary data. It handles numbers, phone numbers, email addresses, URLs, and symbols such as currency symbols.

Feature Support

  • Normalization
    • Arabic and Urdu Unicode Redundancy Problem
    • Character Normalization
    • Combined Characters Normalization
    • Diacritics Removal
    • Spaces Before & After Digits
    • Spaces After Punctuation
    • Joined Words Fix
  • Tokenization
    • Sentence Tokenization
    • Word Tokenization
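To make some of these features concrete, here is a minimal standalone sketch of what character normalization, combined-character normalization, diacritics removal, digit and punctuation spacing, and naive tokenization typically involve. It does not call Urduhack and is not its implementation: the functions `normalize`, `sentence_tokenize` and `word_tokenize` are hypothetical, and the Arabic-to-Urdu mapping table is an illustrative subset chosen for this example.

```python
import re
import unicodedata

# Illustrative subset (an assumption, not Urduhack's table) of Arabic code
# points and the Urdu code points they are commonly normalized to.
ARABIC_TO_URDU = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> FARSI YEH
    "\u0647": "\u06C1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL
})

# Arabic harakat from FATHATAN (U+064B) to SUKUN (U+0652).
DIACRITICS = re.compile(r"[\u064B-\u0652]")

# Letter and digit ranges (ASCII, Arabic-Indic, Extended Arabic-Indic digits).
LETTERS = r"\u0621-\u064A\u0671-\u06D3"
DIGITS = r"0-9\u0660-\u0669\u06F0-\u06F9"
LETTER_DIGIT_BOUNDARY = re.compile(
    rf"(?<=[{LETTERS}])(?=[{DIGITS}])|(?<=[{DIGITS}])(?=[{LETTERS}])"
)

# Urdu full stop, Arabic comma or Arabic question mark directly followed by text.
PUNCT_NO_SPACE = re.compile(r"([\u06D4\u060C\u061F])(?=\S)")
SENTENCE_END = re.compile(r"(?<=[\u06D4\u061F!])\s+")
WORD_OR_PUNCT = re.compile(r"[^\s\u06D4\u060C\u061F!]+|[\u06D4\u060C\u061F!]")


def normalize(text: str) -> str:
    """Hypothetical normalizer chaining several of the steps listed above."""
    text = unicodedata.normalize("NFC", text)    # compose e.g. alef + maddah into U+0622
    text = text.translate(ARABIC_TO_URDU)        # unify Arabic/Urdu look-alike letters
    text = DIACRITICS.sub("", text)              # drop diacritics
    text = LETTER_DIGIT_BOUNDARY.sub(" ", text)  # space between letters and digits
    text = PUNCT_NO_SPACE.sub(r"\1 ", text)      # space after Urdu punctuation
    return text


def sentence_tokenize(text: str) -> list:
    """Split on whitespace that follows an Urdu full stop, '؟' or '!'."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def word_tokenize(text: str) -> list:
    """Whitespace split that also emits Urdu punctuation as separate tokens."""
    return WORD_OR_PUNCT.findall(text)
```

The order of the steps matters: composing combined characters before the letter mapping keeps Arabic yeh + hamza above (U+064A U+0654) as the single code point U+0626 rather than rewriting the yeh first. The whitespace-based word tokenizer is deliberately naive; Urdu word boundaries are not always marked by spaces (the problem behind the "Joined Words Fix" item), so a real tokenizer needs more than this.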

Preprocessing

  • Data Pre-processing
    • Handles numbers, email addresses, currency symbols, URLs, and similar entities.
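The text above does not say exactly how these entities are handled (removed, replaced, or normalized). As an illustration only, the sketch below shows one common technique, regex-based replacement with placeholder tokens. The `preprocess` function, the patterns, and the `<EMAIL>`-style tokens are hypothetical and deliberately simple; they are not Urduhack's API.

```python
import re

# Hypothetical placeholder tokens and simplified patterns (assumptions for
# this sketch, not Urduhack's documented behaviour). Order matters: e-mails,
# URLs and phone numbers are replaced before bare numbers.
PATTERNS = [
    ("<EMAIL>", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("<URL>", re.compile(r"\bhttps?://\S+|\bwww\.\S+")),
    # A digit, at least 7 digits/spaces/hyphens, then a digit (9+ characters).
    ("<PHONE>", re.compile(r"(?<!\w)\+?\d[\d -]{7,}\d(?!\w)")),
    ("<CURRENCY>", re.compile(r"[$€£¥₹]")),
    # Remaining numbers; \d is Unicode-aware for str patterns, so Urdu digits match too.
    ("<NUMBER>", re.compile(r"\d+(?:[.,]\d+)*")),
]


def preprocess(text: str) -> str:
    """Replace each entity type with its placeholder, in the order above."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Replacing entities with placeholders rather than deleting them keeps sentence structure intact for later steps such as tokenization; whether that suits a given task is a design choice.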

Roadmap

  • Classification
    • Sentiment Analysis
    • Sentence Classification
    • Document Classification
  • Named Entity Recognition
  • Image to Text
  • Speech to Text

Installation

Urduhack officially supports Python 3.6–3.7, and runs great on PyPy.

To install Urduhack, use pip:

$ pip install urduhack

Documentation

Documentation is available at https://urduhack.readthedocs.io/

How to Contribute

  1. Check for open issues, or open a new issue to start a discussion about a feature idea or a bug. Issues tagged Contributor Friendly are ideal for people who are not yet familiar with the codebase.
  2. Write a test which shows that the bug was fixed or that the feature works as expected.
  3. Send a pull request and bug the maintainer until it gets merged and published. :)

Contributors

Special thanks to everyone who has contributed to getting Urduhack to its current state.

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

Copyright and license

Code released under the MIT License.

Download files

Source Distribution

urduhack-0.2.1.tar.gz (61.8 kB)

Built Distribution

urduhack-0.2.1-py3-none-any.whl (66.0 kB)

File details

Details for the file urduhack-0.2.1.tar.gz.

File metadata

  • Download URL: urduhack-0.2.1.tar.gz
  • Size: 61.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.40.0 CPython/3.7.1

File hashes

Hashes for urduhack-0.2.1.tar.gz
Algorithm Hash digest
SHA256 e6e7c1682d3c96e2b4cef71d464b4b3604553a10024a66e47aeed2e033697815
MD5 9b8922edda1ec512cb0adbc483014d84
BLAKE2b-256 63e741c99987a6855926fe5a37c4fa6f1637ec272e9681f861e1d48fcbf7d191

File details

Details for the file urduhack-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: urduhack-0.2.1-py3-none-any.whl
  • Size: 66.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.40.0 CPython/3.7.1

File hashes

Hashes for urduhack-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7a56afc32f0d935bb16f78e5bf32ccaf29d956e53846f78afcda040115176f19
MD5 a057f64902c3d62475f553c71075c84c
BLAKE2b-256 f340e1183cf4bd8398349403448d9f61048d40547e47cf3f99757812bbd23a17