Natural Language Processing (NLP) library for Urdu language.

Urduhack: A Python NLP library for Urdu language

Urduhack is an NLP library for the Urdu language. It comes with many batteries-included features to help you process Urdu data as easily as possible.

Note: A stable version, v1.0.0, is coming soon with many new models and a new API.

Our Goal

  • Academic users: Experiment more easily and test hypotheses without coding from scratch.
  • NLP beginners: Learn how to build an NLP project with production-level code quality.
  • NLP developers: Build a production-level application within minutes.

🔥 Supported Features

  • Normalization
    • Arabic and Urdu Unicode Redundancy Problem
    • Character Normalization
    • Combined Characters Normalization
    • Diacritics Removal
    • Spaces Before & After Digits
    • Spaces After Punctuation
    • Joined Words Fix
  • Tokenization
    • Sentence Tokenization
    • Word Tokenization
  • Data Pre-processing
    • Handles all kinds of numbers, emails, currencies, URLs, etc.
  • Tasks
    • Sentiment analysis
    • Sentence classification
    • Document classification
    • Named entity recognition
    • Image to text
    • Speech to text
  • Datasets
    • IMDB Urdu movie reviews dataset
    • Handwritten digits datasets
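
Several of the normalization and tokenization steps listed above can be illustrated with the Python standard library alone. The sketch below is not Urduhack's implementation or API. The function names (`normalize_characters`, `remove_diacritics`, `fix_spacing`, `split_sentences`), the three-entry character map, and the regular expressions are all assumptions chosen for illustration.

```python
import re
import unicodedata
from typing import List

# Assumed, deliberately tiny mapping from Arabic code points to the
# code points normally used for Urdu; a real normalizer covers far more.
ARABIC_TO_URDU = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0647": "\u06C1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL
})


def normalize_characters(text: str) -> str:
    """Replace look-alike Arabic letters with their Urdu code points."""
    return text.translate(ARABIC_TO_URDU)


def remove_diacritics(text: str) -> str:
    """Drop nonspacing marks (Unicode category Mn), e.g. zabar, zer, pesh.

    This is a blunt filter: it removes every combining mark, including
    ones that may carry meaning in some words.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def fix_spacing(text: str) -> str:
    """Put a space between letters and digits, and after punctuation."""
    text = re.sub(r"([^\W\d_])(\d)", r"\1 \2", text)  # letter then digit
    text = re.sub(r"(\d)([^\W\d_])", r"\1 \2", text)  # digit then letter
    # Urdu full stop (U+06D4), Arabic comma (U+060C), question mark (U+061F)
    return re.sub(r"([\u06D4\u060C\u061F])(?=\S)", r"\1 ", text)


def split_sentences(text: str) -> List[str]:
    """Split on the Urdu full stop, the Arabic question mark, or '!'."""
    parts = re.findall(r"[^\u06D4\u061F!]+[\u06D4\u061F!]?", text)
    return [part.strip() for part in parts if part.strip()]
```

Applying the steps in the order `normalize_characters`, `remove_diacritics`, `fix_spacing`, then `split_sentences` keeps the sentence splitter from seeing mixed Arabic and Urdu code points, though the best order depends on the data.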

🛠 Installation

Urduhack officially supports Python 3.6–3.7, and runs great on PyPy.

To install with the TensorFlow CPU version (the quotes keep shells such as zsh from treating the brackets as a glob pattern):

$ pip install "urduhack[tf]"

To install with the TensorFlow GPU version:

$ pip install "urduhack[tf-gpu]"
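
Before installing, it can help to confirm that the interpreter falls in the supported range stated above. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `urduhack_is_installed` are hypothetical and are not part of Urduhack.

```python
import importlib.util
import sys

# Supported range as stated in this README: Python 3.6 to 3.7.
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX = (3, 7)


def python_is_supported(version=None) -> bool:
    """Return True if the (major, minor) version is within the range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def urduhack_is_installed() -> bool:
    """Return True if a top-level package named 'urduhack' can be found."""
    return importlib.util.find_spec("urduhack") is not None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("urduhack installed:", urduhack_is_installed())
```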

🔗 Documentation

Fantastic documentation is available at https://urduhack.readthedocs.io/

  • Installation: How to install Urduhack and download models.
  • Quickstart: New to Urduhack? Here's everything you need to know!
  • API Reference: The detailed reference for Urduhack's API.

How to Contribute

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug. There is a Contributor Friendly tag for issues that should be ideal for people who are not very familiar with the codebase yet.
  2. Write a test which shows that the bug was fixed or that the feature works as expected.
  3. Send a pull request and bug the maintainer until it gets merged and published. :)

👍 Contributors

Special thanks to everyone who helped bring Urduhack to its current state.

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

📝 Copyright and license

Code released under the MIT License.

