Transliterations to/from Indian languages

indicate: Transliterate Indic languages to English

Transliterations to/from Indian languages are still generally of low quality. One problem is limited access to data. Another is that there is no single standard for transliteration.

For Hindi–English, we build a novel dataset of names using ESPNcricinfo. For instance, see here for the Hindi version of the English scorecard.

We also create a dataset from election affidavits.

We also exploit the Google Dakshina dataset.

Because there is no single standard way to transliterate, we provide k-best transliterations.
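To illustrate what "k-best" means here, the following minimal sketch ranks candidate transliterations by score and keeps the top k. It is not the package's API: the function `k_best`, the candidate spellings, and the scores are all hypothetical and made up for illustration.

```python
import heapq


def k_best(candidates, k=3):
    """Return the k highest-scoring (text, score) pairs, best first."""
    return heapq.nlargest(k, candidates.items(), key=lambda item: item[1])


# Hypothetical romanizations of the Hindi name "शर्मा" with made-up scores
# (illustrative only, not output from any model).
candidates = {"sharma": 0.81, "sharmaa": 0.09, "sarma": 0.06, "sharama": 0.04}

for text, score in k_best(candidates, k=2):
    print(f"{text}\t{score:.2f}")
```

Returning several ranked candidates rather than one lets downstream code match a name against whichever spelling actually appears in its data.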

Install

We strongly recommend installing indicate inside a Python virtual environment (see venv documentation).

pip install indicate

General API

Examples

Functions

We expose 6 functions, each of which takes either a pandas DataFrame or a CSV file. If the CSV doesn’t have a header, we make some assumptions about where the data is:

  • census_ln(df, namecol, year=2000)

    • What it does:

      • Removes extra spaces

      • For names found in the census file, appends the probability that a person with that name belongs to each race/ethnicity

Parameters

df : {DataFrame, csv} pandas DataFrame or CSV file containing the names of the individuals whose race/ethnicity is to be inferred

namecol : {string, list, int} name or location of the column containing the last name

year : {2000, 2010}, default=2000 census year to use

  • Output: Appends the following columns to the pandas DataFrame or CSV: pctwhite, pctblack, pctapi, pctaian, pct2prace, pcthispanic. See here for what the column names mean.

    >>> import pandas as pd
    
    >>> from ethnicolr import census_ln, pred_census_ln
    
    >>> names = [{'name': 'smith'},
    ...         {'name': 'zhang'},
    ...         {'name': 'jackson'}]
    
    >>> df = pd.DataFrame(names)
    
    >>> df
          name
    0    smith
    1    zhang
    2  jackson
    
    >>> census_ln(df, 'name')
          name pctwhite pctblack pctapi pctaian pct2prace pcthispanic
    0    smith    73.35    22.22   0.40    0.85      1.63        1.56
    1    zhang     0.61     0.09  98.16    0.02      0.96        0.16
    2  jackson    41.93    53.02   0.31    1.04      2.18        1.53
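The behavior described above (accept a DataFrame or a CSV, remove extra spaces, append lookup columns) can be sketched with plain pandas. This is an illustration under stated assumptions, not the library's implementation: `load_names`, `append_lookup`, and `LOOKUP` are hypothetical names, the lookup table is a three-row excerpt copied from the output above with only three of its columns, and treating an integer `namecol` as a position in a headerless CSV is an assumption.

```python
import io

import pandas as pd

# Toy lookup table: values copied from the example output above.
LOOKUP = pd.DataFrame(
    {
        "name": ["smith", "zhang", "jackson"],
        "pctwhite": [73.35, 0.61, 41.93],
        "pctblack": [22.22, 0.09, 53.02],
        "pctapi": [0.40, 98.16, 0.31],
    }
)


def load_names(data, namecol):
    """Accept a DataFrame or a CSV path/buffer and return a DataFrame."""
    if isinstance(data, pd.DataFrame):
        return data.copy()
    # Assumption: an integer namecol means a headerless CSV read by position.
    header = None if isinstance(namecol, int) else "infer"
    return pd.read_csv(data, header=header)


def append_lookup(data, namecol, lookup=LOOKUP):
    """Strip extra spaces from the name column and left-join the lookup."""
    df = load_names(data, namecol)
    key = df[namecol].astype(str).str.strip()
    table = lookup.rename(columns={"name": "_key"})
    # A left join keeps every input row; unmatched names get NaN.
    merged = df.assign(_key=key).merge(table, on="_key", how="left")
    return merged.drop(columns="_key")


# Headerless CSV: the names sit in column 0, one with stray spaces.
csv_text = " smith \nzhang\nunknownname\n"
result = append_lookup(io.StringIO(csv_text), namecol=0)
print(result)
```

The left join is what keeps names missing from the lookup in the result, with empty values in the appended columns, instead of silently dropping them.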

Data

Evaluation

Authors

Rajashekar Chintalapati and Gaurav Sood

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

License

The package is released under the MIT License.
