category

Categorical transformation for data science

Installation

Install the library with pip:

pip install category

Single Category

>>> from category import Category
>>> book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst=False)
>>> book.i2c[2]
'c'

>>> book.c2i[['Category_d','f']]
array([3, 5])

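Set pad_mst to True to reserve index 0 for a missing token. Every known category then shifts up by one, and any category not in the original list maps to index 0: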

>>> from category import Category
>>> book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst=True)
>>> book.i2c[2]  # index 0 is now the missing token, so 'a' sits at index 1
'b'
>>> book.c2i[['Stranger','Category_d','Unknown','f']]
array([0, 4, 0, 6])
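
The behaviour shown above can be pictured as a pair of lookup tables. What follows is a minimal sketch in plain Python, not the library's actual implementation: the class name `TinyCategory`, the method `to_index` and the `"[MST]"` placeholder string are hypothetical, and it returns lists where the library returns NumPy arrays.

```python
class TinyCategory:
    """Hypothetical stand-in illustrating index <-> category lookup."""

    def __init__(self, categories, pad_mst=False):
        # Assumption: the missing token occupies index 0 when pad_mst is True.
        self.pad_mst = pad_mst
        tokens = (["[MST]"] if pad_mst else []) + list(categories)
        self.i2c = tokens                                # index -> category
        self.c2i = {c: i for i, c in enumerate(tokens)}  # category -> index

    def to_index(self, names):
        if self.pad_mst:
            # Unknown names fall back to index 0, the missing token.
            return [self.c2i.get(name, 0) for name in names]
        # In this sketch, an unknown name raises KeyError when there is no missing token.
        return [self.c2i[name] for name in names]


letters = ['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j']
plain = TinyCategory(letters, pad_mst=False)
padded = TinyCategory(letters, pad_mst=True)

print(plain.to_index(['Category_d', 'f']))                          # [3, 5]
print(padded.to_index(['Stranger', 'Category_d', 'Unknown', 'f']))  # [0, 4, 0, 6]
print(padded.i2c[2])                                                # b
```

The sketch reproduces the index shift: with the missing token at index 0, 'Category_d' moves from 3 to 4 and 'f' from 5 to 6.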

Multi-Category

>>> from category import Category, MultiCategory
>>> cates = [f"category{i}" for i in range(1000)]
>>> multi_cate = MultiCategory(Category(cates, pad_mst=True))
>>> multi_cate.string_to_index("category42, category108")
array([43, 109])

You can also convert a list of strings carrying multi-category information (a frequent input format in tabular data) to an n-hot encoded array, and back:

>>> nhot = multi_cate.batch_strings_to_nhot(["category42, category108","category999"])
>>> multi_cate.nhot_to_list(nhot)[0]
["category42", "category108"]
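
For readers unfamiliar with n-hot encoding: each row becomes a 0/1 vector with a 1 at the position of every category present in that row. A minimal sketch in plain Python follows. The helper names `strings_to_nhot` and `nhot_to_lists` are hypothetical, the comma separator is inferred from the example above, and this small version has no missing token and uses lists instead of NumPy arrays, so it should not be read as the library's code.

```python
def strings_to_nhot(rows, c2i, sep=","):
    """Encode separator-joined category strings as rows of 0/1 flags."""
    nhot = []
    for row in rows:
        flags = [0] * len(c2i)
        for name in row.split(sep):
            flags[c2i[name.strip()]] = 1  # strip the space after each comma
        nhot.append(flags)
    return nhot


def nhot_to_lists(nhot, i2c):
    """Decode each 0/1 row back to the category names it marks."""
    return [[i2c[i] for i, flag in enumerate(row) if flag] for row in nhot]


i2c = [f"category{i}" for i in range(1000)]
c2i = {name: i for i, name in enumerate(i2c)}

nhot = strings_to_nhot(["category42, category108", "category999"], c2i)
print(sum(nhot[0]))                 # 2
print(nhot_to_lists(nhot, i2c)[0])  # ['category42', 'category108']
```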

Performance

The speed of this library depends mostly on Python dictionary lookups and NumPy operations. Although Python is often called a 'slow' language, these operations are already fast, and they are hard to improve on by switching to another language.
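
One rough way to get a feel for the cost of the dictionary lookups on your own machine is to time them with the standard `timeit` module. The sketch below does not use the library; the data sizes and the `lookup` helper are hypothetical, and the timing will vary by machine.

```python
import timeit

# Hypothetical data: 1000 categories and a batch of 100,000 names to look up.
c2i = {f"category{i}": i for i in range(1000)}
batch = [f"category{i % 1000}" for i in range(100_000)]


def lookup(names):
    # One dictionary lookup per name; unknown names map to 0.
    return [c2i.get(name, 0) for name in names]


# Average wall-clock time of one pass over the batch, taken over 10 passes.
seconds = timeit.timeit(lambda: lookup(batch), number=10) / 10
print(f"{len(batch)} lookups took about {seconds:.4f} s per pass")
```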

Here we compare this library with the Rust implementation
