dataknobs-xization

Text normalization and tokenization tools.

Installation

pip install dataknobs-xization

Features

  • Text Normalization: Standardize text for consistent processing
  • Masking Tokenizer: Advanced tokenization with masking capabilities
  • Annotations: Text annotation system
  • Authorities: Authority management for text processing
  • Lexicon: Lexicon-based text analysis

Usage

from dataknobs_xization import normalize, MaskingTokenizer

# Text normalization
normalized = normalize.normalize_text("Hello, World!")

# Tokenization with masking
tokenizer = MaskingTokenizer()
tokens = tokenizer.tokenize("This is a sample text.")

# Working with annotations
from dataknobs_xization import annotations
doc = annotations.create_document("Sample text", {"metadata": "value"})
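To make the idea of text normalization concrete, the following is a minimal standard-library sketch of steps a normalizer commonly performs: Unicode folding, case folding, punctuation removal, and whitespace collapsing. It does not call or reproduce dataknobs-xization; the name `simple_normalize` is hypothetical, and the package's own `normalize` module may apply different rules.

```python
import re
import unicodedata


def simple_normalize(text: str) -> str:
    """Hypothetical normalizer for illustration; NOT the dataknobs-xization implementation."""
    # Fold compatibility characters (for example full-width forms) to canonical ones.
    text = unicodedata.normalize("NFKC", text)
    # Use case folding so comparisons ignore letter case.
    text = text.casefold()
    # Replace anything that is neither a word character nor whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(simple_normalize("Hello, World!"))  # prints "hello world"
```

Comparing this sketch's output with that of the package on a few sample strings is a quick way to learn which of these steps the package actually applies.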

Dependencies

This package depends on:

  • dataknobs-common
  • dataknobs-structures
  • dataknobs-utils
  • nltk
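If an import fails after installation, one way to check whether the dependencies listed above are present is to query the installed distribution metadata. The sketch below uses only the standard library `importlib.metadata` module; the helper name `missing_distributions` is an assumption made for this example, not part of the package.

```python
from importlib import metadata

# Distribution names copied from the dependency list above.
REQUIRED = ["dataknobs-common", "dataknobs-structures", "dataknobs-utils", "nltk"]


def missing_distributions(names):
    """Return the names for which no installed distribution metadata is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_distributions(REQUIRED) or "all dependencies found")
```

pip normally installs these dependencies automatically, so a non-empty result usually points to a different interpreter or virtual environment than the one pip installed into.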

License

See LICENSE file in the root repository.

Download files

Source Distribution

dataknobs_xization-1.0.1.tar.gz (33.9 kB view details)

Uploaded Source

Built Distribution

dataknobs_xization-1.0.1-py3-none-any.whl (34.7 kB view details)

Uploaded Python 3

File details

Details for the file dataknobs_xization-1.0.1.tar.gz.

File metadata

  • Download URL: dataknobs_xization-1.0.1.tar.gz
  • Size: 33.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.12

File hashes

Hashes for dataknobs_xization-1.0.1.tar.gz
Algorithm Hash digest
SHA256 98b8f6d39bdc1a445c70e628b6c3b76bdcb58b62ddbe7d3f409fe7a437d64898
MD5 e630edbefb030c3a506b810309ebe071
BLAKE2b-256 7a75ca8d79ee7e79b1595112c4bd23130c4cfc1a4d9afa7a88497495c0792609
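A downloaded archive can be checked against the SHA256 digest listed above with the standard library `hashlib` module. This is a minimal sketch: the helper names and the local file path are assumptions for illustration, and only the expected digest comes from the table above.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator

# SHA256 digest listed above for dataknobs_xization-1.0.1.tar.gz.
EXPECTED_SHA256 = "98b8f6d39bdc1a445c70e628b6c3b76bdcb58b62ddbe7d3f409fe7a437d64898"


def sha256_hexdigest(chunks: Iterable[bytes]) -> str:
    """Return the SHA256 hex digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Path, size: int = 65536) -> Iterator[bytes]:
    """Yield the file's contents in fixed-size chunks to bound memory use."""
    with path.open("rb") as handle:
        while chunk := handle.read(size):
            yield chunk


def matches_expected_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a local file's SHA256 digest with the expected value."""
    return sha256_hexdigest(read_chunks(path)) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the archive was saved.
    archive = Path("dataknobs_xization-1.0.1.tar.gz")
    if archive.exists():
        print("hash OK" if matches_expected_hash(archive) else "hash MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode, which compares downloads against `--hash` entries in a requirements file.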

File details

Details for the file dataknobs_xization-1.0.1-py3-none-any.whl.

File hashes

Hashes for dataknobs_xization-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5891b4786631fd4058af6e8c2e6a33c752f9367b6fec65e37b64d1bf9f4ea1ec
MD5 6071a39329bbcf988119fd80437d38af
BLAKE2b-256 924e74ef257897c767feed9d1c515a7ab638c4635309ce9d6df9a01a5fb1a790
