dataknobs-xization

Text normalization and tokenization tools.

Installation

pip install dataknobs-xization
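
To confirm that the install succeeded, one option is to ask the standard library which version of the distribution is present. This is a minimal sketch that uses only `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of dataknobs-xization.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Prints a version string after a successful install, otherwise None.
    print(installed_version("dataknobs-xization"))
```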

Features

  • Text Normalization: Standardize text for consistent processing
  • Masking Tokenizer: Advanced tokenization with masking capabilities
  • Annotations: Text annotation system
  • Authorities: Authority management for text processing
  • Lexicon: Lexicon-based text analysis
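
To make "standardize text for consistent processing" concrete, the sketch below shows one common form of text normalization using only the Python standard library. It is an illustration of the general idea, not the dataknobs-xization implementation; the function name `simple_normalize` and the specific steps (Unicode NFKC, case folding, punctuation removal, whitespace collapsing) are assumptions chosen for the example.

```python
import re
import unicodedata


def simple_normalize(text: str) -> str:
    """Illustrative normalizer (hypothetical, not the library's own logic)."""
    # Fold compatibility characters (e.g. full-width letters) to canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Case-insensitive comparison form.
    text = text.casefold()
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(simple_normalize("Hello,   World!"))  # prints: hello world
```

Normalizing before tokenization means that variants such as "Hello, World!" and "hello world" compare equal downstream.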

Usage

from dataknobs_xization import normalize, MaskingTokenizer

# Text normalization
normalized = normalize.normalize_text("Hello, World!")

# Tokenization with masking
tokenizer = MaskingTokenizer()
tokens = tokenizer.tokenize("This is a sample text.")

# Working with annotations
from dataknobs_xization import annotations
doc = annotations.create_document("Sample text", {"metadata": "value"})

Dependencies

This package depends on:

  • dataknobs-common
  • dataknobs-structures
  • dataknobs-utils
  • nltk
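
The declared requirements of an installed copy can also be read back from its package metadata, which is useful for checking the list above against what pip actually resolved. This is a small sketch using the standard library's `importlib.metadata.requires`; the helper name `declared_requirements` is hypothetical.

```python
from importlib import metadata
from typing import List


def declared_requirements(dist_name: str) -> List[str]:
    """Return the requirement strings a distribution declares, or [] if unknown."""
    try:
        # requires() returns None when the distribution declares no requirements.
        return list(metadata.requires(dist_name) or [])
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return []


if __name__ == "__main__":
    for requirement in declared_requirements("dataknobs-xization"):
        print(requirement)
```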

License

See LICENSE file in the root repository.
