flexi-nlp-tools

A natural language processing toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a focus on simplicity, performance, and flexibility.

Tools
- Numeral Converter
- Lite Search
Installation
Demo
License

Tools

Numeral Converter

Overview

Numeral Converter is a Python library that provides functionality to convert numbers to text and vice versa, supporting multiple languages. It also allows the processing of numbers in text with support for grammatical cases, gender, and pluralization. Additionally, it can detect and convert numbers embedded in sentences into their numerical equivalents.

Supported Languages

English (en)
Ukrainian (uk)
Russian (ru)

Core Functions

`get_available_languages()`

Retrieves a list of languages supported by the numeral converter.

Returns:
- A list of language codes (e.g., ['uk', 'en', 'ru']).
Example:

from flexi_nlp_tools.numeral_converter import get_available_languages

print(get_available_languages())  # Output: ['uk', 'en', 'ru']

`get_max_order(lang)`

Returns the maximum numerical order supported for a specific language.

Parameters:
- lang (str): The language code (e.g., 'en', 'uk', 'ru').
Returns:
- The maximum numerical order as an integer.
Example:

from flexi_nlp_tools.numeral_converter import get_max_order

print(get_max_order('en'))  # Output: 47
print(get_max_order('uk'))  # Output: 65

`numeral2int(numeral, lang)`

Converts a numeral in text form into its integer representation.

Parameters:
- numeral (str): The numeral string (e.g., 'one', 'одного').
- lang (str): The language code (e.g., 'en', 'uk', 'ru').
Returns:
- An integer representing the value of the numeral.
Example:

from flexi_nlp_tools.numeral_converter import numeral2int

print(numeral2int('one', 'en'))  # Output: 1
print(numeral2int('одного', 'ru'))  # Output: 1
print(numeral2int('тисячний', 'uk'))  # Output: 1000

`int2numeral(value, lang, num_class=None, gender=None, case=None, number=None)`

Converts an integer into its textual representation.

Parameters:
- value (int): The numerical value to convert.
- lang (str): The language code.
- num_class (NumClass, optional): Specifies the numeral class (CARDINAL or ORDINAL).
- gender (Gender, optional): Specifies the grammatical gender (MASCULINE, FEMININE, NEUTER).
- case (Case, optional): Specifies the grammatical case (NOMINATIVE, GENITIVE, etc.).
- number (Number, optional): Specifies singular or plural (SINGULAR, PLURAL).
Returns:
- A string representing the numeral in text form.
Example:

from flexi_nlp_tools.numeral_converter import int2numeral

print(int2numeral(
  2023,
  lang="uk",
  num_class='ORDINAL',
  number='SINGULAR'))
# Output: "дві тисячі двадцять третій"

`convert_numerical_in_text(text, lang, **kwargs)`

Detects numbers in a string and converts them into their numerical representation.

Parameters:
- text (str): The input text containing numerical values.
- lang (str): The language code.
Returns:
- A string with detected numbers converted to numerical form.
Example:

from flexi_nlp_tools.numeral_converter import convert_numerical_in_text

text = (
  "After twenty, numbers such as twenty-five and fifty follow. "
  "For example thirty-three is thirty plus three."
)
result = convert_numerical_in_text(text, lang="en")
print(result)
# Output: "After 20, numbers such as 25 and 50 follow. "
#         "For example 33 is 30 plus 3."
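
To make the detect-and-replace idea concrete, here is a toy sketch in plain Python. It is not the library's implementation: the vocabulary, the regular expression, and the function name `replace_small_numerals` are all assumptions made for illustration, and it only covers a handful of English numerals.

```python
import re

# Toy vocabulary: illustrative only, not the library's data.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}

# Match a tens word optionally followed by "-unit", or a bare unit word.
_PATTERN = re.compile(
    r"\b(?:(?P<tens>%s)(?:-(?P<unit>%s))?|(?P<single>%s))\b"
    % ("|".join(TENS), "|".join(UNITS), "|".join(UNITS)),
    flags=re.IGNORECASE,
)


def replace_small_numerals(text: str) -> str:
    """Replace a few English numerals (1-9 and 20-59) with digits."""

    def to_digits(match: re.Match) -> str:
        if match.group("single"):
            return str(UNITS[match.group("single").lower()])
        value = TENS[match.group("tens").lower()]
        if match.group("unit"):
            value += UNITS[match.group("unit").lower()]
        return str(value)

    return _PATTERN.sub(to_digits, text)


print(replace_small_numerals("For example thirty-three is thirty plus three."))
# For example 33 is 30 plus 3.
```

The real function presumably handles far larger numbers, inflected forms, and other languages; the sketch only shows the overall shape of scanning text and substituting matched numerals.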

Lite Search

Overview

Lite Search is designed for efficient fuzzy searching and indexing of text data. It lets you build a search index from textual data and run approximate matches against it, with optional transliteration for non-Latin scripts. The library is lightweight and suited to scenarios where quick, non-exact text matching is required.

Core Functions

`build_search_index(data, transliterate_latin=False)`

Builds a search index from a dataset.

Parameters:
- data (list of tuples): The dataset to index, where each tuple contains a unique identifier and a string value (e.g., [(1, "text1"), (2, "text2")]).
- transliterate_latin (bool, optional): Enables transliteration of non-Latin scripts for better matching.
Returns:
- A search index object that can be used with fuzzy_search.
Example:

from flexi_nlp_tools.lite_search import build_search_index

data = [(1, "one"), (2, "two"), (3, "three")]
search_index = build_search_index(data)

`fuzzy_search(query, search_index, topn=None)`

Performs a fuzzy search on the given query.

Parameters:
- query (str): The search query string.
- search_index (object): The search index generated by build_search_index.
- topn (int, optional): Limits the number of results returned. If None, all matching results are returned.
Returns:
- A list of identifiers (from the dataset) ranked by relevance.
Example:

from flexi_nlp_tools.lite_search import fuzzy_search

result = fuzzy_search(query="one", search_index=search_index)
print(result)
# Output: [1]
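
For readers who want a feel for what "index, then rank approximate matches" means, the following sketch mimics the two calls with the standard library only. It is an assumption-laden stand-in: it scores tokens with `difflib.SequenceMatcher` rather than the flexi-dict structure, and the names `build_toy_index`, `toy_fuzzy_search`, and `min_ratio` are hypothetical.

```python
from difflib import SequenceMatcher


def build_toy_index(data):
    """Map each lowercased token to the set of ids whose text contains it."""
    index = {}
    for item_id, text in data:
        for token in text.lower().split():
            index.setdefault(token, set()).add(item_id)
    return index


def toy_fuzzy_search(query, index, topn=None, min_ratio=0.6):
    """Return ids ranked by the best similarity between query and any token."""
    query = query.lower()
    best = {}  # item_id -> best similarity ratio seen so far
    for token, ids in index.items():
        ratio = SequenceMatcher(None, query, token).ratio()
        if ratio >= min_ratio:
            for item_id in ids:
                best[item_id] = max(best.get(item_id, 0.0), ratio)
    # Sort by descending similarity, then by id for a stable order.
    ranked = sorted(best, key=lambda item_id: (-best[item_id], item_id))
    return ranked if topn is None else ranked[:topn]


index = build_toy_index([(1, "one"), (2, "two"), (3, "three")])
print(toy_fuzzy_search("one", index))   # [1]
print(toy_fuzzy_search("thre", index))  # [3]
```

The linear scan over all tokens is fine for a toy dataset; a dedicated structure such as the one the package is built on exists precisely to avoid that cost.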

`fuzzy_search_internal(query, search_index, topn=None)`

Returns detailed information about the matching process, including corrections applied to the query.

Parameters:
- Same as fuzzy_search.
Returns:
- A list of objects containing detailed matching information.

Usage Examples

Example 1: Basic Fuzzy Search

from flexi_nlp_tools.lite_search import build_search_index, fuzzy_search

data = [(1, "one"), (2, "two"), (3, "three")]
search_index = build_search_index(data)

result = fuzzy_search(query="one", search_index=search_index)
print(result)  # Output: [1]

Example 2: Fuzzy Search with Transliteration

from flexi_nlp_tools.lite_search import build_search_index, fuzzy_search

data = [(1, "ван"), (2, "ту"), (3, "срі")]
search_index = build_search_index(data, transliterate_latin=True)

result = fuzzy_search(query="ван", search_index=search_index)
print(result)  # Output: [1]
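
The example above queries in the same script as the data, so it does not show what transliteration buys you. One plausible reading of the option is that entries also become reachable through a Latin spelling. The sketch below illustrates that idea with a hand-written mapping; the table `CYR_TO_LAT` and the function `transliterate` are assumptions for illustration and say nothing about how the library transliterates internally.

```python
# Tiny Cyrillic-to-Latin table, enough for this example only.
CYR_TO_LAT = {"в": "v", "а": "a", "н": "n", "т": "t", "у": "u",
              "с": "s", "р": "r", "і": "i"}


def transliterate(text: str) -> str:
    """Replace known Cyrillic letters with Latin ones; keep the rest."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


data = [(1, "ван"), (2, "ту"), (3, "срі")]

# Index each entry under both its original and transliterated spelling.
keys = {}
for item_id, text in data:
    keys[text.lower()] = item_id
    keys[transliterate(text)] = item_id

print(keys.get("van"))  # 1
print(keys.get("ван"))  # 1
print(keys.get("sri"))  # 3
```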

Example 3: Advanced Query Matching

from flexi_nlp_tools.lite_search import build_search_index, fuzzy_search

data = [
  (1, "Burger Vegan"),
  (2, "Burger with Pork"),
  (3, "Burger with Meat and Garlic"),
]
search_index = build_search_index(data)

query = "burger"
result = fuzzy_search(query=query, search_index=search_index)
print(result)  # Output: [1, 2, 3]

Example 4: Detailed Search Results

from flexi_nlp_tools.lite_search import fuzzy_search_internal

# search_index is assumed to be the index built in Example 3
query = "bollo"
result = fuzzy_search_internal(query=query, search_index=search_index)
for match in result:
  print(match)

Environment Variables

The following environment variables can be used to customize the behavior of the package. Each module validates its environment variables to ensure they meet the expected constraints; an invalid value raises an InvalidEnvironmentVariable exception. Default values are used when a variable is not explicitly set.

FlexiDict environment variables

DEFAULT_TOPN_LEAVES (default: 10): A positive integer representing the maximum number of top leaves to retrieve in searches. Must be greater than 0.
MIN_CORRECTION_PRICE (default: 1e-5): A float in the range [0, 1], representing the minimum price for applying a correction.
MAX_CORRECTION_RATE (default: 2/3): A float in the range [0, 1], representing the maximum correction rate allowed.
MAX_CORRECTION_RATE_FOR_SEARCH (default: 1.): A float in the range [0, 1], representing the maximum correction rate allowed when adding leaves.
DEFAULT_DELETION_PRICE (default: 0.4): A float in the range [0, 1], representing the cost of a deletion operation.
DEFAULT_SUBSTITUTION_PRICE (default: 0.2): A float in the range [0, 1], representing the cost of a substitution operation.
DEFAULT_INSERTION_PRICE (default: 0.05): A float in the range [0, 1], representing the cost of an insertion operation.
DEFAULT_TRANSPOSITION_PRICE (default: 0.35): A float in the range [0, 1], representing the cost of a transposition operation.
MAX_QUEUE_SIZE (default: 1024): A positive integer defining the maximum queue size for processing tasks. Must be greater than 0.
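
To see how per-operation prices like these can shape a match score, here is a minimal weighted edit distance (optimal string alignment) using the default values listed above. This is a sketch under assumptions, not FlexiDict's actual algorithm: in particular, it assumes "deletion" means dropping a character from the query and "insertion" means adding one to it, and the function name `weighted_distance` is hypothetical.

```python
# Default prices copied from the list above.
DELETION, SUBSTITUTION, INSERTION, TRANSPOSITION = 0.4, 0.2, 0.05, 0.35


def weighted_distance(query: str, target: str) -> float:
    """Cheapest way to turn `query` into `target` (optimal string alignment)."""
    rows, cols = len(query) + 1, len(target) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * DELETION   # drop every query character
    for j in range(1, cols):
        cost[0][j] = j * INSERTION  # insert every target character
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == target[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + DELETION,
                cost[i][j - 1] + INSERTION,
                cost[i - 1][j - 1] + (0.0 if same else SUBSTITUTION),
            )
            # Swap of two adjacent characters, e.g. "ab" -> "ba".
            if (i > 1 and j > 1 and query[i - 1] == target[j - 2]
                    and query[i - 2] == target[j - 1]):
                cost[i][j] = min(cost[i][j], cost[i - 2][j - 2] + TRANSPOSITION)
    return cost[-1][-1]


print(round(weighted_distance("burgr", "burger"), 2))   # 0.05 (one insertion)
print(round(weighted_distance("bugrer", "burger"), 2))  # 0.35 (one transposition)
print(round(weighted_distance("cat", "cut"), 2))        # 0.2 (one substitution)
```

With these defaults a query that is missing a character is penalized far less than one with an extra character, which suits prefix-style and abbreviated queries.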

LiteSearch environment variables

MIN_START_TOKEN_LENGTH (default: 3): A positive integer defining the minimum length of a starting token. Must be greater than 0.
DEFAULT_QUERY_TRANSFORMATION_PRICE (default: 0.4): A float in the range [0, ∞), representing the cost of a query transformation. Must be non-negative.
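
The validation rules described in this section can be pictured with a small helper like the one below. It is only a sketch: the helper `read_float_env` is hypothetical, and the exception class is a local stand-in that borrows the name mentioned above rather than the package's own class.

```python
import os


class InvalidEnvironmentVariable(ValueError):
    """Local stand-in for the exception named above."""


def read_float_env(name: str, default: float, low: float, high: float) -> float:
    """Read `name` as a float in [low, high], or return `default` if unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        raise InvalidEnvironmentVariable(f"{name}={raw!r} is not a number") from None
    if not low <= value <= high:
        raise InvalidEnvironmentVariable(
            f"{name}={value} is outside [{low}, {high}]")
    return value


os.environ.pop("DEFAULT_DELETION_PRICE", None)
print(read_float_env("DEFAULT_DELETION_PRICE", 0.4, 0.0, 1.0))  # 0.4

os.environ["DEFAULT_DELETION_PRICE"] = "0.25"
print(read_float_env("DEFAULT_DELETION_PRICE", 0.4, 0.0, 1.0))  # 0.25
```

The text above does not say when the package reads these variables, so setting them before the first import of `flexi_nlp_tools` is the cautious choice.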

Installation

You can install flexi-nlp-tools from PyPI using pip:

pip install flexi-nlp-tools

Demo

Check out the live demo of Flexi NLP Tools here:

Flexi NLP Tools Demo

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

- 0.6.0: Aug 20, 2025
- 0.5.5: Feb 12, 2025
- 0.5.4: Feb 12, 2025
- 0.5.3: Feb 7, 2025
- 0.5.2: Feb 3, 2025
- 0.5.1: Feb 1, 2025
- 0.5.0: Jan 31, 2025
- 0.4.0: Jan 27, 2025 (this version)
- 0.3.3: Jan 23, 2025
- 0.3.2: Jan 22, 2025
- 0.3.1: Jan 22, 2025
- 0.3.0: Jan 22, 2025
- 0.2.3: Jan 19, 2025
- 0.2.2: Jan 18, 2025
- 0.2.1: Jan 18, 2025
- 0.2.0: Jan 18, 2025
- 0.1.4: Jan 17, 2025
- 0.1.3: Jan 17, 2025
- 0.1.2: Jan 17, 2025
- 0.1.1: Jan 16, 2025
- 0.1.0: Jan 16, 2025

Download files

Source Distribution

flexi_nlp_tools-0.4.0.tar.gz (60.2 kB), uploaded Jan 27, 2025

Built Distribution

flexi_nlp_tools-0.4.0-py3-none-any.whl (68.7 kB), uploaded Jan 27, 2025, Python 3

File details

Details for the file flexi_nlp_tools-0.4.0.tar.gz.

File metadata

Download URL: flexi_nlp_tools-0.4.0.tar.gz
Upload date: Jan 27, 2025
Size: 60.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for flexi_nlp_tools-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`a2bd4ca4564950745fef21d67aaa46c63beb90c79608f9f37780df43c041d8a6`
MD5	`50352b496c57268111fb9c11e2defdc1`
BLAKE2b-256	`b2083aafd5d6a0ef357fc63d0651d027fa548059cb03e9bebb988770e38afc42`

File details

Details for the file flexi_nlp_tools-0.4.0-py3-none-any.whl.

File metadata

Download URL: flexi_nlp_tools-0.4.0-py3-none-any.whl
Upload date: Jan 27, 2025
Size: 68.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for flexi_nlp_tools-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`62a4ce41fd1b835a4f9cb1eeb880990950352dcebd5ffe29ad71aacbc2307725`
MD5	`89927c430037b99d747a74395b28f2ce`
BLAKE2b-256	`b941e259376de84dad6eda60060f41e8f8090f9106b875f574c8e56e99847e24`
