flexi-nlp-tools

A natural language processing toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a focus on simplicity, performance, and flexibility.

Table of Contents

  1. Tools
  2. Installation
  3. Demo
  4. License

Tools

Numeral Converter

Overview

Numeral Converter is a Python library that provides functionality to convert numbers to text and vice versa, supporting multiple languages. It also allows the processing of numbers in text with support for grammatical cases, gender, and pluralization. Additionally, it can detect and convert numbers embedded in sentences into their numerical equivalents.

Core Functions

get_available_languages()

Retrieves a list of languages supported by the numeral converter.

  • Returns:

    • A list of language codes (e.g., ['uk', 'en', 'ru']).
  • Example:

from numeral_converter import get_available_languages

print(get_available_languages())  # Output: ['uk', 'en', 'ru']

get_max_order(lang)

Returns the maximum numerical order supported for a specific language.

  • Parameters:

    • lang (str): The language code (e.g., 'en', 'uk', 'ru').
  • Returns:

    • The maximum numerical order as an integer.
  • Example:

from numeral_converter import get_max_order

print(get_max_order('en'))  # Output: 47
print(get_max_order('uk'))  # Output: 65

numeral2int(numeral, lang)

Converts a numeral in text form into its integer representation.

  • Parameters:

    • numeral (str): The numeral string (e.g., 'one', 'одного').
    • lang (str): The language code (e.g., 'en', 'uk', 'ru').
  • Returns:

    • An integer representing the value of the numeral.
  • Example:

from numeral_converter import numeral2int

print(numeral2int('one', 'en')) # Output: 1
print(numeral2int('одного', 'ru')) # Output: 1
print(numeral2int('тисячний', 'uk')) # Output: 1000
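To make the idea of numeral parsing concrete, here is a minimal standard-library sketch for English cardinals only. It is not how numeral_converter works internally: the helper name `toy_numeral2int` and the word tables are assumptions made for illustration, and it ignores cases, genders, ordinals, and fuzzy matching.

```python
# Toy word tables; a real converter needs far more entries and languages.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def toy_numeral2int(numeral: str) -> int:
    """Parse an English cardinal numeral (hypothetical helper, not the library API)."""
    total = 0    # value of the groups already closed by a scale word
    current = 0  # value of the group being read (always below 1000)
    for word in numeral.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unknown numeral word: {word!r}")
    return total + current


print(toy_numeral2int("one"))                        # 1
print(toy_numeral2int("two thousand twenty-three"))  # 2023
```

The sketch rejects unknown words outright, whereas a fuzzy-search based converter could try to correct them.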

int2numeral(value, lang, num_class=None, gender=None, case=None, number=None)

Converts an integer into its textual representation.

  • Parameters:

    • value (int): The numerical value to convert.
    • lang (str): The language code.
    • num_class (NumClass, optional): Specifies the numeral class (CARDINAL or ORDINAL).
    • gender (Gender, optional): Specifies the grammatical gender (MASCULINE, FEMININE, NEUTER).
    • case (Case, optional): Specifies the grammatical case (NOMINATIVE, GENITIVE, etc.).
    • number (Number, optional): Specifies singular or plural (SINGULAR, PLURAL).
  • Returns:

    • A string representing the numeral in text form.
  • Example:

from numeral_converter import int2numeral

print(int2numeral(
    2023,
    lang="uk",
    num_class='ORDINAL',
    number='SINGULAR'))
# Output: "дві тисячі двадцять третій"
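For comparison, the reverse direction can be sketched with the standard library alone. The helper `toy_int2numeral` below is hypothetical and handles only English cardinals below one million; it has none of the numeral class, gender, case, or number handling that int2numeral exposes.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def toy_int2numeral(value: int) -> str:
    """Spell out 0..999_999 in English (hypothetical helper, not the library API)."""
    if not 0 <= value < 1_000_000:
        raise ValueError("toy converter supports 0..999999 only")
    if value < 20:
        return ONES[value]
    if value < 100:
        tens, unit = divmod(value, 10)
        return TENS[tens] + (f"-{ONES[unit]}" if unit else "")
    if value < 1000:
        hundreds, rest = divmod(value, 100)
        head = f"{ONES[hundreds]} hundred"
        return head + (f" {toy_int2numeral(rest)}" if rest else "")
    # Split off the thousands group and spell each part recursively.
    thousands, rest = divmod(value, 1000)
    head = f"{toy_int2numeral(thousands)} thousand"
    return head + (f" {toy_int2numeral(rest)}" if rest else "")


print(toy_int2numeral(2023))  # two thousand twenty-three
```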

convert_numerical_in_text(text, lang, **kwargs)

Detects numerals written as words in a string and converts them to digits, leaving the rest of the text unchanged.

  • Parameters:

    • text (str): The input text containing numerical values.
    • lang (str): The language code.
  • Returns:

    • A string with detected numbers converted to numerical form.
  • Example:

from numeral_converter import convert_numerical_in_text
text = (
    "After twenty, numbers such as twenty-five and fifty follow. "
    "For example thirty-three is thirty plus three."
)
result = convert_numerical_in_text(text, lang="en")
print(result)
# Output: "After 20, numbers such as 25 and 50 follow. "
#         "For example 33 is 30 plus 3." 
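The detection step can be illustrated with a small regular-expression sketch. This is an assumption-laden toy, not the library's algorithm: `toy_convert_numerical_in_text` is a hypothetical name, and it only recognizes English units (one to nine) and tens, optionally joined by a hyphen.

```python
import re

UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

tens_alt = "|".join(TENS)
units_alt = "|".join(UNITS)
# Either a tens word with an optional "-unit" suffix, or a lone unit word.
PATTERN = re.compile(
    rf"\b(?:(?P<tens>{tens_alt})(?:-(?P<unit>{units_alt}))?|(?P<single>{units_alt}))\b",
    re.IGNORECASE,
)


def _to_digits(match: re.Match) -> str:
    """Turn one matched numeral into its digit string."""
    if match.group("single"):
        return str(UNITS[match.group("single").lower()])
    value = TENS[match.group("tens").lower()]
    if match.group("unit"):
        value += UNITS[match.group("unit").lower()]
    return str(value)


def toy_convert_numerical_in_text(text: str) -> str:
    """Replace simple English numerals with digits (hypothetical helper)."""
    return PATTERN.sub(_to_digits, text)


text = (
    "After twenty, numbers such as twenty-five and fifty follow. "
    "For example thirty-three is thirty plus three."
)
print(toy_convert_numerical_in_text(text))
```

Word boundaries keep the pattern from firing inside longer words such as "someone" or "seventeen"; anything beyond this tiny vocabulary is left untouched.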

Supported Languages

  • English (en)
  • Ukrainian (uk)
  • Russian (ru)

Lite Search

Overview

Lite Search is designed for efficient fuzzy searching and indexing of text data. It lets you build a search index from textual data and run approximate matches against it, with optional transliteration for non-Latin scripts. The library is lightweight and suited to scenarios where quick, non-exact text matching is required.

Core Functions

build_search_index(data, transliterate_latin=False)

Builds a search index from a dataset.

  • Parameters:

    • data (list of tuples): The dataset to index, where each tuple contains a unique identifier and a string value (e.g., [(1, "text1"), (2, "text2")]).
    • transliterate_latin (bool, optional): Enables transliteration of non-Latin scripts for better matching.
  • Returns:

    • A search index object that can be used with fuzzy_search.
  • Example:

from lite_search import build_search_index

data = [(1, "one"), (2, "two"), (3, "three")]
search_index = build_search_index(data)

fuzzy_search(query, search_index, topn=None)

Performs a fuzzy search on the given query.

  • Parameters:

    • query (str): The search query string.
    • search_index (object): The search index generated by build_search_index.
    • topn (int, optional): Limits the number of results returned. If None, all matching results are returned.
  • Returns:

    • A list of identifiers (from the dataset) ranked by relevance.
  • Example:

from lite_search import fuzzy_search

# Uses the search_index built in the build_search_index example above.
result = fuzzy_search(query="one", search_index=search_index)
print(result)
# Output: [1]

fuzzy_search_internal(query, search_index, topn=None)

Returns detailed information about the matching process, including corrections applied to the query.

  • Parameters:

    • Same as fuzzy_search.
  • Returns:

    • A list of objects containing detailed matching information.
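The structure of the objects returned by fuzzy_search_internal is not documented here. As a rough illustration of what "detailed matching information" can look like, the sketch below uses the standard library's difflib to rank entries and report the edits that separate the query from each hit. The function name, the dictionary fields, and the `min_score` threshold are all assumptions, not the lite_search API.

```python
from difflib import SequenceMatcher


def toy_fuzzy_search_internal(query, data, topn=None, min_score=0.5):
    """Rank (id, text) pairs by similarity to the query (hypothetical helper)."""
    results = []
    q = query.lower()
    for item_id, text in data:
        t = text.lower()
        matcher = SequenceMatcher(None, q, t)
        score = matcher.ratio()
        if score < min_score:
            continue
        # Non-"equal" opcodes describe how the query differs from the entry.
        corrections = [
            (tag, q[i1:i2], t[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]
        results.append({
            "id": item_id,
            "text": text,
            "score": round(score, 3),
            "corrections": corrections,
        })
    # Best matches first; slicing with topn=None keeps every result.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:topn]


data = [(1, "one"), (2, "two"), (3, "three")]
for hit in toy_fuzzy_search_internal("onee", data):
    print(hit)
```

With the misspelled query "onee", only the entry "one" clears the threshold, and its corrections list records the extra trailing letter.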

Usage Examples

Example 1: Basic Fuzzy Search

from lite_search import build_search_index, fuzzy_search

data = [(1, "one"), (2, "two"), (3, "three")]
search_index = build_search_index(data)

result = fuzzy_search(query="one", search_index=search_index)
print(result)  # Output: [1]

Example 2: Fuzzy Search with Transliteration

from lite_search import build_search_index, fuzzy_search

data = [(1, "ван"), (2, "ту"), (3, "срі")]
search_index = build_search_index(data, transliterate_latin=True)

result = fuzzy_search(query="ван", search_index=search_index)
print(result)  # Output: [1]
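How transliterate_latin is implemented is not described here. One plausible reading is that index entries and queries are normalized to a common script before matching, which can be sketched as follows. The mapping table and helper names are assumptions, the table covers only the letters in the sample data, and the lookup is exact rather than fuzzy.

```python
# Partial Cyrillic-to-Latin table, just enough for the sample data
# (an assumption for illustration, not the library's mapping).
CYR_TO_LAT = str.maketrans({
    "а": "a", "в": "v", "і": "i", "н": "n",
    "р": "r", "с": "s", "т": "t", "у": "u",
})


def toy_transliterate(text: str) -> str:
    """Lowercase the text and map known Cyrillic letters to Latin ones."""
    return text.lower().translate(CYR_TO_LAT)


def toy_build_index(data):
    """Map each transliterated string to its identifier (exact lookup only)."""
    return {toy_transliterate(text): item_id for item_id, text in data}


index = toy_build_index([(1, "ван"), (2, "ту"), (3, "срі")])
print(index.get(toy_transliterate("van")))  # Latin query finds id 1
print(index.get(toy_transliterate("ван")))  # Cyrillic query finds id 1
```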

Example 3: Advanced Query Matching

from lite_search import build_search_index, fuzzy_search

data = [
    (1, "Burger Vegan"),
    (2, "Burger with Pork"),
    (3, "Burger with Meat and Garlic"),
]
search_index = build_search_index(data)

query = "burger"
result = fuzzy_search(query=query, search_index=search_index)
print(result)  # Output: [1, 2, 3]

Example 4: Detailed Search Results

from lite_search import fuzzy_search_internal

# Assumes a search_index built as in the previous examples is still in scope.
query = "bollo"
result = fuzzy_search_internal(query=query, search_index=search_index)
for match in result:
    print(match)

Installation

You can easily install flexi-nlp-tools from PyPI using pip:

pip install flexi-nlp-tools

Demo

Check out the live demo of Flexi NLP Tools here:

Flexi NLP Tools Demo


License

This project is licensed under the MIT License - see the LICENSE file for details.
