Skip to main content

Teste dos inputs na API de FuzzyMatch.

Project description

Project Description

blipfuzzytest

blipfuzzytest is a tool for making Take’s Fuzzy Match API easier to use when testing. It can be used to process inputs using both endpoints of the Fuzzy Match API (“menu” and “map”, detailed below), returning the response’s parameters associated with entry data. blipfuzzytest can deal with single inputs (string), iterables (list, pd.Series, etc) and a specific form of .csv input file (detailed below).

Fuzzy Match API Endpoints

There are two endpoints available in Fuzzy Match API, which use different formats for defining what the text input will be compared to. Both endpoints can be used in blipfuzzytest, and they are:

  • Menu: Receives a list of menu options with or without number markings and compares a given input to the text of these options.

For example:

Both “1. Option A, 2. Option B, (...), N. Option N” and “Option A, Option B, (...), Option N” are valid and will both make it so text inputs are compared to “Option A”, then “Option B” all the way to “Option N”, as the number markings are omitted before comparison.

  • Map: Receives a dictionary-like structure comprising Categories and their respective Elements, and compares a given input to each Categories’ Elements.

For example:

{
“category1” : [“element1_1”, “element1_2”, (...), “element1_N”],
“category2” : [“element2_1”, “element2_2”],
(...),
“categoryN” : [“elementN_1”, “elementN_2”, “elementN_3”]
}

This is a valid map and will make it so text inputs are compared to all of the Elements of each Category

Fuzzy Methods

Some of the functions described below require a Fuzzy Method to be informed, which determines the type of comparison Fuzzy Match API makes between the input and the menus/maps given. The methods are:

  • default
  • ratio
  • partialRatio
  • tokenSet
  • partialTokenSet
  • tokenSort
  • partialTokenSort
  • tokenAbbreviation
  • partialTokenAbbreviation

A full explanation of some of these methods is available here.

Testing File

Some of the functions described below require a test file to be given. This test file must be saved in the .csv extension and have the following format:

input expectedCategory categories elementsCat1 elementsCat2 ... elementsCatN

The expected content of the file’s columns is:

  • input: The text inputs that will be processed

* **expectedCategory:** What is the expected/correct outcome of the fuzzy matching process. This field enables blipfuzzytest to automatically check if the expected outcome was achieved, and give this information to the user. * * When using the “menu” endpoint, this field can be either the text of the menu option (without number markings) or the number of the option, if number markings are present in the list of menu options. (e.g. “Option B” or “2”) * * When using the “map” endpoint, this field can be either the name of a category or the name of a specific element. (e.g. “categoryN” or “elementN_1”) Note that expecting a specific element will make it so that the identification of another element of the same category is registered as a failure.
* **categories:** * * When using the “menu” endpoint, this field contains the menu options used for matching, separated by “, ” or “,”. (e.g. “1. Option A, 2. Option B, (...), N. Option N”) * * When using the “map” endpoint, this field contains the categories that compose your full map, separated by “, ” or “,”. (e.g. “category1, category2, (...), categoryN”)
* **elementsCat1, (...), elementsCatN:** * * When using the “map” endpoint, each of these columns contains the elements of one of the map’s categories, in order, separated by “, ” or “,”. (e.g. “element1_1, element1_2, (...), element1_N” in the column elementsCat1, “element2_1, element2_2” in the column elementsCat2 and so on) * * When using the “menu” endpoint, you must leave these fields empty.

Note 1: The order of the columns is important for blipfuzzytest to function, although the names are not. Note 2: You can have rows configured for both the “menu” and “map” endpoints in the same testing file.

Instancing the Class:

In order to use blipfuzzytest, you will have to instance the class, providing a Blip Botkey and an Organization ID (both str):

FT = blipfuzzytest("authorization key", "organization id")

Functions:

blipfuzzytest.run_onemethod_file(file_df, method, score_threshold): using a single chosen Fuzzy Method, returns results for all inputs in a given testing file

  • file_df: pd.core.frame.DataFrame file_df is the pandas DataFrame that needs to be processed, created from the Testing File described above

  • method: str method is the Fuzzy Method to be used for processing

  • score_threshold: int score_threshold is the minimum score value that an identification must have to be considered reliable

blipfuzzytest.run_allmethods_file(file_df, score_threshold): using all Fuzzy Methods, returns results for all inputs in a given testing file

  • file_df: pd.core.frame.DataFrame file_df is the pandas DataFrame that needs to be processed, created from the Testing File described above

  • score_threshold: int score_threshold is the minimum score value that an identification must have to be considered reliable

blipfuzzytest.run_onemethod_map(inputs, method, score_threshold, map): using a single chosen Fuzzy Method, returns results for all inputs in a given iterable (using the “map” endpoint)

  • inputs: list or pd.core.series.Series inputs is the iterable containing the text inputs that need to be processed

  • method: str method is the Fuzzy Method to be used for processing

  • score_threshold: int score_threshold is the minimum score value that an identification must have to be considered reliable

  • map: dict map is the dictionary containing the categories and elements to be used for processing

blipfuzzytest.run_allmethods_map(inputs, score_threshold, map): using all Fuzzy Methods, returns results for all inputs in a given iterable (using the “map” endpoint)

  • inputs: list or pd.core.series.Series inputs is the iterable containing the text inputs that need to be processed

  • score_threshold: int score_threshold is the minimum score value that an identification must have to be considered reliable

  • map: dict map is the dictionary containing the categories and elements to be used for processing

blipfuzzytest.test_onemethod_menu(menu, userInput, score, method): using a single chosen Fuzzy Method, returns the result for a single input (using the “menu” endpoint)

  • menu: list menu is the list containing the menu options to be used for processing

  • userInput: str userInput is the input to be used for processing

  • score: int score is the minimum score value that an identification must have to be considered reliable

  • method: str method is the Fuzzy Method to be used for processing

blipfuzzytest.test_allmethods_menu(menu, userInput, score): using all Fuzzy Methods, returns the results for a single input (using the “menu” endpoint)

  • menu: list menu is the list containing the menu options to be used for processing

  • userInput: str userInput is the input to be used for processing

  • score: int score is the minimum score value that an identification must have to be considered reliable

blipfuzzytest.run_onemethod_menu(menu, inputs, score, method): using a single chosen Fuzzy Method, returns results for all inputs in a given iterable (using the “menu” endpoint)

  • menu: list menu is the list containing the menu options to be used for processing

  • inputs: list or pd.core.series.Series inputs is the iterable containing the text inputs that need to be processed

  • score: int score_threshold is the minimum score value that an identification must have to be considered reliable

  • method: str method is the Fuzzy Method to be used for processing

blipfuzzytest.run_allmethods_menu(menu, inputs, score): using all Fuzzy Methods, returns results for all inputs in a given iterable (using the “menu” endpoint)

  • menu: list menu is the list containing the menu options to be used for processing

  • inputs: list or pd.core.series.Series inputs is the iterable containing the text inputs that need to be processed

  • score: int score_threshold is the minimum score value that an identification must have to be considered reliable

Installation

Use the package manager pip to install blipfuzzytest

pip install blipfuzzytest==0.1.0

Usage

Using the Testing File:

from blipfuzzytest.blipfuzzytest import *
import pandas as pd

testing_file_df = pd.read_csv("C:/Documents/blipfuzzytestingFile.csv")

FT = blipfuzzytest("authorization key", "organization id")

result = FT.run_onemethod_file(testing_file_df, "ratio", 0)
print(result)

Using an iterable:

from blipfuzzytest.blipfuzzytest import *

input_list = ["input1", "input2", "input3"]

my_map = {"category1" : ["element1", "element2"], "category2" : ["element3"]}

FT = blipfuzzytest("authorization key", "organization id")

result = FT.run_allmethods_map(input_list, 0, my_map)
print(result)

##Authors

Caio Souza Diogo Machado

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

blipfuzzytest-0.0.2.tar.gz (6.5 kB view details)

Uploaded Source

Built Distribution

blipfuzzytest-0.0.2-py3-none-any.whl (6.7 kB view details)

Uploaded Python 3

File details

Details for the file blipfuzzytest-0.0.2.tar.gz.

File metadata

  • Download URL: blipfuzzytest-0.0.2.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for blipfuzzytest-0.0.2.tar.gz
Algorithm Hash digest
SHA256 46fc7a279c7bd7d9a0fe569c9e6da3a26917405c6f996dc10ff171e2f9ad6b4a
MD5 e4d896be71776232ca2f63c6641d67bc
BLAKE2b-256 06e9a6b1b3a38b055dc1b870cc9667e75ec4d49a2e15665b174617e294943718

See more details on using hashes here.

File details

Details for the file blipfuzzytest-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for blipfuzzytest-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3557b64d6642690383fddd510bccf662b0e3759fbee8dcd4e4b486b0d01d2c7e
MD5 6e2ac4074ebc3e7ca02689b765999ff3
BLAKE2b-256 08f09173884ba9a1912025138a13f695687f672a45a07fd09e4eedba856853d3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page