
Keyword based text extraction Package (keytext)

Project description

keyword based text extraction toolkit (keytext)

What is it?

keytext is a Python package for keyword-based text search, manipulation, and data cleansing. It extracts contextual information around specific keywords, removes unwanted terms from texts and DataFrames, locates the positions of keywords within a Pandas DataFrame, and replaces a single keyword or a set of keywords.

Main Features

Here are just a few of the things that keytext does well:

  • Keyword Positioning: Locate the exact start and end positions of a keyword within a given text, facilitating precise information retrieval.
  • Keyword Frequency: Count the occurrences of a keyword or a set of keywords within a given text.
  • Keyword Replacement: Replace a single keyword or a list of keywords with the corresponding replacement(s) in the given text.
  • Contextual Extraction: Extract the text, characters, words, and sentences to the left and right of a specified keyword, as well as the text between keywords.
  • Flexible Configuration: Customize the number of left and right characters, words, or sentences to tailor the extraction to your specific requirements.
  • Text Between Keywords: Extract the text between two occurrences of the same keyword, offering deeper insights into the context of your data.
  • Word Removal: Efficiently remove a list of specified words from texts, enhancing text cleanliness and relevance.
  • Dataframe Cleansing: Seamlessly remove unwanted words from text columns in Pandas DataFrames, ensuring data integrity.
  • Cell Positioning in DataFrame: Identify the row and column positions of a keyword within a Pandas DataFrame, enabling precise data manipulation.
  • Random Pattern Search: Check for a list of arbitrary patterns or regular expressions within the text data of a DataFrame.
  • Easy Integration: Integrate keytext into your Python projects effortlessly, enhancing your text processing and data cleansing workflows.

Installation Procedure

PyPI
pip install keytext

Dependencies:

Functionalities (with parameter descriptions):

keytext.keywords_occurrences(keywords, text)

- text (str): The input text
- keywords (str or list): The keyword or a list of keywords to count occurrences for
- Returns a dictionary mapping each keyword to its frequency in the text
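The mapping described above can be illustrated with a small standard-library sketch. This is not keytext's implementation: the helper name `count_keywords` is hypothetical, and case-sensitive, non-overlapping substring counting is an assumption made for the example.

```python
def count_keywords(keywords, text):
    """Hypothetical stand-in: map each keyword to its substring count in text."""
    # Accept a single keyword or a list, as the parameter description allows.
    if isinstance(keywords, str):
        keywords = [keywords]
    # str.count counts non-overlapping, case-sensitive matches (an assumption).
    return {kw: text.count(kw) for kw in keywords}


counts = count_keywords(["data", "text"], "text data and more text data")
print(counts)  # {'data': 2, 'text': 2}
```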

keytext.replace_keywords(keywords, replacements, text)

- text (str): The input text
- keywords (str or list): The keyword or list of keywords to be replaced
- replacements (str or list): The replacement string or list of replacement strings corresponding to the keyword(s)
- Returns the text with the replacements applied
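To show how keywords pair up with replacements, here is a rough sketch using plain `str.replace`. The helper `replace_all` is hypothetical, and two behaviours are assumptions rather than documented keytext behaviour: a single replacement string is applied to every keyword, and replacements run one after another in list order.

```python
def replace_all(keywords, replacements, text):
    """Hypothetical stand-in: replace each keyword with its paired replacement."""
    if isinstance(keywords, str):
        keywords = [keywords]
    if isinstance(replacements, str):
        # Assumption: one replacement string is reused for every keyword.
        replacements = [replacements] * len(keywords)
    if len(keywords) != len(replacements):
        raise ValueError("keywords and replacements must have the same length")
    # Replacements are applied sequentially, so later pairs see earlier results.
    for kw, rep in zip(keywords, replacements):
        text = text.replace(kw, rep)
    return text


result = replace_all(["cat", "dog"], ["bird", "fish"], "cat chases dog")
print(result)  # bird chases fish
```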

keytext.keypos_text(keyword, text)

- text (str): The input text
- keyword (str): The keyword to search for
- Returns the start and end positions of every occurrence of the keyword in the text
- The output is a list of tuples
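A list of (start, end) tuples of this kind can be produced with `re.finditer`. The sketch below is only an illustration: `keyword_positions` is a hypothetical name, and the end index being exclusive (Python slice convention) is an assumption, not something the description above states.

```python
import re


def keyword_positions(keyword, text):
    """Hypothetical stand-in: (start, end) of every literal keyword match."""
    # re.escape makes the keyword match literally rather than as a regex.
    return [(m.start(), m.end()) for m in re.finditer(re.escape(keyword), text)]


positions = keyword_positions("ab", "ab cd ab")
print(positions)  # [(0, 2), (6, 8)]
```

With exclusive end indices, `text[start:end]` recovers the keyword itself.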

keytext.extract_sents(keyword, text, format)

- text (str): The input text
- keyword (str): The keyword to search for in the sentences of the given text
- format (str): The output format. The default, l, returns a list of sentences; p returns the matching sentences as a paragraph.
- Extracts all sentences from the given text that contain the keyword
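The two output formats can be sketched as follows. This is an approximation under stated assumptions: sentences are split naively after `.`, `!` or `?`, matching is a case-sensitive substring test, and the helper `sentences_with` and its `fmt` parameter (renamed to avoid shadowing the built-in `format`) are hypothetical.

```python
import re


def sentences_with(keyword, text, fmt="l"):
    """Hypothetical stand-in: sentences containing keyword, as list or paragraph."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = [s for s in sentences if keyword in s]
    # "p" joins the matching sentences into one paragraph; anything else gives a list.
    return " ".join(hits) if fmt == "p" else hits


sample = "Rain is wet. Sun is hot. Rain falls."
print(sentences_with("Rain", sample))       # ['Rain is wet.', 'Rain falls.']
print(sentences_with("Rain", sample, "p"))  # Rain is wet. Rain falls.
```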

keytext.extract_words(keyword, text, left, right)

- text (str): The input text
- keyword (str): The keyword to search for in the given text
- left (int): The number of words to take from the left side of the keyword
- right (int): The number of words to take from the right side of the keyword
- Extracts the words neighbouring the keyword in the given text.
- With left = 0, right = n, it returns n words from the right side of the keyword
- With left = m, right = 0, it returns m words from the left side of the keyword
- With left = m, right = n, it returns m words from the left and n words from the right of the keyword
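The three cases listed above reduce to slicing a token list around each match. The sketch below assumes a single-word keyword, whitespace tokenisation, and exact token equality; `neighbour_words` is a hypothetical helper, and its return shape (a list of `(before, after)` pairs) is chosen for illustration only.

```python
def neighbour_words(keyword, text, left, right):
    """Hypothetical stand-in: words around each exact-token match of keyword."""
    words = text.split()
    windows = []
    for i, word in enumerate(words):
        if word == keyword:
            # max(0, ...) stops the left window from wrapping past the start.
            before = words[max(0, i - left):i]
            after = words[i + 1:i + 1 + right]
            windows.append((before, after))
    return windows


text = "the quick brown fox jumps over the lazy dog"
print(neighbour_words("fox", text, 2, 1))  # [(['quick', 'brown'], ['jumps'])]
print(neighbour_words("fox", text, 0, 2))  # [([], ['jumps', 'over'])]
```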

keytext.extract_chr(keyword, text, left_chr, right_chr)

- text (str): The input text
- keyword (str): The keyword to search for in the given text
- left_chr (int): The number of characters to take from the left side of the keyword
- right_chr (int): The number of characters to take from the right side of the keyword
- Extracts the characters neighbouring the keyword in the given text.
- With left_chr = 0, right_chr = n, it returns n characters from the right side of the keyword
- With left_chr = m, right_chr = 0, it returns m characters from the left side of the keyword
- With left_chr = m, right_chr = n, it returns m characters from the left and n characters from the right of the keyword
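At character level the same windowing works on string offsets instead of tokens. As a hedged illustration (the helper `neighbour_chars` and its tuple-based return value are hypothetical, not keytext's output format):

```python
import re


def neighbour_chars(keyword, text, left_chr, right_chr):
    """Hypothetical stand-in: characters on each side of every keyword match."""
    out = []
    for m in re.finditer(re.escape(keyword), text):
        # Clamp at 0 so a large left_chr cannot produce a negative slice index.
        before = text[max(0, m.start() - left_chr):m.start()]
        after = text[m.end():m.end() + right_chr]
        out.append((before, after))
    return out


print(neighbour_chars("key", "the key word", 2, 3))  # [('e ', ' wo')]
```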

keytext.left_texts(keyword, text, occurrence)

- text (str): The input text
- keyword (str): The keyword to search for in the given text
- occurrence (int or str): Which occurrence of the keyword to use; must be 1, 2, ..., n or 'all'
- Returns the text to the left of the keyword, i.e. from the beginning of the text up to the keyword
- If occurrence is 1 or 2, it returns the text to the left of the 1st or 2nd occurrence of the keyword
- If occurrence is "all", the output is returned as a list

keytext.right_texts(keyword, text, occurrence)

- text (str): The input text
- keyword (str): The keyword to search for in the given text
- occurrence (int or str): Which occurrence of the keyword to use; must be 1, 2, ..., n or 'all'. Occurrence refers to the repetition of the keyword in the text
- Returns the text to the right of the keyword, i.e. from the keyword to the end of the text
- If occurrence is 1, it returns the text to the right of the 1st occurrence of the keyword
- If occurrence is "all", the output is returned as a list
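The occurrence parameter of left_texts and right_texts can be pictured with one small helper. The sketch assumes 1-based occurrence numbers and that the returned text excludes the keyword itself; `side_text` and its `side` argument are hypothetical and exist only to cover both directions in one example.

```python
import re


def side_text(keyword, text, occurrence, side):
    """Hypothetical helper: text to the 'left' or 'right' of keyword matches."""
    spans = [(m.start(), m.end()) for m in re.finditer(re.escape(keyword), text)]
    # Left text runs from the start of the text; right text runs to its end.
    pieces = [text[:s] if side == "left" else text[e:] for s, e in spans]
    if occurrence == "all":
        return pieces
    # occurrence is 1-based: 1 selects the first match.
    return pieces[occurrence - 1]


text = "a X b X c"
print(side_text("X", text, 1, "left"))      # 'a '
print(side_text("X", text, 2, "right"))     # ' c'
print(side_text("X", text, "all", "left"))  # ['a ', 'a X b ']
```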

keytext.between_fixed_keyword(keyword, text)

- text (str): The input text
- keyword (str): The keyword that repeats in the given text
- Returns the parts of the text between two occurrences of the same keyword
- The output is a list
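One way to picture "text between two occurrences of the same keyword" is to split on the keyword and drop the outer pieces. This is a sketch of the idea only; `between_same_keyword` is a hypothetical name, and keeping surrounding whitespace in the segments is an assumption.

```python
def between_same_keyword(keyword, text):
    """Hypothetical stand-in: segments lying between consecutive keyword matches."""
    parts = text.split(keyword)
    # The first and last parts lie outside any pair of keywords, so drop them.
    return parts[1:-1]


print(between_same_keyword("|", "a|b|c|d"))  # ['b', 'c']
print(between_same_keyword("|", "a|b"))      # []
```

With fewer than two occurrences there is nothing between keywords, so the list is empty.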

keytext.between_distinct_keywords(keyword_start, keyword_end, text, keyword_start_occurence, keyword_end_occurence)

- text (str): The input text
- keyword_start (str): The starting keyword
- keyword_end (str): The ending keyword, which should be different from the starting keyword
- keyword_start_occurence (int): Indicates which repetition of the starting keyword to use in the given string
- keyword_end_occurence (int): Indicates which repetition of the ending keyword to use in the given string
- Returns the part of the text between the two distinct keywords
- The output is a list
- To get all such text snippets as a list, pass keyword_start_occurence = 0 and keyword_end_occurence = 0
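The occurrence arguments leave some room for interpretation, so the following is a sketch under explicit assumptions rather than a description of keytext's behaviour: with both occurrences set to 0, each start keyword is paired with the nearest following end keyword; otherwise the n-th start and m-th end (1-based) delimit a single segment. The helper `between_keywords` and its shortened parameter names are hypothetical.

```python
import re


def between_keywords(start_kw, end_kw, text, start_occ=0, end_occ=0):
    """Hypothetical stand-in: text between a start keyword and an end keyword."""
    if start_occ == 0 and end_occ == 0:
        # Non-greedy group: each start keyword up to the nearest following end.
        pattern = re.escape(start_kw) + r"(.*?)" + re.escape(end_kw)
        return re.findall(pattern, text, flags=re.DOTALL)
    if start_occ < 1 or end_occ < 1:
        raise ValueError("occurrences must both be 0 or both be >= 1")
    starts = [m.end() for m in re.finditer(re.escape(start_kw), text)]
    ends = [m.start() for m in re.finditer(re.escape(end_kw), text)]
    begin, stop = starts[start_occ - 1], ends[end_occ - 1]
    # An end keyword that precedes the chosen start yields no segment.
    return [text[begin:stop]] if begin <= stop else []


text = "START one STOP START two STOP"
print(between_keywords("START", "STOP", text))        # [' one ', ' two ']
print(between_keywords("START", "STOP", text, 2, 2))  # [' two ']
```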

keytext.text_keyword_remover(remover_list, text, replaced_by)

- text (str): The input text
- remover_list (list): List of keywords and regex patterns to be removed
- replaced_by (str): The string that replaces each unwanted keyword or pattern, for example a space (" ")
- Non-alphanumeric characters must be written in regex format
- Returns the text after removing the unwanted keywords or patterns
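The note about writing non-alphanumeric characters in regex format matters because characters such as `$` have special meaning in a pattern. A minimal sketch with `re.sub` shows the effect; `remove_terms` is a hypothetical helper, and treating every list entry as a regular expression is an assumption about how such a remover might work.

```python
import re


def remove_terms(remover_list, text, replaced_by=" "):
    """Hypothetical stand-in: substitute every listed pattern with replaced_by."""
    for pattern in remover_list:
        # Each entry is treated as a regex, so "$" must be written as r"\$".
        text = re.sub(pattern, replaced_by, text)
    return text


cleaned = remove_terms(["foo", r"\d+", r"\$"], "foo costs $12", "")
print(repr(cleaned))  # ' costs '
```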

keytext.text_pattern_finder(pattern_list, text)

- text (str): The input text
- pattern_list (list): List of regex patterns to search for within the text
- Returns each matched word together with its location
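"Matched word with location" can be sketched as a list of `(match, start, end)` tuples built from `re.finditer`. The exact return format of keytext is not stated above, so the tuple layout here is an assumption and `find_patterns` is a hypothetical name.

```python
import re


def find_patterns(pattern_list, text):
    """Hypothetical stand-in: (matched text, start, end) for every pattern match."""
    return [
        (m.group(), m.start(), m.end())
        for pattern in pattern_list
        for m in re.finditer(pattern, text)
    ]


found = find_patterns([r"\d{4}", r"[A-Z]{2}"], "ID AB 2024")
print(found)  # [('2024', 6, 10), ('ID', 0, 2), ('AB', 3, 5)]
```

Results are grouped by pattern in list order, not sorted by position in the text.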

keytext.keypos_df(keyword, dataframe)

- dataframe (dataframe): The input table
- keyword (str): The keyword to search for in the dataframe
- Returns the positions of all cells in the given dataframe that contain the keyword
- The output is a list of tuples

keytext.dataframe_keyword_remover(remover_list, dataframe, replaced_by)

- dataframe (dataframe): The input table
- remover_list (list): List of keywords and regex patterns to be removed
- replaced_by (str): The string that replaces each unwanted keyword or pattern, for example a space (" ")
- Removes the keywords from the dataframe
- Non-alphanumeric characters must be written in regex format

keytext.dataframe_pattern_finder(pattern, dataframe)

- dataframe (dataframe): The input table
- pattern (str): The regex pattern or list of regex patterns to search for within the dataframe
- Finds the regex patterns in the dataframe
- Returns each matched word together with the identity of the cell that contains it

Contributing to keytext

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. Feel free to ask questions on the mailing list.

Change Log

0.1 (03/01/2024)

  • First Release

0.2 (03/01/2024)

  • Second Release

0.3 (03/01/2024)

  • Third Release

0.4 (04/01/2024)

  • Fourth Release

0.5 (24/01/2024)

  • Fifth Release

