search for keywords and their context

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Keyword Context Finder

Searching for context doesn't have to be a chore!

A Python utility for finding keywords and their surrounding context in MangoCR markdown files. This tool supports fuzzy matching and provides flexible context extraction around matched terms.

Features

Fuzzy string matching for approximate keyword finding
Customizable context window sizes (before, after, and around matches)
Page number tracking for MangoCR formatted documents
Adjustable similarity threshold for matches
Returns results in a pandas DataFrame for easy analysis
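As a rough illustration of threshold-based fuzzy matching, the sketch below scores each word against a search term and keeps those at or above a cutoff. It is not the package's implementation: it uses the standard library's difflib.SequenceMatcher as a stand-in for rapidfuzz, so the scores will differ from what the tool reports, and the helper names similarity and fuzzy_find are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings (case-insensitive)."""
    # Stand-in scorer: the package itself depends on rapidfuzz, not difflib.
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_find(words, term, threshold=80):
    """Return (index, word, score) for every word scoring at or above threshold."""
    hits = []
    for i, word in enumerate(words):
        score = similarity(word, term)
        if score >= threshold:
            hits.append((i, word, round(score, 1)))
    return hits


words = "This page mentions keywords and a keywrod typo".split()
matches = fuzzy_find(words, "keyword", threshold=80)
for index, word, score in matches:
    print(index, word, score)
```

Raising the threshold narrows the results to closer spellings; lowering it admits more typos at the cost of more false positives.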

Installation

pip install fuzzy-context-finder
pip install pandas rapidfuzz regex

Dependencies

Python 3.6+
pandas
rapidfuzz
regex

Usage

from fuzzy_context_finder import keyword_context_finder

# Example usage
content = """
## document_page_1
This is the content of page one with some keywords.
## document_page_2
More content on page two with different keywords.
"""

search_terms = ["keyword", "content"]
file_name = "example_document.pdf"

results = keyword_context_finder(
    content=content,
    terms=search_terms,
    file_name=file_name,
    words_before=250,
    words_after=250,
    words_around=50,
    match_threshold=80
)

Parameters

content (str): Document content with pages separated by MangoCR markers (## filename_page_number)
terms (list): List of search terms to find in the document
file_name (str): Name of the file being processed
words_before (int, default=250): Number of words to capture before the term
words_after (int, default=250): Number of words to capture after the term
words_around (int, default=50): Number of words to capture around the term
match_threshold (int, default=80): Minimum similarity score (0-100) for fuzzy matching
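One way to picture the three window parameters is as slices over a list of words. The sketch below is an assumption about how such windows could be computed, not the package's code: the helper context_windows is hypothetical, and it assumes words_around counts words on each side of the match. Slice bounds are clamped so a match near the start or end of a page simply yields a shorter window.

```python
def context_windows(words, index, words_before=250, words_after=250, words_around=50):
    """Slice three context windows around words[index], clamped to the list bounds."""
    start_before = max(0, index - words_before)
    end_after = min(len(words), index + 1 + words_after)
    # Assumption: words_around is applied on both sides of the matched word.
    start_around = max(0, index - words_around)
    end_around = min(len(words), index + 1 + words_around)
    return {
        "before": " ".join(words[start_before:index]),
        "after": " ".join(words[index + 1:end_after]),
        "around": " ".join(words[start_around:end_around]),
    }


words = "one two three four five six seven".split()
ctx = context_windows(words, index=3, words_before=2, words_after=10, words_around=1)
print(ctx["before"])  # two three
print(ctx["after"])   # five six seven
print(ctx["around"])  # three four five
```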

Return Value

Returns a pandas DataFrame with the following columns:

File Name
Page Marker
Page Number
Matched Term
Original Term
Similarity Score
Search Term with Context (configurable width)
Previous Words Context
Next Words Context

Returns None if no matches are found.
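Because the function returns None rather than an empty DataFrame when nothing matches, calling a method such as head() on the result without a check would raise an AttributeError. A minimal guard might look like the following; has_matches is a hypothetical helper, and the results variable is a stand-in so the snippet runs without the package installed.

```python
def has_matches(results) -> bool:
    """True when results is a non-empty container (a DataFrame, a list, ...)."""
    return results is not None and len(results) > 0


# Stand-in for the value keyword_context_finder returns when nothing matches.
results = None

if has_matches(results):
    summary = f"{len(results)} matches"
else:
    summary = "no matches"
print(summary)
```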

Example Output

>>> results.head()
   File Name    Page Marker  Page Number  Matched Term  Original Term  Similarity Score  ...
0  example.md  document_p_1          1      keyword       keyword              100      ...

Document Format Requirements

The tool expects documents to follow the MangoCR format with page markers:

## filename_page_1
Content for page 1
## filename_page_2
Content for page 2
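To make the expected layout concrete, here is a small sketch that splits content on these page markers and skips pages with no text. The regular expression and the split_pages helper are illustrative assumptions based on the marker format shown above, not the package's own parser.

```python
import re

# Assumed marker shape: "## <name>_page_<number>" alone on a line.
PAGE_MARKER = re.compile(
    r"^##[ \t]+(?P<marker>.+_page_(?P<number>\d+))[ \t]*$", re.MULTILINE
)


def split_pages(content):
    """Return (marker, page_number, text) tuples, skipping pages with no text."""
    pages = []
    markers = list(PAGE_MARKER.finditer(content))
    for i, m in enumerate(markers):
        # A page runs from the end of its marker to the start of the next one.
        end = markers[i + 1].start() if i + 1 < len(markers) else len(content)
        text = content[m.end():end].strip()
        if text:
            pages.append((m.group("marker"), int(m.group("number")), text))
    return pages


content = """
## filename_page_1
Content for page 1
## filename_page_2

## filename_page_3
Content for page 3
"""
pages = split_pages(content)
for marker, number, text in pages:
    print(marker, number, text)
```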

Error Handling

Empty pages are automatically skipped
Returns None if no matches are found
Handles out-of-bounds context windows gracefully

Contributing

Feel free to open issues or submit pull requests with improvements.

License

MIT License


Release history

0.1.2 (this version): Nov 25, 2024
0.1.1: Nov 25, 2024
0.1.0: Nov 25, 2024

Download files

Source Distribution

fuzzy_context_finder-0.1.2.tar.gz (4.3 kB)

Uploaded Nov 25, 2024 Source

Built Distribution

fuzzy_context_finder-0.1.2-py3-none-any.whl (5.0 kB)

Uploaded Nov 25, 2024 Python 3

File details

Details for the file fuzzy_context_finder-0.1.2.tar.gz.

File metadata

Download URL: fuzzy_context_finder-0.1.2.tar.gz
Upload date: Nov 25, 2024
Size: 4.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for fuzzy_context_finder-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`17f528e68756b124d0ea12fc7f07c6063c0e3c97848dd19a61b52bb2b7a2e4a9`
MD5	`37446d85daf1314470f08998e2c648c1`
BLAKE2b-256	`70732661de8756d29730a6e737c0352c7858f1cb4a4425c3a366f8e65faa4c56`

File details

Details for the file fuzzy_context_finder-0.1.2-py3-none-any.whl.

File metadata

Download URL: fuzzy_context_finder-0.1.2-py3-none-any.whl
Upload date: Nov 25, 2024
Size: 5.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for fuzzy_context_finder-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`55da05bc3e925a92c318a7a1661ebaa61a0b8a60bbae787cd3e5c28b7050a317`
MD5	`4415500010ea5372267c3808310d8ea3`
BLAKE2b-256	`114b63789f16bb9a6c1c60228e1df509dcde41dbfaa65147809ad7cce19ef235`
