Skip to main content

A Python package for parsing and evaluating boolean text queries

Project description

Boolean Query Parser

A Python package for parsing and evaluating complex boolean text queries with support for AND, OR, NOT operations, parentheses for nesting, and regular expression pattern matching.

Features

  • Boolean operators: AND, OR, NOT
  • Parentheses for grouping and complex nested expressions
  • Regular expression pattern matching with support for:
    • Regular expression flags (i for case-insensitive, m for multiline, s for dotall, x for verbose)
    • Complex patterns including capture groups, lookaheads, and lookbehinds
    • Special character escaping
  • Simple, intuitive query syntax
  • Comprehensive error handling

Installation

From PyPI

pip install boolean-query-parser

From Source

Clone the repository and install using pip:

git clone https://github.com/yourusername/boolean_query_parser.git
cd boolean_query_parser
pip install .

Usage

Basic Example

from boolean_query_parser import parse_query, apply_query

# Define some sample text data
documents = [
    "The quick brown fox jumps over the lazy dog",
    "Python is a programming language",
    "The Python programming language is powerful and easy to learn",
    "Regular expressions can be complex but useful"
]

# Parse a query
query = 'Python AND programming AND NOT complex'
parsed_query = parse_query(query)

# Apply the query to filter documents
matching_documents = [doc for doc in documents if apply_query(parsed_query, doc)]

# Print results
for doc in matching_documents:
    print(doc)

Output:

Python is a programming language
The Python programming language is powerful and easy to learn

Advanced Example with Nested Expressions

from boolean_query_parser import parse_query, apply_query

# Parse a complex query with parentheses and multiple operations
query = '(Python OR programming) AND (language OR easy) AND NOT (complex OR difficult)'
parsed_query = parse_query(query)

# Sample text data
documents = [
    "Python is a great language for beginners",
    "Programming can be complex and difficult at times",
    "Python makes programming tasks easy to accomplish",
    "This text has nothing relevant"
]

# Apply the query
for doc in documents:
    if apply_query(parsed_query, doc):
        print(f"Match: {doc}")
    else:
        print(f"No match: {doc}")

Using Regular Expressions

from boolean_query_parser import parse_query, apply_query

# Parse a query with regex patterns
query = '/py.*on/i AND NOT /difficult/'
parsed_query = parse_query(query)

documents = [
    "Python is easy to learn",
    "python programming is fun",
    "This is difficult Python code",
    "PyThOn is case-insensitive in this example"
]

# Apply the query
for doc in documents:
    if apply_query(parsed_query, doc):
        print(f"Match: {doc}")

Using Regular Expression Flags

from boolean_query_parser import parse_query, apply_query

# Case-insensitive matching with 'i' flag
query = '/python/i'
parsed_query = parse_query(query)
print(apply_query(parsed_query, "This contains PYTHON"))  # True

# Multiline matching with 'm' flag
multiline_text = "First line\nSecond line with python\nThird line"
query = '/^Second.*python$/m'
parsed_query = parse_query(query)
print(apply_query(parsed_query, multiline_text))  # True

# Dot-all mode with 's' flag (dot matches newlines)
text_with_newlines = "Start\nMiddle\nEnd"
query = '/Start.*End/s'
parsed_query = parse_query(query)
print(apply_query(parsed_query, text_with_newlines))  # True

Complex Regex Patterns

from boolean_query_parser import parse_query, apply_query

# Email validation with regex
email_pattern = '/([A-Za-z0-9]+[._-])*[A-Za-z0-9]+@[A-Za-z0-9-]+(\\.[A-Za-z]{2,})/'
email_query = parse_query(email_pattern)

# HTML tag matching with capture groups and backreferences
html_pattern = '/\\<([a-z][a-z0-9]*)(\\s[^\\>]*)?\\>([^\\<]*)\\<\\/\\1\\>/i'
html_query = parse_query(html_pattern)

# Password validation with lookaheads
password_pattern = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d).{8,}$/'
password_query = parse_query(password_pattern)

# Test them
print(apply_query(email_query, "Contact us at info@example.com"))  # True
print(apply_query(html_query, "<div>Content</div>"))  # True
print(apply_query(password_query, "Password123"))  # True

API Documentation

parse_query(query_str: str) -> Union[dict, str]

Parses a boolean query string into a structured representation.

Parameters:

  • query_str (str): The boolean query string to parse.

Returns:

  • A nested dictionary structure representing the parsed query.

Raises:

  • SyntaxError: If the query has invalid syntax or mismatched parentheses.

Query Syntax:

  • Boolean operators: AND, OR, NOT
  • Terms can be wrapped in quotes for exact matching: "exact phrase"
  • Regular expressions can be specified with forward slashes: /pattern/
  • Regular expressions can include flags: /pattern/i (i=case-insensitive, m=multiline, s=dotall, x=verbose)
  • Parentheses can be used for grouping expressions

apply_query(parsed_query: Union[dict, str], text: str) -> bool

Applies a parsed query to a text string and returns whether the text matches the query.

Parameters:

  • parsed_query (Union[dict, str]): The parsed query structure from parse_query.
  • text (str): The text to match against the query.

Returns:

  • bool: True if the text matches the query, False otherwise.

Real-World Use Cases

Log Analysis

Parse through server logs to find specific error patterns:

from boolean_query_parser import parse_query, apply_query
import glob

# Query to find critical errors related to database but not connection timeouts
query = '(ERROR OR CRITICAL) AND database AND NOT "connection timeout"'
parsed_query = parse_query(query)

# Process log files
matching_logs = []
for log_file in glob.glob('/var/log/application/*.log'):
    with open(log_file, 'r') as f:
        for line in f:
            if apply_query(parsed_query, line):
                matching_logs.append(line.strip())

print(f"Found {len(matching_logs)} matching log entries")

Document Classification

Categorize documents based on their content:

from boolean_query_parser import parse_query, apply_query

# Define category queries
categories = {
    'finance': parse_query('(banking OR investment OR financial) AND NOT (gaming OR entertainment)'),
    'technology': parse_query('(programming OR software OR hardware OR "machine learning") AND NOT financial'),
    'health': parse_query('(medical OR health OR doctor OR patient) AND NOT (technology OR finance)')
}

# Function to classify a document
def classify_document(text):
    results = []
    for category, query in categories.items():
        if apply_query(query, text):
            results.append(category)
    return results or ['uncategorized']

Email Filtering example

Filter emails based on complex patterns:

from boolean_query_parser import parse_query, apply_query

# Query to find emails that:
# 1. Have attachments (mention .pdf, .doc, etc.)
# 2. Are not from known domains
# 3. Contain specific keywords in the subject
query = parse_query('(/\\.pdf/i OR /\\.doc/i OR /\\.docx/i) AND NOT /from:.*@(company\\.com|trusted\\.org)/ AND /subject:.*urgent/i')

# Apply to email bodies
def filter_suspicious_emails(emails):
    return [email for email in emails if apply_query(query, email)]

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

boolean_query_parser-1.0.1.tar.gz (14.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

boolean_query_parser-1.0.1-py3-none-any.whl (13.0 kB view details)

Uploaded Python 3

File details

Details for the file boolean_query_parser-1.0.1.tar.gz.

File metadata

  • Download URL: boolean_query_parser-1.0.1.tar.gz
  • Upload date:
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for boolean_query_parser-1.0.1.tar.gz
Algorithm Hash digest
SHA256 8ba6b33ee46d41155471adfccf6de92ab57861bde535562d16bbcf1db135a720
MD5 debe62ccac8684d0654d06bc5c2c15d9
BLAKE2b-256 4fbf89fe23acd6be1d4925500996d6ba454f9ccaf0ff3802a3f5c1a6bfe9b61b

See more details on using hashes here.

File details

Details for the file boolean_query_parser-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for boolean_query_parser-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 21376cda76eca289fad53ba52cc00e80f213649b521481b4140f84d7a1d929bc
MD5 60550f1a10c9fcc8d5aa3abc8a637bab
BLAKE2b-256 4316875ef24dfbaa43cde005d60cc419925f8d67c0b360fa6861aa405cdfdf49

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page