A Python package for parsing and evaluating boolean text queries
Boolean Query Parser
A Python package for parsing and evaluating complex boolean text queries with support for AND, OR, NOT operations, parentheses for nesting, and regular expression pattern matching.
Features
- Boolean operators: AND, OR, NOT
- Parentheses for grouping and complex nested expressions
- Regular expression pattern matching with support for:
  - Regular expression flags (i for case-insensitive, m for multiline, s for dotall, x for verbose)
  - Complex patterns including capture groups, lookaheads, and lookbehinds
  - Special character escaping
- Simple, intuitive query syntax
- Comprehensive error handling
Installation
From PyPI
pip install boolean-query-parser
From Source
Clone the repository and install using pip:
git clone https://github.com/yourusername/boolean_query_parser.git
cd boolean_query_parser
pip install .
Usage
Basic Example
from boolean_query_parser import parse_query, apply_query
# Define some sample text data
documents = [
    "The quick brown fox jumps over the lazy dog",
    "Python is a programming language",
    "The Python programming language is powerful and easy to learn",
    "Regular expressions can be complex but useful"
]
# Parse a query
query = 'Python AND programming AND NOT complex'
parsed_query = parse_query(query)
# Apply the query to filter documents
matching_documents = [doc for doc in documents if apply_query(parsed_query, doc)]
# Print results
for doc in matching_documents:
    print(doc)
Output:
Python is a programming language
The Python programming language is powerful and easy to learn
Advanced Example with Nested Expressions
from boolean_query_parser import parse_query, apply_query
# Parse a complex query with parentheses and multiple operations
query = '(Python OR programming) AND (language OR easy) AND NOT (complex OR difficult)'
parsed_query = parse_query(query)
# Sample text data
documents = [
    "Python is a great language for beginners",
    "Programming can be complex and difficult at times",
    "Python makes programming tasks easy to accomplish",
    "This text has nothing relevant"
]
# Apply the query
for doc in documents:
    if apply_query(parsed_query, doc):
        print(f"Match: {doc}")
    else:
        print(f"No match: {doc}")
Using Regular Expressions
from boolean_query_parser import parse_query, apply_query
# Parse a query with regex patterns
query = '/py.*on/i AND NOT /difficult/'
parsed_query = parse_query(query)
documents = [
    "Python is easy to learn",
    "python programming is fun",
    "This is difficult Python code",
    "PyThOn is case-insensitive in this example"
]
# Apply the query
for doc in documents:
    if apply_query(parsed_query, doc):
        print(f"Match: {doc}")
Using Regular Expression Flags
from boolean_query_parser import parse_query, apply_query
# Case-insensitive matching with 'i' flag
query = '/python/i'
parsed_query = parse_query(query)
print(apply_query(parsed_query, "This contains PYTHON")) # True
# Multiline matching with 'm' flag
multiline_text = "First line\nSecond line with python\nThird line"
query = '/^Second.*python$/m'
parsed_query = parse_query(query)
print(apply_query(parsed_query, multiline_text)) # True
# Dot-all mode with 's' flag (dot matches newlines)
text_with_newlines = "Start\nMiddle\nEnd"
query = '/Start.*End/s'
parsed_query = parse_query(query)
print(apply_query(parsed_query, text_with_newlines)) # True
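The flag letters used above correspond to standard flags in Python's re module. As a rough sketch of how a /pattern/flags term might be translated into a compiled pattern, consider the following. The helper compile_slash_pattern and the FLAG_MAP table are hypothetical names for illustration and are not part of this package's documented API; the package's real parser may work differently.

```python
import re

# Hypothetical mapping from query flag letters to Python's re flags.
FLAG_MAP = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}


def compile_slash_pattern(term: str) -> re.Pattern:
    """Compile a '/pattern/flags' term into a regular expression.

    Illustrative only: this is an assumption about how such terms
    could be handled, not the package's implementation.
    """
    if not term.startswith("/") or term.count("/") < 2:
        raise SyntaxError(f"Not a /pattern/flags term: {term!r}")
    # Split on the last slash so that slashes inside the pattern survive.
    body, _, flag_letters = term[1:].rpartition("/")
    flags = 0
    for letter in flag_letters:
        if letter not in FLAG_MAP:
            raise SyntaxError(f"Unknown regex flag: {letter!r}")
        flags |= FLAG_MAP[letter]
    return re.compile(body, flags)


print(bool(compile_slash_pattern("/python/i").search("This contains PYTHON")))  # True
print(bool(compile_slash_pattern("/Start.*End/s").search("Start\nMiddle\nEnd")))  # True
print(bool(compile_slash_pattern("/Start.*End/").search("Start\nMiddle\nEnd")))  # False
```

The last call shows why the s flag matters: without it, the dot does not match newline characters, so the pattern cannot span lines.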
Complex Regex Patterns
from boolean_query_parser import parse_query, apply_query
# Email validation with regex
email_pattern = '/([A-Za-z0-9]+[._-])*[A-Za-z0-9]+@[A-Za-z0-9-]+(\\.[A-Za-z]{2,})/'
email_query = parse_query(email_pattern)
# HTML tag matching with capture groups and backreferences
html_pattern = '/\\<([a-z][a-z0-9]*)(\\s[^\\>]*)?\\>([^\\<]*)\\<\\/\\1\\>/i'
html_query = parse_query(html_pattern)
# Password validation with lookaheads
password_pattern = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d).{8,}$/'
password_query = parse_query(password_pattern)
# Test them
print(apply_query(email_query, "Contact us at info@example.com")) # True
print(apply_query(html_query, "<div>Content</div>")) # True
print(apply_query(password_query, "Password123")) # True
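The doubled backslashes in the patterns above come from Python string escaping, not from the query syntax: in a regular string literal, '\\d' is the two characters \d. A raw string literal spells the same query text with single backslashes. The short check below uses only the standard re module to confirm that the two spellings are identical and that the password pattern behaves as described. It does not call the package itself.

```python
import re

# A regular literal with doubled backslashes and a raw literal with single
# backslashes produce exactly the same query text.
escaped = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d).{8,}$/'
raw = r'/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/'
same_text = escaped == raw
print(same_text)  # True

# Strip the surrounding slashes and test the bare pattern with re directly.
password_regex = re.compile(raw[1:-1])
print(bool(password_regex.search("Password123")))  # True
print(bool(password_regex.search("password")))     # False: no uppercase letter or digit
```

Raw strings tend to be easier to read once a pattern contains several escapes, such as the HTML tag example.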
API Documentation
parse_query(query_str: str) -> Union[dict, str]
Parses a boolean query string into a structured representation.
Parameters:
- query_str (str): The boolean query string to parse.
Returns:
- A nested dictionary structure representing the parsed query.
Raises:
SyntaxError: If the query has invalid syntax or mismatched parentheses.
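Because invalid queries raise SyntaxError, code that accepts queries from users may want to catch the exception rather than let it propagate. The wrapper below is a minimal sketch. try_parse is a hypothetical helper that takes the parser as an argument, so it could be passed parse_query from this package. The stand-in demo_parser only checks parentheses and exists purely to make the example runnable on its own.

```python
from typing import Any, Callable, Optional, Tuple


def try_parse(
    parser: Callable[[str], Any], query: str
) -> Tuple[Optional[Any], Optional[str]]:
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return parser(query), None
    except SyntaxError as exc:
        return None, str(exc)


def demo_parser(query: str) -> str:
    """Stand-in for parse_query: rejects unbalanced parentheses only."""
    depth = 0
    for char in query:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if depth < 0:
            raise SyntaxError("Mismatched parentheses")
    if depth != 0:
        raise SyntaxError("Mismatched parentheses")
    return query


parsed, error = try_parse(demo_parser, "(Python OR programming")
print(parsed, error)  # None Mismatched parentheses
```

With the package installed, the equivalent call would be try_parse(parse_query, user_input).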
Query Syntax:
- Boolean operators: AND, OR, NOT
- Terms can be wrapped in quotes for exact matching: "exact phrase"
- Regular expressions can be specified with forward slashes: /pattern/
- Regular expressions can include flags: /pattern/i (i=case-insensitive, m=multiline, s=dotall, x=verbose)
- Parentheses can be used for grouping expressions
apply_query(parsed_query: Union[dict, str], text: str) -> bool
Applies a parsed query to a text string and returns whether the text matches the query.
Parameters:
- parsed_query (Union[dict, str]): The parsed query structure from parse_query.
- text (str): The text to match against the query.
Returns:
bool: True if the text matches the query, False otherwise.
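The internal shape of the parsed structure is not documented here. Purely to illustrate how a nested boolean structure can be evaluated against text, the sketch below assumes a hypothetical node format (a plain string for a term, or a dict with "op" and "operands" keys) and case-sensitive substring matching for terms. The real apply_query may represent and match queries differently.

```python
from typing import Union

# Hypothetical node shape, not the package's actual format:
# a term is a str; an operator node is
# {"op": "AND" | "OR" | "NOT", "operands": [...]}.
Node = Union[str, dict]


def evaluate(node: Node, text: str) -> bool:
    """Recursively evaluate a hypothetical parsed query against text."""
    if isinstance(node, str):
        # Assumed term semantics: plain substring containment.
        return node in text
    op, operands = node["op"], node["operands"]
    if op == "AND":
        return all(evaluate(child, text) for child in operands)
    if op == "OR":
        return any(evaluate(child, text) for child in operands)
    if op == "NOT":
        return not evaluate(operands[0], text)
    raise ValueError(f"Unknown operator: {op!r}")


# Hand-built tree for: Python AND programming AND NOT complex
tree = {
    "op": "AND",
    "operands": ["Python", "programming", {"op": "NOT", "operands": ["complex"]}],
}
print(evaluate(tree, "Python is a programming language"))               # True
print(evaluate(tree, "Regular expressions can be complex but useful"))  # False
```

The recursion mirrors the nesting of the query: each parenthesized group becomes a subtree, and the result of the whole query is the result at the root.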
Real-World Use Cases
Log Analysis
Parse through server logs to find specific error patterns:
from boolean_query_parser import parse_query, apply_query
import glob
# Query to find critical errors related to database but not connection timeouts
query = '(ERROR OR CRITICAL) AND database AND NOT "connection timeout"'
parsed_query = parse_query(query)
# Process log files
matching_logs = []
for log_file in glob.glob('/var/log/application/*.log'):
    with open(log_file, 'r') as f:
        for line in f:
            if apply_query(parsed_query, line):
                matching_logs.append(line.strip())
print(f"Found {len(matching_logs)} matching log entries")
Document Classification
Categorize documents based on their content:
from boolean_query_parser import parse_query, apply_query
# Define category queries
categories = {
    'finance': parse_query('(banking OR investment OR financial) AND NOT (gaming OR entertainment)'),
    'technology': parse_query('(programming OR software OR hardware OR "machine learning") AND NOT financial'),
    'health': parse_query('(medical OR health OR doctor OR patient) AND NOT (technology OR finance)')
}
# Function to classify a document
def classify_document(text):
    results = []
    for category, query in categories.items():
        if apply_query(query, text):
            results.append(category)
    return results or ['uncategorized']
Email Filtering
Filter emails based on complex patterns:
from boolean_query_parser import parse_query, apply_query
# Query to find emails that:
# 1. Have attachments (mention .pdf, .doc, etc.)
# 2. Are not from known domains
# 3. Contain specific keywords in the subject
query = parse_query('(/\\.pdf/i OR /\\.doc/i OR /\\.docx/i) AND NOT /from:.*@(company\\.com|trusted\\.org)/ AND /subject:.*urgent/i')
# Apply to email bodies
def filter_suspicious_emails(emails):
    return [email for email in emails if apply_query(query, email)]
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
File details
Details for the file boolean_query_parser-1.0.1.tar.gz.
File metadata
- Download URL: boolean_query_parser-1.0.1.tar.gz
- Upload date:
- Size: 14.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ba6b33ee46d41155471adfccf6de92ab57861bde535562d16bbcf1db135a720 |
| MD5 | debe62ccac8684d0654d06bc5c2c15d9 |
| BLAKE2b-256 | 4fbf89fe23acd6be1d4925500996d6ba454f9ccaf0ff3802a3f5c1a6bfe9b61b |
File details
Details for the file boolean_query_parser-1.0.1-py3-none-any.whl.
File metadata
- Download URL: boolean_query_parser-1.0.1-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21376cda76eca289fad53ba52cc00e80f213649b521481b4140f84d7a1d929bc |
| MD5 | 60550f1a10c9fcc8d5aa3abc8a637bab |
| BLAKE2b-256 | 4316875ef24dfbaa43cde005d60cc419925f8d67c0b360fa6861aa405cdfdf49 |