# Tellaro Query Language (TQL)
A flexible, human-friendly query language for searching and filtering structured data across files, databases, and search engines.
TQL provides a unified, readable syntax for expressing complex queries that works seamlessly with:
- Files: Query JSON, JSONL, CSV files directly with CLI or Python API
- OpenSearch/Elasticsearch: Convert TQL to DSL queries automatically
- In-Memory Data: Filter Python dictionaries and lists
- Statistical Analysis: Built-in aggregations and grouping
```python
# Query JSON files directly
results = tql.query("logs.jsonl", "status = 200 AND response_time > 500")

# Query OpenSearch with automatic DSL translation
results = tql.execute_opensearch(client, "events-*",
    "user.role = 'admin' AND timestamp > '2024-01-01'")

# Aggregate data with stats
results = tql.query("sales.json", "region = 'west' | stats sum(revenue) by product")
```
## Quick Start

### Installation

```bash
# Install from PyPI (Python package)
pip install tellaro-query-language

# Install with OpenSearch support
pip install tellaro-query-language[opensearch]

# Or install the Rust CLI (300x faster for large files)
cargo install tellaro-query-language
```
### Query Files with CLI
TQL includes a blazing-fast command-line interface for querying files:
```bash
# Query JSON/JSONL files
tql 'status = "active"' users.json
tql 'age > 25 AND city = "NYC"' data.jsonl

# Query CSV files (auto-detects headers)
tql 'price > 100 AND category = "electronics"' products.csv

# Statistical aggregations
tql '| stats count() by status' events.jsonl
tql 'status = 200 | stats average(response_time) by endpoint' logs.jsonl

# Process folders recursively
tql 'level = "ERROR"' logs/ --pattern "*.jsonl" --recursive

# Pipe data from stdin
cat data.jsonl | tql 'score > 90'
```
**Performance:** the Rust CLI processes a 50MB file in milliseconds, versus seconds for the Python implementation.
### Query Files with Python API
```python
from tql import TQL

tql = TQL()

# Query JSON files directly
results = tql.query("data.json", "user.role = 'admin' AND status = 'active'")

# Query with field transformations
results = tql.query("logs.jsonl", "email | lowercase contains '@example.com'")

# Statistical analysis
results = tql.query("sales.json", "| stats sum(revenue), avg(price) by category")
```
### Query In-Memory Data
```python
from tql import TQL

tql = TQL()

data = [
    {'name': 'Alice', 'age': 30, 'city': 'NYC'},
    {'name': 'Bob', 'age': 25, 'city': 'LA'},
    {'name': 'Charlie', 'age': 35, 'city': 'NYC'}
]

# Simple queries
results = tql.query(data, 'age > 27')
# Returns: [{'name': 'Alice', 'age': 30, 'city': 'NYC'},
#           {'name': 'Charlie', 'age': 35, 'city': 'NYC'}]

# Logical operators
results = tql.query(data, 'age >= 30 AND city = "NYC"')
# Returns: [{'name': 'Alice', ...}, {'name': 'Charlie', ...}]

# Field transformations
results = tql.query(data, 'name | lowercase = "alice"')
```
### Query OpenSearch
```python
from opensearchpy import OpenSearch
from tql import TQL

# Initialize OpenSearch client
client = OpenSearch(
    hosts=['localhost:9200'],
    http_auth=('admin', 'admin'),
    use_ssl=True,
    verify_certs=False
)

# Initialize TQL with field mappings
mappings = {
    'user.name': {'type': 'keyword'},
    'user.email': {'type': 'text'},
    'timestamp': {'type': 'date'}
}
tql = TQL(mappings)

# Execute queries with automatic DSL translation
results = tql.execute_opensearch(
    opensearch_client=client,
    index='users-*',
    query='user.name = "admin" AND status = "active"'
)

# Complex queries with mutators and post-processing
results = tql.execute_opensearch(
    opensearch_client=client,
    index='logs-*',
    query='email | lowercase contains "@example.com" AND level = "ERROR"'
)
# TQL automatically applies post-processing for mutators
```
## Core Features

### Unified Query Syntax
Write one query and run it anywhere: files, OpenSearch, or in-memory data.
```python
# The same query works everywhere
query = 'status = "active" AND age > 25'

# Query files
tql.query("users.json", query)

# Query OpenSearch
tql.execute_opensearch(client, "users-*", query)

# Query Python data
tql.query(python_list, query)
```
### First-Class File Support
Query files as easily as databases:
```python
# JSON/JSONL files
tql.query("logs.jsonl", "level = 'ERROR'")

# CSV files with automatic header detection
tql.query("products.csv", "price > 100 AND stock < 10")

# Folders with glob patterns
tql.query("logs/2024/*.jsonl", "status = 500", recursive=True)
```

```bash
# Streaming for large files (CLI) - processes without loading into memory
tql 'status = 200' large-file.jsonl
```
### 25+ Field Mutators
Transform data inline before comparison:
```python
# String transformations
'email | lowercase | trim = "admin@example.com"'
'name | uppercase = "JOHN DOE"'

# Encoding/decoding
'data | b64decode | lowercase = "secret"'
'password | md5 = "5f4dcc3b5aa765d61d8327deb882cf99"'

# Network operations
'ip | is_private = true'                 # Check if IP is RFC 1918
'domain | defang = "hxxp://evil[.]com"'  # Security analysis

# DNS lookups
'hostname | nslookup contains "8.8.8.8"'

# GeoIP enrichment
'ip | geoip.country_name = "United States"'

# List operations
'scores | avg > 80'
'prices | sum between [100, 500]'
```
### Statistical Aggregations
Analyze data with built-in stats functions:
```python
# Simple aggregations
tql.query(data, '| stats count(), sum(revenue), avg(price)')

# Grouped analysis
tql.query(data, '| stats count() by status, region')

# Top N analysis
tql.query(data, '| stats sum(sales, top 10) by product')

# Combined filtering and stats
tql.query(data, 'region = "west" | stats avg(revenue) by category')
```
### OpenSearch Integration
Seamless OpenSearch/Elasticsearch integration:
- **Automatic DSL Translation**: TQL queries → OpenSearch Query DSL
- **Smart Field Mapping**: Handles keyword vs. text fields automatically
- **Post-Processing**: Apply mutators that OpenSearch can't handle
- **Pagination Support**: Handle large result sets efficiently
```python
# TQL handles field mapping automatically
mappings = {'user.email': {'type': 'text', 'fields': {'keyword': {'type': 'keyword'}}}}
tql = TQL(mappings)

# Exact match uses .keyword automatically
query = 'user.email = "admin@example.com"'  # Uses user.email.keyword

# Mutators trigger post-processing when needed
query = 'user.email | lowercase contains "admin"'  # Post-processes results
```
## Syntax Guide

### Comparison Operators
```python
# Equality
'status = "active"'     # Exact match (alias: eq)
'status != "inactive"'  # Not equal (alias: ne)

# Numeric comparisons
'age > 25'    # Greater than
'age >= 18'   # Greater or equal
'age < 65'    # Less than
'age <= 100'  # Less or equal

# String operations
'email contains "@example.com"'      # Substring
'name startswith "John"'             # Prefix
'filename endswith ".pdf"'           # Suffix
'email regexp "^\\w+@\\w+\\.\\w+$"'  # Regex

# Range and membership
'age between [18, 65]'             # Inclusive range
'status in ["active", "pending"]'  # Value in list
'priority range [1, 5]'            # Alias for between

# Existence checks
'field exists'       # Field is present
'field not exists'   # Field is missing
'field is null'      # Field is null
'field is not null'  # Field is not null

# Network operations
'ip cidr "192.168.0.0/16"'  # IP in CIDR range
```
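The `cidr` operator tests network membership. Python's standard `ipaddress` module expresses the same test, which makes a reasonable mental model for it (a sketch of the semantics, not TQL's implementation):

```python
import ipaddress

# Rough equivalent of 'ip cidr "192.168.0.0/16"'
net = ipaddress.ip_network("192.168.0.0/16")
print(ipaddress.ip_address("192.168.1.50") in net)  # True
print(ipaddress.ip_address("10.0.0.1") in net)      # False
```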
### Logical Operators
```python
# AND (all conditions must be true)
'age > 25 AND city = "NYC"'
'status = "active" AND role in ["admin", "moderator"]'

# OR (any condition must be true)
'city = "NYC" OR city = "LA"'
'status = "admin" OR role = "superuser"'

# NOT (negates condition)
'NOT (age < 18)'
'NOT status = "deleted"'

# Complex expressions with parentheses
'(age > 25 AND city = "NYC") OR (status = "vip" AND score > 90)'
```
### Collection Operators
```python
# ANY - at least one array element matches
'ANY tags = "premium"'
'ANY user.roles = "admin"'

# ALL - every array element matches
'ALL scores >= 80'
'ALL status = "active"'

# NONE - no array elements match
'NONE flags = "spam"'
'NONE violations.severity = "critical"'
```
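The three quantifiers map onto Python's built-in `any()`/`all()` semantics. A plain-Python sketch of how each evaluates against an array field (illustrative data, not the TQL engine):

```python
# Illustrative record with array fields
record = {"tags": ["premium", "beta"], "scores": [85, 92, 78]}

any_match = any(tag == "premium" for tag in record["tags"])    # ANY tags = "premium"
all_match = all(score >= 80 for score in record["scores"])     # ALL scores >= 80
none_match = not any(tag == "spam" for tag in record["tags"])  # NONE tags = "spam"

print(any_match, all_match, none_match)  # True False True (78 fails the ALL check)
```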
### Nested Field Access
```python
# Dot notation for nested objects
'user.profile.email contains "@example.com"'
'metadata.tags.priority = "high"'

# Array indexing
'tags[0] = "urgent"'
'history[5].status = "completed"'
```
### Field Mutators Reference

#### String Mutators

- `lowercase`, `uppercase` - Case conversion
- `trim` - Remove whitespace
- `split(delimiter)` - Split string into array
- `length` - Get string length
- `replace(old, new)` - Replace substring

#### Encoding Mutators

- `b64encode`, `b64decode` - Base64 encoding/decoding
- `urldecode` - URL decode
- `hexencode`, `hexdecode` - Hex encoding/decoding
- `md5`, `sha256` - Cryptographic hashing

#### Network/Security Mutators

- `refang` - Convert defanged indicators (hxxp → http)
- `defang` - Defang URLs for safe display
- `is_private` - Check if IP is private (RFC 1918)
- `is_global` - Check if IP is globally routable

#### DNS Mutators

- `nslookup` - Resolve hostname to IP addresses

#### GeoIP Mutators

- `geoip` - Enrich IP with geolocation data. Returns `geo.country_name`, `geo.city_name`, `geo.location`, `geo.continent_code`, etc.

#### List Mutators

- `any`, `all` - Boolean aggregations
- `avg`, `average` - Calculate mean
- `sum` - Calculate sum
- `min`, `max` - Find min/max values
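As a mental model, a mutator chain applies each transformation left to right before the comparison runs. A plain-Python sketch of what `email | lowercase | trim = "admin@example.com"` and `scores | avg > 80` evaluate to on assumed data (not TQL internals):

```python
# Illustrative record; values are assumptions for the sketch
record = {"email": "  Admin@Example.COM ", "scores": [70, 85, 95]}

# 'email | lowercase | trim = "admin@example.com"'
value = record["email"].lower().strip()
matches_email = value == "admin@example.com"

# 'scores | avg > 80'
avg = sum(record["scores"]) / len(record["scores"])  # 83.33...
matches_avg = avg > 80

print(matches_email, matches_avg)  # True True
```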
## Statistical Aggregations
TQL includes a powerful stats engine for data analysis:
### Available Functions
```python
# Counting
'| stats count()'              # Count all records
'| stats count(field)'         # Count non-null values
'| stats unique_count(field)'  # Count distinct values

# Numeric aggregations
'| stats sum(revenue)'        # Calculate sum
'| stats avg(price)'          # Calculate average (aliases: average, mean)
'| stats min(age), max(age)'  # Find min/max values
'| stats median(score)'       # Calculate median

# Statistical measures
'| stats std(values)'            # Standard deviation (alias: stdev)
'| stats percentile(score, 95)'  # Calculate percentile

# Value extraction
'| stats values(category)'  # Get unique values
```
### Grouping and Top N
```python
# Group by single field
'| stats count() by status'

# Group by multiple fields
'| stats sum(revenue) by region, category'

# Top N analysis
'| stats sum(sales, top 10) by product'

# Multiple aggregations
'| stats count(), sum(revenue), avg(price) by status'
```
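Conceptually, `| stats sum(revenue) by region` partitions records by the grouping key and reduces each partition. A minimal plain-Python sketch of that grouping step (assumed data, not the TQL stats engine):

```python
from collections import defaultdict

rows = [
    {"region": "west", "revenue": 100},
    {"region": "east", "revenue": 50},
    {"region": "west", "revenue": 75},
]

# Rough equivalent of '| stats sum(revenue) by region'
totals = defaultdict(int)
for row in rows:
    totals[row["region"]] += row["revenue"]

print(dict(totals))  # {'west': 175, 'east': 50}
```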
### Combined Filtering and Stats
```python
# Filter then aggregate
'status = "success" AND region = "west" | stats avg(revenue) by category'

# Complex analytics
'timestamp > "2024-01-01" | stats count(), sum(bytes), avg(response_time) by endpoint'
```
## OpenSearch Integration Guide

### Setup
```python
from opensearchpy import OpenSearch
from tql import TQL

# Create OpenSearch client
client = OpenSearch(
    hosts=['localhost:9200'],
    http_auth=('admin', 'admin'),
    use_ssl=True,
    verify_certs=False
)

# Get index mappings
response = client.indices.get_mapping(index='users-*')
mappings = response['users-2024']['mappings']['properties']

# Initialize TQL with mappings
tql = TQL(mappings)
```
### Query Translation
TQL automatically translates queries to OpenSearch DSL:
```python
# TQL Query
query = 'age > 25 AND status = "active"'

# Translates to OpenSearch DSL:
{
    "query": {
        "bool": {
            "must": [
                {"range": {"age": {"gt": 25}}},
                {"term": {"status.keyword": "active"}}
            ]
        }
    }
}

# Execute seamlessly
results = tql.execute_opensearch(client, 'users-*', query)
```
### Field Mapping Intelligence
TQL automatically handles field types:
```python
# Text field with keyword subfield
mappings = {
    'email': {
        'type': 'text',
        'fields': {
            'keyword': {'type': 'keyword'}
        }
    }
}

# Exact match - uses .keyword automatically
'email = "admin@example.com"'  # → term query on email.keyword

# Full-text search - uses the text field
'email contains "example"'  # → match query on email

# Case-insensitive - triggers post-processing
'email | lowercase = "admin@example.com"'  # → fetch + filter
```
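The field-selection logic can be pictured as: exact-match operators prefer the `.keyword` subfield when the mapping provides one, while full-text operators target the analyzed text field. A simplified sketch of that decision (`resolve_field` is a hypothetical helper for illustration, not TQL's actual resolver):

```python
def resolve_field(name, mapping, operator):
    """Pick the concrete field an operator should target.

    Hypothetical simplification of keyword-vs-text selection.
    """
    field = mapping.get(name, {})
    has_keyword = "keyword" in field.get("fields", {})
    if operator in ("=", "!=", "in") and field.get("type") == "text" and has_keyword:
        return f"{name}.keyword"  # exact match on the unanalyzed subfield
    return name                   # full-text operators use the analyzed field

mappings = {"email": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}}
print(resolve_field("email", mappings, "="))         # email.keyword
print(resolve_field("email", mappings, "contains"))  # email
```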
### Post-Processing
When OpenSearch can't handle operations, TQL applies post-processing:
```python
# Mutators that require post-processing
'email | lowercase contains "admin"'  # Post-process: case conversion
'data | b64decode contains "secret"'  # Post-process: decode
'ip | geoip.country = "US"'           # Post-process: GeoIP lookup

# TQL automatically:
# 1. Executes base query in OpenSearch
# 2. Fetches results
# 3. Applies mutators in Python
# 4. Filters results
# 5. Returns final matches
```
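The five steps amount to: run the translatable part of the query server-side, then re-filter the hits client-side after applying the mutator. A hedged sketch of that flow with stubbed hits standing in for the OpenSearch response (not the real execution path):

```python
# Stubbed hits standing in for steps 1-2 (base query + fetch)
hits = [
    {"email": "Admin@Example.com", "level": "ERROR"},
    {"email": "user@other.com", "level": "ERROR"},
]

# Steps 3-5: apply the mutator in Python, filter, return final matches.
# Mirrors 'email | lowercase contains "admin"'.
final = [h for h in hits if "admin" in h["email"].lower()]
print(final)  # [{'email': 'Admin@Example.com', 'level': 'ERROR'}]
```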
### Query Analysis
Analyze queries before execution to understand performance implications:
```python
# Analyze query health
analysis = tql.analyze_query('email | lowercase contains "admin"', context='opensearch')

print(f"Health: {analysis['health']['status']}")  # 'fair' (post-processing)
print(f"Score: {analysis['health']['score']}")    # 85
print(f"Post-processing: {analysis['mutator_health']['requires_post_processing']}")  # True

# Recommendations for optimization
for issue in analysis['health']['issues']:
    print(f"Issue: {issue['message']}")
    print(f"Fix: {issue['recommendation']}")
```
## Documentation
Comprehensive documentation is available in the docs/ directory:
- Getting Started - Quick introduction and basic concepts
- Query Basics - Syntax fundamentals
- Operators Reference - Complete operator guide
- Mutators Reference - All 25+ mutator functions
- OpenSearch Integration - Complete OpenSearch guide
- Stats & Aggregations - Statistical analysis guide
- API Reference - Python API documentation
- Examples - Real-world query examples
## Performance

### Benchmarks
**Python implementation:**

- In-memory queries: ~10,000 records/sec
- File parsing (JSON): ~5MB/sec
- OpenSearch queries: limited by network latency

**Rust CLI (300x faster):**

- In-memory queries: ~3,000,000 records/sec
- File parsing (JSON): ~150MB/sec
- Large file streaming: processes 50MB in ~200ms
### Optimization Tips
```bash
# Use the CLI for large files (300x faster)
tql 'status = 200' 50MB-file.jsonl            # Fast (Rust)
python -m tql 'status = 200' 50MB-file.jsonl  # Slow (Python)
```

```python
# Pre-compile queries for reuse
ast = tql.parse('age > 25 AND status = "active"')
results1 = tql.evaluate(ast, dataset1)
results2 = tql.evaluate(ast, dataset2)

# Use OpenSearch for large datasets
tql.execute_opensearch(client, 'huge-index-*', query)  # Leverages OpenSearch's speed

# Minimize post-processing
'email.keyword = "admin@example.com"'      # Fast (OpenSearch only)
'email | lowercase = "admin@example.com"'  # Slower (post-processing)
```
## Development

### Installation
```bash
# Clone the repository
git clone https://github.com/tellaro/tellaro-query-language.git
cd tellaro-query-language

# Install with Poetry (recommended)
poetry install

# Or with pip
pip install -e .
```
### Testing
```bash
# Run all tests
poetry run tests

# Run a specific test file
poetry run pytest tests/test_parser.py -v

# Run with coverage
poetry run cov

# Run integration tests (requires OpenSearch)
cp .env.example .env  # Configure the OpenSearch connection
poetry run pytest tests/test_opensearch_integration.py -v
```
### Code Quality
```bash
# Format code
poetry run black .

# Type checking
poetry run mypy src

# Linting
poetry run pylint src
poetry run flake8 src

# Security checks
poetry run bandit -r src/
```
## Roadmap

### Implemented Features

- Core query engine with all operators
- 25+ field mutators (string, encoding, network, DNS, GeoIP, list)
- Statistical aggregations with grouping
- File support (JSON, JSONL, CSV)
- OpenSearch/Elasticsearch backend
- Intelligent post-processing
- Rust CLI for performance
- Mutator caching for GeoIP/DNS
- Query health analysis
### In Progress

- OpenSearch stats aggregation translation
- Additional hash functions (SHA1, SHA512)
- JSON parsing mutator
- Timestamp conversion mutators
### Planned Features

- ElasticSearch backend support
- PostgreSQL/MySQL backends
- Query optimization engine
- Custom mutator plugins
- GraphQL-style field selection
- Parallel record evaluation
- Incremental file processing
### Future Considerations

- Time-series specific operators
- Machine learning integration
- Distributed query execution
- Query caching layer
## Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
### How to Contribute

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests (`poetry run tests`)
5. Run linters (`poetry run lint`)
6. Commit changes (`git commit -m 'Add amazing feature'`)
7. Push to branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
## License
Tellaro Query Language (TQL) is source-available software with specific usage terms:
**Permitted Uses:**
- Personal use (individual, non-commercial)
- Organizational use (within your company/organization)
- Integration into your applications and services
- Internal tools and automation
**Restricted Uses:**
- Creating derivative query language products
- Commercial redistribution or resale
- Offering TQL-based commercial services to third parties
- Using source code to build competing products
For commercial licensing inquiries, contact: support@tellaro.io
See LICENSE for complete terms and conditions.
## Related Projects
- TQL Rust - High-performance Rust implementation
- Tellaro Platform - Security operations platform using TQL
## Support
- Issues: GitHub Issues
- Documentation: Full Documentation
- Examples: Cookbook
- Email: support@tellaro.io
## Quick Examples

### Security Log Analysis
```python
# Find high-severity events from private IPs
query = '''
source_ip | is_private = true AND
severity in ["high", "critical"] AND
(ANY tags = "malware" OR url | defang contains "suspicious")
'''
results = tql.query("security-logs.jsonl", query)
```
### E-commerce Analytics
```python
# Analyze sales by region for premium products
query = '''
product_tier = "premium" AND
order_date > "2024-01-01" |
stats sum(revenue), avg(order_value), count() by region
'''
results = tql.query("sales.json", query)
```
### System Monitoring
```python
# Find servers with high resource usage
query = '''
hostname | nslookup exists AND
(cpu_usage > 80 OR memory_usage > 90) AND
status = "production"
'''
results = tql.execute_opensearch(client, "metrics-*", query)
```
Made with ❤️ by the Tellaro Team