Tellaro Query Language (TQL)

Supports Python 3.11–3.13 · License: Source Available

A flexible, human-friendly query language for searching and filtering structured data across files, databases, and search engines.

TQL provides a unified, readable syntax for expressing complex queries that works seamlessly with:

  • Files: Query JSON, JSONL, CSV files directly with CLI or Python API
  • OpenSearch/Elasticsearch: Convert TQL to DSL queries automatically
  • In-Memory Data: Filter Python dictionaries and lists
  • Statistical Analysis: Built-in aggregations and grouping

# Query JSON files directly
results = tql.query("logs.jsonl", "status = 200 AND response_time > 500")

# Query OpenSearch with automatic DSL translation
results = tql.execute_opensearch(client, "events-*",
    "user.role = 'admin' AND timestamp > '2024-01-01'")

# Aggregate data with stats
results = tql.query("sales.json", "region = 'west' | stats sum(revenue) by product")

🚀 Quick Start

Installation

# Install from PyPI (Python package)
pip install tellaro-query-language

# Install with OpenSearch support
pip install tellaro-query-language[opensearch]

# Or install Rust CLI (300x faster for large files)
cargo install tellaro-query-language

Query Files with CLI

TQL includes a blazing-fast command-line interface for querying files:

# Query JSON/JSONL files
tql 'status = "active"' users.json
tql 'age > 25 AND city = "NYC"' data.jsonl

# Query CSV files (auto-detects headers)
tql 'price > 100 AND category = "electronics"' products.csv

# Statistical aggregations
tql '| stats count() by status' events.jsonl
tql 'status = 200 | stats average(response_time) by endpoint' logs.jsonl

# Process folders recursively
tql 'level = "ERROR"' logs/ --pattern "*.jsonl" --recursive

# Pipe data from stdin
cat data.jsonl | tql 'score > 90'

Performance: The Rust CLI processes 50MB files in milliseconds vs. seconds for Python implementations.

Query Files with Python API

from tql import TQL

tql = TQL()

# Query JSON files directly
results = tql.query("data.json", "user.role = 'admin' AND status = 'active'")

# Query with field transformations
results = tql.query("logs.jsonl", "email | lowercase contains '@example.com'")

# Statistical analysis
results = tql.query("sales.json", "| stats sum(revenue), avg(price) by category")

Query In-Memory Data

from tql import TQL

tql = TQL()
data = [
    {'name': 'Alice', 'age': 30, 'city': 'NYC'},
    {'name': 'Bob', 'age': 25, 'city': 'LA'},
    {'name': 'Charlie', 'age': 35, 'city': 'NYC'}
]

# Simple queries
results = tql.query(data, 'age > 27')
# Returns: [{'name': 'Alice', 'age': 30, 'city': 'NYC'},
#           {'name': 'Charlie', 'age': 35, 'city': 'NYC'}]

# Logical operators
results = tql.query(data, 'age >= 30 AND city = "NYC"')
# Returns: [{'name': 'Alice', ...}, {'name': 'Charlie', ...}]

# Field transformations
results = tql.query(data, 'name | lowercase = "alice"')

Query OpenSearch

from opensearchpy import OpenSearch
from tql import TQL

# Initialize OpenSearch client
client = OpenSearch(
    hosts=['localhost:9200'],
    http_auth=('admin', 'admin'),
    use_ssl=True,
    verify_certs=False
)

# Initialize TQL with field mappings
mappings = {
    'user.name': {'type': 'keyword'},
    'user.email': {'type': 'text'},
    'timestamp': {'type': 'date'}
}
tql = TQL(mappings)

# Execute queries with automatic DSL translation
results = tql.execute_opensearch(
    opensearch_client=client,
    index='users-*',
    query='user.name = "admin" AND status = "active"'
)

# Complex queries with mutators and post-processing
results = tql.execute_opensearch(
    opensearch_client=client,
    index='logs-*',
    query='email | lowercase contains "@example.com" AND level = "ERROR"'
)
# TQL automatically applies post-processing for mutators

🎯 Core Features

🔍 Unified Query Syntax

Write one query and run it against files, OpenSearch, or in-memory data:

# Same query works everywhere
query = 'status = "active" AND age > 25'

# Query files
tql.query("users.json", query)

# Query OpenSearch
tql.execute_opensearch(client, "users-*", query)

# Query Python data
tql.query(python_list, query)

๐Ÿ“ First-Class File Support

Query files as easily as databases:

# JSON/JSONL files
tql.query("logs.jsonl", "level = 'ERROR'")

# CSV files with automatic header detection
tql.query("products.csv", "price > 100 AND stock < 10")

# Folders with glob patterns
tql.query("logs/2024/*.jsonl", "status = 500", recursive=True)

# Streaming for large files (CLI)
$ tql 'status = 200' large-file.jsonl  # Streams the file without loading it into memory

🔄 25+ Field Mutators

Transform data inline before comparison:

# String transformations
'email | lowercase | trim = "admin@example.com"'
'name | uppercase = "JOHN DOE"'

# Encoding/decoding
'data | b64decode | lowercase = "secret"'
'password | md5 = "5f4dcc3b5aa765d61d8327deb882cf99"'

# Network operations
'ip | is_private = true'           # Check if IP is RFC 1918
'domain | defang = "hxxp://evil[.]com"'  # Security analysis

# DNS lookups
'hostname | nslookup contains "8.8.8.8"'

# GeoIP enrichment
'ip | geoip.country_name = "United States"'

# List operations
'scores | avg > 80'
'prices | sum between [100, 500]'

📊 Statistical Aggregations

Analyze data with built-in stats functions:

# Simple aggregations
tql.query(data, '| stats count(), sum(revenue), avg(price)')

# Grouped analysis
tql.query(data, '| stats count() by status, region')

# Top N analysis
tql.query(data, '| stats sum(sales, top 10) by product')

# Combined filtering and stats
tql.query(data, 'region = "west" | stats avg(revenue) by category')

🔧 OpenSearch Integration

Seamless OpenSearch/Elasticsearch integration:

  • Automatic DSL Translation: TQL queries → OpenSearch Query DSL
  • Smart Field Mapping: Handles keyword vs text fields automatically
  • Post-Processing: Apply mutators that OpenSearch can't handle
  • Pagination Support: Handle large result sets efficiently

# TQL handles field mapping automatically
mappings = {'user.email': {'type': 'text', 'fields': {'keyword': {'type': 'keyword'}}}}
tql = TQL(mappings)

# Exact match uses .keyword automatically
query = 'user.email = "admin@example.com"'  # Uses user.email.keyword

# Mutators trigger post-processing when needed
query = 'user.email | lowercase contains "admin"'  # Post-processes results

📖 Syntax Guide

Comparison Operators

# Equality
'status = "active"'           # Exact match (alias: eq)
'status != "inactive"'        # Not equal (alias: ne)

# Numeric comparisons
'age > 25'                    # Greater than
'age >= 18'                   # Greater or equal
'age < 65'                    # Less than
'age <= 100'                  # Less or equal

# String operations
'email contains "@example.com"'      # Substring
'name startswith "John"'            # Prefix
'filename endswith ".pdf"'          # Suffix
'email regexp "^\\w+@\\w+\\.\\w+$"'  # Regex

# Range and membership
'age between [18, 65]'              # Inclusive range
'status in ["active", "pending"]'   # Value in list
'priority range [1, 5]'             # Alias for between

# Existence checks
'field exists'                      # Field is present
'field not exists'                  # Field is missing
'field is null'                     # Field is null
'field is not null'                 # Field is not null

# Network operations
'ip cidr "192.168.0.0/16"'          # IP in CIDR range
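
These operators can be exercised directly through the Python API. A minimal sketch against in-memory records (the sample data is invented for illustration):

from tql import TQL

tql = TQL()
conns = [
    {'ip': '192.168.1.10', 'age': 30, 'status': 'active'},
    {'ip': '8.8.8.8', 'age': 17, 'status': 'pending'},
]

# CIDR containment on the ip field
internal = tql.query(conns, 'ip cidr "192.168.0.0/16"')

# Numeric comparison combined with list membership
eligible = tql.query(conns, 'age >= 18 AND status in ["active", "pending"]')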

Logical Operators

# AND (all conditions must be true)
'age > 25 AND city = "NYC"'
'status = "active" AND role in ["admin", "moderator"]'

# OR (at least one condition must be true)
'city = "NYC" OR city = "LA"'
'status = "admin" OR role = "superuser"'

# NOT (negates condition)
'NOT (age < 18)'
'NOT status = "deleted"'

# Complex expressions with parentheses
'(age > 25 AND city = "NYC") OR (status = "vip" AND score > 90)'

Collection Operators

# ANY - at least one array element matches
'ANY tags = "premium"'
'ANY user.roles = "admin"'

# ALL - every array element matches
'ALL scores >= 80'
'ALL status = "active"'

# NONE - no array elements match
'NONE flags = "spam"'
'NONE violations.severity = "critical"'
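
A short sketch of the collection operators over array fields, assuming a couple of invented records:

from tql import TQL

tql = TQL()
docs = [
    {'name': 'doc1', 'tags': ['premium', 'new'], 'scores': [85, 92]},
    {'name': 'doc2', 'tags': ['basic'], 'scores': [60, 95]},
]

# ANY: at least one element matches
premium = tql.query(docs, 'ANY tags = "premium"')  # doc1

# ALL: every element must match
strong = tql.query(docs, 'ALL scores >= 80')       # doc1

# NONE: no element may match
clean = tql.query(docs, 'NONE tags = "spam"')      # doc1 and doc2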

Nested Field Access

# Dot notation for nested objects
'user.profile.email contains "@example.com"'
'metadata.tags.priority = "high"'

# Array indexing
'tags[0] = "urgent"'
'history[5].status = "completed"'
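
Both notations work against nested in-memory data as well; a brief sketch (records invented for illustration):

from tql import TQL

tql = TQL()
events = [
    {'user': {'profile': {'email': 'a@example.com'}}, 'tags': ['urgent', 'review']},
    {'user': {'profile': {'email': 'b@other.org'}}, 'tags': ['low']},
]

# Dot notation walks nested objects
hits = tql.query(events, 'user.profile.email contains "@example.com"')

# Bracket indexing addresses a specific array element
urgent = tql.query(events, 'tags[0] = "urgent"')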

Field Mutators Reference

String Mutators

  • lowercase, uppercase - Case conversion
  • trim - Remove whitespace
  • split(delimiter) - Split string into array
  • length - Get string length
  • replace(old, new) - Replace substring

Encoding Mutators

  • b64encode, b64decode - Base64 encoding/decoding
  • urldecode - URL decode
  • hexencode, hexdecode - Hex encoding/decoding
  • md5, sha256 - Cryptographic hashing
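
Hash mutators pair naturally with digests computed in Python. A hedged sketch (field values are made up; the expected digest is derived with the standard library so the example stays self-checking):

import hashlib

from tql import TQL

tql = TQL()
rows = [{'password': 'password'}, {'password': 'hunter2'}]

# Compute the expected MD5 digest with hashlib, then match via the md5 mutator
digest = hashlib.md5(b'password').hexdigest()
weak = tql.query(rows, f'password | md5 = "{digest}"')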

Network/Security Mutators

  • refang - Convert defanged indicators (hxxp → http)
  • defang - Defang URLs for safe display
  • is_private - Check if IP is private (RFC 1918)
  • is_global - Check if IP is globally routable

DNS Mutators

  • nslookup - Resolve hostname to IP addresses

GeoIP Mutators

  • geoip - Enrich IP with geolocation data
  • Returns: geo.country_name, geo.city_name, geo.location, geo.continent_code, etc.

List Mutators

  • any, all - Boolean aggregations
  • avg, average - Calculate mean
  • sum - Calculate sum
  • min, max - Find min/max values
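
Mutators chain left to right, so several can run before the comparison. A small sketch over invented records:

from tql import TQL

tql = TQL()
records = [
    {'email': '  Admin@Example.COM ', 'scores': [70, 85, 95]},
    {'email': 'user@other.org', 'scores': [50, 60]},
]

# Chain trim and lowercase before the string comparison
admins = tql.query(records, 'email | trim | lowercase endswith "@example.com"')

# List mutator: compare against the mean of an array field
high = tql.query(records, 'scores | avg > 80')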

📊 Statistical Aggregations

TQL includes a powerful stats engine for data analysis:

Available Functions

# Counting
'| stats count()'              # Count all records
'| stats count(field)'         # Count non-null values
'| stats unique_count(field)'  # Count distinct values

# Numeric aggregations
'| stats sum(revenue)'         # Calculate sum
'| stats avg(price)'           # Calculate average (aliases: average, mean)
'| stats min(age), max(age)'   # Find min/max values
'| stats median(score)'        # Calculate median

# Statistical measures
'| stats std(values)'          # Standard deviation (alias: stdev)
'| stats percentile(score, 95)' # Calculate percentile

# Value extraction
'| stats values(category)'     # Get unique values

Grouping and Top N

# Group by single field
'| stats count() by status'

# Group by multiple fields
'| stats sum(revenue) by region, category'

# Top N analysis
'| stats sum(sales, top 10) by product'

# Multiple aggregations
'| stats count(), sum(revenue), avg(price) by status'
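
The same pipe syntax runs through the Python API. A sketch over a small invented dataset:

from tql import TQL

tql = TQL()
sales = [
    {'region': 'west', 'product': 'A', 'revenue': 100},
    {'region': 'west', 'product': 'B', 'revenue': 250},
    {'region': 'east', 'product': 'A', 'revenue': 80},
]

# Sum revenue per region
summary = tql.query(sales, '| stats sum(revenue) by region')

# Filter first, then aggregate
west = tql.query(sales, 'region = "west" | stats count(), avg(revenue)')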

Combined Filtering and Stats

# Filter then aggregate
'status = "success" AND region = "west" | stats avg(revenue) by category'

# Complex analytics
'timestamp > "2024-01-01" | stats count(), sum(bytes), avg(response_time) by endpoint'

🔌 OpenSearch Integration Guide

Setup

from opensearchpy import OpenSearch
from tql import TQL

# Create OpenSearch client
client = OpenSearch(
    hosts=['localhost:9200'],
    http_auth=('admin', 'admin'),
    use_ssl=True,
    verify_certs=False
)

# Get index mappings
response = client.indices.get_mapping(index='users-*')
mappings = response['users-2024']['mappings']['properties']

# Initialize TQL with mappings
tql = TQL(mappings)

Query Translation

TQL automatically translates queries to OpenSearch DSL:

# TQL Query
query = 'age > 25 AND status = "active"'

# Translates to OpenSearch DSL:
{
    "query": {
        "bool": {
            "must": [
                {"range": {"age": {"gt": 25}}},
                {"term": {"status.keyword": "active"}}
            ]
        }
    }
}

# Execute seamlessly
results = tql.execute_opensearch(client, 'users-*', query)

Field Mapping Intelligence

TQL automatically handles field types:

# Text field with keyword subfield
mappings = {
    'email': {
        'type': 'text',
        'fields': {
            'keyword': {'type': 'keyword'}
        }
    }
}

# Exact match - uses .keyword automatically
'email = "admin@example.com"'  # → term query on email.keyword

# Full-text search - uses text field
'email contains "example"'      # → match query on email

# Case-insensitive - triggers post-processing
'email | lowercase = "admin@example.com"'  # → fetch + filter

Post-Processing

When OpenSearch can't handle operations, TQL applies post-processing:

# Mutators that require post-processing
'email | lowercase contains "admin"'    # Post-process: case conversion
'data | b64decode contains "secret"'    # Post-process: decode
'ip | geoip.country = "US"'            # Post-process: GeoIP lookup

# TQL automatically:
# 1. Executes base query in OpenSearch
# 2. Fetches results
# 3. Applies mutators in Python
# 4. Filters results
# 5. Returns final matches

Query Analysis

Analyze queries before execution to understand performance implications:

# Analyze query health
analysis = tql.analyze_query('email | lowercase contains "admin"', context='opensearch')

print(f"Health: {analysis['health']['status']}")  # 'fair' (post-processing)
print(f"Score: {analysis['health']['score']}")    # 85
print(f"Post-processing: {analysis['mutator_health']['requires_post_processing']}")  # True

# Recommendations for optimization
for issue in analysis['health']['issues']:
    print(f"Issue: {issue['message']}")
    print(f"Fix: {issue['recommendation']}")

📚 Documentation

Comprehensive documentation is available in the docs/ directory of the repository.


⚡ Performance

Benchmarks

Python Implementation:

  • In-memory queries: ~10,000 records/sec
  • File parsing (JSON): ~5MB/sec
  • OpenSearch queries: Limited by network latency

Rust CLI (300x faster):

  • In-memory queries: ~3,000,000 records/sec
  • File parsing (JSON): ~150MB/sec
  • Large file streaming: Process 50MB in ~200ms
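
Throughput depends heavily on record shape and query complexity; a rough way to measure the in-memory path on your own hardware is a stopwatch around tql.query (an illustrative sketch, not an official benchmark harness):

import time

from tql import TQL

tql = TQL()
records = [{'id': i, 'status': 200 if i % 2 else 500} for i in range(100_000)]

start = time.perf_counter()
matches = tql.query(records, 'status = 200')
elapsed = time.perf_counter() - start
print(f"{len(records) / elapsed:,.0f} records/sec")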

Optimization Tips

# Use CLI for large files (300x faster)
$ tql 'status = 200' 50MB-file.jsonl  # ✓ Fast (Rust)
$ python -m tql 'status = 200' 50MB-file.jsonl  # ✗ Slow (Python)

# Pre-compile queries for reuse
ast = tql.parse('age > 25 AND status = "active"')
results1 = tql.evaluate(ast, dataset1)
results2 = tql.evaluate(ast, dataset2)

# Use OpenSearch for large datasets
tql.execute_opensearch(client, 'huge-index-*', query)  # Leverages OpenSearch's speed

# Minimize post-processing
'email.keyword = "admin@example.com"'  # ✓ Fast (OpenSearch only)
'email | lowercase = "admin@example.com"'  # ✗ Slower (post-processing)

๐Ÿ› ๏ธ Development

Installation

# Clone repository
git clone https://github.com/tellaro/tellaro-query-language.git
cd tellaro-query-language

# Install with Poetry (recommended)
poetry install

# Or with pip
pip install -e .

Testing

# Run all tests
poetry run tests

# Run specific test file
poetry run pytest tests/test_parser.py -v

# Run with coverage
poetry run cov

# Run integration tests (requires OpenSearch)
cp .env.example .env  # Configure OpenSearch connection
poetry run pytest tests/test_opensearch_integration.py -v

Code Quality

# Format code
poetry run black .

# Type checking
poetry run mypy src

# Linting
poetry run pylint src
poetry run flake8 src

# Security checks
poetry run bandit -r src/

๐Ÿ—บ๏ธ Roadmap

โœ… Implemented Features

  • โœ… Core query engine with all operators
  • โœ… 25+ field mutators (string, encoding, network, DNS, GeoIP, list)
  • โœ… Statistical aggregations with grouping
  • โœ… File support (JSON, JSONL, CSV)
  • โœ… OpenSearch/Elasticsearch backend
  • โœ… Intelligent post-processing
  • โœ… Rust CLI for performance
  • โœ… Mutator caching for GeoIP/DNS
  • โœ… Query health analysis

🚧 In Progress

  • 🚧 OpenSearch stats aggregation translation
  • 🚧 Additional hash functions (SHA1, SHA512)
  • 🚧 JSON parsing mutator
  • 🚧 Timestamp conversion mutators

📋 Planned Features

  • 📋 Elasticsearch backend support
  • 📋 PostgreSQL/MySQL backends
  • 📋 Query optimization engine
  • 📋 Custom mutator plugins
  • 📋 GraphQL-style field selection
  • 📋 Parallel record evaluation
  • 📋 Incremental file processing

🔮 Future Considerations

  • 🔮 Time-series specific operators
  • 🔮 Machine learning integration
  • 🔮 Distributed query execution
  • 🔮 Query caching layer

๐Ÿค Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests (poetry run tests)
  5. Run linters (poetry run lint)
  6. Commit changes (git commit -m 'Add amazing feature')
  7. Push to branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

📄 License

Tellaro Query Language (TQL) is source-available software with specific usage terms:

✅ Permitted Uses:

  • Personal use (individual, non-commercial)
  • Organizational use (within your company/organization)
  • Integration into your applications and services
  • Internal tools and automation

โŒ Restricted Uses:

  • Creating derivative query language products
  • Commercial redistribution or resale
  • Offering TQL-based commercial services to third parties
  • Using source code to build competing products

For commercial licensing inquiries, contact: support@tellaro.io

See LICENSE for complete terms and conditions.


🌟 Quick Examples

Security Log Analysis

# Find high-severity events from private IPs
query = '''
    source_ip | is_private = true AND
    severity in ["high", "critical"] AND
    (ANY tags = "malware" OR url | defang contains "suspicious")
'''
results = tql.query("security-logs.jsonl", query)

E-commerce Analytics

# Analyze sales by region for premium products
query = '''
    product_tier = "premium" AND
    order_date > "2024-01-01" |
    stats sum(revenue), avg(order_value), count() by region
'''
results = tql.query("sales.json", query)

System Monitoring

# Find servers with high resource usage
query = '''
    hostname | nslookup exists AND
    (cpu_usage > 80 OR memory_usage > 90) AND
    status = "production"
'''
results = tql.execute_opensearch(client, "metrics-*", query)

Made with โค๏ธ by the Tellaro Team
