LINQ-inspired query library for Python collections with Polars-style syntax
Config Query Language API (cqla)
A query language for Python collections, inspired by LINQ with Polars-style syntax.
Installation
pip install cqla
The Problem
If you've worked with Pydantic models or dataclasses, you've probably written methods inside these classes, like this:
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    value: str
    enabled: bool
    priority: int


@dataclass
class ConfigStore:
    configurations: list[Config]

    def search_by_name(self, name: str) -> Config | None:
        # Return the first config with a matching name, or None if absent
        for cfg in self.configurations:
            if cfg.name == name:
                return cfg
        return None

    def get_enabled(self) -> list[Config]:
        return [cfg for cfg in self.configurations if cfg.enabled]

    def get_high_priority(self, threshold: int) -> list[Config]:
        return [cfg for cfg in self.configurations if cfg.priority > threshold]

    def get_enabled_high_priority(self, threshold: int) -> list[Config]:
        return [
            cfg for cfg in self.configurations
            if cfg.enabled and cfg.priority > threshold
        ]

    # ...and so on, a new method for every query pattern
This gets tedious. Every new query requirement means another method. The logic is scattered, repetitive, and hard to compose.
With cqla, you don't need any of those methods:
import cqla as cq
configs = [...] # list of Config objects
# Search by name
cq.Query(configs).filter(cq.field("name") == "database_url").first()
# Get enabled configs
cq.Query(configs).filter(cq.field("enabled") == True).collect()
# High-priority enabled configs
(
    cq.Query(configs)
    .filter((cq.field("enabled") == True) & (cq.field("priority") > 5))
    .collect()
)
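If you're wondering how an expression like cq.field("priority") > 5 can be built before any data is seen, the usual technique is operator overloading that returns a deferred predicate. Here is a minimal sketch of that idea. The Expr class and field helper are hypothetical stand-ins written for illustration, not cqla's actual internals.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Expr:
    """Hypothetical deferred expression; NOT cqla's real class."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __call__(self, row: Any) -> Any:
        # Evaluate the stored function against one row
        return self.fn(row)

    def __eq__(self, other: Any) -> "Expr":  # type: ignore[override]
        return Expr(lambda row: self.fn(row) == other)

    def __gt__(self, other: Any) -> "Expr":
        return Expr(lambda row: self.fn(row) > other)

    def __and__(self, other: "Expr") -> "Expr":
        return Expr(lambda row: bool(self.fn(row)) and bool(other.fn(row)))


def field(name: str) -> Expr:
    """Hypothetical helper: read an attribute by name when evaluated."""
    return Expr(lambda row: getattr(row, name))


@dataclass
class Config:
    name: str
    enabled: bool
    priority: int


configs = [
    Config("database_url", True, 9),
    Config("debug", False, 7),
    Config("timeout", True, 2),
]

# Nothing is evaluated here; the predicate is just a composed function
predicate = (field("enabled") == True) & (field("priority") > 5)  # noqa: E712
matches = [cfg.name for cfg in configs if predicate(cfg)]
print(matches)
```

The parentheses around each comparison matter in any design like this: Python's & binds more tightly than == and >, so leaving them out changes how the expression is parsed.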
Works With
cqla works with any Python objects:
- Plain dicts (JSON-like data)
- dataclasses
- Pydantic models
- msgspec Structs
- Any object with attributes
Inspiration: LINQ and Polars had a Baby
cqla is inspired by LINQ (Language Integrated Query) from C#/.NET. LINQ lets you query collections using a SQL-like, composable syntax:
// C# LINQ
var results = configs
    .Where(c => c.Enabled && c.Priority > 5)
    .Select(c => new { c.Name, c.Value })
    .OrderBy(c => c.Name);
cqla brings this same idea to Python, but with syntax borrowed from Polars:
# cqla (Python)
results = (
    cq.Query(configs)
    .filter((cq.field("enabled") == True) & (cq.field("priority") > 5))
    .select("name", "value")
    .collect()
)
Alternatives?
Libraries like pydash and toolz are excellent for functional programming patterns:
# pydash
import pydash as _
_.filter_(configs, lambda c: c.enabled and c.priority > 5)
_.map_(configs, lambda c: c.name.upper())
_.group_by(configs, "category")
# toolz
from toolz import filter, map, groupby
from toolz.curried import pipe
list(filter(lambda c: c.enabled, configs))
list(map(lambda c: c.name.upper(), configs))
groupby(lambda c: c.category, configs)
# composing operations in toolz
pipe(
    configs,
    lambda x: filter(lambda c: c.enabled, x),
    lambda x: map(lambda c: {"name": c.name.upper(), "priority": c.priority}, x),
    list,
)
These work, but if you think in SQL, they feel inside-out. Operations are functions you wrap around the data rather than steps you chain onto it, toolz's filter and map take the data as the last argument, and composing multiple operations requires nesting or piping.
cqla reads like SQL, and when you need custom transformations, .apply() lets you drop into a lambda:
# cqla - reads top to bottom, left to right
(
    cq.Query(configs)
    .filter(cq.field("enabled") == True)       # WHERE enabled = true
    .filter(cq.field("priority") > 5)          # AND priority > 5
    .group_by("category")                      # GROUP BY category
    .having(cq.field("priority").count() > 2)  # HAVING COUNT(priority) > 2
    .agg(                                      # SELECT ...
        count=cq.field("name").count(),
        avg_priority=cq.field("priority").mean(),
    )
    .collect()
)
# apply() for custom transformations
(
    cq.Query(configs)
    .filter(cq.field("enabled") == True)
    .select(
        "name",
        name_upper=cq.field("name").apply(str.upper),
        slug=cq.field("name").apply(lambda s: s.lower().replace(" ", "-")),
    )
    .collect()
)
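To make the SQL comments in the grouped query concrete, here is roughly what the same WHERE / GROUP BY / HAVING / SELECT steps look like in plain Python with only the standard library. This is a sketch under assumptions: the sample rows are invented, and the shape of each output row (a dict with the group key plus the aggregates) is a guess at what such a query would return, not something the cqla documentation states.

```python
from collections import defaultdict
from statistics import mean

# Invented sample data for illustration
rows = [
    {"name": "a", "category": "db", "priority": 6, "enabled": True},
    {"name": "b", "category": "db", "priority": 8, "enabled": True},
    {"name": "c", "category": "db", "priority": 10, "enabled": True},
    {"name": "d", "category": "cache", "priority": 9, "enabled": True},
    {"name": "e", "category": "db", "priority": 7, "enabled": False},
]

# WHERE enabled = true AND priority > 5
kept = [r for r in rows if r["enabled"] and r["priority"] > 5]

# GROUP BY category
groups = defaultdict(list)
for r in kept:
    groups[r["category"]].append(r)

# HAVING COUNT(priority) > 2, then SELECT COUNT(name), AVG(priority)
result = [
    {
        "category": category,
        "count": len(members),
        "avg_priority": mean([m["priority"] for m in members]),
    }
    for category, members in groups.items()
    if len(members) > 2
]
print(result)
```

Only the "db" group survives the HAVING step here, because "cache" has a single row and the disabled "db" row is dropped before grouping.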
Why Not Polars or Pandas?
Polars and Pandas are built for tabular data — rows and columns, where every row has the same schema. They're optimized for numerical computation on large datasets.
But configuration data, API responses, and domain objects are often semi-structured or nested:
configs = [
    {
        "name": "app",
        "settings": {
            "database": {"host": "localhost", "port": 5432},
            "features": ["auth", "logging", "metrics"],
        },
        "metadata": {"version": 1, "tags": ["production"]},
    },
    {
        "name": "worker",
        "settings": {
            "queue": "redis://localhost",
            # no "database" key here
        },
        "metadata": {"version": 2},  # no "tags" key
    },
]
Try loading this into Pandas:
import pandas as pd
df = pd.DataFrame(configs)
print(df)
# name settings metadata
# 0 app {'database': {'host': 'localhost', 'port': 54... {'version': 1, 'tags': ['production']}
# 1 worker {'queue': 'redis://localhost'} {'version': 2}
# Want to filter by database host? Good luck.
df[df["settings"].apply(lambda s: s.get("database", {}).get("host")) == "localhost"]
The nested dicts stay as opaque objects. You're back to writing lambdas and .apply().
Polars has the same issue:
import polars as pl
df = pl.DataFrame(configs)
# polars.exceptions.SchemaError:
# could not append value: {"database": {"host": "localhost" ...
# struct fields must have a consistent schema
Polars won't even load it because the schemas don't match.
cqla handles this naturally:
import cqla as cq
# Filter by nested field
(
    cq.Query(configs)
    .filter(cq.field("settings.database.host") == "localhost")
    .collect()
)
# Access nested fields in select
(
    cq.Query(configs)
    .select(
        "name",
        db_host=cq.field("settings.database.host"),
        version=cq.field("metadata.version"),
    )
    .collect()
)
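A dotted path such as "settings.database.host" has to be walked one segment at a time, and something has to happen when a segment is missing. The sketch below shows one way to do that for dicts and attribute-bearing objects. The resolve function is a hypothetical helper, and returning None for a missing branch is an assumption; the text above does not say how cqla itself treats missing keys.

```python
from typing import Any


def resolve(obj: Any, path: str, default: Any = None) -> Any:
    """Hypothetical helper: walk a dotted path through dicts or attributes."""
    current = obj
    for part in path.split("."):
        if isinstance(current, dict):
            if part not in current:
                return default  # missing key: stop early (assumed behavior)
            current = current[part]
        elif hasattr(current, part):
            current = getattr(current, part)
        else:
            return default  # missing attribute: stop early (assumed behavior)
    return current


configs = [
    {"name": "app", "settings": {"database": {"host": "localhost", "port": 5432}}},
    {"name": "worker", "settings": {"queue": "redis://localhost"}},
]

hosts = [resolve(c, "settings.database.host") for c in configs]
local = [c["name"] for c in configs if resolve(c, "settings.database.host") == "localhost"]
print(hosts, local)
```

With this approach the "worker" entry, which has no "database" key, simply fails the comparison instead of raising a KeyError.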
Features
cqla supports the operations you'd expect from a query language:
import cqla as cq
data = [...] # list of dicts, dataclasses, Pydantic models, or any objects
# Filtering
cq.Query(data).filter(cq.field("age") > 30).collect()
cq.Query(data).filter((cq.field("age") > 30) & (cq.field("active") == True)).collect()
# Selecting fields
cq.Query(data).select("name", "email").collect()
cq.Query(data).select("name", uppercased=cq.field("name").str.to_uppercase()).collect()
# Adding computed columns
cq.Query(data).with_columns(
    year=cq.field("created_at").dt.year(),
    name_lower=cq.field("name").str.to_lowercase(),
).collect()
# Conditional expressions
cq.Query(data).select(
    "name",
    tier=cq.when(cq.field("score") >= 90).then("gold")
        .when(cq.field("score") >= 70).then("silver")
        .otherwise("bronze"),
).collect()
# Grouping and aggregation
cq.Query(data).group_by("department").agg(
    count=cq.field("id").count(),
    avg_salary=cq.field("salary").mean(),
).collect()
# Filtering groups (HAVING)
cq.Query(data).group_by("department").having(
    cq.field("id").count() >= 5
).agg(
    count=cq.field("id").count(),
).collect()
# Window functions
cq.Query(data).with_columns(
    dept_avg=cq.field("salary").mean().over("department"),
).collect()
# Explode: expand list field into multiple rows
cq.Query(data).explode("tags").collect()
# [{"name": "alice", "tags": ["a", "b"]}] -> [{"name": "alice", "tags": "a"}, {"name": "alice", "tags": "b"}]
# Accessors for strings, lists, sets, datetimes
cq.field("name").str.contains("smith", literal=True)
cq.field("tags").list.len()
cq.field("categories").set.contains("electronics")
cq.field("created_at").dt.year()
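Of the operations above, the window function is probably the least familiar if you haven't used SQL's OVER or Polars' over(). Unlike group_by(...).agg(...), which collapses each group to one row, a window keeps every row and attaches the group-level value to it. Here is a plain-Python sketch of that behavior, using invented sample data; it illustrates the concept and makes no claim about how cqla computes it.

```python
from collections import defaultdict
from statistics import mean

# Invented sample data for illustration
data = [
    {"name": "alice", "department": "eng", "salary": 100},
    {"name": "bob", "department": "eng", "salary": 120},
    {"name": "carol", "department": "ops", "salary": 90},
]

# First pass: collect salaries per department and average them
salaries = defaultdict(list)
for row in data:
    salaries[row["department"]].append(row["salary"])
dept_avg = {dept: mean(values) for dept, values in salaries.items()}

# Second pass: keep every row, adding its department's average
with_avg = [{**row, "dept_avg": dept_avg[row["department"]]} for row in data]
print(with_avg)
```

Note that the row count is unchanged, and that the per-group mean can only be known after every row of the group has been seen.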
Scalability
cqla is built on generators. Operations like filter, select, and with_columns don't materialize the full dataset until you call .collect(). This means you can process large datasets without loading everything into memory:
# Process a million records lazily
query = (
    cq.Query(huge_dataset)
    .filter(cq.field("status") == "active")
    .select("id", "name")
)
# Only materializes when you iterate or collect
for record in query:
    process(record)
# Or take just the first 10
query.limit(10).collect()
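If generator-based laziness is new to you, this small standard-library sketch shows the general mechanism: chained generator expressions do no work until something iterates them, and taking the first few results pulls only as many source rows as needed. The lazy_filter and lazy_select functions are hypothetical illustrations, not cqla's implementation, and itertools.islice stands in for a limit step.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator

seen: list[int] = []  # records which source rows were actually pulled


def source() -> Iterator[dict]:
    """A large, lazily produced dataset (invented for illustration)."""
    for i in range(1_000_000):
        seen.append(i)
        status = "active" if i % 2 == 0 else "inactive"
        yield {"id": i, "name": f"user{i}", "status": status}


def lazy_filter(rows: Iterable[dict], predicate: Callable[[dict], Any]) -> Iterator[dict]:
    return (row for row in rows if predicate(row))


def lazy_select(rows: Iterable[dict], *names: str) -> Iterator[dict]:
    return ({name: row[name] for name in names} for row in rows)


pipeline = lazy_select(
    lazy_filter(source(), lambda r: r["status"] == "active"),
    "id",
    "name",
)
pulled_before = len(seen)  # building the pipeline pulls nothing

first_three = list(islice(pipeline, 3))  # stand-in for a limit step
pulled_after = len(seen)
print(first_three, pulled_before, pulled_after)
```

Producing three matching records touches only the first five source rows here, out of a million available.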
Examples
The examples/ directory contains interactive marimo notebooks demonstrating cqla with different data types:
- json_example.py: querying plain dicts, nested field access
- pydantic_example.py: querying Pydantic models, set operations
- msgspec_example.py: querying msgspec Structs
- stress_test.py: benchmarks with large datasets
To run the examples, clone the repo and install development dependencies:
git clone https://github.com/ahmedmuhammad/cqla.git
cd cqla
uv sync
uv run marimo edit examples/
License
MIT