LINQ-inspired query library for Python collections with Polars-style syntax

Config Query Language API (cqla)

A query language for Python collections, inspired by LINQ with Polars-style syntax.

Installation

pip install cqla

The Problem

If you've worked with Pydantic models or dataclasses, you've probably written lookup methods on a container class, like this:

from dataclasses import dataclass

@dataclass
class Config:
    name: str
    value: str
    enabled: bool
    priority: int

@dataclass
class ConfigStore:
    configurations: list[Config]

    def search_by_name(self, name: str) -> Config | None:
        for cfg in self.configurations:
            if cfg.name == name:
                return cfg
        return None  # no config with that name

    def get_enabled(self) -> list[Config]:
        return [cfg for cfg in self.configurations if cfg.enabled]

    def get_high_priority(self, threshold: int) -> list[Config]:
        return [cfg for cfg in self.configurations if cfg.priority > threshold]

    def get_enabled_high_priority(self, threshold: int) -> list[Config]:
        return [
            cfg for cfg in self.configurations
            if cfg.enabled and cfg.priority > threshold
        ]

    # ...and so on, a new method for every query pattern

This gets tedious. Every new query requirement means another method. The logic is scattered, repetitive, and hard to compose.

With cqla, you don't need any of those methods:

import cqla as cq

configs = [...]  # list of Config objects

# Search by name
cq.Query(configs).filter(cq.field("name") == "database_url").first()

# Get enabled configs
cq.Query(configs).filter(cq.field("enabled") == True).collect()

# High-priority enabled configs
(cq.Query(configs)
   .filter((cq.field("enabled") == True) & (cq.field("priority") > 5))
   .collect())

Works With

cqla works with any Python objects:

  • Plain dicts (JSON-like data)
  • dataclasses
  • Pydantic models
  • msgspec Structs
  • Any object with attributes
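
One way to picture how a single query API could cover both dicts and attribute-based objects is a small lookup helper that checks for a mapping first and falls back to attribute access. The sketch below is illustrative only: `get_field` is a hypothetical name, and cqla's internals may work differently.

```python
from dataclasses import dataclass
from typing import Any, Mapping


def get_field(record: Any, name: str, default: Any = None) -> Any:
    """Hypothetical helper: read `name` from a mapping key or an attribute."""
    if isinstance(record, Mapping):
        return record.get(name, default)
    return getattr(record, name, default)


@dataclass
class Config:
    name: str
    enabled: bool


# A mixed list: one plain dict and one dataclass instance.
records = [
    {"name": "from_dict", "enabled": True},
    Config(name="from_dataclass", enabled=False),
]

names = [get_field(r, "name") for r in records]
print(names)  # ['from_dict', 'from_dataclass']
```

Pydantic models and msgspec Structs expose their fields as attributes, so the `getattr` branch would presumably cover them as well.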

Inspiration: LINQ and Polars had a Baby

cqla is inspired by LINQ (Language Integrated Query) from C#/.NET. LINQ lets you query collections using a SQL-like, composable syntax:

// C# LINQ
var results = configs
    .Where(c => c.Enabled && c.Priority > 5)
    .Select(c => new { c.Name, c.Value })
    .OrderBy(c => c.Name);

cqla brings this same idea to Python, but with syntax borrowed from Polars:

# cqla (Python)
results = (
    cq.Query(configs)
    .filter((cq.field("enabled") == True) & (cq.field("priority") > 5))
    .select("name", "value")
    .collect()
)
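
The Polars-style syntax relies on operator overloading: a comparison such as `field("priority") > 5` returns a deferred expression rather than a bool. The following is a minimal sketch of that idea, not cqla's actual implementation; the `Expr` class and this `field` function are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Expr:
    """A deferred expression: wraps a function evaluated once per record."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self._fn = fn

    def evaluate(self, record: Any) -> Any:
        return self._fn(record)

    # Comparisons build a new Expr instead of returning a bool.
    def __eq__(self, other: Any) -> "Expr":  # type: ignore[override]
        return Expr(lambda r: self._fn(r) == other)

    def __gt__(self, other: Any) -> "Expr":
        return Expr(lambda r: self._fn(r) > other)

    # `&` is used because Python's `and` keyword cannot be overloaded.
    def __and__(self, other: "Expr") -> "Expr":
        return Expr(lambda r: bool(self._fn(r)) and bool(other.evaluate(r)))


def field(name: str) -> Expr:
    """Hypothetical stand-in for cq.field: reads an attribute by name."""
    return Expr(lambda r: getattr(r, name))


@dataclass
class Config:
    name: str
    enabled: bool
    priority: int


configs = [
    Config("database_url", True, 9),
    Config("debug", True, 1),
    Config("legacy_mode", False, 8),
]

predicate = (field("enabled") == True) & (field("priority") > 5)  # noqa: E712
matches = [c.name for c in configs if predicate.evaluate(c)]
print(matches)  # ['database_url']
```

This also shows why each comparison needs its own parentheses: `&` binds more tightly than `==` and `>`, so without them Python would group the operands differently.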

Alternatives?

Libraries like pydash and toolz are excellent for functional programming patterns:

# pydash
import pydash as _

_.filter_(configs, lambda c: c.enabled and c.priority > 5)
_.map_(configs, lambda c: c.name.upper())
_.group_by(configs, "category")

# toolz
from toolz import filter, map, groupby, pipe

list(filter(lambda c: c.enabled, configs))
list(map(lambda c: c.name.upper(), configs))
groupby(lambda c: c.category, configs)

# composing operations in toolz
pipe(configs,
     lambda x: filter(lambda c: c.enabled, x),
     lambda x: map(lambda c: {"name": c.name.upper(), "priority": c.priority}, x),
     list)

These work, but if you think in SQL, they feel inside-out. Operations are functions you wrap around the data (in toolz, the data is the last argument), and composing several operations means nesting calls or threading them through pipe.

cqla reads like SQL, and when you need custom transformations, .apply() lets you drop into a lambda:

# cqla - reads top to bottom, left to right
(
    cq.Query(configs)
    .filter(cq.field("enabled") == True)        # WHERE enabled = true
    .filter(cq.field("priority") > 5)           # AND priority > 5
    .group_by("category")                       # GROUP BY category
    .having(cq.field("priority").count() > 2)   # HAVING COUNT(priority) > 2
    .agg(                                       # SELECT ...
        count=cq.field("name").count(),
        avg_priority=cq.field("priority").mean(),
    )
    .collect()
)

# apply() for custom transformations
(
    cq.Query(configs)
    .filter(cq.field("enabled") == True)
    .select(
        "name",
        name_upper=cq.field("name").apply(str.upper),
        slug=cq.field("name").apply(lambda s: s.lower().replace(" ", "-")),
    )
    .collect()
)
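
For comparison, here is roughly what the grouped query above computes, written in plain Python. It is a sketch under assumptions: the sample rows are invented for illustration, and the shape of the result (one dict per group, including the group key) is a guess rather than cqla's documented output.

```python
from collections import defaultdict
from statistics import mean

# Invented sample data for illustration.
rows = [
    {"name": "a", "category": "db", "enabled": True, "priority": 9},
    {"name": "b", "category": "db", "enabled": True, "priority": 7},
    {"name": "c", "category": "db", "enabled": True, "priority": 8},
    {"name": "d", "category": "cache", "enabled": True, "priority": 8},
    {"name": "e", "category": "db", "enabled": False, "priority": 10},
    {"name": "f", "category": "cache", "enabled": True, "priority": 2},
]

# WHERE enabled = true AND priority > 5
filtered = [r for r in rows if r["enabled"] and r["priority"] > 5]

# GROUP BY category
groups = defaultdict(list)
for r in filtered:
    groups[r["category"]].append(r)

# HAVING COUNT(priority) > 2, then compute the aggregates per group
result = [
    {
        "category": category,
        "count": len(members),
        "avg_priority": mean(m["priority"] for m in members),
    }
    for category, members in groups.items()
    if len(members) > 2
]
print(result)  # only the "db" group survives: count 3, average priority 8
```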

Why Not Polars or Pandas?

Polars and Pandas are built for tabular data — rows and columns, where every row has the same schema. They're optimized for numerical computation on large datasets.

But configuration data, API responses, and domain objects are often semi-structured or nested:

configs = [
    {
        "name": "app",
        "settings": {
            "database": {"host": "localhost", "port": 5432},
            "features": ["auth", "logging", "metrics"],
        },
        "metadata": {"version": 1, "tags": ["production"]},
    },
    {
        "name": "worker",
        "settings": {
            "queue": "redis://localhost",
            # no "database" key here
        },
        "metadata": {"version": 2},  # no "tags" key
    },
]

Try loading this into Pandas:

import pandas as pd

df = pd.DataFrame(configs)
print(df)
#      name                                           settings                              metadata
# 0     app  {'database': {'host': 'localhost', 'port': 54...  {'version': 1, 'tags': ['production']}
# 1  worker              {'queue': 'redis://localhost'}                         {'version': 2}

# Want to filter by database host? Good luck.
df[df["settings"].apply(lambda s: s.get("database", {}).get("host")) == "localhost"]

The nested dicts stay as opaque objects. You're back to writing lambdas and .apply().

Polars fares no better:

import polars as pl

df = pl.DataFrame(configs)
# polars.exceptions.SchemaError:
# could not append value: {"database": {"host": "localhost" ...
# struct fields must have a consistent schema

Polars won't even load it because the schemas don't match.

cqla handles this naturally:

import cqla as cq

# Filter by nested field
(
    cq.Query(configs)
    .filter(cq.field("settings.database.host") == "localhost")
    .collect()
)

# Access nested fields in select
(
    cq.Query(configs)
    .select(
        "name",
        db_host=cq.field("settings.database.host"),
        version=cq.field("metadata.version"),
    )
    .collect()
)
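
A dotted path like "settings.database.host" can be resolved by walking one key at a time and stopping as soon as a key is missing. The sketch below shows that idea for nested dicts. `resolve_path` is a hypothetical helper, and returning None for a missing key is an assumption about the desired behaviour, not a statement about what cqla does.

```python
from typing import Any, Mapping


def resolve_path(record: Any, path: str, default: Any = None) -> Any:
    """Hypothetical helper: follow a dotted path through nested mappings."""
    current = record
    for part in path.split("."):
        # Stop early if we hit a non-mapping or a missing key.
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current


configs = [
    {"name": "app", "settings": {"database": {"host": "localhost", "port": 5432}}},
    {"name": "worker", "settings": {"queue": "redis://localhost"}},  # no "database"
]

hosts = [resolve_path(c, "settings.database.host") for c in configs]
print(hosts)  # ['localhost', None]

# Filtering on the nested value never raises KeyError for the "worker" record.
local = [c["name"] for c in configs if resolve_path(c, "settings.database.host") == "localhost"]
print(local)  # ['app']
```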

Features

cqla supports the operations you'd expect from a query language:

import cqla as cq

data = [...]  # list of dicts, dataclasses, Pydantic models, or any objects

# Filtering
cq.Query(data).filter(cq.field("age") > 30).collect()
cq.Query(data).filter((cq.field("age") > 30) & (cq.field("active") == True)).collect()

# Selecting fields
cq.Query(data).select("name", "email").collect()
cq.Query(data).select("name", uppercased=cq.field("name").str.to_uppercase()).collect()

# Adding computed columns
cq.Query(data).with_columns(
    year=cq.field("created_at").dt.year(),
    name_lower=cq.field("name").str.to_lowercase(),
).collect()

# Conditional expressions
cq.Query(data).select(
    "name",
    tier=cq.when(cq.field("score") >= 90).then("gold")
          .when(cq.field("score") >= 70).then("silver")
          .otherwise("bronze"),
).collect()

# Grouping and aggregation
cq.Query(data).group_by("department").agg(
    count=cq.field("id").count(),
    avg_salary=cq.field("salary").mean(),
).collect()

# Filtering groups (HAVING)
cq.Query(data).group_by("department").having(
    cq.field("id").count() >= 5
).agg(
    count=cq.field("id").count(),
).collect()

# Window functions
cq.Query(data).with_columns(
    dept_avg=cq.field("salary").mean().over("department"),
).collect()

# Explode: expand list field into multiple rows
cq.Query(data).explode("tags").collect()
# [{"name": "alice", "tags": ["a", "b"]}] -> [{"name": "alice", "tags": "a"}, {"name": "alice", "tags": "b"}]

# Accessors for strings, lists, sets, datetimes
cq.field("name").str.contains("smith", literal=True)
cq.field("tags").list.len()
cq.field("categories").set.contains("electronics")
cq.field("created_at").dt.year()
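
To make the semantics of explode and of a windowed aggregate concrete, here is a plain-Python sketch of both. The helper names (`explode`, `with_group_mean`) and the sample data are hypothetical, and edge cases such as empty lists may be handled differently by cqla.

```python
from collections import defaultdict
from statistics import mean


def explode(rows, key):
    """Yield one copy of each row per element of the list stored under `key`."""
    for row in rows:
        for item in row[key]:
            yield {**row, key: item}


def with_group_mean(rows, value_key, group_key, out_key):
    """Attach the per-group mean to every row, keeping the row count unchanged."""
    rows = list(rows)
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row[value_key])
    means = {group: mean(values) for group, values in buckets.items()}
    return [{**row, out_key: means[row[group_key]]} for row in rows]


people = [{"name": "alice", "tags": ["a", "b"]}]
exploded = list(explode(people, "tags"))
print(exploded)  # [{'name': 'alice', 'tags': 'a'}, {'name': 'alice', 'tags': 'b'}]

staff = [
    {"name": "alice", "department": "eng", "salary": 100},
    {"name": "bob", "department": "eng", "salary": 120},
    {"name": "carol", "department": "ops", "salary": 90},
]
annotated = with_group_mean(staff, "salary", "department", "dept_avg")
print([r["dept_avg"] for r in annotated])  # [110, 110, 90]
```

Unlike a grouped aggregation, which produces one row per group, the windowed version keeps every input row and adds the group-level value alongside it.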

Scalability

cqla is built on generators. Operations like filter, select, and with_columns don't materialize the full dataset until you iterate over the query or call .collect(). This means you can stream through large datasets without holding every record in memory at once:

# Process a million records lazily
query = (
    cq.Query(huge_dataset)
    .filter(cq.field("status") == "active")
    .select("id", "name")
)

# Only materializes when you iterate or collect
for record in query:
    process(record)

# Or take just the first 10
query.limit(10).collect()
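
The laziness described above can be demonstrated with ordinary generators. This is a sketch of the general technique, not cqla's code: `source`, `where`, and `select` are hypothetical helpers, and the counter only exists to show how many records were actually read.

```python
from itertools import islice

stats = {"read": 0}  # how many source records have been pulled so far


def source(n):
    """Generate n records on demand, counting each one as it is produced."""
    for i in range(n):
        stats["read"] += 1
        status = "active" if i % 2 == 0 else "inactive"
        yield {"id": i, "name": f"user{i}", "status": status}


def where(rows, predicate):
    return (row for row in rows if predicate(row))


def select(rows, *keys):
    return ({k: row[k] for k in keys} for row in rows)


# Building the pipeline reads nothing from the source.
pipeline = select(
    where(source(1_000_000), lambda r: r["status"] == "active"),
    "id",
    "name",
)
read_before = stats["read"]

# Taking the first 10 results pulls only as many records as needed.
first_ten = list(islice(pipeline, 10))
print(read_before, len(first_ten), stats["read"])  # 0 10 19
```

Only 19 of the million records are generated, because every second record is active and the pipeline stops as soon as the tenth match has been produced.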

Examples

The examples/ directory contains interactive marimo notebooks demonstrating cqla with different data types:

  • json_example.py — querying plain dicts, nested field access
  • pydantic_example.py — querying Pydantic models, set operations
  • msgspec_example.py — querying msgspec Structs
  • stress_test.py — benchmarks with large datasets

To run the examples, clone the repo and install development dependencies:

git clone https://github.com/ahmedmuhammad/cqla.git
cd cqla
uv sync
uv run marimo edit examples/

License

MIT
