A lightweight, chainable query pipeline for Python
Project description
What is lazyq?
lazyq is a lightweight, chainable query pipeline for Python.
Instead of executing operations immediately, lazyq builds up a map of instructions and only runs them when you actually need the results. This makes it memory-efficient and great for working with large datasets!
Usage
Installation
Install latest from the GitHub repository:
$ pip install git+https://github.com/vikasAWA/lazyq.git
or from pypi
$ pip install lazyq
Documentation
Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on conda and pypi respectively.
How to use
Usage
lazyq lets you build queries step by step. Let’s explore with some country data!
Exploring with a list of dicts
Let’s start simple — here’s a list of countries. We’ll use
Query.from_iterable()
to wrap it.
from lazyq import *
countries = [
{"name": "India", "continent": "Asia", "population": 1428000000, "gdp": 3750000000000, "area_km2": 3287263},
{"name": "China", "continent": "Asia", "population": 1425000000, "gdp": 17700000000000, "area_km2": 9596960},
{"name": "USA", "continent": "North America", "population": 331000000, "gdp": 25460000000000, "area_km2": 9833517},
{"name": "Brazil", "continent": "South America", "population": 215000000, "gdp": 1920000000000, "area_km2": 8515767},
{"name": "Nigeria", "continent": "Africa", "population": 218000000, "gdp": 477000000000, "area_km2": 923768},
{"name": "Germany", "continent": "Europe", "population": 84000000, "gdp": 4070000000000, "area_km2": 357114},
{"name": "Australia", "continent": "Oceania", "population": 26000000, "gdp": 1693000000000, "area_km2": 7692024},
{"name": "Egypt", "continent": "Africa", "population": 105000000, "gdp": 387000000000, "area_km2": 1002450},
{"name": "France", "continent": "Europe", "population": 68000000, "gdp": 2780000000000, "area_km2": 551695},
{"name": "Canada", "continent": "North America", "population": 38000000, "gdp": 2140000000000, "area_km2": 9984670},
]
Query 1: Let’s just get all country names.
# Build the query - nothing runs yet!
q = Query.from_iterable(countries).select('name')
print(q) # shows the pipeline map
Query(select(name))
# lets run it
q.collect()
[{'name': 'India'},
{'name': 'China'},
{'name': 'USA'},
{'name': 'Brazil'},
{'name': 'Nigeria'},
{'name': 'Germany'},
{'name': 'Australia'},
{'name': 'Egypt'},
{'name': 'France'},
{'name': 'Canada'}]
# or you can use show() also
q.show() # by default will show 5 values only. You can pass no. as argument
{'name': 'India'}
{'name': 'China'}
{'name': 'USA'}
{'name': 'Brazil'}
{'name': 'Nigeria'}
Query 2: Which continents have more than one country in our list?
Let’s group countries by continent, count how many are in each group, then filter to only show continents with more than one country.
q = Query.from_iterable(countries)
q
Query()
# first we can groupby continents
q.groupby('continent').collect(1) # will show only 1 value
[('Asia',
[{'name': 'India',
'continent': 'Asia',
'population': 1428000000,
'gdp': 3750000000000,
'area_km2': 3287263},
{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960}])]
# we can use count() to count the number of items in a group
q.groupby('continent').count().collect()
[('Asia', 2),
('North America', 2),
('South America', 1),
('Africa', 2),
('Europe', 2),
('Oceania', 1)]
# so to get the continents having more than 1 country we can use filter
q.groupby('continent').count().filter(lambda x: x[1] > 1).show()
('Asia', 2)
('North America', 2)
('Africa', 2)
('Europe', 2)
Query 3: Top 3 richest countries by GDP
Let’s find the top 3 richest countries. We use .sort() to order by GDP
(highest first), then .collect(3) to grab only the top 3 — all in one
lazy chain!
q.sort('gdp', reverse=True).collect(3)
[{'name': 'USA',
'continent': 'North America',
'population': 331000000,
'gdp': 25460000000000,
'area_km2': 9833517},
{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960},
{'name': 'Germany',
'continent': 'Europe',
'population': 84000000,
'gdp': 4070000000000,
'area_km2': 357114}]
# If you want to just select a particular key only. Use select()
q.sort('gdp', reverse=True).select('name').collect(3)
[{'name': 'USA'}, {'name': 'China'}, {'name': 'Germany'}]
Query 4: Countries with population over 200 million
We use F('population') to reference the population field and
> 200_000_000 to build a condition. Only countries matching it pass
through the filter!
q.filter(F('population') > 200_000_000).collect()
[{'name': 'India',
'continent': 'Asia',
'population': 1428000000,
'gdp': 3750000000000,
'area_km2': 3287263},
{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960},
{'name': 'USA',
'continent': 'North America',
'population': 331000000,
'gdp': 25460000000000,
'area_km2': 9833517},
{'name': 'Brazil',
'continent': 'South America',
'population': 215000000,
'gdp': 1920000000000,
'area_km2': 8515767},
{'name': 'Nigeria',
'continent': 'Africa',
'population': 218000000,
'gdp': 477000000000,
'area_km2': 923768}]
Query 5: Total Population per Continent
We use .groupby() to group countries by continent, then
.sum('population') to add up the population in each group. Great for
aggregating data!
q.groupby('continent').sum('population').collect()
[('Asia', 2853000000.0),
('North America', 369000000.0),
('South America', 215000000.0),
('Africa', 323000000.0),
('Europe', 152000000.0),
('Oceania', 26000000.0)]
Query 6: Largest Country by Area in Each Continent
We use .groupby() then .max('area_km2') to find the biggest country
in each continent.
q.groupby('continent').max('area_km2').collect()
[{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960},
{'name': 'Canada',
'continent': 'North America',
'population': 38000000,
'gdp': 2140000000000,
'area_km2': 9984670},
{'name': 'Brazil',
'continent': 'South America',
'population': 215000000,
'gdp': 1920000000000,
'area_km2': 8515767},
{'name': 'Egypt',
'continent': 'Africa',
'population': 105000000,
'gdp': 387000000000,
'area_km2': 1002450},
{'name': 'France',
'continent': 'Europe',
'population': 68000000,
'gdp': 2780000000000,
'area_km2': 551695},
{'name': 'Australia',
'continent': 'Oceania',
'population': 26000000,
'gdp': 1693000000000,
'area_km2': 7692024}]
🐍 CS50 Language Popularity Survey CSV
Let’s load a CSV file and count how many respondents use each programming language.
survey = Query.from_csv('../data/favorites.csv')
survey.collect(2)
[{'Timestamp': '10/20/2025 9:45:26',
'language': 'Python',
'problem': 'Readability'},
{'Timestamp': '10/20/2025 10:08:03',
'language': 'Python',
'problem': 'Mario'}]
How many people use Python vs other languages?
survey.groupby('language').count().collect()
[('Python', 190), ('Scratch', 24), ('C', 58)]
Most Common Problem
Let’s find the most frequently mentioned problem using groupby,
count, and sort!
survey.groupby('problem').count().sort(1, reverse=True).show(1)
('Hello, World', 42)
Filter by Language
Use F('language') == 'C' to filter only C programmers, then select()
to pick specific fields.
survey.filter(F('language') == 'C').select(['language', 'problem']).collect(4) # only showing 4
[{'language': 'C', 'problem': 'Cash'},
{'language': 'C', 'problem': 'Filter'},
{'language': 'C', 'problem': 'DNA'},
{'language': 'C', 'problem': 'Speller'}]
Working with SQLite
lazyq can connect directly to a SQLite database. Use
Query.from_sqlite()
to load a table, or
Query.from_sqlite_query()
to run a custom SQL query and then chain lazyq operations on the result.
IMDB shows
shows_db = Query.from_sqlite('../data/shows.db', table='shows')
shows_db.collect(3)
[{'id': 62614, 'title': "Zeg 'ns Aaa", 'year': 1981, 'episodes': 227},
{'id': 63881, 'title': 'Catweazle', 'year': 1970, 'episodes': 26},
{'id': 63962, 'title': 'UFO', 'year': 1970, 'episodes': 26}]
Explore Available Tables
Use from_sqlite_query() with a metadata query to list all tables in
your SQLite database!
tables = Query.from_sqlite_query('../data/shows.db', "SELECT name FROM sqlite_master WHERE type='table'").collect()
tables
[{'name': 'genres'},
{'name': 'people'},
{'name': 'ratings'},
{'name': 'shows'},
{'name': 'stars'},
{'name': 'writers'}]
Top 5 Shows with Most Episodes
Load from SQLite, drop null episodes, sort and select only the fields we need.
shows_db.dropna('episodes').sort('episodes', reverse=True).select(['title', 'episodes']).collect(5)
[{'title': 'NRK Nyheter', 'episodes': 18593},
{'title': 'The Young and the Restless', 'episodes': 13297},
{'title': 'See the World by Train', 'episodes': 10674},
{'title': 'WREG News 3 at 10PM', 'episodes': 10663},
{'title': 'Barátok közt', 'episodes': 10456}]
Top 5 Highest Rated Shows
Use from_sqlite_query() to JOIN tables in SQL, then chain lazyq
operations on the result!
Query.from_sqlite_query('../data/shows.db', """
SELECT shows.title, ratings.rating
FROM shows JOIN ratings ON shows.id = ratings.show_id
""").sort('rating', reverse=True).limit(5).collect()
[{'title': 'Peanut Headz: Black History Toonz', 'rating': 10.0},
{'title': 'Quase Anjos: Edição Especial', 'rating': 10.0},
{'title': 'Living with Siblings', 'rating': 10.0},
{'title': 'DIY Science Time', 'rating': 10.0},
{'title': 'The Show with the Campervan', 'rating': 9.9}]
Filter Shows by Year
Use F('year') == 1970 to filter shows from a specific year directly
from SQLite!
shows_db.filter(F('year') == 1970).select(['title', 'year']).collect(5)
[{'title': 'Catweazle', 'year': 1970},
{'title': 'UFO', 'year': 1970},
{'title': 'Ace of Wands', 'year': 1970},
{'title': 'The Adventures of Don Quick', 'year': 1970},
{'title': 'Albert and Victoria', 'year': 1970}]
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lazyq-0.0.5.tar.gz.
File metadata
- Download URL: lazyq-0.0.5.tar.gz
- Upload date:
- Size: 15.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
735e6b095df0d423e88d466624d3556882d3f90850bf21d924ae5bc057bef5ce
|
|
| MD5 |
843cdcc4f8923fe7c24eeeb13174bc0d
|
|
| BLAKE2b-256 |
47f5294ea1f39004f1e27166ac1dd0c7ee74a3da151e9b312014e3a4a3991afa
|
File details
Details for the file lazyq-0.0.5-py3-none-any.whl.
File metadata
- Download URL: lazyq-0.0.5-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7158587e439284a668ce07d03b599abcea33d9b07405caf47ba963b65c5edf61
|
|
| MD5 |
a3c6ef45e2e7aa6692e77d39a8393c68
|
|
| BLAKE2b-256 |
1b2e6a400eb10214414dfab330a3993885f9325ca742f629bd7fc2179aad583a
|