A lightweight, chainable query pipeline for Python

What is lazyq?

lazyq is a lightweight, chainable query pipeline for Python.

Instead of executing each operation immediately, lazyq records a pipeline of instructions and runs them only when you ask for results, for example with collect() or show(). Because intermediate results do not have to be built at every step, this can keep memory use low when working with large datasets.
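To make the idea concrete, here is a minimal sketch of deferred execution in plain Python. The LazyPipeline class and its methods are hypothetical names invented for this illustration; they are not lazyq's implementation, only one way such a pipeline could be built on top of iterators.

```python
from itertools import islice


class LazyPipeline:
    """Hypothetical stand-in for a lazy query object (not lazyq's code)."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)  # recorded instructions, not yet run

    def map(self, fn):
        # Return a new pipeline with one more recorded step.
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyPipeline(self._source, self._steps + (("filter", pred),))

    def collect(self, n=None):
        # Only here are the recorded steps applied, one item at a time.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(islice(items, n))


calls = []


def double(x):
    calls.append(x)  # track when work actually happens
    return x * 2


pipeline = LazyPipeline(range(1_000_000)).map(double).filter(lambda x: x % 3 == 0)
calls_before_collect = len(calls)  # still 0: building the chain ran nothing
first_three = pipeline.collect(3)  # pulls only as many items as needed
```

Building the chain calls double zero times, and collect(3) touches only the first few of the million inputs.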

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/vikasAWA/lazyq.git

or from conda

$ conda install -c vikasAWA lazyq

or from pypi

$ pip install lazyq

Documentation

Documentation is hosted on this GitHub repository’s pages. Package-manager-specific guidelines are available on conda and PyPI.

How to use

Usage

lazyq lets you build queries step by step. Let’s explore with some country data!

Exploring with a list of dicts

Let’s start simple — here’s a list of countries. We’ll use Query.from_iterable() to wrap it.

from lazyq import Query, F  # the two names used in the examples below

countries = [
    {"name": "India", "continent": "Asia", "population": 1428000000, "gdp": 3750000000000, "area_km2": 3287263},
    {"name": "China", "continent": "Asia", "population": 1425000000, "gdp": 17700000000000, "area_km2": 9596960},
    {"name": "USA", "continent": "North America", "population": 331000000, "gdp": 25460000000000, "area_km2": 9833517},
    {"name": "Brazil", "continent": "South America", "population": 215000000, "gdp": 1920000000000, "area_km2": 8515767},
    {"name": "Nigeria", "continent": "Africa", "population": 218000000, "gdp": 477000000000, "area_km2": 923768},
    {"name": "Germany", "continent": "Europe", "population": 84000000, "gdp": 4070000000000, "area_km2": 357114},
    {"name": "Australia", "continent": "Oceania", "population": 26000000, "gdp": 1693000000000, "area_km2": 7692024},
    {"name": "Egypt", "continent": "Africa", "population": 105000000, "gdp": 387000000000, "area_km2": 1002450},
    {"name": "France", "continent": "Europe", "population": 68000000, "gdp": 2780000000000, "area_km2": 551695},
    {"name": "Canada", "continent": "North America", "population": 38000000, "gdp": 2140000000000, "area_km2": 9984670},
]

Query 1: Let’s just get all country names.

# Build the query - nothing runs yet!
q = Query.from_iterable(countries).select('name')
print(q) # shows the pipeline map
Query(select(name))
# Now run it
q.collect()
[{'name': 'India'},
 {'name': 'China'},
 {'name': 'USA'},
 {'name': 'Brazil'},
 {'name': 'Nigeria'},
 {'name': 'Germany'},
 {'name': 'Australia'},
 {'name': 'Egypt'},
 {'name': 'France'},
 {'name': 'Canada'}]
# Alternatively, use show() to print the results
q.show() # prints the first 5 items by default; pass a number to change that
{'name': 'India'}
{'name': 'China'}
{'name': 'USA'}
{'name': 'Brazil'}
{'name': 'Nigeria'}
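For comparison, the same projection can be written with a generator in plain Python. This is only a rough equivalent under the assumption that select() keeps the named keys of each dict; the select function below is a hypothetical helper for this sketch, not lazyq's code.

```python
from itertools import islice

# Names taken from the countries list above (other fields omitted for brevity).
countries = [
    {"name": "India", "continent": "Asia"},
    {"name": "China", "continent": "Asia"},
    {"name": "USA", "continent": "North America"},
    {"name": "Brazil", "continent": "South America"},
    {"name": "Nigeria", "continent": "Africa"},
    {"name": "Germany", "continent": "Europe"},
]


def select(rows, *keys):
    # Generator: yields projected dicts one at a time, on demand.
    for row in rows:
        yield {key: row[key] for key in keys}


names = select(countries, "name")    # no rows processed yet
first_five = list(islice(names, 5))  # comparable to show()'s default of 5
```

Here first_five holds the same five dicts that show() printed, and the sixth row (Germany) is never projected.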

Query 2: Which continents have more than one country in our list?

Let’s group countries by continent, count how many are in each group, then filter to only show continents with more than one country.

q = Query.from_iterable(countries)
q
Query()
# First, group the countries by continent
q.groupby('continent').collect(1) # collect(1) returns only the first group
[('Asia',
  [{'name': 'India',
    'continent': 'Asia',
    'population': 1428000000,
    'gdp': 3750000000000,
    'area_km2': 3287263},
   {'name': 'China',
    'continent': 'Asia',
    'population': 1425000000,
    'gdp': 17700000000000,
    'area_km2': 9596960}])]
# we can use count() to count the number of items in a group
q.groupby('continent').count().collect()
[('Asia', 2),
 ('North America', 2),
 ('South America', 1),
 ('Africa', 2),
 ('Europe', 2),
 ('Oceania', 1)]
# To keep only the continents with more than 1 country, add a filter on the count
q.groupby('continent').count().filter(lambda x: x[1] > 1).show()
('Asia', 2)
('North America', 2)
('Africa', 2)
('Europe', 2)
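If it helps to see the logic without the pipeline, the same group, count, and filter steps can be sketched with collections.Counter from the standard library. This is a plain-Python equivalent of the result, not a description of how lazyq computes it.

```python
from collections import Counter

# (name, continent) pairs taken from the countries list above.
pairs = [
    ("India", "Asia"), ("China", "Asia"),
    ("USA", "North America"), ("Brazil", "South America"),
    ("Nigeria", "Africa"), ("Germany", "Europe"),
    ("Australia", "Oceania"), ("Egypt", "Africa"),
    ("France", "Europe"), ("Canada", "North America"),
]

# Counter keeps keys in first-seen order, like the output above.
counts = Counter(continent for _, continent in pairs)

# Each item is a (continent, count) tuple, so x[1] in the lambda is the count.
multi = [(continent, n) for continent, n in counts.items() if n > 1]
```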

Query 3: Top 3 richest countries by GDP

Let’s find the top 3 richest countries. We use .sort() with reverse=True to order by GDP (highest first), then .collect(3) to take only the first 3, all in one chain.

q.sort('gdp', reverse=True).collect(3)
[{'name': 'USA',
  'continent': 'North America',
  'population': 331000000,
  'gdp': 25460000000000,
  'area_km2': 9833517},
 {'name': 'China',
  'continent': 'Asia',
  'population': 1425000000,
  'gdp': 17700000000000,
  'area_km2': 9596960},
 {'name': 'Germany',
  'continent': 'Europe',
  'population': 84000000,
  'gdp': 4070000000000,
  'area_km2': 357114}]
# To keep only a particular key, add select()
q.sort('gdp', reverse=True).select('name').collect(3)
[{'name': 'USA'}, {'name': 'China'}, {'name': 'Germany'}]
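One thing worth keeping in mind: any sort has to see every item before it can emit the first one, so this step cannot stream the way a filter can. How lazyq handles that internally is not stated here. As a plain-Python point of comparison, heapq.nlargest from the standard library finds the top 3 without sorting the whole list:

```python
from heapq import nlargest
from operator import itemgetter

# name and gdp taken from the countries list above.
countries = [
    {"name": "India", "gdp": 3750000000000},
    {"name": "China", "gdp": 17700000000000},
    {"name": "USA", "gdp": 25460000000000},
    {"name": "Brazil", "gdp": 1920000000000},
    {"name": "Nigeria", "gdp": 477000000000},
    {"name": "Germany", "gdp": 4070000000000},
    {"name": "Australia", "gdp": 1693000000000},
    {"name": "Egypt", "gdp": 387000000000},
    {"name": "France", "gdp": 2780000000000},
    {"name": "Canada", "gdp": 2140000000000},
]

# Keeps only the 3 best rows in memory while scanning the input once.
top3 = nlargest(3, countries, key=itemgetter("gdp"))
top3_names = [row["name"] for row in top3]
```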

Query 4: Countries with population over 200 million

We use F('population') to reference the population field and > 200_000_000 to build a condition. Only countries matching it pass through the filter!

q.filter(F('population') > 200_000_000).collect()
[{'name': 'India',
  'continent': 'Asia',
  'population': 1428000000,
  'gdp': 3750000000000,
  'area_km2': 3287263},
 {'name': 'China',
  'continent': 'Asia',
  'population': 1425000000,
  'gdp': 17700000000000,
  'area_km2': 9596960},
 {'name': 'USA',
  'continent': 'North America',
  'population': 331000000,
  'gdp': 25460000000000,
  'area_km2': 9833517},
 {'name': 'Brazil',
  'continent': 'South America',
  'population': 215000000,
  'gdp': 1920000000000,
  'area_km2': 8515767},
 {'name': 'Nigeria',
  'continent': 'Africa',
  'population': 218000000,
  'gdp': 477000000000,
  'area_km2': 923768}]
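An expression like F('population') > 200_000_000 works because Python lets a class decide what the > operator returns. The sketch below shows one way this could be done; the Field class is a hypothetical illustration, and lazyq's actual F may be implemented differently.

```python
import operator


class Field:
    """Hypothetical field reference; illustrative only, not lazyq's F."""

    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        # Return a predicate that looks the field up in each row later.
        return lambda row: op(row[self.name], value)

    def __gt__(self, value):
        return self._compare(operator.gt, value)

    def __lt__(self, value):
        return self._compare(operator.lt, value)

    def __eq__(self, value):
        return self._compare(operator.eq, value)


# name and population taken from the countries list above.
countries = [
    {"name": "India", "population": 1428000000},
    {"name": "China", "population": 1425000000},
    {"name": "USA", "population": 331000000},
    {"name": "Brazil", "population": 215000000},
    {"name": "Nigeria", "population": 218000000},
    {"name": "Germany", "population": 84000000},
    {"name": "Australia", "population": 26000000},
    {"name": "Egypt", "population": 105000000},
    {"name": "France", "population": 68000000},
    {"name": "Canada", "population": 38000000},
]

is_large = Field("population") > 200_000_000  # a callable, not True/False
large = [row["name"] for row in countries if is_large(row)]
```

The comparison does not run when the expression is written. It produces a function that the filter step can call on each row.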

Query 5: Total Population per Continent

We use .groupby() to group countries by continent, then .sum('population') to add up the population in each group. Great for aggregating data!

q.groupby('continent').sum('population').collect()
[('Asia', 2853000000),
 ('North America', 369000000),
 ('South America', 215000000),
 ('Africa', 323000000),
 ('Europe', 152000000),
 ('Oceania', 26000000)]
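As a cross-check of the totals above, here is the same aggregation as a single pass in plain Python. It is a sketch of the result being computed, not of lazyq's internals.

```python
from collections import defaultdict

# (continent, population) pairs taken from the countries list above.
rows = [
    ("Asia", 1428000000), ("Asia", 1425000000),
    ("North America", 331000000), ("South America", 215000000),
    ("Africa", 218000000), ("Europe", 84000000),
    ("Oceania", 26000000), ("Africa", 105000000),
    ("Europe", 68000000), ("North America", 38000000),
]

totals = defaultdict(int)
for continent, population in rows:
    totals[continent] += population  # one running sum per group

result = list(totals.items())
```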

Query 6: Largest Country by Area in Each Continent

We use .groupby() then .max('area_km2') to find the biggest country in each continent.

q.groupby('continent').max('area_km2').collect()
[('Asia',
  {'name': 'China',
   'continent': 'Asia',
   'population': 1425000000,
   'gdp': 17700000000000,
   'area_km2': 9596960}),
 ('North America',
  {'name': 'Canada',
   'continent': 'North America',
   'population': 38000000,
   'gdp': 2140000000000,
   'area_km2': 9984670}),
 ('South America',
  {'name': 'Brazil',
   'continent': 'South America',
   'population': 215000000,
   'gdp': 1920000000000,
   'area_km2': 8515767}),
 ('Africa',
  {'name': 'Egypt',
   'continent': 'Africa',
   'population': 105000000,
   'gdp': 387000000000,
   'area_km2': 1002450}),
 ('Europe',
  {'name': 'France',
   'continent': 'Europe',
   'population': 68000000,
   'gdp': 2780000000000,
   'area_km2': 551695}),
 ('Oceania',
  {'name': 'Australia',
   'continent': 'Oceania',
   'population': 26000000,
   'gdp': 1693000000000,
   'area_km2': 7692024})]
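Note that max() here returns the whole winning row for each group, not just the largest area value. A plain-Python sketch of that behavior, keeping one best row per continent in a single pass (again an illustration, not lazyq's code):

```python
# name, continent and area_km2 taken from the countries list above.
countries = [
    {"name": "India", "continent": "Asia", "area_km2": 3287263},
    {"name": "China", "continent": "Asia", "area_km2": 9596960},
    {"name": "USA", "continent": "North America", "area_km2": 9833517},
    {"name": "Brazil", "continent": "South America", "area_km2": 8515767},
    {"name": "Nigeria", "continent": "Africa", "area_km2": 923768},
    {"name": "Germany", "continent": "Europe", "area_km2": 357114},
    {"name": "Australia", "continent": "Oceania", "area_km2": 7692024},
    {"name": "Egypt", "continent": "Africa", "area_km2": 1002450},
    {"name": "France", "continent": "Europe", "area_km2": 551695},
    {"name": "Canada", "continent": "North America", "area_km2": 9984670},
]

largest = {}
for row in countries:
    best = largest.get(row["continent"])
    # Replace the stored row only when the new one has a larger area.
    if best is None or row["area_km2"] > best["area_km2"]:
        largest[row["continent"]] = row

winners = [(continent, row["name"]) for continent, row in largest.items()]
```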

🐍 CS50 Language Popularity Survey CSV

Let’s load a CSV file and count how many respondents use each programming language.

survey = Query.from_csv('../data/favorites.csv')
survey.collect(2)
[{'Timestamp': '10/20/2025 9:45:26',
  'language': 'Python',
  'problem': 'Readability'},
 {'Timestamp': '10/20/2025 10:08:03',
  'language': 'Python',
  'problem': 'Mario'}]

How many people use Python vs other languages?

survey.groupby('language').count().collect()
[('Python', 190), ('Scratch', 24), ('C', 58)]
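For readers who want to see what row-by-row CSV counting looks like without lazyq, here is a sketch using csv.DictReader from the standard library. The in-memory sample is a tiny stand-in built from values shown in this README, not the real favorites.csv, so its counts differ from the ones above.

```python
import csv
import io
from collections import Counter

# Tiny illustrative sample; the real survey file has many more rows.
sample = io.StringIO(
    "language,problem\n"
    "Python,Readability\n"
    "Python,Mario\n"
    "C,Cash\n"
    "C,Filter\n"
    "C,DNA\n"
)

reader = csv.DictReader(sample)  # yields one dict per row, lazily
counts = Counter(row["language"] for row in reader)
result = list(counts.items())
```

DictReader reads one line at a time, so the whole file never has to sit in memory just to be counted.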

Most Common Problem

Let’s find the most frequently mentioned problem using groupby, count, and sort!

survey.groupby('problem').count().sort(1, reverse=True).show(1)
('Hello, World', 42)
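After count(), each item is a (key, count) tuple, so sort(1, reverse=True) orders by the element at index 1, the count. The plain-Python sketch below applies the same idea to the language counts shown earlier, assuming that an integer sort key means a tuple index, which is how the example above reads.

```python
from operator import itemgetter

# (language, count) tuples from the survey output above.
language_counts = [("Python", 190), ("Scratch", 24), ("C", 58)]

# Sort by the element at index 1 (the count), largest first.
ranked = sorted(language_counts, key=itemgetter(1), reverse=True)
most_common = ranked[0]
```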

Filter by Language

Use F('language') == 'C' to filter only C programmers, then select() to pick specific fields.

survey.filter(F('language') == 'C').select(['language', 'problem']).collect(4) # only showing 4
[{'language': 'C', 'problem': 'Cash'},
 {'language': 'C', 'problem': 'Filter'},
 {'language': 'C', 'problem': 'DNA'},
 {'language': 'C', 'problem': 'Speller'}]
