A lightweight, chainable query pipeline for Python
What is lazyq?
lazyq is a lightweight, chainable query pipeline for Python.
Instead of executing operations immediately, lazyq builds up a pipeline of instructions and runs them only when you actually need the results. This makes it memory-efficient and well suited to working with large datasets!
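To make the "build now, run later" idea concrete, here is a minimal sketch of how a lazy pipeline can work in plain Python. MiniQuery and its methods are hypothetical names chosen for illustration; this is not lazyq's source code, and lazyq's internals may differ. Each method only records a step, and collect() is where the data is actually iterated.

```python
from itertools import islice


class MiniQuery:
    """Hypothetical lazy pipeline sketch; NOT lazyq's implementation."""

    def __init__(self, source, steps=()):
        self._source = source
        # Each step is a (label, fn) pair; fn maps an iterator to a new iterator
        self._steps = tuple(steps)

    def _add(self, label, fn):
        # Return a new query with one more recorded step; no data is touched here
        return MiniQuery(self._source, self._steps + ((label, fn),))

    def select(self, *keys):
        return self._add(f"select({', '.join(keys)})",
                         lambda it: ({k: row[k] for k in keys} for row in it))

    def filter(self, pred):
        return self._add("filter", lambda it: (row for row in it if pred(row)))

    def collect(self, n=None):
        # Only now is the source iterated, passing through each recorded step
        it = iter(self._source)
        for _, fn in self._steps:
            it = fn(it)
        return list(it if n is None else islice(it, n))

    def __repr__(self):
        return f"MiniQuery({' -> '.join(label for label, _ in self._steps)})"


rows = [{"name": "India", "population": 1428000000},
        {"name": "Germany", "population": 84000000}]
q = MiniQuery(rows).filter(lambda r: r["population"] > 200_000_000).select("name")
print(q)            # MiniQuery(filter -> select(name))
print(q.collect())  # [{'name': 'India'}]
```

Returning a new object from every method keeps each intermediate query reusable, which is one common way to make a chainable API safe to branch from.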
Usage
Installation
Install the latest version from the GitHub repository:
$ pip install git+https://github.com/vikasAWA/lazyq.git
or from conda:
$ conda install -c vikasAWA lazyq
or from PyPI:
$ pip install lazyq
Documentation
Documentation is hosted on this GitHub repository’s Pages site. You can also find package-manager-specific guidelines on conda and PyPI.
How to use
lazyq lets you build queries step by step. Let’s explore with some country data!
Exploring with a list of dicts
Let’s start simple: here’s a list of countries. We’ll use Query.from_iterable() to wrap it.
from lazyq import *
countries = [
{"name": "India", "continent": "Asia", "population": 1428000000, "gdp": 3750000000000, "area_km2": 3287263},
{"name": "China", "continent": "Asia", "population": 1425000000, "gdp": 17700000000000, "area_km2": 9596960},
{"name": "USA", "continent": "North America", "population": 331000000, "gdp": 25460000000000, "area_km2": 9833517},
{"name": "Brazil", "continent": "South America", "population": 215000000, "gdp": 1920000000000, "area_km2": 8515767},
{"name": "Nigeria", "continent": "Africa", "population": 218000000, "gdp": 477000000000, "area_km2": 923768},
{"name": "Germany", "continent": "Europe", "population": 84000000, "gdp": 4070000000000, "area_km2": 357114},
{"name": "Australia", "continent": "Oceania", "population": 26000000, "gdp": 1693000000000, "area_km2": 7692024},
{"name": "Egypt", "continent": "Africa", "population": 105000000, "gdp": 387000000000, "area_km2": 1002450},
{"name": "France", "continent": "Europe", "population": 68000000, "gdp": 2780000000000, "area_km2": 551695},
{"name": "Canada", "continent": "North America", "population": 38000000, "gdp": 2140000000000, "area_km2": 9984670},
]
Query 1: Let’s just get all country names.
# Build the query - nothing runs yet!
q = Query.from_iterable(countries).select('name')
print(q) # shows the pipeline map
Query(select(name))
# let's run it
q.collect()
[{'name': 'India'},
{'name': 'China'},
{'name': 'USA'},
{'name': 'Brazil'},
{'name': 'Nigeria'},
{'name': 'Germany'},
{'name': 'Australia'},
{'name': 'Egypt'},
{'name': 'France'},
{'name': 'Canada'}]
# or use show() instead
q.show() # shows the first 5 items by default; pass a number to show a different amount
{'name': 'India'}
{'name': 'China'}
{'name': 'USA'}
{'name': 'Brazil'}
{'name': 'Nigeria'}
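show() and collect(n) only need the first few results. The sketch below shows why that can be cheap under lazy evaluation, using only the standard library's itertools.islice. We are assuming, not confirming, that lazyq works along these lines; the source generator and pipeline names are hypothetical.

```python
from itertools import islice

consumed = []  # records which source items were actually pulled


def source():
    # A generator standing in for a large dataset
    for i in range(1_000_000):
        consumed.append(i)
        yield {"id": i}


# Build a lazy pipeline: a generator expression does no work until iterated
pipeline = ({"id": row["id"]} for row in source())

# Take only the first 5 results, similar in spirit to what show() does by default
first_five = list(islice(pipeline, 5))
print(first_five)
print(len(consumed))  # 5: only five source items were ever produced
```

Even though the source could yield a million items, only five are generated, because islice stops pulling once it has what it needs.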
Query 2: Which continents have more than one country in our list?
Let’s group countries by continent, count how many are in each group, then filter to only show continents with more than one country.
q = Query.from_iterable(countries)
q
Query()
# first, group by continent
q.groupby('continent').collect(1) # collect(1) returns only the first group
[('Asia',
[{'name': 'India',
'continent': 'Asia',
'population': 1428000000,
'gdp': 3750000000000,
'area_km2': 3287263},
{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960}])]
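Notice that the grouped output gathers rows by continent regardless of where they appear in the list. Here is a plain-Python sketch of that kind of grouping, contrasted with itertools.groupby, which only merges adjacent items. The data is a trimmed copy of the countries list above; the grouping code is an illustration, not lazyq's implementation.

```python
from itertools import groupby

# (name, continent) pairs taken from the countries list above
countries = [
    ("India", "Asia"), ("China", "Asia"), ("USA", "North America"),
    ("Brazil", "South America"), ("Nigeria", "Africa"), ("Germany", "Europe"),
    ("Australia", "Oceania"), ("Egypt", "Africa"), ("France", "Europe"),
    ("Canada", "North America"),
]

# Dict-based group-by: every row lands in its continent's list,
# and groups appear in order of first appearance (dicts keep insertion order)
groups = {}
for name, continent in countries:
    groups.setdefault(continent, []).append(name)
print(list(groups.items())[0])  # ('Asia', ['India', 'China'])

# itertools.groupby only merges *adjacent* equal keys, so unsorted input splits groups
runs = [key for key, _ in groupby(countries, key=lambda c: c[1])]
print(runs.count("Africa"))  # 2: Nigeria and Egypt are not adjacent
```

Since Nigeria and Egypt are not next to each other in the input, a group-by that puts them in one group has to see the whole input before emitting groups, so this step likely buffers data even inside an otherwise lazy pipeline.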
# we can use count() to count the number of items in a group
q.groupby('continent').count().collect()
[('Asia', 2),
('North America', 2),
('South America', 1),
('Africa', 2),
('Europe', 2),
('Oceania', 1)]
# to keep only continents with more than one country, use filter()
q.groupby('continent').count().filter(lambda x: x[1] > 1).show()
('Asia', 2)
('North America', 2)
('Africa', 2)
('Europe', 2)
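For comparison, the same group, count and filter steps can be written eagerly with collections.Counter from the standard library. This is a reference sketch for checking the result, not how lazyq computes it.

```python
from collections import Counter

# Continent of each country, in the same order as the countries list above
continents = ["Asia", "Asia", "North America", "South America", "Africa",
              "Europe", "Oceania", "Africa", "Europe", "North America"]

# Counter is a dict subclass, so keys keep first-seen order (Python 3.7+)
counts = Counter(continents)

# Keep continents with more than one country, like filter(lambda x: x[1] > 1)
multi = [(continent, n) for continent, n in counts.items() if n > 1]
print(multi)  # [('Asia', 2), ('North America', 2), ('Africa', 2), ('Europe', 2)]
```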
Query 3: Top 3 richest countries by GDP
Let’s find the top 3 richest countries. We use .sort() to order by GDP (highest first), then .collect(3) to grab only the top 3, all in one lazy chain!
q.sort('gdp', reverse=True).collect(3)
[{'name': 'USA',
'continent': 'North America',
'population': 331000000,
'gdp': 25460000000000,
'area_km2': 9833517},
{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960},
{'name': 'Germany',
'continent': 'Europe',
'population': 84000000,
'gdp': 4070000000000,
'area_km2': 357114}]
# To keep only a particular key, use select()
q.sort('gdp', reverse=True).select('name').collect(3)
[{'name': 'USA'}, {'name': 'China'}, {'name': 'Germany'}]
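Sorting everything just to keep three items does more work than needed. As an alternative technique (not something the lazyq docs say it uses), the standard library's heapq.nlargest returns the same top 3 while keeping only k items in a heap, which costs O(n log k) rather than O(n log n).

```python
import heapq

# (name, gdp in USD) pairs taken from the countries list above
gdp = [
    ("India", 3750000000000), ("China", 17700000000000), ("USA", 25460000000000),
    ("Brazil", 1920000000000), ("Nigeria", 477000000000), ("Germany", 4070000000000),
    ("Australia", 1693000000000), ("Egypt", 387000000000), ("France", 2780000000000),
    ("Canada", 2140000000000),
]

# Full sort, then take the first 3 (what sort(..., reverse=True).collect(3) expresses)
by_sort = sorted(gdp, key=lambda c: c[1], reverse=True)[:3]

# heapq.nlargest keeps only 3 candidates at a time while scanning the input
by_heap = heapq.nlargest(3, gdp, key=lambda c: c[1])

print([name for name, _ in by_heap])  # ['USA', 'China', 'Germany']
```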
Query 4: Countries with population over 200 million
We use F('population') to reference the population field and > 200_000_000 to build a condition. Only countries matching it pass through the filter!
q.filter(F('population') > 200_000_000).collect()
[{'name': 'India',
'continent': 'Asia',
'population': 1428000000,
'gdp': 3750000000000,
'area_km2': 3287263},
{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960},
{'name': 'USA',
'continent': 'North America',
'population': 331000000,
'gdp': 25460000000000,
'area_km2': 9833517},
{'name': 'Brazil',
'continent': 'South America',
'population': 215000000,
'gdp': 1920000000000,
'area_km2': 8515767},
{'name': 'Nigeria',
'continent': 'Africa',
'population': 218000000,
'gdp': 477000000000,
'area_km2': 923768}]
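An expression like F('population') > 200_000_000 can work because Python lets a class overload its comparison operators. Below is a minimal sketch of the idea, assuming the comparison returns a predicate function; the Field class is hypothetical and lazyq's F may be implemented differently.

```python
import operator


class Field:
    """Hypothetical F()-style field reference; lazyq's F may work differently."""

    def __init__(self, key):
        self.key = key

    def _predicate(self, op, value):
        # Build (but don't run) a test: look up the field on a row, then compare it
        key = self.key
        return lambda row: op(row[key], value)

    # Each comparison operator returns a predicate instead of a bool
    def __gt__(self, value):
        return self._predicate(operator.gt, value)

    def __ge__(self, value):
        return self._predicate(operator.ge, value)

    def __lt__(self, value):
        return self._predicate(operator.lt, value)

    def __le__(self, value):
        return self._predicate(operator.le, value)


rows = [{"name": "India", "population": 1428000000},
        {"name": "Germany", "population": 84000000},
        {"name": "Nigeria", "population": 218000000}]

is_big = Field("population") > 200_000_000  # a predicate; nothing evaluated yet
print([r["name"] for r in rows if is_big(r)])  # ['India', 'Nigeria']
```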
Query 5: Total Population per Continent
We use .groupby() to group countries by continent, then .sum('population') to add up the population in each group. Great for aggregating data!
q.groupby('continent').sum('population').collect()
[('Asia', 2853000000),
('North America', 369000000),
('South America', 215000000),
('Africa', 323000000),
('Europe', 152000000),
('Oceania', 26000000)]
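A per-group sum only needs one running total per continent. This standard-library sketch with collections.defaultdict reproduces the totals above; it is an illustration, not lazyq's code.

```python
from collections import defaultdict

# (continent, population) pairs taken from the countries list above
rows = [
    ("Asia", 1428000000), ("Asia", 1425000000), ("North America", 331000000),
    ("South America", 215000000), ("Africa", 218000000), ("Europe", 84000000),
    ("Oceania", 26000000), ("Africa", 105000000), ("Europe", 68000000),
    ("North America", 38000000),
]

totals = defaultdict(int)  # a missing continent starts at 0
for continent, population in rows:
    totals[continent] += population

print(list(totals.items()))
```

Memory here grows with the number of groups rather than the number of rows.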
Query 6: Largest Country by Area in Each Continent
We use .groupby() then .max('area_km2') to find the biggest country in each continent. Note that max() returns the whole record of that country, not just its area.
q.groupby('continent').max('area_km2').collect()
[('Asia',
{'name': 'China',
'continent': 'Asia',
'population': 1425000000,
'gdp': 17700000000000,
'area_km2': 9596960}),
('North America',
{'name': 'Canada',
'continent': 'North America',
'population': 38000000,
'gdp': 2140000000000,
'area_km2': 9984670}),
('South America',
{'name': 'Brazil',
'continent': 'South America',
'population': 215000000,
'gdp': 1920000000000,
'area_km2': 8515767}),
('Africa',
{'name': 'Egypt',
'continent': 'Africa',
'population': 105000000,
'gdp': 387000000000,
'area_km2': 1002450}),
('Europe',
{'name': 'France',
'continent': 'Europe',
'population': 68000000,
'gdp': 2780000000000,
'area_km2': 551695}),
('Oceania',
{'name': 'Australia',
'continent': 'Oceania',
'population': 26000000,
'gdp': 1693000000000,
'area_km2': 7692024})]
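To keep the whole record of the largest country rather than just its area, you can track the best record seen so far in each group. This is a plain-Python sketch; how lazyq breaks ties is not documented here, and this version keeps the first record seen because it only replaces on a strictly larger area.

```python
# (name, continent, area_km2) tuples taken from the countries list above
countries = [
    ("India", "Asia", 3287263), ("China", "Asia", 9596960),
    ("USA", "North America", 9833517), ("Brazil", "South America", 8515767),
    ("Nigeria", "Africa", 923768), ("Germany", "Europe", 357114),
    ("Australia", "Oceania", 7692024), ("Egypt", "Africa", 1002450),
    ("France", "Europe", 551695), ("Canada", "North America", 9984670),
]

largest = {}  # continent -> record with the biggest area seen so far
for record in countries:
    name, continent, area = record
    best = largest.get(continent)
    if best is None or area > best[2]:
        # Replacing a value keeps the key's original position in the dict
        largest[continent] = record

print([(continent, rec[0]) for continent, rec in largest.items()])
```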
Download files
File details
Details for the file lazyq-0.0.1.tar.gz.
File metadata
- Download URL: lazyq-0.0.1.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71e8a440fa7a9dd8318481600707df9233253d442806a261154538dc48d87cad |
| MD5 | 159d3d49fe0ee0da4c9bcec09461e942 |
| BLAKE2b-256 | daa3c31a05477e9e3aa2eed7fdf91d0ddbe75a89ab3ea116b8414c7b50da04d7 |
File details
Details for the file lazyq-0.0.1-py3-none-any.whl.
File metadata
- Download URL: lazyq-0.0.1-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f4b3be82d6d7e57cc6488a0c89b160964b9177e33d6663fe80a4106c9551f2b |
| MD5 | 67424af9107f76ecd7912c4bdd3161fe |
| BLAKE2b-256 | 7b2dc6d792a0f4224724cafb7bba453f4a571ec5d82c93b2be05866e0b2036b2 |