Simplified handling of partitioned data, built on the Polars library

polars_partitions

pip install polars_partitions

Description

This library is not a replacement for Polars. Its main goal is to simplify working with partitions (writing, reading, and filtering) by maintaining a Table Of Contents file (hereinafter referred to as "TOC").
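
To make the idea of a TOC concrete, here is a minimal sketch in plain Python. It is not the library's implementation: it assumes a TOC is simply the set of distinct value combinations of the partition columns, and both the `build_toc` helper and the sample rows are hypothetical.

```python
from datetime import date

# Illustrative rows as (col1, col2, col3) tuples.
rows = [
    (date(2024, 1, 1), "A2", 1),
    (date(2024, 1, 1), "A2", 2),
    (date(2024, 1, 2), "A2", 3),
    (date(2024, 1, 2), "A2", 4),
    (date(2024, 1, 2), "B2", 5),
    (date(2024, 1, 3), "B2", 6),
    (date(2024, 1, 3), "B2", 7),
    (date(2024, 1, 3), "B2", 8),
]


def build_toc(rows, key_indexes):
    """Hypothetical helper: sorted distinct combinations of partition-key values."""
    return sorted({tuple(row[i] for i in key_indexes) for row in rows})


# Partition on the first two columns (col1, col2).
toc = build_toc(rows, key_indexes=(0, 1))
for entry in toc:
    print(entry)
```

Under this assumption, eight rows collapse into four TOC entries, so a reader can decide which partitions are relevant without scanning the data itself.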

Write Partition

polars_parquet.wr_partition(
          df: DataFrame,
          columns: array | string,
          output_path: str
)

Parameters

df
          Polars DataFrame
columns
          Column name or array of column names to partition by
output_path
          Path to save to
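
As a rough illustration of what partitioning by several columns means, the sketch below groups rows by their partition-key values in plain Python. The on-disk layout this library uses is not documented here; the Hive-style `column=value` directory names are an assumption made purely for illustration, and `group_by_partition` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date


def group_by_partition(rows, columns):
    """Hypothetical helper: group row dicts under an assumed Hive-style key."""
    groups = defaultdict(list)
    for row in rows:
        # Assumed naming scheme, e.g. "col1=2024-01-01/col2=A2".
        key = "/".join(f"{column}={row[column]}" for column in columns)
        groups[key].append(row)
    return dict(groups)


rows = [
    {"col1": date(2024, 1, 1), "col2": "A2", "col3": 1},
    {"col1": date(2024, 1, 2), "col2": "A2", "col3": 3},
    {"col1": date(2024, 1, 2), "col2": "B2", "col3": 5},
]

groups = group_by_partition(rows, ["col1", "col2"])
for key, members in groups.items():
    print(key, len(members))
```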

Example 🤔🤔🤔
import polars_partitions as pp
from datetime import date
import polars as pl

# Create a test dataset
df = pl.DataFrame({
    'col1': [date(2024,1,1), date(2024,1,1), date(2024,1,2), date(2024,1,2),
             date(2024,1,2), date(2024,1,3), date(2024,1,3), date(2024,1,3)],
    # Every column must have the same length (8 rows)
    'col2': ['A2', 'A2', 'A2', 'A2', 'B2', 'B2', 'B2', 'B2'],
    'col3': [1, 2, 3, 4, 5, 6, 7, 8],
})

path = './your_path_where_your_partitions'

# Which columns are partitioned by
columns = ['col1', 'col2'] 

ep = pp.EasyPartition(path)

# Write the partitions
ep.write_data(df, columns)

# Output: 
# ./your_path_where_your_partitions/toc.parquet - done! 

Write TOC

polars_parquet.wr_toc()

polars_parquet.wr_toc(
          df: DataFrame,
          columns: array | string,
          output_path: str
)

Parameters

df
          Polars DataFrame on which the partitions are based
columns
          Array of columns to create partitions for
output_path
          Path to save to

Reading TOC

polars_parquet.rd_toc()

polars_parquet.rd_toc(
          output_path: str,
          filters: dict = None,
          btwn: str = None
)

Parameters

output_path
          Path where the TOC is stored
filters
          Dictionary mapping a column name to an array of values to filter on
btwn
          Used together with filters. Name of the column to which a between filter is applied; the first two values of that column's array in filters are used as the bounds.
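
The interaction between filters and btwn can be sketched in plain Python as follows. This is not the library's code. It assumes that the bounds are inclusive and that columns not named in btwn are matched by membership in their value array; the `matches` helper and the sample TOC entries are hypothetical.

```python
from datetime import date


def matches(entry, filters=None, btwn=None):
    """Hypothetical helper: test one TOC entry (a dict) against the filters."""
    for column, values in (filters or {}).items():
        if column == btwn:
            # The first two values act as lower and upper bounds (assumed inclusive).
            low, high = values[0], values[1]
            if not (low <= entry[column] <= high):
                return False
        elif entry[column] not in values:
            # Assumed behaviour without btwn: plain membership test.
            return False
    return True


toc = [
    {"col1": date(2024, 1, 1), "col2": "A2"},
    {"col1": date(2024, 1, 2), "col2": "A2"},
    {"col1": date(2024, 1, 2), "col2": "B2"},
    {"col1": date(2024, 1, 3), "col2": "B2"},
]

filters = {"col1": [date(2024, 1, 1), date(2024, 1, 3)]}

as_set = [e for e in toc if matches(e, filters)]
as_range = [e for e in toc if matches(e, filters, btwn="col1")]
print(len(as_set), len(as_range))
```

With the same filters dictionary, the membership reading keeps only the entries dated 2024-01-01 and 2024-01-03, while the between reading also keeps the entries dated 2024-01-02.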

Example 🤔🤔🤔
ep.get_toc()

# Output: 
shape: (4, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ date       ┆ str  │
╞════════════╪══════╡
│ 2024-01-02 ┆ A2   │
│ 2024-01-02 ┆ B2   │
│ 2024-01-01 ┆ A2   │
│ 2024-01-03 ┆ B2   │
└────────────┴──────┘

Read Partition

polars_parquet.rd_partition()

polars_parquet.rd_partition(
          output_path: str,
          columns: array | string = "*",
          filters: dict = None,
          btwn: str = None
) → LazyFrame

Parameters

output_path
          Path to the parquet file or to the partitions folder
columns
          Array of columns to return
filters
          Dictionary mapping a column name to an array of values to filter on
btwn
          Used together with filters. Name of the column to which a between filter is applied; the first two values of that column's array in filters are used as the bounds.

Example 🤔🤔🤔
filters = {'col1':[date(2024,1,1),date(2024,1,3)]}

with pl.StringCache():
    df = ep.get_data(filters=filters, between='col1', columns=['col1', 'col3']).collect()

df

# Output: 
shape: (8, 2)
┌────────────┬──────┐
│ col1       ┆ col3 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2024-01-02 ┆ 3    │
│ 2024-01-02 ┆ 4    │
│ 2024-01-02 ┆ 5    │
│ 2024-01-01 ┆ 1    │
│ 2024-01-01 ┆ 2    │
│ 2024-01-03 ┆ 6    │
│ 2024-01-03 ┆ 7    │
│ 2024-01-03 ┆ 8    │
└────────────┴──────┘
