High Level Expressions for Dask
Dask Expressions
Dask DataFrames with query optimization.
This is a proof-of-concept rewrite of Dask DataFrame that includes query optimization and generally improved organization.
More in our blog posts:
Example
import dask_expr as dx
df = dx.datasets.timeseries()
df.head()
df.groupby("name").x.mean().compute()
Query Representation
Dask-expr encodes user code in an expression tree:
>>> df.x.mean().pprint()
Mean:
  Projection: columns='x'
    Timeseries: seed=1896674884
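To make the idea of an expression tree concrete, here is a minimal pure-Python sketch. The `Expr` class and its `tree_repr` method are hypothetical names invented for illustration and are not dask-expr's actual classes; the sketch only shows how nested operations can be stored as nodes and printed one node per line.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """A node in a toy expression tree (hypothetical, not dask-expr's class)."""
    name: str
    params: dict = field(default_factory=dict)
    children: tuple = ()

    def tree_repr(self, indent=0):
        # One line per node; each child is indented two spaces deeper.
        args = " ".join(f"{k}={v!r}" for k, v in self.params.items())
        header = f"{self.name}: {args}".rstrip()
        lines = [" " * indent + header]
        for child in self.children:
            lines.extend(child.tree_repr(indent + 2))
        return lines


# Build the tree for "take column x of a timeseries source, then average it".
source = Expr("Timeseries", {"seed": 1896674884})
projection = Expr("Projection", {"columns": "x"}, (source,))
mean = Expr("Mean", children=(projection,))

rendered = "\n".join(mean.tree_repr())
print(rendered)
```

Nothing is computed when the tree is built; the nodes only record what was asked for, which is what leaves room for a later rewriting step.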
This expression tree will be optimized and modified before execution:
>>> df.x.mean().optimize().pprint()
Div:
  Sum:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
  Count:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
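The optimized tree above shows `Mean` replaced by a `Div` of a `Sum` and a `Count`. The following is a rough sketch of that kind of rewrite on a toy tree made of plain tuples. The tuple encoding and the `lower_mean` and `evaluate` helpers are assumptions made for illustration; dask-expr's real optimizer is more involved and also handles steps such as fusing, which this sketch does not attempt.

```python
def evaluate(expr, table):
    """Evaluate a toy expression, encoded as (op, *args), against a list of rows."""
    op, *args = expr
    if op == "Source":
        return table
    if op == "Projection":
        column, child = args
        return [row[column] for row in evaluate(child, table)]
    if op == "Mean":
        values = evaluate(args[0], table)
        return sum(values) / len(values)
    if op == "Sum":
        return sum(evaluate(args[0], table))
    if op == "Count":
        return len(evaluate(args[0], table))
    if op == "Div":
        return evaluate(args[0], table) / evaluate(args[1], table)
    raise ValueError(f"unknown op: {op}")


def lower_mean(expr):
    """Rewrite every ("Mean", x) node into ("Div", ("Sum", x), ("Count", x))."""
    op, *args = expr
    # Rewrite child expressions first; plain arguments such as column names pass through.
    args = [lower_mean(a) if isinstance(a, tuple) else a for a in args]
    if op == "Mean":
        (child,) = args
        return ("Div", ("Sum", child), ("Count", child))
    return (op, *args)


query = ("Mean", ("Projection", "x", ("Source",)))
optimized = lower_mean(query)

table = [{"x": 1.0, "y": 10}, {"x": 2.0, "y": 20}, {"x": 6.0, "y": 30}]
print(optimized)
print(evaluate(query, table), evaluate(optimized, table))
```

The rewritten tree produces the same result as the original one, which is the property any such optimization has to preserve.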
Stability
This project is a work in progress and may change without notice or deprecation warnings. Feedback is welcome, but it's best to avoid using it in production settings.
API Coverage
dask_expr.DataFrame
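`abs`, `add`, `add_prefix`, `add_suffix`, `align`, `all`, `any`, `apply`, `assign`, `astype`, `bfill`, `clip`, `combine_first`, `copy`, `count`, `dask`, `div`, `divide`, `drop`, `drop_duplicates`, `dropna`, `dtypes`, `eval`, `explode`, `ffill`, `fillna`, `floordiv`, `groupby`, `head`, `idxmax`, `idxmin`, `iloc`, `index`, `isin`, `isna`, `join`, `map`, `map_overlap`, `map_partitions`, `mask`, `max`, `mean`, `memory_usage`, `memory_usage_per_partition`, `merge`, `min`, `mod`, `mode`, `mul`, `nlargest`, `nsmallest`, `nunique_approx`, `partitions`, `pivot_table`, `pow`, `prod`, `query`, `radd`, `rdiv`, `rename`, `rename_axis`, `repartition`, `replace`, `reset_index`, `rfloordiv`, `rmod`, `rmul`, `round`, `rpow`, `rsub`, `rtruediv`, `sample`, `select_dtypes`, `set_index`, `shift`, `shuffle`, `sort_values`, `std`, `sub`, `sum`, `tail`, `to_parquet`, `to_timestamp`, `truediv`, `var`, `visualize`, `where`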
dask_expr.Series
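`abs`, `add`, `align`, `all`, `any`, `apply`, `astype`, `between`, `bfill`, `clip`, `combine_first`, `copy`, `count`, `dask`, `div`, `divide`, `drop_duplicates`, `dropna`, `dtype`, `explode`, `ffill`, `fillna`, `floordiv`, `groupby`, `head`, `idxmax`, `idxmin`, `index`, `isin`, `isna`, `map`, `map_partitions`, `mask`, `max`, `mean`, `memory_usage`, `memory_usage_per_partition`, `min`, `mod`, `mode`, `mul`, `nlargest`, `nsmallest`, `nunique_approx`, `partitions`, `pow`, `prod`, `radd`, `rdiv`, `rename`, `rename_axis`, `repartition`, `replace`, `reset_index`, `rfloordiv`, `rmod`, `rmul`, `round`, `rpow`, `rsub`, `rtruediv`, `shift`, `shuffle`, `std`, `sub`, `sum`, `tail`, `to_frame`, `to_timestamp`, `truediv`, `unique`, `value_counts`, `var`, `visualize`, `where`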
dask_expr.Index
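`abs`, `align`, `all`, `any`, `apply`, `astype`, `clip`, `combine_first`, `copy`, `count`, `dask`, `dtype`, `fillna`, `groupby`, `head`, `idxmax`, `idxmin`, `index`, `isin`, `isna`, `map_partitions`, `max`, `memory_usage`, `min`, `mode`, `nunique_approx`, `partitions`, `prod`, `rename`, `rename_axis`, `repartition`, `replace`, `reset_index`, `round`, `shuffle`, `std`, `sum`, `tail`, `to_frame`, `to_timestamp`, `var`, `visualize`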
dask_expr._groupby.GroupBy
`agg`, `aggregate`, `apply`, `bfill`, `count`, `ffill`, `first`, `last`, `max`, `mean`, `median`, `min`, `prod`, `shift`, `size`, `std`, `sum`, `transform`, `value_counts`, `var`
Support for SeriesGroupBy and DataFrameGroupBy.
dask_expr._resample.Resampler
`agg`, `count`, `first`, `last`, `max`, `mean`, `median`, `min`, `nunique`, `ohlc`, `prod`, `quantile`, `sem`, `size`, `std`, `sum`, `var`
dask_expr._rolling.Rolling
`agg`, `apply`, `count`, `max`, `mean`, `median`, `min`, `quantile`, `std`, `sum`, `var`, `skew`, `kurt`
Binary operators (DataFrame, Series, and Index):
`__add__`, `__radd__`, `__sub__`, `__rsub__`, `__mul__`, `__pow__`, `__rmul__`, `__truediv__`, `__rtruediv__`, `__lt__`, `__rlt__`, `__gt__`, `__rgt__`, `__le__`, `__rle__`, `__ge__`, `__rge__`, `__eq__`, `__ne__`, `__and__`, `__rand__`, `__or__`, `__ror__`, `__xor__`, `__rxor__`
Unary operators (DataFrame, Series, and Index):
`__invert__`, `__neg__`, `__pos__`
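As a small illustration of what operator support means for a lazy collection, the sketch below defines a hypothetical `Lazy` class whose operator methods record the operation instead of computing it. The class and its `label` attribute are assumptions for this example only and do not reflect how dask-expr implements its operators. It shows why reflected methods such as `__radd__` matter: they let a plain Python number appear on the left-hand side.

```python
class Lazy:
    """Hypothetical lazy value that records operations instead of computing them."""

    def __init__(self, label):
        self.label = label

    def __add__(self, other):
        # Handles `lazy + other`.
        return Lazy(f"({self.label} + {_label(other)})")

    def __radd__(self, other):
        # Handles `other + lazy` when `other` does not know how to add a Lazy.
        return Lazy(f"({_label(other)} + {self.label})")

    def __neg__(self):
        # Handles unary minus: `-lazy`.
        return Lazy(f"(-{self.label})")

    def __invert__(self):
        # Handles `~lazy`.
        return Lazy(f"(~{self.label})")


def _label(value):
    """Return the recorded label for Lazy values and repr() for anything else."""
    return value.label if isinstance(value, Lazy) else repr(value)


x = Lazy("x")
left = x + 2        # dispatches to Lazy.__add__
right = 1 + -x      # int.__add__ gives up, so Python falls back to Lazy.__radd__
print(left.label, right.label)
```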
Accessors:
`CategoricalAccessor`, `DatetimeAccessor`, `StringAccessor`
Function
`concat`, `from_pandas`, `merge`, `pivot_table`, `read_csv`, `read_parquet`, `repartition`, `to_datetime`, `to_numeric`, `to_timedelta`, `to_parquet`
Download files
File details
Details for the file dask-expr-0.2.9.tar.gz.
File metadata
- Download URL: dask-expr-0.2.9.tar.gz
- Upload date:
- Size: 113.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8805477e0640501243cc567ef2c31fd8542290f502f7ecbe4bf9869eaeb058b1 |
| MD5 | bf79fdbaed9ee002783e7e4bf92019f8 |
| BLAKE2b-256 | b30d903e15ca34999294f725b6b5eda11fc2dbcca02721bc17752ecab7c367f0 |
File details
Details for the file dask_expr-0.2.9-py3-none-any.whl.
File metadata
- Download URL: dask_expr-0.2.9-py3-none-any.whl
- Upload date:
- Size: 104.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6757657b7ec47ea7e244c5c40a9795401c531b0edf62839dc5cbd17797b43a76 |
| MD5 | fd110b14d066528344d2d1a5a3ff7a5f |
| BLAKE2b-256 | cc7016efde793e66b629480c9fcc8c5e1fd8294a7c911fe5524493387b31039e |
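The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library's `hashlib`; the helper names `sha256_of` and `matches` are hypothetical and chosen for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the check would look like `matches("dask-expr-0.2.9.tar.gz", "8805477e0640501243cc567ef2c31fd8542290f502f7ecbe4bf9869eaeb058b1")`, assuming the archive has been downloaded to the current directory.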