Fluent interface for data processing, advanced toolkit for data science
yo_fluq_ds
This package is a data-science-specific extension of `yo_fluq` that introduces:
- querying and output for `pandas` data structures and files in `Queryable`;
- handy `feed`-based extension methods.

The main reason for separating `yo_fluq_ds` from `yo_fluq` is that data science functionality requires heavy packages like `pandas` and `matplotlib`, which I didn't want to include in the basic package.
Small useful classes
`Obj` is an ordered dict with member-like access: `obj.a=12` works exactly as `obj['a']=12`.
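A minimal sketch of member-like access on a dict, assuming nothing about how `yo_fluq_ds` actually implements `Obj` (the class below only borrows the name for illustration). It subclasses the built-in `dict`, which preserves insertion order since Python 3.7, and forwards attribute reads, writes and deletions to item access.

```python
class Obj(dict):
    """Illustrative stand-in for Obj, not the yo_fluq_ds implementation."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # obj.a = 12 becomes obj['a'] = 12
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


obj = Obj()
obj.a = 12
obj['b'] = 'x'
print(obj.a, obj['a'], obj.b)  # 12 12 x
print(list(obj))               # ['a', 'b']
```

Raising `AttributeError` (not `KeyError`) for missing names keeps `hasattr` and `getattr(obj, name, default)` behaving as usual.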
`OrderedEnum` is an `Enum` with ordering. It is useful when using enums in `pandas`, because basic enumerations cannot be used as keys for `groupby`.
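A plain `Enum` defines no ordering, so `Size.SMALL < Size.LARGE` raises `TypeError`. The sketch below uses the `OrderedEnum` recipe from the Python enum HOWTO, where members of the same class compare by value; it is assumed, not confirmed, to resemble what `yo_fluq_ds` provides. `Size` is a hypothetical example enum.

```python
from enum import Enum


class OrderedEnum(Enum):
    """Enum whose members compare by value (recipe from the Python enum HOWTO)."""

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented


class Size(OrderedEnum):  # hypothetical example enum
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


print(sorted([Size.LARGE, Size.SMALL, Size.MEDIUM]))
```

Comparisons across different enum classes return `NotImplemented`, so mixing unrelated enums still fails loudly instead of comparing raw values.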
Pull-queries updates
Combinatorics
`Query.combinatorics` has some useful methods to create lazy combinatorial enumerations:
- `cartesian(en1, en2, ...)` creates the cartesian product of the enumerations `en1`, `en2`, etc.;
- `grid(field1=en1, field2=en2)` creates an enumeration of `Obj` with fields `field1`, `field2` that runs over the cartesian product of `en1`, `en2`, etc.;
- `triangle` is a query-like replacement for the nested loops `i=0..N`, `j=0..i`;
- `powerset` produces all the subsets of a given set.
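The four enumerations can be approximated with `itertools`. This sketches the semantics only: the functions are hypothetical stand-ins rather than the `Query.combinatorics` API, `grid` yields plain dicts instead of `Obj`, and `triangle` assumes `j` runs strictly below `i` (the exact bounds of the `yo_fluq` version are not stated here).

```python
import itertools
from typing import Any, Dict, Iterable, Iterator, Tuple


def cartesian(*enums: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    # Tuples are produced lazily; itertools.product does materialize each input first
    return itertools.product(*enums)


def grid(**fields: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    # One dict per combination, keyed by the keyword names
    names = list(fields)
    for values in itertools.product(*fields.values()):
        yield dict(zip(names, values))


def triangle(n: int) -> Iterator[Tuple[int, int]]:
    # Replaces: for i in range(n): for j in range(i): ...
    for i in range(n):
        for j in range(i):
            yield i, j


def powerset(items: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    # All subsets as tuples, from the empty one up to the full set
    pool = list(items)
    return itertools.chain.from_iterable(
        itertools.combinations(pool, r) for r in range(len(pool) + 1)
    )


print(list(grid(lr=[0.1, 0.01], depth=[3, 5])))
```

A `grid` of this kind is a compact way to enumerate hyperparameter combinations without writing nested loops.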
File system
Adds several aggregators/query sources to work with files:
- `to_text_file` / `Query.file.text`: a text file whose lines are interpreted as the enumeration's objects;
- `to_zip_file` / `Query.file.zipped_text`: a zipped text file;
- `to_pickle_file` / `Query.file.pickle`: an internal format that lazily writes a sequence of objects in pickle format into a single file;
- `to_zip_folder` / `Query.to_zipped_folder`: a zipped folder represented as an enumeration of `KeyValuePair`, where file names are the keys and file contents are the values.
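Lazily writing many objects into one pickle file is commonly done by calling `pickle.dump` repeatedly on the same open file and reading back with `pickle.load` until `EOFError`. A minimal sketch of that trick; the function names are hypothetical and the actual `to_pickle_file` format may differ.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


def write_pickle_stream(items: Iterable[Any], path: Path) -> int:
    """Pickle each item one after another into a single file; returns the count."""
    count = 0
    with open(path, "wb") as f:
        for item in items:  # consumes the iterable lazily, one object at a time
            pickle.dump(item, f)
            count += 1
    return count


def read_pickle_stream(path: Path) -> Iterator[Any]:
    """Lazily yield the objects back in the order they were written."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # end of file: no more pickled objects
                return


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "objects.pkl"
        write_pickle_stream(({"i": i} for i in range(3)), target)
        print(list(read_pickle_stream(target)))
```

Neither side ever holds the whole sequence in memory, which is what makes this layout suitable for long enumerations.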

Adds the `FileIO` class with one-line instructions to read text, json, pickle, jsonpickle and yaml files.
Adds the `Query.folder` method to create an enumeration of `Path` objects from a folder.
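The key-value view of a zipped folder (file names as keys, contents as values) and a folder enumeration in the spirit of `Query.folder` can be sketched with the standard library. `KeyValuePair`, `folder`, `to_zip_folder` and `read_zip_folder` here are hypothetical stand-ins, and file contents are assumed to be UTF-8 text.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class KeyValuePair(NamedTuple):
    """Hypothetical stand-in for yo_fluq's KeyValuePair."""
    key: str
    value: str


def folder(path) -> Iterator[Path]:
    # Lazily enumerate the regular files in a folder (order is filesystem-dependent)
    return (p for p in Path(path).iterdir() if p.is_file())


def to_zip_folder(pairs: Iterable[KeyValuePair], zip_path) -> None:
    # Each key becomes an entry name in the archive, each value its UTF-8 content
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for pair in pairs:
            zf.writestr(pair.key, pair.value)


def read_zip_folder(zip_path) -> Iterator[KeyValuePair]:
    # Read the archive back as (file name, decoded content) pairs
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            yield KeyValuePair(name, zf.read(name).decode("utf-8"))


if __name__ == "__main__":
    src = Path(tempfile.mkdtemp())
    (src / "a.txt").write_text("alpha", encoding="utf-8")
    pairs = (KeyValuePair(p.name, p.read_text(encoding="utf-8")) for p in folder(src))
    archive = Path(tempfile.mkdtemp()) / "texts.zip"
    to_zip_folder(pairs, archive)
    print(list(read_zip_folder(archive)))
```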
pandas
- Adds `to_series`, `to_dataframe` and `to_ndarray` aggregators.
- Adds `Query.series` to convert a series into a `KeyValuePair` enumeration.
- Adds `Query.df` to convert a dataframe into an `Obj` (or `dict`) enumeration.
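What these conversions involve can be shown with plain pandas: `Series.items`, `DataFrame.itertuples` and the `DataFrame` constructor. The function names are hypothetical, and the sketch yields plain tuples and dicts where `Query.series` and `Query.df` yield `KeyValuePair` and `Obj`.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

import pandas as pd


def series_pairs(s: pd.Series) -> Iterator[Tuple[Any, Any]]:
    # One (index label, value) pair per element, similar in spirit to Query.series
    return iter(s.items())


def df_records(df: pd.DataFrame) -> Iterator[Dict[str, Any]]:
    # One dict per row; name=None makes itertuples yield plain tuples,
    # so column names need not be valid Python identifiers
    columns = list(df.columns)
    for values in df.itertuples(index=False, name=None):
        yield dict(zip(columns, values))


def records_to_df(records: Iterable[Dict[str, Any]]) -> pd.DataFrame:
    # Aggregator direction: collect dict-like rows back into a DataFrame
    return pd.DataFrame(list(records))


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
    rows = list(df_records(df))
    print(rows)
    print(records_to_df(rows))
    print(list(series_pairs(df["b"])))
```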
Adds a `feed` method to `DataFrame`, `Series`, `DataFrameGroupBy` and `SeriesGroupBy` by monkey-patching. It is now possible to write something like:
```python
(df
 .loc[df.status == 'shipped']
 .feed(lambda z: z.groupby(z.date.dt.to_period('M')))
 .size()
)
```
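When the lambda inside `feed` is called, `z` is bound to the dataframe obtained after filtering.
This technique allows longer fluent instructions for `pandas`, which are otherwise impossible after filtering, because the filtered intermediate dataframe has no name that later expressions in the chain could refer to.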
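Monkey-patching a `feed` method can be as small as attaching one function to the four classes. A sketch under that assumption (not the `yo_fluq_ds` source): `feed` calls the given function with the object it was invoked on. It imports the groupby classes from `pandas.core.groupby`; pandas 2.1+ also exposes them under `pandas.api.typing`. The `orders` data is made up for illustration.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy, SeriesGroupBy


def feed(self, fn):
    # Hand the current object (e.g. a filtered DataFrame) to fn and return its result
    return fn(self)


for cls in (pd.DataFrame, pd.Series, DataFrameGroupBy, SeriesGroupBy):
    cls.feed = feed  # monkey-patch: every instance now has .feed


orders = pd.DataFrame({
    "status": ["shipped", "pending", "shipped", "shipped"],
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"]),
})

monthly = (
    orders
    .loc[orders.status == "shipped"]
    .feed(lambda z: z.groupby(z.date.dt.to_period("M")))
    .size()
)
print(monthly)
```

pandas' built-in `pipe` method on DataFrame, Series and GroupBy objects behaves similarly (`obj.pipe(fn)` returns `fn(obj)`).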
`feed`-extension methods
Some methods from `yo_fluq_ds` are not incorporated into `Queryable`, because they are not used that often and I want to avoid overloading `Queryable` with such methods. They are accessible only via the `feed` method, and all of them live in the `fluq` module.
For Queryable
- `fluq.with_progress_bar` is a `Queryable`-friendly wrapper over `tqdm`. It automatically detects notebook/console environments. The `total` (length of the enumerable) is in most cases known from the `Queryable.length` field, but sometimes needs to be provided.
- `fluq.with_plots(columns, figsize)` creates a plot for each element of the enumerable and returns an enumerable of `ItemWithAx`. It is very handy for drawing several plots at once, e.g. for different columns of a dataframe.
- `fluq.pairwise` converts an enumerable into an enumerable of pairs of neighbouring elements.
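The pairing described for `fluq.pairwise` matches the classic pairwise recipe, built in as `itertools.pairwise` since Python 3.10. A version-independent sketch with `itertools.tee`; the function is a stand-in, not the `fluq` implementation.

```python
from itertools import tee
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def pairwise(iterable: Iterable[T]) -> Iterator[Tuple[T, T]]:
    # Two independent iterators over the same input; advance the second by one
    first, second = tee(iterable)
    next(second, None)
    return zip(first, second)


print(list(pairwise("abcd")))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Because it relies on `tee` rather than indexing, it works on one-shot iterators as well as lists, and yields nothing for inputs shorter than two elements.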
For pandas
- `fluq.fractions` can be used where `size` is normally used, to get the relative sizes of the groups.
- `fluq.trimmer` can be used to trim too high/too low values from a series, thus facilitating the creation of histograms.
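One plausible reading of these two helpers, written as plain pandas functions: `fractions` divides group sizes by their total, and trimming drops values outside a quantile range. The function names, the quantile-based rule and the default cut-offs are assumptions; the actual `fluq.trimmer` may decide what counts as too high or too low differently.

```python
import pandas as pd


def fractions(grouped) -> pd.Series:
    """Relative group sizes: what .size() gives, divided by the total count."""
    sizes = grouped.size()
    return sizes / sizes.sum()


def trim(series: pd.Series, lower_q: float = 0.05, upper_q: float = 0.95) -> pd.Series:
    """Drop values outside the [lower_q, upper_q] quantile range (an assumed rule)."""
    low, high = series.quantile([lower_q, upper_q])
    return series[(series >= low) & (series <= high)]


if __name__ == "__main__":
    df = pd.DataFrame({"k": ["a", "a", "a", "b"], "x": [1.0, 2.0, 3.0, 100.0]})
    print(fractions(df.groupby("k")))
    print(trim(df.x, 0.0, 0.75))
```

If `feed` has been patched onto pandas objects, such functions plug straight into a chain, e.g. `df.groupby('k').feed(fractions)`.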
Hashes for yo_fluq_ds-1.1.14-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 15157387dbae7c30208f26a7ff7fa9e603a7153c1032a01a8b6cd46a9ba0b340 |
| MD5 | 99f9ba73561d5526dba2620b264c3e41 |
| BLAKE2b-256 | da3c79c9ddc7e0dfca497ce91ca3827d8aa52b8b505c7a19a852dd8f0acdb5bf |