Fluent interface for data processing, advanced toolkit for data science

yo_fluq_ds

This package is a data-science-specific extension of yo_fluq that introduces:

  • querying and output for pandas data structures and files in Queryable
  • handy feed-based extension methods.

The main reason for separating yo_fluq_ds from yo_fluq is that data science functionality requires huge packages like pandas and matplotlib, which I didn't want to include in a basic package.

Small useful classes

  • Obj is an ordered dict with a member-like access: obj.a=12 works exactly as obj['a']=12
  • OrderedEnum is an Enum with ordering. It is useful when using enums in pandas, because a basic enumeration cannot be used as a key for groupby
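
To make the behaviour of these two classes concrete, here is a minimal plain-Python sketch. It is not the package's implementation: the class bodies are assumptions that only reproduce the behaviour described above, and Status is a made-up example enum.

```python
from enum import Enum


class Obj(dict):
    """Stand-in for Obj (assumed behaviour): dict keys are also attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # obj.a = 12 is stored exactly like obj['a'] = 12.
        self[name] = value


class OrderedEnum(Enum):
    """Stand-in for OrderedEnum (assumed behaviour): members compare by value."""

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented


class Status(OrderedEnum):  # hypothetical example enum
    NEW = 1
    SHIPPED = 2
    DELIVERED = 3


obj = Obj()
obj.a = 12
obj['b'] = 13
ordered = sorted([Status.DELIVERED, Status.NEW, Status.SHIPPED])
```

Because the members are comparable, they can be sorted, which a plain Enum does not allow.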

Pull-queries updates

Combinatorics

Query.combinatorics has some useful methods to create lazy combinatorics enumerations:

  • cartesian(en1,en2,...) will create a cartesian product of enumerations in en1, en2, etc.
  • grid(field1=en1,field2=en2) will create an enumeration of Obj with fields field1, field2 that runs over the cartesian product of en1, en2, etc.
  • triangle is a query-like replacement for nested loops i=0..N, j=0..i
  • powerset produces all the subsets of a given set
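
The semantics of these four enumerations can be sketched with itertools. The functions below are illustrative stand-ins, not the Query.combinatorics API: grid yields plain dicts instead of Obj, and the exact bounds of triangle (whether the diagonal is included) are an assumption, since the text does not state them.

```python
from itertools import chain, combinations, product


def cartesian(*enumerables):
    # product yields one tuple per combination, one at a time.
    return product(*enumerables)


def grid(**fields):
    # One dict per point of the cartesian product, keyed by field name.
    names = list(fields)
    for values in product(*fields.values()):
        yield dict(zip(names, values))


def triangle(n):
    # Pairs (i, j) with 0 <= j < i < n (assumed bounds, diagonal excluded).
    for i in range(n):
        for j in range(i):
            yield (i, j)


def powerset(items):
    # All subsets, from the empty one up to the full set.
    items = list(items)
    return chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )


points = list(grid(x=[1, 2], y=['a', 'b']))
```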

File system

Adds several aggregators/query sources to work with files.

  • to_text_file/Query.file.text: a text file whose lines are the objects of the enumeration
  • to_zip_file/Query.file.zipped_text: a zipped text file
  • to_pickle_file/Query.file.pickle: an internal format that lazily writes a sequence of objects in pickle format into one file.
  • to_zip_folder/Query.to_zipped_folder: a representation for KeyValuePair enumerations: filenames are keys, file contents are values
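
The idea of writing a sequence of pickled objects into one file, and reading it back lazily, can be sketched with the standard pickle module. This only illustrates the technique; the function names are hypothetical, and the on-disk layout actually used by to_pickle_file may differ.

```python
import os
import pickle
import tempfile


def write_pickle_stream(objects, path):
    # Dump objects one by one, so the whole sequence never has to be in memory.
    with open(path, 'wb') as stream:
        for item in objects:
            pickle.dump(item, stream)


def read_pickle_stream(path):
    # Yield objects back one at a time until the file is exhausted.
    with open(path, 'rb') as stream:
        while True:
            try:
                yield pickle.load(stream)
            except EOFError:
                return


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'items.pkl')
    # The source is a generator: nothing is materialized before writing.
    write_pickle_stream(({'id': i} for i in range(3)), path)
    restored = list(read_pickle_stream(path))
```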

Adds a FileIO class with one-line instructions to read text, json, pickle, jsonpickle and yaml files.

Adds a Query.folder method to create an enumeration of Path objects from a folder.

pandas

  • Adds to_series, to_dataframe and to_ndarray aggregators
  • Adds Query.series to convert a series into a KeyValuePair enumeration
  • Adds Query.df to convert a dataframe into an Obj (or dict) enumeration

Adds a feed method to DataFrame, Series, DataFrameGroupBy and SeriesGroupBy by monkey-patching. It is now possible to write something like:

(df
    .loc[df.status=='shipped']
    .feed(lambda z: z.groupby(z.date.dt.to_period('M')))
    .size()
)

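When the lambda inside feed is called, z is bound to the dataframe that remains after filtering.

This technique allows longer fluent chains for pandas. Without it, a step that must refer to the filtered dataframe itself, such as the groupby key above, would force you to break the chain and store the intermediate result in a variable.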
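
The mechanism itself is small. The sketch below shows the monkey-patching idea on a stand-in class so that it runs without pandas: Rows, where and the sample data are hypothetical, and only the feed function mirrors the behaviour described above (pass the current object to a function and return the result).

```python
from collections import Counter


class Rows(list):
    """Hypothetical stand-in for a pandas object."""

    def where(self, predicate):
        # Filtering returns a new Rows, so calls can be chained.
        return Rows(row for row in self if predicate(row))


def feed(self, function):
    # Hand the current object to `function` and return whatever it returns.
    return function(self)


# Monkey-patching: attach feed to an already defined class.
Rows.feed = feed

orders = Rows([
    {'status': 'shipped', 'month': '2020-01'},
    {'status': 'pending', 'month': '2020-01'},
    {'status': 'shipped', 'month': '2020-01'},
    {'status': 'shipped', 'month': '2020-02'},
])

shipped_per_month = (orders
    .where(lambda row: row['status'] == 'shipped')
    # z is the filtered Rows, not the original orders.
    .feed(lambda z: Counter(row['month'] for row in z))
)
```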

Feed-extension methods

Some methods from yo_fluq_ds are not incorporated into Queryable, because they are not used that often and I want to avoid overloading Queryable with them. They are accessible only via the feed method.

All of them are inside the fluq module.

For Queryable

  • fluq.with_progress_bar is a Queryable-friendly wrapper over tqdm. It automatically detects notebook/console environments. The total (length of the enumerable) is in most cases known from the Queryable.length field, but sometimes needs to be provided.
  • fluq.with_plots(columns,figsize) creates a plot for each element of the enumerable and returns an enumerable of ItemWithAx. Very handy for drawing several plots at once, e.g. for different columns of a dataframe
  • fluq.pairwise converts an enumerable into an enumerable of pairs of neighbouring elements
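
As an illustration of what "pairs of neighbouring elements" means, here is a lazy plain-Python sketch. It is a stand-in for the behaviour described above, not the fluq.pairwise implementation, and its signature is an assumption.

```python
from itertools import tee


def pairwise(enumerable):
    # Two independent iterators over the same source, the second one step ahead.
    first, second = tee(enumerable)
    next(second, None)
    return zip(first, second)


pairs = list(pairwise([1, 2, 3, 4]))
```

An enumerable with fewer than two elements produces no pairs.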

For pandas

  • fluq.fractions can be used where size is normally used, to determine the relative sizes of the groups
  • fluq.trimmer can be used to trim values that are too high or too low from a series, which makes histograms easier to create.
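
The difference between group sizes and group fractions can be shown without pandas. The function below is a hypothetical stand-in that works on a plain list of group labels; it only illustrates the "relative size" idea and is not the fluq.fractions implementation.

```python
from collections import Counter


def fractions(labels):
    # Share of each group in the total, instead of the raw group size.
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


shares = fractions(['a', 'b', 'a', 'a'])
```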

Download files

Source Distribution

yo_fluq_ds-1.1.10.tar.gz (20.2 kB)

Uploaded Source

Built Distribution

yo_fluq_ds-1.1.10-py3-none-any.whl (37.3 kB)

Uploaded Python 3