Fluent interface for data processing, advanced toolkit for data science

yo_fluq_ds

This package is a data science-specific extension of yo_fluq that introduces:

  • querying and output for pandas data structures and files in Queryable
  • handy feed-based extension methods.

The main reason for separating yo_fluq_ds from yo_fluq is that data science functionality requires huge packages like pandas and matplotlib, which I didn't want to include in a basic package.

Small useful classes

  • Obj is an ordered dict with member-like access: obj.a=12 works exactly as obj['a']=12
  • OrderedEnum is an Enum with ordering. It is useful when using enums in pandas, because a basic enumeration cannot be used as a key for groupby
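
The two ideas above can be sketched with the standard library alone. This is not the package's source: AttrDict, SketchOrderedEnum and Priority are hypothetical names used only for illustration, and the real Obj and OrderedEnum may differ in detail.

```python
from enum import Enum


class AttrDict(dict):
    """Hypothetical stand-in for Obj: a dict (insertion-ordered) with attribute access."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # obj.a = 12 is stored exactly like obj['a'] = 12
        self[name] = value


class SketchOrderedEnum(Enum):
    """Hypothetical stand-in for OrderedEnum: members compare by their values."""

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented


class Priority(SketchOrderedEnum):
    LOW = 1
    HIGH = 2


record = AttrDict()
record.a = 12      # attribute-style write
record['b'] = 13   # key-style write
```

Because the members are comparable, they can be sorted, which is what grouping and sorting code typically relies on.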

Pull-queries updates

Combinatorics

Query.combinatorics has some useful methods to create lazy combinatorial enumerations:

  • cartesian(en1,en2,...) creates the cartesian product of the enumerations en1, en2, etc.
  • grid(field1=en1,field2=en2) creates an enumeration of Obj with fields field1, field2 that runs over the cartesian product of en1, en2, etc.
  • triangle is a query-like replacement for the nested loops i=0..N, j=0..i
  • powerset produces all the subsets of a given set
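
To make the four enumerations concrete, here is a minimal sketch built on itertools. The functions with the _sketch suffix are hypothetical and are not the package's API. In particular, it is an assumption that grid yields dict-like records in this order and that triangle includes the diagonal (j == i); the package may behave differently.

```python
from itertools import chain, combinations, product


def cartesian_sketch(*iterables):
    # Lazy cartesian product: the last iterable varies fastest.
    return product(*iterables)


def grid_sketch(**fields):
    # Same product, but each element is a record keyed by field name.
    names = list(fields)
    for values in product(*fields.values()):
        yield dict(zip(names, values))


def triangle_sketch(items, with_diagonal=True):
    # Replacement for: for i in 0..N: for j in 0..i
    items = list(items)
    for i, a in enumerate(items):
        stop = i + 1 if with_diagonal else i
        for b in items[:stop]:
            yield (a, b)


def powerset_sketch(items):
    # All subsets, from the empty one up to the full set, as tuples.
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
```

All four return lazy iterators, so nothing is computed until the result is consumed.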

File system

Adds several aggregators/query sources to work with files.

  • to_text_file/Query.file.text: a text file whose lines are the elements of the enumeration
  • to_zip_file/Query.file.zipped_text: a zipped text file
  • to_pickle_file/Query.file.pickle: an internal format that lazily writes a sequence of objects in pickle format into one file
  • to_zip_folder/Query.to_zipped_folder: a representation of a KeyValuePair enumeration: file names are keys, file contents are values
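
The idea of lazily writing a sequence of objects in pickle format into one file can be illustrated with the standard pickle module. This is a sketch under assumptions, not the package's actual file format: write_pickle_stream and read_pickle_stream are hypothetical helpers that simply append one pickle after another and read them back until the file ends.

```python
import os
import pickle
import tempfile


def write_pickle_stream(items, path):
    # Consume the iterable one element at a time; nothing is held in memory.
    with open(path, 'wb') as stream:
        for item in items:
            pickle.dump(item, stream)


def read_pickle_stream(path):
    # Generator: objects are unpickled only as the caller asks for them.
    with open(path, 'rb') as stream:
        while True:
            try:
                yield pickle.load(stream)
            except EOFError:
                return


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'items.pkl')
    write_pickle_stream(({'n': n} for n in range(3)), path)
    restored = list(read_pickle_stream(path))
```

Writing from a generator and reading through a generator keeps both directions lazy, so the sequence never has to fit in memory as a whole.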

Adds the FileIO class with one-line instructions to read text, json, pickle, jsonpickle and yaml files.

Adds the Query.folder method to create an enumeration of Path objects from a folder.

pandas

  • Adds to_series, to_dataframe and to_ndarray aggregators
  • Adds Query.series to convert a series into a KeyValuePair enumeration
  • Adds Query.df to convert a dataframe into an Obj (or dict) enumeration

Adds feed method to DataFrame, Series, DataFrameGroupBy and SeriesGroupBy by monkey-patching. It is now possible to write something like:

(df
    .loc[df.status=='shipped']
    .feed(lambda z: z.groupby(z.date.dt.to_period('M')))
    .size()
)

When feed calls the lambda, z is bound to the dataframe produced by the filtering step.

This technique allows longer fluent chains in pandas. Without it, the chain has to be broken after filtering, because the next step needs to refer to the filtered dataframe by name.
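
The mechanism itself is small. The sketch below shows it without pandas, so that it runs with the standard library only: Orders is a hypothetical stand-in for a dataframe, and feed is attached to it after the class is defined, which is what monkey-patching means here. It is an assumption that the package's feed has this single-callable shape.

```python
class Orders(list):
    """Hypothetical stand-in for a dataframe: a list of dicts with a chainable filter."""

    def where(self, predicate):
        return Orders(row for row in self if predicate(row))


def feed(self, transform):
    # Hand the current object to the callable and return whatever it produces.
    return transform(self)


# Monkey-patching: the method is added to an already existing class.
Orders.feed = feed

orders = Orders([
    {'status': 'shipped', 'amount': 10},
    {'status': 'new', 'amount': 5},
    {'status': 'shipped', 'amount': 30},
])

# Inside the lambda, z is the filtered collection, so it can be used twice
# without assigning the intermediate result to a variable.
mean_shipped = (orders
    .where(lambda row: row['status'] == 'shipped')
    .feed(lambda z: sum(row['amount'] for row in z) / len(z))
)
```

The value of the pattern is that the intermediate result gets a name (z) inside the chain, so the chain does not have to be interrupted.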

feed-extension methods

Some methods from yo_fluq_ds are not incorporated into Queryable, because they are used less often and I want to avoid overloading Queryable with them. They are accessible only via the feed method.

All of them are inside fluq module.

For Queryable

  • fluq.with_progress_bar is a Queryable-friendly wrapper around tqdm. It automatically detects notebook/console environments. In most cases the total (the length of the enumerable) is taken from the Queryable.length field, but sometimes it has to be provided explicitly.
  • fluq.with_plots(columns,figsize) creates a plot for each element of the enumerable and returns an enumerable of ItemWithAx. Very handy for drawing several plots at once, e.g. for different columns of a dataframe
  • fluq.pairwise converts an enumerable into an enumerable of pairs of neighbouring elements
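
As an illustration of what pairing neighbouring elements means, here is a minimal standard-library sketch. pairwise_sketch is a hypothetical helper, not the package's implementation, and it is an assumption that fluq.pairwise yields plain tuples in this order.

```python
from itertools import tee


def pairwise_sketch(items):
    # Two independent iterators over the same data, the second shifted by one.
    first, second = tee(items)
    next(second, None)
    return zip(first, second)


pairs = list(pairwise_sketch([1, 2, 3, 4]))
```

An input with fewer than two elements yields no pairs.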

For pandas

  • fluq.fractions can be used where size is normally used, to determine the relative sizes of the groups
  • fluq.trimmer trims values that are too high or too low from a series, which makes histograms easier to build.
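
The notion of relative group sizes can be shown without pandas. The sketch below is only an illustration of the arithmetic: fractions_sketch is a hypothetical helper, and it is an assumption that fluq.fractions normalizes group sizes so that they sum to 1.

```python
from collections import Counter


def fractions_sketch(labels):
    # Size of each group divided by the total number of elements.
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


shares = fractions_sketch(['a', 'a', 'b', 'c'])
```

Here group 'a' holds half of the elements and groups 'b' and 'c' a quarter each.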
