Fluent interface for data processing, advanced toolkit for data science

yo_fluq_ds

This package is a data-science-specific extension of yo_fluq that introduces:

  • querying and output for pandas data structures and files in Queryable
  • handy feed-based extension methods.

The main reason for separating yo_fluq_ds from yo_fluq is that data science functionality requires huge packages like pandas and matplotlib, which I didn't want to include in a basic package.

Small useful classes

  • Obj is an ordered dict with member-like access: obj.a=12 works exactly like obj['a']=12
  • OrderedEnum is an Enum with ordering. It is useful when using enums in pandas, because a basic enumeration cannot be used as a grouping key
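To make the two ideas concrete, here is a minimal standard-library sketch. AttrDict and OrderedByValue are hypothetical stand-ins written for illustration, not the package's Obj and OrderedEnum classes; the ordering follows the well-known "ordered enum" recipe of comparing members by value.

```python
from enum import Enum


class AttrDict(dict):
    """Hypothetical stand-in for Obj: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


class OrderedByValue(Enum):
    """Hypothetical stand-in for OrderedEnum: members compare by value."""

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented


class Size(OrderedByValue):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


obj = AttrDict()
obj.a = 12       # attribute assignment stores a dict entry
obj['b'] = 13    # item assignment is readable as an attribute

# Sorting works because the members define an ordering.
ranked = sorted([Size.LARGE, Size.SMALL, Size.MEDIUM])
```

A plain Enum would raise TypeError in the sorted call, which is the kind of failure an ordered enum avoids when keys need to be sorted.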

Pull-queries updates

Combinatorics

Query.combinatorics has some useful methods for creating lazy combinatorial enumerations:

  • cartesian(en1,en2,...) creates the cartesian product of the enumerations en1, en2, etc.
  • grid(field1=en1,field2=en2) creates an enumeration of Obj with fields field1, field2 that runs over the cartesian product of en1, en2, etc.
  • triangle is a query-like replacement for the nested loops i=0..N, j=0..i
  • powerset produces all the subsets of a given set
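The four enumerations can be sketched with itertools. The functions below are illustrative equivalents, not the package's implementation: they return plain tuples and dicts rather than Queryable and Obj, and the half-open bounds in triangle are an assumption, since the text does not say whether the loop ends are inclusive.

```python
from itertools import chain, combinations, product


def cartesian(*iterables):
    # Tuples are produced lazily; note that itertools.product
    # reads its input iterables up front.
    return product(*iterables)


def grid(**fields):
    # One dict per point of the cartesian product, keyed by field name.
    names = list(fields)
    for values in product(*fields.values()):
        yield dict(zip(names, values))


def triangle(n):
    # Assumption: half-open ranges, i in [0, n) and j in [0, i).
    for i in range(n):
        for j in range(i):
            yield i, j


def powerset(items):
    # All subsets, from the empty tuple up to the full set.
    items = list(items)
    return chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
```

For instance, list(grid(x=[1, 2], y=['a'])) yields one dict per combination of x and y.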

File system

Adds several aggregators/query sources to work with files.

  • to_text_file/Query.file.text: a text file whose lines are interpreted as the enumeration's objects
  • to_zip_file/Query.file.zipped_text: a zipped text file
  • to_pickle_file/Query.file.pickle: an internal format that lazily writes a sequence of objects in pickle format into one file
  • to_zip_folder/Query.to_zipped_folder: a representation of a KeyValuePair enumeration in which filenames are the keys and file contents are the values
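The idea of writing a sequence of pickled objects lazily into a single file can be sketched with the standard library alone. The helper names write_pickle_stream and read_pickle_stream are hypothetical, and the package's actual file layout may differ; this only shows the technique of appending one pickle per object and reading them back one at a time.

```python
import pickle
import tempfile
from pathlib import Path


def write_pickle_stream(items, path):
    # Each object is dumped as soon as the source yields it,
    # so the whole sequence never has to be held in memory.
    with open(path, 'wb') as stream:
        for item in items:
            pickle.dump(item, stream)


def read_pickle_stream(path):
    # Objects are loaded one at a time until the file is exhausted.
    with open(path, 'rb') as stream:
        while True:
            try:
                yield pickle.load(stream)
            except EOFError:
                return


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'items.pkl'
    write_pickle_stream(({'n': i} for i in range(3)), path)
    restored = list(read_pickle_stream(path))
```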

Adds the FileIO class with one-line instructions to read text, json, pickle, jsonpickle and yaml files.

Adds the Query.folder method to create an enumeration of Path objects from a folder.

pandas

  • Adds to_series, to_dataframe and to_ndarray aggregators
  • Adds Query.series to convert a series into a KeyValuePair enumeration
  • Adds Query.df to convert a dataframe into an Obj (or dict) enumeration

Adds feed method to DataFrame, Series, DataFrameGroupBy and SeriesGroupBy by monkey-patching. It is now possible to write something like:

(df
    .loc[df.status=='shipped']
    .feed(lambda z: z.groupby(z.date.dt.to_period('M')))
    .size()
)

When the lambda passed to feed is called, z is bound to the dataframe that remains after filtering.

This technique allows longer fluent chains in pandas. Without it, the chain has to be broken after filtering, because the next step (here, the grouping key) must refer to the filtered dataframe by name.
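The mechanism can be sketched without pandas. Rows below is a hypothetical toy container, and the feed function is an assumption about the general shape of such a method (pass the object through each function in turn), not the package's source. With pandas, the same kind of assignment would target DataFrame and the other patched classes.

```python
from collections import Counter


class Rows(list):
    """Hypothetical toy container with one fluent filtering method."""

    def where(self, predicate):
        return Rows(row for row in self if predicate(row))


def feed(self, *functions):
    # Pass the object through each function in turn, so that an
    # arbitrary callable can continue a fluent chain.
    result = self
    for function in functions:
        result = function(result)
    return result


# Monkey-patching: attach feed to an existing class after its definition.
Rows.feed = feed

orders = Rows([
    {'status': 'shipped', 'month': '2020-01'},
    {'status': 'pending', 'month': '2020-01'},
    {'status': 'shipped', 'month': '2020-01'},
    {'status': 'shipped', 'month': '2020-02'},
])

shipped_per_month = (orders
    .where(lambda row: row['status'] == 'shipped')
    .feed(lambda z: Counter(row['month'] for row in z))
)
```

Inside the lambda, z is the filtered Rows object, so the chain never needs a named intermediate variable.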

feed-extension methods

Some methods from yo_fluq_ds are not incorporated into Queryable, because they are used less often and I want to avoid overloading Queryable with them. They are therefore accessible only via the feed method.

All of them are in the fluq module.

For Queryable

  • fluq.with_progress_bar is a Queryable-friendly wrapper over tqdm. It automatically detects notebook/console environments. The total (the length of the enumerable) is in most cases known from the Queryable.length field, but sometimes needs to be provided.
  • fluq.with_plots(columns,figsize) creates a plot for each element of the enumerable and returns an enumerable of ItemWithAx. Very handy for drawing several plots at once, e.g. for different columns of a dataframe
  • fluq.pairwise converts an enumerable into an enumerable of pairs of neighbouring elements
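As an illustration of the pairwise transformation, here is a standalone generator. It is a sketch of the general technique rather than the package's implementation, and it yields plain tuples; the element type returned by fluq.pairwise is not specified in the text.

```python
def pairwise(iterable):
    # Yield (previous, current) for each pair of neighbouring elements.
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # an empty input produces no pairs
    for current in iterator:
        yield previous, current
        previous = current
```

An input of n elements produces n-1 pairs, so inputs with fewer than two elements produce nothing.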

For pandas

  • fluq.fractions can be used where size is normally used, to determine the relative sizes of the groups
  • fluq.trimmer can be used to trim values that are too high or too low from a series, which makes histograms easier to create
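The notion of relative group sizes can be shown in plain Python. The function group_fractions is hypothetical and works on a flat sequence of group keys; it only illustrates the arithmetic (group count divided by total count) and says nothing about how fluq.fractions is implemented or what it returns.

```python
from collections import Counter


def group_fractions(keys):
    # Count the occurrences of each key, then divide by the total
    # so that the resulting fractions sum to 1.
    counts = Counter(keys)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


fractions = group_fractions(['a', 'b', 'a', 'a'])
```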
