The toolkit for data science projects with a focus on functional programming
This is a personal library that enables a more functional programming style in Python data science. It is mostly focused on writing code like this:
    from yo_extensions import *
    import json

    (Query
     .file.text('data.jsonlines') # read file and create a 'stream' of lines
     .select(json.loads) # parse each line with JSON
     .where(lambda z: maybe(z,'status')=='OK') # only items with status equals OK, maybe is Elvis operator
     .select(lambda z: (z['id'],z['message']))
     .to_dataframe(columns=['id','message']) # seamless integration with pandas
     .groupby('message')
     .size()
     .feed(plots.series.pie()) # extension method, draws a pie chart with custom settings
    )
The key principles are:
- Fluent interface
- Type annotations
- Yet another port of C# LINQ to Python. The closest analogue is asq. The key differences are type annotation support and a different extensibility mechanism
- Extension methods for better data-science: plotting, status reporting, algorithms on pandas
- A few useful classes for machine-learning
- Wide test coverage for most of the implemented functionality
The port of C# LINQ to Python with type annotations. The usual methods (where) are implemented as methods of
Extension methods are challenging due to Python restrictions. I couldn't use monkey-patching, because it does not preserve type annotations, and injected methods are not seen by the IDE. Thus, the following mechanism is employed:
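- Consider the function f(q, X), where q is a Queryable and X is a tuple of additional arguments.
- Let's curry f into h, so that h(X)(q) = f(q,X).
- To inject f into Queryable, the feed method is used: q.feed(h(X)) = h(X)(q) = f(q,X)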
This mechanism preserves the type annotations, allows adding any functionality to Queryable, and almost preserves the fluent interface: you need to call feed instead of just chaining methods.
To avoid coding both f and h for every piece of functionality, the suggested way to implement h is as a class: X is provided in __init__, and the class is also Callable, so that it can accept q.
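The mechanism described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's actual code: MiniQueryable and TakeEvery are hypothetical names that stand in for Queryable and for an extension class h, and the step argument plays the role of X.

```python
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
TOut = TypeVar("TOut")


class MiniQueryable(Generic[T]):
    """Hypothetical stand-in for the library's Queryable."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = items

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def where(self, predicate: Callable[[T], bool]) -> "MiniQueryable[T]":
        # Lazily keep only the items that satisfy the predicate.
        return MiniQueryable(x for x in self if predicate(x))

    def feed(self, extension: "Callable[[MiniQueryable[T]], TOut]") -> TOut:
        # q.feed(h(X)) = h(X)(q); the extension's return type is preserved.
        return extension(self)


class TakeEvery(Generic[T]):
    """Hypothetical extension h: the extra argument X goes to __init__."""

    def __init__(self, step: int) -> None:
        self.step = step

    def __call__(self, q: MiniQueryable[T]) -> List[T]:
        # This body plays the role of f(q, X).
        return [x for i, x in enumerate(q) if i % self.step == 0]


result = (
    MiniQueryable(range(10))
    .where(lambda x: x > 2)
    .feed(TakeEvery(2))
)
print(result)  # [3, 5, 7, 9]
```

Because feed is generic in the extension's return type, a type checker can infer that result is a list of integers, which is the property monkey-patching would lose.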
The same mechanism is employed for pd.SeriesGroupBy. For these classes, feed is monkey-patched and does not preserve the type annotation.
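For comparison, monkey-patching feed onto a class you do not own might look like the sketch below. To stay dependency-free it uses a hypothetical ThirdPartyFrame class instead of a real pandas class, and RowCount is likewise an invented extension; neither is part of the library.

```python
from typing import Any, Callable


class ThirdPartyFrame:
    """Hypothetical stand-in for a class we do not own, such as a pandas class."""

    def __init__(self, rows: list) -> None:
        self.rows = rows


def _feed(self: Any, extension: Callable[[Any], Any]) -> Any:
    # Same rule as for Queryable: obj.feed(h(X)) = h(X)(obj).
    return extension(self)


# Monkey-patching: the method is attached at runtime, so static tools
# see neither `feed` on the class nor a precise return type.
ThirdPartyFrame.feed = _feed


class RowCount:
    """Hypothetical extension that needs no extra arguments."""

    def __call__(self, frame: ThirdPartyFrame) -> int:
        return len(frame.rows)


count = ThirdPartyFrame([{"id": 1}, {"id": 2}]).feed(RowCount())
print(count)  # 2
```

The call works at runtime, but the return type of feed is Any here, so the IDE cannot tell that count is an integer.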
- Several extensions for fluq: input/output to various file types, partitioning, etc.
- A few extensions for pandas: adding ordering inside groups, stratifying order for DataFrames, etc.
- Plots: several plots I like to use in research, implemented in
yo_extensions/__init__.py demonstrates how best to include fluq with extensions in a side project.
- kraken: executes a method with various arguments (a plan) and returns the result as pd.DataFrame for further analysis
- metrics: computes lots of metrics for predicted/actual values and returns them as
- keras: wrapper over