A Hy library that provides a Lispy functional interface by wrapping Python's popular data libraries, such as Pandas and Matplotlib.
HyFive
What Is HyFive?
HyFive is a Hy library that provides a Lispy functional interface by wrapping Python's popular data libraries, such as Pandas and Matplotlib.
HyFive vs. Vanilla Pandas
The Pandas DataFrame has its own quirks. These range from lacking a row-level filter method (DataFrame.filter selects by label, not by row content) to defining join differently from SQL (DataFrame.join aligns on the index by default). This is evident from the bewildering comparison between Pandas and SQL in the Pandas documentation. HyFive aims to provide a Lispy interface that is as close as possible to Spark's SQL and DataFrame APIs.
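The two quirks mentioned above can be seen in a small sketch. The tables and column values below are invented for illustration; only the Pandas calls themselves (filter, boolean indexing, merge, join) are real.

```python
import pandas as pd

# Toy data, made up for this example.
names = pd.DataFrame({"resident_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
ages = pd.DataFrame({"resident_id": [2, 3, 4], "age": [30, 41, 25]})

# DataFrame.filter selects by label, not by row content:
# here it keeps the columns whose label contains "name".
by_label = names.filter(like="name")

# Filtering rows goes through boolean indexing instead.
by_rows = names[names["resident_id"] > 1]

# DataFrame.merge is the SQL-style join on a column.
merged = names.merge(ages, on="resident_id", how="inner")

# DataFrame.join aligns on the index by default, so the key column
# has to be moved into the index first to get the same rows.
joined = names.set_index("resident_id").join(
    ages.set_index("resident_id"), how="inner"
)
```

Both merged and joined contain the residents present in both tables, but joined carries resident_id as its index rather than as a column.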
From a functional programming perspective, Pandas interfaces are oddly difficult to compose, unlike Spark SQL's method-chaining convention. Because of this difficulty, Pandas code tends to accumulate intermediate variables, and it perversely incentivises giving those dataframes short, uninformative names that litter the namespace.
HyFive utilises Hy's threading macros to mimic Spark DataFrame's method chaining convention, whilst staying with the familiar Pandas dataframe. Consider the following HyFive snippet:
(setv DATAFRAME
  (-> NAME-REGISTRY
      (hf.with-column "variant"
        (let [mod-res-id (hf.mod "resident_id" 3)]
          (hf.cond-col [(hf.eq? mod-res-id 1) (hf.lit "a")]
                       [(hf.eq? mod-res-id 2) (hf.lit "b")]
                       [:else (hf.lit "c")])))
      (hf.filter (hf.is-in "variant" ["a" "b"]))
      (hf.join AGE-REGISTRY :on "resident_id")
      (hf.group-by "variant")
      (hf.agg {"min_age" (hf.min "age")
               "mean_age" (hf.mean "age")
               "std_age" (hf.std "age")
               "max_age" (hf.max "age")})
      (hf.order-by "min_age" :desc True)))
Here, we carry out simple operations of adding a column, filtering rows, joining tables, aggregating groups and sorting rows. Apart from the Lispy cond, these operations would have a one-to-one translation to Spark dataframe or SQL.
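The -> form in the snippet above is Hy's thread-first macro: it feeds the result of each form into the next form as its first argument. As a rough runtime analogy only (the real macro rewrites code at compile time), the idea can be sketched in Python. The names thread_first, add and mul are hypothetical helpers invented for this example and are not part of HyFive.

```python
def thread_first(value, *steps):
    """Pass value through each step as that step's first argument.

    A step is either a bare callable or a tuple (func, arg1, arg2, ...).
    """
    for step in steps:
        if callable(step):
            value = step(value)
        else:
            func, *args = step
            value = func(value, *args)
    return value


def add(x, y):
    return x + y


def mul(x, y):
    return x * y


# Reads top to bottom like (-> 2 (add 3) (mul 4) str) would in Hy.
result = thread_first(2, (add, 3), (mul, 4), str)
```

Each step only needs to accept the running value as its first argument, which is why functions that take a dataframe first and return a dataframe chain without intermediate variables.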
In contrast, one would have to work a bit harder in pure Pandas:
import numpy as np
import pandas as pd

# Derive the variant label from resident_id modulo 3
mod_res_id = NAME_REGISTRY.resident_id % 3
variant = np.where(mod_res_id == 1, 'a', np.where(mod_res_id == 2, 'b', 'c'))
# Boolean mask for the rows to keep
select_ix = np.isin(variant, ['a', 'b'])
dataframe = (NAME_REGISTRY
             .assign(variant=variant)
             [select_ix]
             .merge(AGE_REGISTRY, on='resident_id')
             .groupby('variant')
             .apply(lambda df: pd.Series({
                 'min_age': df.age.min(),
                 'mean_age': df.age.mean(),
                 'std_age': df.age.std(),
                 'max_age': df.age.max()
             }))
             .reset_index()
             .sort_values(by='min_age', ascending=False)
             .reset_index(drop=True))
The Pandas version is less readable and we lose the one-to-one translation to Spark dataframe or SQL.
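To make the claimed translation to SQL concrete, here is a sketch that runs an equivalent query through Python's built-in sqlite3 module. The table names, sample rows and query are assumptions written for this example, not taken from HyFive. The std_age column is left out because SQLite has no built-in standard deviation aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name_registry (resident_id INTEGER, name TEXT);
    CREATE TABLE age_registry (resident_id INTEGER, age INTEGER);
    -- Toy rows, made up for this example.
    INSERT INTO name_registry VALUES
        (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Di'), (5, 'Ed');
    INSERT INTO age_registry VALUES
        (1, 30), (2, 40), (3, 50), (4, 20), (5, 60);
""")

QUERY = """
WITH tagged AS (              -- with-column + cond-col
    SELECT n.*,
           CASE WHEN n.resident_id % 3 = 1 THEN 'a'
                WHEN n.resident_id % 3 = 2 THEN 'b'
                ELSE 'c'
           END AS variant
    FROM name_registry AS n
),
kept AS (                     -- filter
    SELECT * FROM tagged WHERE variant IN ('a', 'b')
)
SELECT k.variant,             -- join, group-by, agg, order-by
       MIN(a.age) AS min_age,
       AVG(a.age) AS mean_age,
       MAX(a.age) AS max_age
FROM kept AS k
JOIN age_registry AS a ON a.resident_id = k.resident_id
GROUP BY k.variant
ORDER BY min_age DESC
"""

rows = conn.execute(QUERY).fetchall()
conn.close()
```

Each clause of the query lines up with one step of the HyFive pipeline, which is the correspondence the Pandas version obscures.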
Trying HyFive
Clone the repository and, from its root directory, enter the following command in a terminal:
./run build-docker
Run the unit tests with the following command:
./run unit-tests .
Invoke the Hy REPL by running:
./run repl
And finally, import HyFive using:
(import [hyfive :as hf])