


HyFive

What Is HyFive?

HyFive is a Hy library that provides a Lispy functional interface by wrapping Python's popular data libraries, such as Pandas and Matplotlib.

HyFive vs. Vanilla Pandas

Pandas DataFrame has its own quirks. They range from not having a row-filtering method (DataFrame.filter selects by label, not by predicate) to having a definition of join that differs from SQL's (DataFrame.join matches on the index by default). This is evident from the bewildering comparison between Pandas and SQL in the Pandas documentation. HyFive aims to provide a Lispy interface that is as close as possible to that of Spark's SQL and DataFrame.
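A small sketch of the two quirks mentioned above, on made-up data: DataFrame.filter selects by label rather than by a row predicate, and DataFrame.join aligns on the index by default, whereas merge performs the column-based, SQL-style join. The frames and column names are illustrative only.

```python
import pandas as pd

people = pd.DataFrame({"resident_id": [1, 2, 3], "age": [34, 17, 52]})

# DataFrame.filter selects rows/columns by *label*, not by a boolean predicate,
# so it is not the equivalent of SQL's WHERE clause.
only_age_column = people.filter(items=["age"])

# Filtering rows by a predicate uses boolean indexing instead.
adults = people[people["age"] >= 18]

names = pd.DataFrame({"resident_id": [1, 2, 4], "name": ["Ann", "Bob", "Dee"]})

# DataFrame.join aligns on the *index* of the other frame (a left join by default),
# so row 2 pairs resident 3 with "Dee" (resident 4) purely by position.
joined_on_index = people.join(names, rsuffix="_names")

# merge is the SQL-style join on a key column (inner by default).
merged_on_column = people.merge(names, on="resident_id")
```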

From a functional programming perspective, Pandas interfaces are oddly difficult to compose, unlike Spark SQL's method-chaining convention. Because of this, Pandas often perversely incentivises creating intermediate variables, usually with short names, which litters the namespace.
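One way to recover some composability in plain Pandas is DataFrame.pipe, which calls func(df, *args, **kwargs) and so lets dataframe-first functions be chained. The helpers with_column, filter_rows and order_by below are hypothetical, written only to illustrate the style; they are not HyFive's implementation.

```python
import pandas as pd

# Hypothetical dataframe-first helpers: each takes the DataFrame as its first
# argument and returns a new DataFrame, so they compose without temporaries.
def with_column(df, name, fn):
    """Return a copy of df with column `name` set to fn(df)."""
    return df.assign(**{name: fn(df)})

def filter_rows(df, predicate):
    """Keep the rows where predicate(df) is True (like SQL WHERE)."""
    return df[predicate(df)]

def order_by(df, column, desc=False):
    """Sort by one column, then reset the index."""
    return df.sort_values(column, ascending=not desc).reset_index(drop=True)

registry = pd.DataFrame({"resident_id": [1, 2, 3, 4], "age": [34, 17, 52, 41]})

# DataFrame.pipe(func, *args, **kwargs) calls func(df, *args, **kwargs),
# so the pipeline reads top to bottom with no intermediate names.
result = (registry
          .pipe(with_column, "bucket", lambda d: d["resident_id"] % 3)
          .pipe(filter_rows, lambda d: d["bucket"] != 0)
          .pipe(order_by, "age", desc=True))
```

Each helper returns a new DataFrame rather than mutating its input, so registry itself is left unchanged.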

HyFive utilises Hy's threading macros to mimic Spark DataFrame's method chaining convention, whilst staying with the familiar Pandas dataframe. Consider the following HyFive snippet:

(setv DATAFRAME
  (-> NAME-REGISTRY
      (hf.with-column "variant"
        (let [mod-res-id (hf.mod "resident_id" 3)]
          (hf.cond-col [(hf.eq? mod-res-id 1) (hf.lit "a")]
                       [(hf.eq? mod-res-id 2) (hf.lit "b")]
                       [:else                 (hf.lit "c")])))
      (hf.filter (hf.is-in "variant" ["a" "b"]))
      (hf.join AGE-REGISTRY :on "resident_id")
      (hf.group-by "variant")
      (hf.agg {"min_age"  (hf.min "age")
               "mean_age" (hf.mean "age")
               "std_age"  (hf.std "age")
               "max_age"  (hf.max "age")})
      (hf.order-by "min_age" :desc True)))

Here, we carry out simple operations of adding a column, filtering rows, joining tables, aggregating groups and sorting rows. Apart from the Lispy cond, these operations would have a one-to-one translation to Spark dataframe or SQL.
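Hy's -> macro rewrites (-> x (f a) (g b)) into (g (f x a) b), inserting the threaded value as the first argument of each form; this is presumably why the hf functions in the snippet take the dataframe first. The function below is a rough run-time analogue in Python, offered as a sketch; thread_first is a hypothetical name and is not part of Hy or HyFive.

```python
from functools import reduce

def thread_first(value, *forms):
    """Hypothetical Python analogue of Hy's `->` macro.

    Each form is either a callable or a tuple (fn, *args); the running value is
    inserted as the first argument, so thread_first(x, (f, a), g) == g(f(x, a)).
    Unlike the macro, this operates on function values at run time, not on syntax.
    """
    def step(acc, form):
        if isinstance(form, tuple):
            fn, *args = form
            return fn(acc, *args)
        return form(acc)
    return reduce(step, forms, value)

def sub(x, y):
    return x - y

def mul(x, y):
    return x * y

# Equivalent to str(mul(sub(10, 3), 2)).
answer = thread_first(10, (sub, 3), (mul, 2), str)
```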

In contrast, one would have to work a bit harder in pure Pandas:

import numpy as np
import pandas as pd

# NAME_REGISTRY and AGE_REGISTRY are existing DataFrames sharing a resident_id column.
mod_res_id = NAME_REGISTRY.resident_id % 3
# Nested np.where plays the role of the cond expression.
variant = np.where(mod_res_id == 1, 'a', np.where(mod_res_id == 2, 'b', 'c'))
select_ix = np.isin(variant, ['a', 'b'])
dataframe = (NAME_REGISTRY
                .assign(variant=variant)
                [select_ix]
                .merge(AGE_REGISTRY, on='resident_id')
                .groupby('variant')
                .apply(lambda df: pd.Series({
                    'min_age': df.age.min(),
                    'mean_age': df.age.mean(),
                    'std_age': df.age.std(),
                    'max_age': df.age.max()
                }))
                .reset_index()
                .sort_values(by='min_age', ascending=False)
                .reset_index(drop=True))

The Pandas version is less readable and we lose the one-to-one translation to Spark dataframe or SQL.
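For comparison, and assuming Pandas 0.25 or later, the aggregation step can be written with named aggregation, and the nested np.where with numpy.select, which evaluates its conditions in order much like a cond. This is a sketch on made-up toy data standing in for NAME_REGISTRY and AGE_REGISTRY.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for NAME_REGISTRY and AGE_REGISTRY (made-up data).
name_registry = pd.DataFrame({"resident_id": [1, 2, 3, 4, 5],
                              "name": ["Ann", "Bob", "Cy", "Dee", "Eve"]})
age_registry = pd.DataFrame({"resident_id": [1, 2, 3, 4, 5],
                             "age": [30, 40, 50, 20, 60]})

mod_res_id = name_registry["resident_id"] % 3
# np.select checks the conditions in order and uses `default` when none match.
variant = np.select([mod_res_id == 1, mod_res_id == 2], ["a", "b"], default="c")

dataframe = (name_registry
             .assign(variant=variant)
             .loc[lambda d: d["variant"].isin(["a", "b"])]
             .merge(age_registry, on="resident_id")
             .groupby("variant")
             # Named aggregation: output_name=(column, function).
             .agg(min_age=("age", "min"),
                  mean_age=("age", "mean"),
                  std_age=("age", "std"),
                  max_age=("age", "max"))
             .reset_index()
             .sort_values("min_age", ascending=False)
             .reset_index(drop=True))
```

Even so, the conditional column is still computed outside the chain, before the pipeline starts.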

Trying HyFive

Clone the repository and, from its root directory, enter the following command in a terminal:

./run build-docker

Run the unit tests with the following command:

./run unit-tests .

Invoke the Hy REPL by running:

./run repl

And finally, import HyFive using:

(import [hyfive :as hf])
