dataframe operations
Project description
FrameX
Framex is a light-weight, dataset obtaining library for fast prototyping, tutorial creation, and experimenting.
Built on top of Polars.
Installation
To get started, install the library with:
pip install framex
Usage
Python
import framex as fx
Loading datasets
iris = fx.load("iris")
which returns a polars DataFrame
Therefore, you can use all the polars functions and methods on the returned DataFrame.
iris.head()
shape: (5, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f32 ┆ f32 ┆ f32 ┆ str │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa │
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa │
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa │
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa │
│ 5.0 ┆ 3.6 ┆ 1.4 ┆ 0.2 ┆ setosa │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘
iris = fx.load("iris", lazy=True)
which returns a polars LazyFrame
Both these operations create local copies of the datasets
by default cache=True
.
Available datasets
To see the list of available datasets, run:
fx.available()
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic'], 'local': ['titanic']}
which returns a dictionary of both locally and remotely available datasets.
To see only local or remote datasets, run:
fx.available("local")
fx.available("remote")
{'local': ['titanic']}
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic']}
Information on Datasets
To get information on a dataset, run:
fx.about("mpg") # basically the same as `fx.about("mpg", mode="print")`
which will print the information on the dataset as the following:
NAME : mpg
SOURCE : https://www.kaggle.com/datasets/uciml/autompg-dataset
LICENSE : CC0: Public Domain
ORIGIN : Kaggle
OG NAME : autompg-dataset
Or you can get the information as a single row polars.DataFrame by running:
row = fx.about("mpg", mode="row")
print(row)
which will print the information on the dataset ASCII art as the following:
shape: (1, 4)
┌──────┬─────────────────────────────────┬────────────────────┬────────┐
│ name ┆ source ┆ license ┆ origin │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞══════╪═════════════════════════════════╪════════════════════╪════════╡
│ mpg ┆ https://www.kaggle.com/dataset… ┆ CC0: Public Domain ┆ Kaggle │
└──────┴─────────────────────────────────┴────────────────────┴────────┘
or you can simply treat row
as a polars DataFrame in your code.
CLI
Get a single dataset:
fx get iris
or get multiple datasets:
fx get iris mpg titanic
which will download dataset(s) to the current directory.
For more parameters
fx get --help
To get the name of the available datasets on the remote server.
fx list
this will list all available datasets on the remote server.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.