framex
A DataVil project.
Framex is a lightweight dataset-fetching library for fast prototyping, tutorial creation, and experimentation.
Built on top of Polars.
Installation
To get started, install the library with:
pip install framex
Usage
Python
import framex as fx
Loading datasets
iris = fx.load("iris")
This returns a polars DataFrame, so you can use all the polars functions and methods on the result.
iris.head()
shape: (5, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f32 ┆ f32 ┆ f32 ┆ str │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa │
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa │
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa │
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa │
│ 5.0 ┆ 3.6 ┆ 1.4 ┆ 0.2 ┆ setosa │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘
iris = fx.load("iris", lazy=True)
which returns a polars LazyFrame.
Both operations store a local copy of the dataset by default (cache=True).
Available datasets
To see the list of available datasets, run:
fx.available()
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic'], 'local': ['titanic']}
which returns a dictionary of both locally and remotely available datasets.
To see only local or remote datasets, run:
fx.available("local")
{'local': ['titanic']}
fx.available("remote")
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic']}
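Because the result is a plain dictionary of lists, ordinary dictionary operations are enough to check where a dataset lives. The sketch below uses a sample catalog copied from the output shown above. The helper name `locate` is hypothetical and not part of framex.

```python
def locate(name, catalog):
    """Return 'local', 'remote', or None for a dataset name.

    `catalog` is assumed to have the shape shown above:
    {'remote': [...], 'local': [...]}.
    """
    if name in catalog.get("local", []):
        return "local"
    if name in catalog.get("remote", []):
        return "remote"
    return None


# Sample catalog copied from the output above; in real use this would
# presumably come from fx.available().
catalog = {
    "remote": ["iris", "mpg", "netflix", "starbucks", "titanic"],
    "local": ["titanic"],
}

print(locate("titanic", catalog))
print(locate("iris", catalog))
```

Local copies are checked first, so a dataset that exists in both lists is reported as local.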
Getting information on Datasets
To get information on a dataset, run:
fx.about("mpg") # basically the same as `fx.about("mpg", mode="print")`
which prints the dataset information as follows:
NAME : mpg
SOURCE : https://www.kaggle.com/datasets/uciml/autompg-dataset
LICENSE : CC0: Public Domain
ORIGIN : Kaggle
OG NAME : autompg-dataset
Or you can get the information as a single-row polars DataFrame by running:
row = fx.about("mpg", mode="row")
print(row)
which prints the row as a table:
shape: (1, 4)
┌──────┬─────────────────────────────────┬────────────────────┬────────┐
│ name ┆ source ┆ license ┆ origin │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞══════╪═════════════════════════════════╪════════════════════╪════════╡
│ mpg ┆ https://www.kaggle.com/dataset… ┆ CC0: Public Domain ┆ Kaggle │
└──────┴─────────────────────────────────┴────────────────────┴────────┘
You can also use row like any other polars DataFrame in your code.
Getting Dataset URLs
If you need the direct file link for a dataset, run:
url_pokemon = fx.get_url("pokemon")
By default, the format is "feather".
Optionally, you can specify the format of the dataset:
url_pokemon_csv = fx.get_url("pokemon", format="csv")
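Once you have a link, one way to read it yourself is to pick the polars reader that matches the file format. This is a rough sketch, not framex's own loading code: the helper names `format_of` and `read_from_url` are hypothetical, and it assumes the URL path ends in the file extension (`.csv` or `.feather`), which the source does not confirm. Feather files are Arrow IPC files, which polars reads with `pl.read_ipc`.

```python
from io import BytesIO
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def format_of(url):
    """Return the lowercase file extension of a URL path, without the dot."""
    return PurePosixPath(urlparse(url).path).suffix.lstrip(".").lower()


def read_from_url(url):
    """Download a dataset file and parse it with the matching polars reader."""
    import polars as pl  # imported here so format_of stays dependency-free

    # Assumption: only the two formats mentioned above are handled.
    readers = {"csv": pl.read_csv, "feather": pl.read_ipc}
    fmt = format_of(url)
    if fmt not in readers:
        raise ValueError(f"unsupported format: {fmt!r}")
    with urlopen(url) as response:
        return readers[fmt](BytesIO(response.read()))


# Placeholder URL for illustration only; it is not a real framex link.
print(format_of("https://example.com/datasets/pokemon.csv"))
```

In practice you would pass the value returned by fx.get_url to `read_from_url` instead of the placeholder.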
CLI
Get a single dataset:
fx get iris
or get multiple datasets:
fx get iris mpg titanic
which will download dataset(s) to the current directory.
For more parameters, run:
fx get --help
To list the names of all datasets available on the remote server, run:
fx list