dataframe operations

A DataVil project.

FrameX

FrameX is a lightweight dataset-fetching library for fast prototyping, tutorial creation, and experimentation.

Built on top of Polars.

Installation

To get started, install the library with:

pip install framex

Usage

Python

import framex as fx

Loading datasets

iris = fx.load("iris")

This returns a Polars DataFrame, so you can use all Polars functions and methods on the result.

iris.head()
shape: (5, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
│ f32          ┆ f32         ┆ f32          ┆ f32         ┆ str     │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa  │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa  │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘

To load a dataset lazily instead, pass lazy=True:

iris = fx.load("iris", lazy=True)

This returns a Polars LazyFrame.

Both operations cache a local copy of the dataset by default (cache=True).

Available datasets

To see the list of available datasets, run:

fx.available()
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic'], 'local': ['titanic']}

This returns a dictionary of the locally and remotely available datasets.

To see only local or remote datasets, run:

fx.available("local")
fx.available("remote")
{'local': ['titanic']}
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic']}
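
Because the result is a plain dictionary, it is easy to work out which remote datasets have no local copy yet. A minimal sketch, using the sample output above as a literal in place of a live fx.available() call; the helper name not_cached_yet is hypothetical and not part of FrameX:

```python
def not_cached_yet(listing):
    """Return remote dataset names that do not appear in the local list."""
    local = set(listing.get("local", []))
    return [name for name in listing.get("remote", []) if name not in local]


# Sample output copied from above; in real use this would be fx.available().
listing = {
    "remote": ["iris", "mpg", "netflix", "starbucks", "titanic"],
    "local": ["titanic"],
}

missing = not_cached_yet(listing)
print(missing)  # → ['iris', 'mpg', 'netflix', 'starbucks']
```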

Getting information on datasets

To get information on a dataset, run:

fx.about("mpg") # basically the same as `fx.about("mpg", mode="print")`

This prints information about the dataset:

NAME    : mpg
SOURCE  : https://www.kaggle.com/datasets/uciml/autompg-dataset
LICENSE : CC0: Public Domain
ORIGIN  : Kaggle
OG NAME : autompg-dataset

Alternatively, you can get the information as a single-row polars.DataFrame:

row = fx.about("mpg", mode="row")
print(row)

This prints the information as an ASCII table:

shape: (1, 4)
┌──────┬─────────────────────────────────┬────────────────────┬────────┐       
│ name ┆ source                          ┆ license            ┆ origin │       
│ ---  ┆ ---                             ┆ ---                ┆ ---    │       
│ str  ┆ str                             ┆ str                ┆ str    │       
╞══════╪═════════════════════════════════╪════════════════════╪════════╡       
│ mpg  ┆ https://www.kaggle.com/dataset… ┆ CC0: Public Domain ┆ Kaggle │       
└──────┴─────────────────────────────────┴────────────────────┴────────┘ 

You can also use row like any other Polars DataFrame in your code.

Getting dataset URLs

If you need the direct file links, use fx.get_url:

url_pokemon = fx.get_url("pokemon")

By default, the format is "feather".

Optionally, you can specify the format of the dataset.

url_pokemon_csv = fx.get_url("pokemon", format="csv")

CLI

get

Get a single dataset:

fx get iris

or get multiple datasets:

fx get iris mpg titanic

This downloads the dataset(s) to the current directory.

To download the datasets into the cache directory:

fx get iris mpg titanic --dir cache

Or to download them to a specific directory:

fx get iris mpg titanic --dir data
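
If you want to script these downloads from Python, one option is to assemble the same command line and hand it to subprocess. This is only a sketch: the helper names build_get_command and run_get are hypothetical, and only the fx get arguments and the --dir option shown above come from this README.

```python
import subprocess


def build_get_command(names, directory=None):
    """Return the argv list for `fx get`, mirroring the commands above."""
    if not names:
        raise ValueError("at least one dataset name is required")
    command = ["fx", "get", *names]
    if directory is not None:
        command += ["--dir", directory]
    return command


def run_get(names, directory=None):
    """Run `fx get`; requires the fx CLI to be installed and on PATH."""
    return subprocess.run(build_get_command(names, directory), check=True)


# Only build and print the command here; call run_get(...) to execute it.
command = build_get_command(["iris", "mpg", "titanic"], directory="data")
print(" ".join(command))  # → fx get iris mpg titanic --dir data
```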

list

To list the names of all datasets available on the remote server, run:

fx list

about

To get information on a dataset or datasets, run:

fx about mpg iris

show

To show a preview of a single dataset, run:

fx show iris

describe

To describe (or summarize) a dataset, run:

fx describe iris

For more parameters, see the command's help:

fx get --help
