dataframe operations

A DataVil project.

FrameX

FrameX is a lightweight dataset-fetching library for fast prototyping, tutorial creation, and experimentation.

Built on top of Polars.

Installation

To get started, install the library with:

pip install framex

Usage

Python

import framex as fx

Loading datasets

iris = fx.load("iris")

This returns a Polars DataFrame, so you can use all Polars functions and methods on the result.

iris.head()
shape: (5, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
│ f32          ┆ f32         ┆ f32          ┆ f32         ┆ str     │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa  │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa  │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘
iris = fx.load("iris", lazy=True)

With lazy=True, the call returns a Polars LazyFrame instead.

Both calls create a local copy of the dataset by default (cache=True).

Available datasets

To see the list of available datasets, run:

fx.available()
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic'], 'local': ['titanic']}

This returns a dictionary of both locally and remotely available datasets. (The output above is shortened for clarity.)

To see only local or remote datasets, run:

fx.available("local")
fx.available("remote")
{'local': ['titanic']}
{'remote': ['iris', 'mpg', 'netflix', 'starbucks', 'titanic']}
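Because the result is a plain dictionary, ordinary Python is enough to work with it. The sketch below uses the shortened sample output above in place of a live fx.available() call, and a hypothetical helper named not_cached, to list the remote datasets that have no local copy yet.

```python
# Sample data copied from the shortened output above; in real use
# this would come from fx.available().
available = {
    "remote": ["iris", "mpg", "netflix", "starbucks", "titanic"],
    "local": ["titanic"],
}


def not_cached(available):
    """Return remote dataset names with no local copy (hypothetical helper)."""
    local = set(available.get("local", []))
    return [name for name in available.get("remote", []) if name not in local]


print(not_cached(available))
# ['iris', 'mpg', 'netflix', 'starbucks']
```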

Getting information on datasets

To get information on a dataset, run:

fx.about("mpg") # basically the same as `fx.about("mpg", mode="print")`

This prints the dataset's information as follows:

NAME    : mpg
SOURCE  : https://www.kaggle.com/datasets/uciml/autompg-dataset
LICENSE : CC0: Public Domain
ORIGIN  : Kaggle
OG NAME : autompg-dataset

Alternatively, you can get the information as a single-row polars.DataFrame by running:

row = fx.about("mpg", mode="row")
print(row)

This prints the information as an ASCII table:

shape: (1, 4)
┌──────┬─────────────────────────────────┬────────────────────┬────────┐       
│ name ┆ source                          ┆ license            ┆ origin │       
│ ---  ┆ ---                             ┆ ---                ┆ ---    │       
│ str  ┆ str                             ┆ str                ┆ str    │       
╞══════╪═════════════════════════════════╪════════════════════╪════════╡       
│ mpg  ┆ https://www.kaggle.com/dataset… ┆ CC0: Public Domain ┆ Kaggle │       
└──────┴─────────────────────────────────┴────────────────────┴────────┘ 

You can also treat row as a regular Polars DataFrame in your code.

Getting dataset URLs

If you need the direct file links:

url_pokemon = fx.get_url("pokemon")

By default, the format is "feather".

Optionally, you can specify the format of the dataset.

url_pokemon_csv = fx.get_url("pokemon", format="csv")
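If you need the links for several formats at once, a small wrapper can collect them. This is only a sketch: urls_by_format is a hypothetical helper, and fake_get_url is a stand-in with a made-up URL shape so the example runs offline. It relies only on the format keyword shown above.

```python
def urls_by_format(get_url, name, formats=("feather", "csv")):
    """Map each format name to the URL returned by the supplied get_url callable."""
    return {fmt: get_url(name, format=fmt) for fmt in formats}


# Stand-in for fx.get_url so the sketch runs without FrameX or a network
# connection. The URL shape here is invented for illustration only.
def fake_get_url(name, format="feather"):
    return f"https://example.invalid/{name}.{format}"


for fmt, url in urls_by_format(fake_get_url, "pokemon").items():
    print(fmt, url)
```

With FrameX installed, passing fx.get_url instead of fake_get_url should give the real links.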

CLI

The framex CLI has a slight overhead of around 400 milliseconds due to imports. Even so, operations take less than a second unless download speed is the bottleneck.
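To check the overhead on your own machine, you can time a command from Python. The sketch below uses a hypothetical helper, time_command, and measures a bare interpreter start-up as a baseline. The commented-out line assumes the fx executable is on your PATH and uses the --version flag listed in the CLI help.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


# Baseline: starting the interpreter alone, a cost any Python CLI also pays.
baseline = time_command([sys.executable, "-c", "pass"])
print(f"bare interpreter: {baseline * 1000:.0f} ms")

# With FrameX installed, compare against the CLI itself:
# print(f"fx --version: {time_command(['fx', '--version']) * 1000:.0f} ms")
```

A single run is noisy, so repeat the measurement a few times before drawing conclusions.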

To see all the available commands, run:

fx -h
usage: fx [-h] [--version]
          {get,bring,about,list,show,describe} ...

Framex CLI

positional arguments:
  {get,bring,about,list,show,describe}
    get                 Get dataset(s)
    bring               Bring dataset(s) from the cache to the  
                        current working directory or to a       
                        specified directory.
    about               Info about dataset(s)
    list                List available datasets
    show                Show a preview of a single dataset      
    describe            Describe (or summarize) a dataset       

options:
  -h, --help            show this help message and exit
  --version, -v         Show version

get

Get a single dataset (to the current directory):

fx get iris

or get multiple datasets:

fx get iris mpg titanic

This downloads the dataset(s) to the current directory.

To download the datasets into the cache directory instead:

fx get iris mpg titanic --cache

Or into a specific directory:

fx get iris mpg titanic --dir data
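The same commands can be scripted from Python, for example in a project setup step. This is a minimal sketch with two hypothetical helpers, build_get_command and run. It uses only the --cache and --dir flags shown above, and treating the two flags as mutually exclusive is an assumption rather than documented behavior.

```python
import shutil
import subprocess


def build_get_command(names, cache=False, directory=None):
    """Build the argument list for `fx get` (hypothetical helper)."""
    # Assumption: --cache and --dir are alternatives, so reject using both.
    if cache and directory is not None:
        raise ValueError("choose either cache or directory, not both")
    cmd = ["fx", "get", *names]
    if cache:
        cmd.append("--cache")
    elif directory is not None:
        cmd += ["--dir", directory]
    return cmd


def run(cmd):
    """Run the command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


cmd = build_get_command(["iris", "mpg", "titanic"], directory="data")
print(cmd)
# ['fx', 'get', 'iris', 'mpg', 'titanic', '--dir', 'data']

# Uncomment to actually download the datasets:
# run(cmd)
```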

list

To list the names of the datasets available on the remote server:

fx list

To list only the datasets whose names include "dia":

fx list dia
Locally available datasets: (feather, parquet, csv, other)

Remote datasets:
diamonds

about

To get information on a dataset or datasets, run:

fx about mpg iris

show

To show a preview of a single dataset:

fx show iris

describe

To describe (or summarize) a dataset:

fx describe iris

For more parameters, check a subcommand's help, for example:

fx get --help

bring

Bring a dataset to the current directory from cache:

fx bring iris

Or bring multiple datasets:

fx bring iris mpg titanic

This brings the dataset(s) from the cache directory to the current directory.
