A restricted Pandas API for data science learners

Project description

babypandas

A pandas data-analysis library with a restricted API

The pandas library for tabular data analysis is powerful and popular, but perhaps not the easiest to learn: for nearly every task, no matter how simple, there are multiple ways of approaching it. babypandas is a simplified, introductory pandas library that allows for basic tabular data analysis with only a small subset of methods and arguments. This restricted interface is designed to be easier to learn while still demonstrating fundamental principles and allowing for a smooth transition into pandas at a later time.

The chosen methods are meant to align with the methods in Berkeley's datascience module, developed for the data8 course. However, unlike the datascience module, all code written in babypandas is also valid pandas code.
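As a minimal sketch of that compatibility claim, the function below builds a small table and averages a column. It uses only methods that appear in the comparison table in the FAQ (`DataFrame`, `assign`, `groupby`, `mean`, `reset_index`, `get`). It runs the function with pandas and, if it is installed, with babypandas. The data and the name `mean_score_by_section` are hypothetical. The import name `babypandas` is assumed from the package name.

```python
# Assumption: babypandas may or may not be installed. Because babypandas code
# is also valid pandas code, the same function runs with either library.
import pandas as pd

try:
    import babypandas as bpd
except ImportError:  # babypandas not installed; only pandas will be used
    bpd = None


def mean_score_by_section(lib):
    """Return the mean score for each section (hypothetical helper).

    Only methods from the babypandas subset are used, so `lib` can be
    either the pandas module or the babypandas module.
    """
    df = lib.DataFrame().assign(
        section=["A", "B", "A", "B"],  # hypothetical example data
        score=[80, 90, 70, 100],
    )
    return df.groupby("section").mean().reset_index().get(["section", "score"])


pandas_result = mean_score_by_section(pd)
print(pandas_result)

if bpd is not None:
    # The babypandas result should hold the same values.
    print(mean_score_by_section(bpd))
```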

Install

To install babypandas, use pip:

pip install babypandas

Documentation

See the documentation page.

FAQ

Who is this library for?

This library is intended for those who want an introduction to data science in Python, but want a focused introduction much like what's covered in Berkeley's data8 course. The pandas methods available in this library encourage better pandas usage through functional programming patterns and method chaining.
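Here is a minimal sketch of the method-chaining style. It assumes babypandas is imported as `bpd`. When babypandas is not installed, it falls back to pandas, which works because babypandas code is also valid pandas code. The city names and numbers are made up. Each step returns a new DataFrame, so the chain reads top to bottom as a sequence of transformations and leaves the original `cities` table unchanged.

```python
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd  # fallback: babypandas code is also valid pandas code

# Hypothetical data: populations (people) and areas (km^2).
cities = bpd.DataFrame().assign(
    city=["Aville", "Btown", "Cburg", "Dport"],
    population=[2_500_000, 800_000, 4_100_000, 1_200_000],
    area=[500, 300, 800, 150],
)

# One chained expression: add a column, keep large cities, sort, select.
dense_large = (
    cities
    .assign(density=cities.get("population") / cities.get("area"))
    .loc[cities.get("population") > 1_000_000]
    .sort_values(by="density", ascending=False)
    .get(["city", "density"])
)
print(dense_large)
```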

Why not just use the datascience module?

This library may be preferred over datascience when students will later move to pandas. While this library serves as a restricted introduction to pandas, it doesn't shy away from some pandas usage patterns that may require care for new programmers:

The frequent use of named function arguments,
The use of boolean arrays (masks) to select rows,
The use of table indices.
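The sketch below demonstrates each of the three patterns on hypothetical grade data. It uses the same `bpd` import with a pandas fallback and assumes babypandas provides `set_index` and label-based `.loc` on a Series, as pandas does. The last lines show why indices need care: after sorting, each row keeps its original label, so position-based `.iloc` and label-based `.loc` can return different rows.

```python
try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd  # fallback: babypandas code is also valid pandas code

# Hypothetical course data.
students = bpd.DataFrame().assign(
    name=["Ana", "Ben", "Cy", "Dee"],
    grade=[92, 78, 85, 64],
)

# 1. Named function arguments: each argument says what it controls.
by_grade = students.sort_values(by="grade", ascending=False)

# 2. Boolean arrays (masks): one True/False per row; .loc keeps the True rows.
passing_mask = students.get("grade") >= 70
passing = students.loc[passing_mask]

# 3. Table indices: "name" becomes the row labels, so rows are found by label.
indexed = students.set_index("name")
ben_grade = indexed.get("grade").loc["Ben"]

# After sorting, the row labels are 0, 2, 1, 3, so position and label differ:
second_by_position = by_grade.get("name").iloc[1]  # second row: "Cy"
label_one = by_grade.get("name").loc[1]            # row labeled 1: "Ben"
print(second_by_position, label_one)
```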

How does this library compare to the datascience module?

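babypandas equivalents of Berkeley datascience module methods. In the table, `bpd` is babypandas imported with `import babypandas as bpd`, `df` is a babypandas DataFrame, and `np` is NumPy: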

`datascience` method	`babypandas` equivalent (or closest match)	Description
`Table()`	`bpd.DataFrame()`	empty table formation
`Table().with_columns(*labels_and_values)`	`bpd.DataFrame().assign(**kwargs)`	table from lists
`table.with_columns(*labels_and_values)`	`df.assign(**kwargs)`	adding columns
`table.with_rows(rows)`	`df.append(other_df, ignore_index=True)`	adding rows
`Table.read_table(filepath)`	`bpd.read_csv(filepath)`	read in data
`table.num_columns`	`df.shape[1]`	number of columns
`table.num_rows`	`df.shape[0]`	number of rows
`table.labels`	`df.columns`	column labels
`table.relabeled(label, new_label)`	`df.assign(new_label=df.get(label)).drop(columns=[label])`	rename columns (the renamed column moves to the end)
`table.column(col)`	`df.get(col)`	get a specific column (by name)
`table.column(col).item(0)`	`df.get(col).iloc[0]`	get a specific value in the table
`table.select(col1, col2)`	`df.get([col1, col2])`	get columns as a df
`table.drop(col1, col2)`	`df.drop(columns=[col1, col2])`	drop columns
`table.sort(col)`	`df.sort_values(by=col)`	sorts values in a dataframe by col
`table.take(row_indices_or_slice)`	`df.take(row_indices_or_slice)`	selects rows by position
`table.where(col, are.above(num))`	`df.loc[df.get(col) > num]`	selects rows based on condition
`table.scatter(xcol, ycol)`	`df.plot(kind='scatter', x=xcol, y=ycol)`	plots a scatter plot
`table.plot(xcol, ycol)`	`df.plot(x=xcol, y=ycol)`	plots a line plot
`table.barh(col)`	`df.plot(kind='barh', x=col)`	plots a horizontal bar plot
`table.hist(col, bins)`	`df.get(col).plot(kind='hist', bins=bins)`	plots a histogram
`table.apply(fn, col)`	`df.get(col).apply(fn)`	apply function to a column
`table.group(col)`	`df.groupby(col).count()`	give counts of values in a col
`table.group(col, agg_fn)`	`df.groupby(col).agg_fn().reset_index()`	groups by column, aggregates with fn (e.g. `sum`, `mean`)
`table.group([col1, col2])`	`df.groupby([col1, col2]).count().reset_index()`	groups by two cols, agg with counts
`table.group([col1, col2], sum)`	`df.groupby([col1, col2]).sum().reset_index()`	groups by two cols, agg with sum
`table.join(leftcol, table2, rightcol)`	`df.merge(df2, left_on=leftcol, right_on=rightcol)`	merges two dataframes (different column names)
`table.join(col, table2, col)`	`df.merge(df2, on=col)`	merges two dataframes (same column names)
`table.sample(n)`	`df.sample(n, replace=True)`	sample with replacement
`sample_proportions(size, distr)`	`np.random.multinomial(size, distr) / size`	gets sample proportions of a distribution
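This minimal sketch translates a few rows of the table. It shows the `datascience` call that each line replaces as a comment, and those calls are not executed. The sketch assumes babypandas is imported as `bpd`, with a pandas fallback. The store and manager data are hypothetical.

```python
import numpy as np

try:
    import babypandas as bpd
except ImportError:
    import pandas as bpd  # fallback: babypandas code is also valid pandas code

# Table().with_columns("store", [...], "sales", [...])
sales = bpd.DataFrame().assign(
    store=["north", "south", "north", "east"],  # hypothetical data
    sales=[100, 250, 150, 75],
)

# table.group("store", sum)
totals = sales.groupby("store").sum().reset_index()

# table.relabeled("sales", "total_sales")
totals = totals.assign(total_sales=totals.get("sales")).drop(columns=["sales"])

# table.join("store", managers, "store")
managers = bpd.DataFrame().assign(
    store=["north", "south", "east"],
    manager=["Kim", "Lee", "Ray"],
)
report = totals.merge(managers, on="store")

# table.where("total_sales", are.above(100))
big = report.loc[report.get("total_sales") > 100]

# sample_proportions(100, [0.5, 0.5])
proportions = np.random.multinomial(100, [0.5, 0.5]) / 100
print(report)
print(proportions)
```

Because `groupby` returns the group keys as the index, `reset_index()` turns `store` back into an ordinary column. That matches the shape of the `datascience` result and lets the later `merge` use `on="store"`.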

Development

Publishing to PyPI requires a tagged commit on the master branch: the GitHub Actions workflow builds the package and publishes it to PyPI only when a commit on master is tagged. This can happen in one of two ways:

Direct Tagged Commit to Master: Commit your changes directly to master and tag the commit before pushing to GitHub.

git commit -m "Your descriptive commit message"
git tag <tag-name> # convention has been to tag with package version
git push origin master
git push origin <tag-name>

Merge Pull Request to Master and Post-Hoc Tag: Merge a pull request into master. After merging, tag the resulting commit in master.

git checkout master
git pull origin master
git tag <tag-name>
git push origin <tag-name>

Either of these approaches will trigger testing, building, and publishing of the package to PyPI.

Project details

Release history

1.0.0 (this version): Nov 4, 2025
1.0.0.dev1 (pre-release): Oct 15, 2025
1.0.0.dev0 (pre-release): Oct 14, 2025
0.1.9: Sep 27, 2023
0.1.8: Sep 25, 2023
0.1.7: Sep 12, 2022
0.1.6: Mar 31, 2020
0.1.5: Jan 13, 2020
0.1.4: Dec 31, 2019
0.1.3: Dec 12, 2019
0.1.2: Dec 12, 2019
0.1.1: Sep 26, 2019
0.1.0: Sep 26, 2019

Download files


Source Distribution

babypandas-1.0.0.tar.gz (13.2 kB)

Uploaded Nov 4, 2025 Source

Built Distribution

babypandas-1.0.0-py3-none-any.whl (14.3 kB)

Uploaded Nov 4, 2025 Python 3

File details

Details for the file babypandas-1.0.0.tar.gz.

File metadata

Download URL: babypandas-1.0.0.tar.gz
Upload date: Nov 4, 2025
Size: 13.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for babypandas-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`932e566ab8f4526e52cf3bada4ac5cb9d29d347399b2b497581b87e0ddbf4c71`
MD5	`0a6afba2e9a95af6e30aa1eaa2d07dea`
BLAKE2b-256	`efe4d1070621f67601ed9958de5c1aaba2b044b0d7015ed8fe381d16f872a4c0`

File details

Details for the file babypandas-1.0.0-py3-none-any.whl.

File metadata

Download URL: babypandas-1.0.0-py3-none-any.whl
Upload date: Nov 4, 2025
Size: 14.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for babypandas-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`73f4004d1dc04608b9009b2b1ee4b5a7ffb88e8413abe985c2931caf70946a21`
MD5	`1adedfc5b852b8146391c9bf9daa23ee`
BLAKE2b-256	`f1ee891f0b56ffa0544853f207691f68c8d4cfc422f25c351a06d16915e9c7e8`
