A restricted Pandas API for data science learners

babypandas

A pandas data-analysis library with a restricted API


The pandas library for tabular data analysis is powerful and popular, but perhaps not the easiest to learn: for nearly every task, no matter how simple, there are multiple ways of approaching it. babypandas is a simplified, introductory pandas library that allows for basic tabular data analysis with only a small subset of methods and arguments. This restricted interface is designed to be easier to learn while still demonstrating fundamental principles and allowing for a smooth transition into pandas at a later time.

The chosen methods are meant to align with the methods in Berkeley's datascience module, developed for the data8 course. However, unlike the datascience module, all code written in babypandas is also valid pandas code.
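As a small illustration of that last claim, the sketch below uses only calls in the restricted style (`DataFrame().assign`, `get`, `sort_values`) but imports pandas itself. The table contents and column names are invented for the example. With babypandas installed, replacing the import with `import babypandas as bpd` is assumed, not verified here, to behave the same way.

```python
import pandas as pd  # assumption: `import babypandas as bpd` would take this place

# Build a table from lists by assigning columns to an empty DataFrame.
# The data is made up for illustration.
pets = pd.DataFrame().assign(
    name=["Rex", "Milo", "Luna"],
    weight_kg=[30.0, 4.5, 22.0],
)

# Chain restricted-style methods: add a derived column, then sort by weight.
heaviest_first = (
    pets.assign(weight_lb=pets.get("weight_kg") * 2.2)
        .sort_values(by="weight_kg", ascending=False)
)

# Pull a single column out by name.
names = heaviest_first.get("name")
print(list(names))
```

Each step returns a new table rather than modifying `pets`, which is what makes the calls easy to chain.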


Install

To install babypandas, use pip:

pip install babypandas

Documentation

See the documentation page.


FAQ

Who is this library for?

This library is intended for those who want a focused introduction to data science in Python, much like what is covered in Berkeley's data8 course. The pandas methods available in this library encourage better pandas usage through functional programming patterns and method chaining.

Why not just use the datascience module?

This library may be preferred over datascience when students will later move to pandas. While this library serves as a restricted introduction to pandas, it doesn't shy away from some pandas usage patterns that may require care for new programmers:

  • The frequent use of named function arguments,
  • The use of boolean arrays (masks) to select rows,
  • The use of table indices.
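The three patterns listed above can be sketched in a few lines. This example is written against pandas, relying on the statement that babypandas code is also valid pandas code. The `flights` table and its column names are hypothetical.

```python
import pandas as pd  # assumption: the same calls are available via `import babypandas as bpd`

# Hypothetical data for illustration.
flights = pd.DataFrame().assign(
    flight=["AA10", "UA22", "DL35", "UA48"],
    delay_min=[12, 45, 0, 30],
)

# 1. Named arguments: `by=` and `ascending=` are passed by keyword.
by_delay = flights.sort_values(by="delay_min", ascending=False)

# 2. Boolean mask: the comparison yields one True/False value per row,
#    and indexing the table with it keeps only the True rows.
is_late = flights.get("delay_min") > 15
late = flights[is_late]

# 3. Index: label the rows by flight, then look one up by its label.
indexed = flights.set_index("flight")
ua22_delay = indexed.get("delay_min").loc["UA22"]
```

Keeping the mask in its own variable (`is_late`) makes it easier for a new programmer to inspect the True/False values before using them to select rows.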

How does this library compare to the datascience module?

Berkeley datascience module equivalents with babypandas:

| datascience method | babypandas equivalent or close method | description |
| --- | --- | --- |
| `Table()` | `bpd.DataFrame()` | empty table formation |
| `Table().with_columns(*labels_and_values)` | `bpd.DataFrame().assign(**kwargs)` | table from lists |
| `table.with_columns(*labels_and_values)` | `df.assign(**kwargs)` | adding columns |
| `table.with_rows(rows)` | `df.append(other_df, ignore_index=True)` | adding rows |
| `Table.read_table(filepath)` | `bpd.read_csv(filepath)` | read in data |
| `table.num_columns` | `df.shape[1]` | number of columns |
| `table.num_rows` | `df.shape[0]` | number of rows |
| `table.labels` | `df.columns` | list of columns |
| `table.relabeled(label, new_label)` | `df.assign(new_label=df.get(label)).drop(columns=[label])` | rename columns |
| `table.column(col)` | `df.get(col)` | get a specific column (by name) |
| `table.column(col).item(0)` | `df.get(col).iloc[0]` | get a specific value in the table |
| `table.select(col1, col2)` | `df.get([col1, col2])` | get columns as a df |
| `table.drop(col1, col2)` | `df.drop(columns=[col1, col2])` | drop columns |
| `table.sort(col)` | `df.sort_values(by=col)` | sorts values in a dataframe by col |
| `table.take(row_indices_or_slice)` | `df.take(row_indices_or_slice)` | selects rows by position |
| `table.where(col, are.above(num))` | `df.loc[df.get(col) > num]` | selects rows based on condition |
| `table.scatter(xcol, ycol)` | `df.plot(kind='scatter', x=xcol, y=ycol)` | plots a scatter plot |
| `table.plot(xcol, ycol)` | `df.plot(x=xcol, y=ycol)` | plots a line plot |
| `table.barh(col)` | `df.plot(kind='barh', x=col)` | plots a horizontal bar plot |
| `table.hist(col, bins)` | `df.get(col).plot(kind='hist', bins=bins)` | plots a histogram |
| `table.apply(fn, col)` | `df.get(col).apply(fn)` | apply function to a column |
| `table.group(col)` | `df.groupby(col).count()` | give counts of values in a col |
| `table.group(col, agg_fn)` | `df.groupby(col).agg_fn().reset_index()` | groups by column, aggregates with fn |
| `table.group([col1, col2])` | `df.groupby([col1, col2]).count().reset_index()` | groups by two cols, agg with counts |
| `table.group([col1, col2], sum)` | `df.groupby([col1, col2]).sum().reset_index()` | groups by two cols, agg with sum |
| `table.join(leftcol, df2, rightcol)` | `df.merge(df2, left_on=leftcol, right_on=rightcol)` | merges two dataframes (diff col names) |
| `table.join(col, df2, col)` | `df.merge(df2, on=col)` | merges two dataframes (same col names) |
| `table.sample(n)` | `df.sample(n, replace=True)` | sample with replacement |
| `sample_proportions(size, distr)` | `np.random.multinomial(size, distr) / size` | gets sample proportions of a distribution |
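To make a few of these equivalents concrete, here is a minimal sketch that strings several of them together. It is written against pandas and NumPy, on the assumption that the same calls work through `import babypandas as bpd`. The `sales` and `managers` tables, their column names, and the distribution `distr` are all hypothetical.

```python
import numpy as np
import pandas as pd  # assumption: `import babypandas as bpd` exposes the same calls

# Hypothetical tables for illustration.
sales = pd.DataFrame().assign(
    store=["north", "south", "north", "south", "north"],
    units=[3, 5, 2, 7, 4],
)
managers = pd.DataFrame().assign(
    shop=["north", "south"],
    manager=["Ana", "Raj"],
)

# table.group(col, sum): group, aggregate, then restore `store` as a column.
totals = sales.groupby("store").sum().reset_index()

# table.relabeled(label, new_label): copy under the new name, drop the old one.
totals = totals.assign(total_units=totals.get("units")).drop(columns=["units"])

# table.join(leftcol, df2, rightcol): the key columns have different names.
report = totals.merge(managers, left_on="store", right_on="shop")

# table.where(col, are.above(num)): keep rows where the condition holds.
busy = report.loc[report.get("total_units") > 10]

# sample_proportions(size, distr): random counts divided by the sample size.
distr = [0.5, 0.3, 0.2]
props = np.random.multinomial(100, distr) / 100
```

The `reset_index()` call matters in the grouping step: without it, `store` becomes the index of the result and `get("store")` would no longer find it as a column.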

Development

Publishing to PyPI requires that a tagged commit exists on the master branch. The GitHub Actions workflow will trigger package building and publishing to PyPI only when a commit on master is tagged. This can happen in one of two ways:

  1. Direct Tagged Commit to Master: Commit your changes directly to master and tag the commit before pushing to GitHub.
     git commit -m "Your descriptive commit message"
     git tag <tag-name> # convention has been to tag with package version
     git push origin master
     git push origin <tag-name>
  2. Merge Pull Request to Master and Post-Hoc Tag: Merge a pull request into master. After merging, tag the resulting commit in master.
     git checkout master
     git pull origin master
     git tag <tag-name>
     git push origin <tag-name>

Either of these approaches will trigger testing, building, and publishing of the package to PyPI.
