
A restricted Pandas API


babypandas

A pandas data-analysis library with a restricted API


The pandas library for tabular data analysis is powerful and popular, but perhaps not the easiest to learn: for nearly every task, no matter how simple, there are multiple ways of approaching it. babypandas is a simplified, introductory pandas library that allows for basic tabular data analysis with only a small subset of methods and arguments. This restricted interface is designed to be easier to learn while still demonstrating fundamental principles and allowing for a smooth transition into pandas at a later time.

The chosen methods are meant to align with the methods in Berkeley's datascience module, developed for the data8 course. However, unlike the datascience module, all code written in babypandas is also valid pandas code.
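As a minimal sketch of what this means in practice, the snippet below builds a small table and sorts it using only calls that appear in the comparison table in the FAQ. The try/except import is an assumption made for illustration: it falls back to pandas when babypandas is not installed, relying on the claim above that babypandas code is also valid pandas code. The table contents are made up.

```python
try:
    import babypandas as bpd
except ImportError:
    # Assumption for this sketch: plain pandas accepts the same code.
    import pandas as bpd

# Build a table from lists, one keyword argument per column.
pets = bpd.DataFrame().assign(
    name=["Rex", "Tom", "Bubbles"],
    species=["dog", "cat", "fish"],
    weight_kg=[30.0, 4.5, 0.1],
)

# Sort by a column, then read one value from the sorted result.
lightest_first = pets.sort_values(by="weight_kg")
lightest_name = lightest_first.get("name").iloc[0]

print(pets.shape)      # (number of rows, number of columns)
print(lightest_name)
```

Swapping the import for `import pandas as bpd` should leave the rest of the snippet unchanged.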


Install

To install babypandas, use pip:

pip install babypandas

Documentation

See the documentation page.


FAQ

Who is this library for?

This library is intended for those who want a focused introduction to data science in Python, much like what is covered in Berkeley's data8 course. The pandas methods available in this library encourage better pandas usage through functional programming patterns and method chaining.

Why not just use the datascience module?

This library may be preferred over datascience when students will later move to pandas. While this library serves as a restricted introduction to pandas, it doesn't shy away from some pandas usage patterns that may require care for new programmers:

  • The frequent use of named function arguments,
  • The use of boolean arrays (masks) to select rows,
  • The use of table indices.
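The three patterns listed above can be sketched together in a few lines. This is an illustrative example rather than official usage: the data is invented, the pandas fallback import is an assumption, and `set_index` and `.loc` on a column are used on the understanding that babypandas exposes them the same way pandas does.

```python
try:
    import babypandas as bpd
except ImportError:
    # Assumption for this sketch: plain pandas accepts the same code.
    import pandas as bpd

fruit = bpd.DataFrame().assign(
    name=["apple", "mango", "grape"],
    price=[0.5, 2.0, 3.5],
)

# Table index: use the 'name' column as row labels.
by_name = fruit.set_index("name")

# Boolean array (mask): one True/False value per row.
is_expensive = by_name.get("price") > 1.0
expensive = by_name.loc[is_expensive]

# Named arguments: 'by=' states which column drives the sort.
cheapest_first = by_name.sort_values(by="price")

# Index lookup: fetch a value by row label rather than by position.
mango_price = by_name.get("price").loc["mango"]

print(expensive.shape[0])
print(mango_price)
```

Each step returns a new table instead of modifying `fruit`, which is what makes method chaining straightforward.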

How does this library compare to the datascience module?

Berkeley datascience module equivalents with babypandas:

| datascience method | babypandas equivalent or close method | description |
|---|---|---|
| Table() | bpd.DataFrame() | empty table formation |
| Table().with_columns(*labels_and_values) | bpd.DataFrame().assign(**kwargs) | table from lists |
| table.with_columns(*labels_and_values) | df.assign(**kwargs) | adding columns |
| table.with_rows(rows) | df.append(other_df, ignore_index=True) | adding rows |
| Table.read_table(filepath) | bpd.read_csv(filepath) | read in data |
| table.num_columns | df.shape[1] | number of columns |
| table.num_rows | df.shape[0] | number of rows |
| table.labels | df.columns | list of columns |
| table.relabeled(label, new_label) | df.assign(new_label=df.get(label)).drop(columns=[label]) | rename columns |
| table.column(col) | df.get(col) | get a specific column (by name) |
| table.column(col).item(0) | df.get(col).iloc[0] | get a specific value in the table |
| table.select(col1, col2) | df.get([col1, col2]) | get columns as a df |
| table.drop(col1, col2) | df.drop(columns=[col1, col2]) | drop columns |
| table.sort(col) | df.sort_values(by=col) | sorts values in a dataframe by col |
| table.take(row_indices_or_slice) | df.take(row_indices_or_slice) | selects rows by position |
| table.where(col, are.above(num)) | df.loc[df.get(col) > num] | selects rows based on condition |
| table.scatter(xcol, ycol) | df.plot(kind='scatter', x=xcol, y=ycol) | plots a scatter plot |
| table.plot(xcol, ycol) | df.plot(x=xcol, y=ycol) | plots a line plot |
| table.barh(col) | df.plot(kind='barh', x=col) | plots a horizontal bar plot |
| table.hist(col, bins) | df.get(col).plot(kind='hist', bins=bins) | plots a histogram |
| table.apply(fn, col) | df.get(col).apply(fn) | apply function to a column |
| table.group(col) | df.groupby(col).count() | give counts of values in a col |
| table.group(col, agg_fn) | df.groupby(col).agg_fn().reset_index() | groups by column, aggregates with fn |
| table.group([col1, col2]) | df.groupby([col1, col2]).count().reset_index() | groups by two cols, agg with counts |
| table.group([col1, col2], sum) | df.groupby([col1, col2]).sum().reset_index() | groups by two cols, agg with sum |
| table.join(leftcol, df2, rightcol) | df.merge(df2, left_on=leftcol, right_on=rightcol) | merges two dataframes (diff col names) |
| table.join(col, df2, col) | df.merge(df2, on=col) | merges two dataframes (same col names) |
| table.sample(n) | df.sample(n, replace=True) | sample with replacement |
| sample_proportions(size, distr) | np.random.multinomial(size, distr) / size | gets sample proportions of a distribution |
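A few rows of the comparison above (grouping, merging, applying a function, and sample proportions) can be tried out with the following sketch. The table contents and variable names are hypothetical, and the pandas fallback import is an assumption made so the snippet also runs where babypandas is not installed.

```python
import numpy as np

try:
    import babypandas as bpd
except ImportError:
    # Assumption for this sketch: plain pandas accepts the same code.
    import pandas as bpd

orders = bpd.DataFrame().assign(
    customer=["ann", "bob", "ann"],
    amount=[10, 20, 5],
)
customers = bpd.DataFrame().assign(
    cust_name=["ann", "bob"],
    city=["Oslo", "Lima"],
)

# table.group(col, sum): total amount per customer.
totals = orders.groupby("customer").sum().reset_index()

# table.join(leftcol, df2, rightcol): the key columns have different names.
merged = orders.merge(customers, left_on="customer", right_on="cust_name")

# table.apply(fn, col): apply a function to every value in one column.
doubled = orders.get("amount").apply(lambda amount: amount * 2)

# sample_proportions(size, distr): counts from 100 draws, divided by 100.
proportions = np.random.multinomial(100, [0.2, 0.3, 0.5]) / 100

print(totals)
print(merged.shape[0])
print(proportions.sum())
```

The proportions vary from run to run because the draw is random, but they always sum to 1 up to floating-point rounding.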

Development

Publishing to PyPI requires that a tagged commit exists on the master branch. The GitHub Actions workflow will trigger package building and publishing to PyPI only when a commit on master is tagged. This can happen in one of two ways:

  1. Direct Tagged Commit to Master: Commit your changes directly to master and tag the commit before pushing to GitHub.

     git commit -m "Your descriptive commit message"
     git tag <tag-name> # convention has been to tag with package version
     git push origin master
     git push origin <tag-name>

  2. Merge Pull Request to Master and Post-Hoc Tag: Merge a pull request into master. After merging, tag the resulting commit in master.

     git checkout master
     git pull origin master
     git tag <tag-name>
     git push origin <tag-name>

Either of these approaches will trigger testing, building, and publishing of the package to PyPI.
