A restricted Pandas API
babypandas
A `pandas` data-analysis library with a restricted API
The `pandas` library is a confusing mess of methods, and for every task, no matter how simple, there are multiple ways of approaching it. `babypandas` is a simplified, introductory `pandas` library that allows for basic tabular data analysis with only a small subset of methods and arguments. This allows for a smooth transition into `pandas`.
The chosen methods are meant to align with the methods in Berkeley's `datascience` module, developed for the `data8` course. However, unlike the `datascience` module, all code written in `babypandas` is also valid `pandas` code.
Install
To install `babypandas`, use `pip`:
pip install babypandas
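Once the package is installed, a first session might look like the sketch below. It uses plain `pandas` under the alias `bpd` as a stand-in, relying on the statement above that `babypandas` code is also valid `pandas` code; with the package installed, the import would presumably be `import babypandas as bpd`, which this sketch assumes rather than tests. The column names and values are made up for illustration.

```python
import pandas as bpd  # stand-in; with babypandas installed: import babypandas as bpd

# Build a small table from lists (hypothetical data).
df = bpd.DataFrame().assign(
    name=['Ada', 'Grace', 'Alan'],
    age=[36, 45, 41],
)

n_rows = df.shape[0]                # number of rows
n_cols = df.shape[1]                # number of columns
first_age = df.get('age').iloc[0]   # first value in the 'age' column
```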
Documentation
See the documentation page.
FAQ
Who is this library for?
This library is intended for those who want a focused introduction to data science in Python, much like what is covered in Berkeley's data8 course. The pandas methods available in this library encourage better pandas usage through functional programming patterns and method chaining.
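As a rough illustration of the method-chaining style mentioned above, the sketch below builds a result in a single chain, where each step returns a new table and leaves the original untouched. Plain `pandas` is imported as `bpd` as a stand-in; the data and column names are hypothetical, and the assumption is that the same calls work unchanged under `import babypandas as bpd`.

```python
import pandas as bpd  # stand-in; with babypandas installed: import babypandas as bpd

df = bpd.DataFrame().assign(
    city=['Oslo', 'Lima', 'Cairo', 'Quito'],
    temp_c=[4, 19, 22, 14],
)

# Each method returns a new table, so the steps read top to bottom.
warm = (
    df.assign(temp_f=df.get('temp_c') * 9 / 5 + 32)  # add a derived column
      .sort_values(by='temp_f', ascending=False)     # warmest first
      .take([0, 1])                                  # keep the first two rows
      .get(['city', 'temp_f'])                       # keep two columns
)
```

Because nothing is modified in place, `df` still has only its two original columns after the chain runs.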
Why not just use the datascience module?
This library may be preferred over `datascience` when students will be moving to `pandas`. While this library serves as a restricted introduction to `pandas`, it doesn't shy away from some `pandas` usage patterns that may require care for new programmers:
- The frequent use of named function arguments,
- The use of boolean arrays (masks) to select rows,
- The use of table indices.
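The three patterns listed above can be seen together in a short sketch. It is only an illustration: the table is invented, plain `pandas` is imported as `bpd` as a stand-in, and the availability of `set_index` and `Series.loc` in `babypandas` is assumed here rather than taken from the text above.

```python
import pandas as bpd  # stand-in; with babypandas installed: import babypandas as bpd

df = bpd.DataFrame().assign(
    pet=['cat', 'dog', 'owl'],
    weight=[4.0, 20.0, 1.5],
)

# Table indices: use the 'pet' column as the row labels.
by_pet = df.set_index('pet')

# Boolean mask: one True/False value per row, True where weight > 3.
heavy_mask = by_pet.get('weight') > 3
heavy = by_pet.loc[heavy_mask]

# Named arguments: 'by' and 'ascending' are passed by keyword.
ordered = heavy.sort_values(by='weight', ascending=False)

# Look up a single value by its index label.
dog_weight = by_pet.get('weight').loc['dog']
```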
How does this library compare to the datascience module?
Equivalents of Berkeley `datascience` module methods in `babypandas`:
| `datascience` method | `babypandas` equivalent or close | method description |
|---|---|---|
| `Table()` | `bpd.DataFrame()` | creates an empty table |
| `Table().with_columns(*labels_and_values)` | `bpd.DataFrame().assign(**kwargs)` | creates a table from lists |
| `table.with_columns(*labels_and_values)` | `df.assign(**kwargs)` | adds columns |
| `table.with_rows(rows)` | `df.append(other_df, ignore_index=True)` | adds rows |
| `Table.read_table(filepath)` | `bpd.read_csv(filepath)` | reads in data |
| `table.num_columns` | `df.shape[1]` | number of columns |
| `table.num_rows` | `df.shape[0]` | number of rows |
| `table.labels` | `df.columns` | column labels |
| `table.relabeled(label, new_label)` | `df.assign(new_label=df.get(label)).drop(columns=[label])` | renames a column |
| `table.column(col)` | `df.get(col)` | gets a specific column (by name) |
| `table.column(col).item(0)` | `df.get(col).iloc[0]` | gets a specific value in the table |
| `table.select(col1, col2)` | `df.get([col1, col2])` | gets columns as a df |
| `table.drop(col1, col2)` | `df.drop(columns=[col1, col2])` | drops columns |
| `table.sort(col)` | `df.sort_values(by=col)` | sorts the rows of a dataframe by col |
| `table.take(row_indices_or_slice)` | `df.take(row_indices_or_slice)` | selects rows by position |
| `table.where(col, are.above(num))` | `df.loc[df.get(col) > num]` | selects rows based on a condition |
| `table.scatter(xcol, ycol)` | `df.plot(kind='scatter', x=xcol, y=ycol)` | plots a scatter plot |
| `table.plot(xcol, ycol)` | `df.plot(x=xcol, y=ycol)` | plots a line plot |
| `table.barh(col)` | `df.plot(kind='barh', x=col)` | plots a horizontal bar plot |
| `table.hist(col, bins)` | `df.get(col).plot(kind='hist', bins=bins)` | plots a histogram |
| `table.apply(fn, col)` | `df.get(col).apply(fn)` | applies a function to a column |
| `table.group(col)` | `df.groupby(col).count()` | counts the rows for each value in col |
| `table.group(col, agg_fn)` | `df.groupby(col).agg_fn().reset_index()` | groups by a column, aggregates with agg_fn |
| `table.group([col1, col2])` | `df.groupby([col1, col2]).count().reset_index()` | groups by two columns, aggregates with counts |
| `table.group([col1, col2], sum)` | `df.groupby([col1, col2]).sum().reset_index()` | groups by two columns, aggregates with sum |
| `table.join(leftcol, df2, rightcol)` | `df.merge(df2, left_on=leftcol, right_on=rightcol)` | merges two dataframes (different column names) |
| `table.join(col, df2, col)` | `df.merge(df2, on=col)` | merges two dataframes (same column name) |
| `table.sample(n)` | `df.sample(n, replace=True)` | samples rows with replacement |
| `sample_proportions(size, distr)` | `np.random.multinomial(size, distr) / size` | gets sample proportions of a distribution |
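A few of the equivalents above are easier to follow on concrete data. The sketch below works through the grouping, merging, filtering and sample-proportion rows on two small invented tables. Plain `pandas` is imported as `bpd` as a stand-in (the assumption being that the same calls run under `import babypandas as bpd`), and every table and column name is hypothetical.

```python
import numpy as np
import pandas as bpd  # stand-in; with babypandas installed: import babypandas as bpd

orders = bpd.DataFrame().assign(
    customer=['ann', 'bob', 'ann', 'cy'],
    amount=[10, 5, 7, 3],
)
regions = bpd.DataFrame().assign(
    name=['ann', 'bob', 'cy'],
    region=['west', 'east', 'west'],
)

# Like table.group('customer', sum): total amount per customer.
totals = orders.groupby('customer').sum().reset_index()

# Like table.join('customer', regions, 'name'): the key columns have different names.
joined = totals.merge(regions, left_on='customer', right_on='name')

# Like table.where('amount', are.above(6)): keep rows that satisfy a condition.
big = joined.loc[joined.get('amount') > 6]

# Like sample_proportions(100, distr): random proportions that sum to 1.
distr = [0.5, 0.3, 0.2]
props = np.random.multinomial(100, distr) / 100
```

The call to `reset_index()` turns the grouping column back into an ordinary column, which is why `customer` can then be used as a merge key.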