A library for data exploration comparable to pandas. No Series, no hierarchical indexing, only one indexer: [ ]

Project description

dexplo

A data analysis library comparable to pandas

Main Goals

  • A very minimal set of features

  • Be as explicit as possible

  • There should be one– and preferably only one –obvious way to do it.

Data Structures

  • Only DataFrames

  • No Series

Data Types

  • Only primitive types - int, float, boolean, numpy.unicode

  • No object data types

Row and Column Labels

  • No index, meaning no row labels

  • No hierarchical index

  • Column names must be strings

  • Column names must be unique

  • Columns stored in a numpy array
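The two column-name rules above (strings only, no duplicates) can be sketched as a small check. This is an illustration only: `validate_column_names` is a hypothetical helper written for this example, not a function taken from dexplo.

```python
from typing import Iterable, List


def validate_column_names(names: Iterable[object]) -> List[str]:
    """Hypothetical helper: return the names as a list, or raise on a violation."""
    checked: List[str] = []
    seen = set()
    for name in names:
        # Rule 1: every column name must be a string.
        if not isinstance(name, str):
            raise TypeError(f"column name {name!r} is not a string")
        # Rule 2: every column name must be unique.
        if name in seen:
            raise ValueError(f"column name {name!r} is duplicated")
        seen.add(name)
        checked.append(name)
    return checked


print(validate_column_names(["city", "population"]))  # ['city', 'population']
```

Rejecting bad names at construction time means later lookups by label never have to deal with a non-string or repeated key.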

Subset Selection

  • Only one way to select data - [ ]

  • Subset selection will be explicit and necessitate both rows and columns

  • Rows will be selected only by integer location

  • Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous

  • Column names cannot be duplicated
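A toy sketch of the selection rules above, assuming a hypothetical `TinyFrame` class backed by plain Python lists. It is not dexplo's implementation or API. It only shows why a single `[rows, columns]` indexer stays unambiguous when rows are always integer positions and column labels are always strings.

```python
from typing import Dict, List, Union

ColumnKey = Union[int, str]


class TinyFrame:
    """Hypothetical toy structure for illustration; not dexplo's real class."""

    def __init__(self, data: Dict[str, list]) -> None:
        self._columns: List[str] = list(data)
        self._data: Dict[str, list] = {n: list(v) for n, v in data.items()}

    def _column_name(self, key: ColumnKey) -> str:
        # A string is always a label and an integer is always a position,
        # so the two kinds of key cannot be confused.
        if isinstance(key, str):
            if key not in self._data:
                raise KeyError(key)
            return key
        return self._columns[key]

    def __getitem__(self, key) -> "TinyFrame":
        # Both a row selector and a column selector are required.
        if not isinstance(key, tuple) or len(key) != 2:
            raise TypeError("select with frame[rows, columns]")
        rows, cols = key
        n_rows = len(next(iter(self._data.values()), []))
        if isinstance(rows, slice):
            row_ids = list(range(*rows.indices(n_rows)))
        elif isinstance(rows, int):
            row_ids = [rows]
        else:
            row_ids = list(rows)
        col_keys = [cols] if isinstance(cols, (int, str)) else list(cols)
        names = [self._column_name(c) for c in col_keys]
        # New lists are built, so the result never shares storage with self.
        return TinyFrame({n: [self._data[n][i] for i in row_ids] for n in names})

    def to_dict(self) -> Dict[str, list]:
        return {n: list(v) for n, v in self._data.items()}


frame = TinyFrame({"name": ["a", "b", "c"], "score": [1.5, 2.5, 3.5]})
print(frame[[0, 2], "score"].to_dict())  # {'score': [1.5, 3.5]}
print(frame[[0, 2], 1].to_dict())        # {'score': [1.5, 3.5]}
```

In this sketch `frame["score"]` raises a `TypeError`, because a column selector without a row selector is not an explicit selection.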

All selections and operations copy

  • All selections and operations provide new copies of the data

  • This will avoid any chained indexing confusion
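The confusion that copying avoids can be seen in plain NumPy, where basic slicing returns a view that writes through to the original array. The snippet below uses only NumPy and says nothing about how dexplo itself makes its copies.

```python
import numpy as np

source = np.array([[1, 2], [3, 4]])

view = source[:, 0]            # basic slicing returns a view onto the same memory
copied = source[:, 0].copy()   # an explicit copy owns its own memory

view[0] = 99                   # writes through to source
copied[1] = -1                 # leaves source untouched

print(source[0, 0])                      # 99
print(source[1, 0])                      # 3
print(np.shares_memory(source, view))    # True
print(np.shares_memory(source, copied))  # False
```

If every selection behaves like `copied`, an assignment to a selection can never silently modify the data it was taken from.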

Development

  • Must use type hints

  • Must use Python 3.6 (f-strings)

  • Must have numpy, bottleneck, numexpr

Small feature set

  • Implement as few attributes and methods as possible

  • Focus on good idiomatic cookbook examples for doing more complex tasks

Only Scalar Data Types

No complex Python data types

  • [x] bool - always 8 bits, not-null

  • [x] int - always 64 bits, not-null

  • [x] float - always 64 bits, nulls allowed

  • [x] str - A python unicode object, nulls allowed

  • [ ] categorical

  • [ ] datetime

  • [ ] timedelta
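The widths and null rules listed above line up with the corresponding NumPy dtypes. The check below assumes `numpy.bool_`, `numpy.int64` and `numpy.float64` are the backing dtypes, which the list implies but does not state. It also shows why a float column can hold a missing value (NaN) while an integer column cannot.

```python
import numpy as np

# Width in bits of each assumed backing dtype.
widths = {np.dtype(t).name: np.dtype(t).itemsize * 8
          for t in (np.bool_, np.int64, np.float64)}
print(widths)  # {'bool': 8, 'int64': 64, 'float64': 64}

# Floats have NaN available as a missing-value marker.
floats = np.array([1.0, np.nan])
print(np.isnan(floats).tolist())  # [False, True]

# Integers have no such marker: storing NaN raises an error.
ints = np.array([1, 2], dtype=np.int64)
try:
    ints[0] = np.nan
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)  # True
```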

Attributes to implement

  • [x] size

  • [x] shape

  • [x] values

  • [x] dtypes

May not implement any of the binary operators as methods (add, sub, mul, etc…)

Methods

Stats

  • [x] abs

  • [x] all

  • [x] any

  • [x] argmax

  • [x] argmin

  • [x] clip

  • [ ] corr

  • [x] count

  • [ ] cov

  • [x] cummax

  • [x] cummin

  • [ ] cumprod

  • [x] cumsum

  • [ ] describe

  • [x] max

  • [x] min

  • [x] median

  • [x] mean

  • [ ] mode

  • [ ] nlargest

  • [ ] nsmallest

  • [ ] quantile

  • [ ] rank

  • [x] std

  • [x] sum

  • [x] var

  • [ ] unique

  • [ ] nunique

Selection

  • [ ] drop

  • [ ] drop_duplicates

  • [x] head

  • [ ] isin

  • [ ] sample

  • [x] select_dtypes

  • [x] tail

  • [ ] where

Missing Data

  • [ ] isna

  • [ ] dropna

  • [ ] fillna

  • [ ] interpolate

Other

  • [ ] append

  • [ ] apply

  • [ ] assign

  • [x] astype

  • [ ] groupby

  • [ ] info

  • [ ] melt

  • [ ] memory_usage

  • [ ] merge

  • [ ] pivot

  • [ ] replace

  • [ ] rolling

  • [ ] sort_values

Functions

  • [ ] read_csv

  • [ ] read_sql

  • [ ] concat

Download files

Source Distribution

dexplo-0.0.0.tar.gz (324.5 kB)

Uploaded Source

Built Distribution

dexplo-0.0.0-cp36-cp36m-macosx_10_7_x86_64.whl (317.2 kB)

Uploaded CPython 3.6m macOS 10.7+ x86-64
