A library for data exploration comparable to pandas. No Series, no hierarchical indexing, and only one indexer: [ ]

Project description

dexplo

A data analysis library comparable to pandas

Main Goals

  • A very minimal set of features

  • Be as explicit as possible

  • There should be one– and preferably only one –obvious way to do it.

Data Structures

  • Only DataFrames

  • No Series

Data Types

  • Only primitive types - int, float, boolean, numpy.unicode

  • No object data types

Row and Column Labels

  • No index, meaning no row labels

  • No hierarchical index

  • Column names must be strings

  • Column names must be unique

  • Columns stored in a numpy array
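The column rules above can be sketched as a small validation step. This is only an illustration under stated assumptions: `validate_columns` is a hypothetical helper, not part of dexplo's API, and it shows one way to enforce string-only, unique names stored in a NumPy array.

```python
import numpy as np


def validate_columns(columns):
    """Check the column-name rules and return the names as a NumPy array.

    Hypothetical helper for illustration; not dexplo's actual code.
    """
    names = list(columns)
    for name in names:
        # Column names must be strings
        if not isinstance(name, str):
            raise TypeError(f"column name {name!r} is not a string")
    # Column names must be unique
    if len(set(names)) != len(names):
        raise ValueError("column names must be unique")
    # Columns are stored in a NumPy array of unicode strings
    return np.array(names, dtype=str)


cols = validate_columns(["a", "b"])
```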

Subset Selection

  • Only one way to select data - [ ]

  • Subset selection will be explicit and necessitate both rows and columns

  • Rows will be selected only by integer location

  • Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous

  • Column names cannot be duplicated
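To make the selection rules concrete, here is a minimal sketch of a frame-like class whose only selector is `[rows, cols]`. `MiniFrame` and its internals are hypothetical and written only for this illustration; they are not dexplo's implementation and may differ from its real behavior.

```python
import numpy as np


class MiniFrame:
    """Toy stand-in for a DataFrame (hypothetical, not dexplo's code)."""

    def __init__(self, data):
        self.columns = np.array(list(data), dtype=str)
        self._data = {name: np.asarray(values) for name, values in data.items()}

    def __getitem__(self, key):
        # Selection is explicit: both rows and columns are required
        if not isinstance(key, tuple) or len(key) != 2:
            raise TypeError("use frame[rows, cols]; both parts are required")
        rows, cols = key
        if isinstance(rows, (int, np.integer)):
            rows = [rows]  # keep the result two-dimensional
        if isinstance(cols, (str, int, np.integer)):
            cols = [cols]
        if isinstance(cols, slice):
            names = self.columns[cols].tolist()
        else:
            # A string is a label and an integer is a position,
            # so the two can never be confused
            names = [c if isinstance(c, str) else str(self.columns[c]) for c in cols]
        # Copy every selected column so the result shares no memory
        return MiniFrame({name: self._data[name][rows].copy() for name in names})


frame = MiniFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})
first_two_b = frame[:2, "b"]        # rows by position, column by label
mixed = frame[[0, 2], ["a", 1]]     # columns by label and by position
```

Because rows are always integer locations and column labels are always strings, the sketch needs no separate label-based and position-based indexers.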

All selections and operations copy

  • All selections and operations provide new copies of the data

  • This will avoid any chained indexing confusion
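The confusion this rule avoids can be seen in plain NumPy, where a basic slice is a view that writes through to the original. The snippet below is a sketch of that difference using only NumPy arrays; it does not call dexplo.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

view = values[:2]            # basic slicing returns a view in NumPy
copied = values[:2].copy()   # an explicit copy owns its own memory

view[0] = 99                 # writes through to `values`
copied[1] = -1               # leaves `values` untouched

shares_view = np.shares_memory(values, view)
shares_copy = np.shares_memory(values, copied)
```

If every selection behaves like `copied`, modifying a result can never silently change the frame it came from.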

Development

  • Must use type hints

  • Must use Python 3.6 or later, for f-strings

  • Must have numpy, bottleneck, numexpr

Small feature set

  • Implement as few attributes and methods as possible

  • Focus on good idiomatic cookbook examples for doing more complex tasks

Only Scalar Data Types

No complex Python data types

  • [x] bool - always 8 bits, not-null

  • [x] int - always 64 bits, not-null

  • [x] float - always 64 bits, nulls allowed

  • [x] str - A python unicode object, nulls allowed

  • [ ] categorical

  • [ ] datetime

  • [ ] timedelta
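The sizes and null rules for the numeric types line up with NumPy's own dtypes, as the sketch below shows. It assumes that a missing float is represented by NaN, which the text above does not state, and it says nothing about how missing strings are stored.

```python
import numpy as np

flags = np.array([True, False])                # bool: one byte (8 bits) per value
counts = np.array([1, 2, 3], dtype=np.int64)   # int: 64 bits, no missing values
prices = np.array([1.5, np.nan])               # float: 64 bits, NaN assumed as the null

bool_bits = flags.dtype.itemsize * 8
int_bits = counts.dtype.itemsize * 8
float_bits = prices.dtype.itemsize * 8

# An integer array has no slot for NaN, which is why int columns are not-null
try:
    counts[0] = np.nan
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False
```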

Attributes to implement

  • [x] size

  • [x] shape

  • [x] values

  • [x] dtypes

May not implement any of the binary operators as methods (add, sub, mul, etc…)

Methods

Stats

  • [x] abs

  • [x] all

  • [x] any

  • [x] argmax

  • [x] argmin

  • [x] clip

  • [ ] corr

  • [x] count

  • [ ] cov

  • [x] cummax

  • [x] cummin

  • [ ] cumprod

  • [x] cumsum

  • [ ] describe

  • [x] max

  • [x] min

  • [x] median

  • [x] mean

  • [ ] mode

  • [ ] nlargest

  • [ ] nsmallest

  • [ ] quantile

  • [ ] rank

  • [x] std

  • [x] sum

  • [x] var

  • [ ] unique

  • [ ] nunique

Selection

  • [ ] drop

  • [ ] drop_duplicates

  • [x] head

  • [ ] isin

  • [ ] sample

  • [x] select_dtypes

  • [x] tail

  • [ ] where

Missing Data

  • [ ] isna

  • [ ] dropna

  • [ ] fillna

  • [ ] interpolate

Other

  • [ ] append

  • [ ] apply

  • [ ] assign

  • [x] astype

  • [ ] groupby

  • [ ] info

  • [ ] melt

  • [ ] memory_usage

  • [ ] merge

  • [ ] pivot

  • [ ] replace

  • [ ] rolling

  • [ ] sort_values

Functions

  • [ ] read_csv

  • [ ] read_sql

  • [ ] concat

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dexplo-0.0.13.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

dexplo-0.0.13-cp36-cp36m-macosx_10_7_x86_64.whl (1.5 MB)

Uploaded CPython 3.6m macOS 10.7+ x86-64

File details

Details for the file dexplo-0.0.13.tar.gz.

File metadata

  • Download URL: dexplo-0.0.13.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dexplo-0.0.13.tar.gz
Algorithm Hash digest
SHA256 8f3c0add6c51e3ed77e9c3cac7ec878e776838c8772d6f30ca9d702c8af4afa9
MD5 d176dbec7f66a5aa163dd1751c807cb3
BLAKE2b-256 0b5a39857e7773ff44cb30799febdc2ffa6d86be2b1238e9e7bc5efd398620bb

See more details on using hashes here.

File details

Details for the file dexplo-0.0.13-cp36-cp36m-macosx_10_7_x86_64.whl.

File hashes

Hashes for dexplo-0.0.13-cp36-cp36m-macosx_10_7_x86_64.whl
Algorithm Hash digest
SHA256 bafe473ef555c4fe01ef31c7425c222b91fe5333184dc90f11f8678cbe31117d
MD5 7943729c077bf7bbf7a2fec05eaace3d
BLAKE2b-256 45af8c689e7850d56fd16c25ae5cf579716bad66789ff231a9155515e18b062a
