A library for data exploration comparable to pandas. No Series, no hierarchical indexing, only one indexer [ ]
dexplo
A data analysis library comparable to pandas
Main Goals
A very minimal set of features
Be as explicit as possible
There should be one, and preferably only one, obvious way to do it.
Data Structures
Only DataFrames
No Series
Data Types
Only primitive types - int, float, boolean, numpy.unicode
No object data types
Row and Column Labels
No index, meaning no row labels
No hierarchical index
Column names must be strings
Column names must be unique
Columns stored in a numpy array
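The column rules above can be sketched as a small validation step. This is only an illustration: `validate_columns` is a hypothetical helper, not part of dexplo's API, and the choice of exceptions is an assumption.

```python
import numpy as np


def validate_columns(columns):
    """Check column names and return them as a NumPy array.

    Hypothetical helper for illustration; not dexplo's actual code.
    """
    columns = list(columns)
    # Column names must be strings
    for name in columns:
        if not isinstance(name, str):
            raise TypeError(f"column names must be strings, got {name!r}")
    # Column names must be unique
    if len(set(columns)) != len(columns):
        raise ValueError("column names must be unique")
    # Columns are stored in a NumPy array (unicode dtype)
    return np.array(columns, dtype=str)


cols = validate_columns(["name", "age"])
```

Because every name is a string, a string key can only ever mean a label and an integer key can only ever mean a position.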
Subset Selection
Only one way to select data - [ ]
Subset selection will be explicit and require both a row selection and a column selection
Rows will be selected only by integer location
Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous
Column names cannot be duplicated
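As a rough sketch of how these selection rules fit together, the toy class below accepts only `frame[rows, cols]`, takes rows by integer location, and takes columns by label or integer location. `MiniFrame` and its helper method are hypothetical names invented for this example, and it holds a single 2-D NumPy array for simplicity; it is not dexplo's implementation.

```python
import numpy as np


class MiniFrame:
    """Toy stand-in for a DataFrame (hypothetical, for illustration only)."""

    def __init__(self, data, columns):
        self.values = np.array(data)                      # 2-D array, copied
        self.columns = np.array(list(columns), dtype=str)

    def _col_position(self, key):
        # A string is always a label; an integer is always a position
        if isinstance(key, str):
            matches = np.flatnonzero(self.columns == key)
            if matches.size == 0:
                raise KeyError(key)
            return int(matches[0])
        if isinstance(key, (int, np.integer)):
            return int(key)
        raise TypeError("columns are selected by label or integer location")

    def __getitem__(self, key):
        # Both a row and a column selection are required
        if not (isinstance(key, tuple) and len(key) == 2):
            raise TypeError("selection requires both rows and columns")
        rows, cols = key
        n_rows, n_cols = self.values.shape

        # Rows: integer location only
        if isinstance(rows, slice):
            row_pos = list(range(*rows.indices(n_rows)))
        elif isinstance(rows, (int, np.integer)):
            row_pos = [int(rows)]
        else:
            row_pos = list(rows)
        if not all(isinstance(r, (int, np.integer)) for r in row_pos):
            raise TypeError("rows are selected only by integer location")

        # Columns: label or integer location
        if isinstance(cols, slice):
            col_pos = list(range(*cols.indices(n_cols)))
        elif isinstance(cols, (str, int, np.integer)):
            col_pos = [self._col_position(cols)]
        else:
            col_pos = [self._col_position(c) for c in cols]

        # Indexing with integer lists produces a new array, not a view
        new_values = self.values[np.ix_(row_pos, col_pos)]
        return MiniFrame(new_values, self.columns[col_pos])


mf = MiniFrame([[1, 2], [3, 4], [5, 6]], ["a", "b"])
sub = mf[[0, 2], "b"]   # rows 0 and 2, column "b"
```

In this sketch the result is always another two-dimensional frame, even for a single cell, which mirrors the "Only DataFrames, No Series" goal.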
All selections and operations copy
All selections and operations provide new copies of the data
This will avoid any chained indexing confusion
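The difference between a view and a copy is easiest to see in plain NumPy, which is what the copy-always rule is meant to sidestep. The snippet below uses only NumPy and says nothing about how dexplo itself performs the copy.

```python
import numpy as np

data = np.arange(6).reshape(3, 2)

# Basic slicing returns a view that shares memory with `data`
view = data[:2]
view[0, 0] = 100            # this write is visible through `data`

# An explicit copy owns its own memory
copied = data[:2].copy()
copied[0, 0] = -1           # `data` is left untouched

shares_view = np.shares_memory(data, view)
shares_copy = np.shares_memory(data, copied)
```

If every selection behaves like `copied`, writing to the result of a selection can never silently change, or silently fail to change, the original data.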
Development
Must use type hints
Must use Python 3.6 (for f-strings)
Must have numpy, bottleneck, numexpr
Small feature set
Implement as few attributes and methods as possible
Focus on good idiomatic cookbook examples for doing more complex tasks
Only Scalar Data Types
No complex Python data types
[x] bool - always 8 bits, not-null
[x] int - always 64 bits, not-null
[x] float - always 64 bits, nulls allowed
[x] str - A python unicode object, nulls allowed
[ ] categorical
[ ] datetime
[ ] timedelta
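The bit widths and null rules for the numeric types line up with how NumPy arrays behave, as the short check below shows. It uses NumPy directly and assumes nothing about dexplo's internals; the variable names are illustrative.

```python
import numpy as np

flags = np.array([True, False])                     # bool
counts = np.array([1, 2, 3], dtype=np.int64)        # int
prices = np.array([1.5, np.nan], dtype=np.float64)  # float, with a missing value

# Size of one element in bits
bool_bits = flags.itemsize * 8
int_bits = counts.itemsize * 8
float_bits = prices.itemsize * 8

# NaN is a floating-point value, so only float arrays can hold it
missing = np.isnan(prices)

# Mixing an integer with NaN makes NumPy fall back to a float array
mixed_dtype = np.array([1, np.nan]).dtype
```

Integer and boolean arrays have no value reserved for "missing", which is consistent with those types being listed as not-null.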
Attributes to implement
[x] size
[x] shape
[x] values
[x] dtypes
May not implement any of the binary operators as methods (add, sub, mul, etc…)
Methods
Stats
[x] abs
[x] all
[x] any
[x] argmax
[x] argmin
[x] clip
[ ] corr
[x] count
[ ] cov
[x] cummax
[x] cummin
[ ] cumprod
[x] cumsum
[ ] describe
[x] max
[x] min
[x] median
[x] mean
[ ] mode
[ ] nlargest
[ ] nsmallest
[ ] quantile
[ ] rank
[x] std
[x] sum
[x] var
[ ] unique
[ ] nunique
Selection
[ ] drop
[ ] drop_duplicates
[x] head
[ ] isin
[ ] sample
[x] select_dtypes
[x] tail
[ ] where
Missing Data
[ ] isna
[ ] dropna
[ ] fillna
[ ] interpolate
Other
[ ] append
[ ] apply
[ ] assign
[x] astype
[ ] groupby
[ ] info
[ ] melt
[ ] memory_usage
[ ] merge
[ ] pivot
[ ] replace
[ ] rolling
[ ] sort_values
Functions
[ ] read_csv
[ ] read_sql
[ ] concat