A Python package for data analysis.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Data Analysis Library Sharks

This repository contains a detailed note of a new data analysis library Sharks.

Objectives

Sharks :

Has A DataFrame class with data stored in numpy arrays
Can Select subsets of data with the brackets operator
Can Use special methods defined in the Python data model
Have a nicely formatted display of the DataFrame in the notebook
Can Implement aggregation methods - sum, min, max, mean, median, etc...
Can Implement non-aggregation methods such as isna, unique, rename, drop
Can Group by one or two columns
Have methods specific to string columns
Can Read in data from a comma-separated value file

Functionalities of Sharks

1. DataFrame constructor input types

Our DataFrame class is constructed with a single parameter.

Specifically, input types must qualify the following:

Sharks will raise a TypeError if data is not a dictionary
Sharks will raise a TypeError if the keys of data are not strings
Sharks will raise a TypeError if the values of data are not numpy arrays
Sharks will raise a ValueError if the values of data are not 1-dimensional

2. Array lengths

We are now guaranteed that data is a dictionary of strings mapped to one-dimensional arrays. Each column of data in our DataFrame must have the same number of elements.

3. Convert unicode arrays to object

Whenever you create a numpy array of Python strings, it will default the data type of that array to unicode. Take a look at the following simple numpy array created from strings. Its data type, found in the dtype attribute is shown to be 'U' plus the length of the longest string.

>>> a = np.array(['cat', 'dog', 'snake'])
>>> a.dtype
dtype('<U5')

Unicode arrays are more difficult to manipulate and don't have the flexibility that we desire. So, if our user passes us a Unicode array, we will convert it to a data type called 'object'. This is a flexible type and will help us later when creating methods just for string columns. Technically, this data type allows any Python objects within the array.

4. Find the number of rows in the DataFrame with the `len` function

The number of rows are returned when passing a pandas DataFrame to the builtin len function.

5. Return columns as a list

df.columns will return a list of the column names.

6. Set new column names

we can assign all new columns to our DataFrame by setting the columns property equal to a list. A example below shows how you would set new columns for a 3-column DataFrame.

df.columns = ['state', 'age', 'fruit']

Also Sharks will raise errors if the data that is inserted is invalid.

Sharks will Raise a TypeError if the object used to set new columns is not a list
Sharks will Raise a ValueError if the number of column names in the list does not match the current DataFrame
Sharks will Raise a TypeError if any of the columns are not strings
Sharks will Raise a ValueError if any of the column names are duplicated in the list

7. The `shape` property

The shape property will return a two-item tuple of the number of rows and columns.

8. Visual HTML representation in the notebook with the `_repr_html_` method

The _repr_html_ method is made available to developers by iPython so that your objects can have nicely formatted HTML displays within Jupyter Notebooks. Read more on this method [here in the iPython documentation][12] along with other similar methods for different representations.

9. The `values` property

values is a property that returns a single array of all the columns of data.

10. The `dtypes` property

The dtypes property will return a two-column DataFrame with the column names in the first column and their data type as a string in the other. Use 'Column Name' and 'Data Type' as column names.

11. Select a single column with the brackets

In Sharks, you can select a single column with df['colname'].

12. Select multiple columns with a list

Sharks will also be able to select multiple columns if given a list within the brackets. For example, df[['colname1', 'colname2']] will return a two column DataFrame.

13. Boolean Selection with a DataFrame

In Sharks, you can filter for specific rows of a DataFrame by passing in a boolean Series/array to the brackets. For instance, the following will select only the rows such that a is greater than 10.

>>> s = df['a'] > 10
>>> df[s]

14. Check for simultaneous selection of rows and columns

15. Select a single cell of data

Sharks can select a single cell of data with df[rs, cs]. We will assume rs is an integer and cs is either an integer or a string.

16. Simultaneously select rows as booleans, lists, or slices

Sharks can select rows and columns simultaneously with df[rs, cs]. We will allow rs to be either a single-column boolean DataFrame, a list of integers, or a slice.

17. Simultaneous selection with multiple columns as a list

18. Simultaneous selection with column slices

Sharks will allow columns to be sliced with either strings or integers. The following selections will be acceptable.

19. Tab Completion for column names

20. Create a new column or overwrite an old column

21. `head` and `tail` methods

The head and tail methods each accept a single integer parameter n which is defaulted to 5.

22. Generic aggregation methods

Sharks can implement several methods that perform an aggregation. These methods all return a single value for each column. The following aggregation methods are defined.

min
max
mean
median
sum
var
std
all
any
argmax - index of the maximum
argmin - index of the minimum

23. `isna` method

The isna method will return a DataFrame the same shape as the original but with boolean values for every single value. Each value will be tested whether it is missing or not. Use np.isnan except in the case for strings which you can use a vectorized equality expression to None.

24. `count` method

The count method returns a single-row DataFrame with the number of non-missing values for each column. You will want to use the result of isna.

25. `unique` method

This method will return the unique values for each column in the DataFrame. Specifically, it will return a list of one-column DataFrames of unique values in each column. If there is a single column, just return the DataFrame.

26. `nunique` method

Return a single-row DataFrame with the number of unique values for each column.

27. `value_counts` method

28. Normalize options for `value_counts`

29. `rename` method

The rename method renames one or more column names. Accept a dictionary of old column names mapped to new column names. Return a DataFrame. Raise a TypeError if columns is not a dictionary.

30. `drop` method

Accept a single string or a list of column names as strings. Return a DataFrame without those columns. Raise a TypeError if a string or list is not provided.

31. Non-aggregation methods

There are several non-aggregation methods that function similarly. All of the following non-aggregation methods return a DataFrame that is the same shape as the origin.

abs
cummin
cummax
cumsum
clip
round
copy

32. Arithmetic and Comparison Operators

All the common arithmetic and comparison operators will be made available to our DataFrame. For example, df + 5 uses the plus operator to add 5 to each element of the DataFrame. Take a look at some of the following examples:

df + 5
df - 5
df > 5
df != 5
5 + df
5 < df

33. `sort_values` method

This method will sort the rows of the DataFrame by one or more columns. Allow the parameter by to be either a single column name as a string or a list of column names as strings. The DataFrame will be sorted by this column or columns.

34. `pivot_table` method

35. Reading simple CSVs

The read_csv function, will read in simple comma-separated value files (CSVs) and return a DataFrame.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

1.0.2

Sep 15, 2019

1.0.1

Sep 15, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sharks-1.0.2.tar.gz (13.8 kB view hashes)

Uploaded Sep 15, 2019 Source

Built Distribution

sharks-1.0.2-py3-none-any.whl (11.6 kB view hashes)

Uploaded Sep 15, 2019 Python 3

Hashes for sharks-1.0.2.tar.gz

Hashes for sharks-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`073e5b0ddcca400b193f97b77bd5e737aedaa37cf60481f8f5deab73db80182b`
MD5	`53eac988def538c468d5d98f3e009d7d`
BLAKE2b-256	`c21e47803e24de2b9f5ce3f22c23be6c70439765cb7fcf406ea668b0e372ce90`

Hashes for sharks-1.0.2-py3-none-any.whl

Hashes for sharks-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`730b2d818d10a436d65ca0fc0552026277b438aed3722ed325f2e7b72cce1e41`
MD5	`30001b8964afaaac5ed6eeb95cc77dc6`
BLAKE2b-256	`97c16840e8b7eac7b11b40b2fe568444df4cfbe31fe497135de3b72a77c1e915`

sharks 1.0.2

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Project description

Data Analysis Library Sharks

Objectives

Functionalities of Sharks

1. DataFrame constructor input types

2. Array lengths

3. Convert unicode arrays to object

4. Find the number of rows in the DataFrame with the len function

5. Return columns as a list

6. Set new column names

7. The shape property

8. Visual HTML representation in the notebook with the _repr_html_ method

9. The values property

10. The dtypes property

11. Select a single column with the brackets

12. Select multiple columns with a list

13. Boolean Selection with a DataFrame

14. Check for simultaneous selection of rows and columns

15. Select a single cell of data

16. Simultaneously select rows as booleans, lists, or slices

17. Simultaneous selection with multiple columns as a list

18. Simultaneous selection with column slices

19. Tab Completion for column names

20. Create a new column or overwrite an old column

21. head and tail methods

22. Generic aggregation methods

23. isna method

24. count method

25. unique method

26. nunique method

27. value_counts method

28. Normalize options for value_counts

29. rename method

30. drop method

31. Non-aggregation methods

32. Arithmetic and Comparison Operators

33. sort_values method

34. pivot_table method

35. Reading simple CSVs

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

4. Find the number of rows in the DataFrame with the `len` function

7. The `shape` property

8. Visual HTML representation in the notebook with the `_repr_html_` method

9. The `values` property

10. The `dtypes` property

21. `head` and `tail` methods

23. `isna` method

24. `count` method

25. `unique` method

26. `nunique` method

27. `value_counts` method

28. Normalize options for `value_counts`

29. `rename` method

30. `drop` method

33. `sort_values` method

34. `pivot_table` method