A package designed to syntactically mimic the tidyr R package

`tidypython` - A simple python package designed to syntactically mimic the tidyr package in R.

Install the package with pip:

pip install tidypython

Load the functions into your script with:

from tidypython import *

You will then have access to the functions:

gather(*args, **kwargs)
spread(*args, **kwargs)
separate(*args, **kwargs)

The syntax is designed to resemble that of the R package tidyr as closely as possible. Not all of the functionality is implemented yet, but the basics are there.

All of these functions are designed to work with the dplython package's pipe operator, >>.

This package is currently in development. Please visit the GitHub page if you'd like to contribute: https://github.com/durrantmm/tidypython

Examples

First, import the dplython package:

from dplython import *

Make sure that your dataframe is a DplyFrame object. You can convert an existing dataframe by calling:

df = DplyFrame(df)

Or you can read in your file directly using the readpy package:

from readpy import *
df = read_tsv("myfile.tsv")

`gather()`

The gather() command implements the pandas melt() function using the tidyr syntax.

You can use the gather command as:

df >> gather(X.key, X.value, X.column1, X.column2...)

X.key and X.value are used to determine the new names of the key and value columns that will be created.

By default, column1, column2, and any further columns you list are gathered into the key and value columns. All unspecified columns are used simply as an index. Alternatively, you can use the syntax:

df >> gather(X.key, X.value, X.column1, X.column2, exclude=True)

This makes column1 and column2 the index, and all other unspecified columns are used for the key and value columns.
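The inclusion and exclusion modes can be pictured with a small plain-Python sketch. It only illustrates the reshaping on a list of dicts and is not tidypython's implementation (which, as noted above, builds on the pandas melt() function). The helper name gather_rows and its parameters are hypothetical.

```python
def gather_rows(rows, key, value, columns, exclude=False):
    """Reshape wide rows (a list of dicts) into long key/value rows.

    Hypothetical helper for illustration only; it is not part of tidypython.
    """
    if not rows:
        return []
    all_columns = list(rows[0])
    if exclude:
        # The listed columns become the index; everything else is gathered.
        index = [c for c in all_columns if c in columns]
        gathered = [c for c in all_columns if c not in columns]
    else:
        # The listed columns are gathered; everything else is the index.
        gathered = [c for c in all_columns if c in columns]
        index = [c for c in all_columns if c not in columns]

    long_rows = []
    # Go column by column so the output is grouped by key.
    for col in gathered:
        for row in rows:
            new_row = {c: row[c] for c in index}
            new_row[key] = col
            new_row[value] = row[col]
            long_rows.append(new_row)
    return long_rows


cars = [
    {"name": "Mazda RX4", "mpg": 21.0, "cyl": 6},
    {"name": "Datsun 710", "mpg": 22.8, "cyl": 4},
]

by_inclusion = gather_rows(cars, "info", "val", ["mpg", "cyl"])
by_exclusion = gather_rows(cars, "info", "val", ["name"], exclude=True)

for row in by_inclusion:
    print(row)
print(by_inclusion == by_exclusion)  # both modes describe the same reshaping
```

Listing the gathered columns and listing the index columns with exclude=True are two ways of describing the same split, so which one to use is mostly a question of which list is shorter.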

Using the mtcars data as an example:

>>> mtcars = read_tsv('mtcars.tsv')
>>> print(mtcars >> head())
                name   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
0          Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
1      Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
2         Datsun 710  22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
3     Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
4  Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2

Let's first gather all of the columns of interest by inclusion:

>>> mtcars_gathered_inclusion = mtcars >> \
... gather(X.info, X.val, X.mpg, X.cyl, X.disp, X.hp, X.drat, X.wt, X.qsec, X.vs, X.am, X.gear, X.carb)
>>> print(mtcars_gathered_inclusion >> head())
                name info   val
0          Mazda RX4  mpg  21.0
1      Mazda RX4 Wag  mpg  21.0
2         Datsun 710  mpg  22.8
3     Hornet 4 Drive  mpg  21.4
4  Hornet Sportabout  mpg  18.7
>>> print(mtcars_gathered_inclusion >> tail())
               name  info  val
347    Lotus Europa  carb  2.0
348  Ford Pantera L  carb  4.0
349    Ferrari Dino  carb  6.0
350   Maserati Bora  carb  8.0
351      Volvo 142E  carb  2.0

Now we can do it by exclusion, which is much shorter in this case:

>>> mtcars_gathered_exclusion = mtcars >> \
... gather(X.info, X.val, X.name, exclude=True)
>>> print(mtcars_gathered_exclusion >> head())
                name info   val
0          Mazda RX4  mpg  21.0
1      Mazda RX4 Wag  mpg  21.0
2         Datsun 710  mpg  22.8
3     Hornet 4 Drive  mpg  21.4
4  Hornet Sportabout  mpg  18.7
>>> print(mtcars_gathered_exclusion >> tail())
               name  info  val
347    Lotus Europa  carb  2.0
348  Ford Pantera L  carb  4.0
349    Ferrari Dino  carb  6.0
350   Maserati Bora  carb  8.0
351      Volvo 142E  carb  2.0

You can see that it functions very much in the same manner as the tidyr::gather function.

`spread()`

The spread() command implements the pandas pivot() function using the tidyr syntax.

You can use the spread command as:

df >> spread(X.key, X.value)

X.key and X.value specify the existing columns to pivot: the values in the key column become the new column names, and the value column fills them. All other columns are assumed to be the index.
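As a rough picture of what this pivot does, here is a plain-Python sketch on a list of dicts. It is not how tidypython implements spread() (which, as noted above, builds on the pandas pivot() function), and the helper name spread_rows is hypothetical.

```python
def spread_rows(long_rows, key, value):
    """Pivot long key/value rows (a list of dicts) back into wide rows.

    Hypothetical helper for illustration only; it is not part of tidypython.
    """
    wide = {}  # maps each index tuple to the wide row being built
    for row in long_rows:
        # Every column other than key and value is treated as the index.
        index = tuple((c, v) for c, v in row.items() if c not in (key, value))
        wide_row = wide.setdefault(index, dict(index))
        # The key's content becomes a column name holding the value.
        wide_row[row[key]] = row[value]
    return list(wide.values())


long_rows = [
    {"name": "Mazda RX4", "info": "mpg", "val": 21.0},
    {"name": "Datsun 710", "info": "mpg", "val": 22.8},
    {"name": "Mazda RX4", "info": "cyl", "val": 6},
    {"name": "Datsun 710", "info": "cyl", "val": 4},
]

wide_rows = spread_rows(long_rows, "info", "val")
for row in wide_rows:
    print(row)
```

This sketch happens to keep rows in first-seen order; that is a property of the sketch only and says nothing about the row order spread() itself returns.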

Using the mtcars data as an example:

>>> mtcars = read_tsv('mtcars.tsv')
>>> print(mtcars >> head())
                name   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
0          Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
1      Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
2         Datsun 710  22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
3     Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
4  Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2

Let's first gather all of the columns of interest by exclusion:

>>> mtcars_gathered_exclusion = mtcars >> \
... gather(X.info, X.val, X.name, exclude=True)
>>> print(mtcars_gathered_exclusion >> head())
                name info   val
0          Mazda RX4  mpg  21.0
1      Mazda RX4 Wag  mpg  21.0
2         Datsun 710  mpg  22.8
3     Hornet 4 Drive  mpg  21.4
4  Hornet Sportabout  mpg  18.7

We can then pivot the info and val columns out by using the spread function:

>>> print(mtcars_gathered_exclusion >> spread(X.info, X.val) >> head())
                 name   mpg  cyl   disp     hp  drat     wt   qsec   vs   am  gear   carb
0         AMC Javelin  15.2  8.0  304.0  150.0  3.15  3.435  17.30  0.0  0.0   3.0    2.0
1  Cadillac Fleetwood  10.4  8.0  472.0  205.0  2.93  5.250  17.98  0.0  0.0   3.0    4.0
2          Camaro Z28  13.3  8.0  350.0  245.0  3.73  3.840  15.41  0.0  0.0   3.0    4.0
3   Chrysler Imperial  14.7  8.0  440.0  230.0  3.23  5.345  17.42  0.0  0.0   3.0    4.0
4          Datsun 710  22.8  4.0  108.0   93.0  3.85  2.320  18.61  1.0  1.0   4.0    1.0

You can see that it functions very much in the same manner as the tidyr::spread function. As currently implemented, the order of the columns will be preserved in Python >= 3.6, but the order of the index will not.
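If the original row order matters, one option is to record it before reshaping and re-sort afterwards. The sketch below shows the idea on plain Python dicts rather than on a DplyFrame; the variable names and the three sample rows are illustrative only.

```python
# Order in which the names appeared before reshaping.
original_names = ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710"]

# Wide rows as they might come back, here sorted alphabetically by name.
wide_rows = [
    {"name": "Datsun 710", "mpg": 22.8, "cyl": 4.0},
    {"name": "Mazda RX4", "mpg": 21.0, "cyl": 6.0},
    {"name": "Mazda RX4 Wag", "mpg": 21.0, "cyl": 6.0},
]

# Map each name to its original position, then sort by that position.
position = {name: i for i, name in enumerate(original_names)}
restored = sorted(wide_rows, key=lambda row: position[row["name"]])

print([row["name"] for row in restored])
```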

`separate()`

The separate() command doesn't have a direct parallel in other python packages that I am aware of.

You can use the separate command as:

df >> separate(X.column, into, sep=myseparator)

X.column is the column that you want to split, into is a list of the new column names for the split columns, and sep is a regular expression used to split X.column. By default, this will split on [^\w]+, that is, on any run of non-word characters.
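To see what the default pattern does to a typical value, the sketch below applies it with Python's re.split. Whether separate() calls re.split internally is an assumption; the point is only to show how [^\w]+ behaves compared with an explicit separator.

```python
import re

record = "Mazda RX4|21.0|6"

# Default-style pattern: split on every run of non-word characters,
# which includes spaces, pipes, and the decimal point.
default_pieces = re.split(r"[^\w]+", record)
print(default_pieces)  # ['Mazda', 'RX4', '21', '0', '6']

# Explicit pattern: split on the literal pipe only.
pipe_pieces = re.split(r"\|", record)
print(pipe_pieces)  # ['Mazda RX4', '21.0', '6']
```

Because the default also splits on spaces and decimal points, values like these generally need an explicit sep.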

Let's say that our mtcars dataframe only included a name, mpg, and cyl joined in a single column by the separator '|':

>>> print(mtcars_messy >> head())
                       name  
0          Mazda RX4|21.0|6
1      Mazda RX4 Wag|21.0|6
2         Datsun 710|22.8|4
3     Hornet 4 Drive|21.4|6
4  Hornet Sportabout|18.7|8

You could separate this column into three columns using the command:

>>> mtcars_clean = mtcars_messy >> separate(X.name, ['name', 'mpg', 'cyl'], sep=r'\|')
>>> print(mtcars_clean >> head())
                name   mpg cyl
0          Mazda RX4  21.0   6
1      Mazda RX4 Wag  21.0   6
2         Datsun 710  22.8   4
3     Hornet 4 Drive  21.4   6
4  Hornet Sportabout  18.7   8

Note that all of the columns remain strings after separating. It will also not give any warnings if the number of columns specified does not match the number of strings after splitting.
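Since neither type conversion nor a length check happens automatically, you may want to do both yourself. The following is a minimal plain-Python sketch of the idea on a single value; split_checked is a hypothetical helper, not a tidypython function.

```python
import re


def split_checked(text, into, sep=r"[^\w]+"):
    """Split text with a regex and fail loudly if the piece count is off.

    Hypothetical helper for illustration only; it is not part of tidypython.
    """
    pieces = re.split(sep, text)
    if len(pieces) != len(into):
        raise ValueError(
            "expected %d pieces for %r, got %d" % (len(into), text, len(pieces))
        )
    return dict(zip(into, pieces))


row = split_checked("Mazda RX4|21.0|6", ["name", "mpg", "cyl"], sep=r"\|")

# The pieces are still strings, so convert the numeric ones explicitly.
row["mpg"] = float(row["mpg"])
row["cyl"] = int(row["cyl"])
print(row)
```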

Release history

0.0.1.dev3 (pre-release, this version): Mar 6, 2019
0.0.1.dev2 (pre-release): Mar 5, 2019
0.0.1.dev1 (pre-release): Mar 5, 2019
