Computes the intersection/symmetric difference of n DataFrames/Series
Installation
pip install a-pandas-ex-intersection-difference
Usage
from a_pandas_ex_intersection_difference import pd_add_set
pd_add_set()
import pandas as pd
The code above adds some methods to pandas. You can use pandas as you did before, but you will have a few more methods:
- pandas.DataFrame.ds_set_intersections / pandas.Series.ds_set_intersections
- pandas.DataFrame.ds_set_symmetric_difference / pandas.Series.ds_set_symmetric_difference
- pandas.DataFrame.ds_set_union / pandas.Series.ds_set_union
- pandas.DataFrame.ds_value_counts_to_column / pandas.Series.ds_value_counts_to_column
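For readers wondering how a function call can add methods to pandas, here is a minimal sketch of the general technique: assigning a function as an attribute of the pandas classes. It is not the package's implementation; the names count_rows, add_demo_method and demo_count_rows are made up for this illustration.

```python
import pandas as pd


def count_rows(self):
    """Hypothetical helper: number of rows in a DataFrame or Series."""
    return len(self)


def add_demo_method():
    """Attach the helper to both pandas classes (illustrative name only)."""
    pd.DataFrame.demo_count_rows = count_rows
    pd.Series.demo_count_rows = count_rows


# After this call, every DataFrame and Series has the extra method.
add_demo_method()

frame = pd.DataFrame({"a": [1, 2, 3]})
rows_in_frame = frame.demo_count_rows()
rows_in_series = frame["a"].demo_count_rows()
```

Existing pandas behaviour is untouched; only the new attribute is added.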
pandas.DataFrame.ds_set_intersections / pandas.Series.ds_set_intersections
#Computes the intersection of n DataFrames/Series
#Example
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
#Let's create some DataFrames with random data from df
df1 = df.sample(len(df) - len(df)//2).copy()
df2 = df.sample(len(df) - len(df)//2).copy()
df3 = df.sample(len(df) - len(df)//2).copy()
df4 = df.sample(len(df) - len(df)//2).copy()
df5 = df.sample(len(df) - len(df)//2).copy()
df1.ds_set_intersections(df2) #Comparing 2 DataFrames
Out[14]:
Parch PassengerId Fare Survived ... SibSp Embarked Sex Cabin
0 1 802 26.2500 1 ... 1 S female NaN
1 0 506 108.9000 0 ... 1 C male C65
2 0 386 73.5000 0 ... 0 S male NaN
3 0 621 14.4542 0 ... 1 C male NaN
4 1 273 19.5000 1 ... 0 S female NaN
.. ... ... ... ... ... ... ... ... ...
439 0 240 12.2750 0 ... 0 S male NaN
440 0 235 10.5000 0 ... 0 S male NaN
441 1 269 153.4625 1 ... 0 S female C125
442 0 394 113.2750 1 ... 1 C female D36
443 0 400 12.6500 1 ... 0 S female NaN
[444 rows x 12 columns]
df1.ds_set_intersections(df2,df3) #Comparing 3 DataFrames
Out[15]:
Parch PassengerId Fare Survived ... SibSp Embarked Sex Cabin
0 0 506 108.9000 0 ... 1 C male C65
1 1 480 12.2875 1 ... 0 S female NaN
2 1 581 30.0000 1 ... 1 S female NaN
3 1 447 19.5000 1 ... 0 S female NaN
4 0 16 16.0000 1 ... 0 S female NaN
.. ... ... ... ... ... ... ... ... ...
340 2 154 14.5000 0 ... 0 S male NaN
341 0 668 7.7750 0 ... 0 S male NaN
342 0 702 26.2875 1 ... 0 S male E24
343 0 610 153.4625 1 ... 0 S female C125
344 0 450 30.5000 1 ... 0 S male C104
[345 rows x 12 columns]
df1.ds_set_intersections(df2,df3, df4) #Comparing 4 DataFrames
Out[16]:
Parch PassengerId Fare Survived ... SibSp Embarked Sex Cabin
0 0 506 108.9000 0 ... 1 C male C65
1 1 581 30.0000 1 ... 1 S female NaN
2 0 283 9.5000 0 ... 0 S male NaN
3 0 488 29.7000 0 ... 0 C male B37
4 0 610 153.4625 1 ... 0 S female C125
.. ... ... ... ... ... ... ... ... ...
227 0 23 8.0292 1 ... 0 Q female NaN
228 1 619 39.0000 1 ... 2 S female F4
229 2 473 27.7500 1 ... 1 S female NaN
230 0 253 26.5500 0 ... 0 S male C87
231 0 618 16.1000 0 ... 1 S female NaN
[232 rows x 12 columns]
df1.ds_set_intersections(df2,df3, df4, df5) #Comparing 5 DataFrames
Out[17]:
Parch PassengerId Fare Survived ... SibSp Embarked Sex Cabin
0 0 506 108.9000 0 ... 1 C male C65
1 1 581 30.0000 1 ... 1 S female NaN
2 1 17 29.1250 0 ... 4 Q male NaN
3 2 59 27.7500 1 ... 1 S female NaN
4 0 463 38.5000 0 ... 0 S male E63
.. ... ... ... ... ... ... ... ... ...
140 2 166 20.5250 1 ... 0 S male NaN
141 0 705 7.8542 0 ... 1 S male NaN
142 1 51 39.6875 0 ... 4 S male NaN
143 0 833 7.2292 0 ... 0 C male NaN
144 2 154 14.5000 0 ... 0 S male NaN
[145 rows x 12 columns]
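To make the idea of a row-wise intersection of n DataFrames concrete, here is a small sketch in plain pandas. It assumes all frames have identical columns and is only an illustration of the concept, not how ds_set_intersections is implemented; the helper name intersect_rows and the toy data are made up.

```python
from functools import reduce

import pandas as pd


def intersect_rows(*frames):
    """Rows that occur in every frame (all frames share the same columns)."""
    cols = list(frames[0].columns)
    # Deduplicate so repeated rows do not multiply during the merges.
    unique = [f[cols].drop_duplicates() for f in frames]
    # An inner merge on all columns keeps only rows present on both sides.
    merged = reduce(lambda left, right: left.merge(right, on=cols, how="inner"), unique)
    return merged.reset_index(drop=True)


a = pd.DataFrame({"id": [1, 2, 3, 4], "sex": ["m", "f", "f", "m"]})
b = pd.DataFrame({"id": [2, 3, 4, 5], "sex": ["f", "f", "m", "m"]})
c = pd.DataFrame({"id": [3, 4, 6], "sex": ["f", "m", "f"]})

common_two = intersect_rows(a, b)       # rows found in both a and b
common_three = intersect_rows(a, b, c)  # rows found in a, b and c
```

As in the outputs above, adding more frames can only keep the result the same size or shrink it.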
pandas.DataFrame.ds_set_symmetric_difference / pandas.Series.ds_set_symmetric_difference
#Computes the symmetric difference of n DataFrames/Series
#Example
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
#Let's create some DataFrames with random data from df
df1 = df.sample(len(df) - len(df)//2).copy()
df2 = df.sample(len(df) - len(df)//2).copy()
df3 = df.sample(len(df) - len(df)//2).copy()
df4 = df.sample(len(df) - len(df)//2).copy()
df5 = df.sample(len(df) - len(df)//2).copy()
df1.ds_set_symmetric_difference(df2) #Comparing 2 DataFrames
Out[18]:
Parch PassengerId Fare ... Embarked Sex Cabin
0 0 567 7.8958 ... S male NaN
1 0 46 8.0500 ... S male NaN
2 2 342 263.0000 ... S female C23 C25 C27
3 0 845 8.6625 ... S male NaN
4 0 1 7.2500 ... S male NaN
.. ... ... ... ... ... ... ...
219 0 865 13.0000 ... S male NaN
220 5 639 39.6875 ... S female NaN
221 0 30 7.8958 ... S male NaN
222 0 332 28.5000 ... S male C124
223 0 884 10.5000 ... S male NaN
[448 rows x 12 columns]
df1.ds_set_symmetric_difference(df2,df3) #Comparing 3 DataFrames
Out[19]:
Parch PassengerId Fare Survived ... SibSp Embarked Sex Cabin
0 0 567 7.8958 0 ... 0 S male NaN
1 0 46 8.0500 0 ... 0 S male NaN
2 0 845 8.6625 0 ... 0 S male NaN
3 0 142 7.7500 1 ... 0 S female NaN
4 0 579 14.4583 0 ... 1 C female NaN
.. ... ... ... ... ... ... ... ... ...
106 0 430 8.0500 1 ... 0 S male E10
107 1 363 14.4542 0 ... 0 C female NaN
108 1 531 26.0000 1 ... 1 S female NaN
109 0 748 13.0000 1 ... 0 S female NaN
110 0 876 7.2250 1 ... 0 C female NaN
[339 rows x 12 columns]
df1.ds_set_symmetric_difference(df2,df3,df4) #Comparing 4 DataFrames
Out[20]:
Parch PassengerId Fare Survived ... SibSp Embarked Sex Cabin
0 0 567 7.8958 0 ... 0 S male NaN
1 0 46 8.0500 0 ... 0 S male NaN
2 0 142 7.7500 1 ... 0 S female NaN
3 0 579 14.4583 0 ... 1 C female NaN
4 0 365 15.5000 0 ... 1 Q male NaN
.. ... ... ... ... ... ... ... ... ...
39 2 551 110.8833 1 ... 0 C male C70
40 0 19 18.0000 0 ... 1 S female NaN
41 0 615 8.0500 0 ... 0 S male NaN
42 0 204 7.2250 0 ... 0 C male NaN
43 1 375 21.0750 0 ... 3 S female NaN
[204 rows x 12 columns]
df1.ds_set_symmetric_difference(df2,df3,df4,df5) #Comparing 5 DataFrames
Out[21]:
Parch PassengerId Fare Survived ... SibSp Embarked Sex Cabin
0 0 567 7.8958 0 ... 0 S male NaN
1 0 579 14.4583 0 ... 1 C female NaN
2 0 365 15.5000 0 ... 1 Q male NaN
3 0 644 56.4958 1 ... 0 S male NaN
4 0 708 26.2875 1 ... 0 S male E24
.. ... ... ... ... ... ... ... ... ...
25 0 343 13.0000 0 ... 0 S male NaN
26 0 656 73.5000 0 ... 2 S male NaN
27 0 407 7.7500 0 ... 0 S male NaN
28 0 301 7.7500 1 ... 0 Q female NaN
29 0 819 6.4500 0 ... 0 S male NaN
[125 rows x 12 columns]
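The source does not define what "symmetric difference" means for more than two frames. The sketch below assumes it means rows that occur in exactly one of the inputs, which for two frames is the usual symmetric difference. It uses plain pandas, assumes all frames have the same columns, and is not the package's implementation; rows_in_exactly_one is a made-up name.

```python
import pandas as pd


def rows_in_exactly_one(*frames):
    """Rows that occur in exactly one of the frames (assumed semantics)."""
    # Deduplicate each frame first, so a row repeated inside one frame
    # is not mistaken for a row shared between frames.
    stacked = pd.concat([f.drop_duplicates() for f in frames], ignore_index=True)
    # keep=False drops every row that occurs more than once in the stack.
    return stacked.drop_duplicates(keep=False).reset_index(drop=True)


a = pd.DataFrame({"id": [1, 2, 3]})
b = pd.DataFrame({"id": [2, 3, 4]})
c = pd.DataFrame({"id": [3, 4, 5]})

only_one_of_two = rows_in_exactly_one(a, b)
only_one_of_three = rows_in_exactly_one(a, b, c)
```

With the toy data, ids 2 and 3 are shared by a and b, so only 1 and 4 remain; with c added, 4 is shared too and 5 is new.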
Parameters
args: Union[pd.Series, pd.DataFrame]
Any number of DataFrames or Series
accept_df_with_different_columns: bool=True
Suppose one DataFrame has the columns [Parch, PassengerId, Fare, Survived, SibSp, Embarked, Sex, Cabin]
and you want to compare it to one with the columns [Flight, Fare, Survived, SibSp, Embarked, Sex, Cabin].
This only works if accept_df_with_different_columns=True.
In that case, only the columns that are present in all DataFrames are compared.
Returns
pd.DataFrame
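As a rough illustration of "only the columns that are in all dataframes", the following plain-pandas sketch picks the shared columns from the two column lists in the parameter description. The variable names are made up, and the package may select the columns differently.

```python
import pandas as pd

# Empty frames are enough here, only the column names matter.
left = pd.DataFrame(columns=["Parch", "PassengerId", "Fare", "Survived", "SibSp", "Embarked", "Sex", "Cabin"])
right = pd.DataFrame(columns=["Flight", "Fare", "Survived", "SibSp", "Embarked", "Sex", "Cabin"])

# Keep the columns of the first frame that also exist in the second one.
shared = [col for col in left.columns if col in right.columns]

# Restrict both frames to the shared columns before comparing them.
left_shared = left[shared]
right_shared = right[shared]
```

Parch, PassengerId and Flight each exist in only one frame, so they are left out of the comparison.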
pandas.DataFrame.ds_set_union / pandas.Series.ds_set_union
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
#Let's create some DataFrames with random data from df
df1 = df.sample(len(df) - len(df)//2).copy()
df2 = df.sample(len(df) - len(df)//2).copy()
df3 = df.sample(len(df) - len(df)//2).copy()
df4 = df.sample(len(df) - len(df)//2).copy()
df5 = df.sample(len(df) - len(df)//2).copy()
df1[['PassengerId','Survived','Name']].ds_set_union(df2[['Pclass','Cabin','Name']])
Out[17]:
Name
0 Carbines, Mr. William
1 Sundman, Mr. Johan Julian
2 Dimic, Mr. Jovan
3 Harder, Mr. George Achilles
4 Rice, Master. Eugene
.. ...
887 Carlsson, Mr. August Sigfrid
888 Hoyt, Mr. Frederick Maxfield
889 Somerton, Mr. Francis William
890 Francatelli, Miss. Laura Mabel
891 Thayer, Mrs. John Borland (Marian Longstreth M...
#This method is an alternative for cases where, for whatever reason, you don't want to use pd.concat().
#If you can use pd.concat(), prefer it over this method.
Parameters
args: Union[pd.Series, pd.DataFrame]
Any number of DataFrames or Series
accept_df_with_different_columns: bool=True
Suppose one DataFrame has the columns [Parch, PassengerId, Fare, Survived, SibSp, Embarked, Sex, Cabin]
and you want to combine it with one that has the columns [Flight, Fare, Survived, SibSp, Embarked, Sex, Cabin].
This only works if accept_df_with_different_columns=True.
In that case, only the columns that are present in all DataFrames are used.
Returns
pd.DataFrame
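Since the text recommends pd.concat where possible, here is a short sketch of a union over the shared columns using pd.concat directly. The source does not say whether ds_set_union removes duplicate rows, so both variants are shown; the toy data and variable names are made up.

```python
import pandas as pd

a = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1], "Name": ["Ann", "Bob"]})
b = pd.DataFrame({"Pclass": [3, 1], "Cabin": ["C1", None], "Name": ["Bob", "Cid"]})

# Only "Name" exists in both frames.
shared = [col for col in a.columns if col in b.columns]

# Variant 1: stack the rows of both frames, duplicates included.
stacked = pd.concat([a[shared], b[shared]], ignore_index=True)

# Variant 2: set-like union, each distinct row appears once.
deduplicated = stacked.drop_duplicates().reset_index(drop=True)
```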
pandas.DataFrame.ds_value_counts_to_column / pandas.Series.ds_value_counts_to_column
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
df2 #One of the random samples of df created in the examples above
PassengerId Survived Pclass ... Fare Cabin Embarked
504 505 1 1 ... 86.5000 B79 S
781 782 1 1 ... 57.0000 B20 S
855 856 1 3 ... 9.3500 NaN S
552 553 0 3 ... 7.8292 NaN Q
777 778 1 3 ... 12.4750 NaN S
.. ... ... ... ... ... ... ...
756 757 0 3 ... 7.7958 NaN S
224 225 1 1 ... 90.0000 C93 S
488 489 0 3 ... 8.0500 NaN S
309 310 1 1 ... 56.9292 E36 C
581 582 1 1 ... 110.8833 C68 C
[446 rows x 12 columns]
df2.Sex.ds_value_counts_to_column()
Out[22]:
0 152
1 152
2 152
3 294
4 152
...
441 294
442 294
443 294
444 152
445 152
Name: 0, Length: 446, dtype: int64
This method can also be useful when you are comparing DataFrames: it counts how often each value occurs in a Series
and returns a DataFrame that you can merge with your original DataFrame.
Parameters
df: pd.Series
Returns
pd.DataFrame
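The same kind of per-row count can be sketched in plain pandas with value_counts and map. This is only an illustration of the idea, not the package's code; the toy data and the column name Sex_count are made up.

```python
import pandas as pd

frame = pd.DataFrame({"Sex": ["male", "female", "male", "male"]})

# value_counts() gives one count per distinct value;
# map() looks up that count for every row.
counts = frame["Sex"].map(frame["Sex"].value_counts())

# Store the counts next to the original values.
frame["Sex_count"] = counts
```

Each row now carries the number of times its own value occurs in the column, similar to the 152/294 pattern in the output above.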