Convert a Pandas DataFrame/Series with dtype str/string/object to the best available dtypes
What is it used for?
It converts a pandas DataFrame or Series whose columns have dtype str, string, or object to the best available dtypes, such as small unsigned integers, floats, or category.
Installation
pip install a-pandas-ex-string-to-dtypes
Usage
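from a_pandas_ex_string_to_dtypes import pd_add_string_to_dtypes
import pandas as pd

# Call once so that ds_string_to_best_dtype becomes available on pandas objects
pd_add_string_to_dtypes()

# Load the Titanic sample data; pandas infers the dtypes shown below
df = pd.read_csv("https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv")
print(df)
print(df.dtypes)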
     PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0              1         0       3  ...   7.2500    NaN         S
1              2         1       1  ...  71.2833    C85         C
2              3         1       3  ...   7.9250    NaN         S
3              4         1       1  ...  53.1000   C123         S
4              5         0       3  ...   8.0500    NaN         S
..           ...       ...     ...  ...      ...    ...       ...
886          887         0       2  ...  13.0000    NaN         S
887          888         1       1  ...  30.0000    B42         S
888          889         0       3  ...  23.4500    NaN         S
889          890         1       1  ...  30.0000   C148         C
890          891         0       3  ...   7.7500    NaN         Q

[891 rows x 12 columns]
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
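# Cast every column to the "string" dtype to get an all-text DataFrame to convert
dfstring = pd.concat(
    [df[x].astype("string") for x in df.columns], axis=1, ignore_index=True
)
# ignore_index=True replaced the column labels with 0..11, so restore them
dfstring.columns = df.columns
print(dfstring)
print(dfstring.dtypes)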
     PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0              1         0       3  ...     7.25   <NA>         S
1              2         1       1  ...  71.2833    C85         C
2              3         1       3  ...    7.925   <NA>         S
3              4         1       1  ...     53.1   C123         S
4              5         0       3  ...     8.05   <NA>         S
..           ...       ...     ...  ...      ...    ...       ...
886          887         0       2  ...     13.0   <NA>         S
887          888         1       1  ...     30.0    B42         S
888          889         0       3  ...    23.45   <NA>         S
889          890         1       1  ...     30.0   C148         C
890          891         0       3  ...     7.75   <NA>         Q

[891 rows x 12 columns]
PassengerId    string
Survived       string
Pclass         string
Name           string
Sex            string
Age            string
SibSp          string
Parch          string
Ticket         string
Fare           string
Cabin          string
Embarked       string
dtype: object
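The all-string DataFrame above is built column by column. As a side note, plain pandas can usually do the same cast in a single call. The sketch below uses a small hypothetical frame (three made-up rows standing in for the Titanic data) so that it runs without a download; it is an illustration of the cast only and does not use this package.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the Titanic data.
df = pd.DataFrame(
    {
        "Pclass": [3, 1, 3],
        "Fare": [7.25, 71.2833, 7.925],
        "Cabin": [None, "C85", None],
    }
)

# Cast every column to the nullable "string" dtype in one call;
# column labels are preserved, so no renaming step is needed.
dfstring = df.astype("string")

print(dfstring)
print(dfstring.dtypes)
```

Missing values are shown as <NA> after the cast, as in the output above.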
converted = dfstring.ds_string_to_best_dtype()
print(converted)
print(converted.dtypes)
     PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0              1         0       3  ...   7.2500   <NA>         S
1              2         1       1  ...  71.2833    C85         C
2              3         1       3  ...   7.9250   <NA>         S
3              4         1       1  ...  53.1000   C123         S
4              5         0       3  ...   8.0500   <NA>         S
..           ...       ...     ...  ...      ...    ...       ...
886          887         0       2  ...  13.0000   <NA>         S
887          888         1       1  ...  30.0000    B42         S
888          889         0       3  ...  23.4500   <NA>         S
889          890         1       1  ...  30.0000   C148         C
890          891         0       3  ...   7.7500   <NA>         Q

[891 rows x 12 columns]
PassengerId      uint16
Survived          uint8
Pclass            uint8
Name             string
Sex            category
Age              object
SibSp             uint8
Parch             uint8
Ticket           object
Fare            float64
Cabin          category
Embarked       category
dtype: object
Parameters:
    df: Union[pd.DataFrame, pd.Series]
Returns:
    Union[pd.DataFrame, pd.Series]
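To give a feel for what a string-to-best-dtype conversion involves, here is a minimal sketch built only from standard pandas calls (pd.to_numeric with downcast, and astype). It is not this package's implementation, and its results can differ from ds_string_to_best_dtype. The helper name shrink_column, the max_unique_ratio threshold, and the four sample rows are all hypothetical.

```python
import pandas as pd


def shrink_column(col: pd.Series, max_unique_ratio: float = 0.5) -> pd.Series:
    """Hypothetical helper: choose a smaller dtype for a column of strings."""
    try:
        # Numeric-looking text: parse it, then downcast non-negative
        # integers to the smallest unsigned type that can hold them.
        return pd.to_numeric(col, downcast="unsigned")
    except (ValueError, TypeError):
        pass
    # Few distinct values relative to the column length: use category.
    if col.nunique() <= max_unique_ratio * len(col):
        return col.astype("category")
    # Otherwise keep the text, stored with the nullable "string" dtype.
    return col.astype("string")


# Made-up sample rows, all stored as Python strings (object dtype).
raw = pd.DataFrame(
    {
        "Pclass": ["3", "1", "3", "1"],
        "Fare": ["7.25", "71.2833", "7.925", "53.1"],
        "Embarked": ["S", "C", "S", "S"],
        "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    },
    dtype=object,
)

# Convert column by column so each result keeps its own dtype.
converted = pd.DataFrame({name: shrink_column(raw[name]) for name in raw.columns})
print(converted.dtypes)
```

The sketch does not handle missing values in numeric columns or any other edge cases; the threshold of 0.5 is an arbitrary choice for the illustration.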