
# pdprpr

Pdprpr preprocesses pandas objects (DataFrame, Series) for machine learning input.

## Usage

Suppose you have the following DataFrame to preprocess:

```python
from pandas import DataFrame

df = DataFrame({
    'num': [1, 3, float('nan')],  # numerical feature to be scaled into [0, 1]
    'cat': ['P', 'Q', 'R'],       # categorical feature to be transformed into dummy variables
    'bin': [0, 0, 1],             # binary (true/false) feature
}, columns=['num', 'cat', 'bin'])
#    num cat  bin
# 0  1.0   P    0
# 1  3.0   Q    0
# 2  NaN   R    1
```
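The comment on `num` says it is scaled into [0, 1]. Assuming this is ordinary min-max scaling over the non-missing values (an inference from the example output of `processor.process(df)` further down, where 1 becomes 0.0 and 3 becomes 1.0, not something this README documents), the mapping and the arithmetic for this column would be:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad
x_{\min} = 1,\; x_{\max} = 3:\quad
1 \mapsto \frac{1 - 1}{3 - 1} = 0,\quad
3 \mapsto \frac{3 - 1}{3 - 1} = 1
```

Under this reading, the missing value is excluded when computing the minimum and maximum and stays NaN in the result.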

Preprocessing settings are defined per column in a JSON-like structure, for example as a YAML file:

```yaml
# preprocessing.yml
- name: num
  kind: numerical

- name: cat
  kind: categorical

- name: bin
  kind: binary
```
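Since the settings are just a list of mappings with `name` and `kind` keys, the same configuration could also be written as JSON and loaded with the standard library. This is a sketch assuming `DataFramePreprocessor` only cares about the resulting list of dicts, which is what a YAML loader returns for the file above; `SETTINGS_JSON` is a hypothetical name used for illustration:

```python
import json

# The YAML settings above, expressed as JSON: one mapping per column.
SETTINGS_JSON = """
[
  {"name": "num", "kind": "numerical"},
  {"name": "cat", "kind": "categorical"},
  {"name": "bin", "kind": "binary"}
]
"""

settings = json.loads(SETTINGS_JSON)
print(settings)
# [{'name': 'num', 'kind': 'numerical'}, {'name': 'cat', 'kind': 'categorical'}, {'name': 'bin', 'kind': 'binary'}]
```

Either way, `settings` ends up as a plain Python list that can be passed to `DataFramePreprocessor`.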

Then create a `DataFramePreprocessor` instance from these settings:

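```python
import yaml

from pdprpr import DataFramePreprocessor

# Load the list of column settings; safe_load avoids constructing arbitrary objects
# (plain yaml.load(f) without a Loader argument fails on PyYAML 6 and later).
with open('preprocessing.yml') as f:
    settings = yaml.safe_load(f)

processor = DataFramePreprocessor(settings)
```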

Finally, use it to preprocess your DataFrame:

```python
processor.process(df)
#    num/value  cat/P  cat/Q  cat/R  bin/False  bin/True
# 0        0.0      1      0      0          1         0
# 1        1.0      0      1      0          1         0
# 2        NaN      0      0      1          0         1
```
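To make the output columns easier to read, here is a rough pandas-only approximation of the same transformation. It is inferred from the example output above, not taken from pdprpr's source: `sketch_preprocess` is a hypothetical helper, numerical columns are assumed to be min-max scaled with NaN left in place, and categorical and binary columns are assumed to become one 0/1 column per value, named `<column>/<value>`.

```python
import pandas as pd

# Example input and settings from this README.
df = pd.DataFrame({
    'num': [1, 3, float('nan')],
    'cat': ['P', 'Q', 'R'],
    'bin': [0, 0, 1],
}, columns=['num', 'cat', 'bin'])

settings = [
    {'name': 'num', 'kind': 'numerical'},
    {'name': 'cat', 'kind': 'categorical'},
    {'name': 'bin', 'kind': 'binary'},
]


def sketch_preprocess(df: pd.DataFrame, settings: list) -> pd.DataFrame:
    """Hypothetical approximation of the README output; not pdprpr's implementation."""
    parts = []
    for setting in settings:
        name, kind = setting['name'], setting['kind']
        col = df[name]
        if kind == 'numerical':
            # Min-max scale into [0, 1]; min()/max() skip NaN, so NaN stays NaN.
            # (A constant column would give 0/0, i.e. NaN, in this sketch.)
            scaled = (col - col.min()) / (col.max() - col.min())
            parts.append(scaled.rename(f'{name}/value'))
        elif kind == 'categorical':
            # One 0/1 column per category, named '<name>/<category>'.
            parts.append(pd.get_dummies(col, prefix=name, prefix_sep='/', dtype=int))
        elif kind == 'binary':
            # Cast to bool so the columns are named '<name>/False' and '<name>/True'.
            parts.append(pd.get_dummies(col.astype(bool), prefix=name, prefix_sep='/', dtype=int))
        else:
            raise ValueError(f'kind not covered by this sketch: {kind!r}')
    return pd.concat(parts, axis=1)


print(sketch_preprocess(df, settings))
```

The real library may handle edge cases differently, for example a constant numerical column or a binary column that contains only one value (this sketch would emit a single dummy column for it); the tests describe its actual behavior.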

For more options, see the [tests](./tests/) until documentation is available.

## Download files

Source distribution: `pdprpr-0.4.0.tar.gz` (4.4 kB).
