Elwood
An open source Python library for dataset transformation, standardization, and normalization.
Usage
To start using Elwood, run:
pip install elwood
You can now use any of the dataset transformation, standardization, or normalization functions exposed through this library. To start, include from elwood import elwood in your Python file.
Standardization
elwood.process(args)
#TODO STUB
Transformation
The transformation functions include geographical extent clipping (latitude/longitude), geographical regridding (gridded data such as NetCDF or GeoTIFF), temporal clipping, and temporal scaling.
Geospatial Clipping
elwood.clip_geo(dataframe, geo_columns, polygons_list)
This function takes a pandas dataframe, a geo_columns list of the column names for latitude and longitude (ex: ["lat", "lng"]), and a polygons_list: a list containing lists of objects representing the polygons to clip the data to. ex:
[
    [
        {
            "lat": 11.0,
            "lng": 42.0
        },
        {
            "lat": 11.0,
            "lng": 43.0
        },
        {
            "lat": 12.0,
            "lng": 43.0
        },
        {
            "lat": 12.0,
            "lng": 42.0
        }
    ],
    ...
]
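To illustrate what clipping to a polygon involves, here is a minimal sketch using a ray-casting point-in-polygon test on the polygon format shown above. It is not Elwood's implementation: clip_geo_sketch and point_in_polygon are hypothetical names, and the treatment of points that fall exactly on a polygon edge is an assumption of this sketch.

```python
import pandas as pd


def point_in_polygon(lat, lng, polygon):
    """Ray-casting test: True if (lat, lng) lies inside the polygon.

    `polygon` is a list of {"lat": ..., "lng": ...} vertices.
    Points exactly on an edge may land on either side (assumption).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line at `lat` can be crossed.
        if (a["lat"] > lat) != (b["lat"] > lat):
            # Longitude at which the edge a-b crosses that line.
            cross_lng = a["lng"] + (lat - a["lat"]) * (b["lng"] - a["lng"]) / (b["lat"] - a["lat"])
            if lng < cross_lng:
                inside = not inside
    return inside


def clip_geo_sketch(dataframe, geo_columns, polygons_list):
    """Hypothetical stand-in: keep rows that fall inside any polygon."""
    lat_col, lng_col = geo_columns
    mask = dataframe.apply(
        lambda row: any(point_in_polygon(row[lat_col], row[lng_col], p) for p in polygons_list),
        axis=1,
    )
    return dataframe[mask]


square = [
    {"lat": 11.0, "lng": 42.0},
    {"lat": 11.0, "lng": 43.0},
    {"lat": 12.0, "lng": 43.0},
    {"lat": 12.0, "lng": 42.0},
]
points = pd.DataFrame(
    {"lat": [11.5, 13.0, 11.5], "lng": [42.5, 42.5, 44.0], "value": [1, 2, 3]}
)
clipped = clip_geo_sketch(points, ["lat", "lng"], [square])
print(clipped)
```

Only the first point lies inside the square, so it is the only row that survives the clip.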
Geospatial Regridding
elwood.regrid_dataframe_geo(dataframe, geo_columns, scale_multi)
This function takes a dataframe and regrids its geography by the scale multiplier provided. This multiplier is used to divide the current geographical scale in order to produce a coarser-resolution dataset. The dataframe must have a detectable geographical scale, meaning each lat/lon represents a point in the middle of a gridded cell for the data provided. Lat and lon are determined by the geo_columns passed in: a list of the column names, ex: ["lat", "lng"]
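The idea of coarsening a grid by a multiplier can be sketched as follows. This is an illustration only, not Elwood's implementation: regrid_sketch is a hypothetical name, the cell size is detected as the smallest gap between distinct coordinate values, and averaging the values that fall into each coarse cell is an assumption (the library may aggregate differently).

```python
import numpy as np
import pandas as pd


def regrid_sketch(dataframe, geo_columns, scale_multi):
    """Hypothetical stand-in: merge grid cells into cells `scale_multi` times larger."""
    lat_col, lng_col = geo_columns
    out = dataframe.copy()
    for col in (lat_col, lng_col):
        values = np.sort(out[col].unique())
        cell = np.diff(values).min()   # detected grid spacing (needs >= 2 distinct values)
        coarse = cell * scale_multi    # spacing of the coarser grid
        origin = values[0] - cell / 2  # outer edge of the first fine cell
        index = np.floor((out[col] - origin) / coarse)
        out[col] = origin + (index + 0.5) * coarse  # centre of the coarse cell
    # Assumption: values sharing a coarse cell are averaged.
    return out.groupby([lat_col, lng_col], as_index=False).mean()


centres = [0.5, 1.5, 2.5, 3.5]
grid = pd.DataFrame(
    [
        {"lat": lat, "lng": lng, "value": float(i * 4 + j)}
        for i, lat in enumerate(centres)
        for j, lng in enumerate(centres)
    ]
)
coarse_grid = regrid_sketch(grid, ["lat", "lng"], 2)
print(coarse_grid)
```

Here a 4 x 4 grid with unit spacing becomes a 2 x 2 grid with cell centres at 1.0 and 3.0, and each coarse value is the mean of the four fine cells it covers.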
Temporal Clipping
elwood.clip_dataframe_time(dataframe, time_column, time_ranges)
This function will produce a dataframe that only includes rows with time_column values contained within time_ranges. The time_ranges argument is a list of objects containing a start and end time. ex: [{"start": datetime, "end": datetime}, ...]
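The row filter described above can be sketched with a boolean mask in pandas. This is not Elwood's implementation: clip_time_sketch is a hypothetical name, and treating both the start and end of each range as inclusive is an assumption.

```python
from datetime import datetime

import pandas as pd


def clip_time_sketch(dataframe, time_column, time_ranges):
    """Hypothetical stand-in: keep rows whose time falls in any range."""
    times = pd.to_datetime(dataframe[time_column])
    mask = pd.Series(False, index=dataframe.index)
    for time_range in time_ranges:
        # Assumption: both endpoints of the range are included.
        mask |= times.between(time_range["start"], time_range["end"])
    return dataframe[mask]


readings = pd.DataFrame(
    {
        "timestamp": ["2020-01-01", "2020-06-01", "2021-01-01"],
        "value": [10, 20, 30],
    }
)
ranges = [{"start": datetime(2020, 3, 1), "end": datetime(2020, 12, 31)}]
clipped_readings = clip_time_sketch(readings, "timestamp", ranges)
print(clipped_readings)
```

Only the June 2020 row falls inside the single range, so it is the only row returned.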
Temporal Scaling
elwood.rescale_dataframe_time(dataframe, time_column, time_bucket, aggregation_function_list)
This function will produce a dataframe whose rows are the aggregated data based on the time bucket and aggregation function list provided. The time_column is the name of the column containing the targeted time values for rescaling. The time_bucket is a DateOffset, Timedelta, or str representing the desired time granularity, ex. 'M', 'A', '2H'. The aggregation_function_list is a list of aggregation functions to apply to the data, ex. ['sum'] or ['sum', 'min', 'max']
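Bucketing by time and applying a list of aggregations corresponds closely to pandas resampling, so the behaviour can be sketched as below. This is an assumption about the general technique, not Elwood's implementation; rescale_time_sketch is a hypothetical name, and the shape of the returned columns may differ from what the library produces.

```python
import pandas as pd


def rescale_time_sketch(dataframe, time_column, time_bucket, aggregation_function_list):
    """Hypothetical stand-in: aggregate rows into buckets of `time_bucket`."""
    frame = dataframe.copy()
    frame[time_column] = pd.to_datetime(frame[time_column])
    # resample() groups rows by time bucket; agg() applies every function
    # in the list to each remaining column.
    return frame.set_index(time_column).resample(time_bucket).agg(aggregation_function_list)


samples = pd.DataFrame(
    {
        "timestamp": [
            "2021-01-01 00:00",
            "2021-01-01 12:00",
            "2021-01-02 06:00",
            "2021-01-02 18:00",
        ],
        "value": [1, 2, 3, 4],
    }
)
daily = rescale_time_sketch(samples, "timestamp", "D", ["sum", "max"])
print(daily)
```

With a daily bucket ("D"), the four samples collapse into two rows, one per day, with a sum and a max column for value.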
0 to 1 Normalization
elwood.normalize_features(dataframe, output_file)
This function expects a dataframe in long format, with a "feature" column and a "value" column: each entry for a feature has its own feature/value row.
This function returns a dataframe in which all numerical values under the "value" column for each "feature" have been scaled to the range 0 to 1.
Optionally, you may specify an output_file name to generate a parquet file of the dataframe.
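Per-feature 0 to 1 scaling on long data is commonly done with min-max scaling, which the sketch below illustrates. It is not Elwood's implementation: normalize_features_sketch is a hypothetical name, the parquet output is left out, and mapping a feature whose values are all identical to 0.0 is an assumption.

```python
import pandas as pd


def normalize_features_sketch(dataframe):
    """Hypothetical stand-in: min-max scale "value" within each "feature"."""
    out = dataframe.copy()
    grouped = out.groupby("feature")["value"]
    low = grouped.transform("min")
    span = grouped.transform("max") - low
    # Assumption: a constant feature (span 0) is mapped to 0.0 instead of
    # dividing by zero.
    out["value"] = (out["value"] - low) / span.where(span != 0, 1)
    return out


long_data = pd.DataFrame(
    {
        "feature": ["rain", "rain", "rain", "temp", "temp"],
        "value": [0.0, 5.0, 10.0, 20.0, 30.0],
    }
)
normalized = normalize_features_sketch(long_data)
print(normalized)
```

Each feature is scaled independently, so the smallest "rain" value and the smallest "temp" value both become 0 and the largest of each become 1.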