Alpha version of the Rasgo Python interface.
Project description
PyRasgo is a Python SDK for interacting with the Rasgo API. Rasgo accelerates feature engineering for Data Scientists.
Visit us at https://www.rasgoml.com/ to turn your data into Features in minutes!
Documentation is available at: https://docs.rasgoml.com/rasgo-docs/pyrasgo/
Package Dependencies
- idna>=3.3
- more-itertools
- pandas
- pyarrow>=5.0.0
- pydantic
- pyyaml
- requests
- snowflake-connector-python>=2.7.0
- tqdm
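To compare a local environment with the list above, a minimal standard-library sketch could look like the following. The requirement strings are copied from this page; the helper name `installed_versions` is hypothetical and not part of PyRasgo, and the sketch only reports versions rather than enforcing the minimums.

```python
from importlib import metadata

# Requirement strings copied from the "Package Dependencies" list above.
REQUIREMENTS = [
    "idna>=3.3",
    "more-itertools",
    "pandas",
    "pyarrow>=5.0.0",
    "pydantic",
    "pyyaml",
    "requests",
    "snowflake-connector-python>=2.7.0",
    "tqdm",
]


def installed_versions(requirements):
    """Map each package name to its stated minimum and its installed version."""
    report = {}
    for requirement in requirements:
        # "pyarrow>=5.0.0" splits into ("pyarrow", ">=", "5.0.0");
        # a bare name such as "pandas" leaves the minimum empty.
        name, _, minimum = requirement.partition(">=")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        report[name] = {"minimum": minimum or None, "installed": installed}
    return report


if __name__ == "__main__":
    for name, info in installed_versions(REQUIREMENTS).items():
        print(name, info)
```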
Release Notes
- v0.5.2 (Apr 5, 2022)
  - Adds param `table_name` for `rasgo.publish.dataset()` for specifying the table name you want to set for the dataset
- v0.5.1 (Apr 4, 2022)
  - Add deprecation warnings for pre-1.0 functions
- v0.5.0 (Mar 23, 2022)
  - Update Snowflake connection to use correct role
- v0.4.36 (Mar 22, 2022)
  - Make `ds.refresh_table()` always complete the refresh when the function finishes running
- v0.4.35 (Mar 16, 2022)
  - Adds a PyRasgo Primitive for an Accelerator
  - Adds the following methods for working with Accelerators in PyRasgo
    - `rasgo.get.accelerator()`
    - `rasgo.get.accelerators()`
    - `rasgo.create.accelerator()`
    - `rasgo.delete.accelerator()`
    - `rasgo.create.dataset_from_accelerator()`
    - `Accelerator.apply()`
- v0.4.34 (Mar 11, 2022)
  - Add `to_dbt()` function to `Dataset`s
    - Use this method to export a published Dataset as a DBT Model
  - Support tracking of dataset dependencies passed to transforms that accept lists of datasets (multi-join)
- v0.4.33 (Mar 11, 2022)
  - Handles long running `ds.refresh_table()` process
- v0.4.32 (Mar 10, 2022)
  - Fixed uses of apply transform
- v0.4.31 (Mar 04, 2022)
  - Raise an error if you supply an arg to transform which doesn't exist
  - Fix dependency management for transform arguments of type `table_list`
- v0.4.30 (Mar 01, 2022)
  - Cache and return dataset columns if not set when calling `ds.columns` and ds from API
  - New method `rasgo.update.column()` to set/update metadata about a ds column
- v0.4.29 (Mar 01, 2022)
  - Bugfixes
- v0.4.28 (Feb 28, 2022)
  - Fetches all datasets on a call to `rasgo.get.datasets()`
- v0.4.27 (Feb 25, 2022)
  - Adds ability to set `tags` when creating a transform in the function `rasgo.create.transform()`
- v0.4.26 (Feb 22, 2022)
  - Allow published datasets with tables as their output to be refreshed using `dataset.refresh_table()`
- v0.4.25 (Feb 22, 2022)
  - Creates more informative generated Data Warehouse table names. Table and view names made in PyRasgo now look like the following: `RASGO_SDK__OP<op_num>__<transform_name>_transform__<guid>`
  - Adds a proper error message, with steps to take to fix it, when publishing a DF with incompatible pandas date types
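The naming pattern from v0.4.25 can be illustrated with a small helper. This is only a sketch: `build_table_name` is a hypothetical function, not PyRasgo's internal implementation, and using a `uuid4` hex string for `<guid>` is an assumption.

```python
import re
import uuid


def build_table_name(op_num, transform_name, guid=None):
    """Fill in RASGO_SDK__OP<op_num>__<transform_name>_transform__<guid>."""
    # Assumption: any unique token works for <guid>; a uuid4 hex string is used here.
    guid = guid or uuid.uuid4().hex
    # Replace characters that are not letters, digits, or underscores.
    safe_name = re.sub(r"\W+", "_", transform_name)
    return f"RASGO_SDK__OP{op_num}__{safe_name}_transform__{guid}"


print(build_table_name(2, "join", guid="abc123"))
# RASGO_SDK__OP2__join_transform__abc123
```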
- v0.4.24 (Feb 21, 2022)
  - Adds the optional parameter `generate_stats` to toggle stats generation when publishing with `rasgo.publish.table/df()` (defaults to True if not passed)
- v0.4.23 (Feb 17, 2022)
  - Adds the parameter `parents` to specify parent dataset dependencies of table or pandas dataframe when publishing with `rasgo.publish.table/df()`
- v0.4.22 (Feb 15, 2022)
  - Allows users to get the PyRasgo code used to generate a dataset with the function `dataset.generate_py()`
- v0.4.21 (Feb 08, 2022)
  - Enable users to append to an existing Rasgo Dataset using `rasgo.publish.df(fqtn="MY.FQTN.STRING", if_exists="append")`
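A hedged usage sketch of the append call from v0.4.21 follows. So that it runs without credentials, a `MagicMock` stands in for an authenticated PyRasgo connection. Only the `fqtn` and `if_exists` keywords come from the release note; the `df` keyword and the helper name `append_rows` are assumptions.

```python
from unittest.mock import MagicMock


def append_rows(rasgo, df, fqtn):
    """Append df to the existing Dataset stored at fqtn."""
    # fqtn= and if_exists="append" are taken from the v0.4.21 note;
    # passing the DataFrame as df= is an assumption about the signature.
    return rasgo.publish.df(df=df, fqtn=fqtn, if_exists="append")


# Stand-in for a real connection, so the sketch runs offline.
rasgo = MagicMock()
new_rows = [{"ID": 1, "VALUE": 10.0}]  # placeholder for a pandas DataFrame
append_rows(rasgo, new_rows, "MY.FQTN.STRING")

# Confirm the call shape that would reach PyRasgo.
rasgo.publish.df.assert_called_once_with(
    df=new_rows, fqtn="MY.FQTN.STRING", if_exists="append"
)
```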
- v0.4.20 (Feb 07, 2022)
  - Add `render_only` optional parameter to `Dataset.transform()` to support printing the SQL that will be executed by an applied transform instead of creating a new Dataset.
    - This option allows testing of transform arguments without having to execute the transform
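The `render_only` option from v0.4.20 might be wrapped as shown below. This is a sketch under assumptions: a `MagicMock` replaces a real Dataset so the code runs offline, the helper name `preview_transform_sql` is hypothetical, and `join_type` is an invented transform argument used only for illustration. The release notes do not say whether the SQL is returned or only printed.

```python
from unittest.mock import MagicMock


def preview_transform_sql(dataset, transform_name, **transform_args):
    """Ask for the SQL a transform would run, without creating a Dataset."""
    # transform_name= and render_only= both appear in these release notes.
    return dataset.transform(
        transform_name=transform_name, render_only=True, **transform_args
    )


dataset = MagicMock()  # stand-in for a Dataset fetched from Rasgo
preview_transform_sql(dataset, "join", join_type="LEFT")  # join_type is hypothetical

# Confirm the call shape that would reach PyRasgo.
dataset.transform.assert_called_once_with(
    transform_name="join", render_only=True, join_type="LEFT"
)
```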
- v0.4.19 (Feb 02, 2022)
  - Bug fixes
- v0.4.18 (Feb 02, 2022)
  - Add optional `rasgo.publish.dataset()` parameter `table_type` to support materializing a dataset as a table instead of a view.
- v0.4.17 (Feb 01, 2022)
  - Add `Dataset.generate_yaml()` to allow users to export their datasets and associated operation sets as a YAML string
  - Add `Dataset.versions` attribute to support retrieving all versions of a Dataset
- v0.4.16 (Jan 31, 2022)
  - Add `Dataset.run_stats()` to allow users to trigger stats generation for a dataset
  - Add `Dataset.profile()` to give users a link to the Rasgo UI, where they can view details on their Dataset, including any generated stats
- v0.4.15 (Jan 27, 2022)
  - Update timeseries tracking attribute name to `time_index` to match keyword
- v0.4.14 (Jan 26, 2022)
  - Remove unnecessary import
- v0.4.13 (Jan 26, 2022)
  - Add the ability to publish dataset attributes when publishing a dataset
- v0.4.12 (Jan 21, 2022)
  - Change `experimental_async` to `async_compute`, default to `True`
- v0.4.11 (Jan 25, 2022)
  - Bug fixes
- v0.4.10 (Jan 24, 2022)
  - Adds dataset `snapshot` information to `Dataset.snapshots` and provides a hook to return a snapshot's data with `Dataset.to_df(snapshot_index=<int>)`
- v0.4.9 (Jan 17, 2022)
  - Adds parameters `filters`, `order_by`, and `columns` to `dataset.to_df()` and `dataset.preview()` methods
- v0.4.8 (Jan 14, 2022)
  - Adds `experimental_async` flag to transforms to take advantage of experimental long-running operation creation
- v0.4.7 (Jan 13, 2022)
  - Return errors for operation creation
- v0.4.6 (Jan 12, 2022)
  - Adds support for long running operation creations
- v0.4.5 (Dec 21, 2021)
  - Fixes dependency installation
- v0.4.4 (Dec 21, 2021)
  - Adds support for Python versions `3.7.12`, `3.8`, `3.9`, and `3.10`
- v0.4.3 (Dec 17, 2021)
  - Adds method `rasgo.update.transform()` to update a transform
- v0.4.2 (Dec 15, 2021)
  - Adds the ability to reference Dataset attributes directly
    - `Dataset.id`
    - `Dataset.name`
    - `Dataset.description`
    - `Dataset.status`
    - `Dataset.fqtn`
    - `Dataset.columns`
    - `Dataset.created_date`
    - `Dataset.update_date`
    - `Dataset.attributes`
    - `Dataset.dependencies`
    - `Dataset.sql`
  - Adds the ability to get Datasets by `fqtn`: `rasgo.get.dataset(fqtn='MY_FQTN')`
- v0.4.1 (Dec 13, 2021)
  - "Updates"
- v0.4.0 (Dec 07, 2021)
  - Add Rasgo Datasets
    - Datasets are the new, single primitive available in Rasgo. Users can explore, transform, and create new data warehouse tables using this single primitive object.
    - Transforming a previously saved Dataset will produce a new Dataset definition that builds on top of the transformed Dataset. This new dataset will consist of a new operation that references the transformed Dataset as the `source_table` in the applied transform. Further transforms will add to the list of operations until `.save` is called to persist the created operations as a new Dataset in Rasgo.
    - New Rasgo Functions:
      - `rasgo.get.datasets` - Get a list of all available Datasets
      - `rasgo.get.dataset` - Get a single Dataset by ID, including the list of operations that created it (if they exist)
      - `rasgo.update.dataset` - Update name and description
      - `rasgo.delete.dataset` - Delete a Dataset
      - `rasgo.publish.dataset` - Save a new dataset to Rasgo. Can only save new Datasets that have been created by transforming old Datasets
      - `rasgo.publish.df` - Publish a Pandas DataFrame as a Rasgo Dataset
      - `rasgo.publish.table` - Publish an existing table as a Rasgo dataset
    - Dataset Primitive Functions:
      - `Dataset.transform` - Transform a previously existing Dataset with a given Transform to create a new Dataset definition
        - You can also reference transforms by name directly.
        - e.g. `dataset.join(...)` as opposed to `dataset.transform(transform_name='join', ...)`
      - `Dataset.to_df` - Read a Dataset into a Pandas DataFrame
      - `Dataset.preview` - Get a Pandas DataFrame consisting of the top 10 rows produced by this Dataset
    - Dataset Attributes:
      - `Dataset.sql` - A sql string representation of the operations that produce this dataset (if they exist)
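The `dataset.join(...)` shorthand from v0.4.0 can be pictured as attribute-based dispatch to `transform()`. The toy class below is a stand-in written for illustration, not PyRasgo's `Dataset`: the class name `ToyDataset`, the `operations` list, and the set of known transform names are all assumptions. It also mimics how each transform yields a new definition that builds on the earlier operations.

```python
class ToyDataset:
    """Illustrative stand-in; records operations instead of running SQL."""

    KNOWN_TRANSFORMS = {"join", "filter", "aggregate"}  # hypothetical names

    def __init__(self, operations=None):
        self.operations = list(operations or [])

    def transform(self, transform_name, **arguments):
        # Return a new definition that builds on the existing operations.
        step = {"transform_name": transform_name, "arguments": arguments}
        return ToyDataset(self.operations + [step])

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name in self.KNOWN_TRANSFORMS:
            return lambda **arguments: self.transform(transform_name=name, **arguments)
        raise AttributeError(name)


base = ToyDataset()
by_name = base.join(join_type="LEFT")
explicit = base.transform(transform_name="join", join_type="LEFT")
print(by_name.operations == explicit.operations)  # True
```

Returning a new object from `transform()` leaves the original untouched, which matches the description of transforms producing a new Dataset definition rather than modifying the saved one.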
- v0.3.4 (Dec 03, 2021)
  - Temporary hotfix: `DataSource.to_dict()` returns `sourceTable` attribute as a table name, instead of fqtn. Plan is to revert to fqtn in a future version when publish methods offer first-class handling of fqtn.
- v0.3.3 (Nov 08, 2021)
  - Added detailed Transform Argument Definitions during Transform creation
  - Allow null values for User Defined Transform arguments
- v0.3.2 (Oct 13, 2021)
  - Adds Jinja as the templating engine for User Defined Transforms
  - Source transforms may now be previewed, tested and deleted to enable a full creation experience.
  - Adds Rasgo template functions to enable dynamic template building
- v0.3.1 (Sept 27, 2021)
  - Adds `filter` and `limit` params to `read.collection_snapshot_data` function
  - Fixes Collection response model bug
- v0.3.0 (Sept 22, 2021)
  - Deprecates FeatureSet primitive (see docs for migration path: https://docs.rasgoml.com/rasgo-docs/pyrasgo-version-log/version-0.3)
  - Adds support for creating features using python source code
  - Adds support for user-defined transformation functionality
  - Adds methods to interact with Collection snapshots (DEPRECATED):
    - `get.collection_snapshots()`
    - `read.collection_snapshot_data()`
  - Adds methods to Collection primitive:
    - `.preview()` to view data in a pandas df
    - `.get_compatible_features()` to list features available to join
  - Adds `.to_dict` and `.to_yml` methods to DataSource primitive
- v0.2.5 (Aug 18, 2021)
  - adds handling and user notification for highly null dataframes which would otherwise not function well with `evaluate.profile` or `evaluate.feature_importance`
- v0.2.4 (Aug 4, 2021)
  - supports tables named as restricted Snowflake keywords (e.g. ACCOUNT, CASE, ORDER) to be registered as Rasgo Sources
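Reserved words such as ORDER can be used as Snowflake identifiers when they are wrapped in double quotes. A minimal sketch of that idea follows. `quote_identifier` and the short `RESERVED` set are illustrative assumptions (the set holds only the examples named in the v0.2.4 note, not Snowflake's full list), and this is not necessarily how PyRasgo handles such tables.

```python
# Examples from the v0.2.4 note; Snowflake's full keyword list is longer.
RESERVED = {"ACCOUNT", "CASE", "ORDER"}


def quote_identifier(name):
    """Double-quote a name when it collides with a reserved keyword."""
    if name.upper() in RESERVED:
        # Unquoted names resolve as uppercase in Snowflake, so the quoted
        # form is uppercased to refer to the same object.
        return f'"{name.upper()}"'
    return name


print(f"SELECT * FROM {quote_identifier('ORDER')}")  # SELECT * FROM "ORDER"
print(f"SELECT * FROM {quote_identifier('SALES')}")  # SELECT * FROM SALES
```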
- v0.2.3 (July 30, 2021)
  - introduces `publish.features_from_source_code()` function. This function allows users to pass a custom SQL string to create a view in Snowflake using their own code. This function will: register a child source based off the parent source provided as input, register features from the new child source table.
  - introduces new workflow to `publish.source_data()` function. Pass in `source_type="sql", sql_definition="<valid sql select string>"` to create a new Rasgo DataSource as a view in Snowflake using custom SQL.
  - makes the `features` parameter optional in `publish.features_from_source()` function. If param is not passed, all columns in the underlying table that are not in the `dimensions` list will be registered as features
  - adds `trigger_stats` parameter to all publish methods. When set to False, statistical profiling will not run for the data objects being published. Default = True
  - adds `verbose` parameter to all publish methods. When set to True, prints status messages to stdout to update users on progress of the function. Default = False.
  - introduces `.sourceCode` property on Rasgo DataSource and FeatureSet classes to display the SQL or python source code used to build the underlying table
  - introduces `.render_sql_definition()` method on Collection class to display the SQL used to create the underlying collection view
  - introduces `.dimensions` property on Rasgo Collection class to display all unique dimension columns in a Collection
  - introduces `trigger_stats` parameter in `collection.generate_training_data()` method to allow users to generate a sql view without kicking off correlation and join stats. Set to False to suppress stats jobs. Default = True.
  - Add support for optional catboost parameter `train_dir` in `evaluate.feature_importance()` function, which allows users to dictate where temporary training files are generated
- v0.2.2 (July 14, 2021)
  - Allow for consistency in `evaluate.feature_importance()` evaluation metrics for unchanged dataframes
  - Allow users to control certain CatBoost parameters when running `evaluate.feature_importance()`
- v0.2.1 (July 01, 2021)
  - expand `evaluate.feature_importance()` to support calculating importance for collections
- v0.2.0 (June 24, 2021)
  - introduce `publish.experiment()` method to fast track dataframes to Rasgo objects
  - fix register bug
- v0.1.14 (June 17, 2021)
  - improve new user signup experience in `register()` method
  - fix dataframe bug when experiment wasn't set
- v0.1.13 (June 16, 2021)
  - intelligently run Regressor or Classifier model in `evaluate.feature_importance()`
  - improve model performance statistics in `evaluate.feature_importance()`: include AUC, Logloss, precision, recall for classification
- v0.1.12 (June 11, 2021)
  - support fqtn in `publish.source_data(table)` parameter
  - trim timestamps in dataframe profiles to second grain
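Trimming to second grain, as mentioned for v0.1.12, amounts to dropping the sub-second part of each timestamp. A minimal standard-library sketch follows. The helper name `trim_to_second` is hypothetical, and the sketch does not reproduce how PyRasgo builds dataframe profiles.

```python
from datetime import datetime


def trim_to_second(timestamp):
    """Drop microseconds so the value has second-level precision."""
    return timestamp.replace(microsecond=0)


raw = datetime(2021, 6, 11, 9, 30, 15, 123456)
print(trim_to_second(raw).isoformat())  # 2021-06-11T09:30:15
```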
- v0.1.11 (June 9, 2021)
  - hotfix for unexpected histogram output
- v0.1.10 (June 8, 2021)
  - pin pyarrow dependency to < version 4.0 to prevent segmentation fault errors
- v0.1.9 (June 8, 2021)
  - improve model performance in `evaluate.feature_importance()` by adding test set to catboost eval
- v0.1.8 (June 7, 2021)
  - `evaluate.train_test_split()` function supports non-timeseries dataframes
  - `evaluate.feature_importance()` function now runs on an 80% training set
  - adds `timeseries_index` parameter to `evaluate.feature_importance()` & `prune.features()` functions
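The 80% training set and the timeseries option from v0.1.8 can be sketched in plain Python. This illustrates a holdout split in general, not the code behind `evaluate.train_test_split()`: the function name `holdout_split`, its parameters, and the choice to shuffle non-timeseries rows are assumptions.

```python
import random


def holdout_split(rows, train_fraction=0.8, timeseries_index=None, seed=0):
    """Split rows into (train, test) lists.

    With timeseries_index set, rows are ordered by that key and the most
    recent rows become the test set; otherwise rows are shuffled first.
    """
    rows = list(rows)
    if timeseries_index is not None:
        rows.sort(key=lambda row: row[timeseries_index])
    else:
        random.Random(seed).shuffle(rows)
    cutoff = int(len(rows) * train_fraction)
    return rows[:cutoff], rows[cutoff:]


data = [{"day": d, "value": d * 2} for d in range(10)]
train, test = holdout_split(data, timeseries_index="day")
print(len(train), len(test))     # 8 2
print([r["day"] for r in test])  # [8, 9]
```

Holding out the latest rows for a timeseries avoids training on data that comes after the rows being evaluated.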
- v0.1.7 (June 2, 2021)
  - expands dataframe series type recognition for profiling
- v0.1.6 (June 2, 2021)
  - cleans up dataframe profiles to enhance stats and visualization for non-numeric data
- v0.1.5 (June 2, 2021)
  - introduces `pip install "pyrasgo[df]"` option which will install: shap, catboost, & scikit-learn
- v0.1.4 (June 2, 2021)
  - various improvements to dataframe profiles & feature_importance
- v0.1.3 (May 27, 2021)
  - introduces experiment tracking on dataframes
  - fixes errors when running feature_importance on dataframes with NaN values
- v0.1.2 (May 26, 2021)
  - generates column profile automatically when running feature_importance
- v0.1.1 (May 24, 2021)
  - supports sharing public dataframe profiles
  - enforces assignment of granularity to dimensions in Publish methods based on list ordering
- v0.1.0 (May 17, 2021)
  - introduces dataframe methods: evaluate, prune, transform
  - supports free pyrasgo trial registration
- v0.0.79 (April 19, 2021)
  - support additional datetime data types on Features
  - resolve import errors
- v0.0.78 (April 5, 2021)
  - adds `include_shared` param to `get_collections()` method
- v0.0.77 (April 5, 2021)
  - adds convenience method to rename a Feature’s displayName
  - adds convenience method to promote a Feature from Sandbox to Production status
  - fixes permissions bug when trying to read Community data sources from a public org
- v0.0.76 (April 5, 2021)
  - adds columns to DataSource primitive
  - adds verbose error message to inform users when a Feature name conflict is preventing creation
- v0.0.75 (April 5, 2021)
  - introduce interactive Rasgo primitives
- v0.0.74 (March 25, 2021)
  - upgrade Snowflake python connector dependency to 2.4.0
  - upgrade pyarrow dependency to 3.0
File details
Details for the file pyrasgo-0.5.2.tar.gz.
File metadata
- Download URL: pyrasgo-0.5.2.tar.gz
- Upload date:
- Size: 93.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9656564d1f3c4f5827c09e4872ac02c2d99299f9a1fec5164dc0b8081a81f46e |
| MD5 | ec4c514a0ce6a920637d139fa2d4b805 |
| BLAKE2b-256 | 81d1dd2fb10bbb26dc5afabe2ee6caa498b789226221510504c7f112e4db4498 |
File details
Details for the file pyrasgo-0.5.2-py3-none-any.whl.
File metadata
- Download URL: pyrasgo-0.5.2-py3-none-any.whl
- Upload date:
- Size: 111.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f85314cc7c7816f22772dde6fe0b5e0818cf401fe6cce5aede0b2e9a3d75a2b |
| MD5 | 2e83b278fd04336e7622576a9bee46a8 |
| BLAKE2b-256 | abe3b1c919ece9e454ab7122ed5d46991e11ee48fc7f40939f7ca5a63bc1c054 |
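A downloaded file can be checked against the SHA256 digests listed above with the standard library. In this sketch the helper names `sha256_of` and `sha256_matches` are hypothetical; the digest constant is the one published on this page for the wheel.

```python
import hashlib

# SHA256 digest published above for pyrasgo-0.5.2-py3-none-any.whl.
WHEEL_SHA256 = "8f85314cc7c7816f22772dde6fe0b5e0818cf401fe6cce5aede0b2e9a3d75a2b"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_of(path) == expected_hex.lower()
```

Assuming the wheel has been saved to the current directory, `sha256_matches("pyrasgo-0.5.2-py3-none-any.whl", WHEEL_SHA256)` should return `True` for an intact download.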