Alpha version of the Rasgo Python interface.
pyRasgo is a Python SDK for interacting with the Rasgo API. Rasgo accelerates feature engineering for data scientists.
Visit us at https://www.rasgoml.com/ to turn your data into Features in minutes!
Documentation is available at: https://docs.rasgoml.com/rasgo-docs/pyrasgo/
Package Dependencies
- idna>=3.3
- more-itertools
- pandas
- pyarrow>=5.0.0
- pydantic
- pyyaml
- requests
- snowflake-connector-python>=2.7.0
- tqdm
Release Notes
v0.5.1 (Apr 4, 2022)
- Add deprecation warnings for pre-1.0 functions

v0.5.0 (Mar 23, 2022)
- Update the Snowflake connection to use the correct role

v0.4.36 (Mar 22, 2022)
- Make `ds.refresh_table()` always complete the refresh when the function finishes running
v0.4.35 (Mar 16, 2022)
- Adds a PyRasgo primitive for an Accelerator
- Adds the following methods for working with Accelerators in PyRasgo:
  - `rasgo.get.accelerator()`
  - `rasgo.get.accelerators()`
  - `rasgo.create.accelerator()`
  - `rasgo.delete.accelerator()`
  - `rasgo.create.dataset_from_accelerator()`
  - `Accelerator.apply()`
v0.4.34 (Mar 11, 2022)
- Add a `to_dbt()` function to Datasets
  - Use this method to export a published Dataset as a dbt model
- Support tracking of dataset dependencies passed to transforms that accept lists of datasets (multi-join)
v0.4.33 (Mar 11, 2022)
- Handles long-running `ds.refresh_table()` processes

v0.4.32 (Mar 10, 2022)
- Fixes uses of apply transform

v0.4.31 (Mar 04, 2022)
- Raise an error if you supply an argument to a transform that doesn't exist
- Fix dependency management for transform arguments of type `table_list`
v0.4.30 (Mar 01, 2022)
- Cache and return dataset columns from the API if not already set when calling `ds.columns`
- New method `rasgo.update.column()` to set/update metadata about a dataset column
v0.4.29 (Mar 01, 2022)
- Bug fixes

v0.4.28 (Feb 28, 2022)
- Fetches all datasets on a call to `rasgo.get.datasets()`
v0.4.27 (Feb 25, 2022)
- Adds the ability to set `tags` when creating a transform with `rasgo.create.transform()`

v0.4.26 (Feb 22, 2022)
- Allow published datasets with tables as their output to be refreshed using `dataset.refresh_table()`
v0.4.25 (Feb 22, 2022)
- Creates more informative generated Data Warehouse table names; tables/views created in PyRasgo will now be named like `RASGO_SDK__OP<op_num>__<transform_name>_transform__<guid>`
- Adds a proper error message, with steps to fix it, when publishing a DataFrame with incompatible pandas date types
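The naming pattern above can be sketched with a small helper; `build_table_name` and its arguments are illustrative only, not part of the PyRasgo API:

```python
import uuid

def build_table_name(op_num: int, transform_name: str) -> str:
    """Compose a table/view name following the
    RASGO_SDK__OP<op_num>__<transform_name>_transform__<guid> pattern."""
    guid = uuid.uuid4().hex  # stand-in for the generated GUID
    return f"RASGO_SDK__OP{op_num}__{transform_name}_transform__{guid}"

print(build_table_name(2, "join"))
```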
v0.4.24 (Feb 21, 2022)
- Adds the optional parameter `generate_stats` to toggle stats generation when publishing with `rasgo.publish.table/df()` (defaults to `True` if not passed)
v0.4.23 (Feb 17, 2022)
- Adds the parameter `parents` to specify parent dataset dependencies of a table or pandas DataFrame when publishing with `rasgo.publish.table/df()`
v0.4.22 (Feb 15, 2022)
- Allows users to get the PyRasgo code used to generate a dataset with the function `dataset.generate_py()`

v0.4.21 (Feb 08, 2022)
- Enable users to append to an existing Rasgo Dataset using `rasgo.publish.df(fqtn="MY.FQTN.STRING", if_exists="append")`
v0.4.20 (Feb 07, 2022)
- Add a `render_only` optional parameter to `Dataset.transform()` to support printing the SQL that an applied transform would execute, instead of creating a new Dataset
  - This option allows testing of transform arguments without having to execute the transform
v0.4.19 (Feb 02, 2022)
- Bug fixes

v0.4.18 (Feb 02, 2022)
- Add an optional `rasgo.publish.dataset()` parameter `table_type` to support materializing a dataset as a table instead of a view
v0.4.17 (Feb 01, 2022)
- Add `Dataset.generate_yaml()` to allow users to export their datasets and associated operation sets as a YAML string
- Add a `Dataset.versions` attribute to support retrieving all versions of a Dataset

v0.4.16 (Jan 31, 2022)
- Add `Dataset.run_stats()` to allow users to trigger stats generation for a dataset
- Add `Dataset.profile()` to give users a link to the Rasgo UI, where they can view details on their Dataset, including any generated stats
v0.4.15 (Jan 27, 2022)
- Update the timeseries tracking attribute name to `time_index` to match the keyword

v0.4.14 (Jan 26, 2022)
- Remove an unnecessary import

v0.4.13 (Jan 26, 2022)
- Add the ability to publish dataset attributes when publishing a dataset

v0.4.12 (Jan 21, 2022)
- Change `experimental_async` to `async_compute`, defaulting to `True`
v0.4.11 (Jan 25, 2022)
- Bug fixes

v0.4.10 (Jan 24, 2022)
- Adds dataset `snapshot` information to `Dataset.snapshots` and provides a hook to return a snapshot's data with `Dataset.to_df(snapshot_index=<int>)`

v0.4.9 (Jan 17, 2022)
- Adds the parameters `filters`, `order_by`, and `columns` to the `dataset.to_df()` and `dataset.preview()` methods
v0.4.8 (Jan 14, 2022)
- Adds an `experimental_async` flag to transforms to take advantage of experimental long-running operation creation

v0.4.7 (Jan 13, 2022)
- Return errors for operation creation

v0.4.6 (Jan 12, 2022)
- Adds support for long-running operation creation

v0.4.5 (Dec 21, 2021)
- Fixes dependency installation
v0.4.4 (Dec 21, 2021)
- Adds support for Python versions `3.7.12`, `3.8`, `3.9`, and `3.10`

v0.4.3 (Dec 17, 2021)
- Adds a `rasgo.update.transform()` method to update a transform
v0.4.2 (Dec 15, 2021)
- Adds the ability to reference Dataset attributes directly:
  - `Dataset.id`
  - `Dataset.name`
  - `Dataset.description`
  - `Dataset.status`
  - `Dataset.fqtn`
  - `Dataset.columns`
  - `Dataset.created_date`
  - `Dataset.update_date`
  - `Dataset.attributes`
  - `Dataset.dependencies`
  - `Dataset.sql`
- Adds the ability to get Datasets by `fqtn`: `rasgo.get.dataset(fqtn='MY_FQTN')`
v0.4.1 (Dec 13, 2021)
- Updates

v0.4.0 (Dec 07, 2021)
- Add Rasgo Datasets
  - Datasets are the new, single primitive available in Rasgo. Users can explore, transform, and create new data warehouse tables using this single primitive object.
  - Transforming a previously saved Dataset will produce a new Dataset definition that builds on top of the transformed Dataset. This new dataset will consist of a new operation that references the transformed Dataset as the `source_table` in the applied transform. Further transforms will add to the list of operations until `.save` is called to persist the created operations as a new Dataset in Rasgo.
- New Rasgo functions:
  - `rasgo.get.datasets` - Get a list of all available Datasets
  - `rasgo.get.dataset` - Get a single Dataset by ID, including the list of operations that created it (if they exist)
  - `rasgo.update.dataset` - Update a Dataset's name and description
  - `rasgo.delete.dataset` - Delete a Dataset
  - `rasgo.publish.dataset` - Save a new Dataset to Rasgo. Can only save new Datasets that have been created by transforming existing Datasets
  - `rasgo.publish.df` - Publish a pandas DataFrame as a Rasgo Dataset
  - `rasgo.publish.table` - Publish an existing table as a Rasgo Dataset
- Dataset primitive functions:
  - `Dataset.transform` - Transform a previously existing Dataset with a given Transform to create a new Dataset definition
    - You can also reference transforms by name directly, e.g. `dataset.join(...)` as opposed to `dataset.transform(transform_name='join', ...)`
  - `Dataset.to_df` - Read a Dataset into a pandas DataFrame
  - `Dataset.preview` - Get a pandas DataFrame consisting of the top 10 rows produced by this Dataset
- Dataset attributes:
  - `Dataset.sql` - A SQL string representation of the operations that produce this dataset (if they exist)
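The shorthand-transform behavior described above (calling `dataset.join(...)` instead of `dataset.transform(transform_name='join', ...)`) is a pattern that can be sketched with dynamic attribute dispatch. The `Dataset` class below is a minimal illustration of the idea, not PyRasgo's actual implementation:

```python
import functools

class Dataset:
    """Illustrative sketch (not the real PyRasgo class) of how shorthand
    calls like dataset.join(...) can delegate to a generic transform()."""

    def __init__(self, name, operations=None):
        self.name = name
        self.operations = list(operations or [])

    def transform(self, transform_name, **kwargs):
        # Each transform appends an operation; a .save call would persist them
        return Dataset(self.name, self.operations + [(transform_name, kwargs)])

    def __getattr__(self, attr):
        # Any unknown attribute is treated as a transform name
        return functools.partial(self.transform, transform_name=attr)

ds = Dataset("MY.SCHEMA.TABLE")
joined = ds.join(join_table="OTHER.SCHEMA.TABLE")
print(joined.operations)  # [('join', {'join_table': 'OTHER.SCHEMA.TABLE'})]
```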
v0.3.4 (Dec 03, 2021)
- Temporary hotfix: `DataSource.to_dict()` returns the `sourceTable` attribute as a table name instead of an fqtn. The plan is to revert to fqtn in a future version, when publish methods offer first-class handling of fqtns.
v0.3.3 (Nov 08, 2021)
- Added detailed Transform Argument Definitions during Transform creation
- Allow null values for User Defined Transform arguments

v0.3.2 (Oct 13, 2021)
- Adds Jinja as the templating engine for User Defined Transforms
- Source transforms may now be previewed, tested, and deleted to enable a full creation experience
- Adds Rasgo template functions to enable dynamic template building
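To illustrate the Jinja templating mentioned above, here is a hedged sketch of what a user-defined SQL transform template might look like. The template, column names, and table are invented for illustration and are not a real Rasgo transform:

```python
from jinja2 import Template  # Jinja is the templating engine noted above

# Hypothetical user-defined transform template: the grouping columns,
# aggregate column, and source table are filled in at render time.
SQL_TEMPLATE = """SELECT
  {%- for col in group_columns %}
  {{ col }},
  {%- endfor %}
  SUM({{ agg_column }}) AS total_{{ agg_column }}
FROM {{ source_table }}
GROUP BY {% for col in group_columns %}{{ col }}{% if not loop.last %}, {% endif %}{% endfor %}"""

sql = Template(SQL_TEMPLATE).render(
    group_columns=["region", "product"],
    agg_column="sales",
    source_table="MY_DB.MY_SCHEMA.ORDERS",
)
print(sql)
```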
v0.3.1 (Sept 27, 2021)
- Adds `filter` and `limit` params to the `read.collection_snapshot_data` function
- Fixes a Collection response model bug
v0.3.0 (Sept 22, 2021)
- Deprecates the FeatureSet primitive (see docs for the migration path: https://docs.rasgoml.com/rasgo-docs/pyrasgo-version-log/version-0.3)
- Adds support for creating features using Python source code
- Adds support for user-defined transformation functionality
- Adds methods to interact with Collection snapshots (DEPRECATED):
  - `get.collection_snapshots()`
  - `read.collection_snapshot_data()`
- Adds methods to the Collection primitive:
  - `.preview()` to view data in a pandas df
  - `.get_compatible_features()` to list features available to join
- Adds `.to_dict` and `.to_yml` methods to the DataSource primitive
v0.2.5 (Aug 18, 2021)
- Adds handling and user notification for highly null dataframes, which would otherwise not work well with `evaluate.profile` or `evaluate.feature_importance`
v0.2.4 (Aug 4, 2021)
- Allows tables named with restricted Snowflake keywords (e.g. ACCOUNT, CASE, ORDER) to be registered as Rasgo Sources
v0.2.3 (July 30, 2021)
- Introduces the `publish.features_from_source_code()` function, which allows users to pass a custom SQL string to create a view in Snowflake using their own code. This function will register a child source based off the parent source provided as input, then register features from the new child source table.
- Introduces a new workflow in the `publish.source_data()` function. Pass in `source_type="sql", sql_definition="<valid sql select string>"` to create a new Rasgo DataSource as a view in Snowflake using custom SQL.
- Makes the `features` parameter optional in the `publish.features_from_source()` function. If the param is not passed, all columns in the underlying table that are not in the `dimensions` list will be registered as features.
- Adds a `trigger_stats` parameter to all publish methods. When set to False, statistical profiling will not run for the data objects being published. Default = True.
- Adds a `verbose` parameter to all publish methods. When set to True, prints status messages to stdout to update users on the progress of the function. Default = False.
- Introduces a `.sourceCode` property on the Rasgo DataSource and FeatureSet classes to display the SQL or Python source code used to build the underlying table.
- Introduces a `.render_sql_definition()` method on the Collection class to display the SQL used to create the underlying collection view.
- Introduces a `.dimensions` property on the Rasgo Collection class to display all unique dimension columns in a Collection.
- Introduces a `trigger_stats` parameter in the `collection.generate_training_data()` method to allow users to generate a SQL view without kicking off correlation and join stats. Set to False to suppress stats jobs. Default = True.
- Adds support for an optional CatBoost parameter `train_dir` in the `evaluate.feature_importance()` function, which allows users to dictate where temporary training files are generated.
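The `source_type="sql"` workflow above essentially wraps the user's SELECT statement in a view definition. A hedged sketch of that idea follows; the helper name and the exact DDL Rasgo emits are assumptions for illustration:

```python
def view_ddl_from_sql(view_name, sql_definition):
    """Illustrative only: how a DataSource registered with
    source_type="sql" might be materialized as a Snowflake view."""
    select = sql_definition.strip().rstrip(";")  # views don't nest semicolons
    return f"CREATE OR REPLACE VIEW {view_name} AS\n{select}"

ddl = view_ddl_from_sql(
    "MY_DB.MY_SCHEMA.MY_SOURCE",
    "SELECT id, amount FROM MY_DB.MY_SCHEMA.ORDERS WHERE amount > 0;",
)
print(ddl)
```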
v0.2.2 (July 14, 2021)
- Allow for consistency in `evaluate.feature_importance()` evaluation metrics for unchanged dataframes
- Allow users to control certain CatBoost parameters when running `evaluate.feature_importance()`
v0.2.1 (July 01, 2021)
- Expand `evaluate.feature_importance()` to support calculating importance for collections
v0.2.0 (June 24, 2021)
- Introduce the `publish.experiment()` method to fast-track dataframes to Rasgo objects
- Fix a register bug

v0.1.14 (June 17, 2021)
- Improve the new-user signup experience in the `register()` method
- Fix a dataframe bug when the experiment wasn't set
v0.1.13 (June 16, 2021)
- Intelligently run a Regressor or Classifier model in `evaluate.feature_importance()`
- Improve model performance statistics in `evaluate.feature_importance()`: include AUC, Logloss, precision, and recall for classification
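The Regressor-vs-Classifier choice above can be sketched as a simple target-inspection heuristic. The threshold and rule here are assumptions for illustration, not pyRasgo's actual logic:

```python
import pandas as pd

def infer_model_type(target, max_classes=20):
    """Illustrative heuristic (an assumption, not pyrasgo's actual rule):
    non-numeric or low-cardinality targets get a classifier."""
    if target.dtype == object or target.nunique() <= max_classes:
        return "classifier"
    return "regressor"

print(infer_model_type(pd.Series(["a", "b", "a"])))           # classifier
print(infer_model_type(pd.Series(range(1000), dtype=float)))  # regressor
```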
v0.1.12 (June 11, 2021)
- Support fqtn in the `publish.source_data(table)` parameter
- Trim timestamps in dataframe profiles to second grain

v0.1.11 (June 9, 2021)
- Hotfix for unexpected histogram output
v0.1.10 (June 8, 2021)
- Pin the pyarrow dependency to < version 4.0 to prevent segmentation fault errors

v0.1.9 (June 8, 2021)
- Improve model performance in `evaluate.feature_importance()` by adding a test set to the CatBoost eval
v0.1.8 (June 7, 2021)
- `evaluate.train_test_split()` function supports non-timeseries dataframes
- `evaluate.feature_importance()` function now runs on an 80% training set
- Adds a `timeseries_index` parameter to the `evaluate.feature_importance()` & `prune.features()` functions
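A timeseries-aware train/test split like the one referenced above can be sketched in a few lines of pandas. The function name, signature, and 80% default are assumptions for illustration, not the actual `evaluate.train_test_split` API:

```python
import pandas as pd

def train_test_split_ts(df, train_frac=0.8, timeseries_index=None):
    """Illustrative sketch: split a dataframe, respecting time order
    when a timeseries index column is given."""
    if timeseries_index is not None:
        df = df.sort_values(timeseries_index)  # never shuffle across time
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]

df = pd.DataFrame({"ts": [3, 1, 2, 5, 4], "y": [30, 10, 20, 50, 40]})
train, test = train_test_split_ts(df, train_frac=0.8, timeseries_index="ts")
print(train["ts"].tolist(), test["ts"].tolist())  # [1, 2, 3, 4] [5]
```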
v0.1.7 (June 2, 2021)
- Expands dataframe series type recognition for profiling

v0.1.6 (June 2, 2021)
- Cleans up dataframe profiles to enhance stats and visualization for non-numeric data

v0.1.5 (June 2, 2021)
- Introduces a `pip install "pyrasgo[df]"` option, which will install: shap, catboost, & scikit-learn
v0.1.4 (June 2, 2021)
- Various improvements to dataframe profiles & feature_importance

v0.1.3 (May 27, 2021)
- Introduces experiment tracking on dataframes
- Fixes errors when running feature_importance on dataframes with NaN values

v0.1.2 (May 26, 2021)
- Generates a column profile automatically when running feature_importance

v0.1.1 (May 24, 2021)
- Supports sharing public dataframe profiles
- Enforces assignment of granularity to dimensions in Publish methods based on list ordering

v0.1.0 (May 17, 2021)
- Introduces dataframe methods: evaluate, prune, transform
- Supports free pyRasgo trial registration
v0.0.79 (April 19, 2021)
- Support additional datetime data types on Features
- Resolve import errors

v0.0.78 (April 5, 2021)
- Adds an `include_shared` param to the `get_collections()` method

v0.0.77 (April 5, 2021)
- Adds a convenience method to rename a Feature's displayName
- Adds a convenience method to promote a Feature from Sandbox to Production status
- Fixes a permissions bug when trying to read Community data sources from a public org

v0.0.76 (April 5, 2021)
- Adds columns to the DataSource primitive
- Adds a verbose error message to inform users when a Feature name conflict is preventing creation

v0.0.75 (April 5, 2021)
- Introduce interactive Rasgo primitives

v0.0.74 (March 25, 2021)
- Upgrade the Snowflake Python connector dependency to 2.4.0
- Upgrade the pyarrow dependency to 3.0