Alpha version of the Rasgo Python interface.
Project description
pyRasgo is a Python SDK for interacting with the Rasgo API. Rasgo accelerates feature engineering for data scientists.
Visit us at https://www.rasgoml.com/ to turn your data into Features in minutes!
Documentation is available at: https://docs.rasgoml.com/rasgo-docs/pyrasgo/
Package Dependencies
- idna>=2.5,<3
- more-itertools
- pandas
- pyarrow>=3.0
- pydantic
- pyyaml
- requests
- snowflake-connector-python>=2.4.0
- tqdm
Release Notes

v0.4.0a2 (Dec 07, 2021)
- Error handling

v0.4.0a1 (Dec 07, 2021)
- Add Rasgo Datasets
  - Datasets are the new, single primitive available in Rasgo. Users can explore, transform, and create new data warehouse tables using this single primitive object.
  - Transforming a previously saved Dataset will produce a new Dataset definition that builds on top of the transformed Dataset. This new Dataset will consist of a new operation that references the transformed Dataset as the `source_table` in the applied transform. Further transforms will add to the list of operations until `.save` is called to persist the created operations as a new Dataset in Rasgo.
- New Rasgo Functions:
  - `rasgo.get.datasets` - Get a list of all available Datasets
  - `rasgo.get.dataset` - Get a single Dataset by ID, including the list of operations that created it (if they exist)
  - `rasgo.update.dataset` - Update name and description
  - `rasgo.delete.dataset` - Delete a Dataset
  - `rasgo.save.dataset` - Save a new Dataset to Rasgo. Can only save new Datasets that have been created by transforming old Datasets
- Dataset Primitive Functions:
  - `Dataset.transform` - Transform a previously existing Dataset with a given Transform to create a new Dataset definition
  - `Dataset.read_into_df` - Read a Dataset into a pandas DataFrame
  - `Dataset.preview` - Get a pandas DataFrame consisting of the top 10 rows produced by this Dataset
- Dataset Attributes:
  - `Dataset.source_code` - A string representation of the operations that produced this dataset (if they exist)
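The Dataset workflow described above could be sketched roughly as follows. This is an illustrative sketch only, not official usage: it assumes `pyrasgo` is installed with a valid Rasgo API key, and the transform name, argument shape, and `rasgo.save.dataset` call signature are assumptions that may differ from the actual SDK.

```python
def dataset_transform_demo(api_key: str, dataset_id: int):
    """Sketch of the v0.4.0a1 Dataset workflow; exact signatures are assumptions."""
    import pyrasgo  # requires `pip install pyrasgo` and a valid Rasgo API key

    rasgo = pyrasgo.connect(api_key)

    # Fetch a previously saved Dataset by ID
    ds = rasgo.get.dataset(dataset_id)

    # Transforming a Dataset yields a new, unsaved Dataset definition whose
    # operation list references the source Dataset as its source_table
    new_ds = ds.transform(
        transform_name="aggregate",               # hypothetical transform name
        arguments={"group_by": ["CUSTOMER_ID"]},  # hypothetical argument shape
    )

    print(new_ds.source_code)  # string representation of the accumulated operations
    print(new_ds.preview())    # top 10 rows as a pandas DataFrame

    # Persist the accumulated operations as a new Dataset in Rasgo
    rasgo.save.dataset(new_ds, name="customers_aggregated")
    return new_ds
```

Because each transform only appends an operation to the definition, no warehouse table is created until `.save` runs.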

v0.3.4 (Dec 03, 2021)
- Temporary hotfix: `DataSource.to_dict()` returns the `sourceTable` attribute as a table name, instead of fqtn. Plan is to revert to fqtn in a future version when publish methods offer first-class handling of fqtn.

v0.3.3 (Nov 08, 2021)
- Added detailed Transform Argument Definitions during Transform creation
- Allow null values for User Defined Transform arguments

v0.3.2 (Oct 13, 2021)
- Adds Jinja as the templating engine for User Defined Transforms
- Source transforms may now be previewed, tested and deleted to enable a full creation experience.
- Adds Rasgo template functions to enable dynamic template building

v0.3.1 (Sept 27, 2021)
- Adds `filter` and `limit` params to the `read.collection_snapshot_data` function
- Fixes Collection response model bug
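A hedged sketch of reading snapshot data with the new `filter` and `limit` params. The filter expression format, ID types, and positional/keyword call shapes below are assumptions, not documented behavior.

```python
def snapshot_read_demo(api_key: str, collection_id: int):
    """Sketch of the v0.3.1 snapshot-reading flow; call shapes are assumptions."""
    import pyrasgo  # requires `pip install pyrasgo` and a valid Rasgo API key

    rasgo = pyrasgo.connect(api_key)

    # List the snapshots available for this collection
    snapshots = rasgo.get.collection_snapshots(collection_id)

    # Read a filtered, limited slice of snapshot data into a pandas DataFrame
    df = rasgo.read.collection_snapshot_data(
        collection_id,
        filter="STATE = 'NY'",  # hypothetical filter expression format
        limit=100,              # cap the number of rows returned
    )
    return snapshots, df
```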

v0.3.0 (Sept 22, 2021)
- Deprecates FeatureSet primitive (see docs for migration path: https://docs.rasgoml.com/rasgo-docs/pyrasgo-version-log/version-0.3)
- Adds support for creating features using python source code
- Adds support for user-defined transformation functionality
- Adds methods to interact with Collection snapshots: `get.collection_snapshots()` and `read.collection_snapshot_data()`
- Adds methods to Collection primitive: `.preview()` to view data in a pandas df, and `.get_compatible_features()` to list features available to join
- Adds `.to_dict` and `.to_yml` methods to DataSource primitive
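Exporting a DataSource with the new `.to_dict` and `.to_yml` methods might look roughly like this. The getter name `rasgo.get.data_source` and the return types of both methods are assumptions for illustration only.

```python
def export_source_demo(api_key: str, source_id: int):
    """Sketch of v0.3.0's DataSource export methods; names are assumptions."""
    import pyrasgo  # requires `pip install pyrasgo` and a valid Rasgo API key

    rasgo = pyrasgo.connect(api_key)
    source = rasgo.get.data_source(source_id)  # hypothetical getter name

    as_dict = source.to_dict()  # plain-dict representation of the source
    as_yml = source.to_yml()    # YAML form, e.g. for checking into version control
    return as_dict, as_yml
```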

v0.2.5 (Aug 18, 2021)
- adds handling and user notification for highly null dataframes which would otherwise not function well with `evaluate.profile` or `evaluate.feature_importance`

v0.2.4 (Aug 4, 2021)
- supports tables named as restricted Snowflake keywords (e.g. ACCOUNT, CASE, ORDER) to be registered as Rasgo Sources

v0.2.3 (July 30, 2021)
- introduces `publish.features_from_source_code()` function. This function allows users to pass a custom SQL string to create a view in Snowflake using their own code. It will register a child source based on the parent source provided as input, then register features from the new child source table.
- introduces new workflow in the `publish.source_data()` function. Pass in `source_type="sql", sql_definition="<valid sql select string>"` to create a new Rasgo DataSource as a view in Snowflake using custom SQL.
- makes the `features` parameter optional in the `publish.features_from_source()` function. If the param is not passed, all columns in the underlying table that are not in the `dimensions` list will be registered as features.
- adds `trigger_stats` parameter to all publish methods. When set to False, statistical profiling will not run for the data objects being published. Default = True.
- adds `verbose` parameter to all publish methods. When set to True, prints status messages to stdout to update users on progress of the function. Default = False.
- introduces `.sourceCode` property on Rasgo DataSource and FeatureSet classes to display the SQL or python source code used to build the underlying table.
- introduces `.render_sql_definition()` method on Collection class to display the SQL used to create the underlying collection view.
- introduces `.dimensions` property on Rasgo Collection class to display all unique dimension columns in a Collection.
- introduces `trigger_stats` parameter in `collection.generate_training_data()` method to allow users to generate a SQL view without kicking off correlation and join stats. Set to False to suppress stats jobs. Default = True.
- adds support for optional catboost parameter `train_dir` in `evaluate.feature_importance()` function, which allows users to dictate where temporary training files are generated.
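The v0.2.3 SQL publishing workflow might look roughly like the sketch below. Parameter names (`source_type`, `sql_definition`, `trigger_stats`, `verbose`) and the `.sourceCode` property follow the notes above, but the overall call shape is an assumption.

```python
def publish_sql_source_demo(api_key: str):
    """Sketch of the v0.2.3 SQL publishing workflow; call shape is an assumption."""
    import pyrasgo  # requires `pip install pyrasgo` and a valid Rasgo API key

    rasgo = pyrasgo.connect(api_key)

    # Create a new Rasgo DataSource as a Snowflake view from a custom SQL definition
    source = rasgo.publish.source_data(
        source_type="sql",
        sql_definition="SELECT * FROM MY_DB.MY_SCHEMA.ORDERS WHERE AMOUNT > 0",
        trigger_stats=False,  # skip statistical profiling while iterating
        verbose=True,         # print progress messages to stdout
    )

    # Inspect the SQL used to build the underlying table
    print(source.sourceCode)
    return source
```

Setting `trigger_stats=False` during development avoids waiting on profiling jobs; a final publish can re-enable them with the default `True`.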
v0.2.2 (July 14, 2021)
- Allow for consistency in `evaluate.feature_importance()` evaluation metrics for unchanged dataframes
- Allow users to control certain CatBoost parameters when running `evaluate.feature_importance()`

v0.2.1 (July 01, 2021)
- expand `evaluate.feature_importance()` to support calculating importance for collections

v0.2.0 (June 24, 2021)
- introduce `publish.experiment()` method to fast track dataframes to Rasgo objects
- fix register bug

v0.1.14 (June 17, 2021)
- improve new user signup experience in `register()` method
- fix dataframe bug when experiment wasn't set

v0.1.13 (June 16, 2021)
- intelligently run Regressor or Classifier model in `evaluate.feature_importance()`
- improve model performance statistics in `evaluate.feature_importance()`: include AUC, Logloss, precision, recall for classification

v0.1.12 (June 11, 2021)
- support fqtn in `publish.source_data(table)` parameter
- trim timestamps in dataframe profiles to second grain

v0.1.11 (June 9, 2021)
- hotfix for unexpected histogram output

v0.1.10 (June 8, 2021)
- pin pyarrow dependency to < version 4.0 to prevent segmentation fault errors

v0.1.9 (June 8, 2021)
- improve model performance in `evaluate.feature_importance()` by adding test set to catboost eval
v0.1.8 (June 7, 2021)
- `evaluate.train_test_split()` function supports non-timeseries dataframes
- `evaluate.feature_importance()` function now runs on an 80% training set
- adds `timeseries_index` parameter to `evaluate.feature_importance()` & `prune.features()` functions

v0.1.7 (June 2, 2021)
- expands dataframe series type recognition for profiling

v0.1.6 (June 2, 2021)
- cleans up dataframe profiles to enhance stats and visualization for non-numeric data

v0.1.5 (June 2, 2021)
- introduces `pip install "pyrasgo[df]"` option which will install: shap, catboost, & scikit-learn

v0.1.4 (June 2, 2021)
- various improvements to dataframe profiles & feature_importance

v0.1.3 (May 27, 2021)
- introduces experiment tracking on dataframes
- fixes errors when running feature_importance on dataframes with NaN values

v0.1.2 (May 26, 2021)
- generates column profile automatically when running feature_importance

v0.1.1 (May 24, 2021)
- supports sharing public dataframe profiles
- enforces assignment of granularity to dimensions in Publish methods based on list ordering

v0.1.0 (May 17, 2021)
- introduces dataframe methods: evaluate, prune, transform
- supports free pyRasgo trial registration

v0.0.79 (April 19, 2021)
- support additional datetime data types on Features
- resolve import errors

v0.0.78 (April 5, 2021)
- adds include_shared param to get_collections() method

v0.0.77 (April 5, 2021)
- adds convenience method to rename a Feature’s displayName
- adds convenience method to promote a Feature from Sandbox to Production status
- fixes permissions bug when trying to read Community data sources from a public org

v0.0.76 (April 5, 2021)
- adds columns to DataSource primitive
- adds verbose error message to inform users when a Feature name conflict is preventing creation

v0.0.75 (April 5, 2021)
- introduce interactive Rasgo primitives

v0.0.74 (March 25, 2021)
- upgrade Snowflake python connector dependency to 2.4.0
- upgrade pyarrow dependency to 3.0