Alpha version of the Rasgo Python interface.

Project description

pyRasgo is a Python SDK for interacting with the Rasgo API. Rasgo accelerates feature engineering for data scientists.

Visit us at https://www.rasgoml.com/ to turn your data into Features in minutes!

Documentation is available at: https://docs.rasgoml.com/rasgo-docs/pyrasgo/

Package Dependencies

  • idna>=2.5,<3
  • more-itertools
  • pandas
  • pyarrow>=3.0
  • pydantic
  • pyyaml
  • requests
  • snowflake-connector-python>=2.4.0
  • tqdm
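
The version bounds listed above can be checked against a local environment. The following is a minimal sketch, not part of pyRasgo: the helper names (`parse_version`, `satisfies`, `check_installed`) are hypothetical, and the version parsing is deliberately simplified (it ignores pre-release suffixes). A real project would more likely rely on the `packaging` library for full specifier handling.

```python
from importlib import metadata

# Version bounds copied from the dependency list above. Packages listed
# without a bound (pandas, requests, ...) are left out of this sketch.
CONSTRAINTS = {
    "idna": [(">=", "2.5"), ("<", "3")],
    "pyarrow": [(">=", "3.0")],
    "snowflake-connector-python": [(">=", "2.4.0")],
}


def parse_version(text):
    """Convert '2.10.1' to (2, 10, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # a suffix such as 'rc1' ends the numeric part
    return tuple(parts)


def satisfies(version, bounds):
    """Return True if the version string meets every (operator, bound) pair."""
    parsed = parse_version(version)
    for op, bound in bounds:
        limit = parse_version(bound)
        # Pad both tuples with zeros so that '3' and '3.0' compare as equal.
        width = max(len(parsed), len(limit))
        left = parsed + (0,) * (width - len(parsed))
        right = limit + (0,) * (width - len(limit))
        if op == ">=" and not left >= right:
            return False
        if op == "<" and not left < right:
            return False
    return True


def check_installed(constraints=CONSTRAINTS):
    """Map each package name to (installed version or None, bounds satisfied)."""
    report = {}
    for name, bounds in constraints.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, satisfies(installed, bounds))
    return report
```

Calling `check_installed()` returns one entry per constrained package, with `None` as the version when the package is not installed.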

Release Notes

  • v0.2.5 (Aug 18, 2021)

    • adds handling and user notification for dataframes with a high proportion of null values, which would otherwise not work well with evaluate.profile or evaluate.feature_importance
  • v0.2.4 (Aug 4, 2021)

    • supports registering tables whose names are reserved Snowflake keywords (e.g. ACCOUNT, CASE, ORDER) as Rasgo Sources
  • v0.2.3 (July 30, 2021)

    • introduces the publish.features_from_source_code() function, which lets users pass a custom SQL string to create a view in Snowflake using their own code. The function registers a child source based on the parent source provided as input, then registers features from the new child source table.
    • introduces a new workflow in the publish.source_data() function. Pass in source_type="sql", sql_definition="<valid sql select string>" to create a new Rasgo DataSource as a view in Snowflake using custom SQL.
    • makes the features parameter optional in the publish.features_from_source() function. If the parameter is not passed, all columns in the underlying table that are not in the dimensions list will be registered as features.
    • adds trigger_stats parameter to all publish methods. When set to False, statistical profiling will not run for the data objects being published. Default = True.
    • adds verbose parameter to all publish methods. When set to True, prints status messages to stdout to update users on the progress of the function. Default = False.
    • introduces .sourceCode property on Rasgo DataSource and FeatureSet classes to display the SQL or python source code used to build the underlying table
    • introduces .render_sql_definition() method on Collection class to display the SQL used to create the underlying collection view
    • introduces .dimensions property on Rasgo Collection class to display all unique dimension columns in a Collection
    • introduces trigger_stats parameter in collection.generate_training_data() method to allow users to generate a SQL view without kicking off correlation and join stats. Set to False to suppress stats jobs. Default = True.
    • adds support for the optional CatBoost parameter train_dir in the evaluate.feature_importance() function, which lets users choose where temporary training files are generated
  • v0.2.2 (July 14, 2021)

    • Allow for consistency in evaluate.feature_importance() evaluation metrics for unchanged dataframes
    • Allow users to control certain CatBoost parameters when running evaluate.feature_importance()
  • v0.2.1 (July 01, 2021)

    • expand evaluate.feature_importance() to support calculating importance for collections
  • v0.2.0 (June 24, 2021)

    • introduce publish.experiment() method to fast track dataframes to Rasgo objects
    • fix register bug
  • v0.1.14 (June 17, 2021)

    • improve new user signup experience in register() method
    • fix dataframe bug when experiment wasn't set
  • v0.1.13 (June 16, 2021)

    • intelligently run Regressor or Classifier model in evaluate.feature_importance()
    • improve model performance statistics in evaluate.feature_importance(): include AUC, Logloss, precision, recall for classification
  • v0.1.12 (June 11, 2021)

    • support fully qualified table names (fqtn) in the table parameter of publish.source_data()
    • trim timestamps in dataframe profiles to second grain
  • v0.1.11 (June 9, 2021)

    • hotfix for unexpected histogram output
  • v0.1.10 (June 8, 2021)

    • pin pyarrow dependency to < version 4.0 to prevent segmentation fault errors
  • v0.1.9 (June 8, 2021)

    • improve model performance in evaluate.feature_importance() by adding test set to catboost eval
  • v0.1.8 (June 7, 2021)

    • evaluate.train_test_split() function supports non-timeseries dataframes
    • evaluate.feature_importance() function now runs on an 80% training set
    • adds timeseries_index parameter to evaluate.feature_importance() & prune.features() functions
  • v0.1.7 (June 2, 2021)

    • expands dataframe series type recognition for profiling
  • v0.1.6 (June 2, 2021)

    • cleans up dataframe profiles to enhance stats and visualization for non-numeric data
  • v0.1.5 (June 2, 2021)

    • introduces pip install "pyrasgo[df]" option which will install: shap, catboost, & scikit-learn
  • v0.1.4 (June 2, 2021)

    • various improvements to dataframe profiles & feature_importance
  • v0.1.3 (May 27, 2021)

    • introduces experiment tracking on dataframes
    • fixes errors when running feature_importance on dataframes with NaN values
  • v0.1.2 (May 26, 2021)

    • generates column profile automatically when running feature_importance
  • v0.1.1 (May 24, 2021)

    • supports sharing public dataframe profiles
    • enforces assignment of granularity to dimensions in Publish methods based on list ordering
  • v0.1.0 (May 17, 2021)

    • introduces dataframe methods: evaluate, prune, transform
    • supports free pyrasgo trial registration
  • v0.0.79 (April 19, 2021)

    • support additional datetime data types on Features
    • resolve import errors
  • v0.0.78 (April 5, 2021)

    • adds include_shared param to get_collections() method
  • v0.0.77 (April 5, 2021)

    • adds convenience method to rename a Feature’s displayName
    • adds convenience method to promote a Feature from Sandbox to Production status
    • fixes permissions bug when trying to read Community data sources from a public org
  • v0.0.76 (April 5, 2021)

    • adds columns to DataSource primitive
    • adds verbose error message to inform users when a Feature name conflict is preventing creation
  • v0.0.75 (April 5, 2021)

    • introduce interactive Rasgo primitives
  • v0.0.74 (March 25, 2021)

    • upgrade Snowflake python connector dependency to 2.4.0
    • upgrade pyarrow dependency to 3.0
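
The v0.2.3 note on publish.features_from_source() describes a default: when the features parameter is omitted, every column that is not in the dimensions list is registered as a feature. The sketch below illustrates that selection rule in plain Python. It is an assumption-laden illustration, not pyRasgo's implementation: the function name `resolve_features`, the exact-match comparison of column names, and the sample column names are all hypothetical.

```python
def resolve_features(columns, dimensions, features=None):
    """Return the columns that would be treated as features.

    columns:    all column names in the underlying table
    dimensions: column names used as dimensions
    features:   optional explicit list; when None, fall back to
                "every column that is not a dimension"
    """
    if features is not None:
        # Explicit list given: check that each name exists in the table.
        missing = [name for name in features if name not in columns]
        if missing:
            raise ValueError(f"columns not found in table: {missing}")
        return list(features)

    # Default rule from the release note: all non-dimension columns,
    # kept in their original table order.
    dimension_set = set(dimensions)
    return [name for name in columns if name not in dimension_set]


# Hypothetical table layout used only for illustration.
table_columns = ["DATE", "STORE_ID", "SALES", "TEMPERATURE"]
table_dimensions = ["DATE", "STORE_ID"]

default_selection = resolve_features(table_columns, table_dimensions)
explicit_selection = resolve_features(
    table_columns, table_dimensions, features=["SALES"]
)
```

Here `default_selection` holds the two non-dimension columns, while `explicit_selection` holds only the column that was named.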
