Stratosphere
A lightweight experimentation toolkit for data scientists.
Designed for simplicity, efficiency, and robustness, stratosphere lets you:
- Define your experiments programmatically
- Execute them in parallel with different backends
- Track their real-time metrics and final results
- Store them as serialized objects and tabular data in your database(s)
- Query them with the best-suited interface: SQL, Pandas, or Python
Built on top of solid components: SQLAlchemy, SQLite, Pandas, Joblib and Dask.
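The define/execute/store/query cycle listed above can be sketched with the standard library alone. This is not stratosphere's API: concurrent.futures stands in for Joblib/Dask, sqlite3 stands in for SQLAlchemy, and the `experiment` function and `results` table are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def experiment(alpha: float, seed: int) -> dict:
    """Hypothetical experiment: returns the final metric for one configuration."""
    score = round(1.0 / (1.0 + alpha) + seed * 0.01, 4)
    return {"alpha": alpha, "seed": seed, "score": score}


def run_all(configs, max_workers=4):
    # Execute the experiments concurrently; map() preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cfg: experiment(**cfg), configs))


def store(conn, results):
    # Store the final results as tabular data, one row per experiment.
    conn.execute("CREATE TABLE IF NOT EXISTS results (alpha REAL, seed INTEGER, score REAL)")
    conn.executemany("INSERT INTO results VALUES (:alpha, :seed, :score)", results)
    conn.commit()


# Define the experiments programmatically: a small grid of configurations.
configs = [{"alpha": a, "seed": s} for a, s in product([0.1, 1.0], [0, 1])]
conn = sqlite3.connect(":memory:")
store(conn, run_all(configs))

# Query the stored results with plain SQL.
best = conn.execute(
    "SELECT alpha, seed, score FROM results ORDER BY score DESC LIMIT 1"
).fetchone()
print(best)
```

A thread pool keeps the sketch free of pickling concerns; for CPU-bound experiments a process pool (Joblib's default loky backend is process-based) is the more usual choice. With pandas installed, `pd.read_sql_query("SELECT * FROM results", conn)` returns the same table as a DataFrame.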
Installation
It officially requires Python 3.8.15, but it can be forced to work with Python 3.7.15 just fine.
- With PyPI:
pip install stratosphere --upgrade
- With Poetry:
poetry add stratosphere
To run it on Google Colab, install it as follows:
# Install dependencies/update packages
!pip install pandas joblib sqlalchemy sqlalchemy-utils ulid-py psycopg2-binary \
cloudpickle colorama tabulate ipywidgets tqdm scikit-learn "dask[complete]" --upgrade
# Install the latest compatible stratosphere version, ignoring the python version and dependencies
!pip install stratosphere==0.1.13 --ignore-requires-python --no-dependencies
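After the install cell, a quick sanity check can confirm which interpreter the Colab runtime uses and which versions pip actually resolved. A minimal sketch using the standard library's `importlib.metadata` (on Python 3.7 it assumes the `importlib_metadata` backport is available); the package list is illustrative.

```python
import sys

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("Python", ".".join(map(str, sys.version_info[:3])))
if sys.version_info < (3, 8):
    print("Below the officially supported Python 3.8: the forced install route applies.")
for name, ver in installed_versions(["stratosphere", "pandas", "sqlalchemy", "joblib"]).items():
    print(f"{name}: {ver or 'not installed'}")
```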
Documentation
- Quick demo on Colab
- Follow the tutorial notebooks to learn the basic concepts
You can run the tutorial notebooks in Colab as follows:
- Open the notebook on GitHub, and substitute github.com with githubtocolab.com in the URL
- Add a cell at the beginning that installs stratosphere, following the Installation instructions for Colab
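The URL substitution in the first step can be done programmatically by swapping only the host, so the rest of the path is untouched. A small sketch; the repository path in the example is a hypothetical placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def to_colab_url(github_url: str) -> str:
    """Replace the github.com host with githubtocolab.com, keeping path and query intact."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {github_url}")
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


# Hypothetical notebook URL, for illustration only.
print(to_colab_url("https://github.com/user/repo/blob/main/notebooks/tutorial.ipynb"))
```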
Project pages
License
This project is licensed under the terms of the BSD 3-Clause License.
Development
In this section, I document how I create and manage the dev environment for this project.
These instructions have been tested on macOS Monterey @ MacBook Pro M2, with Python 3.8.10 and Python 3.10.7.
Set up the system
- Install command line tools
xcode-select --install
- Install pyenv/pyenv-virtualenv
brew update
brew install pyenv pyenv-virtualenv
Configure the shell by adding the following to ~/.zshrc:
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
export PYENV_VIRTUALENV_DISABLE_PROMPT=1
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
- List the installed Python versions:
pyenv versions
- List the Python versions available for installation:
pyenv install --list
- Install a specific Python version
pyenv install 3.10.7
- (Optional) Set a global pyenv Python version
pyenv global 3.10.7
- Install poetry
brew install poetry
poetry config virtualenvs.in-project true
(Optional) Optimizing the Zsh shell
The powerlevel10k theme lets you customize the Zsh prompt, showing the current folder, git status, and active environment. My .zshrc:
# Enable Powerlevel10k instant prompt. Should stay close to the top of ~/.zshrc.
# Initialization code that may require console input (password prompts, [y/n]
# confirmations, etc.) must go above this block; everything else may go below.
if [[ -r "${XDG_CACHE_HOME:-$HOME/.cache}/p10k-instant-prompt-${(%):-%n}.zsh" ]]; then
source "${XDG_CACHE_HOME:-$HOME/.cache}/p10k-instant-prompt-${(%):-%n}.zsh"
fi
source ~/bin/powerlevel10k/powerlevel10k.zsh-theme
# To customize prompt, run `p10k configure` or edit ~/.p10k.zsh.
[[ ! -f ~/.p10k.zsh ]] || source ~/.p10k.zsh
# Required, to display the active environment on the prompt (right side)
plugins=(virtualenv)
Useful aliases:
alias ll="/bin/ls -la"
alias ls="/bin/ls -laG"
Manage the project environment
Creating and removing the environment
To create it:
- List the available Python versions:
pyenv versions
- Create the environment (./.venv):
cd stratosphere
poetry env use 3.10.7
- Check the correct installation of the Poetry environment
poetry env info
To remove it:
cd stratosphere
rm -rf .venv
Installing the project in development mode
- Activate the environment
cd stratosphere
poetry shell
- Install the project (edit mode) in the Poetry environment:
poetry install
- Run the tests
poetry run pytest
Useful Poetry commands to maintain the environment
Add a new package:
poetry add pandas
Add a new dev package:
poetry add --group dev jupyterlab
Update the lock file (to be done after changing packages):
poetry lock
List the available packages:
poetry show
Update packages to their latest compatible versions:
poetry update
Show the Poetry configuration:
poetry config --list
Show the path of the Poetry environment:
poetry env info -p
Check validity of pyproject.toml:
poetry check
Publish the package to PyPI, after bumping the version (patch):
poetry version patch
poetry publish --build -u "$PYPI_USERNAME" -p "$PYPI_PASSWORD"
Advanced topics
Running the project on Apple silicon
Situation
The project works fine on macOS Monterey @ MacBook Pro M2 with Python >= 3.8. All extras work with no issues.
The problems start if we want to support Python 3.7.15 (the latest version supported by Google Colab) on all platforms.
The latest versions of pandas, scipy, and numpy no longer support Python 3.7, so we must pin older versions.
In particular, these are the latest versions supported on Colab:
- scipy==1.7.3
- numpy==1.21.6
- pandas==1.3.5
- scikit-learn==1.0.2
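One way to keep a single requirements file for both interpreters is to attach PEP 508 environment markers to the pins listed above, so pip applies them only on Python 3.7. A sketch under that assumption; the helper name is hypothetical and the project itself is managed with Poetry.

```python
# Latest releases still supporting Python 3.7, as listed above.
PY37_PINS = {"scipy": "1.7.3", "numpy": "1.21.6", "pandas": "1.3.5", "scikit-learn": "1.0.2"}


def requirement_lines(pins):
    """Emit PEP 508 requirement lines: pinned on Python 3.7, unconstrained on 3.8+."""
    lines = []
    for name, ver in pins.items():
        lines.append(f'{name}=={ver}; python_version < "3.8"')
        lines.append(f'{name}; python_version >= "3.8"')
    return lines


print("\n".join(requirement_lines(PY37_PINS)))
```

Written to a requirements.txt, `pip install -r requirements.txt` evaluates each marker and keeps only the lines matching the running interpreter.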
Progress so far:
Once an environment is created, we can install most of the packages without problems (wheels are mostly unavailable, so this is quite slow):
pip install joblib sqlalchemy pandas tqdm ulid-py sqlalchemy-utils cloudpickle colorama
The challenge is installing scikit-learn, which depends on scipy==1.7.3.
Running pip install fails with a NotFoundError: No BLAS/LAPACK libraries found error. Based on:
- https://stackoverflow.com/questions/74113427/install-numpy-with-pyhon-3-7-on-macbook-m1
- https://stackoverflow.com/questions/65336789/numpy-build-fail-in-m1-big-sur-11-1
- https://github.com/pypa/pipenv/issues/4564#issuecomment-865077698
We can fix this error with:
brew install openblas lapack
export SYSTEM_VERSION_COMPAT=1
pip install Cython pythran pybind11
export LDFLAGS="-L/opt/homebrew/opt/openblas/lib -L/opt/homebrew/opt/lapack/lib"
export CPPFLAGS="-I/opt/homebrew/opt/openblas/include -I/opt/homebrew/opt/lapack/include"
export LAPACK=/opt/homebrew/opt/lapack/lib/liblapack.dylib
export BLAS=/opt/homebrew/opt/openblas/lib/libopenblas.dylib
export PKG_CONFIG_PATH="/opt/homebrew/opt/lapack/lib/pkgconfig:/opt/homebrew/opt/openblas/lib/pkgconfig"
pip install scipy==1.7.3 --no-use-pep517
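The same build environment can be assembled from Python, which makes one detail explicit: LDFLAGS and CPPFLAGS take -L/-I compiler flags, while PKG_CONFIG_PATH is a colon-separated list of directories. A sketch assuming Homebrew's default Apple silicon prefix /opt/homebrew; it reproduces the attempt above and does not by itself fix the arm64 build.

```python
import os
import subprocess
import sys


def scipy_build_env(brew_prefix="/opt/homebrew"):
    """Environment for building scipy from source against Homebrew's OpenBLAS/LAPACK."""
    openblas = f"{brew_prefix}/opt/openblas"
    lapack = f"{brew_prefix}/opt/lapack"
    env = dict(os.environ)
    env.update({
        "SYSTEM_VERSION_COMPAT": "1",
        # Compiler and linker flags carry -L/-I prefixes...
        "LDFLAGS": f"-L{openblas}/lib -L{lapack}/lib",
        "CPPFLAGS": f"-I{openblas}/include -I{lapack}/include",
        "LAPACK": f"{lapack}/lib/liblapack.dylib",
        "BLAS": f"{openblas}/lib/libopenblas.dylib",
        # ...whereas pkg-config expects plain directories joined by ":".
        "PKG_CONFIG_PATH": ":".join([f"{lapack}/lib/pkgconfig", f"{openblas}/lib/pkgconfig"]),
    })
    return env


# Only attempt the (slow) source build when explicitly requested with --run.
if __name__ == "__main__" and "--run" in sys.argv:
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "scipy==1.7.3", "--no-use-pep517"],
        env=scipy_build_env(),
        check=True,
    )
```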
However, this results in a new error: Undefined symbols for architecture arm64 [...] "_PyArg_ParseTuple" [...].
I haven't managed to fix this issue yet, and I'll likely just run the tests in a virtualized x86_64 environment.
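Since the plan is to run the tests on x86_64 rather than natively on arm64, it helps to confirm which architecture the interpreter actually runs as. A minimal standard-library sketch; the helper names are hypothetical.

```python
import platform
import sys


def runtime_arch() -> str:
    """Machine type reported for this interpreter process, e.g. 'arm64' or 'x86_64'."""
    return platform.machine()


def is_native_apple_silicon() -> bool:
    # True for an arm64 Python on macOS; an x86_64 build under Rosetta 2 reports 'x86_64'.
    return sys.platform == "darwin" and runtime_arch() == "arm64"


print(f"{sys.platform} / {runtime_arch()} / native Apple silicon: {is_native_apple_silicon()}")
```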
(Optional) Working with pyenv-virtualenv
We don't currently use pyenv-virtualenv, as Poetry is used to manage the project environment. Nevertheless, I am using it to investigate the compatibility issues with Python 3.7, removing the Poetry layer from the equation.
Creating an environment
Create it, and auto-activate it inside the project directory:
pyenv virtualenv 3.7.15 stratosphere37
pyenv activate stratosphere37
pip install --upgrade pip
pip install wheel
pyenv local stratosphere37
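Once `pyenv local` is set, a quick check from inside the project directory can confirm that the interpreter is a virtual environment on the expected Python version rather than the base install. A standard-library sketch; the expected version (3, 7) mirrors the environment above and the helper names are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    # venv (and virtualenv >= 20) change sys.prefix but keep sys.base_prefix;
    # legacy virtualenv (< 20) instead sets sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def check_env(expected=(3, 7)) -> dict:
    """Report the running Python version and whether it matches the expected environment."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "expected_version": tuple(sys.version_info[:2]) == tuple(expected),
        "virtualenv": in_virtualenv(),
    }


print(check_env())
```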
Removing an environment
pyenv uninstall 3.7.15/envs/stratosphere37
rm -rf ~/.pyenv/versions/3.7.15/envs/stratosphere37
To unlink it from a project:
rm stratosphere/.python-version
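`pyenv local` works by writing the environment name into a .python-version file, which pyenv looks up from the current directory upwards (after checking the PYENV_VERSION variable). A simplified sketch of that lookup, ignoring pyenv's global version file; the function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def pyenv_local_version(start: Path) -> Optional[str]:
    """Simplified pyenv resolution: PYENV_VERSION wins, else the nearest .python-version
    in start or a parent directory; None means pyenv would fall back to the global version."""
    env_version = os.environ.get("PYENV_VERSION")
    if env_version:
        return env_version
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / ".python-version"
        if candidate.is_file():
            names = candidate.read_text().split()
            return names[0] if names else None
    return None


print(pyenv_local_version(Path.cwd()))
```

This is why deleting the project's .python-version file is enough to unlink the environment: the lookup then falls through to a parent directory or the global version.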
Hashes for stratosphere-0.1.15-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bb010f4110942d2d5c029e10b7106a711c04df8323e76ce6d6290c7b1d301fdc
MD5 | 011c7b17ded0eddcf477b27898e7603c
BLAKE2b-256 | cb7ada1a7aa81beb54ae2d2a3a3e2c5027e923f7c523092b52255ac31c0de177
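A downloaded wheel can be checked against the digests above before installing it. A sketch using hashlib, where BLAKE2b-256 means BLAKE2b with a 32-byte digest; the function names are hypothetical and the wheel path is whatever you downloaded.

```python
import hashlib
from pathlib import Path

# Digests listed above for stratosphere-0.1.15-py3-none-any.whl.
EXPECTED = {
    "sha256": "bb010f4110942d2d5c029e10b7106a711c04df8323e76ce6d6290c7b1d301fdc",
    "blake2b_256": "cb7ada1a7aa81beb54ae2d2a3a3e2c5027e923f7c523092b52255ac31c0de177",
}


def file_digests(path: Path, chunk_size: int = 1 << 16) -> dict:
    """Stream the file once, computing SHA256 and BLAKE2b-256 side by side."""
    sha256 = hashlib.sha256()
    blake = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            blake.update(chunk)
    return {"sha256": sha256.hexdigest(), "blake2b_256": blake.hexdigest()}


def verify(path: Path, expected=EXPECTED) -> bool:
    """True only if every computed digest matches the published one."""
    return file_digests(path) == expected
```

pip can enforce the same check at install time in hash-checking mode, using `--require-hashes` with `--hash=sha256:...` entries in a requirements file.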