BeamDS (Beam Data Science)
What is Beam for? ✨
Beam was created by data-science practitioners for data-science practitioners. It is designed as an ecosystem for developing and deploying data-driven algorithms in Python. It aims to increase productivity, efficiency, and performance in the research phase and to provide production-grade tools in the deployment part.
Our Guiding Principles ✍
- Support all phases of data-driven algorithm development:
- Data exploration
- Data manipulation, preprocessing, and ETLs (Extract, Transform and Load)
- Algorithm selection
- Algorithm training
- Hyperparameter tuning
- Model deployment
- Lifelong learning
- Production level coding from the first line of code: no more quick and dirty Proof Of Concepts (POC). Every line of code counts toward a production model.
- Consume all resources effectively: use multi-core processing, multiple GPUs, distributed computing, remote storage solutions, and databases to get the most productivity out of the resources at hand.
- Be agile: Development and production environments can change rapidly. Beam minimizes the friction of changing environments, filesystems, and computing resources to almost zero.
- Be efficient: every line of code in Beam is optimized to be as efficient as possible and to avoid unnecessary overheads.
- Easy-to-deploy and easy-to-use algorithms: make deployment as simple as a single line of code; import remote algorithms and services by their URI, and nothing more.
- Excel at your algorithms: Beam comes with state-of-the-art deep neural network implementations. It helps you store, analyze, and return to your running experiments with ease. When you are done with development, Beam helps you optimize your hyperparameters on your GPU machines.
- Data can be a hassle: Beam can manipulate complex and nested data structures, including reading, processing, chunking, multiprocessing, error handling, and writing.
- Be relevant: Beam is committed to staying relevant and updating towards the future of AI, adding support for Large Language Models (LLMs) and more advanced algorithms.
- Beam is the Swiss army knife that fits in your pocket: it is easy to install and maintain, and it comes with the Beam Docker image so that you can start developing and creating with zero effort, even without an internet connection.
Installation 🧷
To install the full package from PyPI use:
pip install beam-ds[all]
If you want to install only the data-science related components use:
pip install beam-ds[ds]
To install only the LLM (Large Language Model) related components use:
pip install beam-ds[llm]
The prerequisite packages are installed automatically; they are listed in the setup.cfg file.
Build from source 🚂
This BeamDS implementation follows the guide at https://packaging.python.org/tutorials/packaging-projects/
Install the build package:
python -m pip install --upgrade build
To rebuild and reinstall the package after updates:
- Run this command from the same directory where pyproject.toml is located:
python -m build
- Reinstall the package with pip:
pip install dist/*.whl --force-reinstall
Getting Started 🚀
There are several examples, both in .py files (in the examples folder) and in Jupyter notebooks (in the notebooks folder). Specifically, you can start by looking into the beam_resources.ipynb notebook, which familiarizes you with the different resources available in Beam.
Go to the beam_resources.ipynb page
The Beam-DS Docker Image 🛸
We provide a Docker image that contains all the packages required to run Beam-DS, along with many other packages useful for data-science development. We use it as the base image in our daily development process. It is based on the official NVIDIA PyTorch image.
To pull the image from Docker Hub use:
docker pull eladsar/beam:20240708
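Once pulled, the image can be started with GPU access. A minimal invocation sketch (the mount point /workspace and the interactive bash shell are assumptions; adjust them to your workflow):

```shell
# Start an interactive container with all GPUs visible and the
# current directory mounted inside the container (example mount point)
docker run -it --rm --gpus all \
    -v "$PWD":/workspace \
    eladsar/beam:20240708 \
    /bin/bash
```

The --gpus all flag requires the NVIDIA Container Toolkit to be installed on the host (see the section below on building the image from source).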
Building the Beam-DS docker image from source 🌱
The Docker image is based on the latest official NVIDIA PyTorch image. To build it on an Ubuntu host, you need to:
- Update the NVIDIA drivers to the latest version: https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-20-04-focal-fossa-linux
- Install Docker: https://docs.docker.com/desktop/linux/install/ubuntu/
- Install the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide
- Install and configure the NVIDIA container runtime: https://stackoverflow.com/a/61737404
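After completing these steps, GPU access from inside containers can be verified with a stock CUDA image before building the Beam image (the CUDA image tag here is an example; pick one matching your driver version):

```shell
# If the toolkit and runtime are configured correctly, this prints
# the same GPU table as running nvidia-smi directly on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```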
Build the Sphinx documentation
Follow https://github.com/cimarieta/sphinx-autodoc-example
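Following that guide, a typical build sequence looks like the sketch below. The docs/ layout and the src/beam module path are assumptions based on a standard sphinx-autodoc setup; substitute the paths used in this repository:

```shell
# Install Sphinx, generate API stubs from the package sources,
# then build the HTML documentation (paths are illustrative)
python -m pip install --upgrade sphinx
sphinx-apidoc -o docs/source src/beam
sphinx-build -b html docs/source docs/build/html
```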
Profiling your code with Scalene
Scalene is a high-performance Python profiler that supports GPU profiling. To analyze your code with Scalene, run it with the following arguments:
scalene --reduced-profile --outfile OUTFILE.html --html --- your_prog.py <your additional arguments>
Uploading the package to PyPI 🌏
- Install twine:
python -m pip install --user --upgrade twine
- Build the package:
python -m build
- Upload the package:
python -m twine upload --repository pypi dist/*
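Before uploading, the built distributions can be validated locally; twine's check command verifies that the package metadata and long description will render correctly on PyPI:

```shell
# Validate the metadata and README rendering of the built artifacts
python -m twine check dist/*
```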