Beam Datascience package

Reason this release was yanked:

contains import error

Project description

BeamDS (Beam Data Science)

What is Beam for?

Beam was created by data-science practitioners for data-science practitioners. It is designed as an ecosystem for developing and deploying data-driven algorithms in Python. It aims to increase productivity, efficiency, and performance in the research phase and to provide production-grade tools in the deployment phase.

Our Guiding Principles

Support all phases of data-driven algorithm development:
1. Data exploration
2. Data manipulation, preprocessing, and ETLs (Extract, Transform and Load)
3. Algorithm selection
4. Algorithm training
5. Hyperparameter tuning
6. Model deployment
7. Lifelong learning
Production level coding from the first line of code: no more quick and dirty Proof Of Concepts (POC). Every line of code counts toward a production model.
Use all resources effectively: use multi-core CPUs, multiple GPUs, distributed computing, remote storage solutions, and databases to get as much productivity as possible from the resources at hand.
Be agile: Development and production environments can change rapidly. Beam minimizes the friction of changing environments, filesystems, and computing resources to almost zero.
Be efficient: every line of code in Beam is optimized to be as efficient as possible and to avoid unnecessary overheads.
Easy to deploy and use algorithms: make deployment as easy as a single line of code, and import remote algorithms and services by their URI, nothing more.
Make your algorithms excel: Beam comes with some state-of-the-art deep neural network implementations. Beam will help you store, analyze, and return to your running experiments with ease. When you are done with development, Beam will help you optimize your hyperparameters on your GPU machines.
Data can be a hassle: Beam can manipulate complex and nested data structures, including reading, processing, chunking, multi-processing, error handling, and writing.
Be relevant: Beam is committed to staying relevant and updating towards the future of AI, adding support for Large Language Models (LLMs) and more advanced algorithms.
Beam is the Swiss army knife that fits in your pocket: it is easy to install and maintain, and it comes with the Beam Docker image so that you can start developing and creating with zero effort, even without an internet connection.

Installation

To install the full package from PyPI use:

pip install beam-ds[all]

If you want to install only the data-science related components use:

pip install beam-ds[ds]

To install only the LLM (Large Language Model) related components use:

pip install beam-ds[llm]

The prerequisite packages are installed automatically; they are listed in the setup.cfg file.
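After installing, it can be useful to confirm which version pip actually resolved, since individual releases may be yanked. The following is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper written for this illustration and is not part of Beam.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="beam-ds"):
    """Return the installed version string of a distribution, or None if absent.

    `dist_name` is the name used with pip (here assumed to be "beam-ds").
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("beam-ds is not installed in this environment")
    else:
        print(f"beam-ds version: {found}")
```

This reads the installed package metadata only, so it does not import the package itself.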

Build from source

This BeamDS implementation follows the guide at https://packaging.python.org/tutorials/packaging-projects/

Install the build package:

python -m pip install --upgrade build

Then run this command from the directory that contains pyproject.toml:

python -m build

After each update, reinstall the package with pip:

pip install dist/*.whl --force-reinstall
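The `dist/*.whl` pattern can match more than one wheel if older builds remain in the dist directory. As a minimal sketch, assuming you want only the most recently built wheel, the hypothetical helper below (`newest_wheel`, not part of Beam) picks it by modification time:

```python
from pathlib import Path


def newest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl file in dist_dir, or None."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel()
    if wheel is None:
        print("No wheel found in dist; run 'python -m build' first")
    else:
        # Print the reinstall command for the newest wheel only.
        print(f"pip install {wheel} --force-reinstall")
```

Removing the dist directory before each build is a simpler alternative when old wheels are not needed.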

Getting Started

There are several examples, both in .py files (in the examples folder) and in Jupyter notebooks (in the notebooks folder). Specifically, you can start by looking into the beam_resources.ipynb notebook, which familiarizes you with the different resources available in Beam.

Go To the beam_resource.ipynb page

Building the Beam-DS docker image

The Docker image is based on the latest official NVIDIA PyTorch image. To build the Docker image on an Ubuntu host, you need to:

Update the NVIDIA drivers to the latest version: https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-20-04-focal-fossa-linux
Install Docker: https://docs.docker.com/desktop/linux/install/ubuntu/
Install NVIDIA container toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide
Install and configure NVIDIA container runtime: https://stackoverflow.com/a/61737404

Build the Sphinx documentation

Follow https://github.com/cimarieta/sphinx-autodoc-example

Profiling your code with Scalene

Scalene is a high-performance Python profiler that supports GPU profiling. To analyze your code with Scalene use the following command:

scalene --reduced-profile --outfile OUTFILE.html --html --- your_prog.py <your additional arguments>
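To try the command without touching real project code, a small stand-in for `your_prog.py` is enough. The script below is a hypothetical toy workload, not part of Beam: it computes the same sum twice, once with an explicit loop and once with a generator expression, so the profile has two distinct functions to compare. The optional first argument (the problem size) plays the role of `<your additional arguments>`.

```python
import sys


def sum_of_squares_loop(n):
    """Sum i*i for i in range(n) with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_builtin(n):
    """Compute the same sum with the built-in sum() and a generator."""
    return sum(i * i for i in range(n))


def main(n=200_000):
    # Both calls do the same arithmetic; a profiler should attribute
    # time to each function separately.
    a = sum_of_squares_loop(n)
    b = sum_of_squares_builtin(n)
    print(f"n={n} loop={a} builtin={b}")
    return a, b


if __name__ == "__main__":
    # Optional first argument: problem size (falls back to the default).
    has_size = len(sys.argv) > 1 and sys.argv[1].isdigit()
    main(int(sys.argv[1]) if has_size else 200_000)
```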

Uploading the package to PyPI

Install twine:

python -m pip install --user --upgrade twine

Build the package:

python -m build

Upload the package:

python -m twine upload --repository pypi dist/*

Release history

Version	Release date
2.5.4	May 27, 2024
2.5.0b0 (pre-release)	Mar 31, 2024
2.4.9b0 (pre-release)	Mar 19, 2024
2.4.0b1 (pre-release)	Jan 6, 2024
2.3.8 (this version; yanked: contains import error)	Dec 27, 2023
2.3.3	Nov 16, 2023
2.3.2	Nov 14, 2023
2.3.0	Oct 26, 2023
2.2.0a1 (pre-release)	Sep 6, 2023
2.0.3	May 9, 2023
2.0.2	Apr 16, 2023
2.0.1	Apr 16, 2023
2.0.0	Apr 16, 2023
0.2.1	Nov 22, 2022
0.2.0	Sep 29, 2022
0.1.1	Sep 5, 2022
0.0.10	Jun 18, 2022

Download files

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distribution

beam_ds-2.3.8-py3-none-any.whl (229.7 kB view hashes)

Uploaded Dec 27, 2023 Python 3

Hashes for beam_ds-2.3.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d4ff78719a3af3a4d0ff679d1bc50ddcf9ef85f1024e2f12f6909d554fab0897`
MD5	`c87cc2e6d68aee11a2273ecd32b8bb56`
BLAKE2b-256	`baccba3bb4c27d2fd52ce44a9dd936ce8ffb84eeed769a35b765ba2ee8c164b4`
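The digests above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using the standard library's `hashlib`; the helper names (`sha256_of_file`, `matches_expected`) and the local file path are assumptions made for illustration, while the expected SHA256 value is copied from the table above.

```python
import hashlib

# SHA256 digest listed above for beam_ds-2.3.8-py3-none-any.whl
EXPECTED_SHA256 = "d4ff78719a3af3a4d0ff679d1bc50ddcf9ef85f1024e2f12f6909d554fab0897"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("beam_ds-2.3.8-py3-none-any.whl")` would return True only if the local file is byte-for-byte identical to the uploaded wheel; the path here assumes the wheel sits in the current directory.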