
A statistics and machine learning package.


🔥 Ember

Ember is a statistics and machine learning library, written in C++ with Python bindings, that I built for my personal use. It is mainly for educational purposes, but it's quite functional and can be used to train models on a variety of datasets.

Look here to see the methods it supports.

Installation

System Support

This library supports both x86_64/amd64 and arm64/aarch64. Check whether your system is supported out of the box below. The library requires very few dependencies, so as long as your machine has a C++ compiler and Python, you should be able to get it working by adjusting the CMake and setuptools files.

Supported platforms, tested against Python 3.7 through 3.13:

x86_64/amd64: Ubuntu 24.04 / 22.04 / 20.04, ArchLinux 6.6.68 LTS, Debian 13 / 12 / 11 / 10, LinuxMint 22 / 21, MacOS 10.15 Catalina down through 10.7 Lion, Windows 11 / 10 / 8 / 7

arm64/aarch64: Ubuntu 24.04 / 22.04 / 20.04, MacOS 15.x Sequoia down through 11.x Big Sur, Windows 12

Compiling the aten Library

Your machine will need system dependencies such as CMake, a C++ compiler, and pybind11. The library uses C++17. Ideally you will already have git and conda installed. For more specific instructions on installing these on your system, refer to the more detailed installation guide.

Clone the repo, then pip install, which will run setup.py.

git clone git@github.com:mbahng/pyember.git 
cd pyember 
pip install .

This runs cmake on aten/CMakeLists.txt, which does the following.

  1. It always calls aten/src/CMakeLists.txt, which compiles and links the source files of the C++ tensor library.
  2. If BUILD_PYTHON_BINDINGS=ON (the default), it further calls aten/bindings/CMakeLists.txt to generate a .so file that can be imported into ember.
  3. If BUILD_DEV=ON, it further calls aten/test/CMakeLists.txt to compile the C++ unit testing suite.

If there are problems with building, check, in order,

  1. Whether build/ has been created. This is the first step in setup.py.
  2. Whether main.cpp and, if BUILD_DEV=ON, the C++ unit test files have been compiled, i.e. whether the build/src/main and build/test/tests executables exist.
  3. Whether build/*/aten.cpython-3**-darwin.so exists (somewhere in the build directory, depending on the machine). The Makefile generated by aten/bindings/CMakeLists.txt produces this file.
  4. Whether the setup() function has copied this .so file to ember/aten.cpython-3**-darwin.so. You should see either a success message saying that it has been moved or an error. The .so file must live within ember, the actual library, since ember/__init__.py must access it at the same directory level.

Testing and Development

The pip install supports two additional environment variables. Note that the following command is whitespace-sensitive.

CMAKE_DEBUG=1 CMAKE_DEV=1 pip install .
  1. Setting CMAKE_DEBUG=1 compiles the aten library in debug mode (-g), which I use when running gdb/lldb on the compiled code.
  2. Setting CMAKE_DEV=1 compiles the C++ testing suite as well. For this you will also need to install GoogleTest. A code snippet for Ubuntu and Debian is shown below.
sudo apt-get install libgtest-dev 
cd /usr/src/gtest 
sudo cmake CMakeLists.txt 
sudo make 
sudo cp lib/*.a /usr/lib

If you would like to run tests and/or develop the package yourself, run the script ./run_tests.sh all (pass python to run just the Python tests, or cpp for just the C++ tests), which will

  1. Run all C++ unit tests for aten, ensuring that all functions work correctly.
  2. Run all Python unit tests for ember, ensuring that additional functions work correctly and that the C++ functions are bound correctly.

The stub (.pyi) files for aten are located in ember/aten.

Repository Structure

I modeled much of the structure on PyTorch and tinygrad. Very briefly,

  1. aten/ contains the header and source files for the C++ low-level tensor library, such as basic operations and an autograd engine.
    1. aten/src contains all the source files and definitions.
    2. aten/bindings contains the pybindings.
    3. aten/test contains all the C++ testing modules for aten.
  2. ember/ contains the actual library, supporting high level models, objectives, optimizers, dataloaders, and samplers.
    1. ember/aten contains the stub files.
    2. ember/datasets contains all preprocessing tools, such as datasets and loaders, standardization, and cross-validation checks.
    3. ember/models contains all machine learning models.
    4. ember/objectives contains all loss functions and regularizers.
    5. ember/optimizers contains all the optimizers/solvers, such as iterative (e.g. SGD), greedy (e.g. decision tree splitting), and one-shot (e.g. least-squares solutions).
    6. ember/samplers contains all samplers (e.g. MCMC, SGLD).
  3. docs/ contains detailed documentation about each function.
  4. examples/ are example python scripts on training models.
  5. tests/ are python testing modules for the ember library.
  6. docker/ contains docker images of all the operating systems and architectures I tested ember on. General workflows on setting up the environment can be found there for supported machines.
  7. setup.py allows you to pip install this as a package.
  8. run_tests.sh is the main test-running script.

For a more detailed explanation, look here.

Getting Started

Ember Tensors and GradTensors

ember.Tensors represent data and parameters, while ember.GradTensors represent gradients. An advantage of this package is that rather than supporting only batched vector operations and matrix multiplications, we can also perform general contractions of rank $(N, M)$-tensors, a generalization of matrix multiplication. This lets us represent and use the full power of higher-order derivatives of arbitrary functions $f: \mathbb{R}^{\mathbf{M}} \rightarrow \mathbb{R}^{\mathbf{N}}$, where $\mathbf{M} = (M_1, \ldots, M_m)$ and $\mathbf{N} = (N_1, \ldots, N_n)$ are vectors, not just scalars, representing the dimensions of each space.
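As a point of reference only (ember's own API differs), NumPy's np.tensordot performs the same kind of contraction, and shows how ordinary matrix multiplication is the special case of contracting a single pair of axes:

```python
import numpy as np

# Contract a rank-3 tensor of shape (2, 3, 4) with a (4, 2) matrix over
# the shared axis of length 4; the result has shape (2, 3, 2).
A = np.arange(24.0).reshape(2, 3, 4)
B = np.arange(8.0).reshape(4, 2)
C = np.tensordot(A, B, axes=([2], [0]))
print(C.shape)  # (2, 3, 2)

# Ordinary matrix-vector multiplication is the one-axis special case.
M = np.arange(6.0).reshape(2, 3)
v = np.arange(3.0)
assert np.allclose(np.tensordot(M, v, axes=([1], [0])), M @ v)
```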

Tensors are multidimensional arrays that can be initialized in a number of ways. GradTensors are initialized during the backpropagation method, but we can explicitly set them if desired.

import ember 

a = ember.Tensor([2]) # scalar
b = ember.Tensor([1, 2, 3])  # vector 
c = ember.Tensor([[1, 2], [3, 4]]) # rank-2 tensor (matrix)
d = ember.Tensor([[[1, 2]]]) # rank-3 tensor

Say that you have a series of elementary operations on tensors.

a = ember.Tensor([2, -3]) 
h = a ** 2
b = ember.Tensor([3, 5])

c = b * h

d = ember.Tensor([10, 1])
e = c.dot(d)

f = ember.Tensor([-2])

g = f * e

Automatic Differentiation

The C++ backend records a directed acyclic graph (DAG) of the operations used to compute g. You can then run g.backprop() to compute the gradients by applying the chain rule; this constructs the DAG and returns a topological sorting of its nodes. The gradients themselves, which are technically Jacobian matrices, are then updated: each mapping x -> y stores a gradient tensor on x with value dy/dx. The gradients can either be accumulated with backprop(intermediate=False), so that the chain rule is not applied yet, or with intermediate=True the chain rule is applied to compute the derivative of the tensor backprop was called on with respect to all other tensors.

top_sort = g.backprop()
print(a.grad) # [[4.0, 0.0], [0.0, -6.0]]
print(h.grad) # [[3.0, 0.0], [0.0, 5.0]]
print(b.grad) # [[4.0, 0.0], [0.0, 9.0]]
print(c.grad) # [[10.0, 1.0]]
print(d.grad) # [[12.0, 45.0]]
print(e.grad) # [[-2.0]]
print(f.grad) # [[165.0]]
print(g.grad) # [[1.0]]

Finally, we can visualize this using the networkx package.

(Figure: the computational DAG of g, rendered with networkx.)
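To make the mechanics concrete, here is a tiny standalone pure-Python sketch of reverse-mode autodiff over a DAG. It is illustrative only, not ember's implementation: it scalarizes the example above and reproduces the f.grad and e.grad values printed earlier.

```python
# Minimal standalone sketch of reverse-mode autodiff over a DAG.
# Illustrative only -- NOT ember's implementation.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # list of (parent_node, local_derivative)
        self.grad = 0.0

def mul(x, y):
    return Node(x.value * y.value, [(x, y.value), (y, x.value)])

def add(x, y):
    return Node(x.value + y.value, [(x, 1.0), (y, 1.0)])

def backprop(root):
    # Topologically sort the DAG, then apply the chain rule in reverse.
    order, seen = [], set()
    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for p, _ in n.parents:
            visit(p)
        order.append(n)
    visit(root)
    root.grad = 1.0
    for n in reversed(order):
        for p, local in n.parents:
            p.grad += local * n.grad
    return order

# Scalarized version of the chain above: e = sum_i b_i * a_i^2 * d_i, g = f * e
a = [Node(2.0), Node(-3.0)]
b = [Node(3.0), Node(5.0)]
d = [Node(10.0), Node(1.0)]
h = [mul(x, x) for x in a]                  # h = a ** 2
c = [mul(x, y) for x, y in zip(b, h)]       # c = b * h
e = add(mul(c[0], d[0]), mul(c[1], d[1]))   # e = c . d
f = Node(-2.0)
g = mul(f, e)

backprop(g)
print(f.grad)   # 165.0, matching f.grad above (dg/df = e)
print(e.grad)   # -2.0, matching e.grad above (dg/de = f)
```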

Linear Regression

To perform linear regression, use the LinearRegression model.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.LinearRegression(15) 
mse = ember.objectives.MSELoss()

for epoch in range(500): 
  loss = None
  for x, y in dl: 
    y_ = model.forward(x)  
    loss = mse(y, y_)
    loss.backprop()
    model.step(1e-5) 

  print(loss)
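For intuition, the training loop above is ordinary mini-batch gradient descent on the MSE objective. A NumPy sketch of the same loop on synthetic data (this mimics, but does not use, ember's API; the learning rate and data here are illustrative) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 14))              # N=20 samples, D=14 features
w_true = rng.normal(size=14)
y = X @ w_true + 0.1 * rng.normal(size=20)

w, b, lr = np.zeros(14), 0.0, 1e-2
for epoch in range(500):
    for i in range(0, 20, 2):              # batch_size=2, like the Dataloader
        xb, yb = X[i:i+2], y[i:i+2]
        err = xb @ w + b - yb              # forward-pass residual
        w -= lr * 2 * xb.T @ err / len(xb) # gradient of MSE w.r.t. w
        b -= lr * 2 * err.mean()           # gradient of MSE w.r.t. b

final_mse = np.mean((X @ w + b - y) ** 2)
print(final_mse)                           # small residual loss
```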

K Nearest Neighbors

To build a simple K Nearest Neighbors regressor, use the following model. The forward method scans the whole dataset, so we must pass the dataset to the model at instantiation. Note that we need neither a dataloader nor a backpropagation method, since we aren't iteratively updating gradients, though we still want to show the loss.

import ember
from ember.models import KNearestRegressor
from ember.datasets import LinearDataset

ds = LinearDataset(N=20, D=3)
model = KNearestRegressor(dataset=ds, K=1)
mse = ember.objectives.MSELoss() 

for k in range(1, 21): # hyperparameter tuning
  model.K = k
  print(f"{k} ===") 
  loss = 0
  for i in range(len(ds)): 
    x, y = ds[i] 
    y_ = model.forward(x) 
    loss = loss + mse(y, y_) 

  print(loss)
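The underlying prediction rule is simple enough to sketch in a few lines of NumPy (knn_predict is a hypothetical helper for illustration, not ember's API):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    # Euclidean distance to every training point, then average the
    # targets of the k nearest neighbors.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# With K=1, querying a training point returns its own target exactly.
assert np.isclose(knn_predict(X, y, X[0], k=1), y[0])
```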

Multilayer Perceptrons

To instantiate an MLP, just call it from models. Here we make a 2-layer MLP on a dummy dataset. For now, only SGD with batch size 1 is supported.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.MultiLayerPerceptron(15, 10) 
mse = ember.objectives.MSELoss()

for epoch in range(500):  
  loss = None
  for x, y in dl: 
    y_ = model.forward(x) 
    loss = mse(y, y_)
    loss.backprop() 
    model.step(1e-5)

  print(loss)

Its output over about 1 minute of training:

LOSS = 256733.64437981808
LOSS = 203239.08846901066
LOSS = 160223.4554735339
LOSS = 125704.33716141782
LOSS = 98074.96981384761
LOSS = 76026.19871949886
LOSS = 58491.92389906721
LOSS = 44604.493032865605
LOSS = 33658.23285350788
LOSS = 25079.638682869212
LOSS = 18403.01062298029
LOSS = 13250.54496118543
LOSS = 9316.069468116035
LOSS = 6351.758695807299
LOSS = 4157.286052245369
LOSS = 2570.96819208677
LOSS = 1462.5380952427417
LOSS = 727.2493587808174
LOSS = 281.0683664354656
LOSS = 56.75530418715159
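For reference, the forward and backward passes of such a 2-layer network can be written out by hand in NumPy. This is a from-scratch sketch on synthetic data (shapes, target function, and learning rate are illustrative, not ember's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 14))
y = np.sin(X[:, :1])                       # (20, 1) nonlinear target

# 2-layer MLP: 14 -> 10 -> 1 with a ReLU hidden layer
W1 = 0.1 * rng.normal(size=(14, 10)); b1 = np.zeros(10)
W2 = 0.1 * rng.normal(size=(10, 1));  b2 = np.zeros(1)

def mse():
    H = np.maximum(X @ W1 + b1, 0)
    return np.mean((H @ W2 + b2 - y) ** 2)

mse0, lr = mse(), 1e-2
for epoch in range(200):
    for i in range(20):                    # SGD with batch size 1
        x, t = X[i:i+1], y[i:i+1]
        h = np.maximum(x @ W1 + b1, 0)     # forward
        out = h @ W2 + b2
        g_out = 2 * (out - t)              # backward: dMSE/dout
        g_W2, g_b2 = h.T @ g_out, g_out[0]
        g_h = (g_out @ W2.T) * (h > 0)     # ReLU gate
        g_W1, g_b1 = x.T @ g_h, g_h[0]
        W2 -= lr * g_W2; b2 -= lr * g_b2
        W1 -= lr * g_W1; b1 -= lr * g_b1

mse_final = mse()
print(mse0, mse_final)                     # mse_final should be well below mse0
```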

Datasets

Models and Training

Monte Carlo Samplers

Contributing

To implement a new functionality in the aten library, you must

  1. Add the class or function header in aten/src/Tensor.h
  2. Add the implementation in the correct file (or create a new one) in aten/src/*Tensor/*.cpp. Make sure to update aten/src/CMakeLists.txt if needed.
  3. Add its pybindings (if a public function that will be used in ember) in aten/bindings/*bindings.cpp. Make sure to update aten/bindings/CMakeLists.txt if needed.
  4. Add relevant C++ tests in aten/test/.
  5. Optionally, try it out in a personal script as a sanity check.
  6. Add to the stub files in ember/aten/*.pyi.
  7. Add Python tests in tests/.
  8. If everything passes, you can submit a pull request.


