
A statistics and machine learning package.

Project description

🔥 Ember

Ember is a statistics and ML library, written in C++ and Python, for my personal use. I mainly built it for educational purposes, but it's quite functional and can be used to train models on several datasets.

Look here to see the methods it supports.

Installation

System Support

This library supports both x86_64/amd64 and arm64/aarch64. Check the table below to see whether your system is supported out of the box. The library has very few dependencies, so as long as your machine has a C++ compiler and Python, you should be able to get it working by adjusting the CMake and setuptools files.

x86_64 Python 3.13 Python 3.12 Python 3.11 Python 3.10 Python 3.9 Python 3.8 Python 3.7
Ubuntu 24.04
Ubuntu 22.04
Ubuntu 20.04
ArchLinux 6.6.68 LTS
Debian 13
Debian 12
Debian 11
Debian 10
LinuxMint 22
LinuxMint 21
MacOS 10.15 Catalina
MacOS 10.14 Mojave
MacOS 10.13 High Sierra
MacOS 10.12 Sierra
MacOS 10.11 El Capitan
MacOS 10.10 Yosemite
MacOS 10.9 Mavericks
MacOS 10.8 Mountain Lion
MacOS 10.7 Lion
Windows 11
Windows 10
Windows 8
Windows 7
ARM64 Python 3.13 Python 3.12 Python 3.11 Python 3.10 Python 3.9 Python 3.8 Python 3.7
Ubuntu 24.04
Ubuntu 22.04
Ubuntu 20.04
MacOS 15.x Sequoia
MacOS 14.x Sonoma
MacOS 13.x Ventura
MacOS 12.x Monterey
MacOS 11.x Big Sur
Windows 12

Compiling the aten Library

Your machine will need system dependencies such as CMake, a C++ compiler, and pybind11. The library uses C++17. Ideally, you will already have git and conda installed. For more specific instructions on installing these on your system, refer to the more detailed installation guide.

Clone the repository with git, then run pip install, which runs setup.py.

git clone git@github.com:mbahng/pyember.git 
cd pyember 
pip install .

This runs CMake on aten/CMakeLists.txt, which does the following.

  1. It always calls aten/src/CMakeLists.txt, which compiles and links the source files of the C++ tensor library.
  2. If BUILD_PYTHON_BINDINGS=ON (the default), it also calls aten/bindings/CMakeLists.txt to generate a .so file that can be imported into ember.
  3. If BUILD_DEV=ON, it also calls aten/test/CMakeLists.txt to compile the C++ unit testing suite.

If the build fails, check the following, in order:

  1. Whether build/ has been created. This is the first step in setup.py.
  2. Whether main.cpp and, if BUILD_DEV=ON, the C++ unit test files have been compiled, i.e. whether the build/src/main and build/test/tests executables exist.
  3. Whether build/*/aten.cpython-3**-darwin.so exists somewhere in the build directory (the exact location depends on the machine, and the platform part of the filename differs outside macOS). The Makefile generated by aten/bindings/CMakeLists.txt produces this file.
  4. Whether the .so file was copied to ember/aten.cpython-3**-darwin.so. The setup() function copies it there immediately after the build and prints either a success message or an error. The .so file must live inside ember, the actual library, since ember/__init__.py imports it from the same directory level.
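Checks 1, 3 and 4 can be scripted. The sketch below is a hypothetical helper (check_build is not part of pyember) that assumes it is run from the repository root and simply globs for the compiled aten extension module; it does not cover check 2. It also reports the extension suffix your interpreter expects, which tells you what the "aten.cpython-3**-..." filename should look like on your machine.

```python
import glob
import os
import sysconfig


def check_build(repo_root="."):
    """Hypothetical helper (not part of pyember): report the artifacts from checks 1, 3 and 4."""
    build_dir = os.path.join(repo_root, "build")

    def find_ext(pattern):
        # Compiled extension modules end in .so on Linux/macOS and .pyd on Windows.
        return sorted(p for p in glob.glob(pattern, recursive=True) if p.endswith((".so", ".pyd")))

    return {
        "build_dir_exists": os.path.isdir(build_dir),                                # check 1
        "built_extensions": find_ext(os.path.join(build_dir, "**", "aten.*")),      # check 3
        "copied_extensions": find_ext(os.path.join(repo_root, "ember", "aten.*")),  # check 4
        # Suffix this interpreter expects, e.g. ".cpython-312-darwin.so" on macOS.
        "expected_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
    }


if __name__ == "__main__":
    for key, value in check_build().items():
        print(f"{key}: {value}")
```

An empty built_extensions list points at the bindings step, while a non-empty built_extensions with an empty copied_extensions points at the copy performed by setup().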

Testing and Development

pip install also reads two additional environment variables. Note that the following command is whitespace-sensitive (no spaces around the = signs).

CMAKE_DEBUG=1 CMAKE_DEV=1 pip install .

  1. Setting CMAKE_DEBUG=1 compiles the aten library in debug mode (-g), which I use when debugging the compiled code with gdb/lldb.
  2. Setting CMAKE_DEV=1 also compiles the C++ testing suite. If you want to do this, you will also need to install GoogleTest. A snippet for Ubuntu and Debian is shown below.

sudo apt-get install libgtest-dev
cd /usr/src/gtest
sudo cmake CMakeLists.txt
sudo make
sudo cp lib/*.a /usr/lib
sudo rm -rf /var/lib/apt/lists/*  # optional: clears the apt package-list cache (mainly useful in Docker images)
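The VAR=1 prefix syntax above works in POSIX shells such as bash and zsh, but not in Windows cmd or PowerShell. A portable alternative is to set the variables from Python and invoke pip through the current interpreter. This is a sketch: the two variable names come from this README, while the helper names ember_build_env and pip_install_ember are hypothetical.

```python
import os
import subprocess
import sys


def ember_build_env(debug=False, dev=False):
    """Copy of the current environment with ember's optional build flags added."""
    env = dict(os.environ)
    if debug:
        env["CMAKE_DEBUG"] = "1"  # compile aten with -g
    if dev:
        env["CMAKE_DEV"] = "1"    # also build the C++ test suite (needs GoogleTest)
    return env


def pip_install_ember(repo_root=".", debug=False, dev=False):
    """Run `python -m pip install .` in repo_root with the chosen flags."""
    return subprocess.run(
        [sys.executable, "-m", "pip", "install", "."],
        cwd=repo_root,
        env=ember_build_env(debug=debug, dev=dev),
        check=True,  # raise CalledProcessError if the build fails
    )
```

Run from the repository root, pip_install_ember(debug=True, dev=True) should be equivalent to the shell command above. Using sys.executable ensures the package is installed into the same interpreter (or conda environment) that runs the script.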

If you would like to run tests or develop the package yourself, run ./run_tests.sh all (pass python instead to run only the Python tests, or cpp to run only the C++ tests), which will:

  1. Run all C++ unit tests for aten, ensuring that all functions work correctly.
  2. Run all Python unit tests for ember, ensuring that additional functions work correctly and that the C++ functions are bound correctly.

The stub (.pyi) files for aten are located in ember/aten.

Repository Structure

I modeled much of the structure on PyTorch and tinygrad. Very briefly:

  1. aten/ contains the header and source files for the C++ low-level tensor library, such as basic operations and an autograd engine.
    1. aten/src contains all the source files and definitions.
    2. aten/bindings contains the pybindings.
    3. aten/test contains all the C++ testing modules for aten.
  2. ember/ contains the actual library, supporting high level models, objectives, optimizers, dataloaders, and samplers.
    1. ember/aten contains the stub files.
    2. ember/datasets contains all preprocessing tools, such as datasets and dataloaders, standardization, and cross-validation.
    3. ember/models contains all machine learning models.
    4. ember/objectives contains all loss functions and regularizers.
    5. ember/optimizers contains all the optimizers/solvers, such as iterative (e.g. SGD), greedy (e.g. decision tree splitting), and one-shot (e.g. least-squares solution).
    6. ember/samplers contains all samplers (e.g. MCMC, SGLD).
  3. docs/ contains detailed documentation about each function.
  4. examples/ contains example Python scripts for training models.
  5. tests/ contains Python test modules for the ember library.
  6. docker/ contains Docker images for all the operating systems and architectures I tested ember on, along with general workflows for setting up the environment on supported machines.
  7. setup.py allows you to pip install this as a package.
  8. run_tests.sh is the main test-running script.

For a more detailed explanation, look here.

Getting Started

Ember Tensors and GradTensors

ember.Tensors represent data and parameters, while ember.GradTensors represent gradients. An advantage of this package is that, rather than supporting only batched vector operations and matrix multiplications, it can also perform general contractions of rank $(N, M)$-tensors, a generalization of matrix multiplication. This allows us to represent and use the full power of higher-order derivatives for arbitrary functions $f: \mathbb{R}^{\mathbf{M}} \rightarrow \mathbb{R}^{\mathbf{N}}$, where $\mathbf{M} = (M_1, \ldots, M_m)$ and $\mathbf{N} = (N_1, \ldots, N_n)$ are vectors, not just scalars, giving the dimension of each space.

Tensors are multidimensional arrays that can be initialized in a number of ways. GradTensors are created during backpropagation, but they can also be set explicitly if desired.
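For reference, here is the chain rule written as such a general contraction. This assumes a Jacobian stores output indices first, which is consistent with the gradients printed in the automatic differentiation example later in this section (for instance, c.grad has shape (1, 2) for the 2-element c feeding the 1-element e). The shape $\mathbf{P} = (P_1, \ldots, P_p)$ of a third space is introduced only for this illustration. For $f: \mathbb{R}^{\mathbf{M}} \rightarrow \mathbb{R}^{\mathbf{N}}$, $g: \mathbb{R}^{\mathbf{N}} \rightarrow \mathbb{R}^{\mathbf{P}}$ and $y = f(x)$:

$$
\frac{\partial (g \circ f)_{k_1 \cdots k_p}}{\partial x_{i_1 \cdots i_m}} = \sum_{j_1=1}^{N_1} \cdots \sum_{j_n=1}^{N_n} \frac{\partial g_{k_1 \cdots k_p}}{\partial y_{j_1 \cdots j_n}} \, \frac{\partial f_{j_1 \cdots j_n}}{\partial x_{i_1 \cdots i_m}}
$$

So the Jacobian of the composition has shape $(P_1, \ldots, P_p, M_1, \ldots, M_m)$ and is obtained by contracting the last $n$ indices of the Jacobian of $g$ against the first $n$ indices of the Jacobian of $f$. Ordinary matrix multiplication is the special case $p = n = m = 1$.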

import ember 

a = ember.Tensor([2]) # scalar
b = ember.Tensor([1, 2, 3])  # vector
c = ember.Tensor([[1, 2], [3, 4]]) # 2D tensor (matrix)
d = ember.Tensor([[[1, 2]]]) # 3D tensor
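The nesting depth of the list determines the tensor's rank, and the list lengths determine its shape. As a plain-Python illustration (infer_shape is a hypothetical helper, not part of ember, and how ember itself stores a one-element list such as [2] is not confirmed here), the shapes of the four nested lists above can be worked out like this:

```python
def infer_shape(data):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    if not isinstance(data, list):
        return ()  # a bare number has no dimensions
    if not data:
        return (0,)
    sub = infer_shape(data[0])
    # Every sub-list must have the same shape, otherwise there is no valid tensor shape.
    for item in data[1:]:
        if infer_shape(item) != sub:
            raise ValueError("ragged nested list: sub-lists have different shapes")
    return (len(data),) + sub


for example in ([2], [1, 2, 3], [[1, 2], [3, 4]], [[[1, 2]]]):
    print(example, "->", infer_shape(example))
```

The ragged check matters because a list such as [[1, 2], [3]] has no well-defined rectangular shape.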

Suppose you perform a series of elementary operations on tensors:

a = ember.Tensor([2, -3]) 
h = a ** 2
b = ember.Tensor([3, 5])

c = b * h

d = ember.Tensor([10, 1])
e = c.dot(d)

f = ember.Tensor([-2])

g = f * e

Automatic Differentiation

The C++ backend represents the operations used to compute g as a directed acyclic graph (DAG). Calling g.backprop() constructs this DAG, returns a topological sort of its nodes, and computes the gradients. Each gradient is technically a Jacobian matrix: every mapping x -> y stores on x a gradient tensor with value dy/dx. With backprop(intermediate=False), these local gradients are only accumulated and the chain rule is not applied yet; with backprop(intermediate=True), the chain rule is applied to compute the derivative of the tensor that backprop was called on with respect to every other tensor in the graph.

top_sort = g.backprop()
print(a.grad) # [[4.0, 0.0], [0.0, -6.0]]
print(h.grad) # [[3.0, 0.0], [0.0, 5.0]]
print(b.grad) # [[4.0, 0.0], [0.0, 9.0]]
print(c.grad) # [[10.0, 1.0]]
print(d.grad) # [[12.0, 45.0]]
print(e.grad) # [[-2.0]]
print(f.grad) # [[165.0]]
print(g.grad) # [[1.0]]
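Each printed gradient above matches the local Jacobian of a single operation; for example, a.grad = dh/da = diag(2a) = diag(4, -6). The chain rule multiplies these along the path from g back to a. The plain-Python sketch below does not use ember (matmul and diag are hypothetical helpers); it recomputes the local Jacobians by hand and chains them to get dg/da, which should correspond to what the chain-rule mode described above computes for a.

```python
def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diag(v):
    """Square matrix with v on the diagonal (the Jacobian of an elementwise op)."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


# Forward pass, mirroring the ember example above.
a, b, d, f = [2.0, -3.0], [3.0, 5.0], [10.0, 1.0], -2.0
h = [x ** 2 for x in a]                   # [4, 9]
c = [bi * hi for bi, hi in zip(b, h)]     # [12, 45]
e = sum(ci * di for ci, di in zip(c, d))  # 165
g = f * e                                 # -330

# Local Jacobians, each stored output-index first.
dh_da = diag([2 * x for x in a])          # [[4, 0], [0, -6]]
dc_dh = diag(b)                           # [[3, 0], [0, 5]]
de_dc = [d]                               # [[10, 1]]
dg_de = [[f]]                             # [[-2]]

# Chain rule: dg/da = dg/de @ de/dc @ dc/dh @ dh/da
dg_da = matmul(matmul(matmul(dg_de, de_dc), dc_dh), dh_da)
print(dg_da)  # [[-240.0, 60.0]]
```

As a sanity check, g = f * (b1*a1^2*d1 + b2*a2^2*d2), so dg/da1 = f*b1*2*a1*d1 = -2*3*4*10 = -240, matching the first entry.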

Finally, we can visualize this using the networkx package.

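Since backprop() returns a topological sort of the DAG, it may help to see what such an ordering looks like for the graph of g. The sketch below uses Python's standard graphlib module (Python 3.9+) rather than ember or networkx; the hand-written graph dict is only an illustration, and the exact order ember returns (including whether it is forward or reversed) is an assumption here.

```python
from graphlib import TopologicalSorter

# The DAG from the example above, written as {node: set of input nodes}.
# This dict is an illustration, not ember's internal representation.
graph = {
    "h": {"a"},
    "c": {"b", "h"},
    "e": {"c", "d"},
    "g": {"e", "f"},
}

forward_order = list(TopologicalSorter(graph).static_order())  # inputs before outputs
backward_order = forward_order[::-1]                            # g first, as backprop visits nodes
print(backward_order)
```

Ties between independent nodes (such as a, b, d and f) can come out in any order, but every node always appears after all of its inputs in forward_order, and g is always first in backward_order.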

Linear Regression

To perform linear regression, use the LinearRegression model.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.LinearRegression(15) 
mse = ember.objectives.MSELoss()

for epoch in range(500): 
  loss = None
  for x, y in dl: 
    y_ = model.forward(x)  
    loss = mse(y, y_)
    loss.backprop()
    model.step(1e-5) 

  print(loss)
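Each model.step(1e-5) call presumably performs one gradient-descent update using the gradients filled in by loss.backprop(). To make that loop concrete without ember, here is a plain-Python sketch of mini-batch SGD on the MSE loss for a linear model y = w·x + b. make_linear_data and train_linear_sgd are hypothetical helpers, the learning rate 0.05 is chosen for this noise-free toy data, and ember's own update rule is not confirmed here.

```python
import random


def make_linear_data(n, d, seed=0):
    """Hypothetical stand-in for a linear dataset: y = w·x + b with no noise."""
    rng = random.Random(seed)
    w_true = [rng.uniform(-2, 2) for _ in range(d)]
    b_true = rng.uniform(-2, 2)
    xs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    ys = [sum(wi * xi for wi, xi in zip(w_true, x)) + b_true for x in xs]
    return xs, ys, w_true, b_true


def train_linear_sgd(xs, ys, lr=0.05, epochs=500, batch_size=2):
    d = len(xs[0])
    w, b = [0.0] * d, 0.0
    loss = None
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx, by = xs[start:start + batch_size], ys[start:start + batch_size]
            m = len(bx)
            # Residuals r = y_hat - y for the mini-batch.
            res = [sum(wi * xi for wi, xi in zip(w, x)) + b - y for x, y in zip(bx, by)]
            loss = sum(r * r for r in res) / m  # MSE on this batch
            # Gradient of the MSE: dL/dw = (2/m) * sum(r * x), dL/db = (2/m) * sum(r).
            grad_w = [2.0 / m * sum(r * x[j] for r, x in zip(res, bx)) for j in range(d)]
            grad_b = 2.0 / m * sum(res)
            w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b, loss


xs, ys, w_true, b_true = make_linear_data(n=20, d=3)
w, b, loss = train_linear_sgd(xs, ys)
print(loss)
```

Like the ember loop, this reports the loss of the last mini-batch in each run rather than the loss over the full dataset.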

K Nearest Neighbors

For a simple K-nearest-neighbors regressor, use the following model. Its forward method scans over the whole dataset, so the dataset must be passed to the model at instantiation. We do not need a dataloader or backpropagation, since no gradients are updated iteratively, but we still compute the loss to compare different values of K.

import ember
from ember.models import KNearestRegressor
from ember.datasets import LinearDataset

ds = LinearDataset(N=20, D=3)
model = KNearestRegressor(dataset=ds, K=1)
mse = ember.objectives.MSELoss() 

for k in range(1, 21): # hyperparameter tuning
  model.K = k
  print(f"{k} ===") 
  loss = 0
  for i in range(len(ds)): 
    x, y = ds[i] 
    y_ = model.forward(x) 
    loss = loss + mse(y, y_) 

  print(loss)
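Conceptually, a KNN regressor predicts the average target of the K stored points nearest to the query. The plain-Python sketch below is not ember's implementation (knn_predict and its exclude_index parameter are hypothetical) and uses Euclidean distance via math.dist (Python 3.8+). Assuming the model includes the query point in its own search, the loop above evaluates on the very points the model stores, so K=1 gives zero loss whenever the inputs are distinct; excluding the query point (leave-one-out evaluation) avoids that.

```python
import math


def knn_predict(train_x, train_y, x, k, exclude_index=None):
    """Average the targets of the k training points closest (Euclidean) to x.

    exclude_index skips one training point, which gives leave-one-out
    evaluation when x is itself a training point.
    """
    candidates = [
        (math.dist(xi, x), yi)
        for i, (xi, yi) in enumerate(zip(train_x, train_y))
        if i != exclude_index
    ]
    candidates.sort(key=lambda pair: pair[0])  # stable sort: ties keep dataset order
    nearest = candidates[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


xs = [[0.0], [1.0], [2.5], [10.0]]
ys = [0.0, 10.0, 25.0, 100.0]
print(knn_predict(xs, ys, [0.9], k=2))                      # mean of the 2 nearest targets
print(knn_predict(xs, ys, xs[1], k=1, exclude_index=1))     # leave-one-out prediction
```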

Multilayer Perceptrons

To instantiate an MLP, just call it from models. Here we make a 2-layer MLP with a dummy dataset. For now, only SGD with batch size 1 is supported.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.MultiLayerPerceptron(15, 10) 
mse = ember.objectives.MSELoss()

for epoch in range(500):  
  loss = None
  for x, y in dl: 
    y_ = model.forward(x) 
    loss = mse(y, y_)
    loss.backprop() 
    model.step(1e-5)

  print(loss)

It produced the following output over about one minute of training.

LOSS = 256733.64437981808
LOSS = 203239.08846901066
LOSS = 160223.4554735339
LOSS = 125704.33716141782
LOSS = 98074.96981384761
LOSS = 76026.19871949886
LOSS = 58491.92389906721
LOSS = 44604.493032865605
LOSS = 33658.23285350788
LOSS = 25079.638682869212
LOSS = 18403.01062298029
LOSS = 13250.54496118543
LOSS = 9316.069468116035
LOSS = 6351.758695807299
LOSS = 4157.286052245369
LOSS = 2570.96819208677
LOSS = 1462.5380952427417
LOSS = 727.2493587808174
LOSS = 281.0683664354656
LOSS = 56.75530418715159
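What backprop() and step() have to do for a two-layer network can be written out by hand. The plain-Python sketch below is not ember's code: init_mlp, forward, sgd_step, mse and the tanh activation are all assumptions for illustration. It trains one hidden layer with per-sample SGD on the squared error, propagating the gradient through the output layer and then through the tanh nonlinearity.

```python
import math
import random


def init_mlp(d_in, d_hidden, seed=0):
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_in)
    W1 = [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_hidden)]
    b1 = [0.0] * d_hidden
    w2 = [rng.uniform(-scale, scale) for _ in range(d_hidden)]
    b2 = 0.0
    return W1, b1, w2, b2


def forward(params, x):
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi) for row, bi in zip(W1, b1)]
    y_hat = sum(wi * hi for wi, hi in zip(w2, h)) + b2
    return h, y_hat


def sgd_step(params, x, y, lr):
    W1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    dy = 2.0 * (y_hat - y)                                  # dL/dy_hat for L = (y_hat - y)^2
    dh = [dy * wi for wi in w2]                             # back through the output layer (old w2)
    dz = [dhi * (1.0 - hi * hi) for dhi, hi in zip(dh, h)]  # tanh'(z) = 1 - tanh(z)^2
    new_w2 = [wi - lr * dy * hi for wi, hi in zip(w2, h)]
    new_b2 = b2 - lr * dy
    new_W1 = [[wij - lr * dzi * xj for wij, xj in zip(row, x)] for row, dzi in zip(W1, dz)]
    new_b1 = [bi - lr * dzi for bi, dzi in zip(b1, dz)]
    return new_W1, new_b1, new_w2, new_b2


def mse(params, xs, ys):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
ys = [x[0] - 2 * x[1] for x in xs]
params = init_mlp(2, 8)
print("before:", mse(params, xs, ys))
for _ in range(200):
    for x, y in zip(xs, ys):
        params = sgd_step(params, x, y, lr=0.01)
print("after:", mse(params, xs, ys))
```

Computing dh from the old output weights before updating them is what keeps this a correct gradient step; an autograd engine handles that ordering automatically.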

Datasets

Models and Training

Monte Carlo Samplers

Contributing

To implement a new functionality in the aten library, you must

  1. Add the class or function header in aten/src/Tensor.h
  2. Add the implementation in the correct .cpp file under aten/src/ (or create a new one). Make sure to update aten/src/CMakeLists.txt if needed.
  3. Add its pybindings (if it is a public function that will be used in ember) in aten/bindings/*bindings.cpp. Make sure to update aten/bindings/CMakeLists.txt if needed.
  4. Add relevant C++ tests in aten/test/.
  5. Optionally, try it out in a personal script as a sanity check.
  6. Add its signature to the stub files in ember/aten/*.pyi.
  7. Add Python tests in tests/.
  8. If everything passes, you can submit a pull request.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • pyember-0.0.16-cp313-cp313-win_amd64.whl (237.2 kB): CPython 3.13, Windows x86-64
  • pyember-0.0.16-cp313-cp313-macosx_11_0_arm64.whl (654.9 kB): CPython 3.13, macOS 11.0+ ARM64
  • pyember-0.0.16-cp312-cp312-win_amd64.whl (237.1 kB): CPython 3.12, Windows x86-64
  • pyember-0.0.16-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (390.8 kB): CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
  • pyember-0.0.16-cp312-cp312-macosx_11_0_arm64.whl (654.3 kB): CPython 3.12, macOS 11.0+ ARM64
  • pyember-0.0.16-cp311-cp311-win_amd64.whl (235.8 kB): CPython 3.11, Windows x86-64
  • pyember-0.0.16-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (391.2 kB): CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
  • pyember-0.0.16-cp311-cp311-macosx_11_0_arm64.whl (657.1 kB): CPython 3.11, macOS 11.0+ ARM64
  • pyember-0.0.16-cp310-cp310-win_amd64.whl (234.5 kB): CPython 3.10, Windows x86-64
  • pyember-0.0.16-cp310-cp310-macosx_11_0_arm64.whl (654.8 kB): CPython 3.10, macOS 11.0+ ARM64
  • pyember-0.0.16-cp39-cp39-win_amd64.whl (228.3 kB): CPython 3.9, Windows x86-64
  • pyember-0.0.16-cp39-cp39-macosx_11_0_arm64.whl (655.1 kB): CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file pyember-0.0.16-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.16-cp313-cp313-win_amd64.whl
  • Size: 237.2 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.16-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 d02973117f82dc64605dd857371564794e377f2b32b89cde74eb26b3d0202808
MD5 b4c90471da85d6da90ba261464393e42
BLAKE2b-256 75a785aaf2660d5f9acc746e8b320f3167104892948442806be8328cb8e7834b

See more details on using hashes here.

File details

Details for the file pyember-0.0.16-cp313-cp313-macosx_11_0_arm64.whl.


File hashes

Hashes for pyember-0.0.16-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8c552fb16596fc75374e01242334f1520c7de1a70811de36ac4d63df180191ba
MD5 697297ff270f9cc36705dfa19c45260f
BLAKE2b-256 b019bc6e797cd5aad7036715d1d473b0fc95c00c116020cf0044e258621e5190


File details

Details for the file pyember-0.0.16-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.16-cp312-cp312-win_amd64.whl
  • Size: 237.1 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.16-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 57abfcbe2ff473f84865d76c7104a99acddce30d7cc27b2e8d902d722cf81961
MD5 9ebe8e62cedfae50dbf78a4fd898c1aa
BLAKE2b-256 795d25543debfc768ad180443a696435d2274944c97cc671462df2d0a12eaecf


File details

Details for the file pyember-0.0.16-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.


File hashes

Hashes for pyember-0.0.16-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b3154005cbfc05d4c928c62e7b6b1988a83d62fe35beba45a348fdcab916c843
MD5 549872e37ad32c8d17e65b222a774a32
BLAKE2b-256 641cc2082e5497515684ff13e129d8323488574afd03855994a33170fcd64360


File details

Details for the file pyember-0.0.16-cp312-cp312-macosx_11_0_arm64.whl.


File hashes

Hashes for pyember-0.0.16-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 037864eaf7d9081287ab1b71fa08d08601bea55114b28811ad5cece19a297b27
MD5 dc5829a3421fdb45391188b39e0781d2
BLAKE2b-256 052010558f89b5b14be35b14b266a197ebca976fabcadee805f9bf73d2235cff


File details

Details for the file pyember-0.0.16-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.16-cp311-cp311-win_amd64.whl
  • Size: 235.8 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.16-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 02ddd4c5dc9574b3bb13905efca74ce4f89ba452e815729b8a3b5f9c6e3272ef
MD5 34a66e0d3bc70f479fb51ffb9c65eb3f
BLAKE2b-256 ea6be94b487c8bd99c7f30449721bbbbb29aa6ac34ed5966110c4f39ec2e106f


File details

Details for the file pyember-0.0.16-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.


File hashes

Hashes for pyember-0.0.16-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 819e696a72fa57e69495dd6218a28c01cdbadefd3d6f097834bd83b118a8ae6d
MD5 20ae2674604c5a021da707ffa0abbb09
BLAKE2b-256 f5939c710dd8bd3c3dde18f820dc2aacb9c14083330ea73275f988c020643697


File details

Details for the file pyember-0.0.16-cp311-cp311-macosx_11_0_arm64.whl.


File hashes

Hashes for pyember-0.0.16-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 370bede977c51a99babcd4b2c5c36282b3049020e77779aee7ed5c2dd7e98a6b
MD5 281b55f07380bcfe590c1a644dd652af
BLAKE2b-256 162977ac4ca61605dd49145fb2eb3260a4f49570fbafd83f1993ea353694a857


File details

Details for the file pyember-0.0.16-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.16-cp310-cp310-win_amd64.whl
  • Size: 234.5 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.16-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 bc2acc8dd6f3df88a1763e158dc8a3183ed9b9b9a0cea500f6de53f8c8ccecac
MD5 94778e55a6d422f405daa07473fe2184
BLAKE2b-256 c61c88705f7ab6e27e582564fcdcec0d2284120b1362ec25d6cab0953a7e3def


File details

Details for the file pyember-0.0.16-cp310-cp310-macosx_11_0_arm64.whl.


File hashes

Hashes for pyember-0.0.16-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9036343cc6af54f58c710505b130ca13ee9100f9cc88c513ab7383b49c423fa7
MD5 3f57a707b22d153713ddb0b07bd5839c
BLAKE2b-256 e89ad0c9abe9bb464b2703394d6fd0f4af74e58ccc5db26fbdcc15b1e69d4c4f


File details

Details for the file pyember-0.0.16-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.16-cp39-cp39-win_amd64.whl
  • Size: 228.3 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.16-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 43469054a7319cf3ba160ffd9f6493697adc62f61bd8510c0e127db2c9dba071
MD5 fd1548c6aebbe2572a2334422a424616
BLAKE2b-256 4d5458dc6aeb1cf990c419fe4760adf188ce90a50afff04978801c6d1482430e


File details

Details for the file pyember-0.0.16-cp39-cp39-macosx_11_0_arm64.whl.


File hashes

Hashes for pyember-0.0.16-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f935035904fdd4dd96ef46751852cf796a93494262d04264867e2be9bcbbd1f2
MD5 2389530377f9a920ade83ba25c8e7a53
BLAKE2b-256 42a915fbc0c94cd8f5090ddd31ee6c4e031dd4f57f5e0cfa092a691887674aab
