
A statistics and machine learning package.

Project description

🔥 Ember

Ember is a statistics and ML library, written in C++ and Python, for my personal use. I built it mainly for educational purposes, but it is quite functional and can be used to train models on several datasets.

Look here to see the methods it supports.

Installation

System Support

This library supports both x86_64/amd64 and arm64/aarch64. Check the table below to see whether your system is supported out of the box. The library has very few dependencies, so as long as your machine has a C++ compiler and Python, you should be able to get it working by adjusting the CMake and setuptools files.

x86_64 Python 3.13 Python 3.12 Python 3.11 Python 3.10 Python 3.9 Python 3.8 Python 3.7
Ubuntu 24.04
Ubuntu 22.04
Ubuntu 20.04
ArchLinux 6.6.68 LTS
Debian 13
Debian 12
Debian 11
Debian 10
LinuxMint 22
LinuxMint 21
MacOS 10.15 Catalina
MacOS 10.14 Mojave
MacOS 10.13 High Sierra
MacOS 10.12 Sierra
MacOS 10.11 El Capitan
MacOS 10.10 Yosemite
MacOS 10.9 Mavericks
MacOS 10.8 Mountain Lion
MacOS 10.7 Lion
Windows 11
Windows 10
Windows 8
Windows 7
ARM64 Python 3.13 Python 3.12 Python 3.11 Python 3.10 Python 3.9 Python 3.8 Python 3.7
Ubuntu 24.04
Ubuntu 22.04
Ubuntu 20.04
MacOS 15.x Sequoia
MacOS 14.x Sonoma
MacOS 13.x Ventura
MacOS 12.x Monterey
MacOS 11.x Big Sur
Windows 12

Compiling the aten Library

Your machine will need system dependencies such as CMake, a C++ compiler, and pybind11; the library uses C++17. Ideally, you will also have git and conda installed already. For more specific instructions on installing these on your system, refer to the more detailed installation guide.

Clone the repository with git, then run pip install, which runs setup.py.

git clone git@github.com:mbahng/pyember.git 
cd pyember 
pip install .

This runs CMake on aten/CMakeLists.txt, which does the following:

  1. It always calls aten/src/CMakeLists.txt, which compiles and links the source files of the C++ tensor library.
  2. If BUILD_PYTHON_BINDINGS=ON (the default), it also calls aten/bindings/CMakeLists.txt to generate a .so file that can be imported into ember.
  3. If BUILD_DEV=ON, it also calls aten/test/CMakeLists.txt to compile the C++ unit test suite.

If the build fails, check the following, in order:

  1. Whether build/ has been created. This is the first step in setup.py.
  2. Whether main.cpp and, if BUILD_DEV=ON, the C++ unit test files have been compiled, i.e. whether the build/src/main and build/test/tests executables exist.
  3. Whether build/*/aten.cpython-3**-darwin.so exists somewhere in the build directory (the exact location depends on the machine). The Makefile generated by aten/bindings/CMakeLists.txt produces this file.
  4. Whether setup() has copied this .so file to ember/aten.cpython-3**-darwin.so. It does this immediately, and you should see either a success message saying the file has been moved or an error. The .so file must live inside ember/, the actual library, since ember/__init__.py imports it from the same directory level.
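The four checks above can be automated. The following is a minimal sketch, not part of the repository: check_build is a hypothetical helper that assumes it is run from the root of the cloned repo, and it only looks for .so files (on Windows the compiled extension usually ends in .pyd instead).

```python
from pathlib import Path


def check_build(root="."):
    """Hypothetical helper: report which build troubleshooting checks pass."""
    root = Path(root)
    build = root / "build"
    return {
        # 1. setup.py creates build/ first
        "build_dir": build.is_dir(),
        # 2. compiled C++ executables (the test binary only exists when BUILD_DEV=ON)
        "main_exe": (build / "src" / "main").exists(),
        "test_exe": (build / "test" / "tests").exists(),
        # 3. the compiled aten extension, anywhere under build/
        "so_in_build": any(build.rglob("aten.cpython-3*.so")),
        # 4. the extension copied next to ember/__init__.py
        "so_in_ember": any((root / "ember").glob("aten.cpython-3*.so")),
    }


if __name__ == "__main__":
    for check, ok in check_build().items():
        print(f"{check:12} {'ok' if ok else 'MISSING'}")
```

The first check that reports MISSING points to the stage of the build to investigate.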

Testing and Development

pip install accepts two additional environment variables. Note that the following command is whitespace-sensitive: there must be no spaces around the = signs.

CMAKE_DEBUG=1 CMAKE_DEV=1 pip install .

  1. Setting CMAKE_DEBUG=1 compiles the aten library in debug mode (-g), which I use when debugging the compiled code with gdb/lldb.
  2. Setting CMAKE_DEV=1 also compiles the C++ test suite. If you want to do this, you will also need to install GoogleTest. A code snippet for Ubuntu and Debian is shown below.

sudo apt-get install libgtest-dev
cd /usr/src/gtest
sudo cmake CMakeLists.txt
sudo make
sudo cp lib/*.a /usr/lib
sudo rm -rf /var/lib/apt/lists/*
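How setup.py forwards these variables to CMake is not shown here. As a hypothetical sketch only, the function below maps them to CMake -D options. It assumes that CMAKE_DEV=1 corresponds to the BUILD_DEV=ON option described earlier and that non-debug builds use Release. CMAKE_BUILD_TYPE is a standard CMake variable, and its Debug setting adds -g on GCC and Clang.

```python
import os


def cmake_args_from_env(env=None):
    """Hypothetical: translate pip-time environment variables into CMake -D flags."""
    env = os.environ if env is None else env
    args = ["-DBUILD_PYTHON_BINDINGS=ON"]  # bindings are on by default
    # CMAKE_DEBUG=1 -> debug build (CMake's Debug type compiles with -g on GCC/Clang)
    build_type = "Debug" if env.get("CMAKE_DEBUG") == "1" else "Release"
    args.append("-DCMAKE_BUILD_TYPE=" + build_type)
    # CMAKE_DEV=1 -> also configure the C++ test suite (assumed to map to BUILD_DEV)
    if env.get("CMAKE_DEV") == "1":
        args.append("-DBUILD_DEV=ON")
    return args


print(cmake_args_from_env({"CMAKE_DEBUG": "1", "CMAKE_DEV": "1"}))
```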

To run the tests or develop the package yourself, run ./run_tests.sh all (or pass python to run only the Python tests, or cpp to run only the C++ tests). This will

  1. Run all C++ unit tests for aten, ensuring that all functions work correctly.
  2. Run all Python unit tests for ember, ensuring that additional functions work correctly and that the C++ functions are bound correctly.

The stub (.pyi) files for aten are located in ember/aten.

Repository Structure

I modeled much of the structure on PyTorch and TinyGrad. Very briefly:

  1. aten/ contains the header and source files for the C++ low-level tensor library, such as basic operations and an autograd engine.
    1. aten/src contains all the source files and definitions.
    2. aten/bindings contains the pybindings.
    3. aten/test contains all the C++ testing modules for aten.
  2. ember/ contains the actual library, supporting high level models, objectives, optimizers, dataloaders, and samplers.
    1. ember/aten contains the stub files.
    2. ember/datasets contains all preprocessing tools, such as datasets/loaders, standardizing, cross validation checks.
    3. ember/models contains all machine learning models.
    4. ember/objectives contains all loss functions and regularizers.
    5. ember/optimizers contains all optimizers/solvers: iterative (e.g. SGD), greedy (e.g. decision tree splitting), and one-shot (e.g. the least-squares solution).
    6. ember/samplers contains all samplers (e.g. MCMC, SGLD).
  3. docs/ contains detailed documentation for each function.
  4. examples/ contains example Python scripts for training models.
  5. tests/ contains the Python test modules for the ember library.
  6. docker/ contains Docker images for all the operating systems and architectures I tested ember on. General workflows for setting up the environment on supported machines can be found there.
  7. setup.py lets you pip install this as a package.
  8. run_tests.sh is the main test script.

For a more detailed explanation, look here.

Getting Started

Ember Tensors and GradTensors

ember.Tensor objects represent data and parameters, while ember.GradTensor objects represent gradients. An advantage of this package is that, beyond batched vector operations and matrix multiplications, it can perform general contractions of rank $(N, M)$-tensors, a generalization of matrix multiplication. This lets us represent and use the full power of higher-order derivatives for arbitrary functions $f: \mathbb{R}^{\mathbf{M}} \rightarrow \mathbb{R}^{\mathbf{N}}$, where $\mathbf{M} = (M_1, \ldots, M_m)$ and $\mathbf{N} = (N_1, \ldots, N_n)$ are vectors, not just scalars, giving the size of each axis of the input and output spaces.
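To illustrate the kind of contraction meant here, the following sketch uses NumPy rather than ember, because ember's contraction API is not shown in this section. The Jacobian of $f: \mathbb{R}^{(2,3)} \rightarrow \mathbb{R}^{(2)}$ has shape output shape + input shape $= (2, 2, 3)$. The chain rule then contracts the output axes of $f$ against the matching axes of the next Jacobian. The functions f (row sums) and g (a weighted sum) are made-up examples.

```python
import numpy as np

# f(X) = row sums of a 2x3 matrix X, so f: R^(2,3) -> R^(2,).
# Its Jacobian has shape (output shape) + (input shape) = (2, 2, 3).
J_f = np.zeros((2, 2, 3))
for i in range(2):
    J_f[i, i, :] = 1.0  # y_i depends only on row i of X, with slope 1

# g(y) = 3*y_0 + 5*y_1, so g: R^(2,) -> R^(1,), with Jacobian shape (1, 2).
J_g = np.array([[3.0, 5.0]])

# Chain rule: contract the last axis of J_g (g's input = f's output)
# with the first axis of J_f. For an output of rank n, use axes=n.
J_gf = np.tensordot(J_g, J_f, axes=1)
print(J_gf.shape)  # (1, 2, 3)
print(J_gf)
```

Ordinary matrix multiplication is the special case where both Jacobians have rank 2 and a single axis is contracted.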

Tensors are multidimensional arrays that can be initialized in a number of ways. GradTensors are created during backpropagation, but they can also be set explicitly if desired.

import ember 

a = ember.Tensor([2])  # single-element tensor (scalar)
b = ember.Tensor([1, 2, 3])  # vector
c = ember.Tensor([[1, 2], [3, 4]])  # 2D tensor (matrix)
d = ember.Tensor([[[1, 2]]])  # 3D tensor
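The nesting depth of the list determines the rank, and the list lengths at each level determine the shape. The following plain-Python sketch infers that shape and rejects ragged input. infer_shape is a hypothetical helper, not an ember function, and it is an assumption that ember.Tensor follows the same convention (for example, that [2] has shape (1,)).

```python
def infer_shape(data):
    """Hypothetical helper: shape of a non-ragged nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    if not isinstance(data, list):
        return ()  # a bare number has no axes
    if not data:
        return (0,)
    child_shapes = {infer_shape(item) for item in data}
    if len(child_shapes) != 1:
        raise ValueError("ragged nested list: sub-lists have different shapes")
    return (len(data),) + child_shapes.pop()


for example in ([2], [1, 2, 3], [[1, 2], [3, 4]], [[[1, 2]]]):
    print(example, "->", infer_shape(example))
```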

Suppose you perform a series of elementary operations on tensors.

a = ember.Tensor([2, -3]) 
h = a ** 2
b = ember.Tensor([3, 5])

c = b * h

d = ember.Tensor([10, 1])
e = c.dot(d)

f = ember.Tensor([-2])

g = f * e

Automatic Differentiation

The C++ backend records a directed acyclic graph (DAG) of the operations used to compute g. Calling g.backprop() constructs this DAG, returns a topological sorting of its nodes, and computes the gradients. The gradients, which are technically Jacobian matrices, are updated so that each mapping x -> y stores a gradient tensor on x with value dy/dx. Gradients can either be accumulated without applying the chain rule, by calling backprop(intermediate=False), or computed with intermediate=True, which applies the chain rule to give the derivative of the tensor backprop was called on with respect to each of the other tensors. The values printed below are the local Jacobians, i.e. the chain rule has not been applied. For example, a.grad is dh/da = diag(2a) = diag(4, -6), and d.grad is de/dd = c = [12, 45].

top_sort = g.backprop()
print(a.grad) # [[4.0, 0.0], [0.0, -6.0]]
print(h.grad) # [[3.0, 0.0], [0.0, 5.0]]
print(b.grad) # [[4.0, 0.0], [0.0, 9.0]]
print(c.grad) # [[10.0, 1.0]]
print(d.grad) # [[12.0, 45.0]]
print(e.grad) # [[-2.0]]
print(f.grad) # [[165.0]]
print(g.grad) # [[1.0]]
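The printed values can be checked by hand. The following plain-Python sketch is independent of ember, and matmul and diag are hypothetical helpers. It recomputes the forward values and the local Jacobians from the example. It then multiplies the local Jacobians from g back to a, which is what applying the chain rule amounts to for this graph.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def diag(v):
    """Square diagonal matrix with v on the diagonal."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


# Forward pass of the example
a, b, d, f = [2.0, -3.0], [3.0, 5.0], [10.0, 1.0], -2.0
h = [x ** 2 for x in a]                   # [4, 9]
c = [bi * hi for bi, hi in zip(b, h)]     # [12, 45]
e = sum(ci * di for ci, di in zip(c, d))  # 165
g = f * e                                 # -330

# Local Jacobians (output x input), matching a.grad, h.grad, c.grad, e.grad above
dh_da = diag([2 * x for x in a])  # [[4, 0], [0, -6]]
dc_dh = diag(b)                   # [[3, 0], [0, 5]]
de_dc = [d]                       # [[10, 1]]
dg_de = [[f]]                     # [[-2]]

# Chain rule: multiply the local Jacobians along the path g <- e <- c <- h <- a
dg_da = matmul(matmul(matmul(dg_de, de_dc), dc_dh), dh_da)
print(dg_da)  # [[-240.0, 60.0]]
```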

Finally, we can visualize this using the networkx package.

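backprop() returns a topological sorting of the DAG, but the format of ember's top_sort is not described here. As an illustration independent of ember, the following plain-Python sketch applies Kahn's algorithm to a hand-written edge list for the example graph. Any valid order works, so ember's may differ. For drawing, networkx offers DiGraph and topological_sort, which do the same on a graph object.

```python
from collections import deque

# Edges point from an input tensor to the tensor computed from it:
# h = a**2, c = b*h, e = c.dot(d), g = f*e
edges = [("a", "h"), ("h", "c"), ("b", "c"), ("c", "e"),
         ("d", "e"), ("e", "g"), ("f", "g")]


def topological_sort(edges):
    """Kahn's algorithm: repeatedly emit nodes whose inputs have all been emitted."""
    nodes = sorted({n for edge in edges for n in edge})
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order


order = topological_sort(edges)
print(order)        # leaves first, g last
print(order[::-1])  # backpropagation visits nodes from g back to the leaves
```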

Linear Regression

To perform linear regression, use the LinearRegression model.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.LinearRegression(15) 
mse = ember.objectives.MSELoss()

for epoch in range(500): 
  loss = None
  for x, y in dl: 
    y_ = model.forward(x)  
    loss = mse(y, y_)
    loss.backprop()
    model.step(1e-5) 

  print(loss)
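This section does not spell out how model.step(lr) uses the gradients from loss.backprop(). As an illustration independent of ember, with hypothetical function names, the following plain-Python sketch runs the same kind of loop: minibatch gradient descent on the mean squared error for a linear model, with a learning rate and batch size of 2.

```python
import random


def sgd_linear_regression(X, y, lr=0.1, epochs=500, batch_size=2, seed=0):
    """Fit y ~ w.x + b by minibatch SGD on the mean squared error."""
    rng = random.Random(seed)
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w, grad_b = [0.0] * dim, 0.0
            for i in batch:
                residual = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
                # gradient of residual**2, averaged over the batch
                for j in range(dim):
                    grad_w[j] += 2 * residual * X[i][j] / len(batch)
                grad_b += 2 * residual / len(batch)
            # the kind of update a step(lr)-style method performs
            w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


# Noise-free toy data generated from y = 2x + 1
X = [[i / 10] for i in range(10)]
y = [2 * x[0] + 1 for x in X]
w, b = sgd_linear_regression(X, y)
print(w, b)
```

A much smaller learning rate, such as the 1e-5 used above, is typical when the features are unscaled, because large feature values make large steps unstable.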

K Nearest Neighbors

To fit a simple K-nearest-neighbors regressor, use the following model. Its forward method scans the whole dataset, so the dataset must be passed to the model at instantiation. No dataloader or backpropagation is needed, since no gradients are updated iteratively, but we still compute the loss to compare different values of K.

import ember
from ember.models import KNearestRegressor
from ember.datasets import LinearDataset

ds = LinearDataset(N=20, D=3)
model = KNearestRegressor(dataset=ds, K=1)
mse = ember.objectives.MSELoss() 

for k in range(1, 21): # hyperparameter tuning
  model.K = k
  print(f"{k} ===") 
  loss = 0
  for i in range(len(ds)): 
    x, y = ds[i] 
    y_ = model.forward(x) 
    loss = loss + mse(y, y_) 

  print(loss)
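The internals of KNearestRegressor are not shown here. The following plain-Python sketch (function names hypothetical) applies the standard KNN regression rule: average the targets of the K closest points. It scores each K with leave-one-out, which differs from the ember loop above. If the search set contains the point being scored, K=1 can simply return that point's own target.

```python
import math


def knn_predict(X, y, query, k):
    """Average the targets of the k training points closest to `query` (Euclidean distance)."""
    by_distance = sorted((math.dist(xi, query), yi) for xi, yi in zip(X, y))
    nearest = by_distance[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def loo_mse(X, y, k):
    """Leave-one-out MSE: predict each point from all of the other points."""
    total = 0.0
    for i in range(len(X)):
        prediction = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        total += (prediction - y[i]) ** 2
    return total / len(X)


X = [[0.0], [1.0], [2.0], [3.0], [10.0]]
y = [0.0, 1.0, 2.0, 3.0, 10.0]
print(knn_predict(X, y, [1.2], k=2))  # 1.5
for k in range(1, 4):  # hyperparameter sweep, as in the ember example
    print(k, loo_mse(X, y, k))
```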

Multilayer Perceptrons

To instantiate an MLP, use ember.models.MultiLayerPerceptron. Here we build a 2-layer MLP on a dummy dataset. For now, only SGD with batch size 1 is supported.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.MultiLayerPerceptron(15, 10) 
mse = ember.objectives.MSELoss()

for epoch in range(500):  
  loss = None
  for x, y in dl: 
    y_ = model.forward(x) 
    loss = mse(y, y_)
    loss.backprop() 
    model.step(1e-5)

  print(loss)
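The internals of MultiLayerPerceptron are not described here. The following plain-Python sketch shows what a 2-layer MLP forward pass typically computes. It assumes a ReLU hidden layer and a single output, and it reads MultiLayerPerceptron(15, 10) as 15 inputs and 10 hidden units. All of these assumptions, and the helper names, are hypothetical.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(W, b, x):
    """Compute W @ x + b, with W stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_mlp(in_dim, hidden_dim, seed=0):
    """Small random weights, zero biases; a single scalar output (assumed)."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    W2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)]]
    return W1, [0.0] * hidden_dim, W2, [0.0]


def mlp_forward(params, x):
    W1, b1, W2, b2 = params
    hidden = relu(linear(W1, b1, x))  # layer 1: affine map + ReLU
    return linear(W2, b2, hidden)     # layer 2: affine map to the output


def count_params(params):
    W1, b1, W2, b2 = params
    return sum(map(len, W1)) + len(b1) + sum(map(len, W2)) + len(b2)


params = init_mlp(15, 10)
print(mlp_forward(params, [1.0] * 15))
print(count_params(params))  # 15*10 + 10 + 10*1 + 1 = 171
```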

Its output over 1 minute of training:

LOSS = 256733.64437981808
LOSS = 203239.08846901066
LOSS = 160223.4554735339
LOSS = 125704.33716141782
LOSS = 98074.96981384761
LOSS = 76026.19871949886
LOSS = 58491.92389906721
LOSS = 44604.493032865605
LOSS = 33658.23285350788
LOSS = 25079.638682869212
LOSS = 18403.01062298029
LOSS = 13250.54496118543
LOSS = 9316.069468116035
LOSS = 6351.758695807299
LOSS = 4157.286052245369
LOSS = 2570.96819208677
LOSS = 1462.5380952427417
LOSS = 727.2493587808174
LOSS = 281.0683664354656
LOSS = 56.75530418715159

Datasets

Models and Training

Monte Carlo Samplers

Contributing

To implement a new functionality in the aten library, you must

  1. Add the class or function header in aten/src/Tensor.h
  2. Add the implementation in the correct file (or create a new one) in aten./*Tensor/*.cpp. Make sure to update aten/src/CMakeLists.txt if needed.
  3. If it is a public function that ember will use, add its pybindings in aten/bindings/*bindings.cpp. Make sure to update aten/bindings/CMakeLists.txt if needed.
  4. Add relevant C++ tests in aten/test/.
  5. Optionally, try it out in a personal script as a sanity check.
  6. Add it to the stub files in ember/aten/*.pyi.
  7. Add Python tests in tests/.
  8. If everything passes, you can submit a pull request.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


pyember-0.0.17-cp313-cp313-win_amd64.whl (223.9 kB)

Tags: CPython 3.13, Windows x86-64

pyember-0.0.17-cp313-cp313-macosx_11_0_arm64.whl (640.1 kB)

Tags: CPython 3.13, macOS 11.0+ ARM64

pyember-0.0.17-cp312-cp312-win_amd64.whl (223.8 kB)

Tags: CPython 3.12, Windows x86-64

pyember-0.0.17-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (376.0 kB)

Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

pyember-0.0.17-cp312-cp312-macosx_11_0_arm64.whl (639.5 kB)

Tags: CPython 3.12, macOS 11.0+ ARM64

pyember-0.0.17-cp311-cp311-win_amd64.whl (222.5 kB)

Tags: CPython 3.11, Windows x86-64

pyember-0.0.17-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (376.4 kB)

Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

pyember-0.0.17-cp311-cp311-macosx_11_0_arm64.whl (642.3 kB)

Tags: CPython 3.11, macOS 11.0+ ARM64

pyember-0.0.17-cp310-cp310-win_amd64.whl (221.1 kB)

Tags: CPython 3.10, Windows x86-64

pyember-0.0.17-cp310-cp310-macosx_11_0_arm64.whl (640.0 kB)

Tags: CPython 3.10, macOS 11.0+ ARM64

pyember-0.0.17-cp39-cp39-win_amd64.whl (215.0 kB)

Tags: CPython 3.9, Windows x86-64

pyember-0.0.17-cp39-cp39-macosx_11_0_arm64.whl (640.3 kB)

Tags: CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file pyember-0.0.17-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.17-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 223.9 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.17-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 caec0eb6753436edc22a6d17e81cd4fad86ec015e7dd9bd5c524c2503151fe72
MD5 efbb1f8871cc8006a583be3554bac26f
BLAKE2b-256 9f5bae2abae31970a01081edbc4c4050810ae7272e8c873f21b98131f12ef371

See more details on using hashes here.

File details

Details for the file pyember-0.0.17-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pyember-0.0.17-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e2c01c4ab857fa920fe0c0576e5ff8bfe1f8368d56dc66ed8dccf7bd8023f2d7
MD5 b7a35c067a2329d766e12848afd0c4b6
BLAKE2b-256 e7176f2f4371b3981dc42b65d0a5acd64819b8d077de159261975d6a97822342


File details

Details for the file pyember-0.0.17-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.17-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 223.8 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.17-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 a26aa519dc636ec9ed846499bf55bb204858b1788419d05be4cd1d4814abcbde
MD5 9698d8ab4f413fd54cfb23e3e87c727f
BLAKE2b-256 4f9a3c9cef4438fee58714ec533411f51e671a07637de1a4a7faefbeea6ff8a9


File details

Details for the file pyember-0.0.17-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for pyember-0.0.17-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 03483b45fd31cfa2457279596188642b61c96918ea443b2617a7f1f12cbe3169
MD5 72e58e21ddab540b03a2fd3e33be9ae1
BLAKE2b-256 f81def818c97d05f8babddb3bdae7825b8d9d44c23b2985cb5f2b33bd817ea82


File details

Details for the file pyember-0.0.17-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pyember-0.0.17-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8cc695dfa665e469bc79590875a6629c63dcc9880dcfada7570a0a813228fc5d
MD5 d2d2434162f601c37c39d3d24841d88f
BLAKE2b-256 711a110a582ea8d446f3196842e22199af48c9752d10f649755a2e0253f22f9e


File details

Details for the file pyember-0.0.17-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.17-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 222.5 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.17-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 8322ece4592e267f8b4b31e621d9e3827a5ef0cc095ab13ce5c4fc18e27a8f6a
MD5 3ee85ff98fdc7a6203a13414f682e567
BLAKE2b-256 c6a0f1485c11f4a64634fdd15c64321e02d55cdbc088e9b8661ab1c15ef29403


File details

Details for the file pyember-0.0.17-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for pyember-0.0.17-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 07d109bad9a3166d4de9715c7b82567d3954ff2921567f271158544c4a8f0594
MD5 97f6f9732c8db98bbcf7dd991ae1910d
BLAKE2b-256 87d3212839dd2d9d21dd1310431e495b218e9e1116b4b1eee32e3c2ed785bef4


File details

Details for the file pyember-0.0.17-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pyember-0.0.17-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ffd57bfd016f2c85b599432f47caefa05d9ff9a2e67ef8631865de99e38e3142
MD5 f0832ce5b600ea671fccd9d872b2923f
BLAKE2b-256 6bd103fc404b17a6e1f8acb49d6e8e2cee54bef34cd4420706a98dae957f0fee


File details

Details for the file pyember-0.0.17-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.17-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 221.1 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.17-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 e4aacbb65fe6ad0338e07334679335c458debb2c570bfc876f8e35e4ec3a4c50
MD5 c8b8080fc01ab9898eec8ce4d69c90e0
BLAKE2b-256 b5f7f5ebb199dfa845f2dfb55b0bc795196f1691f5075b9645e29698ddaa814c


File details

Details for the file pyember-0.0.17-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pyember-0.0.17-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 be64bca18f2139dbd2c2de2370525aa9d17afe86c31341dc6a10bf99d7d63114
MD5 fa3783bff2561e0e5e2caf2c27ab1678
BLAKE2b-256 811e9e29bc0235880363a323dec83796faaad868505c6bcb867501be46be8dfe


File details

Details for the file pyember-0.0.17-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: pyember-0.0.17-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 215.0 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.8

File hashes

Hashes for pyember-0.0.17-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 3f72555c5e3fbee4faf4e68da053a114d7d87648d7201f7969c44eb0f117441e
MD5 29f89fa9ebc9c1f18bb8f3ead95f5721
BLAKE2b-256 9f4e8927ab4b9da9083076d817e1549704a66d11b95bfb309540819ab3d35fe9


File details

Details for the file pyember-0.0.17-cp39-cp39-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pyember-0.0.17-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b49feb9cb1f013b1859f76cdc4ac7247bb2afe06db217482a1bb4a35992459ac
MD5 417117a60f8cda951391980b2e3f82e7
BLAKE2b-256 2dff42b11dac32808fad4a0be8e35e6080dce7aeb4854377092f00c15bc03f49

