A statistics and machine learning package.

Project description

🔥 Ember

Ember is a statistics and ML library written in C++ and Python for my personal use. I built it mainly for educational purposes, but it is quite functional and can be used to train models on several datasets.

Look here to see the methods it supports.

Installation

System Support

This library supports both x86_64/amd64 and arm64/aarch64. Check the table below to see whether your system is supported out of the box. The library has very few dependencies, so as long as your machine has a C++ compiler and Python, you should be able to get it working by adjusting the CMake and setuptools files.

x86_64 (columns: Python 3.13, 3.12, 3.11, 3.10, 3.9, 3.8, 3.7)
Ubuntu 24.04
Ubuntu 22.04
Ubuntu 20.04
ArchLinux 6.6.68 LTS
Debian 13
Debian 12
Debian 11
Debian 10
LinuxMint 22
LinuxMint 21
MacOS 10.15 Catalina
MacOS 10.14 Mojave
MacOS 10.13 High Sierra
MacOS 10.12 Sierra
MacOS 10.11 El Capitan
MacOS 10.10 Yosemite
MacOS 10.9 Mavericks
MacOS 10.8 Mountain Lion
MacOS 10.7 Lion
Windows 11
Windows 10
Windows 8
Windows 7
ARM64 (columns: Python 3.13, 3.12, 3.11, 3.10, 3.9, 3.8, 3.7)
Ubuntu 24.04
Ubuntu 22.04
Ubuntu 20.04
MacOS 15.x Sequoia
MacOS 14.x Sonoma
MacOS 13.x Ventura
MacOS 12.x Monterey
MacOS 11.x Big Sur
Windows 12

Compiling the aten Library

Your machine will need system dependencies such as CMake, a C++ compiler, and pybind11. The library uses C++17. Preferably you will have git and conda installed already. For more specific instructions on installing these on your system, refer to the more detailed installation guide.

Clone the repository, then run pip install from its root, which runs setup.py.

git clone git@github.com:mbahng/pyember.git 
cd pyember 
pip install .

This runs cmake on aten/CMakeLists.txt, which calls the following.

  1. It always calls aten/src/CMakeLists.txt that compiles and links the source files in the C++ tensor library.
  2. If BUILD_PYTHON_BINDINGS=ON (the default), it also calls aten/bindings/CMakeLists.txt to generate a .so file that can be imported into ember.
  3. If BUILD_DEV=ON, it calls aten/test/CMakeLists.txt to further compile the C++ unit testing suite.

If there are problems with building, you should check, in order,

  1. Whether build/ has been created. This is the first step in setup.py
  2. Whether main.cpp and, if BUILD_DEV=ON, the C++ unit test files have been compiled, i.e. whether the build/src/main and build/test/tests executables exist.
  3. Whether build/*/aten.cpython-3**-darwin.so exists (somewhere in the build directory, depending on the machine). The Makefile generated by aten/bindings/CMakeLists.txt will produce build/*/aten.cpython-3**-darwin.so.
  4. The setup() function will immediately copy this .so file to ember/aten.cpython-3**-darwin.so. You should see a success message saying that it has been moved or an error. The .so file must live within ember, the actual library, since ember/__init__.py must access it within the same directory level.
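The checklist above can be turned into a small diagnostic script. This is only a sketch: the function name check_build is made up for illustration, the paths are the ones listed above, and the file pattern aten.cpython-*.so is an assumption that generalizes the -darwin.so name (on Windows the extension would be .pyd instead).

```python
import tempfile
from pathlib import Path

# Assumed file pattern for the compiled extension module.
MODULE_PATTERN = "aten.cpython-*.so"


def has_module(directory, recursive):
    """Return True if a compiled aten module exists in (or below) directory."""
    if not directory.is_dir():
        return False
    matches = directory.rglob(MODULE_PATTERN) if recursive else directory.glob(MODULE_PATTERN)
    return any(matches)


def check_build(repo_root):
    """Run the four checks in order and return (description, passed) pairs."""
    root = Path(repo_root)
    build = root / "build"
    return [
        ("build/ exists", build.is_dir()),
        ("build/src/main exists", (build / "src" / "main").exists()),
        ("build/test/tests exists (needs BUILD_DEV=ON)", (build / "test" / "tests").exists()),
        ("aten module found under build/", has_module(build, recursive=True)),
        ("aten module copied into ember/", has_module(root / "ember", recursive=False)),
    ]


# Demo on a throwaway directory that only contains an empty build/ folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "build").mkdir()
    demo = dict(check_build(tmp))

for description, passed in demo.items():
    print("ok  " if passed else "FAIL", description)
```

Running check_build(".") from the repository root after a failed install shows which step to look at first.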

Testing and Development

The pip install step accepts two additional environment variables. Note that the following command is whitespace-sensitive: the assignments must come before pip on the same line, with no spaces around the = signs.

CMAKE_DEBUG=1 CMAKE_DEV=1 pip install .

  1. Setting CMAKE_DEBUG=1 compiles the aten library in debug mode (-g), which I use when running gdb/lldb on the compiled code.
  2. Setting CMAKE_DEV=1 also compiles the C++ testing suite. For this you will also need to install GoogleTest. A code snippet for Ubuntu and Debian is shown below.

sudo apt-get install libgtest-dev
cd /usr/src/gtest
sudo cmake CMakeLists.txt
sudo make
sudo cp lib/*.a /usr/lib
sudo rm -rf /var/lib/apt/lists/*
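To illustrate how the two environment variables could relate to the CMake options mentioned earlier, here is a minimal sketch. It is an assumption, not a copy of ember's setup.py: the function cmake_flags is hypothetical, and the mapping of CMAKE_DEBUG to CMAKE_BUILD_TYPE=Debug and of CMAKE_DEV to BUILD_DEV=ON is a guess based on the descriptions above.

```python
import os


def cmake_flags(env):
    """Translate the two pip-install environment variables into CMake -D flags.

    Hypothetical mapping: the real setup.py may use different flags.
    """
    flags = ["-DBUILD_PYTHON_BINDINGS=ON"]  # documented as on by default
    build_type = "Debug" if env.get("CMAKE_DEBUG") == "1" else "Release"
    flags.append(f"-DCMAKE_BUILD_TYPE={build_type}")
    dev = "ON" if env.get("CMAKE_DEV") == "1" else "OFF"
    flags.append(f"-DBUILD_DEV={dev}")
    return flags


flags_default = cmake_flags({})
flags_dev = cmake_flags({"CMAKE_DEBUG": "1", "CMAKE_DEV": "1"})

# Flags for the environment this script is actually running in.
print(cmake_flags(os.environ))
```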

If you would like to run the tests and/or develop the package yourself, you can run the script ./run_tests.sh all (pass python instead of all to run only the Python tests, or cpp to run only the C++ tests), which will

  1. Run all C++ unit tests for aten, ensuring that all functions work correctly.
  2. Run all Python unit tests for ember, ensuring that additional functions work correctly and that the C++ functions are bound correctly.

The stub (.pyi) files for aten are located in ember/aten.

Repository Structure

I modeled much of the structure on PyTorch and tinygrad. Very briefly,

  1. aten/ contains the header and source files for the C++ low-level tensor library, such as basic operations and an autograd engine.
    1. aten/src contains all the source files and definitions.
    2. aten/bindings contains the pybindings.
    3. aten/test contains all the C++ testing modules for aten.
  2. ember/ contains the actual library, supporting high-level models, objectives, optimizers, dataloaders, and samplers.
    1. ember/aten contains the stub files.
    2. ember/datasets contains all preprocessing tools, such as datasets/loaders, standardization, and cross-validation checks.
    3. ember/models contains all machine learning models.
    4. ember/objectives contains all loss functions and regularizers.
    5. ember/optimizers contains all the optimizers/solvers, such as iterative (e.g. SGD), greedy (e.g. decision tree splitting), and one-shot (e.g. least-squares solution).
    6. ember/samplers contains all samplers (e.g. MCMC, SGLD).
  3. docs/ contains detailed documentation about each function.
  4. examples/ contains example Python scripts for training models.
  5. tests/ contains the Python testing modules for the ember library.
  6. docker/ contains docker images of all the operating systems and architectures I tested ember on. General workflows for setting up the environment on supported machines can be found there.
  7. setup.py allows you to pip install this as a package.
  8. run_tests.sh is the main test-running script.

For a more detailed explanation, look here.

Getting Started

Ember Tensors and GradTensors

ember.Tensors represent data and parameters, while ember.GradTensors represent gradients. An advantage of this package is that, rather than supporting only batched vector operations and matrix multiplications, it can also perform general contractions of rank $(N, M)$-tensors, a generalization of matrix multiplication. This lets us represent and use higher-order derivatives of arbitrary functions $f: \mathbb{R}^{\mathbf{M}} \rightarrow \mathbb{R}^{\mathbf{N}}$, where $\mathbf{M} = (M_1, \ldots, M_m)$ and $\mathbf{N} = (N_1, \ldots, N_n)$ are vectors, not just scalars, giving the dimensions of each space.
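To make the contraction concrete, the chain rule can be written in multi-index notation. This is the standard multivariable chain rule, not a description of ember's internals; the second function $g: \mathbb{R}^{\mathbf{N}} \rightarrow \mathbb{R}^{\mathbf{P}}$ and the multi-indices $\mathbf{i}, \mathbf{j}, \mathbf{k}$ (ranging over the index sets of $\mathbb{R}^{\mathbf{N}}$, $\mathbb{R}^{\mathbf{M}}$, and $\mathbb{R}^{\mathbf{P}}$) are introduced here only for illustration.

```latex
\left[ \frac{\partial (g \circ f)}{\partial x} \right]_{\mathbf{k}, \mathbf{j}}
  = \sum_{\mathbf{i}}
    \left[ \frac{\partial g}{\partial y} \right]_{\mathbf{k}, \mathbf{i}}
    \left[ \frac{\partial f}{\partial x} \right]_{\mathbf{i}, \mathbf{j}},
\qquad y = f(x).
```

When the input and output spaces are both ordinary vector spaces, each multi-index is a single index and the sum reduces to a product of two Jacobian matrices.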

Tensors are multidimensional arrays that can be initialized in a number of ways. GradTensors are initialized during the backpropagation method, but we can explicitly set them if desired.

import ember 

a = ember.Tensor([2])  # single-element tensor (acts as a scalar)
b = ember.Tensor([1, 2, 3])  # vector
c = ember.Tensor([[1, 2], [3, 4]])  # 2D tensor (matrix)
d = ember.Tensor([[[1, 2]]])  # 3D tensor

Say that you have a series of elementary operations on tensors.

a = ember.Tensor([2, -3]) 
h = a ** 2
b = ember.Tensor([3, 5])

c = b * h

d = ember.Tensor([10, 1])
e = c.dot(d)

f = ember.Tensor([-2])

g = f * e

Automatic Differentiation

As these operations run, the C++ backend records a directed acyclic graph (DAG) of the operations used to compute g. You can then run g.backprop(), which traverses this DAG, computes the gradients, and returns a topological sorting of its nodes. The gradients are technically Jacobian matrices: each mapping x -> y attaches a gradient tensor to x with value dy/dx. With backprop(intermediate=False), these gradients are accumulated without the chain rule being applied yet; with intermediate=True, the chain rule is applied to give the derivative of the tensor we called backprop on w.r.t. each of the other tensors. The values printed below are the local derivatives; for example, a.grad is dh/da, a diagonal matrix with entries 2a.

top_sort = g.backprop()
print(a.grad) # [[4.0, 0.0], [0.0, -6.0]]
print(h.grad) # [[3.0, 0.0], [0.0, 5.0]]
print(b.grad) # [[4.0, 0.0], [0.0, 9.0]]
print(c.grad) # [[10.0, 1.0]]
print(d.grad) # [[12.0, 45.0]]
print(e.grad) # [[-2.0]]
print(f.grad) # [[165.0]]
print(g.grad) # [[1.0]]
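The printed values can be checked by hand. The sketch below recomputes the local Jacobians with plain Python lists (no ember required) and then multiplies them to apply the chain rule. The helper names diag and matmul are made up for this illustration and are not part of ember.

```python
# Inputs from the example above.
a = [2.0, -3.0]
b = [3.0, 5.0]
d = [10.0, 1.0]
f = -2.0

# Forward pass.
h = [x ** 2 for x in a]                    # h = a ** 2
c = [bi * hi for bi, hi in zip(b, h)]      # c = b * h
e = sum(ci * di for ci, di in zip(c, d))   # e = c . d
g = f * e                                  # g = f * e


def diag(v):
    """Square matrix with v on the diagonal and zeros elsewhere."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Local Jacobians, one per edge of the computation graph.
dh_da = diag([2 * x for x in a])   # matches a.grad
dc_dh = diag(b)                    # matches h.grad
dc_db = diag(h)                    # matches b.grad
de_dc = [d]                        # matches c.grad
de_dd = [c]                        # matches d.grad
dg_de = [[f]]                      # matches e.grad
dg_df = [[e]]                      # matches f.grad

# Chain rule: dg/da = dg/de . de/dc . dc/dh . dh/da
dg_da = matmul(matmul(matmul(dg_de, de_dc), dc_dh), dh_da)
print(dg_da)
```

The last line is the full derivative of g with respect to a, obtained by contracting the local Jacobians along the path from g back to a.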

Finally, we can visualize this using the networkx package.
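The graph for the example above is small enough to write down by hand. The sketch below uses the standard-library graphlib module (Python 3.9+) rather than ember or networkx; the node names are the variable names from the example, and the ordering ember returns may differ from this one.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it was computed from.
graph = {
    "h": {"a"},        # h = a ** 2
    "c": {"b", "h"},   # c = b * h
    "e": {"c", "d"},   # e = c . d
    "g": {"f", "e"},   # g = f * e
}

# static_order() lists every node after all of its inputs.
order = list(TopologicalSorter(graph).static_order())
print(order)

# Backpropagation visits the nodes in the reverse order, starting from g.
print(list(reversed(order)))
```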

Linear Regression

To perform linear regression, use the LinearRegression model.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.LinearRegression(15) 
mse = ember.objectives.MSELoss()

for epoch in range(500): 
  loss = None
  for x, y in dl: 
    y_ = model.forward(x)  
    loss = mse(y, y_)
    loss.backprop()
    model.step(1e-5) 

  print(loss)
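For readers who want to see what such a loop does numerically, here is a self-contained sketch of mini-batch gradient descent on the mean squared error, written with plain Python lists. It does not use ember and is not ember's implementation; the synthetic data, the learning rate, and all variable names are assumptions chosen for the illustration.

```python
import random

random.seed(0)
D = 3
true_w, true_b = [1.0, -2.0, 0.5], 0.3

# Noise-free synthetic data: y = true_w . x + true_b
xs = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(20)]
ys = [sum(w * x for w, x in zip(true_w, row)) + true_b for row in xs]

w, b, lr, batch_size = [0.0] * D, 0.0, 0.1, 2


def predict(row):
    return sum(wi * xi for wi, xi in zip(w, row)) + b


def mse():
    return sum((predict(row) - y) ** 2 for row, y in zip(xs, ys)) / len(xs)


initial_loss = mse()
for epoch in range(200):
    for start in range(0, len(xs), batch_size):
        batch = list(zip(xs[start:start + batch_size], ys[start:start + batch_size]))
        grad_w, grad_b = [0.0] * D, 0.0
        for row, y in batch:
            err = predict(row) - y
            # d/dw of the batch-averaged squared error
            for j in range(D):
                grad_w[j] += 2 * err * row[j] / len(batch)
            grad_b += 2 * err / len(batch)
        # Gradient step, the counterpart of model.step(lr) above
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
final_loss = mse()

print(initial_loss, final_loss)
```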

K Nearest Neighbors

For a simple K-nearest-neighbors regressor, use the following model. The forward method scans over the whole dataset, so the dataset must be passed to the model at instantiation. Note that we need neither a dataloader nor a backpropagation step, since no parameters are updated iteratively; we compute the loss only to compare values of K.

import ember
from ember.models import KNearestRegressor
from ember.datasets import LinearDataset

ds = LinearDataset(N=20, D=3)
model = KNearestRegressor(dataset=ds, K=1)
mse = ember.objectives.MSELoss() 

for k in range(1, 21): # hyperparameter tuning
  model.K = k
  print(f"{k} ===") 
  loss = 0
  for i in range(len(ds)): 
    x, y = ds[i] 
    y_ = model.forward(x) 
    loss = loss + mse(y, y_) 

  print(loss)
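The prediction rule itself is short. Below is a minimal sketch of K-nearest-neighbors regression with Euclidean distance and a plain average of the neighbors' targets, using only Python lists. The function knn_predict and the toy data are made up for illustration; ember's KNearestRegressor may differ in its distance metric or tie-breaking.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    def squared_distance(pair):
        features, _ = pair
        return sum((a - b) ** 2 for a, b in zip(features, x))

    nearest = sorted(train, key=squared_distance)[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy 1-D dataset of (features, target) pairs.
train = [([0.0], 0.0), ([1.0], 1.0), ([2.0], 4.0), ([3.0], 9.0)]

pred_k1 = knn_predict(train, [1.1], k=1)  # nearest point is x = 1
pred_k2 = knn_predict(train, [1.1], k=2)  # nearest points are x = 1 and x = 2
print(pred_k1, pred_k2)
```

Because every prediction sorts the full training set, the cost of one forward call grows with the dataset size, which is why the model needs the dataset at instantiation.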

Multilayer Perceptrons

To instantiate an MLP, call it from models. Here we make a 2-layer MLP with a dummy dataset. For now only SGD with batch size 1 is supported.

import ember 

ds = ember.datasets.LinearDataset(N=20, D=14)
dl = ember.datasets.Dataloader(ds, batch_size=2)
model = ember.models.MultiLayerPerceptron(15, 10) 
mse = ember.objectives.MSELoss()

for epoch in range(500):  
  loss = None
  for x, y in dl: 
    y_ = model.forward(x) 
    loss = mse(y, y_)
    loss.backprop() 
    model.step(1e-5)

  print(loss)

Its output over about 1 minute of training:

LOSS = 256733.64437981808
LOSS = 203239.08846901066
LOSS = 160223.4554735339
LOSS = 125704.33716141782
LOSS = 98074.96981384761
LOSS = 76026.19871949886
LOSS = 58491.92389906721
LOSS = 44604.493032865605
LOSS = 33658.23285350788
LOSS = 25079.638682869212
LOSS = 18403.01062298029
LOSS = 13250.54496118543
LOSS = 9316.069468116035
LOSS = 6351.758695807299
LOSS = 4157.286052245369
LOSS = 2570.96819208677
LOSS = 1462.5380952427417
LOSS = 727.2493587808174
LOSS = 281.0683664354656
LOSS = 56.75530418715159
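To make "2-layer" concrete, here is a sketch of the forward pass of a two-layer perceptron with plain Python lists. The ReLU activation, the weights, and the helper names are all assumptions for this illustration; the source does not state which activation ember's MultiLayerPerceptron uses.

```python
def linear(weights, bias, x):
    """Affine map: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, vi) for vi in v]


def mlp_forward(x, layer1, layer2):
    """Two-layer MLP: linear -> ReLU -> linear."""
    hidden = relu(linear(layer1[0], layer1[1], x))
    return linear(layer2[0], layer2[1], hidden)


# Hand-picked weights so the result can be checked by hand.
layer1 = ([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0])  # 2 inputs -> 2 hidden units
layer2 = ([[3.0, -1.0]], [0.5])                    # 2 hidden units -> 1 output

output = mlp_forward([2.0, 1.0], layer1, layer2)
print(output)
```

With the input [2, 1], the hidden layer evaluates to [1, 2] and the output to 3 * 1 - 1 * 2 + 0.5.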

Datasets

Models and Training

Monte Carlo Samplers

Contributing

To implement new functionality in the aten library, you must

  1. Add the class or function header in aten/src/Tensor.h
  2. Add the implementation in the correct file (or create a new one) in aten./*Tensor/*.cpp. Make sure to update aten/bindings/CMakeLists.txt if needed.
  3. Add its pybindings (if a public function that will be used in ember) in aten/bindings/*bindings.cpp. Make sure to update aten/bindings/CMakeLists.txt if needed.
  4. Add relevant C++ tests in aten/test/.
  5. Not necessary, but it's good to test it out on a personal script for a sanity check.
  6. Add to the stub files in ember/aten/*.pyi.
  7. Add Python tests in tests/.
  8. If everything passes, you can submit a pull request.


