Automatic differentiation package AC207
cs107-FinalProject
Group 26
- David Chataway
- Javiera Astudillo
- Wenhan Zhang
Introduction
Our library, AutoDiff, provides an automatic differentiation (AD) tool designed to be user-friendly. The base AutoDiff class contains modules for both forward and reverse mode. A user provides an array of input values and a list of multivariate functions, specifies the mode and (optionally) a seed vector, and can then compute the Jacobian, the gradients, and the directional derivative with respect to a custom direction, so that the library can be used inside other algorithms such as Newton's method. Developing a user-friendly and efficient AD library is important for two primary reasons:
- Automatic differentiation can improve upon the symbolic differentiation offered by symbolic libraries (such as SymPy) or web services (such as Wolfram Alpha): AD achieves machine precision and typically executes faster, because it does not need to carry the whole expression tree through the computation and does not suffer from exponentially longer derivative expressions (caused, for example, by repeated product rules).
- Efficient automatic differentiation is necessary to keep computational cost and time manageable, because forward and reverse modes scale differently and intermediate variables and expressions must be stored. This is crucial in many applications, such as optimization algorithms (e.g. gradient descent).
Background
In our understanding, at the heart of AD is the idea that functional expressions can be broken down into elementary symbolic expressions, and the numerical result of the function's derivatives can be calculated by storing the value and the derivative value at the intermediate variables of those expressions. The graph structure of automatic differentiation arises because these intermediate variables can have both parents and children that each point to different steps in the evaluation trace. Furthermore, many of these intermediate variables will have multiple children, i.e. they will be used more than once in the evaluation trace. Although forward and reverse mode differ in the order in which the graph is evaluated, both modes 1) produce the same graph, 2) evaluate the derivatives at each child node by using stored derivative values with respect to the parent node(s), and 3) take a function with n inputs and m outputs and produce its numerical derivative in the form of an m x n Jacobian (or its transpose).
Forward mode
In the forward mode, we calculate the partial derivatives by evaluating the chain rule for each intermediate variable during the forward pass. This is done by applying a symbolic derivative of the intermediate variable and directly substituting the derivative value of the parent node(s). For instance, for an intermediate variable expressed as a function of a function (its parent node), h(u(t)), the chain rule is evaluated as
dh/dt = (∂h/∂u) · (du/dt),
where the first term of the product is evaluated symbolically and the second term is taken from the derivative value of the parent node.
Because of the graph structure, a single forward pass can compute the partial derivatives of all m outputs with respect to a single input variable (if seeded in the unit direction of that input variable). That is, each forward pass produces a single column of the Jacobian matrix.
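To make this concrete, here is a minimal, self-contained forward-mode sketch. It is an illustration only (a hypothetical Dual class, not the library's Variable implementation): each quantity carries a (value, derivative) pair, and seeding the derivative of one input with 1 yields one column of the Jacobian.

```python
import numpy as np

class Dual:
    """A value paired with its derivative w.r.t. the seeded direction."""
    def __init__(self, val, der):
        self.val, self.der = val, der
    def __mul__(self, other):
        # product rule: d(ab) = a'b + ab'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def sin(a):
    # chain rule: d(sin a) = cos(a) * a'
    return Dual(np.sin(a.val), np.cos(a.val) * a.der)

# f(x, y) = x * sin(y); seed the unit direction of x to get df/dx
x = Dual(2.0, 1.0)    # dx/dx = 1
y = Dual(np.pi, 0.0)  # dy/dx = 0
out = x * sin(y)
print(out.val, out.der)  # value of f and df/dx at (2, pi)
```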
Reverse mode
In the reverse mode, rather than directly computing the chain rule at each intermediate variable, you only store the partial derivatives and the dependencies of the expression tree during the forward pass. Then you do a reverse pass: start at a single output variable, 'seed' it (i.e. set the derivative of that output variable to 1 and the derivatives of the other output variables to 0) and propagate backwards to all the input variables. We are essentially asking, 'which input variables can affect this output variable?' This is possible by calculating the 'adjoint', which stores the current value of the partial derivative of the output with respect to the variable at node i according to the following formula, where j ranges over the children of i:
v̄_i = ∂f/∂v_i = Σ_j v̄_j · ∂v_j/∂v_i
However, in order to calculate the derivative of a different output variable, we need to re-run the reverse pass with a different unit seed. As a result, each run of the reverse mode generates a single row of the Jacobian (or a column of its transpose).
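For intuition, here is a minimal reverse-mode sketch in the same spirit as the library's Node class described later in the Implementation details (again an illustration, not the library's code, using the hypothetical function f(x, y) = x * sin(y)): the forward pass records, for each node, its children and the local partial derivatives; the reverse pass sums (local partial * adjoint) over the children.

```python
import numpy as np

class Node:
    def __init__(self, val):
        self.val = val
        self.children = []      # list of (local partial derivative, child node)
        self.grad_value = None  # the adjoint, filled in during the reverse pass
    def grad(self):
        if self.grad_value is None:
            self.grad_value = sum(w * child.grad() for w, child in self.children)
        return self.grad_value

def mul(a, b):
    out = Node(a.val * b.val)
    a.children.append((b.val, out))  # d(ab)/da = b
    b.children.append((a.val, out))  # d(ab)/db = a
    return out

def sin(a):
    out = Node(np.sin(a.val))
    a.children.append((np.cos(a.val), out))  # d(sin a)/da = cos(a)
    return out

# f(x, y) = x * sin(y): one reverse pass gives df/dx and df/dy together
x, y = Node(2.0), Node(np.pi)
f = mul(x, sin(y))
f.grad_value = 1.0         # seed the output
print(x.grad(), y.grad())  # df/dx = sin(y), df/dy = x*cos(y)
```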
Example
For example, consider the function f(x,y), shown below (same as the one in pair programming 6).
Rather than storing the entire symbolic derivative expression, we can instead break down the function into intermediate variables (shown below) and store (or overload) the symbolic derivatives of the elemental operations of +/-, **2, exp(), sin() and cos().
For forward mode, all that needs to be stored is a tuple of the value and the derivative value according to a particular seed. Thus, this evaluation would need to be run twice for the two seed vectors corresponding to each input variable.
If this automatic differentiation were evaluated in reverse mode, then on the forward pass at each node we would instead store the values of the intermediate variables' partial derivatives with respect to each parent argument (rather than applying the chain rule), and we would then work backwards in one pass to calculate df/dx and df/dy using the adjoint method. These graph structures are outlined below:
How to use
Our solution is provided as a Python package called AutoDiff and is available through pip, a package installer tool for Python. The prerequisite to use AutoDiff is to have Python installed as well as the numpy and scipy packages. If you install AutoDiff through pip, then the dependencies (numpy and scipy) will be installed automatically.
PIP
python3 -m pip install autodiff-djw
Create AutoDiff
The main access point to our library is through the AutoDiff class. To instantiate an object of this class the user must provide the following arguments:
AutoDiff(f, vals, mode='forward')
- f : array_like. List-like output functions. Order preserving.
- vals : array_like. List-like initial values for the inputs. All output functions will be evaluated at these values.
- mode : string (default='forward'). Either 'forward' or 'reverse' mode.
Example: new AutoDiff
import numpy as np
from autodiff import AutoDiff
inputs_vals = [np.pi/2, np.pi/4, np.pi/8]
f1 = ...
f2 = ...
f = [f1, f2]
autodiff = AutoDiff(vals=inputs_vals, f=f, mode='forward')
Function creation
To create functions ([f1, f2] in the previous example) that are compatible with our library methods (i.e. that support automatic differentiation of their outputs), you must provide regular Python functions which (1) meet the signature described below and (2) use only functions supported by our library.
Function signature
The elements of f
(provided at AutoDiff
instantiation) must be regular python functions which meet the following signature:
f1(inputs)
Arguments
---------
inputs: array_like
A sequence representing the inputs. Each element represents one input variable (scalar).
Returns
-------
output: numeric (scalar)
A function of the inputs.
Supported operations
To enable the automatic differentiation features, any user-defined function must use the operations and functions provided by the library:
- Basic Operations:
  - Addition (commutative) (+)
  - Subtraction (-)
  - Multiplication (commutative) (*)
  - Division (/)
  - Power (**)
  - Negation (-)
- Elementary Functions:
  - log (base configurable)
  - exp
  - sin
  - cos
  - tan
  - arcsin
  - arccos
  - arctan
  - sinh
  - cosh
  - tanh
  - sqrt
  - expit
These functions may be accessed through the autodiff.elementary module and the user should import them before declaring a function. All of the elementary functions have the same signature as their numpy equivalents (scipy, in the case of expit).
Example: declaring functions
from autodiff.elementary import sin
f1 = lambda inputs: inputs[0]*inputs[2]**2
f2 = lambda inputs: inputs[2]*sin(inputs[1])
f = [f1, f2]
or equivalently:
def f1(inputs):
    x1 = inputs[0]  # first input
    x3 = inputs[2]  # third input
    return x1*(x3**2)
def f2(inputs):
    x2 = inputs[1]  # second input
    x3 = inputs[2]  # third input
    return x3*sin(x2)
f = [f1, f2]
AutoDiff attributes and methods
Attributes
- inputs : get the user-provided input values.
- outputs : get the outputs of the user-provided functions, evaluated at inputs.
- jacobian : get the Jacobian.
Methods
- dir_der
dir_der(seed)
Calculates the derivative of each output function in the direction given by `seed`.
Arguments
---------
seed: array-like (M,)
The direction in which to calculate the derivative of each output function.
Must have the same dimension as the number of inputs.
M: number of inputs
Returns
-------
list (N,)
Derivative of each output function in the direction of the `seed` vector.
N: number of functions
- gradient
gradient(fn_idx)
Creates or pulls a row of the Jacobian.
Calculates the derivative of the output function at index `fn_idx` with respect to each input variable.
Arguments
---------
fn_idx: int
Index of the output function for which to calculate the gradient.
Returns
-------
list (M,)
M: number of inputs.
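These quantities are mutually consistent: `gradient(i)` returns row i of `jacobian`, and the directional derivative equals the Jacobian applied to the seed vector. The short check below illustrates this, reusing the vector function from the demo in the next section (the assertions express this consistency assumption and are not additional library features):

```python
import numpy as np
from autodiff import AutoDiff
from autodiff.elementary import sin

f = [lambda x: x[0]*x[2]**2, lambda x: x[2]*sin(x[1])]
ad = AutoDiff(f=f, vals=[np.pi/2, np.pi/4, np.pi/8], mode='forward')

J = np.asarray(ad.jacobian)                          # shape (N outputs, M inputs)
seed = np.array([1, 2, 1])

assert np.allclose(ad.gradient(fn_idx=0), J[0])      # gradient(i) is row i of the Jacobian
assert np.allclose(ad.dir_der(seed=seed), J @ seed)  # dir_der(seed) equals J @ seed
```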
Demo
Let's take the following vector function, f(x1, x2, x3) = [x1*x3^2, x3*sin(x2)], to illustrate the usage of our library.
We'll evaluate this function at the point x = [pi/2, pi/4, pi/8] and compute the directional derivative of each output function with respect to the direction v = [1, 2, 1].
This analytical solution can be reproduced with the AutoDiff library using either the forward or the reverse mode, as shown below. The returned derivatives are equal to each other (within machine precision) and to the expected value.
Forward Mode
import numpy as np
from autodiff import AutoDiff
from autodiff.elementary import log, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, sqrt, expit
inputs_vals = [np.pi/2, np.pi/4, np.pi/8]
seed = [1,2,1]
f1 = lambda inputs: inputs[0]*inputs[2]**2
f2 = lambda inputs: inputs[2]*sin(inputs[1])
f = np.array([f1, f2])
autodiff = AutoDiff(f=f, vals=inputs_vals, mode='forward')
print(autodiff.inputs)
>> [1.5707963267948966, 0.7853981633974483, 0.39269908169872414]
print(autodiff.outputs)
>> [0.24223653656484231, 0.2776801836348979]
print(autodiff.jacobian)
>> [[0.15421257 0. 1.23370055]
[0. 0.27768018 0.70710678]]
print(autodiff.gradient(fn_idx=0))
>> [0.15421257 0. 1.23370055]
print(autodiff.dir_der(seed=seed))
>> [1.38791312 1.26246715]
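As a sanity check independent of the library, the analytical Jacobian of f(x1, x2, x3) = [x1*x3^2, x3*sin(x2)] can be computed with plain numpy and compared against the values printed above:

```python
import numpy as np

x1, x2, x3 = np.pi/2, np.pi/4, np.pi/8

# Analytical Jacobian of f = [x1*x3**2, x3*sin(x2)]
J_true = np.array([
    [x3**2, 0.0,           2*x1*x3],
    [0.0,   x3*np.cos(x2), np.sin(x2)],
])
print(J_true)                        # matches autodiff.jacobian above
print(J_true @ np.array([1, 2, 1]))  # matches autodiff.dir_der(seed=[1, 2, 1])
```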
Reverse Mode
import numpy as np
from autodiff import AutoDiff
from autodiff.elementary import log, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, sqrt, expit
inputs_vals = [np.pi/2, np.pi/4, np.pi/8]
seed = [1,2,1]
f1 = lambda inputs: inputs[0]*inputs[2]**2
f2 = lambda inputs: inputs[2]*sin(inputs[1])
f = np.array([f1, f2])
autodiff = AutoDiff(f=f, vals=inputs_vals, mode='reverse')
print(autodiff.inputs)
>> [1.5707963267948966, 0.7853981633974483, 0.39269908169872414]
print(autodiff.outputs)
>> [0.24223653656484231, 0.2776801836348979]
print(autodiff.jacobian)
>> [[0.15421257 0. 1.23370055]
[0. 0.27768018 0.70710678]]
print(autodiff.gradient(fn_idx=0))
>> [0.15421257 0. 1.23370055]
print(autodiff.dir_der(seed=seed))
>> [1.38791312 1.26246715]
Software organization
The current structure of our project is as follows:
.
├── src
│ ├── autodiff
│ │ ├── autodiff_.py
│ │ ├── elementary.py
│ │ ├── forward.py
│ │ ├── reverse.py
│ │ └── ...
│ ├── demo
│ └── tests
└── docs
├── README.md
└── imgs
The forward.py and reverse.py modules contain the core functions for using our AutoDiff package in forward and reverse mode, respectively. elementary.py provides the elementary math functions to use in conjunction with our library, while autodiff_.py contains the AutoDiff class for accessing the auto-differentiation methods of our library. All our tests are located in src/tests and are integrated with Travis CI and Codecov. To run the test suite you'll need to install pytest and codecov:
pip install pytest pytest-cov
pip install codecov
Once both previous packages are installed you may run the test suite locally (starting at the root of our folder structure):
cd .
export PYTHONPATH="$PWD/src"
python -m pytest --cov src/tests
Implementation details
There's a common interface to access the automatic differentiation tools provided by this library: AutoDiff
class. Under the hood, AutoDiff
implements a factory pattern at instantiation that returns either an AutoDiffForward
or AutoDiffReverse
object. This pattern was selected so that the the usage of the AutoDiff
object was transparent to the user with regard to the implementation of the auto-differentiation mode used.
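One common way to realize such a factory in Python is to dispatch in `__new__`; the sketch below illustrates the idea under that assumption and is not necessarily the library's exact code:

```python
class AutoDiffForward:
    def __init__(self, f, vals):
        self.f, self.vals = f, vals  # forward-mode machinery lives here

class AutoDiffReverse:
    def __init__(self, f, vals):
        self.f, self.vals = f, vals  # reverse-mode machinery lives here

class AutoDiff:
    def __new__(cls, f, vals, mode='forward'):
        # Dispatch to the concrete implementation; the caller only ever sees "AutoDiff".
        if mode == 'forward':
            return AutoDiffForward(f, vals)
        elif mode == 'reverse':
            return AutoDiffReverse(f, vals)
        raise ValueError(f"Unknown mode: {mode!r}")
```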
Forward Mode
- A Variable class was created with .val and .der attributes and with overloaded elementary dunder methods and elementary functions.
- The AutoDiffForward class automatically creates Variable objects according to the input array.
- The AutoDiffForward class creates the Jacobian (.jacobian) by calling the directional derivative (dir_der) method on unit seed vectors, iterating over the M input variables (see the sketch after this list).
- The AutoDiffForward class gradient method requires the creation of the Jacobian and returns one of its rows.
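A minimal sketch of that column-by-column assembly (with a hypothetical helper name; the real class stores more state), assuming `dir_der(seed)` returns the derivative of every output in the direction `seed`:

```python
import numpy as np

def build_jacobian(dir_der, n_inputs):
    """Assemble the Jacobian column by column from directional derivatives.

    Seeding unit vectors yields Jacobian columns: column i holds the
    derivatives of every output with respect to input i.
    """
    columns = [dir_der(seed) for seed in np.eye(n_inputs)]
    return np.column_stack(columns)  # shape (n_outputs, n_inputs)
```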
Let's look at a sample function, sin, within the Variable class to better understand how it implements the auto-differentiation underneath:
class Variable:
    def __init__(self, **kwargs):
        ...
        self.val = kwargs['val']
        self.der = kwargs['der']
        ...

    def sin(a):
        out = Variable()
        try:
            out.val = np.sin(a.val)
            out.der = np.cos(a.val)*a.der
        except AttributeError:
            try:
                out.val = np.sin(a)
            except Exception as e:
                raise TypeError('Sin failed: `input` not a `Variable` object or a real number.')
        return out
For the forward calculation you need to seed the value of the derivative at the very beginning (self.der = kwargs['der']). Once you've done that, any call to a mathematical function passing a Variable object as an argument will immediately return a Variable object with the updated val and der, with no need to keep track of the forward pass. For the case of sin, we know that the output value is np.sin(a.val) while its partial derivative is np.cos(a.val)*a.der, following the chain rule.
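For example, seeding an input and pushing it through sin might look like the snippet below. Note that the import path and the keyword-argument constructor call are assumptions based on the __init__ shown above; Variable is an internal class and not part of the documented AutoDiff interface.

```python
import numpy as np
from autodiff.forward import Variable  # assumed location of the internal Variable class

x = Variable(val=np.pi/4, der=1.0)  # seed: treat x as the variable of differentiation
y = Variable.sin(x)                 # one forward step through sin

print(y.val)  # sin(pi/4)   ~= 0.7071
print(y.der)  # cos(pi/4)*1 ~= 0.7071, by the chain rule
```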
Reverse Mode
- A Node class was created with .val, .grad_value and .children ([partial derivative, child object]) attributes and with overloaded elementary dunder methods and elementary functions that update .children.
- A grad() method was created that recursively sums the (partial derivative * adjoint) over all children of a Node (thus acting as the backwards pass).
- The AutoDiffReverse class automatically creates Node objects according to the input array.
- The AutoDiffReverse class creates the Jacobian by calling the gradient method with a unit seed for each of the N output functions (see the sketch after this list).
- The AutoDiffReverse class directional derivative (dir_der) method requires the creation of the Jacobian and returns the matrix multiplication of the Jacobian with the custom vector.
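A minimal sketch of that row-by-row assembly and of the Jacobian-times-seed directional derivative (hypothetical helper names, assuming a `gradient` callable like the method documented above):

```python
import numpy as np

def build_jacobian_reverse(gradient, n_outputs):
    """Assemble the Jacobian row by row: one reverse pass per output function."""
    rows = [gradient(fn_idx=i) for i in range(n_outputs)]  # row i = gradient of output i
    return np.vstack(rows)

def dir_der(jacobian, seed):
    """Directional derivative of every output: Jacobian times the seed vector."""
    return np.asarray(jacobian) @ np.asarray(seed)
```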
Let's look at the same sample function, sin, but within the Node class this time:
class Node:
    def __init__(self, **kwargs):
        self.val = 0
        self.children = []
        self.grad_value = None  # the adjoint
        ...

    def grad(self):
        # recurse only if the value is not yet cached
        if self.grad_value is None:
            # recursively calculate derivative (adjoint) using chain rule
            self.grad_value = sum(weight * node.grad()
                                  for weight, node in self.children)
        return self.grad_value

    # gradient() belongs to the AutoDiffReverse class; it is shown here for context
    def gradient(self, fn_idx):
        ...
        inputs_vars = [Node(val=val) for val in self.vals]  # create Node inputs
        f_vars = np.array([f_(inputs_vars) for f_ in self.f])
        # Seed with .grad_value = 1 only for the function at fn_idx and .grad_value = 0 for all others
        for i in range(len(f_vars)):
            if i == fn_idx:
                f_vars[i].grad_value = 1.0
            else:
                f_vars[i].grad_value = 0
        # Get the partial derivative w.r.t. each input
        der = [inputs_.grad() for inputs_ in inputs_vars]
        return der

    def sin(a):
        out = Node()
        try:
            out.val = np.sin(a.val)
            a.children.append((np.cos(a.val), out))  # weight = ∂z/∂self = cos(self)
        except AttributeError:
            try:
                out.val = np.sin(a)
            except Exception as e:
                raise TypeError('Sin failed: `input` not a `Node` object or a real number.')
        return out
Note that in order to calculate the gradient through the reverse mode we first need to record the partial derivatives and the dependencies of the expression tree during the forward pass. This is done when calling f_vars = np.array([f_(inputs_vars) for f_ in self.f]) within gradient, which passes the inputs through our custom functions such as sin.
Then you do a reverse pass by starting at a single output variable and 'seeding' it (i.e. setting the derivative of that output variable to 1 and the derivatives of the other output variables to 0). This is executed by the lines f_vars[i].grad_value = .... After seeding, you propagate backwards to all the input variables (this is done when calling der = [inputs_.grad() for inputs_ in inputs_vars]).
Our extension
Our proposed extension is a working reverse mode implementation. At the user level, it exposes the same methods and attributes as the forward mode, and its usage is transparent to the user (nothing changes other than setting the mode to 'reverse'). More specifically, the user may instantiate an AutoDiff object through the following syntax (analogous to the forward syntax):
autodiff = AutoDiff(vals=inputs_vals, f=f, mode='reverse')
The user may access the following methods and attributes from this class:
- inputs
- outputs
- jacobian
- dir_der(seed)
- gradient(fn_idx)
These represent exactly the same values as for the forward mode described in the previous sections, and their usage is also exactly the same as in the forward mode.
All background content necessary to understand the reverse mode may be found in the Background > Reverse mode section, while its usage is described throughout the previous documentation alongside the forward-mode usage.
Broader Impact and Inclusivity Statement
We created this library to be as user-friendly as possible and to assist with common tasks that rely on differentiation, such as Newton's method and optimization problems. However, it is possible that as the library grows over time - in usage and functionality - people may misuse it. While we anticipate that the initial users will primarily be academics and hobbyists, what happens if the library is adopted by those with broad and significant societal impact? For instance, if the COMPAS tool - used to assess the likelihood of defendants becoming recidivists - adopted our library to optimize its predictive models, it would become crucial to maintain the integrity of the library. As a result, we would need to devote more resources to testing, error management and debugging. Otherwise, the societal impacts of an erroneous automatic differentiation library could be vast.
In making our AD library as user-friendly as possible, we considered the perspectives of a varied audience and sought not to deliberately exclude individuals or groups of different cultural, economic, social, gender and disability backgrounds. For instance, we tried to refrain from using exclusive terms such as master. However, we are cognisant that our library is exclusive because of three core deficiencies: 1) our documentation and much of our code's functions are written only in English, 2) our code and documentation make reference to parent and child (nodes), which may be triggering to some, and 3) pull requests to the library can only be approved by a select few, which may make others feel like they cannot contribute. Given more time, we would like to 1) make our library as language-friendly as possible by writing documentation in different languages and code that is less English-centric, 2) determine better terminology for the parent-child relationship, and 3) democratize the contribution review and approval process. Finally, it is important to consider that there may be unanticipated exclusive characteristics of our library that will be revealed with broader usage. Thus, it is paramount that a forum be created for users to voice concerns and that there be a standardized process to review those concerns and implement changes if needed.
Future Features
Future work on the library will remain focused on the original objective: to make an AD package that is as user-friendly as possible for our target users, primarily academics and hobbyists. As such, development work on future features is as follows:
- Add more compatible functions primarily used in academic research (e.g. absolute value, Gaussian)
- Complement the test suite with tests for the additional functions
- Print the run time and/or computational intensity of the class methods
  - Initial ideas include using the time module, e.g. time.perf_counter() (time.clock() was removed in Python 3.8)
- Add logic to decide between forward and reverse mode based on the m, n dimensionality
  - Print the suggestion to the user to help with usability
- Create a graphical user interface (GUI) to execute the library
  - This could be done by hosting a public server similar to autoed.herokuapp.com
- Create a cache to improve runtime (e.g. a data structure to track stored intermediate variables across independent passes)
  - Initial ideas for the data structure include a dictionary or a graph