Fast Exact Summation Using Small and Large Superaccumulators

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)
Operating System
- OS Independent
Topic
- Scientific/Engineering

Project description

Fast Exact Summation Using Small and Large Superaccumulators (XSUM)

In applications like optimization in the near-flat surfaces or finding the sample mean of data, it is desirable to use higher accuracy than a simple summation. Where in a simple summation, rounding happens after each addition. An exact summation is a way to achieve higher accuracy results.

XSUM is a library for performing exact summation using super-accumulators. It provides methods for exactly summing a set of floating-point numbers, where using a simple summation and the rounding which happens after each addition could be an important factor in many applications.

This library is an easy to use header-only cross-platform C++11 implementation and also contains Python bindings (please see the example).

The main algorithm is taken from the original C library FUNCTIONS FOR EXACT SUMMATION described in the paper "Fast Exact Summation Using Small and Large Superaccumulators," by Radford M. Neal.

The code is rewritten in C++ and amended with more functionalities with the goal of ease of use. The provided Python bindings provide the exact summation interface in a Python code.

The C++ code also includes extra summation functionalities, (parallel reduction on multi-core architectures) which are especially useful in high-performance message passing libraries (like OpenMPI and MPICH). Where binding a user-defined global summation operation to an op handle can subsequently be used in MPI_Reduce, MPI_Allreduce, MPI_Reduce_scatter, and MPI_Scan or a similar calls.

NOTE: To see or use or reproduce the results of the original implementation reported in the paper Fast Exact Summation Using Small and Large Superaccumulators, by Radford M. Neal, please refer to FUNCTIONS FOR EXACT SUMMATION.

Usage

XSUM library presents two objects, or super-accumulators xsum_small_accumulator and xsum_large_accumulator. It also provides methods for summing floating-point numbers and rounding to the nearest floating-point number.

A small superaccumulator uses sixty-seven 64-bit chunks, each with 32-bit overlap with the next one. This accumulator is the preferred method for summing a moderate number of terms. A large superaccumulator uses 4096 64-bit chunks and is suitable for big summations. A small superaccumulator is also a component of the large superaccumulator [1].

XSUM library provides two interfaces to use superaccumulators. The first one is a function interface, which takes input and produces output, and in the second one, supperaccumulators are represented as classes (xsum_small and xsum_large.)

C++

xsum_small_accumulator and xsum_large_accumulator, both have a default constructor, thus they do not need to be initialized. Addition operation is simply a xsum_add,

// A small superaccumulator
xsum_small_accumulator sacc;

// Adding values to the small accumulator sacc
xsum_add(&sacc, 1.0);
xsum_add(&sacc, 2.0);

// Large superaccumulator
xsum_large_accumulator lacc;

// Adding values to the large accumulator lacc
xsum_add(&lacc, 1.0e-3);
xsum_add(&lacc, 2.0e-3);

or xsum_small, and xsum_large objects can simply be used as,

// A small superaccumulator
xsum_small sacc;

// Adding values to the small accumulator sacc
sacc.add(1.0);
sacc.add(2.0);

// Large superaccumulator
xsum_large lacc;

// Adding values to the large accumulator lacc
lacc.add(1.0e-3);
lacc.add(2.0e-3);

One can also add a vector of numbers to a superaccumulator.

// A small superaccumulator
xsum_small_accumulator sacc;

// Adding a vector of numbers
double vec[] = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};

xsum_add(&sacc, vec, 10);

std::vector<double> v = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};

xsum_add(&sacc, v);

the same with xsum_small, and xsum_large objects as,

// A small superaccumulator
xsum_small sacc;

// Adding a vector of numbers
double vec[] = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};

sacc.add(vec, 10);

std::vector<double> v = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};

sacc.add(v);

The squared norm of a vector (sum of squares of elements) and the dot product of two vectors (sum of products of corresponding elements) can be added to a superaccumulator using xsum_add_sqnorm and xsum_add_dot respectively.

For exmaple,

// A small superaccumulator
xsum_small_accumulator sacc;

// Adding a vector of numbers
double vec1[] = {1.e-2, 2., 3.};
double vec2[] = {1.e-3, 2., 3.};

// Add dot product of vectors to a small superaccumulator
xsum_add_dot(&sacc, vec1, vec2, 3);

with xsum_small, and xsum_large objects as,

// A small superaccumulator
xsum_small sacc;

// Adding a vector of numbers
double vec1[] = {1.e-2, 2., 3.};
double vec2[] = {1.e-3, 2., 3.};

// Add dot product of vectors to a small superaccumulator
sacc.add_dot(vec1, vec2, 3);

When it is needed, one can simply use the xsum_init to reinitilize the superaccumulator.

xsum_small_accumulator sacc;

xsum_add(&sacc, 1.0e-2);
...

// Reinitilize the small accumulator
xsum_init(&sacc);
...

or with xsum_small object as,

xsum_small sacc;

sacc.add(1.0e-2);
...

// Reinitilize the small accumulator
sacc.init();
...

The superaccumulator can be rounded as,

xsum_small_accumulator sacc;
xsum_add(&sacc, 1.0e-15);

....

double s = xsum_round(&sacc);

where, xsum_round is used to round the superaccumulator to the nearest floating-point number.

With xsum_small, and xsum_large objects we do as,

xsum_small sacc;
sacc.add(1.0e-15);

....

double s = sacc.round();

where, round is used to round the superaccumulator to the nearest floating-point number.

Two small superaccumulators can be added together. xsum_add can be used to add the second superaccumulator to the first one without doing any rounding. Two large superaccumulator can also be added in the same way. In the case of the two large superaccumulators, the second one internally will be rounded to a small superaccumulator and then the addition is done.

// Small superaccumulators
xsum_small_accumulator sacc1;
xsum_small_accumulator sacc2;

xsum_add(&sacc1, 1.0);
xsum_add(&sacc2, 2.0);

xsum_add(&sacc1, &sacc2);

// Large superaccumulators
xsum_large_accumulator lacc1;
xsum_large_accumulator lacc2;

xsum_add(&lacc1, 1.0);
xsum_add(&lacc2, 2.0);

xsum_add(&lacc1, &lacc2);

or as,

// Small superaccumulators
xsum_small sacc1;
xsum_small sacc2;

sacc1.add(1.0);
sacc2.add(2.0);

// Add a small accumulator sacc2 to the first accumulator sacc1
sacc1.add(sacc2);

// Large superaccumulators
xsum_large lacc1;
xsum_large lacc2;

lacc1.add(1.0);
lacc2.add(2.0);

// Add a large accumulator lacc2 to the first accumulator lacc1
lacc1.add(lacc2);

A small superaccumulator can also be added to a large one,

// Small superaccumulator
xsum_small_accumulator sacc;

xsum_add(&sacc, 1.0e-10);

...

// Large superaccumulator
xsum_large_accumulator lacc;

xsum_add(&lacc, 2.0e-3);

...

// Addition of a small superaccumulator to a large one
xsum_add(&lacc, &sacc);

With xsum_small, and xsum_large objects we do as,

// Small superaccumulator
xsum_small sacc;

sacc.add(1.0e-10);

...

// Large superaccumulator
xsum_large lacc;

lacc.add(2.0e-3);

...

// Addition of a small superaccumulator to a large one
lacc.add(sacc);

The large superaccumulator can be rounded to a small one as,

xsum_large_accumulator lacc;

xsum_small_accumulator sacc = xsum_round_to_small(&lacc);

or as,

xsum_large lacc;

xsum_small_accumulator sacc = lacc.round_to_small();

Example

Two simple examples on how to use the library:

#include <iomanip>
#include <iostream>

#include "xsum/xsum.hpp"

int main() {
  // Large superaccumulator
  xsum_large_accumulator lacc;
  double const a = 0.7209e-5;
  double s = 0;
  for (int i = 0; i < 10000; ++i) {
    xsum_add(&lacc, a);
    s += a;
  }
  std::cout << std::setprecision(20) << xsum_round(&lacc) << "\n"
            << std::setprecision(20) << s << "\n";
}

or a xsum_small or xsum_large objects can simply be used as,

#include <iomanip>
#include <iostream>

#include "xsum/xsum.hpp"

int main() {
  // Large superaccumulator
  xsum_large lacc;
  double const a = 0.7209e-5;
  double s = 0;
  for (int i = 0; i < 10000; ++i) {
    lacc.add(a);
    s += a;
  }
  std::cout << std::setprecision(20) << lacc.round() << "\n"
            << std::setprecision(20) << s << "\n";
}

One can compile the code as,

g++ simple.cpp -std=c++11 -O3 -o simple

icpc simple.cpp -std=c++11 -O3 -fp-model=double -o simple

running the simple would result,

./simple

0.072090000000000001301
0.072089999999998641278

MPI reduction example (`MPI_Allreduce`)

To use a superaccumulator in high-performance message passing libraries, first we need an MPI datatype of a superaccumulator, and then we define a user-defined global operation XSUM that can subsequently be used in MPI_Reduce, MPI_Aallreduce, MPI_Reduce_scatter, and MPI_Scan.

The below example is a simple demonstration of the use of a superaccumulator on multiple processors, where the final summation across all processors is desired.

#include <mpi.h>

#include <iomanip>
#include <iostream>

#include "xsum/myxsum.hpp"
#include "xsum/xsum.hpp"

int main() {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Create the MPI data type of the superaccumulator
  MPI_Datatype acc_mpi = create_mpi_type<xsum_large_accumulator>();

  // Create the XSUM user-op
  MPI_Op XSUM = create_XSUM<xsum_large_accumulator>();

  double const a = 0.239e-3;
  double s(0);

  xsum_large_accumulator lacc;

  for (int i = 0; i < 1000; ++i) {
    s += a;
    xsum_add(&lacc, a);
  }

  MPI_Allreduce(MPI_IN_PLACE, &s, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  MPI_Allreduce(MPI_IN_PLACE, &lacc, 1, acc_mpi, XSUM, MPI_COMM_WORLD);

  if (world_rank == 0) {
    std::cout << "Rank =  " << world_rank
              << ", sum   =  " << std::setprecision(20) << a * 1000 * world_size
              << ", sum 1 =  " << std::setprecision(20) << s
              << ", sum 2 =  " << std::setprecision(20) << xsum_round(&lacc)
              << std::endl;
  }

  // Free the created user-op
  destroy_XSUM(XSUM);

  // Free the created MPI data type
  destroy_mpi_type(acc_mpi);

  // Finalize the MPI environment.
  MPI_Finalize();
}

mpic++ mpi_simple.cpp -std=c++11 -O3 -o simple

running the above code using 4 processors would result,

mpirunt -np 4 ./simple

Rank =  0, sum   =  0.95600000000000007194, sum 1 =  0.95599999999998419575, sum 2 =  0.95600000000000007194

Python

The provided Python bindings provide the exact summation interface in a Python code. You need Python 3.6 or later to run xsum. You can have multiple Python versions (2.x and 3.x) installed on the same system without problems.

To install Python 3 for different Linux flavors, macOS and Windows, packages are available at
https://www.python.org/getit/

pip is the most popular tool for installing Python packages, and the one included with modern versions of Python. pip currently supports cloning over git.

xsum can be installed with pip:

pip install git+https://github.com/yafshar/xsum.git

Note:

Depending on your Python installation, you may need to use pip3 instead of pip, or you may have to run pip like this:

python3 -m pip install git+https://github.com/yafshar/xsum.git

from xsum import *
import numpy as np

# A small superaccumulator
sacc = xsum_small_accumulator()

# Adding values to the small accumulator sacc
xsum_add(sacc, 1.0)
xsum_add(sacc, 2.0)

# Large superaccumulator
lacc = xsum_large_accumulator()

# Adding values to the large accumulator lacc
xsum_add(lacc, 1.0e-3)
xsum_add(lacc, 2.0e-3)

One can also add a vector of numbers to a superaccumulator.

from xsum import *
import numpy as np

# A small superaccumulator
sacc = xsum_small_accumulator()

a = np.arange(0, 1, 0.1)

# Adding a vector of numbers
xsum_add(sacc, a)

print("sum       = {:.20f}".format(np.sum(a)))
print("Exact sum = {:.20f}".format(xsum_round(sacc)))

or a xsum_small or xsum_large objects can simply be used as,

from xsum import *
import numpy as np

# A small superaccumulator
sacc = xsum_small()

a = np.arange(0, 1, 0.1)

# Adding a vector of numbers
sacc.add(a)

print("sum       = {:.20f}".format(np.sum(a)))
print("Exact sum = {:.20f}".format(sacc.round()))

running the simple.py script would result,

python ./simple.py

sum       = 4.50000000000000088818
Exact sum = 4.50000000000000000000

References

Neal, Radford M., "Fast exact summation using small and large superaccumulators," arXiv e-prints, (2015)
https://www.cs.toronto.edu/~radford/xsum.software.html
https://gitlab.com/radfordneal/xsum

Contributing

Contributors:
Yaser Afshar

License

LGPLv2.1

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)
Operating System
- OS Independent
Topic
- Scientific/Engineering

Release history Release notifications | RSS feed

1.1.8

Mar 20, 2023

1.1.1

Sep 11, 2020

1.1.0

Sep 11, 2020

This version

1.0.1

Aug 31, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xsum-1.0.1.tar.gz (33.9 kB view hashes)

Uploaded Aug 31, 2020 Source

Built Distribution

xsum-1.0.1-cp38-cp38-macosx_10_15_x86_64.whl (168.5 kB view hashes)

Uploaded Aug 31, 2020 CPython 3.8 macOS 10.15+ x86-64

Hashes for xsum-1.0.1.tar.gz

Hashes for xsum-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`be7d67060e186311e97fc054374baaa2ebbfaa364d47cd53667e2279430145a8`
MD5	`fc09c64f6451ef887800bc456d4aa48e`
BLAKE2b-256	`56f7b20cfc8bb580e0a9c420980ffaf0e971f08f0fc531c27973869fd657099b`

Hashes for xsum-1.0.1-cp38-cp38-macosx_10_15_x86_64.whl

Hashes for xsum-1.0.1-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm	Hash digest
SHA256	`5b5885a56f5046c39dc932ada339547fbea5d7027dca05a9f48c40038c846f47`
MD5	`6ae803792e013c439daea056a7224638`
BLAKE2b-256	`5dd8ce115b7c315d0f55e35a4ba950c409b878d3bbc426c50e33e3f767edb0ae`

xsum 1.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Fast Exact Summation Using Small and Large Superaccumulators (XSUM)

Usage

C++

Example

MPI reduction example (`MPI_Allreduce`)

Python

References

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

xsum 1.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Fast Exact Summation Using Small and Large Superaccumulators (XSUM)

Usage

C++

Example

MPI reduction example (MPI_Allreduce)

Python

References

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

MPI reduction example (`MPI_Allreduce`)