Skip to main content

A python impementation of the famous L-BFGS-B quasi-Newton solver.

Project description

LBFGSB

License Stars Python PyPI Downoads Build Status Documentation Status Coverage codacy Precommit: enabled Ruff Checked with ty DOI

🐍 A Python implementation of the famous L-BFGS-B quasi-Newton solver [1].

This package is a Python port of the famous Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints, L-BFGS-B, known as Algorithm 778 and originally written in Fortran [2,3] (last update in 2011).

The original Fortran code can be found here: https://dl.acm.org/doi/10.1145/279232.279236

It is a complete Python reimplementation, relying on NumPy, SciPy utilities, and optional Numba acceleration for performance-critical parts.

The complete and up to date documentation can be found here: https://lbfgsb.readthedocs.io.

🎯 Motivations

Although there are many implementations or ports (wrappings) of the lbfgsb code, as evidenced by the list compiled by Jonathan Schilling, in the vast majority, these are merely interfaces (wrapper) to access highly optimized Fortran or C code. In Python, for example, this is the case for Scipy. These codes mainly date back to the 90s and are difficult to understand, and therefore difficult to maintain or modify. Incidentally, the only other python implementation we know of to date, by Avieira, is not very optimized and under GPL3 license, which makes it tricky to use.

In this context, the objectives of this package are:

  • Explain and expose the underlying mechanisms of the L-BFGS-B algorithm.

  • Provide understandable, modern Python code with explicit function names, standardized formatting, and static typing support through tools such as Ruff and ty.

  • Provide detailed and explicit documentation.

  • Offer permissively licensed code, including for commercial use, thanks to the BSD 3-Clause License.

  • Keep the number of objective-function and gradient evaluations competitive with the reference implementation.

  • Maintain reasonable memory usage and runtime through NumPy vectorization and optional Numba acceleration.

  • Add practical stopping criteria.

  • Support checkpointing and solver restarts.

  • Support on-the-fly updates of the stored gradient sequence, useful for adaptive objective functions such as dynamically weighted regularized problems.

  • Use logging rather than print statements for better integration in larger applications.

🚀 Quick start

To install lbfgsb, the easiest way is through pip:

pip install lbfgsb

Or alternatively using conda

conda install lbfgsb

You might also clone the repository and install from source

pip install -e .

Once the installation is done, given an optimization problem defined by an objective function and a feasible space:

import numpy as np
from lbfgsb.types import NDArrayFloat  # for type hints, numpy array of floats


 def rosenbrock(x: NDArrayFloat) -> float:
     """
     The Rosenbrock function.

     Parameters
     ----------
     x : array_like
     Array of of points at which the Rosenbrock function is to be computed.
     It can be of shape (m,) or (m, n), m being the number of variables per vector
     of parameters and n the number of different vectors.

     Returns
     -------
     float
         The gradient of the Rosenbrock function with size (n,).

     """
     x = np.asarray(x)
     sum1 = ((x[1:] - x[:-1] ** 2.0) ** 2.0).sum(axis=0)
     sum2 = np.square(1.0 - x[:-1]).sum(axis=0)
     return 100.0 * sum1 + sum2


 def rosenbrock_grad(x: NDArrayFloat) -> NDArrayFloat:
     """
     The gradient of the Rosenbrock function.

     Parameters
     ----------
     x : array_like
     Array of of points at which the Rosenbrock function is to be computed.
     It can be of shape (m,) or (m, n), m being the number of variables per vector
     of parameters and n the number of different vectors.

     Returns
     -------
     NDArrayFloat
         The gradient(s) of the Rosenbrock function with the same shapes as the input x.
     """
     x = np.asarray(x)
     g = np.zeros(x.shape)
     # derivation of sum1
     g[1:] += 100.0 * (2.0 * x[1:] - 2.0 * x[:-1] ** 2.0)
     g[:-1] += 100.0 * (-4.0 * x[1:] * x[:-1] + 4.0 * x[:-1] ** 3.0)
     # derivation of sum2
     g[:-1] += 2.0 * (x[:-1] - 1.0)
     return g

lb = np.array([-2, -2])  # lower bounds
ub = np.array([2, 2])  # upper bounds
bounds = np.array((lb, ub)).T  # The number of variables to optimize is len(bounds)
x0 = np.array([-0.8, -1])  # The initial guess

The optimal solution can be found following:

from lbfgsb import minimize_lbfgsb

res = minimize_lbfgsb(
  x0=x0, fun=rosenbrock, jac=rosenbrock_grad, bounds=bounds, ftol=1e-5, gtol=1e-5
)

minimize_lbfgsb returns an OptimalResult instance (from Scipy) that contains the results of the optimization:

message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FTOL
success: True
status: 0
fun: 5.834035724922196e-07
x: [ 9.994e-01  9.989e-01]
nit: 16
jac: [-2.171e-02  1.030e-02]
nfev: 19
njev: 19
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>

Note that unlike minimize from scipy, minimize_lbfgsb does not accept args nor kwargs. Hence you will have to define wrappers if needed.

# This cannot be passed to minimize_lbfgsb
def my_cost_function(x: NDArrayFloat, arg1: int, arg2: float, *, kwargs1: int=0, kwargs2: str="blabla") -> float:
    """
    Return a float and takes args and kwargs.
    """
    ... # just do something and return a float

# This can be passed to minimize_lbfgsb
def my_wrapper(x: NDArrayFloat) -> float:
    """
    Return a float and takes args and kwargs.
    """
    return my_cost_function(x, 10, 239.9, kwargs1=1, kwargs2="blabla2")

See all use cases in the tutorials section of the documentation.

⚡ Performance

Although memory usage and runtime remain reasonable thanks to NumPy and extensive vectorization, a pure Python implementation is inherently slower and more memory-intensive than the SciPy reference implementation. The latter is written in low-level languages (originally Fortran 77 and, since SciPy v1.15.0, ported to C)) and therefore benefits from decades of compiler and library-level optimizations.

In practice, this performance gap is negligible for small- to medium-scale problems, or when the overall runtime is dominated by evaluations of the objective function and its gradient rather than by the optimization routine itself.

To mitigate the overhead of Python in performance-critical sections, we provide a Numba JIT-compiled implementation of the most expensive components of the algorithm. This approach significantly reduces Python overhead and brings performance close to that of Scipy for large-scale problems (2 fold speed up for 1M parameters), while still preserving the additional flexibility and features offered by our implementation. The only overhead introduced is a one-time LLVM compilation during the first call; subsequent calls benefit from Numba’s caching mechanism.

Numba acceleration is enabled by default and can be disabled via the boolean parameter is_use_numba_jit. If this option is set to True but Numba is not available, a warning is issued and the code automatically falls back to the non-JIT implementation.

res = minimize_lbfgsb(
  x0=x0,
  fun=rosenbrock,
  jac=rosenbrock_grad,
  bounds=bounds,
  ftol=1e-5,
  gtol=1e-5,
  is_use_numba_jit=False
)

Note that numba comes as an optional dependency and should be installed in one of the following ways:

pip install lbfgsb[numba]
pip install lbfgsb numba

Or alternatively using conda

conda install lbfgsb[numba]
conda install lbfgsb numba

If numba is not found in your environement, a RunTime warning will be raised.

🛠️ Unique features

Here are some of the unique features that this implementation provides (to the best of our knowledge in 2025).

✨ Checkpointing

In quasi-Newtons (L-BFGS-B is one of them), the inverse of the Hessian is approximated from the series of the (m last) past gradients. If the optimization is stopped, the history is lost and restarting the optimization would results in a slower convergence (more total objective function and gradient calls) because the inverse Hessian would be reinitiated as the identity.

Let’s take the previous example with the rosenbrock objective function. But this time, the process is stopped after three calls of the function (maxfun=3)

res_3_fun = minimize_lbfgsb(
    x0=x0,
    fun=rosenbrock,
    jac=rosenbrock_grad,
    bounds=bounds,
    ftol=1e-5,
    gtol=1e-5,
    maxfun=3
)
print(res_3_fun)

It yields

message: STOP: TOTAL NO. of f AND g EVALUATIONS EXCEEDS LIMIT
success: True
status: 1
fun: 0.8619211711864526
x: [ 6.370e-01  4.913e-01]
nit: 1
jac: [-2.250e+01  1.709e+01]
nfev: 3
njev: 3
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>

Resuming the minimization from the previous parameters state (x0=res_3_fun.x):

res_final = minimize_lbfgsb(
    x0=res_3_fun.x,
    fun=rosenbrock,
    jac=rosenbrock_grad,
    bounds=bounds,
    ftol=1e-5,
    gtol=1e-5
)
print(res_final)

gives

message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FTOL
success: True
status: 0
fun: 1.349454245280619e-10
x: [ 1.000e+00  1.000e+00]
nit: 14
jac: [ 1.030e-04 -6.267e-05]
nfev: 21
njev: 21
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>

One can see that the total number of calls to the objective function and to its gradient is higher (3+21 = 24 vs 19 originally). In addition, one needs to compute it manually because it looses track of the state when restarting L-BFGS-B. Also one sees that the final result is different.

With the checkpointing, it is possible to restore the last state in a straighforward manner and obtain strictly the same results:

res_checkpoint = minimize_lbfgsb(
    x0=res_3_fun.x,
    fun=rosenbrock,
    jac=rosenbrock_grad,
    bounds=bounds,
    ftol=1e-5,
    gtol=1e-5,
    checkpoint=res_3_fun
)
print(res_checkpoint)

yields

message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FTOL
success: True
status: 0
fun: 5.834035724922196e-07
x: [ 9.994e-01  9.989e-01]
nit: 16
jac: [-2.171e-02  1.030e-02]
nfev: 19
njev: 19
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>

Note that this time, we keep track of nfev and njev. In addition, the results is strictly the same as minimizing the function in a single run. This can be pretty useful if computing the objective function and its gradient is expensive but one is not so sure about what stopping criteria to use. TODO: add something about use case for scaling.

✨ Callback

Our implementation of L-BFGS-B allows to use several standard stop criteria:

  • The absolute value of the objective function

  • The change in objective function value between two iterations.

  • And the norm of the objective function gradient.

  • The maximum number of iterations.

  • The maximum number of objective function calls.

The callback mechanism allows to enhance these possibilities and define custom stopping criteria. For example, one can redefine the criterion based on the number of objective function evaluations (maxfun). If the algorithm should stop, the callback must return True.

import numpy as np
from scipy.optimize import OptimizeResult

def my_custom_callback(xk: np.typing.NDArray[np.float64], state: OptimizeResult) -> bool:
    # if the objective function has been called 10 times or more => stop
    if state.nfev >= 10:
        return True
    return False


res_callback = minimize_lbfgsb(
    x0=x0,
    fun=rosenbrock,
    jac=rosenbrock_grad,
    bounds=bounds,
    ftol=1e-5,
    gtol=1e-5,
    callback=my_custom_callback
)
print(res_callback)

yields

message: STOP: USER CALLBACK
success: True
status: 2
fun: 0.08537354414890966
x: [ 7.354e-01  5.284e-01]
nit: 8
jac: [ 3.115e+00 -2.478e+00]
nfev: 10
njev: 10
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>

✨ Cost function update

Function to update the gradient sequence. This allows changing the objective function definition on the fly. In the first place this functionality is dedicated to regularized problems for which the regularization weight is computed while optimizing the cost function. In order to get a Hessian matching the new definition of fun, the gradient sequence must be updated.

``update_fun_def(x, f0, f0_old, grad, x_deque, grad_deque)
-> f0, f0_old, grad, updated grad_deque``

🏗️ Complete example with supporting paper coming Q1 2026.

✨ What’s new in 1.1.0

Version 1.1.0 adds SciPy-compatible support for jac=True. The objective function may now return both the scalar objective value and its gradient:

def objective_and_gradient(x):
    f = x[0] ** 2 + x[1] ** 2
    g = np.array([2.0 * x[0], 2.0 * x[1]])
    return f, g

res = minimize_lbfgsb(
    x0=np.array([10.0, 10.0]),
    fun=objective_and_gradient,
    jac=True,
)

This release also improves static typing support, including compatibility with ty, and makes the line-search and scalar-function wrappers more robust in numerically degenerate cases.

🔑 License

This project is released under the BSD 3-Clause License.

Copyright (c) 2025, Antoine COLLET. All rights reserved.

For more details, see the LICENSE file included in this repository.

⚠️ Disclaimer

This software is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, or non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.

By using this software, you agree to accept full responsibility for any consequences, and you waive any claims against the authors or contributors.

📧 Contact

For questions, suggestions, or contributions, you can reach out via:

We welcome contributions!

📚 References

[1] R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound Constrained Optimization, (1995), SIAM Journal on Scientific and Statistical Computing, 16, 5, pp. 1190-1208.

[2] C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization (1997), ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560.

[3] J.L. Morales and J. Nocedal. L-BFGS-B: Remark on Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization (2011), ACM Transactions on Mathematical Software, 38, 1.

  • Free software: SPDX-License-Identifier: BSD-3-Clause

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lbfgsb-1.1.0.tar.gz (64.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lbfgsb-1.1.0-py3-none-any.whl (51.9 kB view details)

Uploaded Python 3

File details

Details for the file lbfgsb-1.1.0.tar.gz.

File metadata

  • Download URL: lbfgsb-1.1.0.tar.gz
  • Upload date:
  • Size: 64.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for lbfgsb-1.1.0.tar.gz
Algorithm Hash digest
SHA256 0c52938c21df44e53d64c7d3488e5b4cd4420708f2e74e79fa449cd0324120ff
MD5 22af088f9500d24f211b52336001a7b6
BLAKE2b-256 bce4d18a4595a1e89d719ed37d6bee962130827ac9753f94a85589de09dc4d6e

See more details on using hashes here.

File details

Details for the file lbfgsb-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: lbfgsb-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 51.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for lbfgsb-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e373180b67937567d55ef9c8147b338835399dc80dadb58a135c9bfb6230c0be
MD5 a90d2979e88f137e8504d5d361cbc91b
BLAKE2b-256 98840b4aa8470b6550795301e5fc4abb300096fa013b820a7c119ddd661860a8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page