Skip to main content

Code-generation for various ML models into native code.

Project description


GitHub Actions Status Coverage Status License: MIT Python Versions PyPI Version Downloads

m2cgen (Model 2 Code Generator) - is a lightweight library which provides an easy way to transpile trained statistical models into a native code (Python, C, Java, Go, JavaScript, Visual Basic, C#, PowerShell, R, PHP, Dart, Haskell, Ruby, F#).


Supported Python version is >= 3.6.

pip install m2cgen

Supported Languages

  • C
  • C#
  • Dart
  • F#
  • Go
  • Haskell
  • Java
  • JavaScript
  • PHP
  • PowerShell
  • Python
  • R
  • Ruby
  • Visual Basic (VBA-compatible)

Supported Models

Classification Regression
  • scikit-learn
    • LogisticRegression
    • LogisticRegressionCV
    • PassiveAggressiveClassifier
    • Perceptron
    • RidgeClassifier
    • RidgeClassifierCV
    • SGDClassifier
  • lightning
    • AdaGradClassifier
    • CDClassifier
    • FistaClassifier
    • SAGAClassifier
    • SAGClassifier
    • SDCAClassifier
    • SGDClassifier
  • scikit-learn
    • ARDRegression
    • BayesianRidge
    • ElasticNet
    • ElasticNetCV
    • GammaRegressor
    • HuberRegressor
    • Lars
    • LarsCV
    • Lasso
    • LassoCV
    • LassoLars
    • LassoLarsCV
    • LassoLarsIC
    • LinearRegression
    • OrthogonalMatchingPursuit
    • OrthogonalMatchingPursuitCV
    • PassiveAggressiveRegressor
    • PoissonRegressor
    • RANSACRegressor(only supported regression estimators can be used as a base estimator)
    • Ridge
    • RidgeCV
    • SGDRegressor
    • TheilSenRegressor
    • TweedieRegressor
  • StatsModels
    • Generalized Least Squares (GLS)
    • Generalized Least Squares with AR Errors (GLSAR)
    • Generalized Linear Models (GLM)
    • Ordinary Least Squares (OLS)
    • [Gaussian] Process Regression Using Maximum Likelihood-based Estimation (ProcessMLE)
    • Quantile Regression (QuantReg)
    • Weighted Least Squares (WLS)
  • lightning
    • AdaGradRegressor
    • CDRegressor
    • FistaRegressor
    • SAGARegressor
    • SAGRegressor
    • SDCARegressor
  • scikit-learn
    • LinearSVC
    • NuSVC
    • SVC
  • lightning
    • KernelSVC
    • LinearSVC
  • scikit-learn
    • LinearSVR
    • NuSVR
    • SVR
  • lightning
    • LinearSVR
  • DecisionTreeClassifier
  • ExtraTreeClassifier
  • DecisionTreeRegressor
  • ExtraTreeRegressor
Random Forest
  • ExtraTreesClassifier
  • LGBMClassifier(rf booster only)
  • RandomForestClassifier
  • XGBRFClassifier
  • ExtraTreesRegressor
  • LGBMRegressor(rf booster only)
  • RandomForestRegressor
  • XGBRFRegressor
  • LGBMClassifier(gbdt/dart/goss booster only)
  • XGBClassifier(gbtree(including boosted forests)/gblinear booster only)
    • LGBMRegressor(gbdt/dart/goss booster only)
    • XGBRegressor(gbtree(including boosted forests)/gblinear booster only)

    You can find versions of packages with which compatibility is guaranteed by CI tests here. Other versions can also be supported but they are untested.

    Classification Output

    Linear / Linear SVM / Kernel SVM


    Scalar value; signed distance of the sample to the hyperplane for the second class.


    Vector value; signed distance of the sample to the hyperplane per each class.


    The output is consistent with the output of LinearClassifierMixin.decision_function.



    Scalar value; signed distance of the sample to the hyperplane for the second class.


    Vector value; one-vs-one score for each class, shape (n_samples, n_classes * (n_classes-1) / 2).


    The output is consistent with the output of BaseSVC.decision_function when the decision_function_shape is set to ovo.

    Tree / Random Forest / Boosting


    Vector value; class probabilities.


    Vector value; class probabilities.


    The output is consistent with the output of the predict_proba method of DecisionTreeClassifier / ExtraTreeClassifier / ExtraTreesClassifier / RandomForestClassifier / XGBRFClassifier / XGBClassifier / LGBMClassifier.


    Here's a simple example of how a linear model trained in Python environment can be represented in Java code:

    from sklearn.datasets import load_boston
    from sklearn import linear_model
    import m2cgen as m2c
    boston = load_boston()
    X, y =,
    estimator = linear_model.LinearRegression(), y)
    code = m2c.export_to_java(estimator)

    Generated Java code:

    public class Model {
        public static double score(double[] input) {
            return (((((((((((((36.45948838508965) + ((input[0]) * (-0.10801135783679647))) + ((input[1]) * (0.04642045836688297))) + ((input[2]) * (0.020558626367073608))) + ((input[3]) * (2.6867338193449406))) + ((input[4]) * (-17.76661122830004))) + ((input[5]) * (3.8098652068092163))) + ((input[6]) * (0.0006922246403454562))) + ((input[7]) * (-1.475566845600257))) + ((input[8]) * (0.30604947898516943))) + ((input[9]) * (-0.012334593916574394))) + ((input[10]) * (-0.9527472317072884))) + ((input[11]) * (0.009311683273794044))) + ((input[12]) * (-0.5247583778554867));

    You can find more examples of generated code for different models/languages here.


    m2cgen can be used as a CLI tool to generate code using serialized model objects (pickle protocol):

    $ m2cgen <pickle_file> --language <language> [--indent <indent>] [--function_name <function_name>]
             [--class_name <class_name>] [--module_name <module_name>] [--package_name <package_name>]
             [--namespace <namespace>] [--recursion-limit <recursion_limit>]

    Don't forget that for unpickling serialized model objects their classes must be defined in the top level of an importable module in the unpickling environment.

    Piping is also supported:

    $ cat <pickle_file> | m2cgen --language <language>


    Q: Generation fails with RuntimeError: maximum recursion depth exceeded error.

    A: If this error occurs while generating code using an ensemble model, try to reduce the number of trained estimators within that model. Alternatively you can increase the maximum recursion depth with sys.setrecursionlimit(<new_depth>).

    Q: Generation fails with ImportError: No module named <module_name_here> error while transpiling model from a serialized model object.

    A: This error indicates that pickle protocol cannot deserialize model object. For unpickling serialized model objects, it is required that their classes must be defined in the top level of an importable module in the unpickling environment. So installation of package which provided model's class definition should solve the problem.

    Q: Generated by m2cgen code provides different results for some inputs compared to original Python model from which the code were obtained.

    A: Some models force input data to be particular type during prediction phase in their native Python libraries. Currently, m2cgen works only with float64 (double) data type. You can try to cast your input data to another type manually and check results again. Also, some small differences can happen due to specific implementation of floating-point arithmetic in a target language.

    Project details

    Download files

    Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

    Files for m2cgen, version 0.9.0
    Filename, size File type Python version Upload date Hashes
    Filename, size m2cgen-0.9.0-py3-none-any.whl (73.7 kB) File type Wheel Python version py3 Upload date Hashes View
    Filename, size m2cgen-0.9.0.tar.gz (40.9 kB) File type Source Python version None Upload date Hashes View

    Supported by

    AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page