Lightwood is Legos for Machine Learning.

These details have not been verified by PyPI

Project description

Lightwood

Lightwood is an AutoML framework that enables you to generate and customize machine learning pipelines with a declarative syntax called JSON-AI.

Our goal is to make the data science/machine learning (DS/ML) life cycle easier by allowing users to focus on what they want to do with their data, without needing to write repetitive boilerplate code around machine learning and data preparation. Instead, we enable you to focus on the parts of a model that are truly unique and custom.

Lightwood works with a variety of data types such as numbers, dates, categories, tags, text, arrays and various multimedia formats. These data types can be combined to solve complex problems. We also support a time-series mode for problems that have between-row dependencies.

Our JSON-AI syntax allows users to change any and all parts of the models Lightwood automatically generates. The syntax outlines the specific details of each step of the modeling pipeline. Users may override default values (for example, changing the type of a column) or entirely replace steps with their own methods (for example, using a random forest model as a predictor). Lightwood creates a "JSON-AI" object from this syntax, which can then be used to automatically generate Python code that represents your pipeline.
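To make the idea of overriding defaults concrete, here is a minimal sketch in plain Python. It does not use Lightwood and does not reproduce the real JSON-AI schema: the dictionary keys ("columns", "type", "model", "name") and the helper apply_overrides are hypothetical, chosen only to show how user-supplied values could replace automatically generated ones.

```python
import copy
import json


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Otherwise the user's value replaces the generated one.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical auto-generated pipeline description (illustrative keys only,
# not the actual JSON-AI schema).
generated = {
    "columns": {"age": {"type": "integer"}, "zip_code": {"type": "integer"}},
    "model": {"name": "default_neural_net"},
}

# The user treats zip_code as a category and swaps in a different model.
overrides = {
    "columns": {"zip_code": {"type": "categorical"}},
    "model": {"name": "random_forest"},
}

customized = apply_overrides(generated, overrides)
print(json.dumps(customized, indent=2))
```

Everything the user does not mention (here, the type of the age column) keeps its generated default, which is the behaviour the paragraph above describes.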

For details on how to generate JSON-AI syntax and how Lightwood works, check out the Lightwood Philosophy.

Lightwood Philosophy

Lightwood abstracts the ML pipeline into 3 core steps:

(1) Pre-processing and data cleaning
(2) Feature engineering
(3) Model building and training

Lightwood internals

i) Pre-processing and cleaning

For each column in your dataset, Lightwood will identify the suspected data type (numeric, categorical, etc.) via a brief statistical analysis. From this, it will generate the JSON-AI syntax.

If the user keeps the default behavior, Lightwood will apply a brief pre-processing step to clean each column according to its identified data type. From there, it will split the data into train/dev/test splits.

The cleaner and splitter objects respectively refer to the pre-processing and the data splitting functions.
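The two ideas in this step, guessing a column's type and splitting rows, can be sketched with the standard library alone. This is not Lightwood's implementation: the functions infer_type and split_rows, the "at most 10 distinct values means categorical" rule, and the 80/10/10 split fractions are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def infer_type(values: Sequence[str], max_categories: int = 10) -> str:
    """Guess a coarse data type for one column of raw string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            float(v)  # raises ValueError on the first non-numeric value
        return "numeric"
    except ValueError:
        pass
    # Few distinct values suggests a category; otherwise treat as free text.
    if len(set(non_empty)) <= max_categories:
        return "categorical"
    return "text"


def split_rows(
    rows: List[dict],
    fractions: Sequence[float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Dict[str, List[dict]]:
    """Shuffle rows and cut them into train/dev/test partitions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_dev = int(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],  # whatever remains
    }


rows = [{"age": str(20 + i), "city": "Paris" if i % 2 else "Lima"} for i in range(20)]
column_types = {col: infer_type([r[col] for r in rows]) for col in rows[0]}
splits = split_rows(rows)
print(column_types)
print({name: len(part) for name, part in splits.items()})
```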

ii) Feature Engineering

Data can be converted into features via "encoders". Encoders represent the rules for transforming pre-processed data into a numerical representation that a model can use.

Encoders can be rule-based or learned. A rule-based encoder transforms data per a specific set of instructions (e.g. normalized numerical data) whereas a learned encoder produces a representation of the data after training (e.g. a "[CLS]" token in a language model).

Encoders are assigned to each column of data based on the data type; users can override this assignment either at the column-based level or at the data-type based level. Encoders inherit from the BaseEncoder class.
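As an illustration of a rule-based encoder, the sketch below rescales numbers to the range [0, 1]. It is a standalone class that does not inherit from Lightwood's BaseEncoder, and its method names (prepare, encode, decode) are assumptions for this example rather than a statement of the real interface.

```python
from typing import List, Sequence


class MinMaxNumericEncoder:
    """Rule-based encoder: rescales numbers to [0, 1] using the training range."""

    def __init__(self) -> None:
        self.is_prepared = False
        self.low = 0.0
        self.high = 1.0

    def prepare(self, train_values: Sequence[float]) -> None:
        """Record the range seen in the training data."""
        self.low = float(min(train_values))
        self.high = float(max(train_values))
        self.is_prepared = True

    def encode(self, values: Sequence[float]) -> List[float]:
        """Map raw values to their position within the training range."""
        if not self.is_prepared:
            raise RuntimeError("call prepare() before encode()")
        span = (self.high - self.low) or 1.0  # constant column: avoid dividing by zero
        return [(v - self.low) / span for v in values]

    def decode(self, encoded: Sequence[float]) -> List[float]:
        """Invert encode(), returning values on the original scale."""
        span = (self.high - self.low) or 1.0
        return [e * span + self.low for e in encoded]


encoder = MinMaxNumericEncoder()
encoder.prepare([10, 20, 30])
features = encoder.encode([10, 25, 30])
restored = encoder.decode(features)
print(features)
print(restored)
```

A learned encoder would follow the same shape, except that prepare would train a model instead of recording a minimum and maximum.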

iii) Model Building and Training

We call a predictive model that takes in encoded feature data and outputs a prediction for the target of interest a mixer model. Users can either use Lightwood's default mixers or create their own approaches inherited from the BaseMixer class.

We predominantly use PyTorch-based approaches, but can support other models.
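The role of a mixer can be sketched with a deliberately simple stand-in: a nearest-neighbour predictor over encoded feature vectors. This toy class does not inherit from BaseMixer, and its fit and predict methods are hypothetical names chosen for the example, not Lightwood's actual mixer interface.

```python
from typing import List, Sequence


class NearestNeighborMixer:
    """Toy mixer: predicts the target of the closest training feature vector."""

    def __init__(self) -> None:
        self.features: List[Sequence[float]] = []
        self.targets: List[float] = []

    def fit(self, features: Sequence[Sequence[float]], targets: Sequence[float]) -> None:
        """Memorize the encoded training rows and their targets."""
        if len(features) != len(targets):
            raise ValueError("features and targets must have the same length")
        self.features = list(features)
        self.targets = list(targets)

    def predict(self, features: Sequence[Sequence[float]]) -> List[float]:
        """Return, for each row, the target of its nearest training row."""
        if not self.features:
            raise RuntimeError("call fit() before predict()")
        predictions = []
        for row in features:
            # Squared Euclidean distance to every memorized training row.
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, known))
                for known in self.features
            ]
            predictions.append(self.targets[distances.index(min(distances))])
        return predictions


mixer = NearestNeighborMixer()
mixer.fit([[0.0, 0.0], [1.0, 1.0]], [10.0, 20.0])
predictions = mixer.predict([[0.1, 0.2], [0.9, 0.7]])
print(predictions)
```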

Usage

We invite you to check out our documentation for specific guidelines and tutorials! Please stay tuned for updates and changes.

Quick use cases

Lightwood works with pandas DataFrames. Once a DataFrame is loaded, define a "ProblemDefinition" via a dictionary. The only thing a user needs to specify is the name of the column to predict (via the key target).

Create the JSON-AI syntax with the json_ai_from_problem command. Lightwood can then use this object to automatically generate Python code that fills in the steps of the ML pipeline via code_from_json_ai.

You can then create a Predictor object from that code via predictor_from_code.

To train a Predictor end-to-end, starting with unprocessed data, users can use the predictor.learn() command with the data.

import pandas as pd
from lightwood.api.high_level import (
    ProblemDefinition,
    json_ai_from_problem,
    code_from_json_ai,
    predictor_from_code,
)

if __name__ == '__main__':
    # Load a pandas dataset
    df = pd.read_csv(
        "https://raw.githubusercontent.com/mindsdb/benchmarks/main/benchmarks/datasets/hdi/data.csv"
    )

    # Define the prediction task by naming the target column
    pdef = ProblemDefinition.from_dict(
        {
            "target": "Development Index",  # column you want to predict
        }
    )

    # Generate JSON-AI code to model the problem
    json_ai = json_ai_from_problem(df, problem_definition=pdef)

    # OPTIONAL - see the JSON-AI syntax
    # print(json_ai.to_json())

    # Generate python code
    code = code_from_json_ai(json_ai)

    # OPTIONAL - see generated code
    # print(code)

    # Create a predictor from python code
    predictor = predictor_from_code(code)

    # Train a model end-to-end from raw data to a finalized predictor
    predictor.learn(df)

    # Make the train/test splits and show predictions for a few examples
    test_df = predictor.split(predictor.preprocess(df))["test"]
    preds = predictor.predict(test_df).iloc[:10]
    print(preds)

BYOM: Bring your own models

Lightwood supports user architectures/approaches so long as you follow the abstractions provided within each step.

Our tutorials provide specific use cases for how to introduce customization into your pipeline. Check out "custom cleaner", "custom splitter", "custom explainer", and "custom mixer". Stay tuned for further updates.

Installation

You can install Lightwood as follows:

pip3 install lightwood

Note: depending on your environment, you might have to use pip instead of pip3 in the above command.

However, we recommend creating a python virtual environment.

Setting up a dev environment

Python version should be in the range >=3.8, < 3.11
Clone lightwood
cd lightwood && pip install -r requirements.txt && pip install -r requirements_image.txt
Add it to your python path (e.g. by adding export PYTHONPATH='/where/you/cloned/lightwood':$PYTHONPATH as a newline at the end of your ~/.bashrc file)
Check that the unit tests are passing by going into the directory where you cloned lightwood and running: python -m unittest discover tests

If python defaults to python2.x in your environment, use python3 and pip3 instead.

Currently, the preferred environment for working with Lightwood is Visual Studio Code, a very popular Python IDE. However, any IDE should work. While we don't have guides for other IDEs, please feel free to use the following section as a template, or to contribute your own tips and tricks for setting up other IDEs.
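Before installing the requirements, it can help to confirm that the interpreter falls inside the range stated above (>=3.8, <3.11). The snippet below is a small sketch of such a check; the constant and function names are illustrative and not part of Lightwood.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 8)   # inclusive lower bound from the setup notes
MAX_VERSION = (3, 11)  # exclusive upper bound from the setup notes


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) lies in the documented range."""
    return MIN_VERSION <= version < MAX_VERSION


current = sys.version_info[:2]
status = "supported" if is_supported(current) else "outside the documented range"
print(f"Python {current[0]}.{current[1]} is {status}")
```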

Setting up a VSCode environment

Install and enable Settings Sync using your GitHub account (if you use multiple machines)
Install Pylance (for types) and make sure to disable Pyright
Go to Python > Lint: Enabled and disable everything but flake8
Set python.linting.flake8Path to the full path to flake8 (which flake8)
Set Python › Formatting: Provider to autopep8
Add --global-config=<path_to>/lightwood/.flake8 and --experimental to Python › Formatting: Autopep8 Args
Install Live Share and Live Share Whiteboard

Contribute to Lightwood

We love to receive contributions from the community and hear your opinions! We want to make contributing to Lightwood as easy as it can be.

Joining the core Lightwood team is open to anyone who is motivated and wants to be part of that journey!

Please continue reading this guide if you are interested in helping democratize machine learning.

How can you help us?

Report a bug
Improve documentation
Solve an issue
Propose new features
Discuss feature implementations
Submit a bug fix
Test Lightwood with your own data and let us know how it went!

Code contributions

In general, we follow the "fork-and-pull" git workflow. Here are the steps:

Fork the Lightwood repository
Make changes and commit them
Make sure that the CI tests pass. You can run the test suite locally with flake8 . to check style and python -m unittest discover tests to run the automated tests. This doesn't guarantee it will pass remotely since we run on multiple envs, but should work in most cases.
Push your local branch to your fork
Submit a pull request from your repo to the main branch of mindsdb/lightwood so that we can review your changes. Be sure to merge the latest from main before making a pull request!

Note: You will need to sign a CLA (Contributor License Agreement) for the code since Lightwood is under a GPL license.

Feature and Bug reports

We use GitHub issues to track bugs and features. Report them by opening a new issue and filling out all of the required inputs.

Code review process

Pull request (PR) reviews are done on a regular basis. If your PR does not address a previous issue, please make an issue first.

If your change has a chance of affecting performance, we will run our private benchmark suite to validate it.

Please make sure you respond to our feedback and questions.

Community

If you have additional questions or you want to chat with the MindsDB core team, you can join our community.

To get updates on Lightwood and MindsDB’s latest announcements, releases, and events, sign up for our Monthly Community Newsletter.

Join our mission of democratizing machine learning and allowing developers to become data scientists!

Contributor Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.

Current contributors

License

Lightwood License

Project details

Release history

This version

25.12.1.0

Dec 2, 2025

25.9.1.0

Sep 2, 2025

25.7.5.1

Jul 29, 2025

25.5.2.2

May 15, 2025

25.5.2.1

May 11, 2025

25.3.3.3

Mar 25, 2025

25.3.3.0

Mar 24, 2025

25.2.2.0

Feb 14, 2025

24.12.3.0

Dec 19, 2024

24.12.1.0

Dec 6, 2024

24.11.4.0

Nov 28, 2024

24.5.2.1

Nov 27, 2024

24.5.2.0

May 15, 2024

24.3.3.1

Mar 19, 2024

23.12.4.0

Dec 25, 2023

23.11.1.0

Oct 26, 2023

23.8.1.0

Aug 7, 2023

23.7.1.0

Jul 3, 2023

23.6.4.0

Jun 26, 2023

23.6.2.0

Jun 13, 2023

23.5.1.1

Jun 7, 2023

23.5.1.0

May 7, 2023

23.4.3.0

Apr 19, 2023

23.3.2.0

Mar 14, 2023

23.2.1.1

Mar 16, 2023

23.2.1.0

Feb 9, 2023

23.1.2.1

Jan 17, 2023

23.1.2.0

Jan 16, 2023

22.12.2.0

Dec 15, 2022

22.12.1.1

Dec 8, 2022

22.12.1.0

Dec 2, 2022

22.11.2.0

Nov 11, 2022

22.10.4.2

Nov 1, 2022

22.10.4.1

Oct 28, 2022

22.10.4.0

Oct 26, 2022

22.9.1.0

Sep 6, 2022

22.8.4.3

Aug 31, 2022

22.8.4.2

Aug 29, 2022

22.8.4.1

Aug 23, 2022

22.8.4.0

Aug 22, 2022

22.8.1.0

Aug 5, 2022

22.7.4.0

Jul 26, 2022

22.7.3.0

Jul 20, 2022

22.7.2.2

Jul 13, 2022

22.7.2.1

Jul 12, 2022

22.7.2.0

Jul 11, 2022

22.6.1.2

Jun 3, 2022

22.6.1.1

Jun 3, 2022

22.6.1.0

Jun 3, 2022

22.5.4.0

May 24, 2022

22.5.2.0

May 13, 2022

22.5.1.0

May 6, 2022

22.4.4.0

Apr 30, 2022

22.4.1.0

Apr 1, 2022

22.2.3.0

Feb 21, 2022

22.2.1.1

Feb 8, 2022

22.2.1.0

Feb 3, 2022

22.1.4.0

Jan 26, 2022

1.9.0

Dec 27, 2021

1.8.0

Dec 10, 2021

1.7.0

Nov 17, 2021

1.6.1

Nov 3, 2021

1.6.0

Nov 1, 2021

1.5.0

Oct 22, 2021

1.4.0

Oct 11, 2021

1.3.0

Oct 7, 2021

1.2.0

Sep 23, 2021

1.1.0

Aug 28, 2021

1.0.0

Aug 13, 2021

0.69.1

Jun 25, 2021

0.69.0

Jun 23, 2021

0.68.0

Jun 9, 2021

0.67.0

May 24, 2021

0.66.0

May 19, 2021

0.65.0

May 5, 2021

0.64.2

Apr 29, 2021

0.64.1

Apr 15, 2021

0.64.0

Apr 7, 2021

0.63.0

Mar 29, 2021

0.62.3

Mar 19, 2021

0.62.2

Mar 19, 2021

0.62.1

Mar 12, 2021

0.62.0

Mar 4, 2021

0.61.1

Feb 25, 2021

0.61.0

Feb 22, 2021

0.60.1

Feb 16, 2021

0.60.0

Feb 12, 2021

0.59.0

Feb 11, 2021

0.58.0

Feb 4, 2021

0.57.2

Feb 1, 2021

0.57.1

Feb 1, 2021

0.57.0

Jan 28, 2021

0.56.1

Jan 26, 2021

0.56.0

Jan 21, 2021

0.55.0

Jan 11, 2021

0.54.0

Dec 22, 2020

0.53.0

Dec 21, 2020

0.52.0

Dec 14, 2020

0.51.0

Dec 7, 2020

0.50.0

Nov 30, 2020

0.49.0

Nov 23, 2020

0.48.0

Nov 16, 2020

0.47.0

Nov 2, 2020

0.46.1

Oct 29, 2020

0.46.0

Oct 27, 2020

0.45.0

Oct 12, 2020

0.44.0

Oct 6, 2020

0.43.0

Oct 6, 2020

0.42.0

Sep 29, 2020

0.41.0

Sep 16, 2020

0.40.0

Sep 15, 2020

0.39.0

Sep 7, 2020

0.38.0

Sep 1, 2020

0.37.0

Aug 24, 2020

0.36.0

Aug 17, 2020

0.35.0

Aug 13, 2020

0.34.0

Aug 5, 2020

0.33.0

Aug 5, 2020

0.32.0

Jul 29, 2020

0.31

Jul 29, 2020

0.29.0

Jul 5, 2020

0.28.0

Jul 1, 2020

0.27.2

Jun 15, 2020

0.27.1

Jun 15, 2020

0.27.0

Jun 4, 2020

0.26.0

Jun 4, 2020

0.25.0

May 6, 2020

0.24.3

May 5, 2020

0.24.1

May 4, 2020

0.24.0

May 4, 2020

0.23.1

Apr 29, 2020

0.23.0

Apr 28, 2020

0.22.4

Apr 24, 2020

0.22.3

Apr 20, 2020

0.22.2

Apr 19, 2020

0.21.2

Apr 7, 2020

0.21.1

Apr 7, 2020

0.21.0

Mar 30, 2020

0.20.1

Mar 26, 2020

0.20.0

Mar 26, 2020

0.19.3

Mar 25, 2020

0.19.2

Mar 25, 2020

0.19.1

Mar 23, 2020

0.19.0

Mar 22, 2020

0.18.1

Mar 16, 2020

0.18.0

Mar 9, 2020

0.17.2

Mar 5, 2020

0.17.1

Feb 21, 2020

0.17.0

Feb 18, 2020

0.16.4

Feb 17, 2020

0.16.3

Feb 13, 2020

0.16.1

Feb 12, 2020

0.16.0

Feb 11, 2020

0.15.8

Feb 10, 2020

0.15.6

Feb 4, 2020

0.15.5

Jan 29, 2020

0.15.4

Jan 29, 2020

0.15.3

Jan 28, 2020

0.15.1

Jan 20, 2020

0.14.2

Jan 20, 2020

0.14.1

Jan 13, 2020

0.14.0

Jan 10, 2020

0.13.7

Dec 31, 2019

0.13.6

Dec 26, 2019

0.13.5

Dec 17, 2019

0.13.4

Dec 16, 2019

0.13.3

Dec 13, 2019

0.13.2

Dec 11, 2019

0.13.0

Dec 9, 2019

0.12.2

Dec 4, 2019

0.12.0

Dec 1, 2019

0.11.8

Nov 28, 2019

0.11.7

Nov 28, 2019

0.11.6

Nov 28, 2019

0.11.5

Nov 27, 2019

0.11.4

Nov 26, 2019

0.11.3

Nov 9, 2019

0.11.2

Nov 1, 2019

0.11.0

Oct 30, 2019

0.10.2

Oct 19, 2019

0.10.1

Oct 17, 2019

0.9.12

Oct 8, 2019

0.9.11

Oct 2, 2019

0.9.10

Sep 30, 2019

0.9.9

Sep 28, 2019

0.9.8

Sep 28, 2019

0.9.7

Sep 28, 2019

0.9.3

Sep 27, 2019

0.9.1

Sep 25, 2019

0.9.0

Sep 13, 2019

0.8.9

Aug 31, 2019

0.8.8

Aug 29, 2019

0.8.7

Aug 27, 2019

0.8.6

Aug 25, 2019

0.8.3

Aug 19, 2019

0.8.2

Aug 12, 2019

0.8.1

Aug 12, 2019

0.8.0

Aug 10, 2019

0.7.9

Aug 2, 2019

0.7.8

Jul 26, 2019

0.7.7

Jul 25, 2019

0.7.6

Jul 24, 2019

0.7.5

Jul 23, 2019

0.7.3

Jul 17, 2019

0.7.0

Jul 13, 2019

0.6.9

Jul 13, 2019

0.6.8

Jul 1, 2019

0.6.7

Jun 24, 2019

0.6.6

Jun 17, 2019

0.6.5

Jun 17, 2019

0.6.4

Jun 13, 2019

0.6.3

Jun 13, 2019

0.6.2

Jun 12, 2019

0.6.1

Jun 7, 2019

0.6.0

Jun 7, 2019

0.5.0

Jun 6, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lightwood-25.12.1.0.tar.gz (167.5 kB)

Uploaded Dec 2, 2025 Source

Built Distribution

lightwood-25.12.1.0-py3-none-any.whl (226.4 kB)

Uploaded Dec 2, 2025 Python 3

File details

Details for the file lightwood-25.12.1.0.tar.gz.

File metadata

Download URL: lightwood-25.12.1.0.tar.gz
Upload date: Dec 2, 2025
Size: 167.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for lightwood-25.12.1.0.tar.gz
Algorithm	Hash digest
SHA256	`df0fa1efab6731530f402f0100e9ba342de039a96f2aaa3f2c910a40031c2ab5`
MD5	`b961dbb4bccd645a2b5419af436d6af3`
BLAKE2b-256	`eadffeda83e2b92b6b0da517ffd40a0d6a6cef2149cf48cf0cad5f7ab6020357`
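A downloaded archive can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below is one way to do it; the helper names sha256_of_file and matches_published_hash are illustrative, and the demonstration runs on a temporary file so that it works without downloading anything.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a throwaway file whose digest we compute independently.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
try:
    demo_ok = matches_published_hash(demo_path, hashlib.sha256(b"hello").hexdigest())
    print(demo_ok)
finally:
    os.remove(demo_path)
```

For the real source distribution, the same helper would be called with the path to the downloaded lightwood-25.12.1.0.tar.gz and the SHA256 value from the table above.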

File details

Details for the file lightwood-25.12.1.0-py3-none-any.whl.

File metadata

Download URL: lightwood-25.12.1.0-py3-none-any.whl
Upload date: Dec 2, 2025
Size: 226.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for lightwood-25.12.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a86358a437c65bc3ead2bf96b2eb662f79fe651f236145045b3f134a3c82a17d`
MD5	`560a09c6330cdfbc3dd1105fb2b085dc`
BLAKE2b-256	`5561b7b06de202499414e0239244048580c0e6195718a8361a79ed0dfc559fe9`
