Kubeflow Pipelines
Purpose
This repository is a place to learn new things, from infrastructure to new LLM training techniques.
State
Data: I currently have a functional transformer pipeline, but so far it only learns common n-grams in human speech, primarily because there is so little training data.
Model: The transformer model is tested and has been compared against other implementations. Its prediction time is reasonable and small enough for my local hardware.
Future Work
Improved ML: Currently only supervised learning is employed. I expect performance to plateau without reinforcement learning from human feedback (RLHF), which is to be added to the Reddit pipeline.
Improved Logging: After training, I'd like a set of input/output pairs logged to MLflow so that deficiencies in the output are easier to see.
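The logging idea above is not implemented yet, so the following is only a minimal sketch of one way it might look. The function names `build_examples_payload` and `log_examples` and the artifact name `examples.json` are assumptions made for illustration; the only MLflow call used is `mlflow.log_dict`, which stores a dictionary as a JSON artifact on the active run.

```python
import json
from typing import Dict, List, Sequence, Tuple


def build_examples_payload(
    pairs: Sequence[Tuple[str, str]],
) -> Dict[str, List[Dict[str, str]]]:
    """Turn (input, output) pairs into a JSON-serializable dictionary."""
    return {"examples": [{"input": src, "output": out} for src, out in pairs]}


def log_examples(
    pairs: Sequence[Tuple[str, str]],
    artifact_file: str = "examples.json",  # hypothetical artifact name
) -> None:
    """Attach the pairs to the active MLflow run as a JSON artifact."""
    # Imported lazily so the payload helper also works without MLflow installed.
    import mlflow

    mlflow.log_dict(build_examples_payload(pairs), artifact_file)


def preview(pairs: Sequence[Tuple[str, str]]) -> str:
    """Render the payload as indented JSON for a quick look before logging."""
    return json.dumps(build_examples_payload(pairs), indent=2)
```

Calling `log_examples(pairs)` inside a `with mlflow.start_run():` block would place the file next to the metrics of that run. Keeping the payload builder separate from the MLflow call makes the formatting easy to check on its own.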
Pipelines
- Reddit: Iteratively learns to create engaging posts from Reddit data.
Baremetal Usage
- Run the notebook notebooks/reddit_training.ipynb
- Define hyperparameters that make sense for your system
- Metrics are recorded locally and can be observed with a locally running MLflow instance. With the verbose=true option, test examples are printed to standard out.
Kubeflow Usage
- Upload the notebook notebooks/reddit_pipeline.ipynb
- Define environment variables
- Run cells defining training pipeline
- Run/Schedule pipeline
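Since the steps above depend on environment variables being defined, a small pre-flight check can catch a missing one before the pipeline is scheduled. This is a sketch under assumptions: the README does not list which variables the notebook needs, so the names in `REQUIRED_VARS` are hypothetical placeholders (apart from `MLFLOW_TRACKING_URI`, which MLflow itself reads), and `missing_variables` is an illustrative helper, not part of the package.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical list: replace with the variables the notebook actually requires.
# MLFLOW_TRACKING_URI is read by MLflow; the Reddit names are placeholders.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def missing_variables(
    required: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or empty, in the order given."""
    return [name for name in required if not environ.get(name)]


def describe(missing: List[str]) -> str:
    """Build a short human-readable status line."""
    if missing:
        return "Missing environment variables: " + ", ".join(missing)
    return "All required environment variables are set."
```

Running `print(describe(missing_variables(REQUIRED_VARS)))` in the first notebook cell would report anything that still needs to be set.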
Pipeline Description
The pipeline runs each day. Each run performs the following steps:
- New data is downloaded
- The current best model is downloaded and evaluated
- If the model has degraded or is not proficient, training is run
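The retraining decision in the last step can be sketched as a small predicate. This is an illustration under assumptions, not the pipeline's actual code: `EvalResult`, `should_train`, `proficiency_threshold`, and `tolerance` are hypothetical names, and "degraded" is taken to mean that the score dropped below the previous run's score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    score: float                     # metric of the current best model on new data
    previous_score: Optional[float]  # score from the last run, None on the first run


def should_train(
    result: EvalResult,
    proficiency_threshold: float,
    tolerance: float = 0.0,
) -> bool:
    """Return True when the model is not proficient or has degraded."""
    not_proficient = result.score < proficiency_threshold
    degraded = (
        result.previous_score is not None
        and result.score < result.previous_score - tolerance
    )
    return not_proficient or degraded
```

The `tolerance` parameter is there so that small day-to-day fluctuations in the metric do not trigger a full training run.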
At the time of writing, the training set has only 500 samples, so a test BLEU score of 0 is expected, though I hope it will improve in the coming days.
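One likely reason a score of exactly 0 shows up: unsmoothed BLEU is a geometric mean of 1-gram to 4-gram precisions, so a single n-gram order with no matches zeroes the whole score. The sketch below is a simplified single-reference, sentence-level BLEU written for illustration. It is an assumption that the pipeline's metric behaves the same way, since the README does not say which BLEU implementation it uses.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped overlap: a candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # one n-gram order without matches zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty for candidates shorter than the reference.
    if len(candidate) >= len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(sum(log_precisions) / max_n)
```

For example, with the reference "the cat sat on the mat", the candidate "the cat is on a mat" shares four of its six words and one bigram with the reference, yet no trigram matches, so this function returns 0. A model that has only picked up short common n-grams would tend to land in that case.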
The pipeline records metrics in MLflow, along with the hyperparameters, logs, and outputs of each run.
Download files
File details
Details for the file ajperry_pipeline-0.1.16.tar.gz.
File metadata
- Download URL: ajperry_pipeline-0.1.16.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.10.18 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 713dd9379a34716fa7ccde06981fcf45563fd48762f70185aa51d1c54d67fc0c |
| MD5 | 0df26d6d80ad6dfc7fbb43cbf1496d64 |
| BLAKE2b-256 | f8f92f9a8811293c6f8e86cf59b938adf8b669075ad7d40349a7743159513a76 |
File details
Details for the file ajperry_pipeline-0.1.16-py3-none-any.whl.
File metadata
- Download URL: ajperry_pipeline-0.1.16-py3-none-any.whl
- Upload date:
- Size: 16.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.10.18 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3800b4a57a370b7cea242d23bff409a5201bd17152366425ef18c2657c80683f |
| MD5 | d219d652b872f12b10947ea105cce967 |
| BLAKE2b-256 | 7dfbc504ab8d6f02f169873aa8b515d16c0628b3392cdedb8dd61eed59665926 |