5 projects
torch-gqc
Adaptive gradient accumulation for improved training efficiency
torch-schedule-anything
Schedule any optimizer hyperparameter, not just learning rate
arcAGI2024
Support mechanism for the ARC-AGI project
trax_act_controller
An ACT (Adaptive Computation Time) controller with extensive error handling for trax
supertransformerlib
A set of tools for performing ensemble-based supertransformer