11 projects
drlx
DRLX is a library for distributed training of diffusion models via reinforcement learning (RL).
code-tokenizers
Aligning BPE and AST
squeakily
A library for squeakily cleaning and filtering language datasets.
perplexed
Find out where your model is perplexed!
hf-clean-benchmarks
This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
function-parser
This library contains various utilities for parsing GitHub repositories into pairs of function definitions and docstrings. It uses tree-sitter to parse code into ASTs and applies heuristics to extract metadata in more detail. Currently, it supports six languages: Python, Java, Go, PHP, Ruby, and JavaScript. For Python, it also parses function calls and links them to their definitions.
cute-ranking
A cute little Python module for calculating different ranking metrics. Based entirely on the gist at https://gist.github.com/bwhite/3726239.
ncoop57-mages
A description of your project
mages
A cute little Python module full of magic and wonder.
cute-deltas
A cute little Python module for finding the deltas between different things.
fast-trees
A cute little Python module that sits atop the tree-sitter library to provide a cleaner, easier-to-use interface for interacting with source code.