Skip to main content

Some nice ML overhaul

Project description

k1lib

This library enables piping in Python, and has a lot of prebuilt piping tools to support this workflow.

Installation

  pip install k1lib[all]

This will install lots of heavy dependencies like PyTorch. If you want to install the leanest version of the library, do this instead:

  pip install k1lib

To use it in a notebook, do this:

  from k1lib.imports import *

Check out the source code for "k1lib.imports" if you're curious what it's importing. If you hate * imports for whatever reason, you can import cli tools individually, like this::

  from k1lib.cli import ls, cat, grep, apply, batched, display

Examples

# returns [0, 1, 4, 9, 16], kinda like map
range(5) | apply(lambda x: x**2) | deref()

# plotting the function y = x^2
x = np.linspace(-2, 2); y = x**2
plt.plot(x, y)         # normal way
[x, y] | ~aS(plt.plot) # pipe way

# plotting the functions y = x**2, y = x**3, y = x**4
x = np.linspace(-2, 2)
[2, 3, 4] | apply(lambda exp: [x, x**exp]) | ~apply(plt.plot) | deref()

# loading csv file and displaying first 10 rows in a nice table
cat("abc.csv") | apply(lambda x: x.split(",")) | display()

# searching for "gene_name: ..." lines in a file and display a nice overview of just the gene names alone
cat("abc.txt") | grep("gene_name: ") | apply(lambda x: x.split(": ")[1]) | batched(4) | display()

# manipulate numpy arrays and pytorch tensors
a = np.random.randn(3, 4, 5)
a | transpose()     | shape() # returns (4, 3, 5)
a | transpose(0, 2) | shape() # returns (5, 4, 3)

# loading images from categories and splitting them into train and valid sets. Image url: dataset/categoryA/image1.jpg
train, valid = ls("dataset") | apply(ls() | splitW()) | transpose() | deref()
# shape of output is (train/valid, category, image url). It was (category, train/valid, image url) before going through transpose()

# executing task in multiple processes
range(10_000_000) | batched(1_000_000) | applyMp(toSum()) | toSum()
# this splits numbers from 0 to 10M into 10 batches, and then sum each batch in parallel, and then sum the results of each batch

# executing task in multiple processes on multiple computers
range(10_000_000) | batched(1_000_000) | applyCl(toSum()) | toSum()

You can combine these "cli tools" together in really complex ways to do really complex manipulation really fast and with little code. Hell, you can even create a full blown PyTorch dataloader from scratch where you're in control of every detail, operating in 7 dimensions, in multiple processes on multiple nodes, in just 6 lines of code. Check over the basics of it here: k1lib.cli.

After doing that, you can check out the tutorials to get a large overview of how everything integrates together nicely.

Some details

Contacts?

If you found bugs, open a new issue on the repo itself. If you want to have a chat, then email me at 157239q@gmail.com

If you want to get an overview of how the repo is structured, read contributing.md

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

k1lib-1.3.6.5.linux-x86_64.tar.gz (2.8 MB view hashes)

Uploaded Source

Built Distribution

k1lib-1.3.6.5-py3-none-any.whl (2.5 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page