A collection of tools for machine learning.

cylearn

cylearn contains a collection of tools for machine learning. The 'cy' prefix is simply an abbreviation of my given name.

Warning

This package is still under development; use it with caution.

Installation

You can install cylearn through PyPI:

$ pip install cylearn

Alternatively, copy the files you need into your project, and remember to include the LICENSE.

Dependencies

Submodules

There is only one submodule so far.

  • Data
    • Dataset
    • Loader

Data loading with cylearn.Data

cylearn.Data provides two classes, Dataset and Loader, and several functions: shuffle(), split(), and get_loader().

We start with a few examples as a quick start before going into details. These examples are presented in Jupyter style, with each output shown directly below the code that produces it.

Example 1

import numpy as np
from cylearn.Data import Dataset, Loader
x = Dataset(np.arange(5)).map(lambda i: i ** 2)
for _ in x: print(_)
0
1
4
9
16
loader_x = Loader(x, batch_size=2, shuffle=False)
for _ in loader_x: print(_)
[0, 1]
[4, 9]
[16]
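
The batching seen in Example 1 can be reproduced in plain Python to make the behaviour concrete. This is only a minimal sketch and not cylearn's implementation: the helper name `batches` is hypothetical, and the code assumes nothing beyond what the output above shows (consecutive groups of `batch_size` items, with a shorter final group).

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    items = list(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Same data as Example 1: the squares of 0..4.
squares = [i ** 2 for i in range(5)]
result = list(batches(squares, 2))
print(result)  # [[0, 1], [4, 9], [16]]
```

The last batch holds the single leftover item, which matches the `[16]` printed by the Loader above.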

Example 2

+- data/
|  +- 1.png
|  +- 1.txt
|  +- 2.png
|  +- 2.txt
|  |  ...
|  +- 5.png
|  +- 5.txt
|
+- demo.py

The above is the structure of an imaginary folder, where x.png is an image and x.txt stores an integer which is the label of x.png.

Assume that memory can hold only two images at a time because the images are very large. We now demonstrate how to build batches of these images and labels with Dataset and Loader.

import os
import numpy
from cylearn.Data import Dataset, get_loader
# Select files by extension and prepend the directory path.
# os.listdir() returns entries in arbitrary order, so sort them to keep images and labels aligned.
images = Dataset(sorted(os.listdir('./data/'))).filter(lambda f: f.endswith('.png')).map(lambda f: './data/' + f)
labels = Dataset(sorted(os.listdir('./data/'))).filter(lambda f: f.endswith('.txt')).map(lambda f: './data/' + f)
for i, l in zip(images, labels): print(i, l)
./data/1.png ./data/1.txt
./data/2.png ./data/2.txt
./data/3.png ./data/3.txt
./data/4.png ./data/4.txt
./data/5.png ./data/5.txt
# Define functions for reading images and labels.
# Import statements must be placed inside the function for multiprocessing to work.
# This ensures the imported module names are in the function's local symbol table.
def read_image(path):
    '''
    Returns a numpy array.
    '''
    import numpy as np
    from PIL import Image
    return np.asarray(Image.open(path))

def read_label(path):
    '''
    Returns an integer.
    '''
    with open(path, 'r') as f:
        return int(f.readline())
# The stored data, a list of strings here, is not changed by lazy_map().
# With map(), the list of strings would instead be transformed into a list of numpy arrays or integers.
# This is the key to solving the memory issue.
images = images.lazy_map(read_image)
labels = labels.lazy_map(read_label)
# Dataset.get() retrieves the stored data without applying the mapping.
# The mapping is applied only when __getitem__() is invoked.
# The stored data is the same before and after invoking __getitem__().
for i in range(len(images)): print(images.get(i), type(images[i]))
for i in range(len(labels)): print(labels.get(i), type(labels[i]))
./data/1.png <class 'numpy.ndarray'>
./data/2.png <class 'numpy.ndarray'>
./data/3.png <class 'numpy.ndarray'>
./data/4.png <class 'numpy.ndarray'>
./data/5.png <class 'numpy.ndarray'>
./data/1.txt <class 'int'>
./data/2.txt <class 'int'>
./data/3.txt <class 'int'>
./data/4.txt <class 'int'>
./data/5.txt <class 'int'>
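
The difference between stored data and mapped data can be illustrated with a small stand-in class. This is a sketch under assumptions, not cylearn's code: `LazyMapped` and its `calls` counter are hypothetical names, and `str.upper` stands in for an expensive reader such as read_image().

```python
class LazyMapped:
    """Hypothetical stand-in: keeps raw items and applies `func` only on indexing."""

    def __init__(self, raw, func):
        self._raw = list(raw)
        self._func = func
        self.calls = 0  # how many times `func` has been applied

    def __len__(self):
        return len(self._raw)

    def get(self, index):
        # Return the stored item untouched, like Dataset.get() in the example above.
        return self._raw[index]

    def __getitem__(self, index):
        # Apply the mapping on demand; the result is not written back to storage.
        self.calls += 1
        return self._func(self._raw[index])


paths = ['./data/1.txt', './data/2.txt']
lazy = LazyMapped(paths, str.upper)

first = lazy[0]     # the mapping runs here, for this one item only
print(first)        # ./DATA/1.TXT
print(lazy.get(0))  # ./data/1.txt
print(lazy.calls)   # 1
```

Because only the short path strings are stored, memory use is driven by how many mapped items are held at once, not by the size of the whole dataset.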
# Use two workers to read data.
# An error will occur if 'multiprocess' is not installed.
# Fix it by installing 'multiprocess' or not passing `parallel`.
images_loader, labels_loader = get_loader(images, labels, batch_size=2, parallel=2)
for X, y in zip(images_loader, labels_loader): print(len(X), len(y))
2 2
2 2
1 1
