A collection of tools for machine learning.
cylearn
cylearn contains a collection of tools for machine learning. The 'cy' prefix is just the abbreviation of my given name.
Warning
This package is still under development; use it with caution.
Installation
You can install cylearn through PyPI:
$ pip install cylearn
Or just copy the files you need to your project, but don't forget the LICENSE.
Dependencies
- multiprocess (optional)
Submodules
There is only one submodule so far.
- Data
  - Dataset
  - Loader
Data loading with cylearn.Data
cylearn.Data provides two classes, Dataset and Loader, and several functions: shuffle(), split(), and get_loader().
We start with a few examples for a quick start before going into details. The examples are presented in Jupyter style.
Example 1
import numpy as np
from cylearn.Data import Dataset, Loader
x = Dataset(np.arange(5)).map(lambda i: i ** 2)
for _ in x: print(_)
0
1
4
9
16
loader_x = Loader(x, batch_size=2, shuffle=False)
for _ in loader_x: print(_)
[0, 1]
[4, 9]
[16]
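The batching in Example 1 can be pictured with plain Python. The sketch below is not cylearn's implementation; make_batches is a hypothetical helper that only mimics the output shown above for batch_size=2 and shuffle=False.

```python
def make_batches(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items.

    Hypothetical helper for illustration; it is not part of cylearn.
    """
    items = list(items)
    for start in range(0, len(items), batch_size):
        # The last slice may be shorter than batch_size.
        yield items[start:start + batch_size]


squares = [i ** 2 for i in range(5)]
batches = list(make_batches(squares, batch_size=2))
print(batches)  # [[0, 1], [4, 9], [16]]
```

The last batch holds a single item because five items do not divide evenly into batches of two.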
Example 2
+- data/
| +- 1.png
| +- 1.txt
| +- 2.png
| +- 2.txt
| | ...
| +- 5.png
| +- 5.txt
|
+- demo.py
The above is the structure of an imaginary folder, where x.png is an image and x.txt stores an integer which is the label of x.png.
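To try this example without real data, a folder with the same layout can be generated. The following is a minimal sketch using only the standard library; the helper names (png_chunk, write_gray_png, make_demo_folder), the 4x4 image size, and the label values are all assumptions made for illustration.

```python
import os
import struct
import tempfile
import zlib


def png_chunk(tag, data):
    """Build one PNG chunk: length, tag, data, then CRC32 over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack('>I', len(data)) + tag + data + struct.pack('>I', crc)


def write_gray_png(path, width, height, value):
    """Write a tiny 8-bit grayscale PNG filled with one value (0-255)."""
    # Each scanline starts with filter byte 0, followed by `width` pixels.
    raw = b''.join(b'\x00' + bytes([value]) * width for _ in range(height))
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), defaults.
    header = struct.pack('>IIBBBBB', width, height, 8, 0, 0, 0, 0)
    with open(path, 'wb') as f:
        f.write(b'\x89PNG\r\n\x1a\n')
        f.write(png_chunk(b'IHDR', header))
        f.write(png_chunk(b'IDAT', zlib.compress(raw)))
        f.write(png_chunk(b'IEND', b''))


def make_demo_folder(root, count=5):
    """Create root/data/ with 1.png .. count.png and matching label files."""
    data_dir = os.path.join(root, 'data')
    os.makedirs(data_dir, exist_ok=True)
    for i in range(1, count + 1):
        write_gray_png(os.path.join(data_dir, f'{i}.png'), 4, 4, i * 40)
        with open(os.path.join(data_dir, f'{i}.txt'), 'w') as f:
            f.write(f'{i % 2}\n')  # arbitrary placeholder label
    return data_dir


demo_root = tempfile.mkdtemp()
data_dir = make_demo_folder(demo_root)
print(sorted(os.listdir(data_dir)))
```

Running the rest of the example from demo_root (or adjusting the './data/' paths) should then find five image and label pairs.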
Assume that memory can hold only two images at a time because the images are very large. We now demonstrate how to make batches of these images and labels with Dataset and Loader.
import os
import numpy
from cylearn.Data import Dataset, get_loader
# Select files by extension and prepend the directory to each file name.
# os.listdir() returns names in arbitrary order, so sort them to keep images and labels aligned.
images = Dataset(sorted(os.listdir('./data/'))).filter(lambda f: f.endswith('.png')).map(lambda f: './data/' + f)
labels = Dataset(sorted(os.listdir('./data/'))).filter(lambda f: f.endswith('.txt')).map(lambda f: './data/' + f)
for i, l in zip(images, labels): print(i, l)
./data/1.png ./data/1.txt
./data/2.png ./data/2.txt
./data/3.png ./data/3.txt
./data/4.png ./data/4.txt
./data/5.png ./data/5.txt
# Define functions for reading images and labels.
# Import statements must be inside a function to make multiprocessing work.
# This makes sure the name of the imported module is inside the local symbol table.
def read_image(path):
    '''
    Returns a numpy array.
    '''
    import numpy as np
    from PIL import Image
    return np.asarray(Image.open(path))

def read_label(path):
    '''
    Returns an integer.
    '''
    with open(path, 'r') as f:
        return int(f.readline())
# The stored data, which is a list of strings here, does not change after lazy_map() is called.
# With map(), the list of strings would be transformed into a list of numpy arrays or integers right away.
# This is the key to solving the memory issue.
images = images.lazy_map(read_image)
labels = labels.lazy_map(read_label)
# Dataset.get() is a method to retrieve the stored data.
# We can see the mapping occurs when __getitem__() is invoked.
# The stored data is the same before and after __getitem__() is invoked.
for i in range(len(images)): print(images.get(i), type(images[i]))
for i in range(len(labels)): print(labels.get(i), type(labels[i]))
./data/1.png <class 'numpy.ndarray'>
./data/2.png <class 'numpy.ndarray'>
./data/3.png <class 'numpy.ndarray'>
./data/4.png <class 'numpy.ndarray'>
./data/5.png <class 'numpy.ndarray'>
./data/1.txt <class 'int'>
./data/2.txt <class 'int'>
./data/3.txt <class 'int'>
./data/4.txt <class 'int'>
./data/5.txt <class 'int'>
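The behaviour shown above (get() returns the stored path, while indexing returns the loaded object) can be illustrated with a small stand-in class. This is only a sketch of the general lazy-mapping idea; LazyList and expensive_read are hypothetical names, and nothing here is taken from cylearn's source.

```python
class LazyList:
    """Hypothetical stand-in that illustrates lazy mapping; not cylearn's Dataset."""

    def __init__(self, data, func=None):
        self._data = list(data)
        self._func = func

    def lazy_map(self, func):
        # Keep the stored data untouched and only remember the function.
        prev = self._func
        composed = func if prev is None else (lambda item: func(prev(item)))
        return LazyList(self._data, composed)

    def get(self, index):
        # Return the stored item without applying the deferred function.
        return self._data[index]

    def __getitem__(self, index):
        # The deferred function runs here, one item at a time.
        item = self._data[index]
        return item if self._func is None else self._func(item)

    def __len__(self):
        return len(self._data)


calls = []


def expensive_read(path):
    calls.append(path)   # record when the function actually runs
    return path.upper()  # placeholder for loading a large file


paths = LazyList(['a.png', 'b.png']).lazy_map(expensive_read)
print(len(calls))              # 0: nothing has been read yet
print(paths.get(0), paths[0])  # a.png A.PNG
print(len(calls))              # 1: only the indexed item was read
```

Because the function runs per access, only the items of the current batch need to be in memory at once.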
# Use two workers to read data.
# An error will occur if 'multiprocess' is not installed.
# Fix it by installing 'multiprocess' or not passing `parallel`.
images_loader, labels_loader = get_loader(images, labels, batch_size=2, parallel=2)
for X, y in zip(images_loader, labels_loader): print(len(X), len(y))
2 2
2 2
1 1
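As a rough picture of reading each batch with two workers, the sketch below uses the standard library's concurrent.futures.ThreadPoolExecutor. It deliberately swaps threads in for the multiprocess package, so it is not how cylearn handles `parallel`; read_batches and fake_read are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def read_batches(items, read, batch_size, workers):
    """Yield lists of read(item), loading each batch with a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            # pool.map returns results in the same order as the inputs.
            yield list(pool.map(read, batch))


def fake_read(path):
    # Placeholder for an expensive file read.
    return len(path)


paths = ['./data/%d.png' % i for i in range(1, 6)]
sizes = [len(batch) for batch in read_batches(paths, fake_read, batch_size=2, workers=2)]
print(sizes)  # [2, 2, 1]
```

Only one batch is materialised at a time, which matches the memory constraint assumed in this example.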