GNU Tools for python
Project description
GNUTOOLS
Modules • Code Design • Code Structure • How To Use • Docker • PythonEnv • Benchmark • Ressources •
Gnutools is a Python package that provides a few perks:
- Up to 3x speedup processing the disk by using system commands instead of python libs.
- A simple interface with optimized command executed from the system.
- A list of functions to ease the file processing in python.
You can reuse your favorite Python packages such as NumPy, SciPy and Cython to extend ZakuroCache integration.
Modules
At a granular level, Gnutools is a library that consists of the following components:
Component | Description |
---|---|
gnutools | Contains the implementation of Gnutools |
gnutools.audio | Audio processsing |
gnutools.fs | File system processing |
gnutools.test | Unit tests |
gnutools.utils | Utilitaries |
Code design
- We recommend using Docker for dev and production. Therefore we encourage its usage all other the repo.
- We have
vanilla
andsandbox
environment.Vanilla
refers to a prebuilt docker image that already contains system dependencies.Sandbox
referes a predbuilt docker image that contains the code of this repo.
- Semantic versioning https://semver.org/ . We commit fix to
a.b.x
, features toa.x.c
and stable release (master) tox.b.c
. - PR are done to
dev
and reviewed for additional features. This should only be reviewed by the engineers in the team. - PR are done to
master
for official (internal) release of the codes. This should be reviewed by the maximum number of engineers. - The ETL jobs are scatter accross sequential refinement of the data
landing/bronze/silver/gold
- Modules and scripts: Any piece of code that can of use for parent module modules should be moved at a higher level.
- eg:
functional.py
contains common funtions foretl.bronze
andetl.silver
- eg:
...
├── etl
│ ├── bronze
│ │ ├── __init__.py
│ │ └── __main__.py
│ ├── functional.py
│ ├── __init__.py
│ └── landing
│ ├── __init__.py
│ └── __main__.py
├── functional.py
├── __init__.py
...
- Modules should ideally contain a
__main__.py
that demo an exeution of the moduleetl/bronze/__main__.py
describes an etl job for the creation of the bronze paritiontrainer/__main__.py
describes the training pipeline
Code structure
from setuptools import setup
from gnutools import __version__
setup(
name="gnutools-python",
version=__version__,
packages=[
"gnutools",
"gnutools.audio",
"gnutools.concurrent",
"gnutools.fs",
"gnutools.grid",
"gnutools.tests",
"gnutools.utils",
],
long_description="".join(open("README.md", "r").readlines()),
long_description_content_type='text/markdown',
include_package_data=True,
url="https://github.com/JeanMaximilienCadic/gnutools-python",
license="MIT",
author="Jean Maximilien Cadic",
python_requires=">=3.6",
install_requires=[r.rsplit()[0] for r in open("requirements.txt")],
author_email="git@cadic.jp",
description="GNU Tools for python",
classifiers=[
"Programming Language :: Python :: 3.6",
"License :: OSI Approved :: MIT License",
],
)
How to use
To clone and run this application, you'll need Git and https://docs.docker.com/docker-for-mac/install/ and Python installed on your computer. From your command line:
Install the package:
# Clone this repository and install the code
git clone https://github.com/JeanMaximilienCadic/gnutools-python
# Go into the repository
cd gnutools-python
Makefile
Exhaustive list of make commands:
install_wheels
sandbox_cpu
sandbox_gpu
build_sandbox
push_environment
push_container_sandbox
push_container_vanilla
pull_container_vanilla
pull_container_sandbox
build_vanilla
clean
build_wheels
auto_branch
Docker
(* recommended)
To build and run the docker image
make build
make docker_run_sandbox_cpu
PythonEnv
(* not recommended)
make install_wheels
Benchmark
- Pathlib
10.1s
to scan856631
files
from pathlib import Path
results = [f for f in Path("/mnt/hdd/backup/ASR").glob("**/*.wav")]
- gnutools
3.7s
to scan856631
files
from gnutools.fs import listfiles
results = listfiles("/mnt/hdd/backup/ASR", [".wav"])
Ressources
- Vanilla: https://en.wikipedia.org/wiki/Vanilla_software
- Sandbox: https://en.wikipedia.org/wiki/Sandbox_(software_development)
- All you need is docker: https://www.theregister.com/2014/05/23/google_containerization_two_billion/
- Dev in containers : https://code.visualstudio.com/docs/remote/containers
- Delta lake partitions: https://k21academy.com/microsoft-azure/data-engineer/delta-lake/
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file gnutools_python-2.2.8-py3-none-any.whl
.
File metadata
- Download URL: gnutools_python-2.2.8-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ba836cdafab0232111801365f4ed40da0bc5e8e8226cc092ce57218207cf4870 |
|
MD5 | 5ab218a788a380b5d54ed09ef0f62adb |
|
BLAKE2b-256 | ea030966c3a70142867299ce0a38b22e14695cf90c310710476caac0c4aa0478 |