crunch-convert - Conversion module for the CrunchDAO Platform

Crunch Convert Tool

This Python library is designed for the CrunchDAO Platform, exposing the conversion tools in a very small CLI.

Installation

Use pip to install crunch-convert.

pip install --upgrade crunch-convert

Usage

Convert a Notebook

crunch-convert notebook ./my-notebook.ipynb --write-requirements --write-embedded-files

The same conversion can be done programmatically:

from crunch_convert.notebook import extract_from_file
from crunch_convert.requirements_txt import CrunchHubWhitelist, format_files_from_imported

flatten = extract_from_file("notebook.ipynb")

# Write the main.py
with open("main.py", "w") as fd:
    fd.write(flatten.source_code)

# Map the imported requirements using the Crunch Hub's whitelist
whitelist = CrunchHubWhitelist()
requirements_files = format_files_from_imported(
    flatten.requirements,
    header="extracted from a notebook",
    whitelist=whitelist,
)

# Write the requirements.txt files (Python and/or R)
for requirement_language, content in requirements_files.items():
    with open(requirement_language.txt_file_name, "w") as fd:
        fd.write(content)

# Write the embedded files
for embedded_file in flatten.embedded_files:
    with open(embedded_file.normalized_path, "w") as fd:
        fd.write(embedded_file.content)

Freeze Requirements

crunch-convert requirements-txt freeze requirements.user.txt

The same freeze can be done programmatically:

from crunch_convert import RequirementLanguage
from crunch_convert.requirements_txt import CrunchHubVersionFinder, CrunchHubWhitelist, format_files_from_named, freeze, parse_from_file

whitelist = CrunchHubWhitelist()
version_finder = CrunchHubVersionFinder()

# Open the requirements.txt to freeze
with open("requirements.txt", "r") as fd:
    content = fd.read()

# Parse it into NamedRequirement
requirements = parse_from_file(
    language=RequirementLanguage.PYTHON,
    file_content=content
)

# Freeze them
frozen_requirements = freeze(
    requirements=requirements,

    # Only freeze if required by the whitelist
    freeze_only_if_required=True,
    whitelist=whitelist,

    version_finder=version_finder,
)

# Format the new requirements.txt using now frozen requirements
frozen_requirements_files = format_files_from_named(
    frozen_requirements,
    header="frozen from registry",
    whitelist=whitelist,
)

# Write to the new file
with open("requirements.frozen.txt", "w") as fd:
    content = frozen_requirements_files[RequirementLanguage.PYTHON]
    fd.write(content)

[!TIP] The output of format_files_from_imported() can be parsed again right away; there is no need to store it in a file first.

Features

Automatic line commenting

Only functions, imports, and classes are kept.

Everything else is commented out to prevent side effects when your code is loaded into the cloud environment (e.g. when you are exploring the data, debugging your algorithm, or visualizing with Matplotlib).

You can prevent this behavior by using special comments to tell the system to keep part of your code:

  • To start a section that you want to keep, write: @crunch/keep:on
  • To end the section, write: @crunch/keep:off
# @crunch/keep:on

# keep global initialization
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# keep constants
TRAIN_DEPTH = 42
IMPORTANT_FEATURES = [ "a", "b", "c" ]

# @crunch/keep:off

# this will be ignored
x, y = crunch.load_data()

def train(...):
    ...

The result will be:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

TRAIN_DEPTH = 42
IMPORTANT_FEATURES = [ "a", "b", "c" ]

#x, y = crunch.load_data()

def train(...):
    ...

[!TIP] You can put a @crunch/keep:on at the top of the cell and never close it to keep everything.
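
To make the rule above concrete, here is a minimal sketch of how such a commenting pass could work. It is not the converter's actual implementation: the function name comment_out_side_effects is hypothetical, and unlike the output shown above it leaves existing comments (including the markers) in place.

```python
import ast

KEEP_ON = "@crunch/keep:on"
KEEP_OFF = "@crunch/keep:off"
KEPT_NODES = (
    ast.Import,
    ast.ImportFrom,
    ast.FunctionDef,
    ast.AsyncFunctionDef,
    ast.ClassDef,
)


def comment_out_side_effects(source: str) -> str:
    """Comment out root-level statements that are neither definitions nor guarded.

    Hypothetical helper, written only to illustrate the behavior.
    """
    # 1-based line numbers covered by root-level imports, functions and classes.
    always_kept = set()
    for node in ast.parse(source).body:
        if isinstance(node, KEPT_NODES):
            decorators = getattr(node, "decorator_list", [])
            start = min([node.lineno] + [d.lineno for d in decorators])
            always_kept.update(range(start, node.end_lineno + 1))

    output = []
    keeping = False
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            # Marker comments toggle the guarded region; comments pass through.
            if KEEP_ON in stripped:
                keeping = True
            elif KEEP_OFF in stripped:
                keeping = False
            output.append(line)
        elif not stripped or keeping or number in always_kept:
            output.append(line)
        else:
            output.append("#" + line)
    return "\n".join(output)
```

Calling it on a cell that mixes a guarded constant, a data-loading line, and a function keeps the first and last and comments out the middle one.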

Ignore everything

To ignore everything when submitting, use the @crunch/keep:none command to exclude even imports, functions, and classes.

# @crunch/keep:none
from google.colab import files
files.download("test.joblib")

def score_local():
    ...

The result will be:

# from google.colab import files
# files.download("test.joblib")

# def score_local():
#     ...

[!TIP] You can put a @crunch/keep:none at the top of the cell and never close it to keep absolutely nothing.
You can put a @crunch/keep:off to restore the default commenting behavior.

Specifying package versions

Since submitting a notebook does not include a requirements.txt, users can instead specify the version of a package using import-level requirement specifiers in a comment on the same line.

# Valid statements
import pandas # == 1.3
import sklearn # >= 1.2, < 2.0
import tqdm # [foo, bar]
import sklearn # ~= 1.4.2
from requests import Session # == 1.5
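
As an illustration of the syntax above, the following sketch pulls the module name and the trailing comment out of a single import line with a regular expression. The name read_import_specifier is hypothetical and the converter may well parse these lines differently; the sketch only shows where the specifier lives.

```python
import re

IMPORT_LINE = re.compile(
    r"^\s*(?:import|from)\s+(?P<module>[A-Za-z_]\w*)"  # top-level module name
    r"[^#]*"                                           # rest of the statement
    r"(?:#\s*(?P<comment>.*?))?\s*$"                   # optional trailing comment
)


def read_import_specifier(line: str):
    """Return (module, specifier) for an import line, or None for other lines.

    Hypothetical helper: the specifier is the raw comment text, not validated.
    """
    match = IMPORT_LINE.match(line)
    if match is None:
        return None
    return match.group("module"), (match.group("comment") or None)


print(read_import_specifier("import sklearn # >= 1.2, < 2.0"))
```

The returned text would still need to be validated as a real version specifier; that step is left out here.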

Inconsistent versions

Specifying a version for the same package multiple times will cause the submission to be rejected if the specifiers differ.

# Inconsistent versions will be rejected
import pandas # == 1.3
import pandas # == 1.5
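
A rejection rule of this kind can be sketched as a merge that fails on the first mismatch. The function collect_specifiers is a hypothetical name, and comparing the raw specifier strings is an assumption; the real check may compare versions semantically.

```python
def collect_specifiers(pairs):
    """Merge (module, specifier) pairs, rejecting conflicting specifiers.

    Hypothetical helper: specifiers are compared as plain strings.
    """
    merged = {}
    for module, specifier in pairs:
        if specifier is None:
            # An import without a specifier never conflicts.
            merged.setdefault(module, None)
            continue
        previous = merged.get(module)
        if previous is not None and previous != specifier:
            raise ValueError(
                f"inconsistent versions for {module}: {previous!r} and {specifier!r}"
            )
        merged[module] = specifier
    return merged
```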

Standard libraries

Specifying versions for standard library modules has no effect (but the submission will still be rejected if the specified versions are inconsistent).

# Will be ignored
import os # == 1.3
import sys # == 1.5

Optional dependencies

If an optional dependency is required for the code to work properly, an import statement must be added, even if the code does not use it directly.

import castle.algorithms

# Keep me, I am needed by castle
import torch

Name conflicts

A single import name can resolve to several libraries on PyPI. If this happens, you must specify which one you want. If you do not need a specific version, use @latest: without it, we cannot distinguish between commented-out code and a version specifier.

# Prefer https://pypi.org/project/EMD-signal/
import pyemd # EMD-signal @latest

# Prefer https://pypi.org/project/pyemd/
import pyemd # pyemd @latest

Ignore an import

If you do not want the process to add the package to the requirements.txt file, you can use @ignore as a version specifier.

# Ignore pandas, use already installed (if any; else, import error is expected!)
import pandas # @ignore
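
The @latest and @ignore markers can be thought of as special cases handled before any version parsing. The sketch below classifies a trailing comment under that assumption; interpret_marker and its return shape are hypothetical, and the exact grammar accepted by the converter may differ.

```python
def interpret_marker(module: str, comment: str):
    """Classify a trailing import comment as (kind, package, specifier).

    Hypothetical helper. kind is "ignore", "latest" or "version".
    """
    text = comment.strip()
    if text == "@ignore":
        # The package is left out of the generated requirements.
        return ("ignore", module, None)
    if text.endswith("@latest"):
        # An optional package name before the marker resolves a name conflict.
        package = text[: -len("@latest")].strip() or module
        return ("latest", package, None)
    return ("version", module, text)
```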

R imports via rpy2

For notebook users, R packages are automatically extracted from importr("<name>") calls; the importr function is provided by rpy2.

# Import the `importr` function
from rpy2.robjects.packages import importr

# Import the "base" R package
base = importr("base")

The following format must be followed:

  • The import must be declared at the root level.
  • The result must be assigned to a variable; the variable's name will not matter.
  • The function name must be importr, and it must be imported as shown in the example above.
  • The first argument must be a string constant; variables and other expressions will be ignored.
  • The other arguments are ignored; this allows for custom import mapping if necessary.

The line will not be commented out; see the Automatic line commenting section above for details.
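
The format rules above map naturally onto a walk over the root-level statements of the parsed source. The following sketch uses the standard ast module; find_r_packages is a hypothetical name, and the sketch does not verify how importr itself was imported.

```python
import ast


def find_r_packages(source: str):
    """Return R package names from root-level `name = importr("pkg")` lines.

    Hypothetical helper, written only to illustrate the rules.
    """
    packages = []
    for node in ast.parse(source).body:  # root level only
        if not isinstance(node, ast.Assign):
            continue  # the result must be assigned to a variable
        call = node.value
        if not (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "importr"
        ):
            continue
        # Only a string constant as first argument counts; other arguments are ignored.
        if (
            call.args
            and isinstance(call.args[0], ast.Constant)
            and isinstance(call.args[0].value, str)
        ):
            packages.append(call.args[0].value)
    return packages
```

Because the source is only parsed and never executed, rpy2 does not need to be installed to run this check.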

Embedded Files

Additional files can be embedded in cells to be submitted with the notebook. For the system to recognize a cell as an embedded file, the following syntax must be followed:

---
file: <file_name>.md
---

<!-- File content goes here -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aenean rutrum condimentum ornare.

A submission containing multiple cells with the same file name will be rejected.

While the focus is on Markdown files, any text file is accepted, including but not limited to .txt, .yaml, and .json.
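
For illustration, a cell following this syntax could be recognized with a small regular expression, and duplicate names rejected while collecting the results. Both function names below are hypothetical, and the pattern is stricter than the real converter might be (for instance, it expects file: to be the only header key).

```python
import re

EMBEDDED_FILE = re.compile(
    r"\A---[ \t]*\n"
    r"file:[ \t]*(?P<name>[^\n]+?)[ \t]*\n"
    r"---[ \t]*\n"
    r"(?P<content>.*)\Z",
    re.DOTALL,
)


def parse_embedded_cell(cell_source: str):
    """Return (file_name, content) if the cell is an embedded file, else None."""
    match = EMBEDDED_FILE.match(cell_source)
    if match is None:
        return None
    return match.group("name"), match.group("content").lstrip("\n")


def collect_embedded_files(cells):
    """Map file names to contents, rejecting duplicate file names."""
    files = {}
    for cell in cells:
        parsed = parse_embedded_cell(cell)
        if parsed is None:
            continue  # a regular cell
        name, content = parsed
        if name in files:
            raise ValueError(f"duplicate embedded file name: {name}")
        files[name] = content
    return files
```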

User Warnings

The converter can detect potential issues that might arise when using the output files.

Nested imports

Importing a package inside a function generates a warning indicating that the package will not be taken into account when the requirements.txt file is generated.
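
A check of this kind can be sketched with the standard ast module by looking for import statements anywhere inside a function body. The name find_nested_imports is hypothetical and this is not the converter's own detection logic.

```python
import ast


def find_nested_imports(source: str):
    """Return sorted (line, module) pairs for imports located inside functions.

    Hypothetical helper, written only to illustrate the warning.
    """
    nested = []
    for function in ast.walk(ast.parse(source)):
        if not isinstance(function, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(function):
            if isinstance(node, ast.Import):
                nested.extend((node.lineno, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom):
                # Relative imports such as `from . import x` have no module name.
                nested.append((node.lineno, node.module or "." * node.level))
    # Imports in nested functions are visited more than once; deduplicate.
    return sorted(set(nested))
```

Moving such an import to the root level of the notebook is the simplest way to have it picked up.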

Global Constants

Global variables that are not guarded using the @crunch/keep:on command will likely be commented out. This means that referencing them in a nested scope without re-declaring them will likely cause a crash.
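
The failure mode looks like the following sketch, where converted_source stands for a hypothetical converted notebook in which an unguarded constant has been commented out. The names THRESHOLD and infer are made up for the example.

```python
# What the converted code might look like after the constant is commented out.
converted_source = """
#THRESHOLD = 0.5

def infer(value):
    return value > THRESHOLD
"""

namespace = {}
exec(converted_source, namespace)  # defining the function still succeeds

try:
    namespace["infer"](1.0)
    crashed = False
except NameError:
    crashed = True  # THRESHOLD no longer exists when the function runs

print("crashed:", crashed)
```

Wrapping the constant between @crunch/keep:on and @crunch/keep:off, or declaring it inside the function, avoids the error.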

Contributing

Pull requests are always welcome! If you find any issues or have suggestions for improvements, please feel free to submit a pull request or open an issue in the GitHub repository.

License

MIT
