Infrastructure to build LLVM IR-based Datasets.
LLVM-IR Dataset Utilities
This repository contains utilities to construct large LLVM IR datasets from multiple sources.
Getting Started
To get started with the dataset construction utilities, we suggest using the packaged pipenv or poetry setup to isolate the Python environment from your system installation and from other environments.
Pipenv
To get started with pipenv, run
pipenv install
or, if you want to use the packaged lockfile,
pipenv sync
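After that, you are ready to activate the environment and install the dataset construction utilities into it. Because pipenv shell starts a new subshell, run the two commands one after the other rather than chaining them with &&
pipenv shell
pip install .
In case you want to develop the package, the second command becomes
pip install -e .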
Poetry
To get started with poetry, run
poetry install
which will draw the exact software versions from the packaged lockfile and install an editable version of the dataset construction utilities into the environment. To install only the dependencies, run
poetry install --no-root
To then develop inside poetry's virtual environment, launch a shell with
poetry shell
Creating First Data
To create your first small batch of IR data, run the following from the root directory of the package
python3 ./llvm_ir_dataset_utils/tools/corpus_from_description.py \
--source_dir=/path/to/store/dataset/to/source \
--corpus_dir=/path/to/store/dataset/to/corpus \
--build_dir=/path/to/store/dataset/to/build \
--corpus_description=./corpus_descriptions_test/manual_tree.json
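If you would rather drive this step from Python, the invocation above can be assembled as an argument list. This is only a minimal sketch: the helper name corpus_command and its parameters are hypothetical and not part of the package; only the script path and the four flags are taken from the command above.

```python
import sys
from pathlib import Path


def corpus_command(root, source_dir, corpus_dir, build_dir, description):
    """Build the argument list for corpus_from_description.py.

    Hypothetical helper for illustration; it only mirrors the flags shown
    in the command above and does not run anything itself.
    """
    script = (
        Path(root) / "llvm_ir_dataset_utils" / "tools" / "corpus_from_description.py"
    )
    return [
        sys.executable,  # use the interpreter of the active environment
        str(script),
        f"--source_dir={source_dir}",
        f"--corpus_dir={corpus_dir}",
        f"--build_dir={build_dir}",
        f"--corpus_description={description}",
    ]
```

The resulting list can be passed to subprocess.run with check=True and cwd set to the package root, so that the relative path to the corpus description resolves as it does in the shell command.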
Beware! You'll need to have a version of llvm-objcopy on your $PATH. If you are missing llvm-objcopy, an easy way to obtain it is to install an LLVM release from your preferred package channel, such as apt, dnf, or pacman, or to build LLVM from source, where only the LLVM project itself needs to be enabled during the build, i.e. -DLLVM_ENABLE_PROJECTS="llvm".
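To check for the tool before starting a long build, a small lookup on $PATH is enough. The following is a minimal sketch using only the Python standard library; the function names find_tool and require_tool are hypothetical and not part of the package.

```python
import shutil


def find_tool(name="llvm-objcopy"):
    """Return the absolute path of `name` on $PATH, or None if it is missing."""
    return shutil.which(name)


def require_tool(name="llvm-objcopy"):
    """Return the path of `name`, raising a descriptive error if it is absent."""
    path = find_tool(name)
    if path is None:
        raise FileNotFoundError(
            f"{name} was not found on $PATH; install an LLVM release from "
            "your package manager or build LLVM from source"
        )
    return path
```

Calling require_tool() at the start of a script fails early with a readable message instead of partway through corpus construction.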
You'll then receive a set of .bc files in /path/to/store/dataset/to/corpus/tree, which you can convert into LLVM-IR with llvm-dis, i.e. from inside of that folder
llvm-dis *.bc
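The same conversion can be scripted, which is convenient when the output names need to be explicit. This is a minimal sketch, assuming llvm-dis is on your $PATH and that the .bc files sit directly in the given folder; the helper names disassemble_commands and disassemble_all are hypothetical.

```python
import subprocess
from pathlib import Path


def disassemble_commands(corpus_tree, llvm_dis="llvm-dis"):
    """Return one llvm-dis command per .bc file found directly in corpus_tree."""
    return [
        # -o names the output file; here foo.bc is written to foo.ll
        [llvm_dis, str(bc), "-o", str(bc.with_suffix(".ll"))]
        for bc in sorted(Path(corpus_tree).glob("*.bc"))
    ]


def disassemble_all(corpus_tree, llvm_dis="llvm-dis"):
    """Run llvm-dis on every .bc file, stopping at the first failure."""
    for cmd in disassemble_commands(corpus_tree, llvm_dis):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the first function free of side effects, so the planned commands can be inspected before anything is run.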
Last steps into the dataloader to be described here.
Corpus Description
Basics of the corpus description to be outlined here to easily enable someone to point the package at a new source.
IR Sources
The package contains a number of builders that target LLVM-based languages and extract IR:
- Individual projects (C/C++)
- Rust crates
- Spack packages
- Autoconf
- CMake
- Julia packages
- Swift packages