APIs to access the PyMigBench dataset

PyMigBench is a benchmark of Python library migrations. This repository contains the data and the code of the library that can be used to access the dataset.

Dataset

PyMigBench v2

The current version, PyMigBench-2.0, includes 3,096 migration-related code changes from 335 migrations between 141 analogous library pairs. This includes all migrations from PyMigBench v1 and additional migrations borrowed from the SALM dataset. The data also includes additional information per migration-related code change compared to v1.

The dataset is published through the FSE 2024 paper titled Characterizing Python Library Migrations. We will add the citation info once it is available. Release 2.0.2 points to the exact dataset linked to the paper. The data is also permanently archived in figshare. Use either of these links to reproduce the paper.

We may update this repository to correct mistakes or add more data, so it may fall out of sync with the paper. For the latest data, use the latest release in this repository.

PyMigBench v1

We recommend using PyMigBench v2 for any new research. However, if you want to use the v1 dataset, see Release 1.0.3. Cite the paper below if you use the v1 dataset.

@INPROCEEDINGS{pymigbench,
  author={Islam, Mohayeminul and Jha, Ajay Kumar and Nadi, Sarah and Akhmetov, Ildar},
  booktitle={2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)}, 
  title={PyMigBench: A Benchmark for Python Library Migration}, 
  year={2023},
  volume={},
  number={},
  pages={511-515},
  doi={10.1109/MSR59073.2023.00075}
}

Library

Installation

The library and the dataset should be at the same version to be compatible. To install the library, run:

pip install pymigbench==<version>
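Because the library and dataset versions need to match, it can help to check which library version is actually installed before loading data. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and is not part of pymigbench.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Compare this value by hand with the dataset release you downloaded.
    print(installed_version("pymigbench"))
```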

Basic usage

To use the library, you need to have the dataset downloaded. You can download the dataset from the GitHub repository.

from pymigbench.database import Database
from pathlib import Path

yaml_root = Path('repo-root/migration/')

db = Database.load_from_dir(yaml_root)  # Load the dataset from the directory
migs = db.migs()  # Get all the migrations
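Before loading, it may be worth confirming that the path really points at the downloaded dataset. The sketch below is not part of pymigbench: list_migration_files is a hypothetical helper, and it assumes that migration files use a .yaml or .yml extension.

```python
from pathlib import Path


def list_migration_files(yaml_root: Path) -> list[Path]:
    """Return the YAML files found under yaml_root, sorted by path.

    Raises FileNotFoundError if yaml_root is not an existing directory.
    """
    if not yaml_root.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {yaml_root}")
    # Assumption: migration files use a .yaml or .yml extension.
    return sorted(
        p for p in yaml_root.rglob("*")
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )
```

If the returned list is empty, the path probably does not point at the dataset's migration directory, which is easier to diagnose here than after a failed load.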

The constants

There are several enums to help you work with the dataset. They are all in the pymigbench.constants module. Example:

from pymigbench.constants import ProgramElement

The migration-related objects

There are three main classes to encapsulate the data: Migration, MigrationFile, and CodeChange.

Migration is the top-level class and represents a single migration, i.e., one YAML file. A Migration has a list of MigrationFile objects, which represent the files that were changed in the migration. A MigrationFile has a list of CodeChange objects, each of which represents a single migration-related code change. Each of these model classes has an id() method that returns an identifier for the object that is unique across the full dataset. A CodeChange object additionally has an index property and an id_in_file() method, whose values are unique only within the containing file. Each class also has some additional helper methods.
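The nesting described above can be illustrated with stand-in classes. This is a sketch only: the Stub classes, their attribute names (name, path, files, code_changes), and the global_id format are all hypothetical and do not reflect the real pymigbench classes. It shows why an index that is unique within a file is not enough to identify a code change across the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class CodeChangeStub:
    index: int  # position within the containing file only


@dataclass
class MigrationFileStub:
    path: str
    code_changes: list[CodeChangeStub] = field(default_factory=list)


@dataclass
class MigrationStub:
    name: str
    files: list[MigrationFileStub] = field(default_factory=list)


def iter_code_changes(migrations):
    """Yield (migration, file, code_change) for every code change."""
    for mig in migrations:
        for mig_file in mig.files:
            for change in mig_file.code_changes:
                yield mig, mig_file, change


def global_id(mig, mig_file, change) -> str:
    """Build a dataset-wide identifier from all three levels (hypothetical format)."""
    return f"{mig.name}/{mig_file.path}#{change.index}"
```

Two code changes in different files can share the same index, so a dataset-wide identifier has to combine the migration, the file, and the per-file position, which is the role the text assigns to id().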

Contributors

For any queries, please contact mohayemin@ualberta.ca.
