An input set generator for R2C

License
- Other/Proprietary License
Operating System
- OS Independent

Project description

Input Set Generator

This is the input set generator for the R2C platform.

Installation

To install, run pip install r2c-inputset-generator. Then run r2c-isg to load the shell.

Note: This application caches its HTTP requests to the package registries in the terminal's current working directory (by default under ./.requests_cache). Be sure to navigate to an appropriate directory before loading the shell, or use the command set-api --nocache inside the shell.

Quick Start

Try the following command sequences:

Load the top 4,000 pypi projects by downloads in the last 365 days, sort by descending number of downloads, trim to the top 100 most downloaded, download project metadata and all versions, and generate an input set json.
```
  load pypi list top4kyear
  sort "desc download_count"
  trim 100
  get -mv all
  set-meta -n test -v 1.0
  export inputset.json
```

Load all npm projects, sample 100, download the latest versions, and generate an input set json.

```
  load npm list allbydependents
  sample 100
  get -v latest
  set-meta -n test -v 1.0
  export inputset.json
```

Load a csv containing github urls and commit hashes, get project metadata and the latest versions, generate an input set json of type GitRepoCommit, remove all versions, and generate an input set json of type GitRepo.
```
  load --columns "url v.commit" github file list_of_github_urls_and_commits.csv
  get -mv latest
  set-meta -n test -v 1.0
  export inputset_1.json
  trim -v 0
  export inputset_2.json
```
Load a list of github repos from an organization name.
```
  load github org netflix
```

Shell Usage

Input/Output

load (OPTIONS) [noreg | github | npm | pypi] [WEBLIST_NAME | FILEPATH.csv]
Generates a dataset from a weblist or a local file. The following weblists are available:
- Github: top1kstarred, top1kforked; the top 1,000 most starred or forked repos
- NPM: allbydependents; all packages, sorted from most to fewest dependents (caution: 1M+ projects; handle with care)
- Pypi: top4kmonth, top4kyear; the top 4,000 most downloaded projects in the last 30 or 365 days
Options:
-c --columns "string of col names": A space-separated list of the column names in a csv. Overrides the default columns (name and version), as well as any headers listed in the file (header lines in files begin with a '!'). The CSV reader recognizes the following column keywords: name, url, org, v.commit, v.version. All other columns are read in as project attributes or, when prefixed with "v.", as version attributes.
Example usage: --columns "name url downloads v.commit v.date".
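
As a minimal sketch of preparing such a file, the Python snippet below writes a headerless two-column CSV of repository urls and commit hashes, which could then be loaded with --columns "url v.commit". The helper name write_repo_csv and the urls and hashes are made up for illustration; the only requirement taken from the text above is that the column order matches the string passed to --columns.

```python
import csv

def write_repo_csv(path, rows):
    """Write (url, commit) pairs as a headerless two-column CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for url, commit in rows:
            writer.writerow([url, commit])

# Hypothetical example data; replace with real repos and commit hashes.
rows = [
    ("https://github.com/example-org/example-repo",
     "0123456789abcdef0123456789abcdef01234567"),
    ("https://github.com/example-org/another-repo",
     "89abcdef0123456789abcdef0123456789abcdef"),
]
write_repo_csv("list_of_github_urls_and_commits.csv", rows)
```

The resulting file matches the one used in the Quick Start: load --columns "url v.commit" github file list_of_github_urls_and_commits.csv.
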
backup (FILEPATH.p)
Backs up the dataset to a pickle file (defaults to ./dataset_name.p).
restore FILEPATH.p
Restores a dataset from a pickle file.
import [noreg | github | npm | pypi] FILEPATH.json
Builds a dataset from an R2C input set.
export (FILEPATH.json)
Exports a dataset to an R2C input set (defaults to ./dataset_name.json).

Data Acquisition

get (OPTIONS)
Downloads project and version metadata from Github/NPM/Pypi.

Options:
-m --metadata: Gets metadata for all projects.
-v --versions [all | latest]: Gets versions for all projects: either every historical version (all) or only the most recent one (latest).

Transformation

trim (OPTIONS) N
Trims the dataset to n projects or n versions per project.

Options:
-v --versions: Binary flag; trims on versions instead of projects.
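
To illustrate the difference between the two modes, here is a small Python sketch on a plain list of dicts. It is not the tool's implementation: the data layout and the function below are assumptions made for illustration, and the sketch simply keeps the first n items in their current order (which is what makes "sort, then trim" yield a "top n" in the Quick Start).

```python
def trim(projects, n, on_versions=False):
    """Keep the first n projects, or the first n versions of every project."""
    if on_versions:
        # Copy each project dict, shortening only its list of versions.
        return [{**p, "versions": p["versions"][:n]} for p in projects]
    return projects[:n]

# Hypothetical dataset: three projects with a few versions each.
projects = [
    {"name": "a", "versions": ["1.0", "1.1", "2.0"]},
    {"name": "b", "versions": ["0.1"]},
    {"name": "c", "versions": ["3.0", "3.1"]},
]

print([p["name"] for p in trim(projects, 2)])                        # ['a', 'b']
print([p["versions"] for p in trim(projects, 1, on_versions=True)])  # [['1.0'], ['0.1'], ['3.0']]
print([p["versions"] for p in trim(projects, 0, on_versions=True)])  # [[], [], []]
```

The last call mirrors trim -v 0 from the Quick Start, which removes all versions while keeping every project.
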
sample (OPTIONS) N
Samples n projects or n versions per project.

Options:
-v --versions: Binary flag; sample versions instead of projects.
sort "[asc, desc] attributes [...]"
Sorts the projects and versions based on a space-separated string of keywords. Valid keywords are:
- Any project attributes
- Any version attributes (prepend "v." to the attribute name)
- Any uuids (prepend "uuids." to the uuid name)
- Any meta values (prepend "meta." to the meta name)
- The words "asc" and "desc"
All values are sorted in ascending order by default. The first keyword in the string is the primary sort key, the next the secondary, and so on.

Example: The string "uuids.name meta.url downloads desc v.version_str v.date" would sort the dataset by ascending project name, url, and download count; and descending version string and date (assuming those keys exist).
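
The way "asc" and "desc" apply to every keyword that follows them can be sketched in a few lines of Python. This is an illustration of the semantics described above, not the tool's own code: the function names are hypothetical, the records are flat dicts, and prefixes such as "v." or "uuids." are treated as opaque key names.

```python
def parse_sort_string(spec):
    """Turn 'a desc b c asc d' into [(key, descending), ...]."""
    descending = False  # ascending is the default
    keys = []
    for word in spec.split():
        if word == "asc":
            descending = False
        elif word == "desc":
            descending = True
        else:
            keys.append((word, descending))
    return keys

def sort_records(records, spec):
    """Sort dicts by several keys; the first key is the primary sort key."""
    result = list(records)
    # Python's sort is stable, so applying the keys from last to first
    # leaves the first key as the primary one.
    for key, descending in reversed(parse_sort_string(spec)):
        result.sort(key=lambda r: r[key], reverse=descending)
    return result

print(parse_sort_string("uuids.name meta.url downloads desc v.version_str v.date"))

records = [
    {"name": "b", "downloads": 10},
    {"name": "a", "downloads": 10},
    {"name": "c", "downloads": 50},
]
print([r["name"] for r in sort_records(records, "desc downloads asc name")])  # ['c', 'a', 'b']
```

The first print shows the three leading keys marked ascending and the two "v." keys marked descending, matching the example string above.
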

Settings

set-meta (OPTIONS)
Sets the dataset's metadata.

Options:
-n --name NAME: Input set name. Must be set before the dataset can be exported.
-v --version VERSION: Input set version. Must be set before the dataset can be exported.
-d --description DESCRIPTION: Description string.
-r --readme README: Markdown-formatted readme string.
-a --author AUTHOR: Author name; defaults to git user.name.
-e --email EMAIL: Author email; defaults to git user.email.
set-api (OPTIONS)
Configures API access and request caching for the dataset.

Options:
--cache_dir CACHE_DIR: The path to the requests cache; defaults to ./.requests_cache.
--cache_timeout DAYS: The number of days before a cached request goes stale.
--nocache: Binary flag; disables request caching for this dataset.
--github_pat GITHUB_PAT: A github personal access token, used to increase the max allowed hourly request rate from 60/hr to 5,000/hr. For instructions on how to obtain a token, see: https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line.

Visualization

show
Converts the dataset to a json file and loads it in the system's native json viewer.

Python Project

You can also import the package into your own project. Just import the Dataset structure, initialize it, and you're good to go!

```
from r2c_isg.structures import Dataset

# Placeholder values; all keyword arguments other than registry are optional.
ds = Dataset.import_inputset(
    'file.csv',                     # or a weblist name such as 'top4kyear'
    registry='pypi',                # or 'github' or 'npm'
    cache_dir='path/to/cache/dir',  # overrides ./.requests_cache
    cache_timeout=14,               # days in cache; overrides the 1 week default
    nocache=True,                   # disables caching
    github_pat='your_github_pat'    # personal access token for the github api
)

ds.get_projects_meta()

ds.get_project_versions(historical='all')  # or 'latest'

ds.trim(
    100,               # n
    on_versions=True   # optional; defaults to False
)

ds.sample(
    10,                # n
    on_versions=True   # optional; defaults to False
)

ds.sort('string of sort parameters')

ds.update(name='your_dataset_name', version='your_dataset_version')

ds.export_inputset('your_inputset.json')
```

Troubleshooting

If you run into any issues, you can run the shell with the --debug flag enabled to get a full error message. Then reach out to support@ret2.co with the stack trace and the steps to reproduce the error.

Note: If the issue is related to the "sample" command, be sure to seed the random number generator to ensure reproducibility.
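
As a rough illustration of why seeding matters, the Python sketch below draws the same sample twice from a seeded generator. It uses only the standard library; the helper name seeded_sample and the project names are hypothetical, and whether the shell's sample command draws from Python's random module is an assumption, not something stated here.

```python
import random

def seeded_sample(items, n, seed):
    """Return n items chosen at random, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator; leaves global state alone
    return rng.sample(items, n)

# Hypothetical project names standing in for a loaded dataset.
names = [f"project_{i}" for i in range(1000)]

first = seeded_sample(names, 5, seed=1234)
second = seeded_sample(names, 5, seed=1234)
print(first == second)  # True: the same seed gives the same sample
```

When reporting an issue, including the seed alongside the steps to reproduce lets someone else draw the identical sample.
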

Release history

- 0.3.2: Mar 13, 2020 (this version)
- 0.3.1: Jan 29, 2020
- 0.3.0: Jan 24, 2020
- 0.2.7: Jan 29, 2020
- 0.2.6: Oct 9, 2019
- 0.2.5: Sep 19, 2019
- 0.2.4: Aug 2, 2019
- 0.2.3: Aug 2, 2019
- 0.2.2: Jul 31, 2019
- 0.2.1: Jul 31, 2019
- 0.2.0: Jul 31, 2019

Download files


Source Distribution

r2c-inputset-generator-0.3.2.tar.gz (29.0 kB view details)

Uploaded Mar 13, 2020 Source

Built Distribution


r2c_inputset_generator-0.3.2-py3-none-any.whl (39.9 kB view details)

Uploaded Mar 13, 2020 Python 3

File details

Details for the file r2c-inputset-generator-0.3.2.tar.gz.

File metadata

Download URL: r2c-inputset-generator-0.3.2.tar.gz
Upload date: Mar 13, 2020
Size: 29.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.0.0b5 CPython/3.7.2 Darwin/19.3.0

File hashes

Hashes for r2c-inputset-generator-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`7928fcab2c05b70c9084a799d0cb70357aa53154bd15b4987e20e1b7c38dde54`
MD5	`05a19a1722d82742f0b6adc90783e93a`
BLAKE2b-256	`d4e2a60c2f4c9a5e4357bb7c068754ccdc47bf7d89f014676646bf9bc0329a20`


File details

Details for the file r2c_inputset_generator-0.3.2-py3-none-any.whl.

File metadata

Download URL: r2c_inputset_generator-0.3.2-py3-none-any.whl
Upload date: Mar 13, 2020
Size: 39.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.0.0b5 CPython/3.7.2 Darwin/19.3.0

File hashes

Hashes for r2c_inputset_generator-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6fea53f3a6b95e533f81aa2159eb0c04874b609b879fd5055bfad3d392db6bbb`
MD5	`d88a07bf739ba4e58324b4969d0dc460`
BLAKE2b-256	`abaad4e563e91671986ce24fb3d8284506167b3127cf48f36525ada57e1a06a1`

