Powerful command line tools for reference management with ASReview

ASReview-datatools

This package is currently under development. See ASReview-statistics for a stable version compatible with ASReview LAB <=0.19.x.

ASReview-datatools is an extension for the ASReview LAB software. Use it to describe and clean your input data from the command line.

Installation

The ASReview-datatools extension requires Python 3.6+ and ASReview LAB version 1.

The easiest way to install the datatools extension is to install from PyPI:

pip install asreview-datatools

After installation, asreview should detect the datatools extension automatically. To check, run:

asreview --help

If the output lists asreview data describe, the extension is installed correctly.

Getting started

data describe

Describe a dataset

% asreview data describe MY_DATASET.csv

Export the results to a file (output.json)

% asreview data describe MY_DATASET.csv -o output.json

Describe the van_de_schoot_2017 dataset from the benchmark platform.

% asreview data describe benchmark:van_de_schoot_2017 -o output.json
{
  "asreviewVersion": "1.0rc2+14.gac96c1a",
  "apiVersion": "0.4+4.g3f54294",
  "data": {
    "items": [
      {
        "id": "n_records",
        "title": "Number of records",
        "description": "The number of records in the dataset.",
        "value": 6189
      },
      {
        "id": "n_relevant",
        "title": "Number of relevant records",
        "description": "The number of relevant records in the dataset.",
        "value": 43
      },
      {
        "id": "n_irrelevant",
        "title": "Number of irrelevant records",
        "description": "The number of irrelevant records in the dataset.",
        "value": 6146
      },
      {
        "id": "n_unlabeled",
        "title": "Number of unlabeled records",
        "description": "The number of unlabeled records in the dataset.",
        "value": 0
      },
      {
        "id": "n_missing_title",
        "title": "Number of records with missing title",
        "description": "The number of records in the dataset with missing title.",
        "value": 5
      },
      {
        "id": "n_missing_abstract",
        "title": "Number of records with missing abstract",
        "description": "The number of records in the dataset with missing abstract.",
        "value": 764
      },
      {
        "id": "n_duplicates",
        "title": "Number of duplicate records (basic algorithm)",
        "description": "The number of duplicate records in the dataset based on similar text.",
        "value": 104
      }
    ]
  }
}
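
The exported report can be read back with any JSON library. The following is a minimal sketch, assuming only the structure shown above (a top-level "data" object containing an "items" list whose entries have "id" and "value" keys). The helper name load_stats is hypothetical and not part of the extension, and the embedded report is an abbreviated copy of the output above.

```python
import json


def load_stats(report_text):
    """Map each statistic id to its value (hypothetical helper, not part of the extension)."""
    report = json.loads(report_text)
    return {item["id"]: item["value"] for item in report["data"]["items"]}


# Abbreviated copy of the report shown above; in practice, read output.json instead.
example = """{"data": {"items": [
  {"id": "n_records", "value": 6189},
  {"id": "n_relevant", "value": 43},
  {"id": "n_irrelevant", "value": 6146}
]}}"""

stats = load_stats(example)

# Share of relevant records in the dataset.
share = stats["n_relevant"] / stats["n_records"]
print(f"{share:.2%} of records are relevant")
```

To use it on a real export, pass the contents of output.json, for example open("output.json").read(), to load_stats.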

data convert

Convert the format of a dataset. For example, convert a RIS dataset into a CSV, Excel, or TAB dataset.

asreview data convert MY_DATASET.ris MY_OUTPUT.csv

data dedup

Remove duplicate records with a simple deduplication algorithm. The algorithm concatenates the title and abstract of each record and then removes all non-alphanumeric tokens. Records that match after this normalization are treated as duplicates and removed.

asreview data dedup MY_DATASET.ris

Export the deduplicated dataset to a file (output.csv)

asreview data dedup MY_DATASET.ris -o output.csv

Deduplicate the van_de_schoot_2017 dataset from the benchmark platform.

asreview data dedup benchmark:van_de_schoot_2017
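
The deduplication idea described above can be sketched in plain Python. This is an illustration under stated assumptions, not the extension's actual code: records are assumed to be dictionaries with "title" and "abstract" keys, "non-alphanumeric" is taken to mean anything outside A-Z, a-z, and 0-9, matching is case-sensitive, and the first occurrence of each duplicate is the one kept. The function name dedup is hypothetical.

```python
import re


def dedup(records):
    """Keep the first record for each normalized title+abstract text (illustrative sketch)."""
    seen = set()
    unique = []
    for rec in records:
        # Concatenate title and abstract, treating missing values as empty strings.
        text = f"{rec.get('title') or ''} {rec.get('abstract') or ''}"
        # Assumption: strip everything that is not an ASCII letter or digit.
        key = re.sub(r"[^A-Za-z0-9]", "", text)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


records = [
    {"title": "Active learning", "abstract": "A short abstract."},
    {"title": "Active-learning", "abstract": "A short abstract"},
    {"title": "Something else", "abstract": None},
]

unique = dedup(records)
print(len(unique))
```

The first two records differ only in punctuation, so they normalize to the same key and the second is dropped. Note that in this sketch all records with neither a title nor an abstract share the same empty key and would collapse into one.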

License

This extension is MIT licensed.

Contact

Use the issue tracker or see more contact options in the ASReview LAB repository.
