Powerful command line tools for reference management with ASReview
ASReview-datatools
This package is currently under development. See ASReview-statistics for a stable version compatible with ASReview LAB <=0.19.x.
ASReview-datatools is an extension to the ASReview LAB software. It can be used to describe and clean your (input) data from the command line.
Installation
The ASReview-datatools extension requires Python 3.6+ and ASReview LAB version 1.
The easiest way to install the datatools extension is to install from PyPI:
pip install asreview-datatools
After installation of the datatools extension, asreview should automatically detect it. Test this by:
asreview --help
If it lists asreview data describe, then the extension is successfully installed.
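The same check can be scripted. The following is a minimal sketch, assuming that the asreview executable is on the PATH and that its help text lists each subcommand at the start of its own line. The function names help_lists_data and datatools_installed are hypothetical and not part of ASReview.

```python
import shutil
import subprocess


def help_lists_data(help_text: str) -> bool:
    """Return True if some line of the help text starts with the word 'data'.

    Assumption: the help output lists each subcommand at the start of a line.
    """
    for line in help_text.splitlines():
        words = line.split()
        if words and words[0] == "data":
            return True
    return False


def datatools_installed() -> bool:
    """Run 'asreview --help' and look for the 'data' entry point."""
    exe = shutil.which("asreview")
    if exe is None:
        # ASReview LAB itself is not installed or not on the PATH.
        return False
    result = subprocess.run(
        [exe, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return help_lists_data(result.stdout)
```

Calling datatools_installed() returns False when asreview cannot be found, so the check is safe to run before installation as well.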
Getting started
data describe
Describe a dataset
% asreview data describe MY_DATASET.csv
Export the results to a file (output.json)
% asreview data describe MY_DATASET.csv -o output.json
Describe the van_de_schoot_2017 dataset from the benchmark platform.
% asreview data describe benchmark:van_de_schoot_2017 -o output.json
{
  "asreviewVersion": "1.0rc2+14.gac96c1a",
  "apiVersion": "0.4+4.g3f54294",
  "data": {
    "items": [
      {
        "id": "n_records",
        "title": "Number of records",
        "description": "The number of records in the dataset.",
        "value": 6189
      },
      {
        "id": "n_relevant",
        "title": "Number of relevant records",
        "description": "The number of relevant records in the dataset.",
        "value": 43
      },
      {
        "id": "n_irrelevant",
        "title": "Number of irrelevant records",
        "description": "The number of irrelevant records in the dataset.",
        "value": 6146
      },
      {
        "id": "n_unlabeled",
        "title": "Number of unlabeled records",
        "description": "The number of unlabeled records in the dataset.",
        "value": 0
      },
      {
        "id": "n_missing_title",
        "title": "Number of records with missing title",
        "description": "The number of records in the dataset with missing title.",
        "value": 5
      },
      {
        "id": "n_missing_abstract",
        "title": "Number of records with missing abstract",
        "description": "The number of records in the dataset with missing abstract.",
        "value": 764
      },
      {
        "id": "n_duplicates",
        "title": "Number of duplicate records (basic algorithm)",
        "description": "The number of duplicate records in the dataset based on similar text.",
        "value": 104
      }
    ]
  }
}
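The exported file is plain JSON, so it can be read back with any JSON parser. The following is a minimal sketch, assuming the layout shown above (a data object holding an items list whose entries carry id and value). The helper names summarize and load_summary are hypothetical.

```python
import json


def summarize(report: dict) -> dict:
    """Map each statistic's id to its value."""
    return {item["id"]: item["value"] for item in report["data"]["items"]}


def load_summary(path: str) -> dict:
    """Read a file written by 'asreview data describe ... -o <path>'."""
    with open(path, encoding="utf-8") as handle:
        return summarize(json.load(handle))


# Trimmed-down copy of the output above, used here instead of a real file.
sample = {
    "data": {
        "items": [
            {"id": "n_records", "value": 6189},
            {"id": "n_relevant", "value": 43},
        ]
    }
}
stats = summarize(sample)
# Share of relevant records in the dataset.
print(stats["n_relevant"] / stats["n_records"])
```

With a real export, load_summary("output.json") gives the same kind of dictionary for all seven statistics.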
data convert
Convert the format of a dataset. For example, convert a RIS dataset into a CSV, Excel, or TAB dataset.
asreview data convert MY_DATASET.ris MY_OUTPUT.csv
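To convert several files in one go, the command can be driven from a script. The following is a minimal sketch, assuming the asreview executable is on the PATH and that only the two positional arguments shown above are needed. The helper names convert_command and convert_all are hypothetical.

```python
import subprocess
from pathlib import Path


def convert_command(source: str, target_suffix: str = ".csv") -> list:
    """Build the 'asreview data convert' call for one input file.

    The output file gets the same name as the input, with a new suffix.
    """
    target = Path(source).with_suffix(target_suffix)
    return ["asreview", "data", "convert", source, str(target)]


def convert_all(sources, target_suffix: str = ".csv") -> None:
    """Convert every file in 'sources'; stop at the first failing command."""
    for source in sources:
        subprocess.run(convert_command(source, target_suffix), check=True)
```

For example, convert_all(["MY_DATASET.ris"]) runs the same command as the line above.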
data dedup
Remove duplicate records with a simple and straightforward deduplication algorithm. The algorithm concatenates the title and abstract of each record and then removes all non-alphanumeric tokens. Records that match after this normalization are removed as duplicates.
asreview data dedup MY_DATASET.ris
Export the deduplicated dataset to a file (output.csv)
asreview data dedup MY_DATASET.ris -o output.csv
Deduplicate the van_de_schoot_2017 dataset from the benchmark platform.
asreview data dedup benchmark:van_de_schoot_2017
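The deduplication idea described in this section can be illustrated in a few lines of Python. This is a sketch of the described steps, not the extension's actual implementation. It assumes that "non-alphanumeric tokens" means every character outside A-Z, a-z and 0-9, that records are plain dictionaries with title and abstract keys, and that records without any text are kept. The function names normalize and drop_duplicates are hypothetical.

```python
import re


def normalize(title, abstract) -> str:
    """Concatenate title and abstract, then strip non-alphanumeric characters."""
    text = f"{title or ''} {abstract or ''}"
    return re.sub(r"[^A-Za-z0-9]", "", text)


def drop_duplicates(records):
    """Keep the first record for each normalized text, in the original order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record.get("title"), record.get("abstract"))
        # Assumption: records with no text at all are never treated as duplicates.
        if key and key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


records = [
    {"title": "Deep learning!", "abstract": "A review."},
    {"title": "Deep-learning", "abstract": "A review"},
    {"title": "Other", "abstract": None},
]
unique = drop_duplicates(records)
print(len(unique))
```

The first two records differ only in punctuation, so they share one normalized text and only the first is kept.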
License
This extension is MIT licensed.
Contact
Use the issue tracker or see more contact options in the ASReview LAB repository.
Download files
Hashes for asreview-datatools-1.0rc1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 3a8dc1550c6c24d5f109e4da1416192c3cfeeb0a86430dc7a0d95077d0aace77
MD5 | 9a446959fa951612506f59950176c710
BLAKE2b-256 | 4672560142b59ff3e0b06701ad2bdcf51ff82af814e7020bf740e5c27f5d20a4
Hashes for asreview_datatools-1.0rc1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 44922de15d024215a127fc4236ef36195ca73a1776ca021dbbe6aaf11def7ff7
MD5 | 875de83068c65de0f69d6d6c7f8fbcdb
BLAKE2b-256 | a371408c8b25e0d53c3807b2a52938406ae4317f6ba1d6a73349f124714015df