Datalabs
Installation
Install the released package from PyPI:
```shell
pip install --upgrade pip
pip install datalabs
```
or install from source:
```shell
pip install --upgrade pip
git clone https://github.com/ExpressAI/Datalab.git
cd Datalab
pip install .
```
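To confirm that either installation route worked, the installed version can be looked up from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of datalabs.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string when datalabs is installed, otherwise None.
print(installed_version("datalabs"))
```

A `None` result suggests that `pip` installed into a different environment than the interpreter being used.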
Dataset Operation
```python
# pip install datalabs
from datalabs import operations, load_dataset
from featurize import *

dataset = load_dataset("ag_news")

# print the task schema
print(dataset["test"]._info.task_templates)

# data operators: compute the text length of each sample
res = dataset["test"].apply(get_text_length)
print(next(res))

# get entities
res = dataset["test"].apply(get_entity_spacy)
print(next(res))

# get part-of-speech tags
res = dataset["test"].apply(get_postag_spacy)
print(next(res))

from edit import *

# add typos
res = dataset["test"].apply(add_typo)
print(next(res))

# change person names
res = dataset["test"].apply(change_person_name)
print(next(res))
```
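The calls to `next(res)` above suggest that `apply` hands back an iterator that yields one result per sample. Assuming that is the case, several results can be taken at once with `itertools.islice`. The sketch below runs without datalabs: `lazy_apply` and this `get_text_length` are hypothetical stand-ins written for illustration, not the library's implementations, and the sample records are made up.

```python
from itertools import islice


def get_text_length(sample):
    # Stand-in operator (assumption): counts whitespace-separated tokens.
    return {"text_length": len(sample["text"].split())}


def lazy_apply(samples, operator):
    # Stand-in for dataset[split].apply(operator): yields results one at a time.
    for sample in samples:
        yield operator(sample)


samples = [
    {"text": "Stocks rallied on Friday", "label": 2},
    {"text": "The team won", "label": 1},
    {"text": "New chip announced today by the vendor", "label": 3},
]

res = lazy_apply(samples, get_text_length)

# Take the first two results without computing the rest.
first_two = list(islice(res, 2))
print(first_two)
```

Because the iterator is consumed as it is read, a later `next(res)` continues with the third sample rather than starting over.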
Task Schema
- text-classification
  - text: str
  - label: ClassLabel
- text-matching
  - text1: str
  - text2: str
  - label: ClassLabel
- summarization
  - text: str
  - summary: str
- sequence-labeling
  - tokens: List[str]
  - tags: List[ClassLabel]
- question-answering-extractive
  - context: str
  - question: str
  - answers: List[{"text":"","answer_start":""}]
Use `dataset[SPLIT]._info.task_templates` to get more task-dependent information, where `SPLIT` can be `train`, `validation`, or `test`.
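As an illustration of how the task schemas listed above might be used, the sketch below writes them out as a plain dictionary and checks which fields a sample is missing. It does not use datalabs; `TASK_SCHEMAS` and `missing_fields` are hypothetical names, and the type names are kept as plain strings copied from the list.

```python
from typing import Dict, List

# Field names and type names copied from the Task Schema list.
TASK_SCHEMAS: Dict[str, Dict[str, str]] = {
    "text-classification": {"text": "str", "label": "ClassLabel"},
    "text-matching": {"text1": "str", "text2": "str", "label": "ClassLabel"},
    "summarization": {"text": "str", "summary": "str"},
    "sequence-labeling": {"tokens": "List[str]", "tags": "List[ClassLabel]"},
    "question-answering-extractive": {
        "context": "str",
        "question": "str",
        "answers": 'List[{"text":"","answer_start":""}]',
    },
}


def missing_fields(task: str, sample: dict) -> List[str]:
    """Return the schema fields of `task` that are absent from `sample`."""
    return [name for name in TASK_SCHEMAS[task] if name not in sample]


complete = {"text": "A long article ...", "summary": "A short version."}
incomplete = {"text1": "a", "label": 0}

print(missing_fields("summarization", complete))
print(missing_fields("text-matching", incomplete))
```

A check like this only looks at field names; it does not verify that values have the listed types.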
Supported Datasets