Python SDK for DSDL

Introduction

Data is the cornerstone of artificial intelligence. The efficiency of data acquisition, exchange, and application directly affects progress in both technologies and applications. Over the long history of AI, a vast number of datasets have been developed and distributed. However, these datasets are defined in very different forms, which incurs significant overhead in exchange, integration, and utilization. Incorporating a new dataset into a workflow often requires developing a new customized tool or script.

To overcome these difficulties, we developed the Data Set Description Language (DSDL).

Major features

The design of DSDL is driven by three goals: it should be generic, portable, and extensible. We refer to these three goals together as GPE.

  • Generic

    This language aims to provide a unified representation standard for data in multiple fields of artificial intelligence, rather than being designed for a single field or task. It should be able to express data sets with different modalities and structures in a consistent format.

  • Portable

    Write once, distribute everywhere. Dataset descriptions can be widely distributed and exchanged, and used in different environments without modification of the source files. The achievement of this goal is crucial for creating an open and thriving ecosystem. To this end, we need to carefully examine the details of the design, and remove unnecessary dependencies on specific assumptions about the underlying facilities or organizations.

  • Extensible

    One should be able to extend the boundary of expression without modifying the core standard. For a programming language such as C++ or Python, the range of applications can be significantly extended by libraries or packages, while the core language remains stable over a long period. Such libraries and packages form a rich ecosystem that keeps the language alive for a very long time.

Installation

Case a: install with pip

pip install dsdl

Case b: install from source

git clone https://github.com/opendatalab/dsdl.git
cd dsdl
python setup.py install
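
To confirm that either installation route worked, one option is to query the installed distribution's version from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and is not part of dsdl.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not provided by dsdl.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dsdl")
    print(f"dsdl version: {found}" if found else "dsdl is not installed")
```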

Get Started

Use the dsdl parser to deserialize the YAML file into Python code

dsdl parse --yaml demo/coco_demo.yaml

Modify the configuration and set the media directory of the dataset

Create a configuration file config.py with the following contents (for now, dsdl only supports reading from aliyun oss or from the local file system):

local = dict(
    type="LocalFileReader",
    working_dir="local path of your media",
)

ali_oss = dict(
    type="AliOSSFileReader",
    access_key_secret="your secret key of aliyun oss",
    endpoint="your endpoint of aliyun oss",
    access_key_id="your access key of aliyun oss",
    bucket_name="your bucket name of aliyun oss",
    working_dir="the relative path of your media dir in the bucket")

config.py defines how the media in a dataset is read. Specify the arguments according to where the media is stored:

  1. read from local: the working_dir field in local must be specified (the directory of the local media)
  2. read from aliyun oss: all the fields in ali_oss must be specified (access_key_secret, endpoint, access_key_id, bucket_name, and working_dir)
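
The two rules above can be checked before running any command. The following is a minimal sketch, not part of dsdl: the helper missing_fields and the table REQUIRED_FIELDS are hypothetical names, and the required field names are taken from the config.py example above. The keys follow the variable names in config.py (local, ali_oss), not the command-line spelling.

```python
# Hypothetical pre-flight check; dsdl itself may validate config.py differently.
# Field names are copied from the config.py example in this README.
REQUIRED_FIELDS = {
    "local": ("working_dir",),
    "ali_oss": (
        "access_key_secret",
        "endpoint",
        "access_key_id",
        "bucket_name",
        "working_dir",
    ),
}


def missing_fields(location: str, reader_config: dict) -> list:
    """Return the required fields that are absent or empty for a location."""
    if location not in REQUIRED_FIELDS:
        raise ValueError(f"unknown location: {location!r}")
    return [
        name
        for name in REQUIRED_FIELDS[location]
        if not reader_config.get(name)
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    local = dict(type="LocalFileReader", working_dir="/path/to/media")
    ali_oss = dict(type="AliOSSFileReader", endpoint="e", bucket_name="b")
    print(missing_fields("local", local))
    print(missing_fields("ali_oss", ali_oss))
```

An empty list means every field required for that location is present; any names returned still need to be filled in.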

Visualize samples

dsdl view -y <yaml-name>.yaml -c <config.py> -l ali-oss -n 10 -r -v -f Label BBox Attributes

The description of each argument is shown below:

  Short  Long         Description
  -y     --yaml       The path of the dsdl YAML file.
  -c     --config     The path of the location configuration file.
  -l     --location   local or ali-oss, i.e. read the media from the local file system or from aliyun oss.
  -n     --num        The number of samples to be visualized.
  -r     --random     Whether to load the samples in a random order.
  -v     --visualize  Whether to visualize the samples or only print their information in the console.
  -f     --field      The field types to visualize, e.g. -f BBox shows the bounding boxes in the samples, and -f Attributes shows the attributes of a sample in the console. Multiple field types can be specified at once, such as -f Label BBox Attributes.
  -t     --task       The task you are working on, e.g. -t detection is equivalent to -f Label BBox Polygon Attributes.
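
To make the relationship between -f and -t concrete, here is a minimal argparse sketch that mirrors the flags in the table. It is an illustration only, not dsdl's actual command-line implementation: the names build_parser, resolve_fields, and TASK_FIELDS are hypothetical, marking -y, -c, and -l as required is an assumption, and letting -t take precedence over -f when both are given is also an assumption. Only the detection mapping is stated in this document, so no other task is listed.

```python
import argparse

# Only the "detection" mapping is documented above; other tasks are not assumed.
TASK_FIELDS = {"detection": ["Label", "BBox", "Polygon", "Attributes"]}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="view-sketch")
    parser.add_argument("-y", "--yaml", required=True)
    parser.add_argument("-c", "--config", required=True)
    parser.add_argument("-l", "--location", required=True,
                        choices=["local", "ali-oss"])
    parser.add_argument("-n", "--num", type=int)
    parser.add_argument("-r", "--random", action="store_true")
    parser.add_argument("-v", "--visualize", action="store_true")
    parser.add_argument("-f", "--field", nargs="+", default=[])
    parser.add_argument("-t", "--task", choices=sorted(TASK_FIELDS))
    return parser


def resolve_fields(args: argparse.Namespace) -> list:
    """Expand -t into its field list; otherwise use the -f values as given."""
    if args.task:  # assumption: the task shorthand wins over explicit fields
        return list(TASK_FIELDS[args.task])
    return list(args.field)


if __name__ == "__main__":
    demo = build_parser().parse_args(
        ["-y", "demo.yaml", "-c", "config.py", "-l", "local", "-t", "detection"]
    )
    print(resolve_fields(demo))
```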

Citation

If you find this project useful in your research, please consider citing it:

@misc{dsdl2022,
    title={{DSDL}: Data Set Description Language},
    author={DSDL Contributors},
    howpublished = {\url{https://github.com/opendatalab/dsdl}},
    year={2022}
}

License

DSDL is released under the Apache 2.0 license.
