
container database (cdb) metadata generation tool.


Container Database (cdb)

This is the Python support tool for containerdb, used to generate data containers. Python makes it easy to generate arbitrary data structures and is popular in the data science community, so I chose it over GoLang for metadata generation.


Generation works as follows:

  1. The library will take as input some data folder
  2. The user defines a function to parse each file and generate a dataset, or a default function is used
  3. A GoLang template is generated to be compiled along with containerdb to generate a container entrypoint and in-memory database.
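
Step 2 can be pictured with a minimal sketch. The names default_metadata and collect are hypothetical and are not part of cdb's API; the sketch only assumes that the default metadata resembles the size and sha256 fields printed by the example container.

```python
import hashlib
import os
import tempfile


def default_metadata(path):
    """Return size and sha256 for one file (hypothetical helper, not cdb's API)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "sha256": digest.hexdigest()}


def collect(folder, func=default_metadata):
    """Walk a data folder and map each relative file path to its metadata."""
    results = {}
    for root, _, files in os.walk(folder):
        for name in sorted(files):
            full = os.path.join(root, name)
            results[os.path.relpath(full, folder)] = func(full)
    return results


# Build a throwaway data folder so the sketch runs on its own
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "hello.txt"), "wb") as handle:
    handle.write(b"hello")

metadata = collect(demo_dir)
print(metadata)
```

Passing a different callable as func stands in for the user-defined parsing function described in step 2.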

Docker Usage

The intended usage is via Docker, so you don't need to install Python or GoLang or set up multistage builds yourself. The build will:

  1. Generate a db.go template
  2. Compile it
  3. Add it to a scratch image, along with the data, as the data container entrypoint.

To build the dummy example here using the Dockerfile:

$ docker build -t data-container .

Then run it to see a basic print of the data added (these functions need further development to provide an interface for querying data and to extract more useful metadata):

$ docker run data-container
value is {"size": 9, "sha256": "327bf8231c9572ecdfdc53473319699e7b8e6a98adf0f383ff6be5b46094aba4"}
value is {"size": 8, "sha256": "3b7721618a86990a3a90f9fa5744d15812954fba6bb21ebf5b5b66ad78cf5816"}
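
Until a query interface exists, the printed lines can be read back with a few lines of Python. This is only a sketch: parse_output is a hypothetical helper, and it assumes every record is printed as "value is " followed by a JSON object, as in the output above.

```python
import json

PREFIX = "value is "


def parse_output(text):
    """Turn 'value is {...}' lines into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            # Everything after the prefix is expected to be a JSON object
            records.append(json.loads(line[len(PREFIX):]))
    return records


# The two lines printed by the example container
sample = "\n".join([
    'value is {"size": 9, "sha256": "327bf8231c9572ecdfdc53473319699e7b8e6a98adf0f383ff6be5b46094aba4"}',
    'value is {"size": 8, "sha256": "3b7721618a86990a3a90f9fa5744d15812954fba6bb21ebf5b5b66ad78cf5816"}',
])

records = parse_output(sample)
print([record["size"] for record in records])
```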

Python Usage

The Docker workflow above doesn't require you to install the Container Database (cdb) metadata generator. If you want to install it anyway (to develop or otherwise interact with it), first install cdb from PyPI or from a local clone of the repository:

$ pip install cdb

or

git clone git@github.com:singularityhub/cdb
cd cdb
pip install -e .

Command Line

The next step is to generate the GoLang file to compile. Change directory to somewhere you have a dataset folder. For example, in tests we have a dummy "data" folder.

cd tests/

We might then run cdb generate to create the db.go template for our container, targeting the tests/data folder:

$ cdb generate data --out db.go

The db.go file is then in the present working directory. You can either build it during a multistage build, as is done in the Dockerfile, or compile it locally with your own GoLang install and then add it to the container. For example, to compile:

go get github.com/singularityhub/containerdb && \
GOOS=linux GOARCH=amd64 go build -ldflags="-w -s" -o /db -i /db.go

A very basic Dockerfile then needs to add the data at the path specified, along with the compiled entrypoint.

FROM scratch
WORKDIR /data
COPY data/ .
COPY db /db
CMD ["/db"]

A more useful entrypoint will be developed soon! This is just a very basic start to the library.

Python

You can run the same generation functions interactively with Python.

from cdb.main import ContainerDatabase
db = ContainerDatabase(path="data")
# <cdb.main.ContainerDatabase at 0x7fcaa9cb8950>

You can see that there is a files generator at db.files:

db.files
# <generator object recursive_find at 0x7fcaaa4ae950>

And then generate! If you don't provide an output file, a string will be returned. Otherwise, the output file name is returned.

output = db.generate(output="db.go", force=True)

Currently, the functions for parsing metadata are in cdb/functions.py, but you can also define a custom import path. The custom import path has not yet been tested; testing is planned soon.
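
How cdb resolves a custom import path is not shown here. As an illustration only, one common way to turn a dotted path into a callable uses importlib from the standard library. The helper load_function below is hypothetical and is not part of cdb.

```python
import importlib


def load_function(import_path):
    """Resolve a dotted path such as 'package.module.func' to a callable."""
    module_name, _, func_name = import_path.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted path like 'package.module.func'")
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{import_path} is not callable")
    return func


# Standard-library functions stand in for a user-defined metadata function
get_size = load_function("os.path.getsize")
get_name = load_function("os.path.basename")
print(get_name("data/example.txt"))
```

A function loaded this way could then be applied to each file in the data folder in place of the default.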

under development

TODO

  • add build prefix
  • ensure cdb installed from pip

License

  • Free software: MPL 2.0 License
