Script for creating SQL files from video files

Project description

dataset2database

About

The package is made as a solution for using video inputs in Machine Learning models. As extracting and storing frames in .JPEG/.PNG files will quickly increase the memory requirements and, more importantly, the number of inodes, the package provides a convenient alternative. Video frames are stored as BLOBs in a database (.db) file, which can be read as quickly as .JPEG files but without the additional memory and inode overhead.

Currently supported video formats include .mp4, .mpeg-4, .avi and .wmv. If you have a different extension, you can simply change the script to include it (in dataset2database/jpgs2singlefile.py).


Package requirements

The two required packages are opencv for image/frame loading and numpy for array manipulation. Make sure that both are installed before running any functions.
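
If either is missing, both can be installed through pip (assuming the standard PyPI distributions opencv-python and numpy):

$ pip install opencv-python numpy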

Multiprocessing: The code uses multiprocessing to improve conversion speed, so the total time required varies across processors. The code has been tested on an AMD Threadripper 2950X, with an average conversion time of 48 minutes for ~500K videos.


Dataset structure

The package assumes a fixed dataset structure such as:

<dataset>
  │
  ├── <class 1>
  │     ├── <video_data_1.mp4>
  │     ├── <video_data_2.mp4>
  │     └── ...
  │
  ├── <class 2>
  │     ├── <video_data_1.mp4>
  │     ├── <video_data_2.mp4>
  │     └── ...
  │
 ...
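
For a quick sanity check that a dataset follows this structure, a minimal sketch using Python's pathlib can be used (the dataset path is a placeholder and only some of the supported extensions are listed):

from pathlib import Path

my_dataset_dir = "/path/to/dataset"  # placeholder dataset root

# Print every class folder and the number of video files it contains
for class_dir in sorted(Path(my_dataset_dir).iterdir()):
  if class_dir.is_dir():
    videos = [f for f in class_dir.iterdir() if f.suffix.lower() in {".mp4", ".avi", ".wmv"}]
    print("{}: {} videos".format(class_dir.name, len(videos)))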


Usage

The main code is in the jpgs2singlefile.py file. To run the converter, simply call the convert function with the base directory of the dataset and the destination directory where the generated databases should be saved.

from dataset2database import convert
# or
from jpgs2singlefile import convert

convert(my_dataset_dir, my_target_dir)

! Please note that you need to append a "/" for Unix-based systems or a "\\" for Windows-based systems to your my_dataset_dir.
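
For example, a hypothetical call on a Unix-based system (both directories are placeholders) would be:

from dataset2database import convert

# Note the trailing "/" on the dataset directory
convert("/media/datasets/my_dataset/", "/media/databases/my_dataset/")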


Frames.db files

Video frames are stored in frames.db files with their video name and frame number as their ObjId, and the frame arrays are stored as BLOBs. The name format is <video_name>/frame_<frame number in 5-digit format>.
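
For illustration, the ObjId for the 25th frame of a hypothetical video video_data_1.mp4 would be constructed as:

# 5-digit, zero-padded frame numbering, as used by the loading example below
obj_id = "{}/{}".format("video_data_1", "frame_%05d" % 25)
# -> "video_data_1/frame_00025"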

File viewer: If you want to ensure that everything has been converted correctly, you can use SQLiteStudio, which provides an easy-to-use multi-platform interface (available for Windows, Mac and Ubuntu).
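
Alternatively, a quick programmatic sanity check is possible with Python's built-in sqlite3 module (assuming the Images table used in the loading example below):

import sqlite3

con = sqlite3.connect('frames.db')
cur = con.cursor()
# Count the number of frames stored in this database
print(cur.execute("SELECT COUNT(*) FROM Images").fetchone()[0])
con.close()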


Database loading

Loading the database can easily be done with an SQL SELECT command based on a list of all frames with specified ObjIds. Then, with the help of the np.frombuffer() (the non-deprecated replacement for np.fromstring()) and cv2.imdecode() functions, the images can be converted back to uint8 arrays.

An example of data loading in python can be found below:

import sqlite3
import cv2
import numpy as np

con = sqlite3.connect('my_video_database.db')
cur = con.cursor()


# Retrieve an entire video from the database (frames are returned unordered).
# `my_path` is the path of the video inside the dataset and `frame_indices`
# is the list of (zero-based) frame indices to load.
frame_names = ["{}/{}".format(my_path.split('/')[-1], 'frame_%05d' % (index + 1)) for index in frame_indices]
sql = "SELECT ObjId, frames FROM Images WHERE ObjId IN ({seq})".format(seq=','.join(['?'] * len(frame_names)))
row = cur.execute(sql, frame_names)

ids = []
frames = []

row = row.fetchall()
# Video order re-arrangement
for ObjId, item in row:
  # --- Decode blob back into an image array
  nparr = np.frombuffer(item, np.uint8)
  img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
  ids.append(ObjId)
  frames.append(img)

# Ensuring correct order of frames
frames = [frame for _, frame in sorted(zip(ids,frames), key=lambda pair: pair[0])]

# (if required) array conversion [frames x height x width x channels]
frames = np.asarray(frames)

cur.close()
con.close()

Installation through git

Please make sure Git is installed on your machine:

$ sudo apt-get update
$ sudo apt-get install git
$ git clone https://github.com/alexandrosstergiou/dataset2database.git
$ cd dataset2database
$ pip install .

You can then use it as any other package installed through pip.


Installation through pip

The latest stable release is also available for download through pip:

$ pip install dataset2database

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataset2database-1.1.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

dataset2database-1.1-py3-none-any.whl (18.4 kB)

Uploaded Python 3

File details

Details for the file dataset2database-1.1.tar.gz.

File metadata

  • Download URL: dataset2database-1.1.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.10

File hashes

Hashes for dataset2database-1.1.tar.gz

  • SHA256: d3814297f7b0d1633294aaf2990463062f89ecaadf36c4514828141a95a0dc81
  • MD5: c0c129d601900a30c983620ea7f2ac36
  • BLAKE2b-256: 53298464f1824a7a62f1a65b1cdf892e5fdf6bfd985fd4872adb14d06464e85e
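
To verify a downloaded archive against these hashes, a minimal sketch using Python's built-in hashlib module (the local file path is a placeholder) is:

import hashlib

path = "dataset2database-1.1.tar.gz"  # placeholder path to the downloaded file

with open(path, "rb") as f:
  digest = hashlib.sha256(f.read()).hexdigest()

# Compare against the SHA256 digest listed above
print(digest == "d3814297f7b0d1633294aaf2990463062f89ecaadf36c4514828141a95a0dc81")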

File details

Details for the file dataset2database-1.1-py3-none-any.whl.

File metadata

  • Download URL: dataset2database-1.1-py3-none-any.whl
  • Upload date:
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.10

File hashes

Hashes for dataset2database-1.1-py3-none-any.whl

  • SHA256: f1327141a317b86c032b5a4d149ef095d742a6027c2f356a931113eb36db88ef
  • MD5: 71168ea0f7c242a8be99d0c0f03de031
  • BLAKE2b-256: fea5e6a67962cb5085f7ea77dd91fefd9ccc1d767b3837e73b8f15f40dbd14b6
