Sparse Virtual File System Cache implemented in C++.

Sparse Virtual File

Introduction

Sometimes you don’t need the whole file. Sometimes you don’t want the whole file, especially if it is huge and on some remote server. But you might know which parts of the file you want, and svfsc can help you store them locally so that it looks as if you have access to the complete file while holding just the pieces of interest.

svfsc is targeted at reading very large binary files such as TIFF, RP66V1 and HDF5, where the structure is well known. For example, you might want to parse a TIFF file for its metadata, or for a particular image tile or strip, which is a tiny fraction of the file itself.

svfsc implements a Sparse Virtual File, a specialised in-memory cache where a particular file might not be available but parts of it can be obtained without reading the whole file. A Sparse Virtual File (SVF) is represented internally as a map of blocks of data keyed by their file offsets. Any write to an SVF will coalesce these blocks where possible. Cache punting was added in version 0.2.0 (see History); unless data is punted or erased, an SVF accumulates everything written to it. A Sparse Virtual File System (SVFS) is an extension of this to provide a key/value store where the key is a file ID and the value a Sparse Virtual File.
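To make the block map concrete, here is a minimal pure-Python sketch of a coalescing write. It is not svfsc's implementation: `write_block` and the plain `dict` are illustrative stand-ins, and it assumes that newly written bytes replace any old bytes they overlap.

```python
from typing import Dict


def write_block(blocks: Dict[int, bytes], fpos: int, data: bytes) -> None:
    """Insert data at fpos, merging it with overlapping or abutting blocks."""
    start, end = fpos, fpos + len(data)
    merged = bytearray(data)
    for b_start in sorted(blocks):
        b_data = blocks[b_start]
        b_end = b_start + len(b_data)
        if b_end < start or b_start > end:
            continue  # Neither overlaps nor touches the new data.
        del blocks[b_start]
        if b_start < start:
            # Keep the old bytes that precede the new data.
            merged = bytearray(b_data[: start - b_start]) + merged
            start = b_start
        if b_end > end:
            # Keep the old bytes that follow the new data.
            merged += b_data[end - b_start:]
            end = b_end
    blocks[start] = bytes(merged)


blocks: Dict[int, bytes] = {}
write_block(blocks, 14, b"ABCDEF")
write_block(blocks, 30, b"XY")
write_block(blocks, 20, b"GHIJ")  # Abuts the first block, so the two merge.
print(sorted(blocks.items()))  # [(14, b'ABCDEFGHIJ'), (30, b'XY')]
```

Keeping the blocks coalesced means a contiguous read never has to stitch together more than one block.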

svfsc is written in C++ with a Python interface. It is thread safe in both domains.

An SVF might be used like this:

  • The user requests some data (for example TIFF metadata) from a remote file using a Parser that knows the TIFF structure.

  • The Parser consults the SVF. If the SVF has the data then the Parser parses it and gives the results to the user.

  • If the SVF does not have the data then the Parser asks the SVF what data is needed, then issues the appropriate GET request(s) to the remote server.

  • That data is used to update the SVF, then the Parser can use it and give the results to the user.
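One plausible way to turn the needed (file_position, length) pairs into GET requests is with HTTP Range headers. The sketch below assumes the remote server honours byte-range requests, which the text above does not state, and `range_headers` is a hypothetical helper, not part of svfsc.

```python
from typing import Dict, Iterable, List, Tuple


def range_headers(needs: Iterable[Tuple[int, int]]) -> List[Dict[str, str]]:
    """Turn (file_position, length) pairs into HTTP Range request headers."""
    headers = []
    for position, length in needs:
        if length <= 0:
            continue  # Nothing to fetch for an empty span.
        # HTTP byte ranges are inclusive at both ends.
        last = position + length - 1
        headers.append({"Range": f"bytes={position}-{last}"})
    return headers


print(range_headers([(0, 8), (1024, 256)]))
# [{'Range': 'bytes=0-7'}, {'Range': 'bytes=1024-1279'}]
```

Each header would accompany one GET request, and the response body is what gets written back to the SVF at that file position.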

Here is a conceptual example of an SVF running on a local file system containing data from a single file.

            CLIENT SIDE           |             LOCAL FILE SYSTEM
                                  .
/------\      /--------\          |              /-------------\
| User | <--> | Parser | <-- read(fpos, len) --> | File System |
\------/      \--------/          |              \-------------/
                   |              .
                   |              |
               /-------\          .
               |  SVF  |          |
               \-------/          .

Here is a conceptual example of an SVFS running with a remote file system.

            CLIENT SIDE           |             SERVER SIDE
                                  .
/------\      /--------\          |             /--------\
| User | <--> | Parser | <-- GET(fpos, len) --> | Server |
\------/      \--------/          |             \--------/
                   |              .                  |
                   |              |                  |
               /-------\          .           /-------------\
               |  SVF  |          |           | File System |
               \-------/          .           \-------------/

Example Python Usage

Installation

Install from PyPI:

$ pip install svfsc

Using a Single SVF

This shows the basic functionality: write(), read() and need():

import svfsc

# Construct a Sparse Virtual File
svf = svfsc.cSVF('Some file ID')

# Write six bytes at file position 14
svf.write(14, b'ABCDEF')

# Read from it
svf.read(16, 2) # Returns b'CD'

# What do I have to do to read 24 bytes from file position 8?
# This returns a tuple of pairs ((file_position, read_length), ...)
svf.need(8, 24) # Returns ((8, 6), (20, 12))
# Go and get the data from those file positions and write it to
# the SVF then you can read directly from the SVF.
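The arithmetic behind need() can be sketched in pure Python. This is an illustration rather than svfsc's own code: the function `gaps` and its list of (position, length) spans are hypothetical names, and the spans are assumed to be sorted and non-overlapping.

```python
from typing import List, Tuple


def gaps(have: List[Tuple[int, int]], fpos: int, length: int) -> List[Tuple[int, int]]:
    """Return the (position, length) spans of [fpos, fpos + length) missing from `have`."""
    result = []
    cursor, end = fpos, fpos + length
    for start, size in have:
        if start >= end:
            break  # This block and all later ones lie beyond the request.
        if start > cursor:
            # Bytes from cursor up to this block are not cached.
            result.append((cursor, start - cursor))
        cursor = max(cursor, start + size)
    if cursor < end:
        # The tail of the request is not cached.
        result.append((cursor, end - cursor))
    return result


# Cached: six bytes at position 14 and four bytes at position 30.
# Wanted: 30 bytes from position 8, that is the half-open range [8, 38).
print(gaps([(14, 6), (30, 4)], 8, 30))  # [(8, 6), (20, 10), (34, 4)]
```

The requested range ends at file_position + length, so every returned span stays inside that range.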

The basic operation is to check whether the SVF has the data. If it does not, get the missing data and write it to the SVF. Then read directly from the SVF:

if not svf.has_data(file_position, length):
    for read_position, read_length in svf.need(file_position, length):
        # Somehow get the data as a bytes object at (read_position, read_length).
        # This could be a GET request to a remote file.
        # get_data() is a hypothetical helper, not part of svfsc.
        data = get_data(read_position, read_length)
        svf.write(read_position, data)
# Now read directly
svf.read(file_position, length)
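The loop above can be exercised end to end without a network. The sketch below is a simulation under stated assumptions: `ToyCache` is a deliberately naive per-byte stand-in for an SVF (it is not svfsc), `REMOTE` plays the part of the remote file, and `fetch` stands in for a GET request.

```python
REMOTE = bytes(range(256))  # Pretend this is the remote file.
fetched = []                # Log of (position, length) requests made.


class ToyCache:
    """A naive per-byte cache with the same four operations as the example."""

    def __init__(self):
        self._bytes = {}

    def write(self, fpos, data):
        for i, value in enumerate(data):
            self._bytes[fpos + i] = value

    def has_data(self, fpos, length):
        return all(fpos + i in self._bytes for i in range(length))

    def need(self, fpos, length):
        spans = []
        for pos in range(fpos, fpos + length):
            if pos in self._bytes:
                continue
            if spans and spans[-1][0] + spans[-1][1] == pos:
                spans[-1] = (spans[-1][0], spans[-1][1] + 1)  # Extend the run.
            else:
                spans.append((pos, 1))  # Start a new run of missing bytes.
        return spans

    def read(self, fpos, length):
        return bytes(self._bytes[fpos + i] for i in range(length))


def fetch(position, length):
    fetched.append((position, length))
    return REMOTE[position:position + length]


def read_through(cache, file_position, length):
    if not cache.has_data(file_position, length):
        for read_position, read_length in cache.need(file_position, length):
            cache.write(read_position, fetch(read_position, read_length))
    return cache.read(file_position, length)


cache = ToyCache()
first = read_through(cache, 10, 20)   # Nothing cached yet: one fetch.
second = read_through(cache, 20, 20)  # Only the uncached tail is fetched.
third = read_through(cache, 12, 8)    # Fully cached: no fetch at all.
print(fetched)  # [(10, 20), (30, 10)]
```

The second and third reads show the point of the pattern: only bytes that are not already held are ever requested.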

A Sparse Virtual File System

The example above uses a single Sparse Virtual File, but you can also create a Sparse Virtual File System. This is a key/value store where the key is some string and the value an SVF:

import svfsc

svfs = svfsc.cSVFS()

# Insert an empty SVF with a corresponding ID
ID = 'abc'
svfs.insert(ID)

# Write six bytes to that SVF at file position 14
svfs.write(ID, 14, b'ABCDEF')

# Read from the SVF
svfs.read(ID, 16, 2) # Returns b'CD'

# What do I have to do to read 24 bytes from file position 8
# from that SVF?
svfs.need(ID, 8, 24) # Returns ((8, 6), (20, 12))
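The key/value idea can be sketched with a dictionary guarded by a lock. `ToySVFS` is a hypothetical illustration, not svfsc's implementation: it stores blocks as written without coalescing, and raising `KeyError` on a duplicate ID is a choice made for this sketch only.

```python
import threading
from typing import Dict


class ToySVFS:
    """Maps a file ID to that file's cached blocks, {file_position: bytes}."""

    def __init__(self) -> None:
        self._files: Dict[str, Dict[int, bytes]] = {}
        self._lock = threading.Lock()  # Serialises access from multiple threads.

    def insert(self, file_id: str) -> None:
        with self._lock:
            if file_id in self._files:
                raise KeyError(f"ID already present: {file_id!r}")
            self._files[file_id] = {}

    def write(self, file_id: str, file_position: int, data: bytes) -> None:
        with self._lock:
            # An unknown ID raises KeyError. Coalescing is omitted for brevity.
            self._files[file_id][file_position] = data

    def num_bytes(self, file_id: str) -> int:
        with self._lock:
            return sum(len(block) for block in self._files[file_id].values())


toy = ToySVFS()
toy.insert("abc")
toy.write("abc", 14, b"ABCDEF")
print(toy.num_bytes("abc"))  # 6
```

Every operation takes the file ID first, mirroring the call shape of the svfs examples above.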

Example C++ Usage

svfsc is written in C++ so can be used directly:

#include <iostream>

#include "svf.h"

// File modification time of 1672574430.0 (2023-01-01 12:00:30)
SVFS::SparseVirtualFile svf("Some file ID", 1672574430.0);

// Write six chars at file position 14
svf.write(14, "ABCDEF", 6);

// Read from it
char read_buffer[2];
svf.read(16, 2, read_buffer);
// read_buffer now contains "CD"

// What do I have to do to read 24 bytes from file position 8?
// This returns a std::vector<std::pair<size_t, size_t>>
// as ((file_position, read_length), ...)
auto need = svf.need(8, 24);

// The following prints ((8, 6), (20, 12),)
std::cout << "(";
for (auto &val: need) {
    std::cout << "(" << val.first << ", " << val.second << "),";
}
std::cout << ")" << std::endl;

Documentation

Build the documentation from the docs directory or find it on readthedocs: https://svfsc.readthedocs.io/

Acknowledgments

Many thanks to my employer Paige.ai for allowing me to release this as FOSS.

History

0.4.1 (2025-03-24)

  • Documentation improvements.

  • Add AWS cost to simulator.

  • Add svfsc.cSVF.clear()

  • Support for Python 3.8, 3.9, 3.10, 3.11, 3.12, 3.13.

  • Development Status :: 5 - Production/Stable

0.4.0 (2024-02-11)

  • Add counters for blocks/bytes erased and blocks/bytes punted, and their associated APIs.

  • Use the SVFS_SVF_METHOD_SIZE_T_REGISTER macro in CPython to simplify registering CPython methods.

  • Fix builds on Linux, mainly compiler flags.
    • Move to -std=c++17 to exploit [[nodiscard]].

    • Better alignment of compiler flags between CMakeLists.txt and setup.py

  • Other minor fixes.

  • Because svfsc is used extensively in various projects, version 0.4 is moved to production status: Development Status :: 5 - Production/Stable

0.3.0 (2024-01-06)

  • Add need_many().

  • Fix bug in lru_punt().

  • Development Status :: 4 - Beta

0.2.2 (2023-12-28)

  • Minor fixes.

  • Development Status :: 4 - Beta

0.2.1 (2023-12-27)

  • Include stub file.

  • Development Status :: 4 - Beta

0.2.0 (2023-12-24)

  • Add cache punting.

  • Make C docstrings type parsable (good for Sphinx) and add a script that can create a mypy stub file.

  • Development Status :: 4 - Beta

0.1.2 (2023-10-03)

  • First release.

  • Development Status :: 3 - Alpha
