Mount Barecat archives with FUSE.

Mounting Barecat archives via FUSE

Background

Barecat is a simple, highly scalable aggregate storage format for storing many small files (tens of millions or more), with a focus on fast random access and minimal overhead. You can think of it as a filesystem-in-a-file or as a key-value store. Data is stored sequentially in a flat file (or in multiple shard files), and an SQLite database indexes it. The index is used to quickly locate a file's data by its path and to provide directory listings, file statistics, and other metadata. Barecat can handle at least tens of millions of files and terabytes of data, including more than 100k files in a single directory. Directory listings are produced in a streaming fashion, so entries start appearing fairly quickly even in huge directories.
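To make the "flat data file plus SQLite index" idea concrete, here is a minimal toy sketch in Python. It is not Barecat's implementation: the table schema and the function names (build_archive, read_file, list_dir) are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_archive(files, data_path, index_path):
    """Append each file's bytes to one flat data file and index them in SQLite."""
    index = sqlite3.connect(index_path)
    # Hypothetical schema for illustration only; not Barecat's real one.
    index.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, offset INTEGER, size INTEGER)"
    )
    with open(data_path, "wb") as data:
        for path, content in files.items():
            offset = data.tell()  # new data always goes at the current end
            data.write(content)
            index.execute(
                "INSERT INTO files VALUES (?, ?, ?)", (path, offset, len(content))
            )
    index.commit()
    return index


def read_file(index, data_path, path):
    """Look up offset and size in the index, then read only that byte range."""
    row = index.execute(
        "SELECT offset, size FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(path)
    offset, size = row
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.read(size)


def list_dir(index, dirname):
    """Yield every path under a directory prefix, one row at a time."""
    prefix = dirname.rstrip("/") + "/"
    cursor = index.execute(
        "SELECT path FROM files WHERE substr(path, 1, ?) = ? ORDER BY path",
        (len(prefix), prefix),
    )
    for (path,) in cursor:  # rows are fetched lazily, so results stream
        yield path
```

Random access costs one index lookup plus one seek, and the listing generator hands out entries as the database produces them instead of collecting them all first.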

Barecat archives can be mounted via FUSE, allowing them to be used like a local filesystem. This is useful for browsing an archive's contents and for reading and writing files. Mounting is mostly intended for inspecting the data and making smaller changes; for the main workload (e.g. training a deep learning model), you should use the Python API, which is more efficient because it accesses the data directly, without the overhead of FUSE.
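Once an archive is mounted, ordinary file tools work on it. The following sketch uses only the Python standard library to walk a directory tree lazily; nothing in it is specific to Barecat, and the helper names iter_files and read_head are made up for this example. It would be pointed at whatever directory you mounted the archive on.

```python
import os


def iter_files(root):
    """Lazily yield (relative_path, size) for every regular file under root."""
    stack = [root]
    while stack:
        current = stack.pop()
        # os.scandir returns an iterator, so huge directories are not
        # loaded into memory all at once.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    size = entry.stat(follow_symlinks=False).st_size
                    yield os.path.relpath(entry.path, root), size


def read_head(path, n_bytes=16):
    """Read the first n_bytes of a file, e.g. to peek at its header."""
    with open(path, "rb") as f:
        return f.read(n_bytes)
```

For example, `for rel_path, size in iter_files("mountpoint/"): print(rel_path, size)` would print entries as they arrive.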

Installation

sudo apt-get install libfuse-dev  # or its equivalent with other package managers
pip install git+https://github.com/isarandi/barecat-mount.git

Usage

# read-only:
barecat-mount mydata.barecat mountpoint/

# read-write:
barecat-mount --writable mydata.barecat mountpoint/

# unmount:
fusermount -u mountpoint/
# or
umount mountpoint/
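If you want to mount from a script, the commands above can be wrapped in Python. This is only a sketch built around the command lines shown here. The helper names (mount_command, unmount_command, mounted) are hypothetical, and the context manager assumes that barecat-mount returns once the mount is in place; if it stays in the foreground on your system, start it with subprocess.Popen instead.

```python
import contextlib
import subprocess


def mount_command(archive, mountpoint, writable=False):
    """Build the barecat-mount command line as a list of arguments."""
    cmd = ["barecat-mount"]
    if writable:
        cmd.append("--writable")
    cmd += [archive, mountpoint]
    return cmd


def unmount_command(mountpoint):
    """Build the unmount command line."""
    return ["fusermount", "-u", mountpoint]


@contextlib.contextmanager
def mounted(archive, mountpoint, writable=False):
    """Mount on entry, always unmount on exit.

    Assumption: barecat-mount returns after mounting. This is not
    confirmed by the text above.
    """
    subprocess.run(mount_command(archive, mountpoint, writable), check=True)
    try:
        yield mountpoint
    finally:
        subprocess.run(unmount_command(mountpoint), check=True)


# Usage (requires barecat-mount and fusermount to be installed):
#     with mounted("mydata.barecat", "mountpoint/") as root:
#         ...  # work with files under root
```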

A Note on Fragmentation

Since Barecat always adds new files at the end of the archive, many deletions and insertions will lead to fragmentation. The general idea is to write once, read many times, and delete only when you need to fix a mistake. A basic heuristic auto-defragmentation can be enabled as follows:

barecat-mount --writable --enable-defrag mydata.barecat mountpoint/

This way, the filesystem will periodically defragment itself after a significant number of deletions. You can also run a defragmentation manually with:

barecat-defrag mydata.barecat

This goes through the archive in sequence and moves all files towards the beginning, leaving no gaps. It may take a very long time, since closing even a one-byte gap requires moving all the data that follows it. A quicker option is available with:

barecat-defrag --quick mydata.barecat

This proceeds backwards from the end of the archive and moves each file into the first gap that can hold it, counted from the beginning of the archive (first-fit). The algorithm stops at the first file for which no gap is large enough.
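The two strategies can be compared with a small simulation. The sketch below is not the barecat-defrag code: it works only on (name, offset, size) tuples instead of moving real bytes, the function names compact and quick_defrag are hypothetical, and it assumes that the quick mode considers only gaps located before a file's current position.

```python
def compact(files):
    """Full defrag: slide every file towards the start, in offset order.

    Returns (new_layout, bytes_moved).
    """
    layout, cursor, moved = [], 0, 0
    for name, offset, size in sorted(files, key=lambda f: f[1]):
        if offset != cursor:
            moved += size  # every file after the first gap must be rewritten
        layout.append((name, cursor, size))
        cursor += size
    return layout, moved


def quick_defrag(files):
    """Quick defrag: move files from the end into the first gap that fits."""
    layout = sorted(files, key=lambda f: f[1])
    # Collect the gaps as mutable [start, length] pairs, in offset order.
    gaps, cursor = [], 0
    for _, offset, size in layout:
        if offset > cursor:
            gaps.append([cursor, offset - cursor])
        cursor = offset + size
    placed = {name: offset for name, offset, _ in layout}
    for name, offset, size in reversed(layout):
        # First-fit, counted from the beginning of the archive.
        # Assumption: only gaps before the file's own offset qualify.
        gap = next((g for g in gaps if g[0] < offset and g[1] >= size), None)
        if gap is None:
            break  # stop at the first file that fits nowhere
        placed[name] = gap[0]
        gap[0] += size
        gap[1] -= size
    return sorted(((n, placed[n], s) for n, _, s in layout), key=lambda f: f[1])
```

With files of 10, 1000 and 500 bytes and a one-byte gap after the first, compact reports 1500 bytes moved, which shows why the full pass can be slow. The quick variant moves only the files it relocates, but it can leave gaps behind when it stops early.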

Release history

0.1.3 (May 19, 2025)
0.1.2 (Mar 22, 2025)
