Random Access Read-Only Tar Mount

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
License
- OSI Approved :: MIT License
Operating System
- POSIX
- Unix
Programming Language
- Python :: 3
Topic
- System :: Archiving

Project description

Random Access Read-Only Tar Mount (Ratarmount)

Ratarmount combines the random access indexing idea from tarindexer with a FUSE mount, implemented with fusepy, to provide easy read-only access just like archivemount. It can also mount TARs inside TARs inside TARs recursively into folders of the same name, which is useful for the ImageNet data set. Furthermore, it supports BZip2-compressed TAR archives via indexed_bzip2, a refactored and extended version of bzcat from toybox, and Gzip-compressed TAR archives via the indexed_gzip dependency.

Installation
Usage
The Problem
The Solution
Benchmarks

Installation

You can simply install it from PyPI:

pip install ratarmount

Or, if you want to test the latest development version on a Debian-like system:

sudo apt-get update
sudo apt-get install python3 python3-pip git
git clone https://github.com/mxmlnkn/ratarmount.git
cd ratarmount
python3 -m pip install --user .
ratarmount --help

You can also simply download ratarmount.py and call it directly, but then BZip2 support will not work and you will have to install the dependencies manually, at the very least fusepy: pip3 install --user fusepy.

If you want to use other serialization backends instead of the default SQLite one, then either install those packages manually or install ratarmount by specifying the legacy-serializers feature:

pip install ratarmount[legacy-serializers]

Usage

usage: ratarmount.py [-h] [-f] [-d DEBUG] [-c] [-r] [-s SERIALIZATION_BACKEND]
                     [-p PREFIX] [--fuse FUSE]
                     tar-file-path [mount-path]

If no mount path is specified, then the tar will be mounted to a folder of the
same name but without a file extension. TAR files contained inside the tar and
even TARs in TARs in TARs will be mounted recursively at folders of the same
name barred the file extension '.tar'. In order to reduce the mounting time,
the created index for random access to files inside the tar will be saved to
<path to tar>.index.<backend>[.<compression]. If it can't be saved there, it
will be saved in ~/.ratarmount/<path to tar: '/' ->
'_'>.index.<backend>[.<compression].

positional arguments:
  tar-file-path         The path to the TAR archive to be mounted.
  mount-path            The path to a folder to mount the TAR contents into.
                        (default: None)

optional arguments:
  -h, --help            show this help message and exit
  -f, --foreground      Keeps the python program in foreground so it can print
                        debug output when the mounted path is accessed.
                        (default: False)
  -d DEBUG, --debug DEBUG
                        Sets the debugging level. Higher means more output.
                        Currently, 3 is the highest. (default: 1)
  -c, --recreate-index  If specified, pre-existing .index files will be
                        deleted and newly created. (default: False)
  -r, --recursive       Mount TAR archives inside the mounted TAR recursively.
                        Note that this only has an effect when creating an
                        index. If an index already exists, then this option
                        will be effectively ignored. Recreate the index if you
                        want change the recursive mounting policy anyways.
                        (default: False)
  -s SERIALIZATION_BACKEND, --serialization-backend SERIALIZATION_BACKEND
                        (deprecated) Specify which library to use for writing
                        out the TAR index. Supported keywords: (none,pickle,pi
                        ckle2,pickle3,custom,cbor,msgpack,rapidjson,ujson,simp
                        lejson,sqlite)[.(lz4,gz)] (default: sqlite)
  -p PREFIX, --prefix PREFIX
                        The specified path to the folder inside the TAR will
                        be mounted to root. This can be useful when the
                        archive as created with absolute paths. E.g., for an
                        archive created with `tar -P cf
                        /var/log/apt/history.log`, -p /var/log/apt/ can be
                        specified so that the mount target directory
                        >directly< contains history.log. (default: )
  --fuse FUSE           Comma separated FUSE options. See "man mount.fuse" for
                        help. Example: --fuse
                        "allow_other,entry_timeout=2.8,gid=0". (default: )

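For scripted use, the options listed above can be assembled into a command line programmatically. The following is only a sketch: the helper name build_ratarmount_command and its parameters are hypothetical, and it uses only flags that appear in the help output above (-c, -r, -p, --fuse).

```python
import shlex


def build_ratarmount_command(tar_path, mount_path=None, recursive=False,
                             recreate_index=False, prefix=None,
                             fuse_options=()):
    """Return an argv list for ratarmount (hypothetical helper, not part of ratarmount)."""
    cmd = ["ratarmount"]
    if recreate_index:
        cmd.append("-c")  # delete and recreate a pre-existing index
    if recursive:
        cmd.append("-r")  # only has an effect while the index is being created
    if prefix:
        cmd += ["-p", prefix]  # folder inside the TAR that becomes the mount root
    if fuse_options:
        cmd += ["--fuse", ",".join(fuse_options)]  # comma separated FUSE options
    cmd.append(tar_path)
    if mount_path:
        cmd.append(mount_path)  # optional, ratarmount picks a default otherwise
    return cmd


if __name__ == "__main__":
    argv = build_ratarmount_command("imagenet.tar", "mounted", recursive=True)
    # Print a shell-quoted version; pass argv to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Because -r is ignored when an index already exists, a script that changes the recursion policy would presumably pass recreate_index=True as well.
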
If possible, index files are created in, or if they already exist loaded from, these locations, in order:

<path to tar>.index.<serialization backend>
~/.ratarmount/<path to tar: '/' -> '_'>.index.<serialization backend>
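
The lookup order can be sketched in a few lines of Python. This is an illustration rather than ratarmount's actual code: the function name candidate_index_paths is hypothetical, and the fallback directory is assumed to be ~/.ratarmount as stated in the help text above.

```python
import os


def candidate_index_paths(tar_path, backend="sqlite"):
    """Return the two index locations in the order they would be tried.

    Hypothetical helper; assumes the fallback directory is ~/.ratarmount.
    """
    tar_path = os.path.abspath(tar_path)
    suffix = ".index." + backend
    # First choice: right next to the TAR file.
    beside_tar = tar_path + suffix
    # Fallback: in the home directory, with '/' replaced by '_' so that the
    # full path fits into a single file name.
    flattened = tar_path.replace("/", "_")
    in_home = os.path.join(os.path.expanduser("~"), ".ratarmount", flattened + suffix)
    return [beside_tar, in_home]


if __name__ == "__main__":
    for path in candidate_index_paths("/data/imagenet.tar"):
        print(path)
```

The fallback matters when the TAR resides on a read-only medium or in a directory the user cannot write to.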

The Problem

You downloaded a large TAR file from the internet, for example the 1.31TB ImageNet data set, and you now want to use it but lack the space, time, or a sufficiently fast file system to extract all 14.2 million image files.

Partial Solutions

Archivemount

In version 0.8.7, archivemount seems to have severe performance issues, for both mounting and file access, when an archive contains many files. A more in-depth comparison benchmark can be found here.

Mounting the 6.5GB ImageNet Large-Scale Visual Recognition Challenge 2012 validation data set and then testing the speed with time cat mounted/ILSVRC2012_val_00049975.JPEG | wc -c takes 250ms for archivemount and 2ms for ratarmount.
An attempt to mount the 150GB ILSVRC object localization data set, which contains 2 million images, was abandoned after 2 hours. Ratarmount takes ~15min to create a ~150MB index and <1ms to open an already created index (SQLite database) and mount the TAR. In contrast, archivemount takes the same amount of time on every subsequent mount.
Does not support recursive mounting, although you could write a script to stack archivemount on top of archivemount for all contained TAR files.
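
The single-file timing above can be approximated from Python as well. This is a minimal sketch: the function name time_read is hypothetical and the path in the usage part is only an example. Like the shell command, it measures a complete sequential read through the mount point.

```python
import time


def time_read(path, chunk_size=1 << 20):
    """Read a file completely and return (bytes_read, seconds_elapsed)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    import sys

    # Example: python3 time_read.py mounted/ILSVRC2012_val_00049975.JPEG
    size, seconds = time_read(sys.argv[1])
    print(f"{size} bytes in {seconds * 1000:.1f} ms")
```

Repeated runs on the same file mostly measure operating system caches, so the first access is the more meaningful number.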

Tarindexer

Tarindexer is a command line tool written in Python that can create index files and then use such an index file to quickly extract single files from the TAR. However, it also has some caveats, which ratarmount tries to solve:

It only works with single files, meaning it would be necessary to loop over the extract call. But this would require loading the possibly quite large TAR index file into memory each time. For ImageNet, for example, the resulting index file is hundreds of MB in size. Also, extracting directories will be a hassle.
It's difficult to integrate tarindexer into other production environments. Ratarmount instead uses FUSE to mount the TAR as a folder readable by any other programs requiring access to the contained data.
Can't handle TARs recursively. In order to extract files inside a TAR which itself is inside a TAR, the packed TAR first needs to be extracted.
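
Handling TARs inside TARs without extracting the inner archive first is possible because a member of an uncompressed TAR can itself be opened as a seekable file object. The sketch below uses only Python's standard tarfile module. It is not ratarmount's implementation, and the function name list_recursively is hypothetical. Nested archives appear as folders named after the archive without the '.tar' extension.

```python
import io
import tarfile


def list_recursively(fileobj, prefix=""):
    """List all regular files in an uncompressed TAR, descending into nested TARs."""
    paths = []
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue
            if member.name.endswith(".tar"):
                # extractfile returns a seekable view into the outer archive,
                # so nothing has to be unpacked to disk.
                nested = tar.extractfile(member)
                folder = prefix + member.name[: -len(".tar")] + "/"
                paths.extend(list_recursively(nested, folder))
            else:
                paths.append(prefix + member.name)
    return paths


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as outer:
        for path in list_recursively(outer):
            print(path)
```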

TAR Browser

I didn't find out about TAR Browser before I finished the ratarmount script. That's also one of its cons:

Hard to find. I don't seem to be the only one who has trouble finding it, as it has zero stars on GitHub after 4 years compared to 29 stars for tarindexer after roughly the same amount of time.
Hassle to set up. It needs compilation, and I gave up when I was instructed to set up a MySQL database for it to use. Confusingly, the setup instructions are not on its GitHub but here.
Doesn't seem to support recursive TAR mounting. I didn't test it because of the MySQL dependency, but the code does not seem to have logic for recursive mounting.

Pros:

supports bz2- and xz-compressed TAR archives

The Solution

Ratarmount creates an index file with file names, ownership, permission flags, and offset information to be stored at the TAR file's location or inside ~/.ratarmount/ and then offers a FUSE mount integration for easy access to the files.
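
The idea of such an index can be illustrated with Python's standard tarfile and sqlite3 modules. This is a simplified sketch under stated assumptions and not ratarmount's code: the table layout and the function names build_index and read_member are hypothetical, only uncompressed TARs are handled, and sparse files are ignored.

```python
import sqlite3
import tarfile


def build_index(tar_path, db_path=":memory:"):
    """Scan the TAR once and store name, data offset, size and ownership per file."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, offset INTEGER, size INTEGER, "
        "mode INTEGER, uid INTEGER, gid INTEGER)"
    )
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                db.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?, ?)",
                    (member.name, member.offset_data, member.size,
                     member.mode, member.uid, member.gid),
                )
    db.commit()
    return db


def read_member(tar_path, db, path):
    """Return a file's contents by seeking directly to its stored offset."""
    row = db.execute(
        "SELECT offset, size FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    offset, size = row
    with open(tar_path, "rb") as raw:
        raw.seek(offset)  # no need to parse any headers before the file
        return raw.read(size)
```

Building the index requires one full pass over the archive. Afterwards, each read costs one database lookup plus one seek, regardless of where the file is located in the TAR.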

The test with the first version (50e8dbb), which used pickle serialization, for the ImageNet data set is promising:

TAR size: 1.31TB
Contains TARs: yes
Files in TAR: ~26 000
Files in TAR (including recursively in contained TARs): 14.2 million
Index creation (first mounting): 4 hours
Index size: 1GB
Index loading (subsequent mounting): 80s
Reading a 40kB file: 100ms (first time) and 4ms (subsequent times)

The reading time for a small file merely verifies that random access via file seek works. The difference between the first read and subsequent reads is caused not by ratarmount but by operating system and file system caches.

Here is a more recent test for version 0.2.0 with the new default SQLite backend:

TAR size: 124GB
Contains TARs: yes
Files in TAR: 1000
Files in TAR (including recursively in contained TARs): 1.26 million
Index creation (first mounting): 15m 39s
Index size: 146MB
Index loading (subsequent mounting): 0.000s
Reading a 64kB file: ~4ms
Running 'find mountPoint -type f | wc -l' (1.26M stat calls): 1m 50s
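
The file-counting measurement in the last line can be reproduced without find. The following sketch walks a directory tree and counts regular files, issuing one lstat call per entry, which is roughly comparable to find mountPoint -type f | wc -l. The function name count_regular_files is hypothetical.

```python
import os
import stat
import time


def count_regular_files(root):
    """Count regular files below root without following symbolic links."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(info.st_mode):
                count += 1
    return count


if __name__ == "__main__":
    import sys

    start = time.perf_counter()
    total = count_regular_files(sys.argv[1])
    print(f"{total} files in {time.perf_counter() - start:.1f} s")
```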

Benchmarks

Several benchmarks were created during the development of this project. They can be viewed here. These are some of the things benchmarked and compared there:

Memory and runtime comparisons of backends for saving the index with offsets
Comparison of SQLite table designs
Mounting and file access time comparison between archivemount and ratarmount

Benchmark comparison between ratarmount and archivemount

Release history

- 1.2.1 (Nov 20, 2025)
- 1.2.0 (Aug 16, 2025)
- 1.1.2 (Aug 1, 2025)
- 1.1.1 (Jul 23, 2025)
- 1.1.0 (Jun 21, 2025)
- 1.0.0 (Nov 1, 2024)
- 0.15.2 (Sep 1, 2024)
- 0.15.1 (Jun 2, 2024)
- 0.15.0 (Apr 7, 2024)
- 0.14.2 (Apr 6, 2024)
- 0.14.1 (Feb 23, 2024)
- 0.14.0 (Sep 3, 2023)
- 0.13.0 (Feb 19, 2023)
- 0.12.0 (Nov 13, 2022)
- 0.11.3 (Jun 25, 2022)
- 0.11.2 (May 27, 2022)
- 0.11.1 (Apr 10, 2022)
- 0.11.0 (Apr 6, 2022)
- 0.10.0 (Jan 15, 2022)
- 0.9.3 (Dec 21, 2021)
- 0.9.2 (Nov 28, 2021)
- 0.9.1 (Sep 26, 2021)
- 0.9.0 (Sep 16, 2021)
- 0.8.1 (Jul 11, 2021)
- 0.8.0 (Jun 27, 2021)
- 0.7.0 (Dec 20, 2020)
- 0.6.1 (Oct 2, 2020)
- 0.6.0 (Oct 2, 2020), yanked. Reason this release was yanked: ratarmount can't be called from the command line anymore
- 0.5.0 (May 8, 2020)
- 0.4.1 (Apr 10, 2020), this version
- 0.4.0 (Dec 15, 2019)
- 0.3.4 (Dec 6, 2019)
- 0.3.3 (Dec 5, 2019)
- 0.3.2 (Dec 1, 2019)
- 0.3.1 (Nov 23, 2019)
- 0.3.0 (Nov 23, 2019)
- 0.2.0 (Nov 17, 2019)
- 0.1.0 (Nov 14, 2019)

Download files

Source Distribution

ratarmount-0.4.1.tar.gz (23.1 kB)

Uploaded Apr 10, 2020 Source

Built Distribution

ratarmount-0.4.1-py3-none-any.whl (23.0 kB)

Uploaded Apr 10, 2020 Python 3

File details

Details for the file ratarmount-0.4.1.tar.gz.

File metadata

Download URL: ratarmount-0.4.1.tar.gz
Upload date: Apr 10, 2020
Size: 23.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.5.6

File hashes

Hashes for ratarmount-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`d3520660159e40c892582f7649fb450d4cb37ec44a005009604c2b95812cd8e7`
MD5	`32a2ed964650ca20ebac6bea90789b30`
BLAKE2b-256	`0629d83854b0bdcd11dfc14f6aba0199f0491c45f9e37f660c2e53fdca2673a3`

File details

Details for the file ratarmount-0.4.1-py3-none-any.whl.

File metadata

Download URL: ratarmount-0.4.1-py3-none-any.whl
Upload date: Apr 10, 2020
Size: 23.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.5.6

File hashes

Hashes for ratarmount-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e5d0ff8d005fb36a3073efe5b48c92421df6732a945bdf41f6657f1a41283b9b`
MD5	`ba0af563ed7cae3209bcca992db4635a`
BLAKE2b-256	`08602084880e9bf67cfdbae91590c41449149c80984b124222d8d0fecf2c4d74`
