Share data without data duplication using nfs4_acls and hard links

NFSv4-SHARE

Installation

Using PyPI:

pip install nfs4_share

Or install from source:

  1. clone the repository

  2. cd into the repository directory

  3. Install using:

         pip install .
    

Usage

Run nfs4_share --help for the available arguments. A more detailed description can be found below.

Motivation

Within high-performance clusters, bulk storage is typically exposed as a mount of type Network File System version 4 (NFSv4). To protect the data, read access is granted only sparingly, and write access even more so. Playing it safe becomes a problem when results need to be shared with other researchers. Manually granting and revoking access on an NFSv4 mount is possible but cumbersome and error-prone. Simply copying the data to a shared location is risky and quickly increases the use of expensive storage.

The NFSv4-SHARE program is built to solve this problem. It uses properties of the NFSv4 mount to prevent data duplication and makes keeping track of permissions relatively easy. Data duplication is prevented by only creating hard links to files. Keeping track of permissions is done by wrapping the hard-linked files in a single share directory, so that access to all of them is governed by the same permissions.
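The claim that hard links avoid duplication can be checked with a minimal sketch using only the Python standard library. It is an illustration, not the code NFSv4-SHARE uses; the helper name link_without_copy and the file names are hypothetical, and it runs in a temporary directory rather than on an NFSv4 mount.

```python
import os
import tempfile


def link_without_copy(source, target):
    """Create a hard link at target that points to the same data as source."""
    os.link(source, target)
    # Both names now refer to the same inode, so no data was duplicated.
    return os.stat(source).st_ino == os.stat(target).st_ino


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "QC.txt")
    shared = os.path.join(workdir, "QC_shared.txt")
    with open(source, "w") as handle:
        handle.write("quality control results\n")

    same_inode = link_without_copy(source, shared)
    link_count = os.stat(source).st_nlink  # number of names for this inode

    # Removing the shared name leaves the source data untouched.
    os.unlink(shared)
    source_survives = os.path.exists(source)
```

Because a hard link is just a second name for the same inode, removing the shared name does not affect the source file.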

Access to the data and to the shares themselves is controlled by NFSv4 access-control lists (ACLists). These ACLists consist of entries (ACEntries) which determine what permissions a calling user has. The main differences between ACLists and the standard POSIX permissions (i.e. rwxrwxrwx) are as follows:

  • multiple users and groups can be defined
  • more fine-grained permissions can be controlled (13 for files, 14 for directories).
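To make the shape of an ACEntry concrete, here is a small sketch that parses the textual form type:flags:principal:permissions as printed by the nfs4_getfacl tool. The parser, the ACEntry class, and the example.org domain are illustrative assumptions and not part of NFSv4-SHARE; the permission letters follow the nfs4_acl manual page.

```python
from dataclasses import dataclass

# Permission letters as used by nfs4_getfacl / nfs4_setfacl.
# 'D' (delete child) only applies to directories, which gives
# 13 permissions for files and 14 for directories.
PERMISSIONS = {
    "r": "read data / list directory",
    "w": "write data / create file",
    "a": "append data / create subdirectory",
    "x": "execute / change directory",
    "d": "delete",
    "D": "delete child (directories only)",
    "t": "read attributes",
    "T": "write attributes",
    "n": "read named attributes",
    "N": "write named attributes",
    "c": "read ACL",
    "C": "write ACL",
    "o": "write owner",
    "y": "synchronize",
}


@dataclass(frozen=True)
class ACEntry:
    ace_type: str      # 'A' allow, 'D' deny, 'U' audit, 'L' alarm
    flags: str         # e.g. 'g' marks a group principal
    principal: str     # e.g. 'bob@example.org' (hypothetical domain)
    permissions: str   # subset of the letters in PERMISSIONS

    @property
    def is_group(self):
        return "g" in self.flags

    def allows(self, letter):
        return self.ace_type == "A" and letter in self.permissions


def parse_ace(text):
    """Parse one 'type:flags:principal:permissions' line into an ACEntry."""
    ace_type, flags, principal, permissions = text.strip().split(":")
    unknown = set(permissions) - set(PERMISSIONS)
    if unknown:
        raise ValueError(f"unknown permission letters: {sorted(unknown)}")
    return ACEntry(ace_type, flags, principal, permissions)


user_entry = parse_ace("A::bob@example.org:rxtncy")
group_entry = parse_ace("A:g:pmc_omics@example.org:rwaDxtTnNcCy")
```

Here user_entry grants bob read and execute style access, while group_entry is flagged as a group principal with a broader permission set.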

A small addition has been made to also control the .htaccess file of a share, which allows data sharing via an Apache server.


Practical Example

Take some imaginary source data that is structured as follows:

	/data/results/
			QC.txt
	/raw_data/sample1/
			run_L1.bam
			run_L2.bam
	/shares/..

If you want to do the following:

  • create a share for project foobar under /shares
  • share the file QC.txt from directory /data/results
  • share the subdirectory sample1/ from directory /raw_data
  • provide access to users bob and alice
  • manage the share with group pmc_omics

You run the following command:

	nfs4_share create /shares/foobar \
	--users bob alice \
	--managing_groups pmc_omics \
	--items /data/results/QC.txt /raw_data/sample1

You then end up with a share and source data that is structured as follows:

	/data/results/
			QC.txt
	/raw_data/sample1/
			run_L1.bam
			run_L2.bam
	/shares/foobar/
			QC.txt
			sample1/
				run_L1.bam
				run_L2.bam

Users bob and alice could then navigate to the share at /shares/foobar to access the shared data.

When they are finished, or when you need to recreate the share, use NFSv4-SHARE to delete it:

	nfs4_share delete /shares/foobar
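If the commands above need to be issued from a script, one possible sketch is to assemble the argument lists in Python and hand them to subprocess. The helper names build_create_command and build_delete_command are hypothetical; only the subcommands and flags shown in the example above are used.

```python
import shlex


def build_create_command(share, users, managing_groups, items):
    """Return the argv list for the 'nfs4_share create' call shown above."""
    return [
        "nfs4_share", "create", share,
        "--users", *users,
        "--managing_groups", *managing_groups,
        "--items", *items,
    ]


def build_delete_command(share):
    """Return the argv list for the 'nfs4_share delete' call shown above."""
    return ["nfs4_share", "delete", share]


create_cmd = build_create_command(
    "/shares/foobar",
    users=["bob", "alice"],
    managing_groups=["pmc_omics"],
    items=["/data/results/QC.txt", "/raw_data/sample1"],
)
delete_cmd = build_delete_command("/shares/foobar")

# Shell-quoted form, useful for logging what would be executed.
create_line = " ".join(shlex.quote(part) for part in create_cmd)
```

Passing a list such as create_cmd to subprocess.run(create_cmd, check=True) avoids shell quoting problems with paths that contain spaces.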

Implementation Details

The ACLists on the share directory (i.e. /shares/foobar) are the de facto share permissions.

Shared Files

In the example above, all shared files keep their original ACEntries. These NEED to include read permissions.

Shared Directories

Any subdirectories from the source that end up in foobar are different subdirectories(!). The directories from the specified source items have their tree freshly rebuilt within the share. For instance, the directory /shares/foobar/sample1 is not a hard link to /raw_data/sample1, but a remake. The directories share the name sample1 but have a different inode number and associated ACList. Within the foobar share, the ACList of directory sample1 only has the ACEntries required to let bob and alice read and index files.
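The difference between rebuilt directories and hard-linked files can be illustrated with a minimal sketch. It is not the implementation used by NFSv4-SHARE: the function mirror_tree is hypothetical, it runs in a temporary directory, and it leaves out the ACList handling entirely.

```python
import os
import tempfile


def mirror_tree(source_dir, share_dir):
    """Rebuild source_dir inside share_dir: new directories, hard-linked files."""
    name = os.path.basename(os.path.normpath(source_dir))
    target_root = os.path.join(share_dir, name)
    for current, _subdirs, files in os.walk(source_dir):
        relative = os.path.relpath(current, source_dir)
        target_dir = os.path.normpath(os.path.join(target_root, relative))
        os.makedirs(target_dir, exist_ok=True)  # fresh directory, new inode
        for filename in files:
            os.link(os.path.join(current, filename),
                    os.path.join(target_dir, filename))
    return target_root


with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "raw_data", "sample1")
    share = os.path.join(root, "shares", "foobar")
    os.makedirs(source)
    os.makedirs(share)
    for filename in ("run_L1.bam", "run_L2.bam"):
        with open(os.path.join(source, filename), "w") as handle:
            handle.write("placeholder\n")

    rebuilt = mirror_tree(source, share)
    rebuilt_name = os.path.basename(rebuilt)
    # The directory is a remake: same name, different inode.
    dirs_differ = os.stat(source).st_ino != os.stat(rebuilt).st_ino
    # The files are hard links: both paths resolve to the same inode.
    files_shared = all(
        os.path.samefile(os.path.join(source, f), os.path.join(rebuilt, f))
        for f in os.listdir(source)
    )
```

Since the rebuilt directory has its own inode, it can carry an ACList that differs from the one on the source directory.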

Unit tests

If the source code is located on an NFSv4 mount with ACLs enabled you can run unit tests as follows:

pip install .[test]
pytest --basetemp=<NFS4_MOUNT>

If the source code is not stored on an NFSv4 mount, you should first move it to an NFSv4 mount before unit testing.
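One way to check beforehand whether a directory sits on an NFSv4 mount is to look it up in the mount table. The sketch below assumes a Linux system where /proc/mounts lists mounts as "device mount_point type options ..."; the function filesystem_type and the sample table are hypothetical, and mount points containing escaped spaces are not handled.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains path."""
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a path prefix; on ties the
        # later entry wins, since it shadows the earlier mount.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type


# Made-up mount table in the /proc/mounts layout.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
storage.example.org:/export /data nfs4 rw,relatime,vers=4.1 0 0
"""

on_nfs4 = filesystem_type("/data/project/src", SAMPLE_MOUNTS)
on_local = filesystem_type("/home/bob/src", SAMPLE_MOUNTS)
```

On a real system the second argument would be the contents of /proc/mounts, and a result of "nfs4" suggests the path is a candidate for --basetemp. Whether ACLs are enabled on that mount still has to be checked separately.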

This is already automated in the following script, which pushes the source code to the Horus server and runs the unit tests there:

bash tests/run_test_on_remote_server.sh

Python Module Interface

If you want to call this program programmatically from Python, you can use something like the following:

from nfs4_share.manage import create, delete

create(share_directory="/data/isi/p/pmc_research/omics/shares/share1",
       domain="op.umcutrecht.nl",
       items=["file1.txt", "file2.txt"])

delete(share_directory="/data/isi/p/pmc_research/omics/shares/share1")

Upload new version to PyPI

This requires an account at pypi.org with access to the project.

  1. Change the version number in ./__version__.py

  2. Tag a new version

     git tag v0.1.0
    
  3. Install required packages for uploading

     pip install --upgrade setuptools wheel twine
    
  4. Build dist

     python setup.py sdist bdist_wheel
    
  5. Upload using twine

     twine upload dist/*
    


