
Scrape metadata from CVMFS Stratum servers.

Project description

CVMFS server scraper and Prometheus exporter

This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. It grabs:

- cvmfs/info/v1/repositories.json 

And then for every repo it finds (that it's not told to ignore), it grabs:

- cvmfs/<repo>/.cvmfs_status.json
- cvmfs/<repo>/.cvmfspublished
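
As a rough illustration of the paths listed above, the sketch below builds the metadata URLs for one server and one repository. The function names, the plain `http` scheme, and the example host and repository names are assumptions made for illustration; they are not part of the package's API.

```python
def server_metadata_url(server: str) -> str:
    """URL of the repository listing for a server (assumes plain HTTP)."""
    return f"http://{server}/cvmfs/info/v1/repositories.json"


def repo_metadata_urls(server: str, repo: str) -> list[str]:
    """URLs of the two per-repository metadata files named above."""
    base = f"http://{server}/cvmfs/{repo}"
    return [f"{base}/.cvmfs_status.json", f"{base}/.cvmfspublished"]


# Hypothetical host and repository names.
print(server_metadata_url("stratum1-no.tld"))
for url in repo_metadata_urls("stratum1-no.tld", "repo.example.org"):
    print(url)
```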

Installation

pip install cvmfs-server-scraper

Usage

#!/usr/bin/env python3

import logging
from cvmfsscraper import scrape, scrape_server, set_log_level

# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")

set_log_level(logging.DEBUG)

servers = scrape(
    stratum0_servers=[
        "stratum0.tld",
    ],
    stratum1_servers=[
        "stratum1-no.tld",
        "stratum1-au.tld",
    ],
    repos=[],
    ignore_repos=[],
)

# Note that the order of servers is undefined.
print(servers[0])

# f-strings format each value, so non-string attributes do not need str().
for repo in servers[0].repositories:
    print(f"Repo: {repo.name}")
    print(f"Root size: {repo.root_size}")
    print(f"Revision: {repo.revision}")
    print(f"Revision timestamp: {repo.revision_timestamp}")
    print(f"Last snapshot: {repo.last_snapshot}")
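
Because the order of the returned list is undefined, it can be safer to look servers up by name than by index. The following is a minimal sketch that uses `SimpleNamespace` stand-ins instead of real scraped server objects; `servers_by_name` is a hypothetical helper, and it relies only on the `name` attribute of a server object.

```python
from types import SimpleNamespace


def servers_by_name(servers):
    """Index server objects by their name attribute."""
    return {server.name: server for server in servers}


# Stand-ins for the objects returned by scrape(); only .name is used here.
servers = [
    SimpleNamespace(name="stratum1-au.tld"),
    SimpleNamespace(name="stratum1-no.tld"),
]

by_name = servers_by_name(servers)
print(by_name["stratum1-no.tld"].name)
```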

Note that if you are using a Stratum1 server with S3 as its backend, you need to set repos explicitly. This is because the S3 backend does not have a cvmfs/info/v1/repositories.json file. Also, the GeoAPI status will be NOT_FOUND for these servers.

Data structure

Server

A server object, representing a specific server that has been scraped.

servers = scrape(...)
server = servers[0]

Name

Type: Attribute

server.name

Returns

The name of the server, usually its fully qualified domain name.

GeoApi status

Type: Attribute

server.geoapi_status

Returns

A GeoAPIstatus enum object. Defined in constants.py. The possible values are:

  • OK (0: OK)
  • LOCATION_ERROR (1: GeoApi gives wrong location)
  • NO_RESPONSE (2: No response)
  • NOT_FOUND (9: The server has no repository available so the GeoApi cannot be tested)
  • NOT_YET_TESTED (99: The server has not yet been tested)
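
To show how these values might be used when monitoring, here is a sketch built on a stand-in enum. `GeoStatusSketch` only mirrors the member names and numbers listed above; the real enum lives in constants.py and may differ in detail. The rule for what "needs attention" is likewise an assumption, not the package's behaviour.

```python
from enum import Enum


class GeoStatusSketch(Enum):
    """Stand-in enum using the names and numeric values listed above."""

    OK = 0
    LOCATION_ERROR = 1
    NO_RESPONSE = 2
    NOT_FOUND = 9
    NOT_YET_TESTED = 99


def geoapi_needs_attention(status: GeoStatusSketch) -> bool:
    """True only when the GeoAPI was tested and gave a bad result."""
    return status in (GeoStatusSketch.LOCATION_ERROR, GeoStatusSketch.NO_RESPONSE)


for status in GeoStatusSketch:
    print(status.name, status.value, geoapi_needs_attention(status))
```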

Repositories

Type: Attribute

server.repositories

Returns

A list of repository objects, sorted by name. Empty if no repositories were scraped on the server.

Ignored repositories

Type: Attribute

server.ignored_repositories

Returns

A list of repository names that the scraper ignores.

Forced repositories

Type: Attribute

server.forced_repositories

Returns

A list of repository names that the server is forced to scrape. If a repo name exists in both ignored_repositories and forced_repositories, it will be scraped.
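
The precedence between the two lists can be sketched as a small selection function. This is an illustration of the rule as stated here, not the package's implementation: `repos_to_scrape` and the repository names are hypothetical, and the sketch assumes that every discovered repository that is not ignored gets scraped.

```python
def repos_to_scrape(discovered, ignored, forced):
    """Return the sorted repository names selected under the stated rule."""
    ignored_set = set(ignored)
    # Discovered repos are scraped unless ignored; forced repos always are.
    selected = {name for name in discovered if name not in ignored_set}
    return sorted(selected | set(forced))


print(repos_to_scrape(
    discovered=["a.example.org", "b.example.org"],
    ignored=["b.example.org", "c.example.org"],
    forced=["c.example.org"],
))
# prints ['a.example.org', 'c.example.org']
```

Here `c.example.org` appears in both lists and is still selected, while `b.example.org` is only ignored and is dropped.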

Repository

A repository object, representing a single repository on a scraped server.

servers = scrape(...)
repo_one = servers[0].repositories[0]

Name

Type: Attribute

repo_one.name

Returns

The fully qualified name of the repository.

Server

Type: Attribute

repo_one.server

Returns

The server object to which the repository belongs.

Path

Type: Attribute

repo_one.path

Returns

The path for the repository on the server. May differ from the name. To get a complete URL, one can do:

url = "http://" + repo_one.server.name + repo_one.path

Status attributes

These attributes are populated from .cvmfs_status.json:

| Attribute | Value |
| --- | --- |
| last_gc | Timestamp of last garbage collection |
| last_snapshot | Timestamp of the last snapshot |
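
As a rough sketch of how the two status values could be read from the JSON file: the code below assumes the JSON keys carry the same names as the attributes, and it keeps the raw values rather than converting them to timestamps. `read_status` and the sample content are hypothetical.

```python
import json


def read_status(text: str) -> dict:
    """Extract last_gc and last_snapshot; missing keys become None."""
    data = json.loads(text)
    return {key: data.get(key) for key in ("last_gc", "last_snapshot")}


# Hypothetical file content in which last_gc is absent.
sample = '{"last_snapshot": "Mon Jan  1 12:00:00 UTC 2024"}'
print(read_status(sample))
```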

Information from .cvmfspublished is also provided. For explanations of these keys, see the official CVMFS documentation. The Field column in the table gives the corresponding field key in .cvmfspublished.

| Attribute | Field |
| --- | --- |
| alternative_name | |
| full_name | N |
| is_garbage_collectable | G |
| metadata_cryptographic_hash | M |
| micro_catalogues | L |
| reflog_checksum_cryptographic_hash | Y |
| revision_timestamp | T |
| root_catalogue_ttl | D |
| root_cryptographic_hash | C |
| root_size | B |
| root_path_hash | R |
| signature | The end signature blob |
| signing_certificate_cryptographic_hash | X |
| tag_history_cryptographic_hash | H |
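
The mapping from field keys to attributes can be sketched as a small parser. The code below assumes that each line of .cvmfspublished starts with a one-letter field key followed directly by its value, and that a line containing only `--` separates the fields from the signature; check the official CVMFS documentation before relying on this. `parse_published`, the subset of keys shown, and the sample content are illustrative only, and all values are kept as strings.

```python
# Field key -> attribute name, a subset of the table above.
FIELD_TO_ATTRIBUTE = {
    "N": "full_name",
    "G": "is_garbage_collectable",
    "T": "revision_timestamp",
    "D": "root_catalogue_ttl",
    "C": "root_cryptographic_hash",
    "B": "root_size",
}


def parse_published(text: str) -> dict:
    """Map each known one-letter field line to its attribute name."""
    result = {}
    for line in text.splitlines():
        if line == "--":  # assumed separator before the signature blob
            break
        if line and line[0] in FIELD_TO_ATTRIBUTE:
            result[FIELD_TO_ATTRIBUTE[line[0]]] = line[1:]
    return result


# Hypothetical, shortened manifest content.
sample = "Nrepo.example.org\nB1024\nT1700000000\nD240\nGyes\n--\nsignature"
print(parse_published(sample))
```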
