Scrape metadata from CVMFS Stratum servers.

CVMFS server scraper and prometheus exporter

This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. It grabs:

- cvmfs/info/v1/repositories.json

Then, for every repo it finds (unless it has been told to ignore it), it grabs:

- cvmfs/<repo>/.cvmfs_status.json
- cvmfs/<repo>/.cvmfspublished
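The paths listed above can be expanded into full URLs as in the following minimal sketch. The helper name `metadata_urls` and the `http://` scheme are assumptions made for illustration; they are not part of the cvmfsscraper API.

```python
def metadata_urls(server, repos, ignore_repos=()):
    """Build the metadata URLs for one server (hypothetical helper).

    `repos` stands in for the repository names that would normally be
    read from repositories.json.
    """
    base = f"http://{server}/cvmfs"
    # The repository listing is fetched once per server.
    urls = [f"{base}/info/v1/repositories.json"]
    for repo in repos:
        if repo in ignore_repos:
            continue  # repos the scraper was told to ignore are skipped
        # Two metadata files are fetched per repository.
        urls.append(f"{base}/{repo}/.cvmfs_status.json")
        urls.append(f"{base}/{repo}/.cvmfspublished")
    return urls


if __name__ == "__main__":
    for url in metadata_urls(
        "stratum1.example.org",
        ["software.example.org", "ci.example.org"],
        ignore_repos=["ci.example.org"],
    ):
        print(url)
```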

Usage

#!/usr/bin/env python3

from cvmfsscraper.main import scrape, scrape_server

# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")

servers = scrape(
    servers = [
        "aws-eu-west1.stratum1.cvmfs.eessi-infra.org",
        "bgo-no.stratum1.cvmfs.eessi-infra.org",
    ],
    ignore_repos = [
        "ci.eessi-hpc.org",
    ],
)

print(servers[0])

# f-strings avoid a TypeError when an attribute is not a string.
for repo in servers[0].repositories:
    print(f"Repo: {repo.name}")
    print(f"Root size: {repo.root_size}")
    print(f"Revision: {repo.revision}")
    print(f"Revision timestamp: {repo.revision_timestamp}")
    print(f"Last snapshot: {repo.last_snapshot}")

Data structure

Server

A server object, representing a specific server that has been scraped.

servers = scrape(...)
server_one = servers[0]

Name

Type: Attribute

server.name

Returns

The name of the server, usually its fully qualified domain name.

GeoApi status

Type: Attribute

server.geoapi_status

Returns

One of the integers 0, 1, 2, or 9, with the following meaning:

  • 0 : OK
  • 1 : GeoApi gives wrong location
  • 2 : No response
  • 9 : The server has no repository available so the GeoApi cannot be tested
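When reporting on many servers it can help to translate these codes into text. The sketch below does that with a plain dictionary; the names `GEOAPI_STATUS` and `describe_geoapi_status` are hypothetical and not provided by cvmfsscraper.

```python
# Meanings copied from the list above; the names are illustrative only.
GEOAPI_STATUS = {
    0: "OK",
    1: "GeoApi gives wrong location",
    2: "No response",
    9: "No repository available, GeoApi cannot be tested",
}


def describe_geoapi_status(code):
    """Return a readable description for a geoapi_status value."""
    return GEOAPI_STATUS.get(code, f"Unknown status {code}")


if __name__ == "__main__":
    for code in (0, 1, 2, 9):
        print(code, describe_geoapi_status(code))
```

With a scraped server this would be called as `describe_geoapi_status(server.geoapi_status)`.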

Repositories

Type: Attribute

server.repositories

Returns

A list of repository objects, empty if no repositories were scraped on the server.

Ignored repositories

Type: Attribute

server.ignored_repositories

Returns

A list of repository names that the scraper ignores.

Forced repositories

Type: Attribute

server.forced_repositories

Returns

A list of repository names that the server is forced to scrape. If a repo name exists in both ignored_repositories and forced_repositories, it will be scraped.
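The precedence rule described above (forced wins over ignored) can be expressed as a small predicate. This is only a sketch of the rule as stated; `should_scrape` is a hypothetical name, and the scraper's internal logic may be organised differently.

```python
def should_scrape(repo_name, ignored, forced):
    """Decide whether a repository is scraped (illustrative only)."""
    # A forced repository is scraped even if it is also ignored.
    if repo_name in forced:
        return True
    # Otherwise it is scraped unless it was explicitly ignored.
    return repo_name not in ignored


if __name__ == "__main__":
    ignored = ["ci.example.org", "both.example.org"]
    forced = ["both.example.org"]
    for name in ("ci.example.org", "both.example.org", "other.example.org"):
        print(name, should_scrape(name, ignored, forced))
```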

Repository

A repository object, representing a single repository on a scraped server.

servers = scrape(...)
repo_one = servers[0].repositories[0]

Name

Type: Attribute

repo_one.name

Returns

The fully qualified name of the repository.

Server

Type: Attribute

repo_one.server

Returns

The server object to which the repository belongs.

Path

Type: Attribute

repo_one.path

Returns

The path for the repository on the server. May differ from the name. To get a complete URL, one can do:

url = "http://" + repo_one.server.name + repo_one.path

Status attributes:

These attributes are populated from .cvmfs_status.json:

Attribute      Value
last_gc        Timestamp of the last garbage collection
last_snapshot  Timestamp of the last snapshot

Information from .cvmfspublished is also provided. For an explanation of these keys, please see the official CVMFS documentation. The Field column in the table gives the corresponding field key in .cvmfspublished.

Attribute                               Field
alternative_name
full_name                               N
is_garbage_collectable                  G
metadata_cryptographic_hash             M
micro_catalogues                        L
reflog_checksum_cryptographic_hash      Y
revision_timestamp                      T
root_catalogue_ttl                      D
root_cryptographic_hash                 C
root_size                               B
root_path_hash                          R
signature                               The end signature blob
signing_certificate_cryptographic_hash  X
tag_history_cryptographic_hash          H
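To illustrate how field keys relate to attribute names, here is a minimal sketch that parses the text part of a .cvmfspublished file. It assumes the usual manifest layout, where each line starts with a one-letter key followed by its value and a line containing only `--` separates the fields from the signature. The function name, the sample values, and the choice to keep every value as a string are assumptions; only a subset of the keys from the table is mapped, and the real scraper may convert types differently.

```python
# Subset of the key-to-attribute mapping from the table above.
FIELD_TO_ATTRIBUTE = {
    "N": "full_name",
    "G": "is_garbage_collectable",
    "T": "revision_timestamp",
    "D": "root_catalogue_ttl",
    "C": "root_cryptographic_hash",
    "B": "root_size",
}


def parse_published(text):
    """Map one-letter manifest keys to attribute names (sketch)."""
    fields = {}
    for line in text.splitlines():
        if line == "--":
            break  # everything after this line belongs to the signature
        if not line:
            continue
        key, value = line[0], line[1:]
        name = FIELD_TO_ATTRIBUTE.get(key)
        if name is not None:
            fields[name] = value  # values are kept as raw strings here
    return fields


# Made-up sample content, not taken from a real repository.
SAMPLE = "\n".join([
    "C0123456789abcdef",
    "B1024",
    "D240",
    "Gno",
    "Nsoftware.example.org",
    "T1636466409",
    "--",
    "signature data would follow here",
])

if __name__ == "__main__":
    print(parse_published(SAMPLE))
```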
