Scrape metadata from CVMFS Stratum servers.
Project description
CVMFS server scraper and prometheus exporter
This tool scrapes the public metadata sources from set of stratum0 and stratum1 servers. It grabs:
- cvmfs/info/v1/repositories.json
And then for every repo it finds (that it's not told to ignore), it grabs:
- cvmfs/<repo>/.cvmfs_status.json
- cvmfs/<repo>/.cvmfspublished
Installation
pip install cvmfs-server-scraper
Usage
#!/usr/bin/env python3
import logging
from cvmfsscraper import scrape, scrape_server, set_log_level
# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")
set_log_level(logging.DEBUG)
servers = scrape(
stratum0_servers=[
"stratum0.tld",
],
stratum1_servers=[
"stratum1-no.tld",
"stratum1-au.tld",
],
repos=[],
ignore_repos=[],
)
# Note that the order of servers is undefined.
print(servers[0])
for repo in servers[0].repositories:
print("Repo: " + repo.name )
print("Root size: " + repo.root_size)
print("Revision: " + repo.revision)
print("Revision timestamp: " + repo.revision_timestamp)
print("Last snapshot: " + str(repo.last_snapshot))
Note that if you are using a Stratum1 server with S3 as its backend, you need to set repos explicitly.
This is because the S3 backend does not have a cvmfs/info/v1/repositories.json
file. Also, the GeoAPI
status will be NOT_FOUND
for these servers.
# Data structure
## Server
A server object, representing a specific server that has been scraped.
````python
servers = scrape(...)
server_one = servers[0]
Name
Type: Attribute
server.name
Returns
The name of the server, usually its fully qualified domain name.
GeoApi status
Type: Attribute
server.geoapi_status
Returns
A GeoAPIstatus enum object. Defined in constants.py
. The possible values are:
- OK (0: OK)
- LOCATION_ERROR (1: GeoApi gives wrong location)
- NO_RESPONSE (2: No response)
- NOT_FOUND (9: The server has no repository available so the GeoApi cannot be tested)
- NOT_YET_TESTED (99: The server has not yet been tested)
Repositories
Type: attribute
server.repositories
Returns
A list of repository objects, sorted by name. Empty if no repositores are scraped on the server.
Ignored repositories
Type: Attribute
server.ignored_repositories
Returns
List of repositories names that are to be ignored by the scraper.
Forced repositories
Type: Attribute
server.forced_repositories
Returns
A list of repository names that the server is forced to scrape. If a repo name exists in both ignored_repositories and forced_repositories, it will be scraped.
Repository
A repository object, representing a single repository on a scraped server.
servers = scrape(...)
repo_one = servers[0].repositories[0]
Name
Type: Attribute
repo_one.name
Returns
The fully qualified name of the repository.
Server
Type: Attribute
repo_one.server
Returns
The server object to which the repository belongs.
Path
Type: Attribute
repo_one.path
Returns
The path for the repository on the server. May differ from the name. To get a complete URL, one can do:
url = "http://" + repo_one.server.name + repo_one.path
Status attributes
These attributes are populated from cvmfs_status.json
:
Attribute | Value |
---|---|
last_gc | Timestamp of last garbage collection |
last_snapshot | Timestamp of the last snapshot |
Information from .cvmfspublished
is also provided. For explanations for these keys, please see CVMFS' official documentation. The field value in the table is the field key from .cvmfspublished
.
Attribute | Field |
---|---|
alternative_name | A |
full_name | N |
is_garbage_collectable | G |
metadata_cryptographic_hash | M |
micro_cataogues | L |
reflog_checksum_cryptographic_hash | Y |
revision_timestamp | T |
root_catalogue_ttl | D |
root_cryptographic_hash | C |
root_size | B |
root_path_hash | R |
signature | The end signature blob |
signing_certificate_cryptographic_hash | X |
tag_history_cryptographic_hash | H |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for cvmfs_server_scraper-0.0.4.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | e4e8dcd563cf4f2f73f3c392d79aa38a3f40b9e7536810f892baf9cb0fff2efd |
|
MD5 | 89e07747ca97a9cea45eb10bf2f73f9d |
|
BLAKE2b-256 | 675fa8eef0a65e851e30ad7c81be32b255f7edef07a962c66a022eaf5e7a142c |
Hashes for cvmfs_server_scraper-0.0.4-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | db4ebd9a4689545bc1e1aed0fa141f0d87299b5d5045ef3a9387bbcc670209a9 |
|
MD5 | 908fd857379fb22b85270bc1cc1736ae |
|
BLAKE2b-256 | 13ec62c5ba9a626cd002168ce709b2b1bab37ee6920b3a77c6dda3ff11114f31 |