Scrape metadata from CVMFS Stratum servers.
CVMFS server scraper and Prometheus exporter
This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. From each server it fetches:
- cvmfs/info/v1/repositories.json
Then, for every repository it finds (unless it has been told to ignore it), it fetches:
- cvmfs/<repo>/.cvmfs_status.json
- cvmfs/<repo>/.cvmfspublished
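
As a rough illustration of the endpoints listed above, the following sketch builds the URLs that would be requested for one server. The helper `metadata_urls`, the `http://` scheme, and the example host and repository names are assumptions made for illustration; they are not part of the package's API.

```python
from typing import List


def metadata_urls(server: str, repos: List[str]) -> List[str]:
    """Return the metadata URLs for one server (hypothetical helper)."""
    base = f"http://{server}/cvmfs"  # assumption: plain HTTP on the default port
    # The server-wide repository listing comes first.
    urls = [f"{base}/info/v1/repositories.json"]
    # Then the two per-repository metadata files.
    for repo in repos:
        urls.append(f"{base}/{repo}/.cvmfs_status.json")
        urls.append(f"{base}/{repo}/.cvmfspublished")
    return urls


if __name__ == "__main__":
    for url in metadata_urls("stratum1-no.tld", ["software.example.org"]):
        print(url)
```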
Installation
pip install cvmfs-server-scraper
Usage
```python
#!/usr/bin/env python3

import logging

from cvmfsscraper import scrape, scrape_server, set_log_level

# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")

set_log_level(logging.DEBUG)

servers = scrape(
    stratum0_servers=[
        "stratum0.tld",
    ],
    stratum1_servers=[
        "stratum1-no.tld",
        "stratum1-au.tld",
    ],
    repos=[],
    ignore_repos=[],
)

# Note that the order of servers is undefined.
print(servers[0])

for repo in servers[0].repositories:
    # f-strings format any attribute type, so non-string values do not raise TypeError.
    print(f"Repo: {repo.name}")
    print(f"Root size: {repo.root_size}")
    print(f"Revision: {repo.revision}")
    print(f"Revision timestamp: {repo.revision_timestamp}")
    print(f"Last snapshot: {repo.last_snapshot}")
```
Note that if you are using a Stratum1 server with S3 as its backend, you need to set `repos` explicitly. This is because the S3 backend does not have a `cvmfs/info/v1/repositories.json` file. Also, the GeoAPI status will be `NOT_FOUND` for these servers.
# Data structure
## Server
A server object, representing a specific server that has been scraped.
```python
servers = scrape(...)
server = servers[0]
```
Name
Type: Attribute
server.name
Returns
The name of the server, usually its fully qualified domain name.
GeoAPI status
Type: Attribute
server.geoapi_status
Returns
A `GeoAPIstatus` enum object, defined in `constants.py`. The possible values are:
- OK (0: OK)
- LOCATION_ERROR (1: GeoAPI gives the wrong location)
- NO_RESPONSE (2: No response)
- NOT_FOUND (9: The server has no repository available, so the GeoAPI cannot be tested)
- NOT_YET_TESTED (99: The server has not yet been tested)
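
To make the status values above easier to work with, here is a minimal sketch of a local stand-in enum. `GeoAPIStatusSketch` and `needs_attention` are hypothetical names, and treating the values as integers is an assumption based on the numbers listed above; the real class lives in the package's `constants.py` and may differ.

```python
from enum import IntEnum


class GeoAPIStatusSketch(IntEnum):
    """Local stand-in mirroring the documented values (not the library's class)."""

    OK = 0
    LOCATION_ERROR = 1
    NO_RESPONSE = 2
    NOT_FOUND = 9
    NOT_YET_TESTED = 99


def needs_attention(status: GeoAPIStatusSketch) -> bool:
    """Flag only the states that indicate a GeoAPI that was tested and failed.

    NOT_FOUND and NOT_YET_TESTED are treated as inconclusive here, which is
    a design choice of this sketch, not documented behaviour.
    """
    return status in (
        GeoAPIStatusSketch.LOCATION_ERROR,
        GeoAPIStatusSketch.NO_RESPONSE,
    )


if __name__ == "__main__":
    for status in GeoAPIStatusSketch:
        print(status.name, int(status), needs_attention(status))
```

With this split, an S3-backed server that reports `NOT_FOUND` would not be flagged as failing.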
Repositories
Type: Attribute
server.repositories
Returns
A list of repository objects, sorted by name. Empty if no repositories were scraped on the server.
Ignored repositories
Type: Attribute
server.ignored_repositories
Returns
A list of repository names that the scraper ignores on this server.
Forced repositories
Type: Attribute
server.forced_repositories
Returns
A list of repository names that the server is forced to scrape. If a repo name exists in both ignored_repositories and forced_repositories, it will be scraped.
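
The precedence rule described above can be sketched as a small set operation. The function `effective_repos` and its parameters are hypothetical and only illustrate the documented behaviour (forced wins over ignored, results sorted by name); this is not the package's implementation.

```python
from typing import Iterable, List


def effective_repos(
    discovered: Iterable[str],
    ignored: Iterable[str],
    forced: Iterable[str],
) -> List[str]:
    """Return the repository names that would be scraped (illustrative only)."""
    # Start from what the server advertises, minus anything ignored.
    kept = set(discovered) - set(ignored)
    # Forced names are added back last, so they win over the ignore list.
    kept |= set(forced)
    return sorted(kept)


if __name__ == "__main__":
    print(
        effective_repos(
            discovered=["a.example.org", "b.example.org"],
            ignored=["b.example.org", "c.example.org"],
            forced=["c.example.org"],
        )
    )
```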
## Repository
A repository object, representing a single repository on a scraped server.
```python
servers = scrape(...)
repo_one = servers[0].repositories[0]
```
Name
Type: Attribute
repo_one.name
Returns
The fully qualified name of the repository.
Server
Type: Attribute
repo_one.server
Returns
The server object to which the repository belongs.
Path
Type: Attribute
repo_one.path
Returns
The path for the repository on the server. May differ from the name. To get a complete URL, one can do:
```python
url = "http://" + repo_one.server.name + repo_one.path
```
Status attributes
These attributes are populated from `.cvmfs_status.json`:
Attribute | Value |
---|---|
last_gc | Timestamp of last garbage collection |
last_snapshot | Timestamp of the last snapshot |
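
For readers who want to inspect the status file directly, the sketch below parses a `.cvmfs_status.json` payload into the two attributes in the table above. The function `parse_status`, the sample payload, and in particular the timestamp format are assumptions; adjust `TIMESTAMP_FORMAT` if your servers report timestamps differently.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional

# Assumption: timestamps look like "Tue Jun 11 12:00:03 UTC 2024".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Z %Y"


def parse_status(raw: str) -> Dict[str, Optional[datetime]]:
    """Extract last_gc and last_snapshot from a status document (sketch)."""
    data = json.loads(raw)
    result: Dict[str, Optional[datetime]] = {}
    for key in ("last_gc", "last_snapshot"):
        value = data.get(key)
        if value is None:
            # A key may be absent, for example if no garbage collection has run.
            result[key] = None
        else:
            # strptime returns a naive datetime here, so attach UTC explicitly.
            parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
            result[key] = parsed.replace(tzinfo=timezone.utc)
    return result


if __name__ == "__main__":
    sample = '{"last_snapshot": "Tue Jun 11 12:00:03 UTC 2024"}'  # made-up payload
    print(parse_status(sample))
```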
Information from `.cvmfspublished` is also provided. For explanations of these keys, please see the official CVMFS documentation. The Field column in the table gives the field key from `.cvmfspublished`.
Attribute | Field |
---|---|
alternative_name | A |
full_name | N |
is_garbage_collectable | G |
metadata_cryptographic_hash | M |
micro_catalogues | L |
reflog_checksum_cryptographic_hash | Y |
revision_timestamp | T |
root_catalogue_ttl | D |
root_cryptographic_hash | C |
root_size | B |
root_path_hash | R |
signature | The end signature blob |
signing_certificate_cryptographic_hash | X |
tag_history_cryptographic_hash | H |
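
The mapping in the table above can be illustrated with a small parser. This is a sketch under stated assumptions: each manifest line is taken to be a one-letter field key followed directly by its value, and a line containing only `--` is taken to separate the fields from the signature. `parse_published`, `FIELD_TO_ATTRIBUTE`, and the sample manifest are hypothetical, and only a subset of the fields is mapped.

```python
from typing import Dict

# Subset of the table above: field key -> attribute name.
FIELD_TO_ATTRIBUTE = {
    "A": "alternative_name",
    "B": "root_size",
    "C": "root_cryptographic_hash",
    "D": "root_catalogue_ttl",
    "G": "is_garbage_collectable",
    "N": "full_name",
    "T": "revision_timestamp",
}


def parse_published(blob: bytes) -> Dict[str, str]:
    """Map manifest field keys to attribute names (illustrative only)."""
    # Assumption: everything after the "--" line belongs to the signature
    # and is skipped here.
    header, _, _signature = blob.partition(b"\n--\n")
    fields: Dict[str, str] = {}
    for line in header.decode("utf-8").splitlines():
        if not line:
            continue
        attribute = FIELD_TO_ATTRIBUTE.get(line[0])
        if attribute is not None:
            fields[attribute] = line[1:]  # values are kept as raw strings
    return fields


if __name__ == "__main__":
    # Made-up manifest content, for demonstration only.
    sample = b"Cabc123\nB1024\nNsoftware.example.org\nT1718107203\nGyes\nD240\n--\nsig"
    print(parse_published(sample))
```

Values are left as strings on purpose; converting `root_size` or `revision_timestamp` to numbers is a separate step that depends on how you intend to use them.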