crux-cache Python Package
Python package for accessing CrUX (Chrome User Experience Report) cached data from the crux-cache repository.
Installation
pip install crux-cache
Quick Start
from crux_cache import CruxCache
# Initialize the client
cache = CruxCache()
# List available datasets
datasets = cache.list_datasets()
for ds in datasets:
    print(f"{ds['id']}: {ds['latest_origins']} origins")
# Iterate over the latest global dataset
for origin, rank in cache.get_dataset('global'):
    print(f"{origin}: {rank}")
Usage Examples
Filter by Rank (Top Domains)
Pass max_rank to keep only origins whose rank is ≤ max_rank. Ranks are popularity buckets rather than exact positions, so max_rank must be one of the bucket values: 1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000, 10000000, 50000000.
from crux_cache import CruxCache
cache = CruxCache()
# Get top 1k domains from the latest US dataset
for origin, rank in cache.get_dataset('us', max_rank=1000):
    print(f"{origin}: {rank}")
# Get top 5k domains (includes top 1k)
for origin, rank in cache.get_dataset('global', max_rank=5000):
    print(f"{origin}: {rank}")
# Get top 1 million domains from global dataset
for origin, rank in cache.get_dataset('global', max_rank=1000000):
    print(f"{origin}: {rank}")
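Because max_rank is limited to the bucket values listed above, it can be useful to round an arbitrary cutoff up to the nearest listed bucket before calling get_dataset. The following is a minimal sketch: `VALID_MAX_RANKS` and `nearest_valid_rank` are hypothetical helpers, not part of crux-cache, and how the library itself handles an unlisted value is not documented here.

```python
import bisect

# Bucket values this README lists as valid for max_rank (kept in ascending order).
VALID_MAX_RANKS = [1000, 5000, 10000, 50000, 100000, 500000,
                   1000000, 5000000, 10000000, 50000000]


def nearest_valid_rank(wanted: int) -> int:
    """Return the smallest listed bucket that is >= wanted (hypothetical helper)."""
    if wanted < 1:
        raise ValueError("wanted must be a positive integer")
    idx = bisect.bisect_left(VALID_MAX_RANKS, wanted)
    if idx == len(VALID_MAX_RANKS):
        raise ValueError(
            f"{wanted} exceeds the largest bucket {VALID_MAX_RANKS[-1]}"
        )
    return VALID_MAX_RANKS[idx]
```

The result could then be passed on, for example `cache.get_dataset('global', max_rank=nearest_valid_rank(2500))`, which under these assumptions would request the 5000 bucket.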
Access Specific Months
from crux_cache import CruxCache
cache = CruxCache()
# List available months for a dataset
months = cache.list_months('de')
print(f"Available months: {', '.join(months)}")
print(f"Latest month: {months[-1]}")
# Get a specific month
for origin, rank in cache.get_dataset('global', month='202510'):
    print(f"{origin}: {rank}")
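Month identifiers are plain YYYYMM strings, so selecting a range of months can be done without the library. The sketch below uses a hard-coded list as a stand-in for the result of list_months; `parse_month` and `months_in_range` are hypothetical helper names, and the month values are illustrative only. It sorts explicitly rather than relying on the order in which months are returned.

```python
from datetime import date


def parse_month(month: str) -> date:
    """Convert a 'YYYYMM' string into a date for the first day of that month."""
    if len(month) != 6 or not month.isdigit():
        raise ValueError(f"expected YYYYMM, got {month!r}")
    return date(int(month[:4]), int(month[4:]), 1)


def months_in_range(months, start: str, end: str):
    """Return the months between start and end (inclusive), oldest first."""
    lo, hi = parse_month(start), parse_month(end)
    return sorted(m for m in months if lo <= parse_month(m) <= hi)


# Stand-in for the list returned by cache.list_months(...); values are illustrative.
available = ["202510", "202508", "202509"]
selected = months_in_range(available, "202509", "202510")
```

Each entry of `selected` could then be passed as the `month` argument of get_dataset.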
Cache Management
from crux_cache import CruxCache
# Use default cache (.crux in current directory)
cache = CruxCache()
# Use a custom cache directory
cache = CruxCache(cache_dir='/tmp/crux')
# Set custom metadata TTL (in seconds)
cache = CruxCache(metadata_ttl=3600) # 1 hour
# Clear the cache
cache.clear_cache()
Features
- Automatic caching with configurable TTL
- Access global and country-specific datasets (us, de, jp)
- Filter by rank value to get top domains (e.g., top 1k, 5k, 1M)
- Access current or historical data by month
- Simple API with sensible defaults
API Reference
CruxCache
Main client for accessing CrUX cached data.
__init__(cache_dir=".crux", metadata_ttl=86400)
Initialize the client.
- cache_dir: Cache directory (default: .crux)
- metadata_ttl: Metadata cache TTL in seconds (default: 86400 = 1 day)
list_datasets() -> List[Dict]
List all available datasets with their metadata (id, name, total_months, earliest_month, latest_month, latest_origins, total_size).
list_months(dataset_type: str) -> List[str]
List available months for a dataset in YYYYMM format.
get_dataset(dataset_type: str, month: Optional[str] = None, max_rank: Optional[int] = None) -> CruxDataset
Get an iterator for a specific dataset and month. If max_rank is set, only origins with rank ≤ max_rank are returned.
Parameters:
- dataset_type: 'global', 'us', 'de', or 'jp'
- month: YYYYMM format (e.g., '202510'). Defaults to the latest month
- max_rank: Filter by rank (1000, 5000, 10000, 50000, 100000, 500000, 1000000, etc.)
Returns: Iterator yielding (origin, rank) tuples
clear_cache()
Clear all cached files. Metadata and CSV files will be re-downloaded on next access.
CruxDataset
Iterator that yields (origin, rank) tuples when iterating.
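Since the dataset is consumed as an iterator, standard iterator tools apply to it. The sketch below takes only the first few entries with itertools.islice. It uses a stand-in generator, `fake_dataset`, in place of a real CruxDataset so that it runs without network access; whether CruxDataset itself fetches data lazily is not stated here.

```python
from itertools import islice


def fake_dataset():
    """Stand-in generator mimicking the (origin, rank) tuples a CruxDataset yields."""
    yield ("https://example.com", 1000)
    yield ("https://example.org", 1000)
    yield ("https://example.net", 5000)


# Take only the first two entries; the third is never pulled from the iterator.
first_two = list(islice(fake_dataset(), 2))
```

With a real client, `fake_dataset()` would be replaced by a call such as `cache.get_dataset('global')`.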
Data Format
Each iteration yields a tuple of:
- origin (str): Origin URL, i.e. scheme and host (e.g., https://www.google.com)
- rank (int): Popularity bucket (1000, 10000, 100000, 1000000, etc.)
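As an illustration of working with this tuple shape, the following sketch extracts bare hostnames from origins and counts origins per rank bucket using only the standard library. The `rows` list is made-up sample data in the documented (origin, rank) form, not real CrUX output.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative stand-in rows in the documented (origin, rank) shape.
rows = [
    ("https://www.google.com", 1000),
    ("https://example.org", 1000),
    ("http://example.net:8080", 10000),
]

# Hostname without scheme or port, e.g. 'www.google.com'.
hostnames = [urlsplit(origin).hostname for origin, _ in rows]

# Number of origins that fall into each rank bucket.
per_bucket = Counter(rank for _, rank in rows)
```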
Caching Behavior
- Metadata files (datasets.json, manifest.json): Cached with TTL (default: 1 day)
- CSV chunks: Cached indefinitely (reused across sessions)
- Cache location: .crux/ in current directory (configurable)
- Clear cache: Use cache.clear_cache() to remove all cached files
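To make the TTL idea concrete, here is a minimal sketch of one common way to decide whether a cached file is still fresh: compare its modification time against the TTL. This is an assumption for illustration, not a description of how crux-cache implements its check; `is_fresh` is a hypothetical helper and the file is a temporary stand-in.

```python
import os
import tempfile
import time
from typing import Optional


def is_fresh(path: str, ttl_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if path exists and was modified less than ttl_seconds before now."""
    if not os.path.exists(path):
        return False
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) < ttl_seconds


# Demonstration with a temporary file standing in for a cached metadata file.
with tempfile.TemporaryDirectory() as tmp:
    meta = os.path.join(tmp, "datasets.json")
    with open(meta, "w") as fh:
        fh.write("{}")
    fresh_now = is_fresh(meta, ttl_seconds=86400)
    # Pretend two days have passed: the one-day TTL has expired.
    stale_later = is_fresh(meta, ttl_seconds=86400, now=time.time() + 2 * 86400)
    # A file that was never downloaded is never fresh.
    missing = is_fresh(os.path.join(tmp, "manifest.json"), ttl_seconds=86400)
```

Under this scheme a stale or missing metadata file would be downloaded again, while files without a TTL (such as the CSV chunks described above) would skip the check entirely.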
Requirements
- Python 3.7+
- requests >= 2.25.0
License
MIT License - See LICENSE
CrUX data provided by Google under CrUX Dataset Terms
Links
- Main Repository: https://github.com/lonetis/crux-cache
- PyPI Package: https://pypi.org/project/crux-cache/
- Web Interface: https://lonetis.github.io/crux-cache
File details
Details for the file crux_cache-1.0.0.tar.gz.
File metadata
- Download URL: crux_cache-1.0.0.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fda51e4da49126c49ecfed4e4dfd0d932375fbf4f5147315352f726cd7deb277 |
| MD5 | f443b591e3ddc766cfda78d565616a1f |
| BLAKE2b-256 | 6ad62677f2351b0dcfac55d166e60bfaed7d2bb316dc40e18bf64cd1525b966d |
Provenance
The following attestation bundles were made for crux_cache-1.0.0.tar.gz:
Publisher: publish-python.yml on lonetis/crux-cache
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crux_cache-1.0.0.tar.gz
- Subject digest: fda51e4da49126c49ecfed4e4dfd0d932375fbf4f5147315352f726cd7deb277
- Sigstore transparency entry: 708815092
- Sigstore integration time:
- Permalink: lonetis/crux-cache@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lonetis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yml@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Trigger Event: workflow_dispatch
File details
Details for the file crux_cache-1.0.0-py3-none-any.whl.
File metadata
- Download URL: crux_cache-1.0.0-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5bbc02986715e91e2d060b0c0ec7417b4f89119fe01a9cb85a11a52add30c0d |
| MD5 | 09a53df53ad9fbdfa4972a56318fe301 |
| BLAKE2b-256 | 3a34f9c4e1b43fd9d182f0ac72c7b0bb4fda7a394b8c23de0d544bc925b2dd00 |
Provenance
The following attestation bundles were made for crux_cache-1.0.0-py3-none-any.whl:
Publisher: publish-python.yml on lonetis/crux-cache
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crux_cache-1.0.0-py3-none-any.whl
- Subject digest: c5bbc02986715e91e2d060b0c0ec7417b4f89119fe01a9cb85a11a52add30c0d
- Sigstore transparency entry: 708815106
- Sigstore integration time:
- Permalink: lonetis/crux-cache@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lonetis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yml@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Trigger Event: workflow_dispatch