Project description

crux-cache Python Package

Python package for accessing CrUX (Chrome User Experience Report) cached data from the crux-cache repository.

Installation

pip install crux-cache

Quick Start

from crux_cache import CruxCache

# Initialize the client
cache = CruxCache()

# List available datasets
datasets = cache.list_datasets()
for ds in datasets:
    print(f"{ds['id']}: {ds['latest_origins']} origins")

# Iterate over the latest global dataset
for origin, rank in cache.get_dataset('global'):
    print(f"{origin}: {rank}")

Usage Examples

Filter by Rank (Top Domains)

Use max_rank to filter domains with rank ≤ max_rank. Valid values: 1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000, 10000000, 50000000.

from crux_cache import CruxCache

cache = CruxCache()

# Get top 1k domains from the latest US dataset
for origin, rank in cache.get_dataset('us', max_rank=1000):
    print(f"{origin}: {rank}")

# Get top 5k domains (includes top 1k)
for origin, rank in cache.get_dataset('global', max_rank=5000):
    print(f"{origin}: {rank}")

# Get top 1 million domains from global dataset
for origin, rank in cache.get_dataset('global', max_rank=1000000):
    print(f"{origin}: {rank}")
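Only the listed bucket values are valid for `max_rank`. A small helper like the following (hypothetical, not part of crux-cache) sketches how you might map an arbitrary top-N cutoff to the smallest bucket that covers it:

```python
# Hypothetical helper (not part of crux-cache): pick the smallest
# valid CrUX rank bucket that covers a desired top-N cutoff, so the
# result can be passed as max_rank.
VALID_RANKS = [1000, 5000, 10000, 50000, 100000,
               500000, 1000000, 5000000, 10000000, 50000000]

def bucket_for(n):
    for bucket in VALID_RANKS:
        if n <= bucket:
            return bucket
    raise ValueError(f"no rank bucket covers top-{n}")

print(bucket_for(3000))    # 5000
print(bucket_for(100000))  # 100000
```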

Access Specific Months

from crux_cache import CruxCache

cache = CruxCache()

# List available months for a dataset
months = cache.list_months('de')
print(f"Available months: {', '.join(months)}")
print(f"Latest month: {months[-1]}")

# Get a specific month
for origin, rank in cache.get_dataset('global', month='202510'):
    print(f"{origin}: {rank}")
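The `months[-1]` idiom above works because YYYYMM strings compare chronologically as plain strings (assuming, as the Quick Start implies, that `list_months` returns them in sorted order). A quick check:

```python
# YYYYMM strings sort chronologically as plain strings, so the last
# element of a sorted month list is always the latest month.
months = ['202412', '202501', '202510']
assert sorted(months)[-1] == max(months)
print(max(months))  # 202510
```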

Cache Management

from crux_cache import CruxCache

# Use default cache (.crux in current directory)
cache = CruxCache()

# Use a custom cache directory
cache = CruxCache(cache_dir='/tmp/crux')

# Set custom metadata TTL (in seconds)
cache = CruxCache(metadata_ttl=3600)  # 1 hour

# Clear the cache
cache.clear_cache()

Features

  • Automatic caching with configurable TTL
  • Access global and country-specific datasets (us, de, jp)
  • Filter by rank value to get top domains (e.g., top 1k, 5k, 1M)
  • Access current or historical data by month
  • Simple API with sensible defaults

API Reference

CruxCache

Main client for accessing CrUX cached data.

__init__(cache_dir=".crux", metadata_ttl=86400)

Initialize the client.

  • cache_dir: Cache directory (default: .crux)
  • metadata_ttl: Metadata cache TTL in seconds (default: 86400 = 1 day)

list_datasets() -> List[Dict]

List all available datasets with their metadata (id, name, total_months, earliest_month, latest_month, latest_origins, total_size).

list_months(dataset_type: str) -> List[str]

List available months for a dataset in YYYYMM format.

get_dataset(dataset_type: str, month: Optional[str] = None, max_rank: Optional[int] = None) -> CruxDataset

Get an iterator for a specific dataset and month. When max_rank is given, only domains with rank ≤ max_rank are yielded; otherwise the full dataset is returned.

Parameters:

  • dataset_type: 'global', 'us', 'de', or 'jp'
  • month: YYYYMM format (e.g., '202510'). Defaults to latest month
  • max_rank: Filter by rank (1000, 5000, 10000, 50000, 100000, 500000, 1000000, etc.)

Returns: Iterator yielding (origin, rank) tuples

clear_cache()

Clear all cached files. Metadata and CSV files will be re-downloaded on next access.

CruxDataset

Iterator that yields (origin, rank) tuples when iterating.
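As an illustrative sketch (not the library's actual implementation), a CruxDataset-style iterator over CrUX CSV rows could be written like this, with an optional `max_rank` filter applied while streaming:

```python
import csv
import io

# Illustrative sketch (not crux-cache's real code): yield
# (origin, rank) tuples from CrUX-style CSV text, optionally
# filtering by a maximum rank bucket.
def iter_rows(csv_text, max_rank=None):
    for origin, rank in csv.reader(io.StringIO(csv_text)):
        rank = int(rank)
        if max_rank is None or rank <= max_rank:
            yield origin, rank

sample = "https://www.example.com,1000\nhttps://example.org,10000\n"
print(list(iter_rows(sample, max_rank=1000)))
# [('https://www.example.com', 1000)]
```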

Data Format

Each iteration yields a tuple of:

  • origin (str): Origin URL, i.e. scheme plus host (e.g., https://www.google.com)
  • rank (int): Popularity bucket (1000, 10000, 100000, 1000000, etc.)

Caching Behavior

  • Metadata files (datasets.json, manifest.json): Cached with TTL (default: 1 day)
  • CSV chunks: Cached indefinitely (reused across sessions)
  • Cache location: .crux/ in current directory (configurable)
  • Clear cache: Use cache.clear_cache() to remove all cached files
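A TTL check of this kind can be sketched as follows (an illustrative example, not the library's actual code, assuming freshness is judged by file modification time):

```python
import os
import time

# Illustrative sketch: a metadata file is "fresh" if it exists and its
# mtime is within ttl_seconds of now; otherwise it should be re-fetched.
def is_fresh(path, ttl_seconds):
    if not os.path.exists(path):
        return False
    return (time.time() - os.path.getmtime(path)) < ttl_seconds
```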

Requirements

  • Python 3.7+
  • requests >= 2.25.0

License

MIT License - See LICENSE

CrUX data provided by Google under CrUX Dataset Terms



Download files

Download the file for your platform.

Source Distribution

crux_cache-1.0.0.tar.gz (9.6 kB)


Built Distribution


crux_cache-1.0.0-py3-none-any.whl (10.1 kB)


File details

Details for the file crux_cache-1.0.0.tar.gz.

File metadata

  • Download URL: crux_cache-1.0.0.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crux_cache-1.0.0.tar.gz

  • SHA256: fda51e4da49126c49ecfed4e4dfd0d932375fbf4f5147315352f726cd7deb277
  • MD5: f443b591e3ddc766cfda78d565616a1f
  • BLAKE2b-256: 6ad62677f2351b0dcfac55d166e60bfaed7d2bb316dc40e18bf64cd1525b966d


Provenance

The following attestation bundles were made for crux_cache-1.0.0.tar.gz:

Publisher: publish-python.yml on lonetis/crux-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file crux_cache-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: crux_cache-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 10.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crux_cache-1.0.0-py3-none-any.whl

  • SHA256: c5bbc02986715e91e2d060b0c0ec7417b4f89119fe01a9cb85a11a52add30c0d
  • MD5: 09a53df53ad9fbdfa4972a56318fe301
  • BLAKE2b-256: 3a34f9c4e1b43fd9d182f0ac72c7b0bb4fda7a394b8c23de0d544bc925b2dd00


Provenance

The following attestation bundles were made for crux_cache-1.0.0-py3-none-any.whl:

Publisher: publish-python.yml on lonetis/crux-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
