Python package for accessing CrUX (Chrome User Experience Report) cached data

Maintainers

lonetis

crux-cache Python Package

Python package for accessing CrUX (Chrome User Experience Report) cached data from the crux-cache repository.

Installation

pip install crux-cache

Quick Start

from crux_cache import CruxCache

# Initialize the client
cache = CruxCache()

# List available datasets
datasets = cache.list_datasets()
for ds in datasets:
    print(f"{ds['id']}: {ds['latest_origins']} origins")

# Iterate over the latest global dataset
for origin, rank in cache.get_dataset('global'):
    print(f"{origin}: {rank}")

Usage Examples

Filter by Rank (Top Domains)

Pass max_rank to keep only origins whose rank is ≤ max_rank. Valid values: 1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000, 10000000, 50000000.

from crux_cache import CruxCache

cache = CruxCache()

# Get top 1k domains from the latest US dataset
for origin, rank in cache.get_dataset('us', max_rank=1000):
    print(f"{origin}: {rank}")

# Get top 5k domains (includes top 1k)
for origin, rank in cache.get_dataset('global', max_rank=5000):
    print(f"{origin}: {rank}")

# Get top 1 million domains from global dataset
for origin, rank in cache.get_dataset('global', max_rank=1000000):
    print(f"{origin}: {rank}")
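Because max_rank only accepts the bucket values listed above, it can help to round an arbitrary "top N" request up to the next valid bucket before calling get_dataset. The following is a minimal sketch; nearest_valid_max_rank is a hypothetical helper, not part of crux-cache, and the bucket list is copied from the valid values above.

```python
import bisect

# Bucket values copied from the list of valid max_rank values.
VALID_MAX_RANKS = [1000, 5000, 10000, 50000, 100000, 500000,
                   1000000, 5000000, 10000000, 50000000]


def nearest_valid_max_rank(n):
    """Return the smallest valid bucket that is >= n (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be positive")
    i = bisect.bisect_left(VALID_MAX_RANKS, n)
    if i == len(VALID_MAX_RANKS):
        raise ValueError(f"{n} exceeds the largest bucket {VALID_MAX_RANKS[-1]}")
    return VALID_MAX_RANKS[i]


print(nearest_valid_max_rank(2500))   # 5000
print(nearest_valid_max_rank(10000))  # 10000
```

The result could then be passed as max_rank, for example cache.get_dataset('global', max_rank=nearest_valid_max_rank(2500)). Since ranks are buckets, the returned set may contain more than N origins.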

Access Specific Months

from crux_cache import CruxCache

cache = CruxCache()

# List available months for a dataset
months = cache.list_months('de')
print(f"Available months: {', '.join(months)}")
print(f"Latest month: {months[-1]}")

# Get a specific month
for origin, rank in cache.get_dataset('global', month='202510'):
    print(f"{origin}: {rank}")
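Month identifiers are plain YYYYMM strings, so selecting a range of months is a matter of parsing and comparing them. The sketch below assumes list_months returns strings in that format, as documented; parse_month and months_between are hypothetical helpers, and the sample list stands in for real list_months output.

```python
from datetime import date


def parse_month(month):
    """Convert a YYYYMM string such as '202510' to the first day of that month."""
    if len(month) != 6 or not month.isdigit():
        raise ValueError(f"expected YYYYMM, got {month!r}")
    return date(int(month[:4]), int(month[4:]), 1)


def months_between(months, start, end):
    """Keep the YYYYMM strings that fall within [start, end], inclusive."""
    lo, hi = parse_month(start), parse_month(end)
    return [m for m in months if lo <= parse_month(m) <= hi]


# Hypothetical sample standing in for cache.list_months('de').
sample = ['202507', '202508', '202509', '202510']
print(months_between(sample, '202508', '202509'))  # ['202508', '202509']
```

Each selected string can be passed to get_dataset as the month argument.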

Cache Management

from crux_cache import CruxCache

# Use default cache (.crux in current directory)
cache = CruxCache()

# Use a custom cache directory
cache = CruxCache(cache_dir='/tmp/crux')

# Set custom metadata TTL (in seconds)
cache = CruxCache(metadata_ttl=3600)  # 1 hour

# Clear the cache
cache.clear_cache()

Features

Automatic caching with configurable TTL
Access global and country-specific datasets (us, de, jp)
Filter by rank value to get top domains (e.g., top 1k, 5k, 1M)
Access current or historical data by month
Simple API with sensible defaults

API Reference

CruxCache

Main client for accessing CrUX cached data.

`__init__(cache_dir=".crux", metadata_ttl=86400)`

Initialize the client.

cache_dir: Cache directory (default: .crux)
metadata_ttl: Metadata cache TTL in seconds (default: 86400 = 1 day)

`list_datasets() -> List[Dict]`

List all available datasets with their metadata (id, name, total_months, earliest_month, latest_month, latest_origins, total_size).
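As a rough illustration of working with that metadata, the sketch below sorts dataset entries by latest_origins and formats one line per dataset. It assumes latest_origins is an integer and that the month fields are YYYYMM strings; the sample values are made up for the example and are not real CrUX figures, and summarize is a hypothetical helper.

```python
def summarize(datasets):
    """Return one line per dataset, largest first by latest_origins."""
    ordered = sorted(datasets, key=lambda ds: ds['latest_origins'], reverse=True)
    return [
        f"{ds['id']}: {ds['latest_origins']} origins, "
        f"{ds['earliest_month']} to {ds['latest_month']}"
        for ds in ordered
    ]


# Made-up entries shaped like list_datasets() output (only some fields shown).
sample_datasets = [
    {'id': 'us', 'latest_origins': 1200,
     'earliest_month': '202401', 'latest_month': '202510'},
    {'id': 'global', 'latest_origins': 3000,
     'earliest_month': '202401', 'latest_month': '202510'},
]

for line in summarize(sample_datasets):
    print(line)
```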

`list_months(dataset_type: str) -> List[str]`

List available months for a dataset in YYYYMM format.

`get_dataset(dataset_type: str, month: Optional[str] = None, max_rank: Optional[int] = None) -> CruxDataset`

Get an iterator for a specific dataset and month. When max_rank is given, only origins with rank ≤ max_rank are returned.

Parameters:

dataset_type: 'global', 'us', 'de', or 'jp'
month: YYYYMM format (e.g., '202510'). Defaults to latest month
max_rank: Filter by rank (1000, 5000, 10000, 50000, 100000, 500000, 1000000, etc.)

Returns: Iterator yielding (origin, rank) tuples

`clear_cache()`

Clear all cached files. Metadata and CSV files will be re-downloaded on next access.

CruxDataset

Iterator that yields (origin, rank) tuples when iterating.

Data Format

Each iteration yields a tuple of:

origin (str): Origin, i.e. scheme plus host (e.g., https://www.google.com)
rank (int): Popularity bucket (1000, 5000, 10000, 50000, 100000, etc.)
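Since each item is a plain (origin, rank) tuple, standard library tools are enough for common post-processing. The sketch below counts origins per rank bucket and extracts hostnames; count_by_rank and hostnames are hypothetical helpers, and the sample list stands in for a real dataset iterator.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_rank(pairs):
    """Count how many origins fall into each rank bucket."""
    return Counter(rank for _, rank in pairs)


def hostnames(pairs, max_rank):
    """Return the hostname of every origin whose rank is <= max_rank."""
    return [urlsplit(origin).hostname for origin, rank in pairs if rank <= max_rank]


# Hypothetical sample standing in for cache.get_dataset('global').
sample_pairs = [
    ("https://www.google.com", 1000),
    ("https://example.org", 5000),
    ("https://example.net", 5000),
]

print(count_by_rank(sample_pairs)[5000])  # 2
print(hostnames(sample_pairs, 1000))      # ['www.google.com']
```

Any iterable of (origin, rank) tuples works here, so the same functions should apply to the object returned by get_dataset.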

Caching Behavior

Metadata files (datasets.json, manifest.json): Cached with TTL (default: 1 day)
CSV chunks: Cached indefinitely (reused across sessions)
Cache location: .crux/ in current directory (configurable)
Clear cache: Use cache.clear_cache() to remove all cached files
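To make the TTL idea concrete, here is one common way a file-based TTL check can be written, based on the file's modification time. This is only a sketch of the general technique; it is an assumption and not necessarily how crux-cache decides whether metadata is stale. is_fresh is a hypothetical helper.

```python
import os
import time


def is_fresh(path, ttl_seconds, now=None):
    """True if path exists and was modified less than ttl_seconds ago."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < ttl_seconds


# A missing file is never fresh, so it would be downloaded.
print(is_fresh("does-not-exist.json", 86400))  # False
```

Under this scheme a file with an unlimited lifetime, such as a CSV chunk, would simply skip the age check and be reused whenever it exists.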

Requirements

Python 3.7+
requests >= 2.25.0

License

MIT License - See LICENSE

CrUX data provided by Google under CrUX Dataset Terms

Release history

This version

1.0.0

Nov 19, 2025

Download files

Source Distribution

crux_cache-1.0.0.tar.gz (9.6 kB view details)

Uploaded Nov 19, 2025 Source

Built Distribution

crux_cache-1.0.0-py3-none-any.whl (10.1 kB view details)

Uploaded Nov 19, 2025 Python 3

File details

Details for the file crux_cache-1.0.0.tar.gz.

File metadata

Download URL: crux_cache-1.0.0.tar.gz
Upload date: Nov 19, 2025
Size: 9.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crux_cache-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`fda51e4da49126c49ecfed4e4dfd0d932375fbf4f5147315352f726cd7deb277`
MD5	`f443b591e3ddc766cfda78d565616a1f`
BLAKE2b-256	`6ad62677f2351b0dcfac55d166e60bfaed7d2bb316dc40e18bf64cd1525b966d`

Provenance

The following attestation bundles were made for crux_cache-1.0.0.tar.gz:

Publisher: publish-python.yml on lonetis/crux-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crux_cache-1.0.0.tar.gz
- Subject digest: fda51e4da49126c49ecfed4e4dfd0d932375fbf4f5147315352f726cd7deb277
- Sigstore transparency entry: 708815092
- Sigstore integration time: Nov 19, 2025
Source repository:
- Permalink: lonetis/crux-cache@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lonetis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yml@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Trigger Event: workflow_dispatch

File details

Details for the file crux_cache-1.0.0-py3-none-any.whl.

File metadata

Download URL: crux_cache-1.0.0-py3-none-any.whl
Upload date: Nov 19, 2025
Size: 10.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crux_cache-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c5bbc02986715e91e2d060b0c0ec7417b4f89119fe01a9cb85a11a52add30c0d`
MD5	`09a53df53ad9fbdfa4972a56318fe301`
BLAKE2b-256	`3a34f9c4e1b43fd9d182f0ac72c7b0bb4fda7a394b8c23de0d544bc925b2dd00`

Provenance

The following attestation bundles were made for crux_cache-1.0.0-py3-none-any.whl:

Publisher: publish-python.yml on lonetis/crux-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crux_cache-1.0.0-py3-none-any.whl
- Subject digest: c5bbc02986715e91e2d060b0c0ec7417b4f89119fe01a9cb85a11a52add30c0d
- Sigstore transparency entry: 708815106
- Sigstore integration time: Nov 19, 2025
Source repository:
- Permalink: lonetis/crux-cache@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lonetis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yml@751a77abbdf89b5cd5b02105cbea01519bcf9671
- Trigger Event: workflow_dispatch
