A toolkit for evaluating the culture of MLX large language models (LLMs) on the CD Eval benchmark.

CultureKit

Python 3.11+ | License: MIT | Status: Alpha

Note: This repository is currently in alpha testing. Features and APIs may change without notice.

A toolkit for evaluating the culture of Large Language Models (LLMs) on the CD Eval benchmark. Supports MLX, Azure OpenAI, and Azure Foundry models.

Overview

CultureKit provides tools and utilities for evaluating how cultural biases and perspectives are reflected in large language models (LLMs). The toolkit focuses on measuring and analyzing model responses against the CD Eval benchmark, which tests models on cultural dimensions.

Features

Multiple Model Support: Works with MLX models, Azure OpenAI, and Azure Foundry models
Comprehensive Evaluation: Tools for scoring models against the CD Eval benchmark
Result Visualization: Notebook for analyzing and visualizing evaluation results
CLI: Command line interface for easy model evaluation

Installation

From PyPI

pip install culturekit

Note: MLX dependencies are primarily designed for macOS/Apple Silicon. On other platforms, MLX functionality will be disabled, but Azure-based models will still work.
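
To decide up front which model type is usable on a given machine, a small check like the following can help. It is only a sketch: the helper names `mlx_supported` and `mlx_available` are hypothetical, and this is not necessarily how CultureKit itself detects MLX support. It assumes that "macOS/Apple Silicon" corresponds to `platform.system() == "Darwin"` and `platform.machine() == "arm64"`.

```python
import importlib.util
import platform


def mlx_supported(system: str, machine: str) -> bool:
    """Return True for macOS on Apple Silicon, the platform MLX primarily targets."""
    return system == "Darwin" and machine == "arm64"


def mlx_available() -> bool:
    """Check the current platform and whether the `mlx` package can be found."""
    if not mlx_supported(platform.system(), platform.machine()):
        return False
    # find_spec returns None when the package is not installed.
    return importlib.util.find_spec("mlx") is not None


if __name__ == "__main__":
    # Fall back to an Azure-based model type when MLX is unavailable.
    model_type = "mlx" if mlx_available() else "azure_openai"
    print(f"Suggested --model_type: {model_type}")
```

On a machine where the check fails, the Azure-based model types remain usable as described in the note above.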

Using uv

# Clone the repository
git clone https://github.com/decisions-lab/culturekit.git
cd culturekit

# Install with uv
uv sync

# Or install with dev dependencies
uv sync --extra dev

Using pip from source

# Clone the repository
git clone https://github.com/decisions-lab/culturekit.git
cd culturekit

# Install with pip
pip install -e .

Quick Start

CultureKit comes with a CLI for easy model evaluation:

Evaluating Models

# Run evaluation on an MLX model (macOS only)
uv run python -m culturekit eval --model "mlx-community/Qwen1.5-0.5B-MLX" --model_type mlx

# Run evaluation on an Azure OpenAI model
uv run python -m culturekit eval --model "gpt-4o-mini" --model_type azure_openai --azure_deployment "deployment-name"

# Run evaluation on an Azure Foundry model
uv run python -m culturekit eval --model "foundry-model" --model_type azure_foundry
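
When several models need to be evaluated in one go, the commands above can be assembled from Python. The sketch below only builds and prints the command lines; it uses the flags shown above (`--model`, `--model_type`, `--azure_deployment`) and nothing else. The helper `build_eval_command` is hypothetical and not part of CultureKit.

```python
import shlex


def build_eval_command(model, model_type, azure_deployment=None):
    """Build the `culturekit eval` command line as a list of arguments."""
    cmd = [
        "uv", "run", "python", "-m", "culturekit", "eval",
        "--model", model,
        "--model_type", model_type,
    ]
    # The deployment flag is only shown for Azure OpenAI in the examples above.
    if azure_deployment is not None:
        cmd += ["--azure_deployment", azure_deployment]
    return cmd


# The same three runs as in the shell examples.
runs = [
    build_eval_command("mlx-community/Qwen1.5-0.5B-MLX", "mlx"),
    build_eval_command("gpt-4o-mini", "azure_openai", "deployment-name"),
    build_eval_command("foundry-model", "azure_foundry"),
]

for cmd in runs:
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually execute it from the repository root.
    print(shlex.join(cmd))
```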

Scoring Results

# Generate scoring
uv run python -m culturekit score --responses_path "results.jsonl" --output_path "scores.json"

Note: If you've activated the virtual environment (source .venv/bin/activate), you can use python directly instead of uv run python.
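
Before scoring, it can be useful to confirm that the responses file parses cleanly. The following sketch assumes only that `results.jsonl` holds one JSON object per line, as the `.jsonl` extension suggests; the fields inside each record are not documented here, so the code does not rely on any of them. The helper `load_jsonl` is hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file and return its records as a list."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # Skip blank lines.
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so it can be fixed before scoring.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
    return records
```

Calling `len(load_jsonl("results.jsonl"))` then gives the number of responses that the score command would receive.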

Environment Setup

For Azure OpenAI and Azure Foundry models, you need to set up environment variables. Create a .env file in the src/culturekit directory:

# Azure OpenAI Configuration
OPENAI_API_VERSION=2023-03-15-preview
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT=deployment_name

# Azure Foundry Configuration
AZURE_FOUNDRY_ENDPOINT=https://your-foundry-endpoint.models.ai.azure.com
AZURE_API_KEY=your_api_key

See the Environment Setup guide for more details.
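
A quick pre-flight check can catch a missing variable before an evaluation starts. The sketch below uses the variable names listed above and reads the process environment only; it does not load the .env file. Treating every listed variable as required is an assumption, and `REQUIRED_VARS` and `missing_vars` are hypothetical names, not part of CultureKit.

```python
import os

# Variable names taken from the configuration listing above.
# Assumption: all of them are required for the corresponding model type.
REQUIRED_VARS = {
    "azure_openai": [
        "OPENAI_API_VERSION",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_DEPLOYMENT",
    ],
    "azure_foundry": [
        "AZURE_FOUNDRY_ENDPOINT",
        "AZURE_API_KEY",
    ],
}


def missing_vars(model_type, environ=None):
    """Return the names of unset or empty variables for a model type."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS.get(model_type, []) if not env.get(name)]


if __name__ == "__main__":
    for model_type in REQUIRED_VARS:
        missing = missing_vars(model_type)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{model_type}: {status}")
```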

Documentation

For more detailed information, see the project documentation.

Dataset

The toolkit uses the CD Eval benchmark for evaluating cultural dimensions in LLMs. The dataset includes diverse scenarios representing different cultural perspectives and contexts.

Development

Prerequisites

Python 3.11+
uv (install with curl -LsSf https://astral.sh/uv/install.sh | sh)

Setup Development Environment

# Clone the repository
git clone https://github.com/decisions-lab/culturekit.git
cd culturekit

# Install dependencies (including dev dependencies)
uv sync --extra dev

Using uv

Here are the essential uv commands for working with this repository:

Installing Dependencies

# Install all dependencies (including dev dependencies)
uv sync --extra dev

# Install only production dependencies
uv sync

# Update all dependencies to latest compatible versions
uv sync --upgrade --extra dev

Running Commands

# Run Python scripts in the virtual environment
uv run python -m culturekit eval --model "model-name" --model_type mlx

# Run CLI commands
uv run culturekit --help

# Run tests
uv run pytest

# Run linting/formatting
uv run black .
uv run isort .
uv run flake8 .
uv run mypy .

Managing Dependencies

# Add a new dependency
uv add package-name

# Add a dev dependency
uv add --dev package-name

# Add a dependency with version constraint
uv add "package-name>=1.0.0"

# Remove a dependency
uv remove package-name

# Update a specific package
uv sync --upgrade-package package-name

Building and Publishing

# Build the package
uv build

# Publish to PyPI (requires authentication)
uv publish

Other Useful Commands

# Show installed packages
uv pip list

# Show dependency tree
uv tree

# Activate the virtual environment (if needed)
source .venv/bin/activate  # On macOS/Linux

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Thanks to Apple's MLX team for their excellent machine learning framework
CD Eval benchmark creators for providing a standard for cultural dimensions evaluation

Citation

@software{culturekit2025,
  author = {Devansh Gandhi},
  title = {CultureKit: A toolkit for evaluating the culture of MLX large language models},
  year = {2025},
  url = {https://github.com/decisions-lab/culturekit},
  version = {0.0.1}
}

Release history

0.0.2 (this version): Dec 19, 2025
0.0.1: Apr 17, 2025
