How Well Can Language Models Answer Questions in Czech?

Czech-SimpleQA

Problems and answers from OpenAI's SimpleQA eval translated into Czech. This work is based on the data from the paper:

Jason Wei, Nguyen Karina, Hyung Won Chung, Yunxin Joy Jiao, Spencer Papay, Amelia Glaese, John Schulman, William Fedus. Measuring short-form factuality in large language models. arXiv preprint arXiv:2411.04368, 2024. https://arxiv.org/abs/2411.04368

model	SimpleQA¹	Czech-SimpleQA
gpt-4o-mini-2024-07-18	9.5	8.1
gpt-4o-2024-11-20	38.8	31.4
claude-3-5-sonnet-20240620	35.0	25.8
claude-3-5-sonnet-20241022	N/A	31.1
claude-3-5-haiku-20241022	N/A	9.3

There is a post on my blog with more detailed results!

What the Data Looks Like

problem	target	czech_problem	czech_target
What was the population count in the 2011 census of the Republic of Nauru?	10,084	Jaký byl počet obyvatel při sčítání lidu v roce 2011 v Republice Nauru?	10 084

I Just Want the Eval Data

The data file lives at src/czech_simpleqa/czech_simpleqa.csv.gz in the repository; its full URL appears in the snippet below. Loading it with pandas looks like this:

import pandas as pd

eval_data = pd.read_csv(
    "https://raw.githubusercontent.com/jancervenka/"
    "czech-simpleqa/refs/heads/main/src/czech_simpleqa/czech_simpleqa.csv.gz"
)
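If pandas is not available, a downloaded copy of the file can also be read with the standard library. The sketch below is a minimal illustration, assuming the file is a UTF-8, comma-separated CSV with a header row and the four columns shown in the sample table above. The helper name read_eval_rows is hypothetical, and the in-memory bytes stand in for the real file.

```python
import csv
import gzip
import io


def read_eval_rows(gz_bytes: bytes) -> list[dict[str, str]]:
    """Decode gzip-compressed CSV bytes into a list of row dictionaries."""
    # Assumption: the file is UTF-8 encoded and has a header row.
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Build a tiny in-memory stand-in for czech_simpleqa.csv.gz
# from the sample row shown in the table above.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["problem", "target", "czech_problem", "czech_target"])
writer.writerow([
    "What was the population count in the 2011 census of the Republic of Nauru?",
    "10,084",
    "Jaký byl počet obyvatel při sčítání lidu v roce 2011 v Republice Nauru?",
    "10 084",
])
sample_gz = gzip.compress(buffer.getvalue().encode("utf-8"))

rows = read_eval_rows(sample_gz)
print(rows[0]["czech_problem"], "->", rows[0]["czech_target"])
```

For a real local copy, the bytes would come from something like open(path, "rb").read(). Note that the Czech target uses a space as the thousands separator where the English target uses a comma.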

I Want to Use the Python Package

The package contains everything required to run the eval end-to-end and collect the results. Install it with pip or any other Python package manager, then run the eval module from the command line:

pip install czech-simpleqa
python -m czech_simpleqa.eval \
    --answering_model claude-3-5-haiku-20241022 \
    --grading_model gpt-4o \
    --output_file_path output/claude-3-5-haiku-20241022.csv \
    --max_concurrent_tasks 30

CLI Arguments

--answering_model: Model that will generate predicted answers to the problems in the eval.
--grading_model: Model that will grade the predicted answers from the answering model.
--output_file_path: Where to store the .csv file with the eval results.
--max_concurrent_tasks: Maximum number of concurrent model calls (default 20).
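To evaluate several answering models in a row, the command line can be assembled programmatically. The following is a rough sketch that uses only the four flags documented above; the helper build_eval_command and the output/ directory layout are illustrative assumptions, not part of the package.

```python
import sys


def build_eval_command(
    answering_model: str,
    grading_model: str,
    output_file_path: str,
    max_concurrent_tasks: int = 20,  # documented default
) -> list[str]:
    """Return the argv list for one run of the czech_simpleqa.eval module."""
    return [
        sys.executable, "-m", "czech_simpleqa.eval",
        "--answering_model", answering_model,
        "--grading_model", grading_model,
        "--output_file_path", output_file_path,
        "--max_concurrent_tasks", str(max_concurrent_tasks),
    ]


answering_models = ["claude-3-5-haiku-20241022", "claude-3-5-sonnet-20241022"]
commands = [
    build_eval_command(model, "gpt-4o", f"output/{model}.csv")
    for model in answering_models
]

for command in commands:
    print(" ".join(command))
    # To actually launch the eval (this makes paid API calls), one could use:
    # import subprocess; subprocess.run(command, check=True)
```

Building the arguments as a list rather than a single shell string avoids quoting problems if a path contains spaces.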

Output File Schema

problem	target	predicted_answer	grade
Jaké je rozlišení Cat B15 Q v pixelech?	480 x 800	Cat B15 Q má rozlišení 480 x 800 pixelů.	A
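Once a run finishes, the grade column can be summarized with a few lines of Python. The sketch below assumes the output is a comma-separated CSV with the header shown above. Only the grade A appears in the example; if the grader follows the original SimpleQA convention, A would mean correct, B incorrect, and C not attempted, but that mapping is an assumption here. The function name grade_shares is hypothetical, and every sample row except the first is a made-up placeholder.

```python
import csv
import io
from collections import Counter


def grade_shares(csv_text: str) -> dict[str, float]:
    """Return the fraction of rows carrying each grade letter."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    counts = Counter(row["grade"] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {grade: count / total for grade, count in counts.items()}


# First data row is the example from the schema above;
# the remaining rows are placeholders for illustration only.
sample_output = (
    "problem,target,predicted_answer,grade\n"
    "Jaké je rozlišení Cat B15 Q v pixelech?,480 x 800,"
    "Cat B15 Q má rozlišení 480 x 800 pixelů.,A\n"
    "placeholder problem 2,placeholder target,placeholder answer,A\n"
    "placeholder problem 3,placeholder target,placeholder answer,B\n"
    "placeholder problem 4,placeholder target,placeholder answer,C\n"
)

shares = grade_shares(sample_output)
for grade in sorted(shares):
    print(grade, f"{shares[grade]:.0%}")
```

For a real run, the text would come from the file passed as --output_file_path.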

Supported Models

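Models from OpenAI and Anthropic are currently supported. Set the OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable for each provider you use; a run that mixes providers, such as the example above with an Anthropic answering model and an OpenAI grading model, presumably needs both.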
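A quick pre-flight check can catch a missing key before any model call is made. This is only a sketch: the mapping from model-name prefix to environment variable ("gpt" to OPENAI_API_KEY, "claude" to ANTHROPIC_API_KEY) is an assumption based on the model names used in this README, not the package's actual routing logic, and missing_api_keys is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumption: provider is inferred from the model-name prefix.
PROVIDER_KEYS = {
    "gpt": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}


def missing_api_keys(
    models: list[str],
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names of required API key variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    required = set()
    for model in models:
        for prefix, key_name in PROVIDER_KEYS.items():
            if model.startswith(prefix):
                required.add(key_name)
    return sorted(name for name in required if not environ.get(name))


# Check the current environment for the models from the example command.
missing = missing_api_keys(["claude-3-5-haiku-20241022", "gpt-4o"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required API keys are set.")
```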

¹ As reported in the SimpleQA README.md and in the paper.

Project details

This version: 0.1.0 (Jan 12, 2025)

Download files

Source Distribution

czech_simpleqa-0.1.0.tar.gz (908.1 kB view details)

Uploaded Jan 12, 2025 Source

Built Distribution

czech_simpleqa-0.1.0-py3-none-any.whl (865.3 kB view details)

Uploaded Jan 12, 2025 Python 3

File details

Details for the file czech_simpleqa-0.1.0.tar.gz.

File metadata

Download URL: czech_simpleqa-0.1.0.tar.gz
Upload date: Jan 12, 2025
Size: 908.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.18

File hashes

Hashes for czech_simpleqa-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`022ebe2f3e91f4cb3be31ea5540730d51923df243486bd8eb96cc8fa316df751`
MD5	`ecb44ba070a6dff4ff42be42bd81b4fb`
BLAKE2b-256	`6185b15fcb1d022c1de8f532561033414f35d815f2a799eba29e4db23b98d92a`

File details

Details for the file czech_simpleqa-0.1.0-py3-none-any.whl.

File metadata

Download URL: czech_simpleqa-0.1.0-py3-none-any.whl
Upload date: Jan 12, 2025
Size: 865.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.18

File hashes

Hashes for czech_simpleqa-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dbc13c52def5586a2d70eec825a3e41d7e61815a8ded80c5ccd84c4876a0fd75`
MD5	`89aefb773c7efbba952360cf66ffe549`
BLAKE2b-256	`3aa58be57422ac818663bb91a3cc1bb9c551c3a815d823a22f8a01fac181f535`
