<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>PyPI recent updates for llm-benchmark-toolkit</title>
    <link>https://pypi.org/project/llm-benchmark-toolkit/</link>
    <description>Recent updates to the Python Package Index for llm-benchmark-toolkit</description>
    <language>en</language>
    <item>
      <title>2.4.2</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.4.2/</link>
      <description>Benchmark LLMs with 10 benchmarks &amp; 132K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Unified CLI + Web dashboard.</description>
      <author>nahuelgiudizi@hotmail.com</author>
      <pubDate>Fri, 05 Dec 2025 23:13:52 GMT</pubDate>
    </item>
    <item>
      <title>2.4.1</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.4.1/</link>
      <description>Benchmark LLMs with 10 benchmarks &amp; 132K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Unified CLI + Web dashboard.</description>
      <author>nahuelgiudizi@hotmail.com</author>
      <pubDate>Fri, 05 Dec 2025 22:35:18 GMT</pubDate>
    </item>
    <item>
      <title>2.4.0</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.4.0/</link>
      <description>Benchmark LLMs with 9 benchmarks &amp; 100K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Web dashboard included.</description>
      <author>nahuelgiudizi@hotmail.com</author>
      <pubDate>Fri, 05 Dec 2025 06:16:17 GMT</pubDate>
    </item>
    <item>
      <title>2.3.2</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.3.2/</link>
      <description>Benchmark LLMs with 9 benchmarks &amp; 100K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Web dashboard included.</description>
      <author>nahuelgiudizi@hotmail.com</author>
      <pubDate>Thu, 04 Dec 2025 02:15:06 GMT</pubDate>
    </item>
    <item>
      <title>2.3.1</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.3.1/</link>
      <description>Benchmark LLMs with 9 benchmarks &amp; 100K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Web dashboard included.</description>
      <author>nahuelgiudizi@hotmail.com</author>
      <pubDate>Thu, 04 Dec 2025 01:13:45 GMT</pubDate>
    </item>
    <item>
      <title>2.3.0</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.3.0/</link>
      <description>Benchmark LLMs with 9 benchmarks &amp; 100K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Web dashboard included.</description>
      <author>nahuelgiudizi@hotmail.com</author>
      <pubDate>Wed, 03 Dec 2025 22:26:05 GMT</pubDate>
    </item>
    <item>
      <title>2.2.1</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.2.1/</link>
      <description>Benchmark LLMs with real academic datasets: MMLU, TruthfulQA, HellaSwag, ARC &amp; more. Web dashboard included.</description>
      <author>nahuelgiudizi@hotmail.com</author>
      <pubDate>Tue, 02 Dec 2025 22:48:15 GMT</pubDate>
    </item>
    <item>
      <title>2.2.0</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.2.0/</link>
      <description>Production-ready LLM evaluation with 24K+ real questions</description>
      <author>nahuel@example.com</author>
      <pubDate>Tue, 02 Dec 2025 22:42:14 GMT</pubDate>
    </item>
    <item>
      <title>2.1.0</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.1.0/</link>
      <description>Production-ready LLM evaluation with 24K+ real questions</description>
      <author>nahuel@example.com</author>
      <pubDate>Tue, 02 Dec 2025 02:17:45 GMT</pubDate>
    </item>
    <item>
      <title>2.0.0</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/2.0.0/</link>
      <description>Comprehensive evaluation framework for Large Language Models with academic statistical rigor</description>
      <author>nahuel@example.com</author>
      <pubDate>Mon, 01 Dec 2025 20:39:03 GMT</pubDate>
    </item>
    <item>
      <title>0.4.1</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/0.4.1/</link>
      <description>Comprehensive evaluation framework for Large Language Models</description>
      <author>nahuel@example.com</author>
      <pubDate>Mon, 01 Dec 2025 06:50:35 GMT</pubDate>
    </item>
    <item>
      <title>0.4.0</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/0.4.0/</link>
      <description>Comprehensive evaluation framework for Large Language Models</description>
      <author>nahuel@example.com</author>
      <pubDate>Mon, 01 Dec 2025 05:11:20 GMT</pubDate>
    </item>
    <item>
      <title>0.3.2</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/0.3.2/</link>
      <description>Comprehensive evaluation framework for Large Language Models</description>
      <author>nahuel@example.com</author>
      <pubDate>Mon, 01 Dec 2025 00:35:50 GMT</pubDate>
    </item>
    <item>
      <title>0.3.1</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/0.3.1/</link>
      <description>Comprehensive evaluation framework for Large Language Models</description>
      <author>nahuel@example.com</author>
      <pubDate>Sun, 30 Nov 2025 06:42:03 GMT</pubDate>
    </item>
    <item>
      <title>0.3.0</title>
      <link>https://pypi.org/project/llm-benchmark-toolkit/0.3.0/</link>
      <description>Comprehensive evaluation framework for Large Language Models</description>
      <author>nahuel@example.com</author>
      <pubDate>Sun, 30 Nov 2025 06:22:36 GMT</pubDate>
    </item>
  </channel>
</rss>