Interactive Game Reasoning Arena
Play games against LLMs or a random bot, or watch LLMs compete against each other!
🤖 Available AI Players: Hugging Face transformer models integrated with the backend system. The models run locally with the Hugging Face transformers library, so no API tokens are required!
⚠️ Note on Reasoning Quality: The available models are relatively basic (GPT-2, DistilGPT-2, etc.) and may produce limited or nonsensical reasoning. They are suitable for demonstration purposes, but do not expect sophisticated strategic thinking or coherent explanations.
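Because small local models can return text that names no usable move, a caller typically has to map the generated text back onto the legal moves. The following is a minimal sketch of that idea, not the app's actual code: the helper names `parse_move` and `ask_local_model`, the integer move encoding, and the random fallback are all assumptions made for illustration. The `transformers` import is deferred so the parsing helper can be used on its own.

```python
import random
import re


def parse_move(generated_text, legal_moves, rng=random):
    """Return the first legal move mentioned in the text, else a random legal one.

    Assumes moves are encoded as integers (hypothetical encoding).
    """
    for token in re.findall(r"-?\d+", generated_text):
        move = int(token)
        if move in legal_moves:
            return move
    # The model named no legal move: fall back to a random legal one.
    return rng.choice(list(legal_moves))


def ask_local_model(prompt, legal_moves, model_name="distilgpt2"):
    """Generate text with a local Hugging Face model and map it to a legal move."""
    # Imported lazily so parse_move works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    output = generator(prompt, max_new_tokens=20, return_full_text=False)
    return parse_move(output[0]["generated_text"], legal_moves)
```

With a fallback of this kind, a model that produces nonsensical text still plays a legal move, although the move then says nothing about the model's reasoning.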
LLM Model Leaderboard
Track performance across different games!
ai-mistralai-Mixtral-8x7B-Instruct-v0.1 | llm | 164 | 19.5 | 81.25 | 81.25 |
Upload new .db result files
📊 Metrics Dashboard
Visual summaries of LLM performance across games.
Performance Summary
ai-mistralai-Mixtral-8x7B-Instruct-v0.1 | llm | 164 | 19.5 | 81.25 | 81.25 |
glm-4p5-air | llm | 50 | 41 | 64 | 64 |
kimi-k2-instruct | llm | 16 | 19.5 | 81.25 | 81.25 |
llama-v3-70b-instruct | llm | 10 | 8 | 80 | 80 |
llama-v3-8b-instruct | llm | 10 | 8 | 80 | 80 |
qwen3-235b-a22b-thinking-2507 | llm | 18 | 13 | 83.33 | 83.33 |
GPT-3.5-turbo | llm | 5 | 4.5 | 100 | 100 |
GPT-4 | llm | 44 | 40.5 | 72.73 | 72.73 |
GPT-4-turbo | llm | 5 | 6 | 80 | 80 |
GPT-4o-mini | llm | 32 | 20.5 | 75 | 75 |
o4-mini | llm | 11 | 8 | 72.73 | 72.73 |
Gemma2-9b-it | llm | 58 | 41 | 67.24 | 67.24 |
Gemma-7b-it | llm | 1 | 0.5 | 100 | 100 |
Llama-3-70b-8192 | llm | 54 | 58 | 87.04 | 87.04 |
Llama-3-8b-8192 | llm | 164 | 83 | 82.93 | 82.93 |
llama-3.1-8b-instant | llm | 15 | 5.5 | 86.67 | 86.67 |
Meta-Llama-3.1-70B-Instruct-Turbo | llm | 10 | 14 | 100 | 100 |
Meta-Llama-3.1-8B-Instruct-Turbo | llm | 10 | 7.5 | 80 | 80 |
ai-mistralai-Mixtral-8x7B-Instruct-v0.1 | llm | 59 | 28.5 | 59.32 | 59.32 |
Qwen2-7B-Instruct | llm | 1 | -0.5 | 0 | 0 |
🧠 Analysis of LLM Reasoning
Insights into move legality and decision behavior.
Illegal Move Summary
ai-mistralai-Mixtral-8x7B-Instruct-v0.1 | 0 |
glm-4p5-air | 0 |
kimi-k2-instruct | 0 |
llama-v3-70b-instruct | 0 |
llama-v3-8b-instruct | 0 |
qwen3-235b-a22b-thinking-2507 | 0 |
GPT-3.5-turbo | 0 |
GPT-4 | 0 |
GPT-4-turbo | 0 |
GPT-4o-mini | 0 |
o4-mini | 0 |
Gemma2-9b-it | 0 |
Gemma-7b-it | 0 |
Llama-3-70b-8192 | 0 |
Llama-3-8b-8192 | 0 |
llama-3.1-8b-instant | 0 |
Meta-Llama-3.1-70B-Instruct-Turbo | 0 |
Meta-Llama-3.1-8B-Instruct-Turbo | 0 |
ai-mistralai-Mixtral-8x7B-Instruct-v0.1 | 0 |
Qwen2-7B-Instruct | 0 |
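A per-model illegal-move count like the one above could be computed from a results database with a single grouped query. The sketch below assumes a hypothetical table `moves` with columns `agent_name` and `is_illegal` (0 or 1); the real schema of the result files is not shown on this page and may differ. The demo rows are made up and are not the app's data.

```python
import sqlite3


def illegal_move_counts(conn):
    """Count illegal moves per agent, including agents with none."""
    query = """
        SELECT agent_name, COALESCE(SUM(is_illegal), 0) AS illegal_moves
        FROM moves
        GROUP BY agent_name
        ORDER BY agent_name
    """
    return dict(conn.execute(query).fetchall())


# Tiny in-memory demo using the hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moves (agent_name TEXT, action INTEGER, is_illegal INTEGER)"
)
conn.executemany(
    "INSERT INTO moves VALUES (?, ?, ?)",
    [("agent_a", 4, 0), ("agent_a", 2, 0), ("agent_b", 9, 1), ("agent_b", 1, 0)],
)
counts = illegal_move_counts(conn)
```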
About Game Reasoning Arena
This app analyzes and visualizes LLM performance in games.
- Game Arena: Play games vs. LLMs or watch LLM vs. LLM
- Leaderboard: Performance statistics across games
- Metrics Dashboard: Visual summaries
- Reasoning Analysis: Illegal moves & behavior
Data: SQLite databases in /results/
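Since the schema of the result databases is not documented here, a reasonable first step when working with them is to list the tables and columns each file contains. The sketch below uses only Python's standard `sqlite3` module; the function names and the `results_dir` argument are illustrative assumptions, and the directory should be pointed at wherever the `.db` files actually live.

```python
import sqlite3
from pathlib import Path


def describe_database(db_path):
    """Return {table_name: [column names]} for one SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()


def describe_results(results_dir):
    """Describe every .db file found directly inside results_dir."""
    return {
        path.name: describe_database(path)
        for path in sorted(Path(results_dir).glob("*.db"))
    }
```

Calling `describe_results` on the results directory returns one entry per database file, which makes it easier to write queries against the real table and column names rather than guessed ones.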