About AI vs Connections
Project Overview
AI vs Connections is an experimental benchmarking platform that evaluates how well artificial intelligence language models solve The New York Times Connections word puzzle. The project compares the accuracy and reasoning abilities of models like ChatGPT, Claude, Gemini, and others across different puzzle types.
Each Connections puzzle presents 16 words that must be grouped into 4 sets of 4, with each group sharing a hidden thematic link. The task challenges models to detect semantic patterns, cultural references, and lexical relationships.
By recording and analyzing model performance daily, AI vs Connections identifies strengths, weaknesses, and trends in AI reasoning over time. The goal is to measure how effectively different AI systems can mimic human-like language understanding.
Methodology
Each model is given the same set of 16 words from the daily Connections puzzle. We track whether each model correctly identifies all four groups, how quickly it solves the puzzle, and how many partial matches it achieves.
Models are scored based on correctness, speed, and difficulty of the puzzles they solve. Harder puzzles are worth more points, and faster solutions earn bonus points.
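As a rough illustration, the scoring described above can be sketched in a few lines of Python. The function name, point values, difficulty multipliers, and speed-bonus threshold below are hypothetical placeholders, not the exact values used by the site:

```python
# Hypothetical sketch of the scoring described above; the point values,
# difficulty weights, and speed-bonus threshold are illustrative only.

def score_attempt(groups_correct: int, difficulty: float, seconds_taken: float) -> float:
    """Score a single model's attempt at one puzzle.

    groups_correct: number of the four groups identified correctly (0-4)
    difficulty:     difficulty multiplier for the puzzle (e.g. 1.0 = easy, 2.0 = hard)
    seconds_taken:  time the model took to produce its answer
    """
    BASE_POINTS_PER_GROUP = 10      # partial credit for each correct group
    PERFECT_SOLVE_BONUS = 20        # extra points for solving all four groups
    SPEED_BONUS = 10                # bonus for answering quickly
    SPEED_THRESHOLD_SECONDS = 30    # cutoff for the speed bonus

    points = groups_correct * BASE_POINTS_PER_GROUP
    if groups_correct == 4:
        points += PERFECT_SOLVE_BONUS
        if seconds_taken <= SPEED_THRESHOLD_SECONDS:
            points += SPEED_BONUS

    # Harder puzzles are worth more: scale the total by the difficulty multiplier.
    return points * difficulty


# Example: a model that finds all four groups of a hard puzzle in 12 seconds
print(score_attempt(groups_correct=4, difficulty=2.0, seconds_taken=12))  # 140.0
```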
The Models
We test a diverse range of AI language models from multiple providers, including:
- OpenAI: GPT-4.1, GPT-4o, GPT-4o-mini
- Anthropic: Claude 3.7 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
- Google: Gemini 2.5 Pro, Gemini 1.5 Flash, Gemini 1.5 Pro
- Meta: Llama 4 models, Llama 3.3
- Mistral: Mistral Large, Mistral Small, Mistral 7B
- Cohere: Command models
- And models from DeepSeek, Perplexity, and more
We regularly add new models as they become available and occasionally retire older versions.
Support This Project
AI vs Connections is an experiment run as a passion project. While I create and maintain this site for free, there are real expenses involved, including:
- API costs for accessing various AI models
- Server and hosting fees
- Database storage and management
- Domain registration and maintenance
If you find this project interesting or valuable, consider supporting it with a small donation. Every contribution helps keep the servers running and allows me to continue testing new models.
Contact & Contribute
This project was created and is maintained by Jeff Louella. It started as a personal experiment to compare different AI models' abilities on semantic tasks, and has grown into a comprehensive benchmarking platform.
If you'd like to contribute or have questions about the methodology, feel free to reach out.