About AI vs Connections

Project Overview

AI vs Connections is an experimental benchmarking platform that evaluates how well artificial intelligence language models solve The New York Times Connections word puzzle. The project compares the accuracy and reasoning abilities of models like ChatGPT, Claude, Gemini, and others across different puzzle types.

Each Connections puzzle presents 16 words that must be grouped into 4 sets of 4, with each group sharing a hidden thematic link. The task challenges models to detect semantic patterns, cultural references, and lexical relationships.
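The grouping constraint above is simple to state programmatically. The sketch below is illustrative only (the word lists and the `is_valid_grouping` helper are assumptions for this example, not code from the project): it checks that a proposed answer uses each of the 16 puzzle words exactly once across 4 groups of 4.

```python
def is_valid_grouping(puzzle_words, groups):
    """Check that `groups` is 4 disjoint sets of 4 covering all 16 words."""
    if len(groups) != 4 or any(len(g) != 4 for g in groups):
        return False
    flattened = [word for group in groups for word in group]
    # Every puzzle word must appear exactly once across the four groups.
    return sorted(flattened) == sorted(puzzle_words)

# Hypothetical example puzzle: fish, planets, colors, punctuation names.
words = ["BASS", "FLOUNDER", "SOLE", "PIKE",
         "MERCURY", "VENUS", "MARS", "SATURN",
         "RED", "BLUE", "GREEN", "YELLOW",
         "DASH", "DOT", "SLASH", "TILDE"]
groups = [words[i:i + 4] for i in range(0, 16, 4)]
print(is_valid_grouping(words, groups))  # True
```

Note that a grouping can be structurally valid while still being thematically wrong; only the puzzle's answer key determines correctness.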

By recording and analyzing model performance daily, AI vs Connections identifies strengths, weaknesses, and trends in AI reasoning over time. The goal is to measure how effectively different AI systems can mimic human-like language understanding.

Methodology

Each model is given the same set of 16 words from the daily Connections puzzle. We track whether it correctly identifies all four groups, how quickly it solves the puzzle, and how many partial matches it achieves.

Models are scored on correctness, speed, and the difficulty of the puzzles they solve: harder puzzles are worth more points, and faster solutions earn bonus points.
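As a rough illustration of how a scheme like this could be combined into a single score, here is a minimal sketch. The base points, difficulty weighting, and speed-bonus cutoff below are invented for the example; the site's actual formula is not published here.

```python
def score_attempt(groups_correct, difficulty, seconds_taken,
                  base_points=10, speed_bonus=5, speed_cutoff=30.0):
    """Hypothetical scoring: partial credit per correct group, scaled by
    puzzle difficulty, with a bonus for a fast perfect solve."""
    points = groups_correct * base_points * difficulty
    if groups_correct == 4 and seconds_taken <= speed_cutoff:
        points += speed_bonus
    return points

# A perfect solve of a difficulty-2 puzzle in 20 seconds:
print(score_attempt(4, difficulty=2, seconds_taken=20))  # 85
```

The key design point any such scheme shares is that partial matches still earn credit, so a model that finds three groups is distinguishable from one that finds none.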

The Models

We test a diverse range of AI language models from multiple providers, including:

  • OpenAI: GPT-4.1, GPT-4o, GPT-4o-mini
  • Anthropic: Claude 3.7, Claude 3.5 Haiku, Claude 3 Opus
  • Google: Gemini 2.5 Pro, Gemini 1.5 Flash, Gemini 1.5 Pro
  • Meta: Llama 4 models, Llama 3.3
  • Mistral: Mistral Large, Mistral Small, Mistral 7B
  • Cohere: Command models
  • And models from DeepSeek, Perplexity, and more

We regularly add new models as they become available and occasionally retire older versions.

Support This Project

AI vs Connections is an experiment run as a passion project. While I create and maintain this site for free, there are real expenses involved, including:

  • API costs for accessing various AI models
  • Server and hosting fees
  • Database storage and management
  • Domain registration and maintenance

If you find this project interesting or valuable, consider supporting it with a small donation. Every contribution helps keep the servers running and allows me to continue testing new models.

Buy Me A Coffee

Contact & Contribute

This project was created and is maintained by Jeff Louella. It started as a personal experiment to compare different AI models' abilities on semantic tasks, and has grown into a comprehensive benchmarking platform.

If you'd like to contribute or have questions about the methodology, feel free to reach out.

Site Status:

Having some issues with Gemini 2.5 returning results. Working on a solution.