
About AI vs Connections

Project Overview

AI vs Connections tests and compares the ability of various AI language models to solve the popular NY Times Connections puzzle game. We track performance metrics over time to see which models excel at different types of word connection challenges.

The Connections puzzle requires players to categorize 16 words into 4 groups of 4 words each, where each group shares a common theme or connection. This task tests language models' abilities to recognize patterns, understand cultural references, and identify semantic relationships.
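To make the task concrete, here is a minimal sketch of how one puzzle and a model's proposed grouping could be represented and checked. The example words, group labels, and the is_solved helper are illustrative assumptions, not the site's actual code or data.

```python
# Illustrative only: a hypothetical representation of one Connections puzzle.
# The words, group names, and helper below are assumptions, not the site's code.

PUZZLE = {
    "YELLOW": {"BASS", "FLOUNDER", "SALMON", "TROUT"},   # fish
    "GREEN":  {"ANT", "DRILL", "ISLAND", "OPAL"},        # fire ___
    "BLUE":   {"BUCKET", "GUEST", "SHOPPING", "WISH"},   # ___ list
    "PURPLE": {"ANKLE", "CALF", "KNEE", "SHIN"},         # parts of the leg
}

def is_solved(proposed_groups):
    """Return True if the four proposed groups of four words exactly match
    the puzzle's answer groups, regardless of order within or across groups."""
    answers = {frozenset(words) for words in PUZZLE.values()}
    guesses = {frozenset(group) for group in proposed_groups}
    return guesses == answers
```

A model's answer would then be checked by passing its four proposed groups to a function like this, with partially correct groups counted separately.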

Methodology

Each model is given the same set of 16 words from the daily Connections puzzle. We track whether they correctly identify all four groups, how quickly they solve the puzzle, and how many partial matches they achieve.

Models are scored on correctness, speed, and the difficulty of the puzzles they solve. Harder puzzles are worth more points, and faster solutions earn bonus points.
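The exact formula isn't spelled out here, but a rough sketch of how correctness, difficulty weighting, and a speed bonus could combine looks like the following. The point values, the 1-5 difficulty scale, and the speed cutoff are purely illustrative assumptions rather than the scoring actually used on the site.

```python
# Illustrative scoring sketch only; the real formula and weights may differ.

def score_attempt(solved: bool, correct_groups: int, puzzle_difficulty: int,
                  solve_seconds: float, speed_cutoff: float = 60.0) -> float:
    """Combine correctness, puzzle difficulty, and speed into one score.

    correct_groups: number of the four groups identified correctly (0-4).
    puzzle_difficulty: assumed scale of 1 (easy) to 5 (hard).
    solve_seconds: wall-clock time the model took to answer.
    """
    base = correct_groups * 10                     # partial credit per correct group
    if solved:
        base += 20                                 # bonus for a fully correct solve
    difficulty_multiplier = 1 + (puzzle_difficulty - 1) * 0.25
    speed_bonus = max(0.0, speed_cutoff - solve_seconds) * 0.1 if solved else 0.0
    return base * difficulty_multiplier + speed_bonus
```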

The Models

We test a diverse range of AI language models from multiple providers, including:

  • OpenAI: GPT-4.1, GPT-4o, GPT-4o-mini
  • Anthropic: Claude 3.7, Claude 3.5 Haiku, Claude 3 Opus
  • Google: Gemini 2.5 Pro, Gemini 1.5 Flash, Gemini 1.5 Pro
  • Meta: Llama 4 models, Llama 3.3
  • Mistral: Mistral Large, Mistral Small, Mistral 7B
  • Cohere: Command models
  • And models from Deepseek, Perplexity, and more

We regularly add new models as they become available and occasionally retire older versions.

Support This Project

AI vs Connections is an experiment run as a passion project. While I create and maintain this site for free, there are real expenses involved, including:

  • API costs for accessing various AI models
  • Server and hosting fees
  • Database storage and management
  • Domain registration and maintenance

If you find this project interesting or valuable, consider supporting it with a small donation. Every contribution helps keep the servers running and allows me to continue testing new models.

Buy Me A Coffee

Contact & Contribute

This project was created and is maintained by Jeff Louella. It started as a personal experiment to compare different AI models' abilities on semantic tasks, and has grown into a comprehensive benchmarking platform.

If you'd like to contribute or have questions about the methodology, feel free to reach out.