Square Minus Square - A coding agent benchmark
I asked several coding agents to implement the following task:
There are two squares on a 2D plane, possibly overlapping. They are not axis-aligned and have different sizes. Write a function that triangulates the area of the first square minus the area of the intersection. Use the least amount of triangles.
There is a single Rust function to implement, in a standalone file with no dependencies:
pub fn generate(
    center1: [f32; 2], rotation1: f32, size1: f32,
    center2: [f32; 2], rotation2: f32, size2: f32,
) -> Vec<[f32; 2]> {
    // TODO
}
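Any implementation first has to turn each (center, rotation, size) triple into four corner points. Here is a minimal sketch of that step. The helper name `square_corners` is my own, and it assumes that the rotation is in radians, counter-clockwise, and that the size is the side length. The task text does not state either convention, so treat both as assumptions.

```rust
/// Corners of a square in counter-clockwise order.
/// Assumptions (not stated in the task): `rotation` is in radians,
/// counter-clockwise, and `size` is the side length.
pub fn square_corners(center: [f32; 2], rotation: f32, size: f32) -> [[f32; 2]; 4] {
    let (sin, cos) = rotation.sin_cos();
    let h = size * 0.5;
    // Corners in the square's local frame, counter-clockwise.
    let local = [[-h, -h], [h, -h], [h, h], [-h, h]];
    // Rotate each local corner, then translate it to the center.
    local.map(|[x, y]| {
        [
            center[0] + x * cos - y * sin,
            center[1] + x * sin + y * cos,
        ]
    })
}
```

With a rotation of zero this yields the axis-aligned corners, starting at the bottom-left one.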
I made a little framework that displays results. It can capture screenshots and video footage.
Several coding agents were tasked with implementing the function, and I also implemented it myself without AI. The agents were encouraged to generate screenshots and examine them.
I ran the test twice and picked the better result for each agent.
Video capture of the results:
More models:
Some takeaways:
- To date, no LLM has solved the task successfully.
- Nearly all of the models generate screenshots and examine them to fix bugs. They are surprisingly good at it: the top models identify real issues correctly. This highlights the importance of the feedback loop: always provide a way for the agent to check its own work.
- During development, I ran the test several times, and there is no conclusive winner. The best models (Opus, Gemini 3 Pro, GPT 5.2) each came out on top in some runs, but in other runs they generated code that crashes.
- Gemini 3 Flash might seem to have solved the task well, but it adds unnecessary vertices and triangles.
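Screenshots are one kind of feedback loop; a numeric check is another. The sketch below is not part of the benchmark framework, and all of its names are hypothetical. It compares the total triangle area with the area of the first square minus the intersection, which it computes with Sutherland-Hodgman clipping. It assumes that both squares are given as counter-clockwise corner lists and that the output is a flat list with three vertices per triangle.

```rust
type P = [f32; 2];

/// Signed polygon area via the shoelace formula (positive for CCW order).
fn signed_area(poly: &[P]) -> f32 {
    let n = poly.len();
    let mut s = 0.0;
    for i in 0..n {
        let (a, b) = (poly[i], poly[(i + 1) % n]);
        s += a[0] * b[1] - b[0] * a[1];
    }
    0.5 * s
}

/// Cross product of (b - a) and (p - a); positive when p is left of a->b.
fn side(a: P, b: P, p: P) -> f32 {
    (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
}

/// Sutherland-Hodgman: clip `subject` against a convex, CCW `clip` polygon.
fn clip_convex(subject: &[P], clip: &[P]) -> Vec<P> {
    let mut out = subject.to_vec();
    for i in 0..clip.len() {
        let (a, b) = (clip[i], clip[(i + 1) % clip.len()]);
        let input = std::mem::take(&mut out);
        for j in 0..input.len() {
            let (p, q) = (input[j], input[(j + 1) % input.len()]);
            let (sp, sq) = (side(a, b, p), side(a, b, q));
            // Keep p if it lies on the inner side of the clip edge.
            if sp >= 0.0 {
                out.push(p);
            }
            // Edge p->q crosses the clip line: add the intersection point.
            if (sp > 0.0 && sq < 0.0) || (sp < 0.0 && sq > 0.0) {
                let t = sp / (sp - sq);
                out.push([p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])]);
            }
        }
    }
    out
}

/// Sum of unsigned triangle areas, assuming three vertices per triangle.
fn triangles_area(tris: &[P]) -> f32 {
    tris.chunks_exact(3).map(|t| signed_area(t).abs()).sum()
}

/// Area of the first square minus the area of the intersection.
fn expected_area(sq1: &[P], sq2: &[P]) -> f32 {
    signed_area(sq1) - signed_area(&clip_convex(sq1, sq2)).abs()
}

/// True when the triangle list is well-formed and its area matches.
fn area_matches(tris: &[P], sq1: &[P], sq2: &[P], eps: f32) -> bool {
    tris.len() % 3 == 0 && (triangles_area(tris) - expected_area(sq1, sq2)).abs() <= eps
}
```

A matching area is necessary but not sufficient: overlapping or misplaced triangles can add up to the right number, and the check says nothing about whether the triangle count is minimal. It complements the screenshots rather than replacing them.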
Full code on GitHub.