Rails testing on autopilot: Building an agent that writes what developers won't

By Mistral May 28, 2026

In most large Rails monoliths, organizations prioritize writing new features over writing tests for them. Over time, more and more code goes untested, and teams spend a growing share of their time debugging.

An autonomous agent automatically generates and improves RSpec tests for Ruby on Rails applications, tackling the prevalent problem of untested code. Built on Mistral’s Vibe, the agent uses context engineering, specialized skills, and custom tools built around RuboCop and SimpleCov to ensure its tests are syntactically correct, stylistically sound, and achieve high code coverage. In an experiment on a real-world codebase, aggregate test quality scores rose from 0.49 to 0.74 and the target repository reached 100% coverage.
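The validation step described above can be pictured as a small gate that a generated spec must pass before it is accepted. The following is a minimal sketch, not the agent's actual tooling: the `Check` struct, `validate_spec`, and the injectable `runner` and `coverage_reader` are hypothetical names introduced here, and the exact commands the agent runs are not given in the summary. It also assumes SimpleCov writes its overall line coverage to `coverage/.last_run.json`, which reflects the whole run rather than a single file.

```ruby
require "json"
require "open3"

# Outcome of one validation step (hypothetical structure for this sketch).
Check = Struct.new(:name, :passed, :detail)

# Default runner: executes a command and returns [combined output, success?].
SHELL_RUNNER = lambda do |*cmd|
  output, status = Open3.capture2e(*cmd)
  [output, status.success?]
end

# Reads the overall line coverage recorded by the last SimpleCov run.
# Assumption: recent SimpleCov versions store it under result.line and
# older ones under result.covered_percent; returns nil if the file is absent.
def last_run_line_coverage(path = "coverage/.last_run.json")
  return nil unless File.exist?(path)

  data = JSON.parse(File.read(path))
  data.dig("result", "line") || data.dig("result", "covered_percent")
end

# Runs lint, tests, and the coverage check in order and returns every result,
# so failures can be fed back to the agent for another attempt.
def validate_spec(spec_path, min_coverage: 100.0, runner: SHELL_RUNNER,
                  coverage_reader: method(:last_run_line_coverage))
  checks = []

  lint_output, lint_ok = runner.call("bundle", "exec", "rubocop", spec_path)
  checks << Check.new(:rubocop, lint_ok, lint_output)

  test_output, test_ok = runner.call("bundle", "exec", "rspec", spec_path)
  checks << Check.new(:rspec, test_ok, test_output)

  coverage = coverage_reader.call
  covered = !coverage.nil? && coverage >= min_coverage
  checks << Check.new(:coverage, covered, "line coverage: #{coverage.inspect}")

  checks
end
```

A caller could collect the failures with `validate_spec("spec/models/user_spec.rb").reject(&:passed)` and hand their `detail` text back to the agent. Making the runner injectable keeps the gate easy to exercise without a Rails project on disk.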

Organizations often prioritize new features over writing tests in large Rails monoliths, leading to increased debugging time.
An autonomous agent was built using Vibe to automatically generate and improve RSpec tests for Rails codebases.
The agent reads Rails source files, generates or improves RSpec tests, validates them against style rules and coverage targets, and runs in CI/CD without human intervention.
It handles different Rails file types (models, serializers, controllers, mailers, helpers) with distinct testing strategies.
The agent leverages factories and fixtures for test data, creating or reusing them as needed.
Context engineering via an AGENTS.md file provided step-by-step instructions and best practices to the agent.
Specialized SKILLS files were created for the different file types so that the agent receives precise testing instructions for each.
Custom tools were implemented: RuboCop for linting, and SimpleCov integrated with RSpec for checking coverage and correctness.
An LLM-as-a-judge scored test quality, though limitations were noted, including non-determinism and the ‘missing parenthesis problem’ (syntax errors).
An experiment showed the agent improved aggregate test quality scores from 0.49 to 0.74 and achieved 100% coverage on a target repository.
Running the generated tests with RSpec and SimpleCov as a final step was a critical decision, because it ensures the tests actually execute.
Vibe, the open-source platform for the agent, is available for use.
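The per-file-type strategies in the points above suggest a routing step that decides which instructions apply to a given source file. The sketch below is one way this might look, under stated assumptions: the skill names in `SKILLS_BY_DIR`, the `TestPlan` struct, and `plan_for` are hypothetical and do not come from the article, and the spec path simply mirrors the `app/` layout under `spec/`, a common RSpec convention that a given codebase may not follow (for example, controllers are often covered by request specs instead).

```ruby
# Hypothetical mapping from a Rails app/ subdirectory to the skill that
# would hold its testing instructions. The names are illustrative only.
SKILLS_BY_DIR = {
  "models" => "model_specs",
  "serializers" => "serializer_specs",
  "controllers" => "controller_specs",
  "mailers" => "mailer_specs",
  "helpers" => "helper_specs"
}.freeze

# What the agent would need to know before writing a spec for one file.
TestPlan = Struct.new(:source_path, :kind, :skill, :spec_path)

# Builds a plan for a source file, or returns nil when the file is not
# one of the supported Rails file types.
def plan_for(source_path)
  match = %r{\Aapp/(?<dir>[^/]+)/(?<rest>.+)\.rb\z}.match(source_path)
  return nil unless match

  skill = SKILLS_BY_DIR[match[:dir]]
  return nil unless skill

  # Assumed convention: app/models/user.rb -> spec/models/user_spec.rb
  spec_path = "spec/#{match[:dir]}/#{match[:rest]}_spec.rb"
  TestPlan.new(source_path, match[:dir], skill, spec_path)
end
```

For instance, `plan_for("app/mailers/welcome_mailer.rb")` yields a plan whose `skill` is `"mailer_specs"`, while a file outside the supported directories yields `nil` and would be skipped.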

Reference: https://foxvector.com/articles/59ef6469-d557-4cdc-9ff3-d92260c0bdda
