roveflow v0.1
ai-first · open source · flutter

AI-first E2E testing for mobile apps.

Describe your business flows. Agents rove your live app to find bugs, UX flaws, and broken paths.

MIT licensed · self-hostable · Flutter-first
[Demo phone mockup: "Good morning · Book a slot" booking screen with available time slots and a "Confirm booking →" button. Demo surface · no real data]
agent · tool calls
tap(text: "Book appointment") · ok
screenshot() · captured
tap(text: "09:30") · ok
tap(text: "Confirm") · ok
assert_visible("Booked") · pass

Install, init, smoke.

Three commands. No harness to wire up. No selectors to maintain.

cold setup in under 2 minutes
~/flutter-app
$ dart pub global activate roveflow
✓ activated roveflow 0.1.0
 
$ roveflow init
✓ detected Flutter project at .
✓ wrote .claude/skills/roveflow/
✓ wrote .mcp.json · docs/roveflow/scenarios.md
 
next: run /roveflow --only=cold-setup
inside Claude Code.

From flow to result.

Hand the agents a flow. They rove your live app the way users would, finding their own path to satisfy it. Along the way they report the bugs, UX flaws, and broken paths they hit.

flow

You write the flow

Describe what should happen in plain English. Any flow you'd hand a user: book an appointment, sign up a new account, reset a password.

rove

Agents rove your live app

Agents launch your app and find their own path to satisfy the flow. If none works, that's a finding in itself.

result

Findings with evidence

Every scenario returns pass, fail, or an explicit skip, plus the path it took and any bug or UX flaw it hit along the way. With a recording.
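In scenario form, the booking flow from the demo might look like this — a sketch using the `id` / `goal` / `waypoints` / `pass` / `fail` shape shown in the Anatomy section below; the concrete strings are illustrative, not generated output:

```yaml
# Hypothetical scenario — field names follow the schema shown on this page.
id: book-appointment
goal: Book an appointment and reach the confirmation screen.
waypoints: # optional hints; agents may find their own path
  - tap_text: "Book a slot"
  - tap_text: "09:30"
  - tap_text: "Confirm booking"
pass: confirmation screen visible
fail: no booking is created
```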

Your users don't follow the script.

Real users take paths you didn't write a test for. Roving agents find them first.

The limits of scripts

Scripted suites

  • Only cover paths you thought to write
  • Break when a button or route moves
  • Silent on the paths users actually take
  • Grow expensive to maintain
The roving advantage

Roving agents

  • Cover the paths users actually take
  • Adapt when the UI changes
  • Surface edge cases before ship
  • Stay useful as the app grows

Different modes for different use cases.

Same engine, different briefing. The less you tell the agents, the more autonomously they rove.

| Mode | What you tell the agents | What they report | Status |
|------|--------------------------|------------------|--------|
| Smoke | Book an appointment for next Tuesday at 10am. Hints: New booking → 10am slot → Confirm. Expect the confirmation screen. | pass / fail / skipped per scenario | Available |
| UX emulator | You're a first-time user trying to book an appointment. Report where you get confused or stuck. | usability findings | Coming next |
| Coverage expander | Visit every screen reachable from home. Check each one. Report anything broken. | screen inventory with health status | Coming next |
| Chained flows | Book an appointment, reschedule it to the next day, then cancel it. No hints. Find each step on your own. | pass / fail against the flow | Coming next |
| Crash hunter | Use the app however you like. Report anything that crashes, freezes, or locks up. | crashes with reproduction steps | Coming next |

Down the table: less direction, more autonomy.
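In scenario form, the autonomy gradient is roughly how much you put into `waypoints` — a sketch contrasting the two ends of the table, using the scenario shape from the Anatomy section; anything beyond that shape is an assumption:

```yaml
# Hypothetical scenarios — same schema, two autonomy levels.
# Smoke: full hints, lowest autonomy.
id: book-tuesday-10am
goal: Book an appointment for next Tuesday at 10am.
waypoints:
  - tap_text: "New booking"
  - tap_text: "10am"
  - tap_text: "Confirm"
pass: confirmation screen visible
---
# Chained flow: no hints — agents find each step on their own.
id: book-reschedule-cancel
goal: Book an appointment, reschedule it to the next day, then cancel it.
pass: the appointment no longer appears after cancellation
fail: any step cannot be completed
```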

Anatomy of a rove.

You write a flow. Agents run it. You get a structured result. Replay any failing step.

01Prompt

You write the flow.

A short YAML block. Plain English for the flow and its pass/fail conditions. No selectors.

docs/roveflow/scenarios.md · yaml
id: open-detail
goal: Navigate from home to the detail screen.
waypoints: # optional
  - tap_text: "Open Detail"
  - reach_screen: "Detail"
pass: "Detail" app-bar title visible
fail: navigation does not occur
02Rove

Agents take the controls.

Agents tap, scroll, and check the UI on your live app.

[Demo phone mockup: "Good morning · Book a slot" booking screen with available time slots and a "Confirm booking →" button. Demo surface · no real data]
agent · tool calls
tap(text: "Book appointment") · ok
screenshot() · captured
tap(text: "09:30") · ok
tap(text: "Confirm") · ok
assert_visible("Booked") · pass
03Report

You get a structured result.

pass, fail, or an explicit skip per scenario. Every run saves screenshots, notes, and a trace to docs/roveflow/runs/.

docs/roveflow/runs/latest/report.md · md
# Roveflow smoke · 2026-04-17T10:32Z
| Scenario | Result | Duration |
|---------------|---------------------|----------|
| cold-setup | ✓ pass | 00:18 |
| open-detail | ✓ pass | 00:21 |
| back-to-home | ⊘ skipped: no_data | 00:12 |
## Result states
- pass · fail
- skipped: no_data · setup_failed · setup_lost
Every state is actionable. No scenario ever "just fails".
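Because the report is plain markdown, gating CI on it takes only a few lines — a sketch assuming the three-column table layout shown above; a real report may carry more columns:

```python
def failing_scenarios(report_md: str) -> list[str]:
    """Return scenario ids whose Result column is neither pass (✓) nor skip (⊘)."""
    failures = []
    for line in report_md.splitlines():
        # Split table rows: | scenario | result | duration |
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the |---| separator row.
        if len(cells) == 3 and cells[0] != "Scenario" and not set(cells[0]) <= {"-"}:
            result = cells[1]
            if not (result.startswith("✓") or result.startswith("⊘")):
                failures.append(cells[0])
    return failures

report = """\
| Scenario      | Result              | Duration |
|---------------|---------------------|----------|
| cold-setup    | ✓ pass              | 00:18    |
| open-detail   | ✗ fail              | 00:21    |
| back-to-home  | ⊘ skipped: no_data  | 00:12    |
"""
print(failing_scenarios(report))  # → ['open-detail']
```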
04Replay

Record today. Replay next.

Every session records to disk. Walk back through the steps and see what the agent saw. Deterministic replays come next.

/roveflow --only=open-detail
roveflow-runner → open-detail
▸ tap_text("Open Detail")
waypoint · tap_text: Open Detail
▸ reach_screen check
waypoint · reach_screen: Detail
▸ navigate_back
ended_at_home: true
 
result: pass · 3 tool calls · 00:21
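With the session on disk, replay is mostly mechanical: read the recorded tool calls back and re-issue them. A sketch of the idea with an invented JSON-Lines trace format — Roveflow's actual on-disk format isn't specified here:

```python
import json

# Invented trace format — one recorded tool call per line (JSON Lines).
TRACE = """
{"tool": "tap_text", "arg": "Open Detail", "result": "ok"}
{"tool": "reach_screen", "arg": "Detail", "result": "ok"}
{"tool": "navigate_back", "arg": null, "result": "ended_at_home"}
"""

def replay(trace: str, driver) -> list[str]:
    """Re-issue each recorded call against a driver and collect the results."""
    results = []
    for line in trace.strip().splitlines():
        step = json.loads(line)
        results.append(driver(step["tool"], step["arg"]))
    return results

# A stub driver that just echoes calls; a real one would talk to the live app.
log = replay(TRACE, lambda tool, arg: f"{tool}({arg!r})")
print(log)
```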

Write the flow. Skip the scaffolding.

Scripted tests couple to every key and label. Rename a button, rename a test. Scenarios describe the flow instead.

before · Flutter integration test
integration_test/open_detail_test.dart · dart
// integration_test/open_detail_test.dart
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:nav_app/main.dart';

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('open the detail screen', (tester) async {
    await tester.pumpWidget(const NavApp());
    await tester.pumpAndSettle();
    expect(find.byKey(const ValueKey('home')), findsOneWidget);

    await tester.tap(find.byKey(const ValueKey('open_detail_button')));
    await tester.pumpAndSettle();
    expect(find.byKey(const ValueKey('detail_body_text')), findsOneWidget);
    expect(find.text('Detail'), findsOneWidget);
    // Renaming any key above breaks this test.
  });
}
after · Roveflow scenario
docs/roveflow/scenarios.md · yaml
id: open-detail
goal: Navigate from home to the detail screen.
waypoints: # optional
  - tap_text: "Open Detail"
  - reach_screen: "Detail"
pass: "Detail" visible
fail: navigation does not occur

Rename “Open Detail” to “View More”. The scenario still passes. The integration test doesn't.

Ship faster. Break nothing.

CLI in your terminal. Slash command in Claude Code. A report you can read without grepping.

~/flutter-app
$ dart pub global activate roveflow
✓ activated roveflow 0.1.0
 
$ roveflow init
✓ detected Flutter project at .
✓ wrote .claude/skills/roveflow/ · commands · agents
✓ wrote .mcp.json · docs/roveflow/scenarios.md
 
next: run /roveflow --only=cold-setup
inside Claude Code.

One activate. One init. Zero source edits.

roveflow init lays down the Claude Code skill, agent, MCP config, and scenarios file. Everything regenerable.
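The generated `.mcp.json` follows Claude Code's standard MCP server config shape (`mcpServers` mapping server names to launch commands). A sketch only — the server name and command here are assumptions, not Roveflow's actual output:

```json
{
  "mcpServers": {
    "roveflow": {
      "command": "roveflow",
      "args": ["mcp-serve"]
    }
  }
}
```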

Already handled.

The first questions teams ask about AI-driven testing, each with Roveflow's answer.

Isn't AI testing too slow?

the answer

Each scenario runs in 30 to 60 seconds. Scenarios run in parallel across devices, capped at the device count, so what matters is wall-clock time across the suite, not per-scenario latency. Lightweight agents keep per-scenario latency and cost low.
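Capping parallelism at the device count is the classic semaphore pattern — a sketch of the scheduling idea, not Roveflow's implementation:

```python
import asyncio

async def run_scenario(name: str, devices: asyncio.Semaphore) -> str:
    # Each scenario holds one device for its whole run.
    async with devices:
        await asyncio.sleep(0.01)  # stand-in for a 30-60s agent run
        return f"{name}: pass"

async def run_suite(scenarios: list[str], device_count: int) -> list[str]:
    # Never run more scenarios at once than there are devices.
    devices = asyncio.Semaphore(device_count)
    return await asyncio.gather(*(run_scenario(s, devices) for s in scenarios))

results = asyncio.run(run_suite(["cold-setup", "open-detail", "back-to-home"], device_count=2))
print(results)
```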

Won't this burn through tokens?

the answer

Pick the model. Haiku keeps runs cheap. Sonnet or Opus take on the hard flows. The skill gives agents enough context that smaller models hold up. Bring your own Anthropic key. No markup.

What if the model silently changes?

the answer

Pin model versions per run. Recorded replays come next, so a run's evidence survives a model bump.

Will results flip between runs?

the answer

Every run takes a fresh path, the way a real user would. That's the whole point: it finds what a fixed script cannot.

Start roving.

Install the CLI. Point it at your mobile app. Catch the bugs no one wrote a test for.

$dart pub global activate roveflow
MIT licensed · self-hostable · Flutter-first