AI-first E2E testing for mobile apps.
Describe your business flows. Agents rove your live app to find bugs, UX flaws, and broken paths.
Install, init, smoke.
Three commands. No harness to wire up. No selectors to maintain.
```shell
$ dart pub global activate roveflow
✓ activated roveflow 0.1.0
$ roveflow init
✓ detected Flutter project at .
✓ wrote .claude/skills/roveflow/
✓ wrote .mcp.json · docs/roveflow/scenarios.md
next: run /roveflow --only=cold-setup inside Claude Code.
```
From flow to result.
Hand the agents a flow. They rove your live app the way users would, finding their own path to satisfy it. Along the way they report the bugs, UX flaws, and broken paths they hit.
You write the flow
Describe what should happen in plain English. Any flow you'd hand a user: book an appointment, sign up a new account, reset a password.
Agents rove your live app
Agents launch your app and find their own path to satisfy the flow. If none works, that's a finding in itself.
Findings with evidence
Every scenario returns pass, fail, or an explicit skip, plus the path it took and any bug or UX flaw it hit along the way. With a recording.
Your users don't follow the script.
Real users take paths you didn't write a test for. Roving agents find them first.
Scripted suites
- Only cover paths you thought to write
- Break when a button or route moves
- Silent on the paths users actually take
- Grow expensive to maintain
Roving agents
- Cover the paths users actually take
- Adapt when the UI changes
- Surface edge cases before ship
- Stay useful as the app grows
Different modes for different use cases.
Same engine, different briefing. The less you tell the agents, the more autonomously they rove.
| Mode | What you tell the agents | What they report | Status |
|---|---|---|---|
| Smoke | “Book an appointment for next Tuesday at 10am. Hints: New booking → 10am slot → Confirm. Expect the confirmation screen.” | pass / fail / skipped per scenario | Available |
| UX emulator | “You're a first-time user trying to book an appointment. Report where you get confused or stuck.” | usability findings | Coming next |
| Coverage expander | “Visit every screen reachable from home. Check each one. Report anything broken.” | screen inventory with health status | Coming next |
| Chained flows | “Book an appointment, reschedule it to the next day, then cancel it. No hints. Find each step on your own.” | pass / fail against the flow | Coming next |
| Crash hunter | “Use the app however you like. Report anything that crashes, freezes, or locks up.” | crashes with reproduction steps | Coming next |
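A chained-flows briefing could reuse the same scenario shape smoke mode uses — a sketch, assuming the `id` / `goal` / `pass` / `fail` keys carry over to the coming mode (this scenario is hypothetical, not shipped syntax):

```yaml
# Hypothetical chained-flows scenario; keys mirror the smoke-mode format.
id: book-reschedule-cancel
goal: Book an appointment, reschedule it to the next day, then cancel it.
# no waypoints: agents find each step on their own
pass: the appointment no longer appears after cancellation
fail: any step cannot be completed
```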
Down the table: less direction, more autonomy.
Anatomy of a rove.
You write a flow. Agents run it. You get a structured result. Replay any failing step.
You write the flow.
A short YAML block. Plain English for the flow and its pass/fail conditions. No selectors.
```yaml
id: open-detail
goal: Navigate from home to the detail screen.
waypoints: # optional
  - tap_text: "Open Detail"
  - reach_screen: "Detail"
pass: "Detail" app-bar title visible
fail: navigation does not occur
```
Agents take the controls.
Agents tap, scroll, and check the UI on your live app.
You get a structured result.
pass, fail, or an explicit skip per scenario. Every run saves screenshots, notes, and a trace to docs/roveflow/runs/.
```markdown
# Roveflow smoke · 2026-04-17T10:32Z

| Scenario     | Result              | Duration |
|--------------|---------------------|----------|
| cold-setup   | ✓ pass              | 00:18    |
| open-detail  | ✓ pass              | 00:21    |
| back-to-home | ⊘ skipped: no_data  | 00:12    |

## Result states
- pass · fail
- skipped: no_data · setup_failed · setup_lost
```
Every state is actionable. No scenario ever "just fails".
Record today. Replay next.
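Because every run writes a markdown report, a CI job can gate on it. A minimal sketch in Python, assuming the table layout above — scenario name in the first column, result in the second; the `✗` marker and the function itself are illustrative, not part of the Roveflow CLI:

```python
# Hypothetical CI gate: scan a Roveflow run report (markdown table) and
# collect scenarios whose result is an explicit fail. Skipped scenarios
# (no_data, setup_failed, setup_lost) do not break the build.
def failing_scenarios(report_text: str) -> list[str]:
    failing = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # prose, headings, blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        # "✓ pass" -> "pass", "⊘ skipped: no_data" -> "skipped", "✗ fail" -> "fail"
        result = cells[1].lstrip("✓✗⊘").strip().split(":")[0].strip()
        if result == "fail":
            failing.append(cells[0])
    return failing
```

Exit non-zero from the CI step whenever the returned list is non-empty.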
Every session records to disk. Walk back through the steps and see what the agent saw. Deterministic replays come next.
```
roveflow-runner → open-detail
▸ tap_text("Open Detail")
  waypoint · tap_text: Open Detail
▸ reach_screen check
  waypoint · reach_screen: Detail
▸ navigate_back
  ended_at_home: true
result: pass · 3 tool calls · 00:21
```
Write the flow. Skip the scaffolding.
Scripted tests couple to every key and label. Rename a button, rename a test. Scenarios describe the flow instead.
```dart
// integration_test/open_detail_test.dart
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:nav_app/main.dart';

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('open the detail screen', (tester) async {
    await tester.pumpWidget(const NavApp());
    await tester.pumpAndSettle();
    expect(find.byKey(const ValueKey('home')), findsOneWidget);

    await tester.tap(find.byKey(const ValueKey('open_detail_button')));
    await tester.pumpAndSettle();
    expect(find.byKey(const ValueKey('detail_body_text')), findsOneWidget);
    expect(find.text('Detail'), findsOneWidget);
    // Renaming any key above breaks this test.
  });
}
```

```yaml
id: open-detail
goal: Navigate from home to the detail screen.
waypoints: # optional
  - tap_text: "Open Detail"
  - reach_screen: "Detail"
pass: "Detail" visible
fail: navigation does not occur
```

Rename “Open Detail” to “View More”. The scenario still passes. The Dart test doesn't.
Ship faster. Break nothing.
CLI in your terminal. Slash command in Claude Code. A report you can read without grepping.
```shell
$ dart pub global activate roveflow
✓ activated roveflow 0.1.0
$ roveflow init
✓ detected Flutter project at .
✓ wrote .claude/skills/roveflow/ · commands · agents
✓ wrote .mcp.json · docs/roveflow/scenarios.md
next: run /roveflow --only=cold-setup inside Claude Code.
```
One activate. One init. Zero source edits.
roveflow init lays down the Claude Code skill, agent, MCP config, and scenarios file. Everything regenerable.
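Pieced together from the init output and the run paths mentioned above, the generated layout looks roughly like this (the comments and the exact contents of the skill directory are assumptions):

```
.claude/
  skills/roveflow/     # skill, commands, agents
.mcp.json              # MCP server config
docs/roveflow/
  scenarios.md         # your flows, in scenario YAML
  runs/                # screenshots, notes, and traces per run
```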
Already handled.
The first questions teams ask about AI-driven testing, each with Roveflow's answer.
Isn't AI testing too slow?
Each scenario runs in 30 to 60 seconds. Scenarios run in parallel across devices, capped at the device count. Lightweight agents keep per-scenario latency and cost low, so the suite optimizes for wall-clock time overall.
Won't this burn through tokens?
Pick the model. Haiku keeps runs cheap. Sonnet or Opus take on the hard flows. The skill gives agents enough context that smaller models hold up. Bring your own Anthropic key. No markup.
What if the model silently changes?
Pin model versions per run. Recorded replays come next, so a run's evidence survives a model bump.
Will results flip between runs?
Every run takes a fresh path, the way a real user would. That's the whole point: it finds what a fixed script cannot.
Start roving.
Install the CLI. Point it at your mobile app. Catch the bugs no one wrote a test for.