Best Practices

Maximize the value of your experiments with proven strategies.

Overview

Experimentation is a skill. These best practices help you avoid common pitfalls and get reliable results.

Planning Experiments

Start with a Hypothesis

Don't just "try something different." Formulate a clear hypothesis:

| Weak | Strong |
|------|--------|
| "Let's test a new design" | "Reducing screens from 5 to 3 will increase completion by 10%" |
| "Maybe shorter is better" | "Users abandon at screen 4; removing it will reduce drop-offs" |

A hypothesis gives you something specific to validate or invalidate.

Test One Variable

Change one thing at a time. If you modify copy, design, and flow simultaneously, you won't know which change caused the result.

| Good | Bad |
|------|-----|
| Change CTA text only | Change CTA text, color, and position |
| Add one screen | Add screen, new element type, different theme |

Define Success Upfront

Decide what "winning" means before you start:

  • Primary metric: Completion rate
  • Minimum improvement: 5% lift
  • Secondary metrics: Error rate, time spent

This prevents moving goalposts after seeing results.
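
One way to keep these criteria honest is to write them down as data before launch. The sketch below is illustrative only; `SuccessCriteria` and the metric names are hypothetical, not part of any SDK.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SuccessCriteria:
    """Success criteria agreed on before the experiment goes live."""
    primary_metric: str                  # the single metric that picks the winner
    minimum_lift: float                  # smallest relative improvement worth shipping
    secondary_metrics: list[str] = field(default_factory=list)

# The criteria from the list above, recorded before launch
onboarding_test = SuccessCriteria(
    primary_metric="completion_rate",
    minimum_lift=0.05,                   # 5% lift
    secondary_metrics=["error_rate", "time_spent"],
)
```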

Running Experiments

Let Experiments Run Their Course

Don't stop early just because one variant is ahead. Statistical significance requires sufficient data.

Signs you can stop:

  • Dashboard shows significance reached
  • Both variants have 1,000+ users
  • Experiment has run for at least one full week

Signs you need more time:

  • Results are close (within 5%)
  • Sample size is small
  • Results fluctuate day to day
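
If you want a rough sanity check outside the dashboard, a standard two-proportion z-test is a reasonable sketch. The counts below are hypothetical, and this is not necessarily the exact method your dashboard uses.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two completion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: 780/1,000 vs 710/1,000 completions -> p well below 0.05
print(two_proportion_p_value(780, 1000, 710, 1000))
```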

Variants Are Locked During Experiments

Once an experiment is live, you cannot edit the variant Stories. This is by design: mixing data from different versions would invalidate your results.

If you need to make changes:

  1. Pause or end the current experiment
  2. Edit the Story
  3. Launch a new experiment

Plan your variants carefully before going live to avoid restarting experiments.

Monitor for Problems

Check daily for:

  • Error spikes: A variant might have bugs
  • Extreme drop-offs: Something is broken
  • Analytics gaps: Events not tracking properly

Catch issues early before they affect too many users.
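
A daily check can be as simple as comparing error rates between arms. The sketch below is illustrative; the numbers and the 2x threshold are arbitrary choices, not recommended values.

```python
def error_spike(control_error_rate: float, variant_error_rate: float,
                threshold: float = 2.0) -> bool:
    """Flag a variant whose error rate is well above the control's.
    The 2x threshold is an arbitrary choice for illustration."""
    return variant_error_rate > threshold * control_error_rate

# Hypothetical daily readout: 1.2% vs 0.4% errors -> investigate before scaling up
if error_spike(control_error_rate=0.004, variant_error_rate=0.012):
    print("Variant error rate is spiking; check the variant for bugs")
```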

Traffic Allocation

Start Conservative

Begin with small experiment traffic (10-20%) when testing significant changes. Scale up as confidence grows.

| Change Type | Starting Traffic |
|-------------|------------------|
| Minor copy change | 30-50% |
| Design overhaul | 10-20% |
| New flow | 10-15% |
| Risky change | 5-10% |

Balance Speed vs Risk

More traffic means faster results, but also higher exposure if something goes wrong.

| Traffic | Pros | Cons |
|---------|------|------|
| 10% | Low risk | Slow results |
| 30% | Balanced | Moderate risk |
| 50% | Fast results | High exposure |

Avoid Traffic Starvation

Each variant needs enough traffic for meaningful data. With 1,000 daily users:

| Allocation | Daily Users (Smaller Variant) | Time to 1,000 |
|------------|-------------------------------|---------------|
| 50/50 | 500 | 2 days |
| 80/20 | 200 | 5 days |
| 95/5 | 50 | 20 days |

Don't spread traffic too thin across many variants.
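
The table's arithmetic generalizes to any split. A minimal sketch, assuming 1,000 daily users and a per-variant target of 1,000:

```python
import math

def days_to_target(daily_users: int, variant_share: float, target: int = 1000) -> int:
    """Days until the smaller variant collects `target` users."""
    return math.ceil(target / (daily_users * variant_share))

# Reproduces the table above for 1,000 daily users
for split, share in [("50/50", 0.50), ("80/20", 0.20), ("95/5", 0.05)]:
    print(split, days_to_target(1000, share), "days")
```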

Analyzing Results

Wait for Significance

The #1 mistake is concluding too early. Apparent differences often disappear with more data.
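
To know how long "long enough" is, estimate the required sample size before launch. The sketch below uses the standard two-proportion approximation; the 60% baseline and 5% relative lift are assumptions chosen for illustration.

```python
from statistics import NormalDist

def required_sample_size(p_base: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant to detect a relative lift in a
    baseline conversion rate with a two-sided two-proportion z-test."""
    p_var = p_base * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return int(variance * (z_alpha + z_power) ** 2 / (p_var - p_base) ** 2) + 1

# Assumed 60% baseline completion and the 5% minimum lift from earlier:
# roughly 4,000+ users per variant before the result is trustworthy
print(required_sample_size(0.60, 0.05))
```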

Consider Context

Results can be affected by:

  • Day of week (business vs consumer apps)
  • Seasonality (holiday behavior differs)
  • External events (marketing campaigns, PR)
  • Platform differences (iOS vs Android)

Look Beyond Primary Metrics

A "winning" variant might have hidden costs:

| Primary Metric | Check Also |
|----------------|------------|
| Higher completion | Error rate, time spent |
| More interactions | Frustration signals, support tickets |
| Faster completion | Did users skip content? |
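
One way to make these checks routine is to treat secondary metrics as guardrails the winner must also pass. The metric names and thresholds below are hypothetical.

```python
def passes_guardrails(variant: dict, control: dict,
                      max_error_increase: float = 0.0,
                      max_time_increase: float = 0.10) -> bool:
    """A winner on the primary metric must also hold its guardrails.
    Metric names and thresholds here are illustrative, not recommendations."""
    error_ok = variant["error_rate"] <= control["error_rate"] + max_error_increase
    time_ok = variant["avg_time"] <= control["avg_time"] * (1 + max_time_increase)
    return error_ok and time_ok

# Hypothetical readout: completion went up, but errors doubled -> not a clean win
control = {"error_rate": 0.01, "avg_time": 95}
variant = {"error_rate": 0.02, "avg_time": 90}
print(passes_guardrails(variant, control))   # False
```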

Common Mistakes to Avoid

Confirmation Bias

Don't interpret results to match your expectations. Let data speak.

Problem: "The new design is clearly better" (when results are not significant) Solution: Use objective criteria defined upfront

HiPPO Effect

(Highest Paid Person's Opinion)

Problem: Leadership likes variant B, so you stop the experiment early
Solution: Let experiments reach significance regardless of preferences

Survivorship Bias

Problem: Analyzing only completed users, ignoring those who dropped off
Solution: Include all users in your analysis
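
A small illustration with made-up numbers shows how the denominator changes the story:

```python
# Hypothetical per-user records: (completed, seconds spent in the flow)
users = [(True, 40), (True, 55), (True, 35), (False, 120), (False, 90)]

completers_only = [t for done, t in users if done]
all_users = [t for _, t in users]

print(sum(completers_only) / len(completers_only))   # ~43s: looks quick, but biased
print(sum(all_users) / len(all_users))               # 68s across everyone assigned
```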

Multiple Testing Problem

Problem: Testing 10 variants and declaring the best one a winner
Solution: Use appropriate statistical corrections for multiple comparisons
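
The Holm-Bonferroni procedure is one common correction. A minimal sketch with illustrative p-values:

```python
def holm_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm-Bonferroni correction: which comparisons against the control
    remain significant after testing several variants."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    significant = [False] * len(p_values)
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (len(p_values) - rank):
            break                          # stop rejecting at the first failure
        significant[i] = True
    return significant

# Ten variant-vs-control p-values: only the strongest result survives correction
p_vals = [0.004, 0.030, 0.041, 0.22, 0.35, 0.48, 0.51, 0.63, 0.74, 0.90]
print(holm_significant(p_vals))
```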

Experiment Cadence

For Growing Apps

Run continuous experiments:

  1. Finish one experiment
  2. Apply learnings
  3. Start the next experiment

Compound small improvements over time.

For Stable Apps

Run occasional experiments:

  1. Quarterly reviews of performance
  2. Test when metrics decline
  3. Test new features before full rollout

Don't fix what isn't broken.

Documentation

Keep records of all experiments:

```markdown
## Experiment: Shorter Onboarding
**Dates:** Jan 5-19, 2026
**Hypothesis:** Reducing to 3 screens increases completion
**Traffic:** 50% control, 50% variant
**Result:** 78% vs 71% (significant, p < 0.05)
**Decision:** Promoted 3-screen version
**Learnings:** Users prefer brevity; detail can come later
```

This institutional knowledge prevents:

  • Re-testing the same ideas
  • Repeating past mistakes
  • Losing context when team members change

Quick Reference

| Do | Don't |
|----|-------|
| Formulate a hypothesis | "Just try something" |
| Test one variable | Change multiple things |
| Define success metrics | Move goalposts |
| Wait for significance | Stop early |
| Start conservative | 50% on risky changes |
| Document everything | Rely on memory |

For Developers: See A/B Testing Analytics for tracking experiment events.