Best Practices
Maximize the value of your experiments with proven strategies.
Overview
Experimentation is a skill. These best practices help you avoid common pitfalls and get reliable results.
Planning Experiments
Start with a Hypothesis
Don't just "try something different." Formulate a clear hypothesis:
| Weak | Strong |
|---|---|
| "Let's test a new design" | "Reducing screens from 5 to 3 will increase completion by 10%" |
| "Maybe shorter is better" | "Users abandon at screen 4; removing it will reduce drop-offs" |
A hypothesis gives you something specific to validate or invalidate.
Test One Variable
Change one thing at a time. If you modify copy, design, and flow simultaneously, you won't know which change caused the result.
| Good | Bad |
|---|---|
| Change CTA text only | Change CTA text, color, and position |
| Add one screen | Add screen, new element type, different theme |
Define Success Upfront
Decide what "winning" means before you start:
- Primary metric: Completion rate
- Minimum improvement: 5% lift
- Secondary metrics: Error rate, time spent
This prevents moving goalposts after seeing results.
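If you track these criteria outside the dashboard, it can help to write them down as data before launch. A minimal sketch in Python (the `SuccessCriteria` class, `clears_threshold` helper, and metric names are illustrative, not part of any product API):

```python
from dataclasses import dataclass

@dataclass
class SuccessCriteria:
    """Success criteria agreed on before the experiment starts."""
    primary_metric: str            # e.g. "completion_rate"
    min_lift: float                # minimum relative improvement worth acting on
    secondary_metrics: list[str]   # guardrails to watch alongside the primary metric

def clears_threshold(control_rate: float, variant_rate: float, criteria: SuccessCriteria) -> bool:
    """True only if the variant beats control by the pre-agreed minimum lift."""
    lift = (variant_rate - control_rate) / control_rate
    return lift >= criteria.min_lift

criteria = SuccessCriteria(
    primary_metric="completion_rate",
    min_lift=0.05,  # 5% relative lift
    secondary_metrics=["error_rate", "time_spent"],
)

# 0.71 -> 0.78 is roughly a 9.9% relative lift, so this prints True.
print(clears_threshold(0.71, 0.78, criteria))
```

Clearing the threshold is only half the decision; the difference also has to reach statistical significance before you act on it.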
Running Experiments
Let Experiments Run Their Course
Don't stop early just because one variant is ahead. Statistical significance requires sufficient data.
Signs you can stop:
- Dashboard shows significance reached
- Both variants have 1,000+ users
- Experiment has run for at least one full week
Signs you need more time:
- Results are close (within 5%)
- Sample size is small
- Results fluctuate day to day
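If you prefer to double-check significance on exported numbers rather than rely solely on the dashboard, a two-proportion z-test is one common approach. A minimal sketch using `statsmodels`, with made-up counts:

```python
# Requires: pip install statsmodels
from statsmodels.stats.proportion import proportions_ztest

completions = [780, 710]    # users who completed: variant, control (made-up numbers)
exposed     = [1000, 1000]  # users who saw each version

z_stat, p_value = proportions_ztest(count=completions, nobs=exposed)

if p_value < 0.05:
    print(f"Significant difference (p = {p_value:.4f})")
else:
    print(f"Not significant yet (p = {p_value:.4f}); keep the experiment running")
```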
Variants Are Locked During Experiments
Once an experiment is live, you cannot edit the variant Stories. This is by design - mixing data from different versions would invalidate your results.
If you need to make changes:
- Pause or end the current experiment
- Edit the Story
- Launch a new experiment
Plan your variants carefully before going live to avoid restarting experiments.
Monitor for Problems
Check daily for:
- Error spikes: A variant might have bugs
- Extreme drop-offs: Something is broken
- Analytics gaps: Events not tracking properly
Catch issues early before they affect too many users.
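If you export per-variant metrics, a simple automated check can surface these problems before anyone opens the dashboard. A rough sketch, with hypothetical metric names and thresholds:

```python
# Hypothetical per-variant metrics exported from your analytics as a dict.
ALERT_THRESHOLDS = {
    "error_rate": 0.05,     # flag if more than 5% of sessions hit an error
    "drop_off_rate": 0.60,  # flag extreme abandonment
}

def daily_health_check(variant_name: str, metrics: dict) -> list[str]:
    """Return alert messages for any metric that crossed its threshold."""
    alerts = []
    for metric, threshold in ALERT_THRESHOLDS.items():
        value = metrics.get(metric, 0.0)
        if value > threshold:
            alerts.append(f"{variant_name}: {metric} = {value:.0%} exceeds {threshold:.0%}")
    if metrics.get("events_tracked", 0) == 0:
        alerts.append(f"{variant_name}: no events tracked today; check analytics wiring")
    return alerts

print(daily_health_check("variant_b", {"error_rate": 0.08, "drop_off_rate": 0.30, "events_tracked": 1240}))
```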
Traffic Allocation
Start Conservative
Begin with small experiment traffic (10-20%) when testing significant changes. Scale up as confidence grows.
| Change Type | Starting Traffic |
|---|---|
| Minor copy change | 30-50% |
| Design overhaul | 10-20% |
| New flow | 10-15% |
| Risky change | 5-10% |
Balance Speed vs Risk
More traffic means faster results but also higher exposure:
| Traffic | Pros | Cons |
|---|---|---|
| 10% | Low risk | Slow results |
| 30% | Balanced | Moderate risk |
| 50% | Fast results | High exposure |
Avoid Traffic Starvation
Each variant needs enough traffic for meaningful data. With 1,000 daily users:
| Allocation | Daily Users (Smaller Variant) | Days to 1,000 Users |
|---|---|---|
| 50/50 | 500 | 2 |
| 80/20 | 200 | 5 |
| 95/5 | 50 | 20 |
Don't spread traffic too thin across many variants.
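You can estimate how long an allocation will take to produce a usable sample before committing to it. A quick back-of-the-envelope calculation, assuming steady traffic and the 1,000-user target from the table above:

```python
def days_to_target(daily_users: int, variant_share: float, target_per_variant: int = 1000) -> float:
    """Days until the variant with this traffic share reaches the target sample size."""
    return target_per_variant / (daily_users * variant_share)

for share in (0.50, 0.20, 0.05):
    print(f"{share:.0%} of traffic -> {days_to_target(1000, share):.0f} days to 1,000 users")
# With 1,000 daily users: 50% -> 2 days, 20% -> 5 days, 5% -> 20 days
```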
Analyzing Results
Wait for Significance
The #1 mistake is concluding too early. Apparent differences often disappear with more data.
Consider Context
Results can be affected by:
- Day of week (business vs consumer apps)
- Seasonality (holiday behavior differs)
- External events (marketing campaigns, PR)
- Platform differences (iOS vs Android)
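If you can export one row per user (variant seen, platform, outcome), a quick breakdown by context helps spot these effects. A sketch with pandas; the column names are hypothetical:

```python
import pandas as pd

# One row per user: which variant they saw, their platform, and whether they completed.
df = pd.DataFrame({
    "variant":   ["control", "control", "variant_b", "variant_b", "variant_b", "control"],
    "platform":  ["ios", "android", "ios", "android", "ios", "ios"],
    "completed": [1, 0, 1, 1, 0, 1],
})

# Completion rate broken down by platform and variant. A lift that appears on
# only one platform deserves a closer look before a full rollout.
print(df.groupby(["platform", "variant"])["completed"].mean())
```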
Look Beyond Primary Metrics
A "winning" variant might have hidden costs:
| Primary Metric | Check Also |
|---|---|
| Higher completion | Error rate, time spent |
| More interactions | Frustration signals, support tickets |
| Faster completion | Did users skip content? |
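One way to formalize this is a guardrail check: the variant only "wins" if the primary metric improves and no secondary metric regresses too far. A minimal sketch with illustrative metric names and an assumed 10% tolerance:

```python
def passes_guardrails(control: dict, variant: dict, tolerance: float = 0.10) -> bool:
    """Secondary metrics may not get more than `tolerance` (here 10%) worse."""
    for metric in ("error_rate", "support_tickets_per_1k"):
        if variant[metric] > control[metric] * (1 + tolerance):
            return False
    return True

control = {"completion_rate": 0.71, "error_rate": 0.020, "support_tickets_per_1k": 4.0}
variant = {"completion_rate": 0.78, "error_rate": 0.045, "support_tickets_per_1k": 4.2}

better_primary = variant["completion_rate"] > control["completion_rate"]
# Prints False: completion improved, but the error rate more than doubled.
print(better_primary and passes_guardrails(control, variant))
```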
Common Mistakes to Avoid
Confirmation Bias
Don't interpret results to match your expectations. Let data speak.
Problem: "The new design is clearly better" (when results are not significant) Solution: Use objective criteria defined upfront
HiPPO Effect (Highest Paid Person's Opinion)
Problem: Leadership likes variant B, so you stop the experiment early
Solution: Let experiments reach significance regardless of preferences
Survivorship Bias
Problem: Analyzing only completed users, ignoring those who dropped off
Solution: Include all users in your analysis
Multiple Testing Problem
Problem: Testing 10 variants and declaring the best one a winner
Solution: Use appropriate statistical corrections for multiple comparisons
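For example, a Holm (or Bonferroni) correction adjusts the per-variant p-values before any winner is declared. A sketch using `statsmodels`, with made-up p-values:

```python
# Requires: pip install statsmodels
from statsmodels.stats.multitest import multipletests

p_values = [0.04, 0.03, 0.20, 0.01]  # one (made-up) p-value per variant-vs-control test

reject, corrected, _, _ = multipletests(p_values, alpha=0.05, method="holm")

for i, (is_significant, p) in enumerate(zip(reject, corrected), start=1):
    verdict = "significant" if is_significant else "not significant"
    print(f"Variant {i}: corrected p = {p:.3f} ({verdict})")
```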
Experiment Cadence
For Growing Apps
Run continuous experiments:
- Finish one experiment
- Apply learnings
- Start the next experiment
Compound small improvements over time.
For Stable Apps
Run occasional experiments:
- Quarterly reviews of performance
- Test when metrics decline
- Test new features before full rollout
Don't fix what isn't broken.
Documentation
Keep records of all experiments:
## Experiment: Shorter Onboarding
**Dates:** Jan 5-19, 2026
**Hypothesis:** Reducing to 3 screens increases completion
**Traffic:** 50% control, 50% variant
**Result:** 78% vs 71% (significant, p < 0.05)
**Decision:** Promoted 3-screen version
**Learnings:** Users prefer brevity; detail can come later

This institutional knowledge prevents:
- Re-testing the same ideas
- Repeating past mistakes
- Losing context when team members change
Quick Reference
| Do | Don't |
|---|---|
| Formulate a hypothesis | "Just try something" |
| Test one variable | Change multiple things |
| Define success metrics | Move goalposts |
| Wait for significance | Stop early |
| Start conservative | Put 50% of traffic on risky changes |
| Document everything | Rely on memory |
Related
- Creating Experiments - Step-by-step setup
- Analyzing Results - Understanding outcomes
- Traffic Allocation - Distribution strategies
For Developers: See A/B Testing Analytics for tracking experiment events.