Analyzing Results
Understand experiment outcomes and make data-driven decisions.
Overview
Running an experiment is only half the work. Analyzing results correctly ensures you draw valid conclusions and make improvements that actually help.
Accessing Experiment Data
- Navigate to Analytics in your dashboard
- Use the Experiment filter to select your test
- View metrics broken down by variant
Key Metrics to Compare
Completion Rate
The percentage of users who finish the Story.
| Metric | Formula | Meaning |
|---|---|---|
| Completion Rate | Completions / Views | How compelling is your content? |
Interpreting results:
- Higher is better
- 70-85% is typical for onboarding
- < 50% suggests problems
Drop-off Rate
The percentage of users who dismiss or otherwise abandon the Story before completing it.
| Metric | Formula | Meaning |
|---|---|---|
| Dismissal Rate | Dismissals / Views | Are users bailing? |
Interpreting results:
- Lower is better
- Compare per-screen to find problem areas
- High drop-off on one screen = that screen needs work
Engagement Rate
How much users interact with your content.
| Metric | Formula | Meaning |
|---|---|---|
| Interaction Rate | Interactions / Views | Are users engaged? |
Interpreting results:
- Higher generally better
- Context matters: more interactions on a selection screen are a good sign, while more "back" button clicks might indicate confusion
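If you export raw counts from the dashboard, the three metrics above reduce to simple divisions. Below is a minimal Python sketch; the field names and the sample numbers are illustrative, not the exact export schema.

```python
# Minimal sketch: compute the three headline rates from raw per-variant counts.
# Field names and sample numbers are illustrative; adapt them to your actual export.

def rates(views: int, completions: int, dismissals: int, interactions: int) -> dict:
    """Return completion, dismissal, and interaction rates as fractions of views."""
    if views == 0:
        return {"completion": 0.0, "dismissal": 0.0, "interaction": 0.0}
    return {
        "completion": completions / views,
        "dismissal": dismissals / views,
        "interaction": interactions / views,
    }

# Example: compare control and test variants side by side (made-up counts).
control = rates(views=4200, completions=3150, dismissals=630, interactions=5040)
variant = rates(views=4150, completions=3240, dismissals=540, interactions=5395)

for name, r in (("control", control), ("variant", variant)):
    print(f"{name}: completion {r['completion']:.1%}, "
          f"dismissal {r['dismissal']:.1%}, interaction {r['interaction']:.1%}")
```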
Screen-Level Metrics
Beyond totals, examine each screen:
| Screen | Views | Exits | Time Spent |
|---|---|---|---|
| Screen 1 | 1000 | 50 | 8.2s |
| Screen 2 | 950 | 100 | 12.1s |
| Screen 3 | 850 | 75 | 6.5s |
Look for:
- High exits: Screen is problematic
- Long time: Either engaging or confusing
- Short time: Either skipped or simple
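To spot problem screens programmatically, you can compute per-screen exit rates from the same export. A minimal sketch; the records mirror the example table above, and the 10% threshold is illustrative rather than a product default.

```python
# Minimal sketch: flag problem screens from per-screen stats like the table above.
# Records mirror the example table; the threshold is illustrative.

screens = [
    {"screen": "Screen 1", "views": 1000, "exits": 50,  "avg_seconds": 8.2},
    {"screen": "Screen 2", "views": 950,  "exits": 100, "avg_seconds": 12.1},
    {"screen": "Screen 3", "views": 850,  "exits": 75,  "avg_seconds": 6.5},
]

EXIT_RATE_THRESHOLD = 0.10   # flag screens where more than 10% of viewers exit

for s in screens:
    exit_rate = s["exits"] / s["views"]
    flag = "  <-- investigate" if exit_rate > EXIT_RATE_THRESHOLD else ""
    print(f"{s['screen']}: exit rate {exit_rate:.1%}, avg time {s['avg_seconds']}s{flag}")
```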
Statistical Significance
Not all differences are meaningful. A variant showing 72% completion vs 70% might be random variation, not a real improvement.
What Significance Means
Significant result: The difference is unlikely to be due to chance. You can confidently say one variant performs better.
Not significant: The difference could be random. Either more data is needed, or the variants are essentially equal.
Factors Affecting Significance
| Factor | Effect |
|---|---|
| Sample size | More users = more confidence |
| Effect size | Bigger difference = faster significance |
| Variance | Consistent results = clearer signal |
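The dashboard flags significance for you, but if you prefer to verify a completion-rate difference from exported counts yourself, a standard two-proportion z-test is enough. A minimal sketch using only Python's standard library, reusing the 72% vs 70% example from above.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the two-sided p-value for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 72% vs 70% completion: with 1,000 users per variant the gap is not significant...
print(two_proportion_z_test(720, 1000, 700, 1000))        # ~0.32
# ...but the same 2-point gap with 20,000 users per variant is.
print(two_proportion_z_test(14400, 20000, 14000, 20000))  # ~1e-5
```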
Waiting for Significance
Resist the urge to conclude early. Common timeline:
| Traffic | Time to Significance |
|---|---|
| 100 users/day | 2-4 weeks |
| 1,000 users/day | 3-7 days |
| 10,000 users/day | 1-2 days |
The dashboard indicates when results reach significance.
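If you want a rough sense of how long your own traffic needs before the timelines above apply, a back-of-the-envelope sample-size estimate helps. A minimal sketch assuming a 50/50 split, 95% confidence, and 80% power (the z-values are hard-coded for those settings); it is a planning aid, not the dashboard's exact calculation.

```python
import math

def users_needed(baseline: float, expected: float,
                 z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Rough sample size per variant to detect a completion-rate change
    at 95% confidence and 80% power (z-values hard-coded for those settings)."""
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    delta = expected - baseline
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Detecting a lift from 70% to 75% completion:
per_variant = users_needed(0.70, 0.75)
daily_users = 1_000                                # total traffic entering the experiment per day
days = math.ceil(2 * per_variant / daily_users)    # 50/50 split across two variants
print(per_variant, days)                           # ~1250 users per variant, ~3 days
```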
Making Decisions
Clear Winner
When one variant significantly outperforms:
- End the experiment
- Promote the winner to evergreen
- Archive the losing variant
- Document what you learned
No Clear Winner
When variants perform similarly:
- The simpler variant wins (less is more)
- Consider secondary metrics
- Or run longer with more traffic
Variant Underperforming
When the experimental variant is clearly worse than the control:
- Pause or end the experiment to protect user experience
- Analyze why it failed
- Apply learnings to future tests
Note: You cannot edit a live variant. To test a modified version, end the experiment and launch a new one.
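If you automate experiment reviews, the decision rules above are easy to encode. A minimal sketch; the logic and wording simply mirror the guidance in this section.

```python
# Minimal sketch: turn experiment results into a recommended next step.
# The recommendations mirror the guidance above; they are not an API.

def recommend(control_rate: float, variant_rate: float, significant: bool) -> str:
    if not significant:
        return ("No clear winner: keep the simpler variant, check secondary metrics, "
                "or run longer with more traffic.")
    if variant_rate > control_rate:
        return ("Clear winner: end the experiment, promote the winner to evergreen, "
                "archive the losing variant, document what you learned.")
    return ("Variant underperforming: pause or end the experiment, analyze why it failed, "
            "apply learnings to future tests.")

print(recommend(control_rate=0.71, variant_rate=0.78, significant=True))
```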
Common Analysis Mistakes
Peeking too early: Checking results daily and stopping when one variant is ahead leads to false positives. Wait for significance.
Ignoring segments: Overall metrics might hide important patterns. A variant might excel for iOS but fail on Android.
Over-indexing on one metric: Completion rate improved but error rate also increased? Consider the full picture.
Attributing causation incorrectly: Correlation isn't causation. Seasonal effects, marketing campaigns, and app updates can affect results.
Documenting Experiments
Keep a record of experiments:
| Field | Example |
|---|---|
| Hypothesis | "Shorter onboarding increases completion" |
| Variants | 3-screen vs 5-screen |
| Traffic | 50/50 split |
| Duration | Jan 5-19, 2026 |
| Result | 3-screen: 78% vs 5-screen: 71% (significant) |
| Decision | Promote 3-screen to evergreen |
| Learnings | Users prefer brevity over detail |
This history informs future experiments and prevents re-testing the same ideas.
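If you keep your analysis in code, the same record can live next to it. A minimal sketch using a Python dataclass whose fields follow the table above; the structure is illustrative, not a required format.

```python
from dataclasses import dataclass

@dataclass
class ExperimentRecord:
    """One entry in the experiment log, mirroring the fields in the table above."""
    hypothesis: str
    variants: str
    traffic: str
    duration: str
    result: str
    decision: str
    learnings: str

log = [
    ExperimentRecord(
        hypothesis="Shorter onboarding increases completion",
        variants="3-screen vs 5-screen",
        traffic="50/50 split",
        duration="Jan 5-19, 2026",
        result="3-screen: 78% vs 5-screen: 71% (significant)",
        decision="Promote 3-screen to evergreen",
        learnings="Users prefer brevity over detail",
    ),
]
```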
Related
- Best Practices - Do's and don'ts
- A/B Testing Analytics - Technical details for developers