Interpret A/B Test Results Like a Data Scientist
Get statistical significance, practical significance, and clear next steps from any A/B test
Copy & Paste this prompt
I ran an A/B test and need help interpreting the results.

Test details:
- What I tested: [DESCRIBE THE CHANGE — e.g., new button color, different headline]
- Metric measured: [PRIMARY METRIC — e.g., conversion rate, click-through rate]
- Test duration: [HOW LONG IT RAN]

Results:
- Control (A): [SAMPLE SIZE] visitors, [CONVERSIONS] conversions ([RATE]%)
- Variant (B): [SAMPLE SIZE] visitors, [CONVERSIONS] conversions ([RATE]%)
- Any secondary metrics: [LIST THEM]

Analyze this test:

1. STATISTICAL SIGNIFICANCE
- Calculate the p-value and confidence interval
- Is this result statistically significant at 95% confidence?
- Was the sample size sufficient? What would be needed?

2. PRACTICAL SIGNIFICANCE
- What is the absolute lift and relative lift?
- Is this difference meaningful in business terms?
- Calculate the projected annual impact (if I give you revenue/user data)

3. VALIDITY CHECK
- Was the test duration long enough? (full business cycles)
- Are there signs of sample ratio mismatch?
- Could novelty effect or seasonality explain the result?

4. SEGMENTATION
- Suggest 3 segments worth analyzing (device, source, new vs returning)
- Could the result be driven by one segment?

5. DECISION & NEXT STEPS
- Ship it / Kill it / Keep testing — with clear reasoning
- If keep testing: what to change and required sample size
- What follow-up test would you recommend?

Be rigorous. Do not let me make a decision on noisy data.
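Before trusting any interpretation, you can sanity-check the core numbers yourself. Below is a minimal Python sketch of the two-proportion z-test and confidence interval that step 1 of the prompt asks for, assuming SciPy is available; the `ab_test` helper and the plugged-in counts (taken from the example output below) are illustrative, not part of the prompt:

```python
# Minimal two-proportion z-test and CI, assuming SciPy is installed.
from scipy.stats import norm

def ab_test(n_a, conv_a, n_b, conv_b, alpha=0.05):
    """Return the two-sided p-value and CI for the difference p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error for the hypothesis test (H0: no difference)
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    p_value = 2 * norm.sf(abs(diff / se_pool))
    # Unpooled standard error for the CI around the observed difference
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    margin = norm.ppf(1 - alpha / 2) * se
    return p_value, (diff - margin, diff + margin)

# Counts from the example output below (illustrative)
p, ci = ab_test(12450, 387, 12380, 425)
print(f"p = {p:.2f}, 95% CI = [{ci[0]:+.4f}, {ci[1]:+.4f}]")
# prints roughly: p = 0.15, 95% CI = [-0.0012, +0.0077]
```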
#data #analytics #interpret #test #results
Works with
ChatGPT, Claude, Gemini
💡 Pro Tips
- Always check for sample ratio mismatch — if A and B have very different sample sizes, something went wrong (see the sketch after these tips)
- Statistical significance ≠ practical significance — a 0.01% lift can be "significant" with enough data
- Run tests for full weeks (7, 14, 21 days) to avoid day-of-week bias
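The sample-ratio check in the first tip is easy to automate. A minimal sketch, assuming SciPy and a planned 50/50 split; the visitor counts are the illustrative ones from the example below:

```python
# Chi-square test for sample ratio mismatch, assuming a planned 50/50 split.
from scipy.stats import chisquare

observed = [12450, 12380]            # visitors actually assigned to A and B
expected = [sum(observed) / 2] * 2   # what a 50/50 split should have produced
stat, p = chisquare(observed, f_exp=expected)

# p < 0.001 is a common alarm threshold for SRM
if p < 0.001:
    print(f"SRM suspected (p = {p:.4g}): investigate before trusting results")
else:
    print(f"No SRM detected (p = {p:.2f})")
```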
✨ Example Output
Test: New CTA button ("Start Free" vs "Sign Up")
STATISTICAL SIGNIFICANCE:
- Control: 12,450 visitors → 387 conversions (3.11%)
- Variant: 12,380 visitors → 425 conversions (3.43%)
- Absolute lift: +0.32 percentage points
- Relative lift: +10.3%
- p-value: ≈ 0.15 → not statistically significant at 95%
- 95% CI for difference: [-0.12%, +0.77%]
PRACTICAL SIGNIFICANCE:
- The CI includes zero: the true effect could be nothing at all, or even slightly negative
- At 100K monthly visitors: ~320 extra conversions/month
- If each conversion = $20 → ~$6K/month uplift
VALIDITY CHECK:
⚠️ Test ran 8 days — should run at least 2 full weeks to capture weekly cycles
✅ Sample ratio: 50.1% / 49.9% — no mismatch detected
⚠️ Consider novelty effect for UI changes
DECISION: KEEP TESTING for 1 more week. The trend is promising but not yet significant, and at ~12.4K visitors per arm the test is underpowered for a ~10% relative lift (see the sample-size sketch below). If the lift holds and reaches significance after a full 2-week cycle, ship it.
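The "keep testing" call follows from a rough power calculation. Here is a sketch of the standard two-proportion sample-size formula at 80% power, assuming SciPy; the 3.11% and 3.43% rates are the illustrative ones from this example:

```python
# Approximate per-arm sample size to detect a lift from p1 to p2,
# using the standard normal-approximation formula for two proportions.
from scipy.stats import norm

def required_n(p1, p2, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2

print(round(required_n(0.0311, 0.0343)))
# -> roughly 48,000-49,000 visitors per arm; the example test had only ~12.4K
```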