Separate Correlation from Causation in Any Dataset
Determine whether a relationship in your data is real, spurious, or hiding a confounding variable
I found a relationship in my data and need to understand if it is real.

The relationship:
- Variable A: [WHAT IT IS]
- Variable B: [WHAT IT IS]
- Observed correlation: [DESCRIBE, e.g., "when A goes up, B goes up"]
- Correlation strength: [IF KNOWN: r value, or just strong/moderate/weak]
- Data source: [WHERE THIS CAME FROM]
- Sample size: [N]

Analyze this relationship:

1. CORRELATION ASSESSMENT
- Is this correlation statistically meaningful given the sample size?
- What type of correlation is this? (linear, non-linear, monotonic)
- Could this be driven by outliers? How would I check?

2. CAUSATION TESTS (apply each)
- Temporal precedence: Does A actually happen before B?
- Dose-response: Does more A mean more B consistently?
- Plausibility: Is there a logical mechanism connecting A → B?
- Consistency: Has this been found in other datasets/studies?
- Specificity: Does A correlate with B specifically, or with everything?

3. CONFOUNDING VARIABLES
- List 5 possible confounders (a variable C that causes both A and B)
- For each: explain the mechanism and how to test for it
- Which confounder is most likely in my case?

4. ALTERNATIVE EXPLANATIONS
- Reverse causation: Could B actually cause A?
- Collider bias: Am I accidentally selecting my sample on something that both A and B affect?
- Simpson's paradox: Could the relationship reverse within subgroups?

5. WHAT WOULD PROVE CAUSATION
- Design an ideal experiment/test to establish causation
- If an experiment is impossible, what observational approaches help?
- What data would I need to collect?

6. HONEST CONCLUSION
- On a scale of 1-10, how confident should I be that A causes B?
- What is the most responsible way to describe this finding?
- Draft a one-sentence summary I can use in a report

Do not just say "correlation is not causation." Help me figure out WHICH ONE this actually is.
Works with: ChatGPT, Claude, Gemini
💡 Pro Tips
- One of the most common confounders is self-selection: people who choose X are already different from those who do not.
- Always ask "what would have happened WITHOUT the intervention?" That counterfactual is what matters.
- If you cannot run an experiment, look for natural experiments (policy changes, geographic differences).
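The counterfactual and natural-experiment tips fit together in difference-in-differences, one common design (not the only one) for analyzing a policy change or geographic difference. The sketch below assumes a treated group and an untreated control group, each measured before and after the change; the function name and all numbers are hypothetical.

```python
from statistics import fmean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences estimate of a treatment effect.

    The control group's before/after change stands in for the counterfactual:
    what the treated group would have done WITHOUT the intervention. Valid only
    under the parallel-trends assumption (absent treatment, both groups would
    have changed by the same amount).
    """
    treated_change = fmean(treated_after) - fmean(treated_before)
    control_change = fmean(control_after) - fmean(control_before)
    return treated_change - control_change


# Hypothetical outcomes for regions with and without a policy change
effect = diff_in_diff(
    treated_before=[10, 12, 11],
    treated_after=[15, 17, 16],
    control_before=[9, 10, 11],
    control_after=[11, 12, 13],
)
print(effect)  # 3.0
```

The treated regions rose by 5 and the control regions by 2, so the estimated effect is 3. If the two groups were already diverging before the change, the parallel-trends assumption fails and this estimate should not be trusted.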
✨ Example Output
Relationship: "Teams that use our analytics dashboard have 23% higher revenue growth"

CORRELATION ASSESSMENT:
- r = 0.41, n = 340 companies → statistically significant (p < 0.001)
- Moderate positive correlation, appears linear
- Need to check: Are the top 5 companies driving this? Remove them and retest.

CAUSATION TESTS:
❌ Temporal: Did dashboard use START before revenue growth? Or did growing companies adopt it?
⚠️ Dose-response: Do heavier users grow faster? Check usage tiers.
✅ Plausibility: Data-driven decisions → better resource allocation → growth (logical chain exists)
❓ Consistency: Need to check across different company sizes and industries

CONFOUNDING VARIABLES:
1. Company maturity: Larger/mature companies both adopt tools AND grow more predictably
2. Management quality: Good managers adopt analytics AND drive growth
3. Budget: Companies with more budget buy tools AND invest in growth ← MOST LIKELY
4. Industry tailwinds: Fast-growing industries adopt more tools
5. Self-selection: Only motivated companies bother setting up dashboards

HONEST CONCLUSION:
Confidence that dashboard CAUSES growth: 3/10
Most likely explanation: Self-selection + budget confound
Report-ready sentence: "Dashboard adoption is associated with 23% higher revenue growth, though the relationship likely reflects self-selection and budget differences between adopters and non-adopters rather than direct causation."