Data & Analysis · Premium · Advanced

Separate Correlation from Causation in Any Dataset

Determine whether a relationship in your data is real, spurious, or hiding a confounding variable

Copy & Paste this prompt
I found a relationship in my data and need to understand if it is real.

The relationship:
- Variable A: [WHAT IT IS]
- Variable B: [WHAT IT IS]
- Observed correlation: [DESCRIBE — e.g., "when A goes up, B goes up"]
- Correlation strength: [IF KNOWN — r value, or just strong/moderate/weak]
- Data source: [WHERE THIS CAME FROM]
- Sample size: [N]

Analyze this relationship:

1. CORRELATION ASSESSMENT
   - Is this correlation statistically meaningful given the sample size?
   - What type of correlation is this? (linear, non-linear, monotonic)
   - Could this be driven by outliers? How to check.

2. CAUSATION TESTS (apply each)
   - Temporal precedence: Does A actually happen before B?
   - Dose-response: Does more A mean more B consistently?
   - Plausibility: Is there a logical mechanism connecting A → B?
   - Consistency: Has this been found in other datasets/studies?
   - Specificity: Does A correlate with B specifically, or with everything?

3. CONFOUNDING VARIABLES
   - List 5 possible confounders (Variable C that causes both A and B)
   - For each: explain the mechanism and how to test for it
   - Which confounder is most likely in my case?

4. ALTERNATIVE EXPLANATIONS
   - Reverse causation: Could B actually cause A?
   - Collider bias: Was my sample selected on something that both A and B influence?
   - Simpson's paradox: Could the relationship reverse within subgroups?

5. WHAT WOULD PROVE CAUSATION
   - Design an ideal experiment/test to establish causation
   - If an experiment is impossible, what observational approaches help?
   - What data would I need to collect?

6. HONEST CONCLUSION
   - On a scale of 1-10, how confident should I be that A causes B?
   - What is the most responsible way to describe this finding?
   - Draft a one-sentence summary I can use in a report

Do not just say "correlation is not causation." Help me figure out WHICH ONE this actually is.

Works with

ChatGPT, Claude, Gemini

💡 Pro Tips

  • A very common source of confounding is self-selection: people who choose X already differ from those who do not
  • Always ask "what would have happened WITHOUT the intervention?" — that counterfactual is what matters
  • If you cannot run an experiment, look for natural experiments (policy changes, geographic differences)
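
When a natural experiment is available, one common way to approximate the "without the intervention" counterfactual is a difference-in-differences comparison. The sketch below is only an illustration: the function name `diff_in_diff` and all the numbers are invented, and the estimate is meaningful only if the two groups would have followed parallel trends without the intervention.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    # The control group's change stands in for what the treated group
    # would have done without the intervention (parallel-trends assumption).
    return treated_change - control_change


# Hypothetical revenue growth rates (%), invented for illustration only.
treated_before = [4.0, 5.0, 6.0]    # affected by the policy change, before
treated_after = [9.0, 10.0, 11.0]   # affected by the policy change, after
control_before = [3.0, 4.0, 5.0]    # unaffected group, before
control_after = [6.0, 7.0, 8.0]     # unaffected group, after

effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"Estimated effect: {effect:.1f} percentage points")
```

Here the treated group improved by 5 points and the control group by 3, so only the remaining 2 points are attributed to the intervention.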

✨ Example Output

Relationship: "Teams that use our analytics dashboard have 23% higher revenue growth"

CORRELATION ASSESSMENT:
- r = 0.41, n = 340 companies → Statistically significant (p < 0.001)
- Moderate positive correlation, appears linear
- Need to check: Are the top 5 companies driving this? Remove and retest.

CAUSATION TESTS:
❌ Temporal: Did dashboard use START before revenue growth? Or did growing companies adopt it?
⚠️ Dose-response: Do heavier users grow faster? Check usage tiers.
✅ Plausibility: Data-driven decisions → better resource allocation → growth (logical chain exists)
❓ Consistency: Need to check across different company sizes and industries

CONFOUNDING VARIABLES:
1. Company maturity — Larger/mature companies both adopt tools AND grow more predictably
2. Management quality — Good managers adopt analytics AND drive growth
3. Budget — Companies with more budget buy tools AND invest in growth ← MOST LIKELY
4. Industry tailwinds — Fast-growing industries adopt more tools
5. Self-selection — Only motivated companies bother setting up dashboards

HONEST CONCLUSION:
Confidence that dashboard CAUSES growth: 3/10
Most likely explanation: Self-selection + budget confound
Report-ready sentence: "Dashboard adoption is associated with 23% higher revenue growth, though the relationship likely reflects budget and self-selection differences rather than direct causation."
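
Two of the checks in the example can be reproduced by hand. The sketch below uses only the Python standard library and is an approximation, not a full analysis: `fisher_z_test` uses Fisher's z-transform (a normal approximation) to check the significance of r = 0.41 with n = 340, and `partial_corr` shows how a confounder such as budget could shrink the correlation. Both function names are illustrative, and the two 0.60 correlations with budget are hypothetical values chosen only to show the effect.

```python
import math


def fisher_z_test(r, n, z_crit=1.96):
    """Approximate two-sided p-value and 95% CI for a Pearson r (Fisher z)."""
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z / se) / math.sqrt(2))
    ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))
    return p_value, ci


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of A and B, controlling for C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


# Values from the example output: r = 0.41, n = 340.
p_value, ci = fisher_z_test(0.41, 340)
print(f"p < 0.001: {p_value < 0.001}")
print(f"95% CI for r: ({ci[0]:.2f}, {ci[1]:.2f})")

# Hypothetical: budget (C) correlates 0.60 with both dashboard use (A)
# and revenue growth (B). These two numbers are assumptions.
r_adjusted = partial_corr(0.41, 0.60, 0.60)
print(f"Correlation after controlling for budget: {r_adjusted:.2f}")
```

If the correlation drops sharply once the suspected confounder is controlled for, as it does with these made-up inputs, that supports the low confidence score given in the conclusion.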