Nice episode.

Stuart: another issue you didn't cover is that most meta-analyses use standardised effect sizes, and those are very dependent on the idiosyncrasies of individual studies. For example, Baguley (2009) says:

"A highly desirable property in an effect size measure would be that it remain stable between different versions of the same measurement instrument, between individuals scoring high or low on one of the variables, or between different study designs. Standardized effect size is particularly vulnerable to changes in any of these factors, because all three influence sample variance."

Pek (2018) does say: "experts of meta-analysis (e.g., Ray & Shadish, 1996) prefer to use reported descriptive information to calculate an exact effect size suited to their own purposes instead of directly using a commonly reported standardized effect size (e.g., Cohen’s d)." ... but I think all the meta-analyses I've seen use standardised effect sizes. (Does that match your experience?)

The issue is serious enough that the APA Task Force on Statistical Inference recommends using unstandardised effect sizes when the units of measurement are practically meaningful.
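To make Baguley's point concrete, here is a minimal sketch with made-up numbers (the means, SDs and sample sizes are purely hypothetical). Two studies find exactly the same raw 5-point difference, but one sample is more homogeneous than the other, so the standardised effect doubles:

```python
import math


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Hypothetical studies: identical raw mean difference (5 points),
# but the second sample is range-restricted, so its SD is halved.
raw_diff = 105.0 - 100.0
d_broad = cohens_d(105.0, 100.0, 15.0, 15.0, 50, 50)
d_narrow = cohens_d(105.0, 100.0, 7.5, 7.5, 50, 50)

print(f"raw difference in both studies: {raw_diff}")
print(f"Cohen's d, broad sample:  {d_broad:.2f}")
print(f"Cohen's d, narrow sample: {d_narrow:.2f}")
```

Nothing about the underlying effect changed between the two studies; only the sample variance did.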


Baguley 2009. Standardized or simple effect size: what should be reported? https://doi.org/10.1348/000712608X377117

Pek 2018. Reporting effect sizes in original psychological research: a discussion and tutorial. https://doi.org/10.1037/met0000126


Great episode! While I feel more positive about meta-analysis than Stuart seems to, these are important criticisms that producers and consumers of meta-analyses should be mindful of. In my opinion your last point is especially important. So many scholars focus on the point estimate, but in many cases I think the heterogeneity statistics are much more valuable, especially the credibility interval that's usually computed when using the Hunter and Schmidt method.
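For anyone who hasn't met a credibility interval, here is a rough sketch of a "bare-bones" Hunter and Schmidt style calculation for correlations, assuming no corrections for unreliability or range restriction. The function name and the four studies are hypothetical, and real analyses would normally use a dedicated package rather than this:

```python
import math


def bare_bones_credibility_interval(rs, ns, z=1.28):
    """Sample-size-weighted mean r, estimated SD of true effects, and an
    80% credibility interval (z = 1.28) around the mean."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Observed (sample-size-weighted) variance of the correlations
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average N
    mean_n = total_n / len(ns)
    var_err = (1 - r_bar**2) ** 2 / (mean_n - 1)
    # Residual variance is attributed to real differences between studies
    var_rho = max(var_obs - var_err, 0.0)
    sd_rho = math.sqrt(var_rho)
    return r_bar, sd_rho, (r_bar - z * sd_rho, r_bar + z * sd_rho)


# Hypothetical studies: correlations and sample sizes
rs = [0.10, 0.50, 0.20, 0.45]
ns = [200, 200, 200, 200]
r_bar, sd_rho, (lower, upper) = bare_bones_credibility_interval(rs, ns)
print(f"mean r = {r_bar:.3f}, SD of true effects = {sd_rho:.3f}")
print(f"80% credibility interval: [{lower:.3f}, {upper:.3f}]")
```

The interval describes how much the true effects are estimated to vary across studies, which is a different question from how precisely the mean is estimated.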

I have one criticism. Stuart mentioned that an asymmetric funnel plot isn't necessarily evidence of publication bias: there could be other sample-size effects. That's a good point, but the PET-PEESE method, which is based on regressing effect size on standard error (or on its square), suffers from the same problem. Regression methods like PET-PEESE test for sample-size effects in general, not only publication bias. I'm certainly no PET-PEESE expert and I haven't read the relevant Datacolada post, so maybe I'm way off, but I wanted to share my thoughts while they're fresh in my mind!
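As a rough illustration of why a regression like this can't tell the two apart: PET fits an inverse-variance-weighted regression of effect sizes on their standard errors (PEESE uses the squared standard errors instead). In the sketch below the data are invented so that smaller studies genuinely have larger effects and nothing is missing from the literature, yet the slope is still clearly non-zero. The function name and numbers are hypothetical.

```python
def pet_regression(effects, std_errors):
    """Weighted least squares of effect size on standard error,
    with weights 1 / SE^2. Returns (intercept, slope)."""
    weights = [1.0 / se**2 for se in std_errors]
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, std_errors)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, effects)) / w_sum
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, std_errors, effects)
    )
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, std_errors))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical literature with NO publication bias: every study is included,
# but smaller studies (larger SE) have genuinely larger effects.
std_errors = [0.05, 0.10, 0.15, 0.20, 0.30]
effects = [0.2 + 1.0 * se for se in std_errors]

intercept, slope = pet_regression(effects, std_errors)
print(f"PET intercept: {intercept:.2f}, slope: {slope:.2f}")
```

A non-zero slope only says that effect size varies with precision; the reason for that has to be argued separately.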


I learned so much from this episode. If you’ve had the chance to read the Cass review, I would love to hear your thoughts on the quality of this systematic review. That review has certainly elicited a wide range of responses on Twitter, but I understand that for that reason you might want to stay away from such a potentially radioactive discussion.


Fascinating episode for someone who struggles with statistics. You may have already addressed this in previous episodes, but I wonder what your thoughts are on meta-analyses/systematic reviews with absolutely shocking search strategies?

The search is the data collection step, and as an academic librarian I see some absolute doozies: searches that are not replicable, don’t work and, as my colleague said, suggest the researchers were searching while drunk. And yet they get through peer review!

Consult your librarian, folks!


Great episode. I do not think you can get an asymmetric funnel plot from all studies with high power. You need at least a few positive low-power studies. You could have an asymmetric plot if you have a few positive low-powered studies followed by failed high-powered replications.


It should be shouted from the rafters that people respond to incentives and perverse incentives lead to perverse outcomes.

Mar 20·edited Mar 20

“Now of course peer reviewers don’t have time to go through every study to ensure it should be there.”

Well then what, I ask, is the point of peer reviews!?

Maybe a potential topic for a future episode?
