https://www.nature.com/articles/s41598-022-14395-4
There is a paper on nature.com (published in Scientific Reports) that can fairly be called a wake-up call.
Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify, and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
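What does this mean? The loophole in this methodological step turns out to be so large that the so-called experts in this field can generate, almost at will, whatever conclusions they happen to favor.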
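To make the "easily manipulated" claim concrete, here is a toy sketch loosely inspired by the abstract's "intuitive color-based model". It is not the authors' code. The four color "populations", the sample sizes, and the helper names `pca_project` and `black_distances` are all assumptions made up for illustration. It runs an ordinary SVD-based PCA on RGB points and measures how far each color lands from Black on the PC1/PC2 plot, first with equal sample sizes and then with Blue under-sampled.

```python
import numpy as np

# Hypothetical "populations": each individual is just an RGB triple (assumption).
COLORS = {"Red": (1, 0, 0), "Green": (0, 1, 0), "Blue": (0, 0, 1), "Black": (0, 0, 0)}


def pca_project(X, k=2):
    """Center X and project its rows onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def black_distances(sizes):
    """Distance from Black to every other color on the PC1/PC2 plane.

    sizes maps a color name to its number of sampled individuals.
    """
    labels = np.repeat(list(sizes), list(sizes.values()))
    X = np.array([COLORS[name] for name in labels], dtype=float)
    Z = pca_project(X, k=2)
    centroid = {name: Z[labels == name].mean(axis=0) for name in sizes}
    return {
        name: float(np.linalg.norm(centroid[name] - centroid["Black"]))
        for name in sizes
        if name != "Black"
    }


# Same four populations; only the (assumed) sample sizes differ.
even = black_distances({"Red": 100, "Green": 100, "Blue": 100, "Black": 100})
uneven = black_distances({"Red": 100, "Green": 100, "Blue": 10, "Black": 100})

print("even sampling:  ", even)
print("uneven sampling:", uneven)
```

In 3D color space, Red, Green, and Blue are all exactly the same distance from Black. With equal sampling the 2D plot preserves that symmetry. With Blue under-sampled, the plot places Blue much closer to Black than Red is, even though nothing about the colors themselves has changed, only how many of each were put into the analysis.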
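As I keep saying: three years after COVID-19, anti-authoritarianism is finally starting to rise up. Someone has finally lifted the lid. This is an enormous source of hope. It is a sign that science is at last defeating the "scientists". There is hope.
Trusting science means trusting the rigor of the scientific method. It does not mean trusting the conclusions of any scientist, expert, or authority, and that includes the conclusions of any authoritative institution.
Those who claim to be science itself are the most anti-science of all.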