Study uncovers large batch effects in TCGA exome sequencing data
A preprint posted to bioRxiv has identified large cross-institutional batch effects in a subset of The Cancer Genome Atlas (TCGA) data. The report serves as a warning to all multisite sequencing consortia and calls for re-examining results that depend on the detection of germline single nucleotide variants.
Batch effects (artifactual similarities among data from samples processed under similar conditions) have long been recognized as a challenge for large-scale genomics projects. Over TCGA's 13-year history, researchers have identified and filtered out artifacts such as oxidative DNA damage introduced during sample preparation, and have harmonized protocol components, including sequencing instruments and bioinformatics tools. ...