Neogen® SkimSEEK™: Human Low-pass Whole Genome Sequencing and Imputation

May 10, 2023

Over the past decade, genome-wide association studies (GWAS) have been pivotal for identifying genetic variants associated with traits and diseases. Genotyping arrays have been the foundation for many of these studies and have revealed insights into single nucleotide polymorphisms (SNPs) throughout the genome. However, the cost of genome sequencing has fallen sharply over the same period, leading to growing use of low-pass whole genome sequencing (LP-WGS) relative to genotyping arrays. Furthermore, with technological advances and a growing number of human reference genomes, it is now possible to screen millions of SNPs, compared with the hundreds of thousands on traditional genotyping arrays, without the downtime needed to design new arrays for newly identified SNPs. This blog covers the advantages that SkimSEEK™ brings to human genomics applications and presents examples from the published literature that illustrate the benefits of this revolutionary technology.

Neogen®’s SkimSEEK is a cost-effective alternative to traditional high-depth sequencing methods. With SkimSEEK, researchers can generate high-quality genomic data at a fraction of the cost of conventional high-coverage sequencing. SkimSEEK uses LP-WGS: each genome is sequenced to a low depth, and imputation is then used to predict genotypes that are not directly observed in the sample. For example, when a sample sequenced at low depth is aligned to a genomic reference assembly, gaps remain between the aligned reads. Because of this lower depth, some SNPs of interest may not be directly observed in the raw sequencing data. Still, we can impute those SNPs with up to 99% accuracy using Gencove’s imputation pipeline. In addition, SkimSEEK delivers adapter-trimmed FASTQ files and a full public VCF of approximately 60 million SNPs for each sample.
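
The gaps between aligned reads follow directly from sequencing depth. The sketch below is a rough illustration only and is not part of SkimSEEK or Gencove’s pipeline. It assumes that reads land uniformly at random, so that per-position read depth is Poisson-distributed with a mean equal to the average coverage (the classic Lander-Waterman approximation). Real coverage is less uniform, so treat the numbers as idealized. The function names are hypothetical.

```python
import math


def fraction_uncovered(mean_depth: float) -> float:
    """Expected fraction of genome positions with zero reads.

    Assumes per-position depth ~ Poisson(mean_depth) (Lander-Waterman model).
    """
    if mean_depth < 0:
        raise ValueError("mean_depth must be non-negative")
    return math.exp(-mean_depth)  # P(X = 0)


def fraction_covered_at_least(mean_depth: float, k: int) -> float:
    """Expected fraction of positions covered by at least k reads (same model)."""
    if mean_depth < 0 or k < 0:
        raise ValueError("mean_depth and k must be non-negative")
    # 1 - P(X < k) = 1 - sum_{i=0}^{k-1} e^{-c} c^i / i!
    p_below_k = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - p_below_k


if __name__ == "__main__":
    for depth in (0.4, 1.0, 4.0, 30.0):
        print(
            f"{depth:>4}x: {fraction_uncovered(depth):6.1%} uncovered, "
            f"{fraction_covered_at_least(depth, 2):6.1%} with >= 2 reads"
        )
```

Under this idealized model, roughly two-thirds of positions receive no reads at 0.4x, and about 37% receive none at 1x. Most positions that are covered at these depths carry a single read, and a single read reveals only one allele of a heterozygous genotype. This missing information is what imputation is designed to fill in.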

SkimSEEK offers significant advantages over genotyping arrays for large-scale projects, such as population genetics studies, that may require sequencing thousands of genomes. Whole genome sequence data also make it possible to identify causal mutations, and these variants can be used to improve the reliability of genomic prediction. Array technology, by contrast, cannot detect rare and low-frequency variants (RLFVs) that are not included in the array design. These variants contribute to genetic variance, and functional variants are more likely to be rare than common. While arrays have traditionally been less expensive than sequencing, newer sequencing platforms and innovations continue to make sequencing more affordable.

SkimSEEK excels at capturing genomic diversity for GWAS applications because of the haplotype diversity in its imputation reference panel. This allows our customers to identify genetic variants that are rare in some populations but common in others. The literature offers a direct example. In their 2021 research article, Li et al. compared low-pass sequencing (defined as sequencing a genome to an average depth of less than 1x) followed by imputation with array genotyping on the Illumina Global Screening Array (GSA). The comparison used 120 DNA samples from African- and European-ancestry individuals in the 1000 Genomes Project. The authors observed that genotypes imputed from sequence data were consistently and considerably more accurate than genotypes imputed from array data. In the African-ancestry samples, mean non-reference concordance was 7% higher for sequencing data. They concluded that low-pass whole genome sequencing provides better coverage of the genome than the GSA, which enables the detection of rare variants and reduces the impact of genotyping errors. Both effects improve the accuracy of GWAS and polygenic risk scores.
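
Non-reference concordance, the metric behind the 7% figure, is worth unpacking. Most positions in any genome are homozygous for the reference allele and are easy to get right, so plain concordance can look high even when variant calls are poor. The sketch below implements one common definition, which excludes sites where both the true and the imputed genotypes are homozygous reference. Li et al.’s exact calculation may differ. Genotypes are coded as alternate-allele counts (0, 1, 2), and the function name is hypothetical.

```python
from typing import Sequence


def non_reference_concordance(truth: Sequence[int], imputed: Sequence[int]) -> float:
    """Share of matching genotypes among sites where either the true or the
    imputed genotype carries a non-reference allele.

    Genotypes are alternate-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    # Drop sites where both genotypes are hom-ref; they are easy to call
    # and would otherwise inflate the score.
    informative = [(t, i) for t, i in zip(truth, imputed) if t != 0 or i != 0]
    if not informative:
        raise ValueError("no non-reference genotypes to compare")
    matches = sum(1 for t, i in informative if t == i)
    return matches / len(informative)


if __name__ == "__main__":
    truth = [0, 0, 1, 2, 1, 0]
    imputed = [0, 1, 1, 2, 0, 0]
    plain = sum(t == i for t, i in zip(truth, imputed)) / len(truth)
    print(f"plain concordance:         {plain:.2f}")  # 0.67
    print(f"non-reference concordance: {non_reference_concordance(truth, imputed):.2f}")  # 0.50
```

In this toy example, plain concordance is 4/6 (about 0.67). Non-reference concordance is only 2/4 = 0.5, because the two correctly called homozygous-reference sites no longer count toward the score.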

A similar, recently published article found that even ultra-low-coverage whole genome sequencing (ulcWGS; <0.5x) generated highly accurate GWAS data. Chat et al. sequenced the whole genomes of 72 European individuals to a target coverage of 0.4x and compared the results with the Infinium Global Screening Multi-Disease Array (GSA-MD). For low-frequency and common variants, ulcWGS captured a number of variants similar to that of the imputed GSA-MD, with high imputation accuracy (mean R² of 0.93 for SNPs and 0.86 for indels). Using 30x whole genome sequencing as a “truth” dataset, the authors observed that ulcWGS had higher overall non-reference genotype concordance than the imputed GSA-MD for both SNPs and indels. They concluded that LP-WGS is an attractive alternative to arrays when planning and designing GWAS experiments.
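
Imputation R² is commonly computed per variant as the squared Pearson correlation between the true genotypes and the imputed allele dosages across individuals. The dosage is the expected alternate-allele count derived from the imputed genotype probabilities. The sketch below shows this empirical version, which requires truth data such as the 30x genomes that Chat et al. used. The article’s R² values may instead be the estimate reported by the imputation software, which is computed without truth data. The function names and example values are hypothetical.

```python
from typing import Sequence


def expected_dosage(p_het: float, p_hom_alt: float) -> float:
    """Expected alternate-allele count from imputed genotype probabilities."""
    if p_het < 0 or p_hom_alt < 0 or p_het + p_hom_alt > 1 + 1e-9:
        raise ValueError("invalid genotype probabilities")
    return 1 * p_het + 2 * p_hom_alt  # P(hom-ref) contributes 0 copies


def dosage_r2(truth: Sequence[float], dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed
    dosages at one variant, across samples."""
    n = len(truth)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_t = sum(truth) / n
    mean_d = sum(dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosages))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        raise ValueError("r^2 is undefined when either input has no variation")
    return cov * cov / (var_t * var_d)


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0]
    # (P(het), P(hom-alt)) per sample; hypothetical imputation output
    probs = [(0.1, 0.0), (0.8, 0.05), (0.2, 0.8), (0.7, 0.25), (0.0, 0.0)]
    dosages = [expected_dosage(p_het, p_alt) for p_het, p_alt in probs]
    print(f"dosage r^2 = {dosage_r2(truth, dosages):.3f}")
```

Using dosages rather than hard genotype calls lets the metric reward imputation that is confident when it is right and uncertain when it is wrong.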

As next-generation sequencing continues to evolve, Neogen’s goal is to stay at the forefront of technological innovation and provide state-of-the-art products to our customers. While Neogen has a long-standing reputation in the agricultural industry, we are excited to expand our product portfolio into human genomics with SkimSEEK. SkimSEEK is a powerful tool that will empower our research customers to explore the genome more deeply than ever before, with the quality of service Neogen is known for. If you are interested in learning more about SkimSEEK and how it can accelerate your research, please get in touch with us by email or at 877.443.6489.

 

References:

Li, J. H., Mazur, C. A., Berisa, T. & Pickrell, J. K. "Low-pass sequencing increases the power of GWAS and decreases measurement error of polygenic risk scores compared to genotyping arrays." Genome Res 31, 529–537 (2021).

Chat, V., Ferguson, R., Morales, L. & Kirchhoff, T. "Ultra Low-Coverage Whole-Genome Sequencing as an Alternative to Genotyping Arrays in Genome-Wide Association Studies." Front Genet 12, 790445 (2022).


