Processing of reference panel datasets for genotype imputation
[JGAD000679]
An imputation reference panel dataset (accession code: JGAD000220), based on whole genome sequencing (WGS) data from 1,037 Japanese individuals, has been processed and made available on TogoImputation*.
The data were processed as follows:
- The bcftools (version 1.9) program was used to create tabix (.tbi) index files for the aggregate VCF files registered in JGAD000220, using bcftools-index-t.cwl (version 1.0) from the imputation server workflow.
- The bref3 (version 28Jun21.220) program was used to convert the aggregate VCF files registered in JGAD000220 to the bref3 file format, using beagle-bref3.cwl (version 1.0) from the imputation server workflow.
- We created a config file that defines chunks, the units into which genotype imputation calculations are divided. Each chunk was set to one whole chromosome.
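For readers who want to reproduce the indexing and conversion steps outside the CWL workflow, the corresponding command lines can be sketched as below. This is a minimal Python sketch that only assembles the commands, not the content of bcftools-index-t.cwl or beagle-bref3.cwl. It assumes bcftools and Java are on the PATH, that the bref3 utility is the Beagle jar named `bref3.28Jun21.220.jar` (path is an assumption), and that per-chromosome file names such as `chr1.vcf.gz` are hypothetical. bref3 writes its output to standard output, so the command must be redirected to a file.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Assumed location/name of the Beagle bref3 utility (version 28Jun21.220).
BREF3_JAR = "bref3.28Jun21.220.jar"


def index_command(vcf: str) -> list[str]:
    """bcftools command that writes a tabix (.tbi) index next to the VCF."""
    return ["bcftools", "index", "-t", vcf]


def bref3_command(vcf: str) -> tuple[list[str], str]:
    """bref3 conversion command plus the output path to redirect stdout to.

    Assumes the input is named <stem>.vcf.gz, so the output is <stem>.bref3.
    """
    out = str(Path(vcf).with_suffix("").with_suffix(".bref3"))
    return ["java", "-jar", BREF3_JAR, vcf], out


def shell_lines(vcfs: list[str]) -> list[str]:
    """Render index + conversion commands for each (hypothetical) VCF file."""
    lines = []
    for vcf in vcfs:
        lines.append(shlex.join(index_command(vcf)))
        cmd, out = bref3_command(vcf)
        lines.append(f"{shlex.join(cmd)} > {shlex.quote(out)}")
    return lines
```

For example, `shell_lines(["chr1.vcf.gz"])` returns `bcftools index -t chr1.vcf.gz` followed by `java -jar bref3.28Jun21.220.jar chr1.vcf.gz > chr1.bref3`.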
[JGAD000679]
Whole genome sequencing (WGS) data from 1,026 individuals (JGAD000220) and 1,964 individuals (JGAD000495), registered as controlled-access data in the NBDC Human Database, were processed as germline WGS data, and an aggregate VCF was generated. Variants were then filtered on the following conditions using the Genotype Imputation Panel Creation Workflow:
(1) Exclusion of variants that did not pass the VQSR filter
(2) Exclusion of multi-allelic sites
(3) Exclusion of variants with a call rate below 95%
(4) Exclusion of variants deviating from Hardy-Weinberg equilibrium (P < 1e-10)
(5) Exclusion of variants with a minor allele count (MAC) less than 2
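The five conditions can be expressed as a single per-site predicate. The following is a minimal Python sketch, not the implementation used in the Genotype Imputation Panel Creation Workflow. It assumes that a site passes VQSR when its FILTER column is exactly `PASS`, that genotypes are diploid 0/1 calls with missing calls given as `None`, and it uses the Wigginton et al. (2005) exact test for Hardy-Weinberg equilibrium; the workflow may compute call rate and the HWE P value with a different tool. The `Variant` class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    filter: str                                  # VCF FILTER column, e.g. "PASS"
    alts: list[str]                              # ALT alleles at the site
    genotypes: list[Optional[tuple[int, int]]]   # diploid calls, None = missing


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """HWE exact test P value (Wigginton, Cutler & Abecasis 2005)."""
    n = n_het + n_hom1 + n_hom2                  # genotyped individuals
    n_rare = 2 * min(n_hom1, n_hom2) + n_het     # copies of the rarer allele
    if n == 0:
        return 1.0
    probs = [0.0] * (n_rare + 1)
    # Start from the most likely het count (same parity as n_rare).
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk down: two fewer hets, one more homozygote of each kind.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1
    # Walk up: two more hets, one fewer homozygote of each kind.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1
    # P value: total probability of configurations no more likely than observed.
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / sum(probs))


def passes_qc(v: Variant, min_call_rate=0.95, hwe_p=1e-10, min_mac=2) -> bool:
    if v.filter != "PASS":                       # (1) did not pass VQSR
        return False
    if len(v.alts) != 1:                         # (2) multi-allelic site
        return False
    called = [g for g in v.genotypes if g is not None]
    if not v.genotypes or len(called) / len(v.genotypes) < min_call_rate:
        return False                             # (3) call rate below 95%
    n_hom_ref = sum(1 for a, b in called if a == b == 0)
    n_het = sum(1 for a, b in called if a != b)
    n_hom_alt = len(called) - n_hom_ref - n_het
    if hwe_exact_p(n_het, n_hom_ref, n_hom_alt) < hwe_p:
        return False                             # (4) HWE deviation, P < 1e-10
    alt_copies = n_het + 2 * n_hom_alt
    mac = min(alt_copies, 2 * len(called) - alt_copies)
    return mac >= min_mac                        # (5) MAC below 2
```

The thresholds are exposed as keyword arguments so that the values listed above (95%, 1e-10, 2) are the defaults but can be changed without editing the logic.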
After filtering, the data were further processed to make them available on TogoImputation*.
The data were processed as follows:
- The bcftools (version 1.9) program was used to create tabix (.tbi) index files for the quality-controlled aggregate VCF files, using bcftools-index-t.cwl (version 1.0) from the imputation server workflow.
- The bref3 (version 28Jun21.220) program was used to convert the quality-controlled aggregate VCF files to the bref3 file format, using beagle-bref3.cwl (version 1.0) from the imputation server workflow.
- We created a config file that defines chunks, the units into which genotype imputation calculations are divided. Each chunk was set to one whole chromosome.
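The whole-chromosome chunk definition can be illustrated as follows. The actual config file format used by imputation-server-jp is not described here, so this minimal Python sketch assumes a hypothetical tab-separated layout (`chrom`, `start`, `end`) and takes each chromosome's length from the `##contig` header lines of the VCF, so that every chunk spans one whole chromosome. It also assumes that contig lines carry `ID` and `length` keys and contain no quoted commas; the function names are hypothetical.

```python
from __future__ import annotations


def parse_contigs(header_lines: list[str]) -> dict[str, int]:
    """Map contig ID -> length from VCF '##contig=<...>' header lines."""
    contigs: dict[str, int] = {}
    prefix = "##contig=<"
    for line in header_lines:
        line = line.strip()
        if not (line.startswith(prefix) and line.endswith(">")):
            continue
        # Assumes simple key=value pairs without quoted commas.
        fields = dict(kv.split("=", 1) for kv in line[len(prefix):-1].split(","))
        if "ID" in fields and "length" in fields:
            contigs[fields["ID"]] = int(fields["length"])
    return contigs


def whole_chromosome_chunks(contigs: dict[str, int], chroms: list[str]) -> list[str]:
    """One chunk per chromosome, covering positions 1..length (1-based, inclusive)."""
    return [f"{c}\t1\t{contigs[c]}" for c in chroms]
```

For a header containing `##contig=<ID=chr1,length=1000>`, `whole_chromosome_chunks(parse_contigs(header), ["chr1"])` returns `["chr1\t1\t1000"]`. Using one chunk per chromosome avoids chunk boundaries inside a chromosome, at the cost of larger per-job memory and run time.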
To use the processed data, you must also submit a data use application for the original data (JGAD000220 and JGAD000495).
How to apply for data use
*TogoImputation is a service that supports genotype imputation analysis of SNP array data. It is currently available in the Personal Genome Analysis Section of the National Institute of Genetics Supercomputer System. The workflow source code (imputation-server-jp) and UI source code (imputationserver-web-ui) are publicly available.