NBDC Research ID: hum0343.v2
SUMMARY
Aims: To construct a prediction system for severe disease through whole genome sequencing, RNA sequencing, and ultra-high-precision HLA analysis of patients with COVID-19, asymptomatic infected patients, and patients suspected of having novel coronavirus infection. In addition, we will use anonymized data to analyze the prediction of severity of COVID-19 using mathematical models.
Methods: Genome-wide association study (GWAS), RNA-seq, eQTL/sQTL study
Participants/Materials: GWAS: 5682 Japanese individuals (2393 COVID-19 infected patients and 3289 controls)
RNA-seq: 473 COVID-19 infected patients
Dataset ID | Type of Data | Criteria | Release Date |
---|---|---|---|
hum0343.v1.covid19.v1 | GWAS for COVID-19 | Unrestricted-access | 2022/05/26 |
hum0343.v1.count.v1 | NGS (RNA-seq) for COVID-19 | Unrestricted-access | 2022/05/26 |
hum0343.v2.qtl.v1 | eQTL/sQTL summary statistics for COVID-19 | Unrestricted-access | 2022/06/14 |
*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.
MOLECULAR DATA
Participants/Materials |
[GWAS-1] COVID-19 (ICD-10: U071): 2393 cases, healthy controls: 3289 individuals; [GWAS-2] Severe COVID-19: 990 cases and 3289 healthy controls from [GWAS-1]; [GWAS-3] COVID-19 under age 65: 1,484 cases and 2,377 healthy controls under age 65 from [GWAS-1]; [GWAS-4] Severe COVID-19 under age 65: 440 cases and 2,377 healthy controls under age 65 from [GWAS-3] |
Targets | Genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) |
Genotyping: GenomeStudio; haplotype phasing: SHAPEIT4 (autosomes), SHAPEIT2 (X chromosome); imputation: Minimac4 |
Association Analysis (software) | PLINK2 |
Filtering Methods |
Sample QC: we excluded samples with (1) sample call rate < 0.97; (2) excess heterozygosity of genotypes > mean + 3 SD; (3) relatedness to another sample with PI_HAT > 0.175; (4) outlier status relative to the East Asian clusters in principal component analysis with 1000 Genomes Project samples. Genotyping QC: we excluded variants with (1) variant call rate < 0.99; (2) significant call rate differences between cases and controls with P < 5.0×10^-8; (3) deviation from Hardy-Weinberg equilibrium with P < 1.0×10^-6; (4) minor allele count < 5. Imputation QC: we retained variants with MAF ≥ 0.1% and imputation score (Rsq) > 0.5 |
Marker Number (after QC) |
[GWAS-1] 13,484,569 variants; [GWAS-2] 13,199,053 variants; [GWAS-3] 13,241,602 variants; [GWAS-4] 12,764,136 variants |
NBDC Dataset ID |
hum0343.v1.covid19.v1 [GWAS-1][GWAS-2][GWAS-3][GWAS-4] (Click the gwas number to download files) |
Total Data Volume |
[GWAS-1] 361 MB; [GWAS-2] 354 MB; [GWAS-3] 354 MB; [GWAS-4] 343 MB |
Comments (Policies) | NBDC policy |
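
The genotyping and imputation QC thresholds listed under Filtering Methods can be read as simple keep/drop predicates. The following is a minimal sketch only, not the pipeline used by the data provider: the `VariantStats` class, its field names, and both function names are hypothetical and chosen for illustration. Only the numeric thresholds come from the table above.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Per-variant summary values. All field names are hypothetical."""
    call_rate: float                 # fraction of samples with a genotype call
    case_control_call_rate_p: float  # P for call rate difference, cases vs controls
    hwe_p: float                     # P for deviation from Hardy-Weinberg equilibrium
    minor_allele_count: int


def passes_genotyping_qc(v: VariantStats) -> bool:
    """Keep a genotyped variant unless it meets any listed exclusion criterion."""
    return (
        v.call_rate >= 0.99                     # excluded if call rate < 0.99
        and v.case_control_call_rate_p >= 5.0e-8  # excluded if P < 5.0e-8
        and v.hwe_p >= 1.0e-6                   # excluded if P < 1.0e-6
        and v.minor_allele_count >= 5           # excluded if minor allele count < 5
    )


def passes_imputation_qc(maf: float, rsq: float) -> bool:
    """Keep an imputed variant with MAF >= 0.1% and Rsq > 0.5."""
    return maf >= 0.001 and rsq > 0.5


# Illustrative values only; these are not taken from the dataset.
good = VariantStats(0.995, 0.2, 0.3, 40)
low_call_rate = VariantStats(0.98, 0.2, 0.3, 40)
print(passes_genotyping_qc(good))
print(passes_genotyping_qc(low_call_rate))
print(passes_imputation_qc(maf=0.002, rsq=0.8))
```

Note that the exclusion criteria use strict inequalities, so a variant sitting exactly on a threshold (for example, call rate equal to 0.99) is kept in this sketch.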
Participants/Materials | COVID-19 (ICD-10: U071): 473 cases |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq6000] |
Library Source | RNAs extracted from peripheral blood cells |
Cell Lines | - |
Library Construction (kit name) | NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina |
Fragmentation Methods | Incubation in the buffer containing Mg2+ at 94°C for 15 minutes |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods |
Adapter removal: Trimmomatic (v0.39); alignment: STAR (v2.7.9a); annotation: GENCODE v30 |
Reference Genome Sequence | GRCh38/hg38 |
Detecting method for read count (software) | Gene level quantification and normalization: RSEM (v1.3.3) |
QC | median transcripts per million (TPM) > 10 |
Gene Number | 5991 |
NBDC Dataset ID |
(Click the Dataset ID to download the file) |
Total Data Volume | 6 MB |
Comments (Policies) | NBDC policy |
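
The expression QC criterion above (median TPM > 10) amounts to a per-gene filter across samples. A minimal sketch follows, assuming the TPM values are already loaded into a mapping from gene identifier to a list of per-sample values; the function name, the input layout, and the gene labels are hypothetical, and the values are made up for illustration.

```python
from statistics import median


def filter_expressed_genes(tpm_by_gene, threshold=10.0):
    """Return only the genes whose median TPM across samples exceeds the threshold.

    tpm_by_gene: dict mapping a gene identifier to a list of per-sample TPM values
    (a hypothetical in-memory layout, not the format of the distributed file).
    """
    return {
        gene: values
        for gene, values in tpm_by_gene.items()
        if median(values) > threshold
    }


# Illustrative values only; these are not taken from the dataset.
example = {
    "GENE_A": [12.0, 15.5, 9.0],   # median 12.0, kept
    "GENE_B": [0.5, 11.0, 3.2],    # median 3.2, dropped
}
print(sorted(filter_expressed_genes(example)))
```

Using the median rather than the mean keeps a gene from passing the filter on the strength of a few highly expressing samples.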
Participants/Materials | COVID-19 (ICD-10: U071): 473 cases |
Targets | eQTL/sQTL summary statistics |
Target Loci for Capture Methods | - |
Platform |
RNA-seq: Illumina [NovaSeq6000]; SNP array data: Illumina [Infinium Asian Screening Array] |
Library Source | read count data of RNA-seq and SNP array data of GWAS |
Cell Lines | - |
Library Construction (kit name) |
RNA-seq: see the RNA-seq section; SNP array data: see the GWAS section |
Detecting method for read count (software) |
Gene-level quantification and normalization: RSEM (v1.3.3); intron cluster quantification: LeafCutter (v0.2.7) |
QC | Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/) |
Detection method of eQTL (cis) |
The eQTL effects of common variants (allele frequency > 1%) within a cis window (±1 Mb) of transcription start sites were tested using fastQTL. Variant-gene pairs with cis-eQTL p-value < 0.05 are summarized, annotated with allele frequency (AF), p-value, effect size (beta), and posterior inclusion probability (PIP). |
Detection method of eQTL (trans) |
Trans-eQTL effects were tested using tensorQTL. Variant-gene pairs with trans-eQTL p-value < 5×10^-8 are summarized, annotated with AF, p-value, and beta. |
Detection method of sQTL |
The sQTL effects of common variants (allele frequency > 1%) within a cis window (±1 Mb) of intron cluster start sites were tested using fastQTL. Variant-intron cluster pairs with cis-sQTL p-value < 0.05 are summarized, annotated with AF, p-value, beta, and PIP. |
NBDC Dataset ID |
(Click the Dataset ID to download the file) |
Total Data Volume | 714 MB (tsv) |
Comments (Policies) | NBDC policy |
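
The cis window and the p-value cutoffs described for the eQTL/sQTL summary statistics can be expressed as two small checks. This is a sketch under stated assumptions, not the fastQTL or tensorQTL implementation: the function names, argument names, and the dictionary keys are hypothetical, and the column layout of the distributed tsv files is not assumed here. Only the window size (1 Mb) and the thresholds (0.05 for cis, 5×10^-8 for trans) come from the table above.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb on either side of the anchor position

# Reporting thresholds per analysis; the keys are hypothetical labels.
P_THRESHOLDS = {
    "cis-eQTL": 0.05,
    "trans-eQTL": 5e-8,
    "cis-sQTL": 0.05,
}


def in_cis_window(variant_chrom, variant_pos, anchor_chrom, anchor_pos,
                  window_bp=CIS_WINDOW_BP):
    """True if the variant lies on the same chromosome and within the window.

    anchor_pos stands for the transcription start site (eQTL) or the
    intron cluster start site (sQTL).
    """
    return (
        variant_chrom == anchor_chrom
        and abs(variant_pos - anchor_pos) < window_bp
    )


def is_reported(qtl_type, p_value):
    """True if a pair would pass the reporting threshold for its analysis."""
    return p_value < P_THRESHOLDS[qtl_type]


# Illustrative positions and p-values only; not taken from the dataset.
print(in_cis_window("chr5", 169_500_000, "chr5", 169_640_000))
print(in_cis_window("chr5", 169_500_000, "chr6", 169_640_000))
print(is_reported("cis-eQTL", 0.01))
print(is_reported("trans-eQTL", 0.01))
```

The much stricter trans threshold reflects that trans testing considers variant-gene pairs genome-wide rather than only those within a local window.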
DATA PROVIDER
Principal Investigator: Koichi Fukunaga
Affiliation: Department of Medicine, Pulmonary Division, Keio University School of Medicine
Project / Group Name: -
Funds / Grants (Research Project Number):
Name | Title | Project Number |
---|---|---|
Project Promoting Support for Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Development of genetically-designed COVID19 mucosal immune vaccine with molecular needle platform | JP20nk0101612 |
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Promotion of genetic, immunological, and metabolic research necessary for the development of next-generation vaccines and drugs aiming to prevent the aggravation of coronavirus disease 2019 | JP20fk0108415 |
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Elucidation of pathogenesis and development of therapeutic strategies using genetic, immunological, and metabolic studies against SARS-CoV-2 variants | JP20fk0108452 |
Japan Program for Infectious Diseases Research and Infrastructure, Japan Agency for Medical Research and Development (AMED) | Elucidation of the pathophysiology of the sequelae of coronavirus disease 2019 using a multidisciplinary approach | JP21wm0325031 |
Core Research and Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST) | Research on Conquering Coronavirus Disease by Advanced Genomic Analysis and Artificial Intelligence | JPMJCR20H2 |
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | 20ek0410075h0001 |
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of tissue-specificity of disease biology using trans-layer omics analysis and whole-genome sequencing | 19H01021 |
PUBLICATIONS
No. | Title | DOI | Dataset ID |
---|---|---|---|
1 | DOCK2 is involved in the host genetics and biology of severe COVID-19 | doi: 10.1038/s41586-022-05163-5 | hum0343.v1.covid19.v1, hum0343.v1.count.v1, hum0343.v2.qtl.v1 |