NBDC Research ID: hum0343.v4
SUMMARY
Aims: To construct a system for predicting severe COVID-19 through whole genome sequencing, RNA sequencing, and ultra-high-precision HLA analysis of patients with COVID-19, asymptomatic infected individuals, and patients suspected of having novel coronavirus infection. In addition, we will use anonymized data to predict COVID-19 severity with mathematical models, and we aim to elucidate the associations of viruses with autoimmune diseases and COVID-19.
Methods: Genome-wide association study (GWAS), RNA-seq, protein expression analysis, eQTL/sQTL/pQTL analysis, whole genome sequencing
Participants/Materials: GWAS: 5,682 Japanese individuals (2,393 COVID-19 infected patients and 3,289 controls)
RNA-seq: Up to 1,019 COVID-19 infected patients
Protein expression: 1,384 COVID-19 infected patients
Whole genome sequencing: 1,164 COVID-19 infected patients
Dataset ID | Type of Data | Criteria | Release Date |
---|---|---|---|
hum0343.v1.covid19.v1 | GWAS for COVID-19 | Unrestricted-access | 2022/05/26 |
hum0343.v1.count.v1 | NGS (RNA-seq) for COVID-19 | Unrestricted-access | 2022/05/26 |
hum0343.v2.qtl.v1 | eQTL/sQTL summary statistics for COVID-19 | Unrestricted-access | 2022/06/14 |
E-GEAD-759 | | Unrestricted-access | 2024/06/24 |
hum0343.v3.qtl.v1 | eQTL/pQTL summary statistics for COVID-19 | Unrestricted-access | 2024/06/24 |
JGAS000739 | Presence or absence of endogenous human herpesvirus 6 (HHV-6) and anellovirus load, calculated from NGS (WGS) data for COVID-19 | Controlled-access (Type I) | 2024/10/02 |
*To access the Controlled-access Data, data users must submit an Application for Using NBDC Human Data.
*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or include them in the acknowledgments.
MOLECULAR DATA
Participants/Materials |
[GWAS-1] COVID-19 (ICD-10: U071): 2,393 cases; healthy controls: 3,289 individuals
[GWAS-2] Severe COVID-19: 990 cases and 3,289 healthy controls from [GWAS-1]
[GWAS-3] COVID-19 under age 65: 1,484 cases and 2,377 healthy controls under age 65 from [GWAS-1]
[GWAS-4] Severe COVID-19 under age 65: 440 cases and 2,377 healthy controls under age 65 from [GWAS-3] |
Targets | Genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) |
Genotyping: GenomeStudio
Haplotype phasing: SHAPEIT4 (autosomes), SHAPEIT2 (X chromosome)
Imputation: Minimac4 |
Association Analysis (software) | PLINK2 |
Filtering Methods |
Sample QC: We excluded samples with
(1) sample call rate < 0.97
(2) excess heterozygosity (> mean + 3 SD)
(3) relatedness (PI_HAT > 0.175)
(4) outlier status relative to the East Asian cluster in principal component analysis (PCA) with 1000 Genomes Project samples
Genotyping QC: We excluded variants with
(1) variant call rate < 0.99
(2) significant call-rate differences between cases and controls (P < 5.0×10^-8)
(3) deviation from Hardy-Weinberg equilibrium (P < 1.0×10^-6)
(4) minor allele count < 5
Imputation QC: We retained variants with MAF ≥ 0.1% and imputation score (Rsq) > 0.5 |
Marker Number (after QC) |
[GWAS-1] 13,484,569 variants
[GWAS-2] 13,199,053 variants
[GWAS-3] 13,241,602 variants
[GWAS-4] 12,764,136 variants |
NBDC Dataset ID |
hum0343.v1.covid19.v1 [GWAS-1][GWAS-2][GWAS-3][GWAS-4] |
Total Data Volume |
[GWAS-1] 361 MB
[GWAS-2] 354 MB
[GWAS-3] 354 MB
[GWAS-4] 343 MB |
Comments (Policies) | NBDC policy |
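The sample and genotyping QC thresholds in the GWAS table above can be expressed as simple filters. The Python sketch below assumes that per-sample and per-variant statistics (call rates, heterozygosity, PI_HAT, case-control call-rate and Hardy-Weinberg P values, minor allele counts) have already been computed upstream, for example with PLINK. The class and function names are hypothetical, the PCA ancestry-outlier step is omitted, and the choice of which member of a related pair to drop is an assumption because the description does not state it.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Sample:
    sample_id: str
    call_rate: float
    heterozygosity: float  # e.g. observed heterozygosity rate


@dataclass
class Variant:
    variant_id: str
    call_rate: float
    p_call_rate_diff: float  # P value for the case-control call-rate difference
    p_hwe: float             # Hardy-Weinberg equilibrium P value
    minor_allele_count: int


def sample_qc(samples, related_pairs):
    """Return IDs of samples passing call-rate, heterozygosity and relatedness filters.

    related_pairs: iterable of (sample_a, sample_b, pi_hat) tuples.
    The PCA ancestry-outlier filter is not shown here.
    """
    het = [s.heterozygosity for s in samples]
    het_cutoff = mean(het) + 3 * stdev(het)  # exclude > mean + 3 SD
    kept = {s.sample_id for s in samples
            if s.call_rate >= 0.97 and s.heterozygosity <= het_cutoff}
    for a, b, pi_hat in related_pairs:
        # Assumption: drop the second member of each related pair.
        if pi_hat > 0.175 and a in kept and b in kept:
            kept.discard(b)
    return kept


def variant_qc(variants):
    """Return IDs of genotyped variants passing the four variant-level filters."""
    return [v.variant_id for v in variants
            if v.call_rate >= 0.99
            and v.p_call_rate_diff >= 5.0e-8
            and v.p_hwe >= 1.0e-6
            and v.minor_allele_count >= 5]


def passes_imputation_qc(maf, rsq):
    """Keep imputed variants with MAF >= 0.1% and Rsq > 0.5."""
    return maf >= 0.001 and rsq > 0.5
```

Note that the mean + 3 SD heterozygosity rule only flags outliers reliably when many samples are available, since with very few samples no single value can exceed the cutoff.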
Participants/Materials | COVID-19 (ICD-10: U071): 473 cases |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq6000] |
Library Source | RNAs extracted from peripheral blood cells |
Cell Lines | - |
Library Construction (kit name) | NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina |
Fragmentation Methods | Incubation in the buffer containing Mg2+ at 94°C for 15 minutes |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods |
Adapter removal: Trimmomatic (v0.39)
Alignment: STAR (v2.7.9a)
Annotation: GENCODE v30 |
Reference Genome Sequence | GRCh38/hg38 |
Detecting method for read count (software) | Gene level quantification and normalization: RSEM (v1.3.3) |
QC | Median transcripts per million (TPM) > 10 |
Gene Number | 5,991 |
NBDC Dataset ID | hum0343.v1.count.v1 |
Total Data Volume | 6 MB |
Comments (Policies) | NBDC policy |
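For reference, TPM normalizes read counts for gene length first and then scales each sample so that its values sum to one million. RSEM reports TPM directly from its estimated counts and effective lengths, so the NumPy sketch below only illustrates the definition and the median TPM > 10 gene filter used above; the function names and toy inputs are hypothetical.

```python
import numpy as np


def counts_to_tpm(counts, lengths_bp):
    """Convert a genes x samples count matrix to transcripts per million (TPM).

    lengths_bp: (effective) gene lengths in base pairs, one per gene (row).
    """
    reads_per_kb = counts / (lengths_bp[:, None] / 1_000)  # length-normalize first
    # Then scale each sample (column) so that it sums to one million.
    return reads_per_kb / reads_per_kb.sum(axis=0, keepdims=True) * 1e6


def genes_with_median_tpm_above(tpm, gene_ids, threshold=10.0):
    """Return gene IDs whose median TPM across samples exceeds the threshold."""
    medians = np.median(tpm, axis=1)
    return [g for g, m in zip(gene_ids, medians) if m > threshold]
```

Because every sample sums to the same total, TPM values are comparable across samples, which is what makes a single median threshold meaningful.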
Participants/Materials | COVID-19 (ICD-10: U071): 465 cases (severe cases: 359, mild cases: 106) |
Targets | eQTL/sQTL summary statistics |
Target Loci for Capture Methods | - |
Platform |
RNA-seq: Illumina [NovaSeq6000]
SNP array data: Illumina [Infinium Asian Screening Array] |
Library Source | read count data of RNA-seq and SNP array data of GWAS |
Cell Lines | - |
Library Construction (kit name) |
RNA-seq: See RNA-seq
SNP array data: See GWAS |
Detecting method for read count (software) |
Gene level quantification and normalization: RSEM (v1.3.3)
Intron cluster quantification: LeafCutter (v0.2.7) |
QC | Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/) |
Detection method of eQTL (cis) |
The eQTL effects of common variants (frequency > 1%) within a ±1 Mb cis window around transcription start sites were tested using fastQTL. Variant-gene pairs with cis-eQTL p-value < 0.05, annotated with allele frequency (AF), p-value, effect size (beta), and posterior inclusion probability (PIP), were summarized. |
Detection method of eQTL (trans) |
Trans-eQTL effects were tested using tensorQTL. Variant-gene pairs with trans-eQTL p-value <5*10^-8, annotated with AF, p-value and beta were summarized. |
Detection method of sQTL |
The sQTL effects of common variants (frequency > 1%) within a ±1 Mb cis window around intron cluster start sites were tested using fastQTL. Variant-intron cluster pairs with cis-sQTL p-value < 0.05, annotated with AF, p-value, beta, and PIP, were summarized. |
NBDC Dataset ID | hum0343.v2.qtl.v1 |
Total Data Volume | 714 MB (tsv) |
Comments (Policies) | NBDC policy |
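The cis window used for eQTL and sQTL testing amounts to pairing each gene (or intron cluster) with variants on the same chromosome that lie less than 1 Mb from its start site. fastQTL handles this internally; the sketch below only illustrates the selection. Interpreting "common (>1%)" as minor allele frequency > 1% is an assumption, and the input layout and function name are hypothetical.

```python
def cis_pairs(genes, variants, window_bp=1_000_000, min_maf=0.01):
    """List (gene_id, variant_id) pairs to test in cis.

    genes: dict gene_id -> (chrom, tss_position)
    variants: list of (variant_id, chrom, position, alt_allele_frequency)
    """
    pairs = []
    for gene_id, (gene_chrom, tss) in genes.items():
        for variant_id, chrom, pos, af in variants:
            maf = min(af, 1 - af)  # frequency of the minor allele (assumed meaning of ">1%")
            # Strictly less than window_bp, mirroring "<±1 Mb" in the description.
            if chrom == gene_chrom and abs(pos - tss) < window_bp and maf > min_maf:
                pairs.append((gene_id, variant_id))
    return pairs
```

For example, with a gene whose TSS is at chr1:2,000,000, a variant at chr1:2,500,000 with AF 0.2 is tested, while one at chr1:3,500,000 (outside the window) or one with AF 0.995 (MAF 0.5%) is not.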
Participants/Materials | COVID-19 (ICD-10: U071): 1,019 cases |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq6000] |
Library Source | RNAs extracted from peripheral blood cells |
Cell Lines | - |
Library Construction (kit name) | NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina |
Fragmentation Methods | Incubation in the buffer containing Mg2+ at 94°C for 15 minutes |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods |
Alignment: STAR (v2.5.3a)
Annotation: GENCODE v30 |
Reference Genome Sequence | GRCh38/hg38 |
Detecting method for read count (software) | Gene level quantification and normalization: RSEM (v1.3.0) |
QC |
Transcripts per million (TPM) ≥ 0.1 in ≥ 20% of samples
Read count ≥ 6 in ≥ 20% of samples
Gene Number | 20,329 |
Genomic Expression Archive ID | |
Total Data Volume | 91.6 MB (tsv) |
Comments (Policies) | NBDC policy |
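The two QC thresholds above match the gene-selection thresholds of the GTEx eQTL pipeline, where a gene is kept only if both conditions hold. A minimal NumPy sketch, assuming genes-by-samples TPM and count matrices aligned on the same genes and samples (the function name is hypothetical):

```python
import numpy as np


def expressed_gene_mask(tpm, counts, tpm_min=0.1, count_min=6, min_fraction=0.2):
    """Boolean mask over genes (rows) that pass both expression thresholds.

    tpm, counts: genes x samples arrays for the same genes and samples.
    """
    n_samples = tpm.shape[1]
    # TPM >= 0.1 in at least 20% of samples
    tpm_ok = (tpm >= tpm_min).sum(axis=1) >= min_fraction * n_samples
    # Read count >= 6 in at least 20% of samples
    count_ok = (counts >= count_min).sum(axis=1) >= min_fraction * n_samples
    return tpm_ok & count_ok  # both conditions must hold
```

Requiring both conditions removes genes that reach the TPM cutoff only because they are short, as well as genes with many reads but negligible relative abundance.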
Participants/Materials | COVID-19 (ICD-10: U071): 1,384 cases |
Targets | Protein expression (2932 proteins) |
Target Loci for Capture Methods | - |
Platform | Olink [Olink Explore 3072] |
Library Source | Plasma |
Cell Lines | - |
Library Construction (kit name) | Olink Explore 3072 |
Fragmentation Methods | - |
Spot Type | - |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | - |
Detecting Methods for Proteins (software) | OlinkAnalyze v3.4.1 |
Normalization Methods | Normalized Protein eXpression (NPX) transformation |
Validation Methods | Bridge sample comparison |
Genomic Expression Archive ID | |
Total Data Volume | 91.6 MB (tsv) |
Comments (Policies) | NBDC policy |
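The table lists bridge sample comparison as the validation method without further detail. As an illustration only (the procedure below is an assumption, not the documented one), bridge samples measured in two runs can be compared assay by assay through the median NPX difference. Because NPX is on a log2 scale, a difference of d NPX corresponds to a 2^d fold change. The function names and input layout are hypothetical.

```python
from statistics import median


def bridge_offsets(npx_run1, npx_run2):
    """Per-assay median NPX difference (run1 - run2) across shared bridge samples.

    npx_runX: dict assay_id -> dict bridge_sample_id -> NPX value.
    """
    offsets = {}
    for assay, values1 in npx_run1.items():
        values2 = npx_run2.get(assay, {})
        shared = sorted(set(values1) & set(values2))
        if not shared:
            continue  # assay not measured on any common bridge sample
        offsets[assay] = median([values1[s] - values2[s] for s in shared])
    return offsets


def npx_difference_to_fold_change(delta_npx):
    """NPX is log2-scaled, so a difference of d corresponds to a 2**d fold change."""
    return 2 ** delta_npx
```

The median is used rather than the mean so that a single poorly measured bridge sample does not dominate the per-assay comparison.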
Participants/Materials |
COVID-19 (ICD-10: U071): 1,405 cases (severe cases: 995, mild cases: 410)
eQTL analysis: 1,019 cases
pQTL analysis: 1,384 cases (998 cases included in both analyses) |
Targets | eQTL/pQTL summary statistics |
Target Loci for Capture Methods | - |
Platform |
RNA-seq: Illumina [NovaSeq6000]
SNP array data: Illumina [Infinium Asian Screening Array]
Protein expression data: Olink Explore 3072 |
Library Source | read count data of RNA-seq, SNP array data of GWAS and Protein expression data |
Cell Lines | - |
Library Construction (kit name) |
RNA-seq: See RNA-seq
SNP array data: See GWAS
Protein expression data: See Protein expression |
Detecting method for read count (software) |
Gene level quantification and normalization: RSEM (v1.3.0)
Protein quantification: OlinkAnalyze v3.4.1 |
QC | Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/) |
Detection method of eQTL (cis) |
The eQTL effects of cis variants (within ±1 Mb of transcription start sites, minor allele count > 2) were tested using fastQTL. Then, variant-gene pairs with p-value < 0.05 or posterior inclusion probability (PIP) > 0.001, annotated with allele frequency (AF), p-value, effect size (beta), and PIP, were summarized as separate files. |
Detection method of pQTL (cis) |
The pQTL effects of cis variants (within ±1 Mb of transcription start sites, minor allele count > 2) were tested using fastQTL. Then, variant-gene pairs with p-value < 0.05 or posterior inclusion probability (PIP) > 0.001, annotated with allele frequency (AF), p-value, effect size (beta), and PIP, were summarized as separate files. |
NBDC Dataset ID | hum0343.v3.qtl.v1 |
Total Data Volume | 881.5 MB (tsv) |
Comments (Policies) | NBDC policy |
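Each released row satisfies p-value < 0.05 or PIP > 0.001, so users who want a stricter set can re-apply the same rule with tighter thresholds. A minimal Python sketch, assuming hypothetical column names `pval` and `pip` (check the headers of the downloaded TSV files before use):

```python
import csv
import io


def filter_summary_rows(tsv_text, p_max=0.05, pip_min=0.001):
    """Keep rows with p-value < p_max or PIP > pip_min.

    Assumes hypothetical column names "pval" and "pip"; a missing PIP
    (absent column, empty field, or "NA") is treated as 0.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        p = float(row["pval"])
        pip_field = row.get("pip", "")
        pip = float(pip_field) if pip_field not in (None, "", "NA") else 0.0
        if p < p_max or pip > pip_min:
            rows.append(row)
    return rows
```

For example, `filter_summary_rows(text, p_max=5e-8)` keeps genome-wide significant pairs plus any pair with PIP > 0.001, because the two criteria are combined with "or".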
Participants/Materials | COVID-19 (ICD-10: U071): 1,164 cases (severe cases: 1,068) |
Targets | WGS |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq 6000] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Library Construction (kit name) | TruSeq DNA PCR-free Library Prep Kit |
Fragmentation Methods | Ultrasonic fragmentation |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 150 bp x 2 |
Methods for removing host sequence/detecting viral sequence (software) |
https://github.com/shohei-kojima/integrated_HHV6_recon
https://github.com/shohei-kojima/human_anellovirus_detection
QC | We conducted principal component analysis (PCA) against HapMap3 data using SNP data of the same individuals to confirm the East Asian genetic background. |
Reference sequence for viral genome | Refer to the GitHub repositories of the software listed above (List of viral sequences) |
Japanese Genotype-phenotype Archive Dataset ID | JGAD000874 |
Total Data Volume | 38.5 KB (tsv) |
Comments (Policies) | NBDC policy |
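The ancestry checks on this page (PCA against HapMap3 here, and against 1000 Genomes Project samples in the GWAS sample QC) follow a common pattern: fit principal components on a reference panel and project the study samples onto them. The NumPy sketch below assumes genotypes are coded as 0/1/2 allele counts on the same SNPs in the same order; the function names are hypothetical, and LD pruning and other preprocessing are omitted.

```python
import numpy as np


def fit_reference_pca(ref_genotypes, n_components=2):
    """Fit PCs on a reference panel (samples x SNPs, allele counts 0/1/2)."""
    freq = ref_genotypes.mean(axis=0) / 2          # reference allele frequencies
    scale = np.sqrt(2 * freq * (1 - freq))
    scale[scale == 0] = 1.0                        # monomorphic SNPs: avoid divide-by-zero
    standardized = (ref_genotypes - 2 * freq) / scale
    _, _, vt = np.linalg.svd(standardized, full_matrices=False)
    return freq, scale, vt[:n_components].T        # loadings: SNPs x components


def project(genotypes, freq, scale, loadings):
    """Project samples onto the reference PCs using the reference standardization."""
    return ((genotypes - 2 * freq) / scale) @ loadings
```

Study samples whose projected coordinates fall far from the East Asian reference cluster would then be flagged as ancestry outliers; the distance cutoff used for this is not stated in the description.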
DATA PROVIDER
Principal Investigator: Koichi Fukunaga
Affiliation: Department of Medicine, Pulmonary Division, Keio University School of Medicine
Project / Group Name: -
Funds / Grants (Research Project Number):
Name | Title | Project Number |
---|---|---|
Project Promoting Support for Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Development of genetically-designed COVID19 mucosal immune vaccine with molecular needle platform | JP20nk0101612 |
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Promotion of genetic, immunological, and metabolic research necessary for the development of next-generation vaccines and drugs aiming to prevent the aggravation of coronavirus disease 2019 | JP20fk0108415 |
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Elucidation of pathogenesis and development of therapeutic strategies using genetic, immunological, and metabolic studies against SARS-CoV-2 variants | JP20fk0108452 |
Japan Program for Infectious Diseases Research and Infrastructure, Japan Agency for Medical Research and Development (AMED) | Elucidation of the pathophysiology of the sequelae of coronavirus disease 2019 using a multidisciplinary approach | JP21wm0325031 |
Core Research and Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST) | Research on Conquering Coronavirus Disease by Advanced Genomic Analysis and Artificial Intelligence | JPMJCR20H2 |
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | 20ek0410075h0001 |
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of tissue-specificity of disease biology using trans-layer omics analysis and whole-genome sequencing | 19H01021 |
Program for Promoting Platform of Genomics based Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Systematic evaluation of variant of uncertain significance (VUS) pathogenicity through population genomics data analysis and massively parallel reporter assay | JP22kk0305022 |
Fusion Oriented REsearch for disruptive Science and Technology (FOREST) program, Japan Science and Technology Agency (JST) | Towards a generalized and interpretable model for comprehensive understanding of human gene regulatory mechanisms | JPMJFR225Y |
Promoting Individual Research to Nurture the Seeds of Future Innovation and Organizing Unique, Innovative Network (PRESTO), Japan Science and Technology Agency (JST) | Fundamental research to build an academic system resilient to pandemics | JPMJPR21R7 |
PUBLICATIONS
# | Title | DOI | Dataset ID |
---|---|---|---|
1 | DOCK2 is involved in the host genetics and biology of severe COVID-19 | doi: 10.1038/s41586-022-05163-5 | hum0343.v1.covid19.v1 hum0343.v1.count.v1 |
2 | The whole blood transcriptional regulation landscape in 465 COVID-19 infected samples from Japan COVID-19 Task Force | doi: 10.1038/s41467-022-32276-2 | hum0343.v2.qtl.v1 |
3 | Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal their distinct regulation patterns and disease relevance | doi: 10.1038/s41588-024-01896-3 | E-GEAD-759 hum0343.v3.qtl.v1 |
4 | Blood DNA virome associates with autoimmune diseases and COVID-19 | | JGAD000874 |
USERS (Controlled-access Data)
Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use |
---|---|---|---|---|---|