NBDC Research ID: hum0343.v3

 

SUMMARY

Aims: To construct a prediction system for severe disease through whole genome sequencing, RNA sequencing, and ultra-high-precision HLA analysis of patients with COVID-19, asymptomatic infected patients, and patients suspected of having novel coronavirus infection. In addition, we will use anonymized data to analyze the prediction of severity of COVID-19 using mathematical models.

Methods: Genome-wide association study (GWAS), RNA-seq, Protein expression analysis, eQTL/sQTL/pQTL study

Participants/Materials: GWAS: 5682 Japanese individuals (2393 COVID-19 infected patients and 3289 controls) 

                                        RNA-seq: Maximum 1019 COVID-19 infected patients

                                        Protein expression: 1384 COVID-19 infected patients

Dataset ID | Type of Data | Criteria | Release Date
hum0343.v1.covid19.v1 | GWAS for COVID-19 | Unrestricted-access | 2022/05/26
hum0343.v1.count.v1 | NGS (RNA-seq) for COVID-19 | Unrestricted-access | 2022/05/26
hum0343.v2.qtl.v1 | eQTL/sQTL summary statistics for COVID-19 | Unrestricted-access | 2022/06/14
E-GEAD-759 | NGS (RNA-seq) for COVID-19; Protein expression for COVID-19 | Unrestricted-access | 2024/06/24
hum0343.v3.qtl.v1 | eQTL/pQTL summary statistics for COVID-19 | Unrestricted-access | 2024/06/24

*Release Note

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must refer to the papers related to the data or include them in the acknowledgments.

 

MOLECULAR DATA

GWAS

Participants/Materials

[GWAS-1]

   COVID-19 (ICD-10: U071): 2393 cases, Healthy controls: 3289 individuals

[GWAS-2]

   Severe COVID-19: 990 cases and 3289 healthy controls from [GWAS-1]

[GWAS-3]

   COVID-19 under age 65: 1,484 cases and 2,377 healthy controls under age 65 from [GWAS-1]

[GWAS-4]

   Severe COVID-19 under age 65: 440 cases and 2,377 healthy controls under age 65 from [GWAS-3]

Targets: Genome-wide SNPs
Target Loci for Capture Methods: -
Platform: Illumina [Infinium Asian Screening Array]
Source: DNAs extracted from peripheral blood cells
Cell Lines: -
Reagents (Kit, Version): Infinium Asian Screening Array
Genotype Call Methods (software)

genotyping: GenomeStudio

haplotype phasing: SHAPEIT4 (autosome), SHAPEIT2 (X-chromosome)

imputation: Minimac4

Association Analysis (software): PLINK2
Filtering Methods

Sample QC: We excluded samples with

      (1) sample call rate < 0.97

      (2) excess heterozygosity of genotypes > mean + 3SD

      (3) related samples with PI_HAT > 0.175

      (4) outlier samples from East Asian clusters in principal component analysis with 1000 Genomes Project samples.
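The four sample-level exclusion criteria above can be expressed as a short filter. The following is a minimal sketch, not the data provider's pipeline: the field names (`call_rate`, `het_rate`, `max_pi_hat`, `in_eas_cluster`) are hypothetical, and it assumes relatedness and PCA results have already been computed per sample. For simplicity it drops every sample whose largest PI_HAT exceeds the threshold; the source does not state how related pairs were resolved.

```python
from statistics import mean, stdev


def sample_qc(samples, call_rate_min=0.97, pi_hat_max=0.175, n_sd=3.0):
    """Return IDs of samples that pass all four sample-QC criteria.

    Each sample is a dict with hypothetical keys: id, call_rate, het_rate,
    max_pi_hat (largest PI_HAT with any other sample), in_eas_cluster.
    """
    het_rates = [s["het_rate"] for s in samples]
    # Criterion (2): heterozygosity above mean + 3 SD counts as excess.
    het_cutoff = mean(het_rates) + n_sd * stdev(het_rates)

    kept = []
    for s in samples:
        if s["call_rate"] < call_rate_min:   # (1) low sample call rate
            continue
        if s["het_rate"] > het_cutoff:       # (2) excess heterozygosity
            continue
        if s["max_pi_hat"] > pi_hat_max:     # (3) related sample (simplified)
            continue
        if not s["in_eas_cluster"]:          # (4) outlier from East Asian cluster
            continue
        kept.append(s["id"])
    return kept
```

Computing the heterozygosity cutoff from the full input, before any exclusion, is a choice made for this sketch; the source does not specify the order in which the criteria were applied.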

Genotyping QC: We excluded variants with

      (1) variant call rate < 0.99

      (2) significant call rate differences between cases and controls with P < 5.0×10^-8

      (3) deviation from Hardy-Weinberg equilibrium with P < 1.0×10^-6

      (4) minor allele count < 5

Imputation QC: MAF ≥ 0.1% and imputation score (Rsq) > 0.5
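The variant-level thresholds above translate directly into two predicates, one for genotyped variants and one for imputed variants. This is an illustrative sketch only: the dictionary keys (`call_rate`, `p_call_rate_diff`, `p_hwe`, `minor_allele_count`, `maf`, `rsq`) are hypothetical names, and the per-variant statistics are assumed to have been computed beforehand (for example by PLINK2 and Minimac4).

```python
def passes_genotyping_qc(v):
    """True if a genotyped variant survives all four exclusion criteria."""
    return (
        v["call_rate"] >= 0.99                # (1) variant call rate
        and v["p_call_rate_diff"] >= 5.0e-8   # (2) case/control call-rate difference
        and v["p_hwe"] >= 1.0e-6              # (3) Hardy-Weinberg equilibrium
        and v["minor_allele_count"] >= 5      # (4) minor allele count
    )


def passes_imputation_qc(v, maf_min=0.001, rsq_min=0.5):
    """True if an imputed variant has MAF >= 0.1% and Rsq > 0.5."""
    return v["maf"] >= maf_min and v["rsq"] > rsq_min
```

Note that the MAF bound is inclusive while the Rsq bound is strict, matching the inequalities as written in the record.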

Marker Number (after QC)

[GWAS-1] 13,484,569 variants

[GWAS-2] 13,199,053 variants

[GWAS-3] 13,241,602 variants

[GWAS-4] 12,764,136 variants

NBDC Dataset ID

hum0343.v1.covid19.v1

[GWAS-1][GWAS-2][GWAS-3][GWAS-4]

(Click the gwas number to download files)

Dictionary file

Total Data Volume

[GWAS-1] 361 MB

[GWAS-2] 354 MB

[GWAS-3] 354 MB

[GWAS-4] 343 MB

Comments (Policies): NBDC policy

 

RNA-seq

Participants/Materials: COVID-19 (ICD-10: U071): 473 cases
Targets: RNA-seq
Target Loci for Capture Methods: -
Platform: Illumina [NovaSeq6000]
Library Source: RNAs extracted from peripheral blood cells
Cell Lines: -
Library Construction (kit name): NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina
Fragmentation Methods: Incubation in the buffer containing Mg2+ at 94°C for 15 minutes
Spot Type: Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers): 100 bp
Mapping Methods

Adapter removal: Trimmomatic (v0.39)

Alignment: STAR (v2.7.9a)

Annotation: GENCODE v30

Reference Genome Sequence: GRCh38/hg38
Detecting method for read count (software): Gene-level quantification and normalization with RSEM (v1.3.3)
QC: median transcripts per million (TPM) > 10
Gene Number: 5991
NBDC Dataset ID

hum0343.v1.count.v1

(Click the Dataset ID to download the file)

Sample Information

Total Data Volume: 6 MB
Comments (Policies): NBDC policy
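The QC criterion listed for this dataset (median TPM > 10) can be sketched as a gene filter. This is a minimal illustration that assumes the median is taken per gene across samples, which the record does not state explicitly; the function name and input layout are hypothetical.

```python
from statistics import median


def genes_passing_median_tpm(tpm_by_gene, threshold=10.0):
    """Return gene IDs whose median TPM across samples exceeds the threshold.

    tpm_by_gene maps a gene ID to a list of per-sample TPM values.
    """
    return [gene for gene, values in tpm_by_gene.items()
            if median(values) > threshold]
```

For example, a gene with TPM values 5, 12, and 30 has a median of 12 and is kept, while a gene with values 1, 2, and 50 has a median of 2 and is dropped despite one high sample.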

 

eQTL/sQTL study

Participants/Materials: COVID-19 (ICD-10: U071): 465 cases (severe cases: 359, mild cases: 106)
Targets: eQTL/sQTL summary statistics
Target Loci for Capture Methods: -
Platform

RNA-seq: Illumina [NovaSeq6000]

SNP array data: Illumina [Infinium Asian Screening Array]

Library Source: read count data of RNA-seq and SNP array data of GWAS
Cell Lines: -
Library Construction (kit name)

RNA-seq: See RNA-seq

SNP array data: See GWAS

Detecting method for read count (software)

Gene level quantification and normalization: RSEM (v1.3.3)

Intron cluster quantification: LeafCutter (v0.2.7)

QC: Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/)
Detection method of eQTL (cis)

The eQTL effects of common variants (frequency > 1%) within a cis window (±1 Mb) of transcription start sites were tested using fastQTL. Variant-gene pairs with a cis-eQTL p-value < 0.05 were summarized, annotated with allele frequency (AF), p-value, effect size (beta), and posterior inclusion probability (PIP).

Detection method of eQTL (trans)

Trans-eQTL effects were tested using tensorQTL. Variant-gene pairs with a trans-eQTL p-value < 5×10^-8 were summarized, annotated with AF, p-value, and beta.

Detection method of sQTL

The sQTL effects of common variants (frequency > 1%) within a cis window (±1 Mb) of intron cluster start sites were tested using fastQTL. Variant-intron cluster pairs with a cis-sQTL p-value < 0.05 were summarized, annotated with AF, p-value, beta, and PIP.

NBDC Dataset ID

hum0343.v2.qtl.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume: 714 MB (tsv)
Comments (Policies): NBDC policy
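The cis window and the reporting thresholds described for this dataset can be sketched as follows. This is an illustration under stated assumptions, not the fastQTL or tensorQTL implementation: the function names and the record key `p_value` are hypothetical, and positions are assumed to be base-pair coordinates on the same reference build.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb on either side of the feature start


def is_cis(variant_chrom, variant_pos, feature_chrom, feature_start,
           window=CIS_WINDOW_BP):
    """True if the variant lies within the cis window of the feature start
    (a transcription start site for eQTL, an intron cluster start for sQTL)."""
    return (variant_chrom == feature_chrom
            and abs(variant_pos - feature_start) < window)


def select_by_p(records, p_max):
    """Keep association records whose p-value is below the reporting threshold."""
    return [r for r in records if r["p_value"] < p_max]


# Reporting thresholds as given in the record.
CIS_P_MAX = 0.05
TRANS_P_MAX = 5e-8
```

Calling `select_by_p(records, CIS_P_MAX)` mirrors the cis-eQTL and cis-sQTL reporting rule, and `select_by_p(records, TRANS_P_MAX)` mirrors the trans-eQTL rule.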

 

RNA-seq (E-GEAD-759)

Participants/Materials: COVID-19 (ICD-10: U071): 1,019 cases
Targets: RNA-seq
Target Loci for Capture Methods: -
Platform: Illumina [NovaSeq6000]
Library Source: RNAs extracted from peripheral blood cells
Cell Lines: -
Library Construction (kit name): NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina
Fragmentation Methods: Incubation in the buffer containing Mg2+ at 94°C for 15 minutes
Spot Type: Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers): 100 bp
Mapping Methods

Alignment: STAR (v2.5.3a)

Annotation: GENCODE v30

Reference Genome Sequence: GRCh38/hg38
Detecting method for read count (software): Gene-level quantification and normalization with RSEM (v1.3.0)
QC

Transcripts per million (TPM) ≥ 0.1 in ≥ 20% of samples

Read count ≥ 6 in ≥ 20% of samples

Gene Number: 20329
Genomic Expression Archive ID

E-GEAD-759

Dictionary file

Total Data Volume: 91.6 MB (tsv)
Comments (Policies): NBDC policy
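The two expression thresholds listed under QC for this dataset (TPM ≥ 0.1 in ≥ 20% of samples, read count ≥ 6 in ≥ 20% of samples) can be sketched as a per-gene check. The sketch assumes that a gene must satisfy both criteria to be retained, which the record does not state explicitly; the function name and arguments are hypothetical.

```python
def passes_expression_qc(tpm, counts, tpm_min=0.1, count_min=6, frac_min=0.2):
    """Check one gene against both expression thresholds.

    tpm and counts are equal-length lists of per-sample values for the gene.
    Assumption: both criteria must hold for the gene to be kept.
    """
    n = len(tpm)
    if n == 0 or n != len(counts):
        raise ValueError("tpm and counts must be non-empty and the same length")
    frac_tpm = sum(t >= tpm_min for t in tpm) / n
    frac_count = sum(c >= count_min for c in counts) / n
    return frac_tpm >= frac_min and frac_count >= frac_min
```

With five samples, a single sample meeting each threshold gives a fraction of exactly 0.2, so the gene is retained under this reading of the criteria.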

 

Protein expression

Participants/Materials: COVID-19 (ICD-10: U071): 1,384 cases
Targets: Protein expression (2932 proteins)
Target Loci for Capture Methods: -
Platform: Olink [Olink Explore 3072]
Library Source: Plasma
Cell Lines: -
Library Construction (kit name): Olink Explore 3072
Fragmentation Methods: -
Spot Type: -
Read Length (without Barcodes, Adaptors, Primers, and Linkers): -
Detecting Methods for Proteins (software): OlinkAnalyze v3.4.1
Normalization Methods: Normalized Protein eXpression (NPX) transformation
Validation Methods: Bridge sample comparison
Genomic Expression Archive ID

E-GEAD-759

Dictionary file

Total Data Volume: 91.6 MB (tsv)
Comments (Policies): NBDC policy

 

eQTL/pQTL study

Participants/Materials

COVID-19 (ICD-10: U071): 1,405 cases (severe cases: 995, mild cases: 410)

          eQTL analysis: 1,019 cases

          pQTL analysis: 1,384 cases

           (998 cases in both analyses; 1,019 + 1,384 - 998 = 1,405)

Targets: eQTL/pQTL summary statistics
Target Loci for Capture Methods: -
Platform

RNA-seq: Illumina [NovaSeq6000]

SNP array data: Illumina [Infinium Asian Screening Array]

Protein expression data: Olink Explore 3072

Library Source: read count data of RNA-seq, SNP array data of GWAS, and protein expression data
Cell Lines: -
Library Construction (kit name)

RNA-seq: See RNA-seq

SNP array data: See GWAS

Protein expression data: See Protein expression

Detecting method for read count (software)

Gene level quantification and normalization: RSEM (v1.3.0)

Protein expression quantification: OlinkAnalyze (v3.4.1)

QC: Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/)
Detection method of eQTL (cis)

The eQTL effects of cis variants (within ±1 Mb of transcription start sites, minor allele count > 2) were tested using fastQTL. Variant-gene pairs with p-value < 0.05 or posterior inclusion probability (PIP) > 0.001 were then summarized as separate files, annotated with allele frequency (AF), p-value, effect size (beta), and PIP.

Detection method of pQTL (cis)

The pQTL effects of cis variants (within ±1 Mb of transcription start sites, minor allele count > 2) were tested using fastQTL. Variant-gene pairs with p-value < 0.05 or posterior inclusion probability (PIP) > 0.001 were then summarized as separate files, annotated with allele frequency (AF), p-value, effect size (beta), and PIP.

NBDC Dataset ID

hum0343.v3.qtl.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume: 881.5 MB (tsv)
Comments (Policies): NBDC policy
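The testing and reporting rules described for the cis-eQTL and cis-pQTL analyses can be sketched as two predicates. This is an illustration only, not the fastQTL or fine-mapping implementation: the function names, the signed `distance_to_tss` argument, and the record keys `p_value` and `pip` are hypothetical.

```python
def is_tested(distance_to_tss, minor_allele_count,
              window=1_000_000, mac_min=2):
    """True if a variant falls inside the cis window and has minor allele count > 2.

    distance_to_tss is the signed distance in base pairs between the variant
    and the transcription start site of the gene.
    """
    return abs(distance_to_tss) < window and minor_allele_count > mac_min


def is_reported(record, p_max=0.05, pip_min=0.001):
    """True if a variant-gene pair meets either reporting criterion:
    p-value < 0.05 or posterior inclusion probability > 0.001."""
    return record["p_value"] < p_max or record["pip"] > pip_min
```

Because the two reporting criteria are joined by "or", a pair with a non-significant p-value is still reported when its PIP exceeds 0.001.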

 

DATA PROVIDER

Principal Investigator: Koichi Fukunaga

Affiliation: Department of Medicine, Pulmonary Division, Keio University School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Project Promoting Support for Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Development of genetically-designed COVID19 mucosal immune vaccine with molecular needle platform | JP20nk0101612
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Promotion of genetic, immunological, and metabolic research necessary for the development of next-generation vaccines and drugs aiming to prevent the aggravation of coronavirus disease 2019 | JP20fk0108415
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Elucidation of pathogenesis and development of therapeutic strategies using genetic, immunological, and metabolic studies against SARS-CoV-2 variants | JP20fk0108452
Japan Program for Infectious Diseases Research and Infrastructure, Japan Agency for Medical Research and Development (AMED) | Elucidation of the pathophysiology of the sequelae of coronavirus disease 2019 using a multidisciplinary approach | JP21wm0325031
Core Research and Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST) | Research on Conquering Coronavirus Disease by Advanced Genomic Analysis and Artificial Intelligence | JPMJCR20H2
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | 20ek0410075h0001
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of tissue-specificity of disease biology using trans-layer omics analysis and whole-genome sequencing | 19H01021
Program for Promoting Platform of Genomics based Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Systematic evaluation of variant of uncertain significance (VUS) pathogenicity through population genomics data analysis and massively parallel reporter assay | JP22kk0305022
Fusion Oriented REsearch for disruptive Science and Technology (FOREST) program, Japan Science and Technology Agency (JST) | Towards a generalized and interpretable model for comprehensive understanding of human gene regulatory mechanisms | JPMJFR225Y
Promoting Individual Research to Nurture the Seeds of Future Innovation and Organizing Unique, Innovative Network (PRESTO), Japan Science and Technology Agency (JST) | Fundamental research to build an academic system resilient to pandemics | JPMJPR21R7

 

PUBLICATIONS

No. | Title | DOI | Dataset ID
1 | DOCK2 is involved in the host genetics and biology of severe COVID-19 | doi: 10.1038/s41586-022-05163-5 | hum0343.v1.covid19.v1, hum0343.v1.count.v1
2 | The whole blood transcriptional regulation landscape in 465 COVID-19 infected samples from Japan COVID-19 Task Force | doi: 10.1038/s41467-022-32276-2 | hum0343.v2.qtl.v1
3 | Statistically and functionally fine-mapped blood e/pQTLs from 1,405 humans reveal their distinct regulation patterns and disease relevance | - | E-GEAD-759, hum0343.v3.qtl.v1