NBDC Research ID: hum0343.v4

 

SUMMARY

Aims: To construct a system for predicting severe disease through whole genome sequencing, RNA sequencing, and ultra-high-precision HLA analysis of patients with COVID-19, asymptomatic infected individuals, and patients suspected of having novel coronavirus infection. In addition, we will use anonymized data to predict COVID-19 severity with mathematical models, and we aim to elucidate the association of viruses with autoimmune diseases and COVID-19.

Methods: Genome-wide association study (GWAS), RNA-seq, Protein expression analysis, eQTL/sQTL/pQTL study, whole genome sequencing

Participants/Materials:

   GWAS: 5,682 Japanese individuals (2,393 COVID-19 infected patients and 3,289 controls)

   RNA-seq: up to 1,019 COVID-19 infected patients

   Protein expression: 1,384 COVID-19 infected patients

   Whole genome sequencing: 1,164 COVID-19 infected patients

Dataset ID | Type of Data | Criteria | Release Date
hum0343.v1.covid19.v1 | GWAS for COVID-19 | Unrestricted-access | 2022/05/26
hum0343.v1.count.v1 | NGS (RNA-seq) for COVID-19 | Unrestricted-access | 2022/05/26
hum0343.v2.qtl.v1 | eQTL/sQTL summary statistics for COVID-19 | Unrestricted-access | 2022/06/14
E-GEAD-759 | NGS (RNA-seq) for COVID-19; Protein expression for COVID-19 | Unrestricted-access | 2024/06/24
hum0343.v3.qtl.v1 | eQTL/pQTL summary statistics for COVID-19 | Unrestricted-access | 2024/06/24
JGAS000739 | The presence or absence of endogenous herpesvirus 6 and anellovirus load calculated from NGS (WGS) for COVID-19 | Controlled-access (Type I) | 2024/10/02

*Release Note

*To access the Controlled-access Data, data users must submit an application for Using NBDC Human Data.

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

 

MOLECULAR DATA

GWAS

Participants/Materials

[GWAS-1]

   COVID-19 (ICD-10: U071): 2,393 cases, Healthy controls: 3,289 individuals

[GWAS-2]

   Severe COVID-19: 990 cases and 3,289 healthy controls from [GWAS-1]

[GWAS-3]

   COVID-19 under age 65: 1,484 cases and 2,377 healthy controls under age 65 from [GWAS-1]

[GWAS-4]

   Severe COVID-19 under age 65: 440 cases and 2,377 healthy controls under age 65 from [GWAS-3]

Targets Genome-wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array
Genotype Call Methods (software)

genotyping: GenomeStudio

haplotype phasing: SHAPEIT4 (autosome), SHAPEIT2 (X-chromosome)

imputation: Minimac4

Association Analysis (software) PLINK2
Filtering Methods

Sample QC: We excluded samples with

      (1) sample call rate < 0.97

      (2) excess heterozygosity of genotypes > mean + 3SD

      (3) related samples with PI_HAT > 0.175

      (4) outlier samples from East Asian clusters in principal component analysis with 1000 Genomes Project samples.
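
The four sample-level exclusions above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input layout, the function name `sample_qc`, the use of the sample standard deviation, and the rule for choosing which member of a related pair to drop (the one with the lower call rate) are all assumptions made for the example. PCA outliers are taken as a precomputed set.

```python
from statistics import mean, stdev


def sample_qc(samples, related_pairs, pca_outliers,
              min_call_rate=0.97, max_pi_hat=0.175, n_sd=3.0):
    """Return the IDs of samples that pass the four sample-QC filters.

    samples:       dict of sample ID -> {"call_rate": float, "het": float}
    related_pairs: list of (id_a, id_b, pi_hat) tuples
    pca_outliers:  set of IDs outside the East Asian cluster (precomputed)
    """
    # (2) heterozygosity cutoff: mean + 3 SD over all samples
    het_values = [s["het"] for s in samples.values()]
    het_cutoff = mean(het_values) + n_sd * stdev(het_values)

    keep = set()
    for sid, s in samples.items():
        if s["call_rate"] < min_call_rate:   # (1) low call rate
            continue
        if s["het"] > het_cutoff:            # (2) excess heterozygosity
            continue
        if sid in pca_outliers:              # (4) PCA outlier
            continue
        keep.add(sid)

    # (3) related pairs: drop one member (assumption: the lower call rate)
    for a, b, pi_hat in related_pairs:
        if pi_hat > max_pi_hat and a in keep and b in keep:
            drop = a if samples[a]["call_rate"] < samples[b]["call_rate"] else b
            keep.discard(drop)
    return keep
```
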

Genotyping QC: We excluded variants with

      (1) variant call rate < 0.99

      (2) significant call rate differences between cases and controls with P < 5.0×10^-8

      (3) deviation from Hardy-Weinberg equilibrium with P < 1.0×10^-6

      (4) minor allele count < 5

Imputation QC: MAF ≥ 0.1% and imputation score (Rsq) > 0.5
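
As a rough illustration, the variant-level thresholds listed under Genotyping QC and Imputation QC can be written as two predicates. The dictionary keys (`call_rate`, `p_call_rate_diff`, `p_hwe`, `mac`, `maf`, `rsq`) and the function names are hypothetical; only the numeric cutoffs come from the text above.

```python
def passes_genotyping_qc(v):
    """True if a genotyped variant survives the four exclusion criteria."""
    return (
        v["call_rate"] >= 0.99              # (1) variant call rate
        and v["p_call_rate_diff"] >= 5.0e-8  # (2) case/control call rate difference
        and v["p_hwe"] >= 1.0e-6             # (3) Hardy-Weinberg equilibrium
        and v["mac"] >= 5                    # (4) minor allele count
    )


def passes_imputation_qc(v):
    """True if an imputed variant has MAF >= 0.1% and Rsq > 0.5."""
    return v["maf"] >= 0.001 and v["rsq"] > 0.5
```
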

Marker Number (after QC)

[GWAS-1] 13,484,569 variants

[GWAS-2] 13,199,053 variants

[GWAS-3] 13,241,602 variants

[GWAS-4] 12,764,136 variants

NBDC Dataset ID

hum0343.v1.covid19.v1

[GWAS-1][GWAS-2][GWAS-3][GWAS-4]

(Click the gwas number to download files)

Dictionary file

Total Data Volume

[GWAS-1] 361 MB

[GWAS-2] 354 MB

[GWAS-3] 354 MB

[GWAS-4] 343 MB

Comments (Policies) NBDC policy

 

RNA-seq

Participants/Materials COVID-19 (ICD-10: U071): 473 cases
Targets RNA-seq
Target Loci for Capture Methods -
Platform Illumina [NovaSeq6000]
Library Source RNAs extracted from peripheral blood cells
Cell Lines -
Library Construction (kit name) NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina
Fragmentation Methods Incubation in the buffer containing Mg2+ at 94°C for 15 minutes
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods

Adapter removal: Trimmomatic (v0.39)

Alignment: STAR (v2.7.9a)

Annotation: GENCODE v30

Reference Genome Sequence GRCh38/hg38
Detecting method for read count (software) Gene level quantification and normalization: RSEM (v1.3.3)
QC median transcripts per kilobase million (TPM) > 10
Gene Number 5991
NBDC Dataset ID

hum0343.v1.count.v1

(Click the Dataset ID to download the file)

Sample Information

Total Data Volume 6 MB
Comments (Policies) NBDC policy
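
The QC criterion in this section (median TPM > 10) amounts to a per-gene filter over samples. A minimal sketch, assuming the expression matrix is held as a dictionary from gene ID to a list of per-sample TPM values; the function name and the gene IDs used in any call are hypothetical:

```python
from statistics import median


def filter_by_median_tpm(tpm, threshold=10.0):
    """Keep genes whose median TPM across samples exceeds the threshold."""
    return {gene: values for gene, values in tpm.items()
            if median(values) > threshold}
```

The comparison is strict, matching the ">" in the QC row, so a gene whose median is exactly 10 is dropped.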

 

eQTL/sQTL study

Participants/Materials COVID-19 (ICD-10: U071): 465 cases (severe cases: 359, mild cases: 106)
Targets eQTL/sQTL summary statistics
Target Loci for Capture Methods -
Platform

RNA-seq: Illumina [NovaSeq6000]

SNP array data: Illumina [Infinium Asian Screening Array]

Library Source read count data of RNA-seq and SNP array data of GWAS
Cell Lines -
Library Construction (kit name)

RNA-seq: See RNA-seq

SNP array data: See GWAS

Detecting method for read count (software)

Gene level quantification and normalization: RSEM (v1.3.3)

Intron cluster quantification: LeafCutter (v0.2.7)

QC Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/)
Detection method of eQTL (cis)

The eQTL effects of common (>1%) variants within the cis window (±1 Mb) of transcription start sites were tested using fastQTL. Variant-gene pairs with cis-eQTL p-value < 0.05 were summarized with annotations for allele frequency (AF), p-value, effect size (beta), and posterior inclusion probability (PIP).

Detection method of eQTL (trans)

Trans-eQTL effects were tested using tensorQTL. Variant-gene pairs with trans-eQTL p-value < 5×10^-8 were summarized with annotations for AF, p-value, and beta.

Detection method of sQTL

The sQTL effects of common (>1%) variants within the cis window (±1 Mb) of intron cluster start sites were tested using fastQTL. Variant-intron cluster pairs with cis-sQTL p-value < 0.05 were summarized with annotations for AF, p-value, beta, and PIP.

NBDC Dataset ID

hum0343.v2.qtl.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 714 MB (tsv)
Comments (Policies) NBDC policy
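
To make the cis window and the reporting thresholds of this section concrete, here is a small sketch. It does not reproduce fastQTL or tensorQTL; it only shows how a ±1 Mb window around a start site selects candidate variants and how the two p-value cutoffs differ. The function names, the tuple layout, and the inclusion of variants lying exactly on the window boundary are assumptions.

```python
CIS_WINDOW_BP = 1_000_000                        # ±1 Mb around the start site
P_THRESHOLDS = {"cis": 0.05, "trans": 5e-8}      # reporting cutoffs from this section


def cis_variants(start_site, variants, window=CIS_WINDOW_BP):
    """Return IDs of variants within the cis window of a start site.

    variants: list of (variant_id, position) on the same chromosome.
    """
    return [vid for vid, pos in variants if abs(pos - start_site) <= window]


def is_reported(p_value, kind):
    """True if a pair meets the cutoff for the given test ("cis" or "trans")."""
    return p_value < P_THRESHOLDS[kind]
```
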

 

RNA-seq (E-GEAD-759)

Participants/Materials COVID-19 (ICD-10: U071): 1,019 cases
Targets RNA-seq
Target Loci for Capture Methods -
Platform Illumina [NovaSeq6000]
Library Source RNAs extracted from peripheral blood cells
Cell Lines -
Library Construction (kit name) NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina
Fragmentation Methods Incubation in the buffer containing Mg2+ at 94°C for 15 minutes
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods

Alignment: STAR (v2.5.3a)

Annotation: GENCODE v30

Reference Genome Sequence GRCh38/hg38
Detecting method for read count (software) Gene level quantification and normalization: RSEM (v1.3.0)
QC

Transcripts per kilobase million (TPM) ≥0.1 in ≥20% samples

Read count ≥6 in ≥20% samples

Gene Number 20329
Genomic Expression Archive ID

E-GEAD-759

Dictionary file

Total Data Volume 91.6 MB (tsv)
Comments (Policies) NBDC policy
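
The two QC criteria listed for this dataset (TPM ≥ 0.1 in ≥ 20% of samples, read count ≥ 6 in ≥ 20% of samples) can be sketched as a gene-level filter. The record does not state how the criteria are combined; the sketch assumes both must hold. The data layout and function name are hypothetical.

```python
def expressed_genes(tpm, counts, min_tpm=0.1, min_count=6, min_fraction=0.2):
    """Return genes passing both expression thresholds.

    tpm, counts: dicts of gene ID -> list of per-sample values (same order).
    """
    kept = []
    for gene, tpm_values in tpm.items():
        n_required = min_fraction * len(tpm_values)
        # number of samples meeting each threshold
        n_tpm = sum(v >= min_tpm for v in tpm_values)
        n_count = sum(c >= min_count for c in counts[gene])
        if n_tpm >= n_required and n_count >= n_required:
            kept.append(gene)
    return kept
```
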

 

Protein expression

Participants/Materials COVID-19 (ICD-10: U071): 1,384 cases
Targets Protein expression (2932 proteins)
Target Loci for Capture Methods -
Platform Olink [Olink Explore 3072]
Library Source Plasma
Cell Lines -
Library Construction (kit name) Olink Explore 3072
Fragmentation Methods -
Spot Type -
Read Length (without Barcodes, Adaptors, Primers, and Linkers) -
Detecting Methods for Proteins (software) OlinkAnalyze v3.4.1
Normalization Methods Normalized Protein eXpression (NPX) transformation
Validation Methods Bridge sample comparison
Genomic Expression Archive ID

E-GEAD-759

Dictionary file

Total Data Volume 91.6 MB (tsv)
Comments (Policies) NBDC policy

 

eQTL/pQTL study

Participants/Materials

COVID-19 (ICD-10: U071): 1,405 cases (severe cases: 995, mild cases: 410)

          eQTL analysis: 1,019 cases

          pQTL analysis: 1,384 cases

           (998 intersecting cases)

Targets eQTL/pQTL summary statistics
Target Loci for Capture Methods -
Platform

RNA-seq: Illumina [NovaSeq6000]

SNP array data: Illumina [Infinium Asian Screening Array]

Protein expression data: Olink Explore 3072

Library Source read count data of RNA-seq, SNP array data of GWAS and Protein expression data
Cell Lines -
Library Construction (kit name)

RNA-seq: See RNA-seq

SNP array data: See GWAS

Protein expression data: See Protein expression

Detecting method for read count (software)

Gene level quantification and normalization: RSEM (v1.3.0)

Protein expression quantification: OlinkAnalyze v3.4.1

QC Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/)
Detection method of eQTL (cis)

The eQTL effects of cis variants (within a ±1 Mb window of transcription start sites, minor allele count > 2) were tested using fastQTL. Variant-gene pairs with p-value < 0.05 or posterior inclusion probability (PIP) > 0.001 were then annotated with allele frequency (AF), p-value, effect size (beta), and PIP, and summarized as separate files.

Detection method of pQTL (cis)

The pQTL effects of cis variants (within a ±1 Mb window of transcription start sites, minor allele count > 2) were tested using fastQTL. Variant-gene pairs with p-value < 0.05 or posterior inclusion probability (PIP) > 0.001 were then annotated with allele frequency (AF), p-value, effect size (beta), and PIP, and summarized as separate files.

NBDC Dataset ID

hum0343.v3.qtl.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 881.5 MB (tsv)
Comments (Policies) NBDC policy
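
Two details of this section lend themselves to a short worked sketch: the case total (1,019 eQTL cases plus 1,384 pQTL cases, with 998 in both, gives 1,405) and the two reporting criteria (p-value < 0.05, PIP > 0.001). The split into one list per criterion is only one reading of "summarized as separate files"; the tuple layout and function names are hypothetical.

```python
def total_cases(n_eqtl, n_pqtl, n_both):
    """Size of the union of two case sets, by inclusion-exclusion."""
    return n_eqtl + n_pqtl - n_both


def split_pairs(pairs, max_p=0.05, min_pip=0.001):
    """Split (variant, gene, p_value, pip) tuples by the two criteria.

    Returns (pairs with p_value < max_p, pairs with pip > min_pip).
    A pair meeting both criteria appears in both lists.
    """
    by_p = [row for row in pairs if row[2] < max_p]
    by_pip = [row for row in pairs if row[3] > min_pip]
    return by_p, by_pip


# Counts taken from the Participants/Materials row of this section
assert total_cases(1019, 1384, 998) == 1405
```
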

 

JGAS000739

Participants/Materials COVID-19 (ICD-10: U071): 1,164 cases (severe cases: 1,068)
Targets WGS
Target Loci for Capture Methods -
Platform Illumina [NovaSeq 6000]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Library Construction (kit name) TruSeq DNA PCR-free Library Prep Kit
Fragmentation Methods Ultrasonic fragmentation
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 150 bp x 2
Methods for removing host sequence/detecting viral sequence (software)

https://github.com/shohei-kojima/integrated_HHV6_recon

https://github.com/shohei-kojima/human_anellovirus_detection

QC We conducted principal component analysis (PCA) against HapMap3 data using SNP data of the same individuals to confirm the East Asian genetic background.
Reference sequence for viral genome Refer to the GitHub repositories of the software listed above. List of viral sequences
Japanese Genotype-phenotype Archive Dataset ID JGAD000874
Total Data Volume 38.5 KB (tsv)
Comments (Policies) NBDC policy

 

DATA PROVIDER

Principal Investigator: Koichi Fukunaga

Affiliation: Department of Medicine, Pulmonary Division, Keio University School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Project Promoting Support for Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Development of genetically-designed COVID19 mucosal immune vaccine with molecular needle platform | JP20nk0101612
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Promotion of genetic, immunological, and metabolic research necessary for the development of next-generation vaccines and drugs aiming to prevent the aggravation of coronavirus disease 2019 | JP20fk0108415
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Elucidation of pathogenesis and development of therapeutic strategies using genetic, immunological, and metabolic studies against SARS-CoV-2 variants | JP20fk0108452
Japan Program for Infectious Diseases Research and Infrastructure, Japan Agency for Medical Research and Development (AMED) | Elucidation of the pathophysiology of the sequelae of coronavirus disease 2019 using a multidisciplinary approach | JP21wm0325031
Core Research and Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST) | Research on Conquering Coronavirus Disease by Advanced Genomic Analysis and Artificial Intelligence | JPMJCR20H2
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | 20ek0410075h0001
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of tissue-specificity of disease biology using trans-layer omics analysis and whole-genome sequencing | 19H01021
Program for Promoting Platform of Genomics based Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Systematic evaluation of variant of uncertain significance (VUS) pathogenicity through population genomics data analysis and massively parallel reporter assay | JP22kk0305022
Fusion Oriented REsearch for disruptive Science and Technology (FOREST) program, Japan Science and Technology Agency (JST) | Towards a generalized and interpretable model for comprehensive understanding of human gene regulatory mechanisms | JPMJFR225Y
Promoting Individual Research to Nurture the Seeds of Future Innovation and Organizing Unique, Innovative Network (PRESTO), Japan Science and Technology Agency (JST) | Fundamental research to build an academic system resilient to pandemics | JPMJPR21R7

 

PUBLICATIONS

No. | Title | DOI | Dataset ID
1 | DOCK2 is involved in the host genetics and biology of severe COVID-19 | doi: 10.1038/s41586-022-05163-5 | hum0343.v1.covid19.v1, hum0343.v1.count.v1
2 | The whole blood transcriptional regulation landscape in 465 COVID-19 infected samples from Japan COVID-19 Task Force | doi: 10.1038/s41467-022-32276-2 | hum0343.v2.qtl.v1
3 | Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal their distinct regulation patterns and disease relevance | doi: 10.1038/s41588-024-01896-3 | E-GEAD-759, hum0343.v3.qtl.v1
4 | Blood DNA virome associates with autoimmune diseases and COVID-19. | - | JGAD000874

 

USERS (Controlled-access Data)

Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use