NBDC Research ID: hum0343.v2

SUMMARY

Aims: To construct a prediction system for severe disease through whole-genome sequencing, RNA sequencing, and ultra-high-precision HLA analysis of patients with COVID-19, asymptomatic infected patients, and patients suspected of having novel coronavirus infection. In addition, we will use anonymized data to predict the severity of COVID-19 using mathematical models.

Methods: Genome-wide association study (GWAS), RNA-seq, eQTL/sQTL study

Participants/Materials: GWAS: 5682 Japanese individuals (2393 COVID-19 infected patients and 3289 controls)

RNA-seq: 473 COVID-19 infected patients

Dataset ID	Type of Data	Criteria	Release Date
hum0343.v1.covid19.v1	GWAS for COVID-19	Unrestricted-access	2022/05/26
hum0343.v1.count.v1	NGS (RNA-seq) for COVID-19	Unrestricted-access	2022/05/26
hum0343.v2.qtl.v1	eQTL/sQTL summary statistics for COVID-19	Unrestricted-access	2022/06/14

*Release Note

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

MOLECULAR DATA

GWAS


Participants/Materials	[GWAS-1] COVID-19 (ICD-10: U071): 2,393 cases and 3,289 healthy controls [GWAS-2] Severe COVID-19: 990 cases and 3,289 healthy controls from [GWAS-1] [GWAS-3] COVID-19 under age 65: 1,484 cases and 2,377 healthy controls under age 65 from [GWAS-1] [GWAS-4] Severe COVID-19 under age 65: 440 cases and 2,377 healthy controls under age 65 from [GWAS-3]
Targets	Genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Infinium Asian Screening Array]
Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array
Genotype Call Methods (software)	Genotyping: GenomeStudio; haplotype phasing: SHAPEIT4 (autosomes), SHAPEIT2 (X chromosome); imputation: Minimac4
Association Analysis (software)	PLINK2
Filtering Methods	Sample QC: We excluded samples with (1) sample call rate < 0.97; (2) excess heterozygosity of genotypes (> mean + 3 SD); (3) relatedness to another sample (PI_HAT > 0.175); or (4) outlier status relative to East Asian clusters in principal component analysis with 1000 Genomes Project samples. Genotyping QC: We excluded variants with (1) variant call rate < 0.99; (2) significant call rate differences between cases and controls (P < 5.0×10^-8); (3) deviation from Hardy-Weinberg equilibrium (P < 1.0×10^-6); or (4) minor allele count < 5. Imputation QC: We retained variants with MAF ≥ 0.1% and imputation score (Rsq) > 0.5.
Marker Number (after QC)	[GWAS-1] 13,484,569 variants [GWAS-2] 13,199,053 variants [GWAS-3] 13,241,602 variants [GWAS-4] 12,764,136 variants
NBDC Dataset ID	hum0343.v1.covid19.v1 [GWAS-1][GWAS-2][GWAS-3][GWAS-4] (Click the gwas number to download files) Dictionary file
Total Data Volume	[GWAS-1] 361 MB [GWAS-2] 354 MB [GWAS-3] 354 MB [GWAS-4] 343 MB
Comments (Policies)	NBDC policy
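
As an illustration of the variant-level thresholds listed in the Filtering Methods row, the Python sketch below applies the four genotyping QC exclusion criteria and the imputation QC criteria to per-variant summary statistics. The `VariantStats` record, its field names, and both function names are hypothetical and are not part of the dataset, PLINK2, or Minimac4; only the threshold values come from the table above.

```python
from dataclasses import dataclass

# Thresholds taken from the Filtering Methods row.
MIN_VARIANT_CALL_RATE = 0.99   # exclude variants with call rate < 0.99
CALL_RATE_DIFF_P = 5.0e-8      # exclude if case/control call-rate difference P < 5.0e-8
HWE_P = 1.0e-6                 # exclude if Hardy-Weinberg equilibrium P < 1.0e-6
MIN_MINOR_ALLELE_COUNT = 5     # exclude if minor allele count < 5
MIN_MAF = 0.001                # imputation QC: keep MAF >= 0.1%
MIN_RSQ = 0.5                  # imputation QC: keep Rsq > 0.5


@dataclass
class VariantStats:
    """Hypothetical per-variant summary; field names are illustrative only."""
    variant_id: str
    call_rate: float
    call_rate_diff_p: float
    hwe_p: float
    minor_allele_count: int


def passes_genotyping_qc(v: VariantStats) -> bool:
    """Return True if the variant survives all four genotyping QC exclusions."""
    return (
        v.call_rate >= MIN_VARIANT_CALL_RATE
        and v.call_rate_diff_p >= CALL_RATE_DIFF_P
        and v.hwe_p >= HWE_P
        and v.minor_allele_count >= MIN_MINOR_ALLELE_COUNT
    )


def passes_imputation_qc(maf: float, rsq: float) -> bool:
    """Return True if an imputed variant meets the MAF and Rsq criteria."""
    return maf >= MIN_MAF and rsq > MIN_RSQ
```

The checks are written as "keep" conditions, that is, the logical negation of each exclusion rule, so a value sitting exactly on a threshold (for example a call rate of 0.99) is retained.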

RNA-seq


Participants/Materials	COVID-19 (ICD-10: U071): 473 cases
Targets	RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [NovaSeq6000]
Library Source	RNAs extracted from peripheral blood cells
Cell Lines	-
Library Construction (kit name)	NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina
Fragmentation Methods	Incubation in the buffer containing Mg2+ at 94°C for 15 minutes
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	100 bp
Mapping Methods	Adapter removal: Trimmomatic (v0.39); alignment: STAR (v2.7.9a); annotation: GENCODE v30
Reference Genome Sequence	GRCh38/hg38
Detecting method for read count (software)	Gene level quantification and normalization: RSEM (v1.3.3)
QC	Median transcripts per million (TPM) > 10
Gene Number	5991
NBDC Dataset ID	hum0343.v1.count.v1 (Click the Dataset ID to download the file) Sample Information
Total Data Volume	6 MB
Comments (Policies)	NBDC policy
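
The gene-level QC criterion in the table above (median TPM > 10) can be sketched as follows. This is a minimal illustration, not the data provider's pipeline: it assumes the median is taken across samples for each gene, and the function name, the input layout (gene ID mapped to a list of per-sample TPM values), and the toy values are all hypothetical.

```python
from statistics import median
from typing import List, Mapping, Sequence

MEDIAN_TPM_THRESHOLD = 10.0  # from the QC row: median TPM > 10


def genes_passing_tpm_qc(
    tpm_by_gene: Mapping[str, Sequence[float]],
    threshold: float = MEDIAN_TPM_THRESHOLD,
) -> List[str]:
    """Return gene IDs whose median TPM across samples exceeds the threshold.

    Genes with no values are skipped; this handling is an assumption.
    """
    return [
        gene
        for gene, values in tpm_by_gene.items()
        if len(values) > 0 and median(values) > threshold
    ]


# Toy values for illustration only; they are not taken from the dataset.
example = {
    "GENE_A": [12.0, 15.0, 9.0],   # median 12.0, kept
    "GENE_B": [1.0, 2.0, 30.0],    # median 2.0, dropped
    "GENE_C": [10.0, 10.0],        # median 10.0, dropped (not strictly > 10)
}
kept = genes_passing_tpm_qc(example)
```

Using the median rather than the mean keeps a single highly expressed sample (as in `GENE_B`) from carrying a gene past the filter.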

eQTL/sQTL study


Participants/Materials	COVID-19 (ICD-10: U071): 473 cases
Targets	eQTL/sQTL summary statistics
Target Loci for Capture Methods	-
Platform	RNA-seq: Illumina [NovaSeq6000]; SNP array data: Illumina [Infinium Asian Screening Array]
Library Source	Read count data from RNA-seq and SNP array data from GWAS
Cell Lines	-
Library Construction (kit name)	RNA-seq: See RNA-seq; SNP array data: See GWAS
Detecting method for read count (software)	Gene level quantification and normalization: RSEM (v1.3.3); intron cluster quantification: LeafCutter (v0.2.7)
QC	Following GTEx pipeline (https://github.com/broadinstitute/gtex-pipeline/)
Detection method of eQTL (cis)	The eQTL effects of common (>1%) variants within a cis window (±1 Mb) of transcription start sites were tested using fastQTL. Variant-gene pairs with cis-eQTL p-value < 0.05 were summarized, annotated with allele frequency (AF), p-value, effect size (beta), and posterior inclusion probability (PIP).
Detection method of eQTL (trans)	Trans-eQTL effects were tested using tensorQTL. Variant-gene pairs with trans-eQTL p-value < 5×10^-8 were summarized, annotated with AF, p-value, and beta.
Detection method of sQTL	The sQTL effects of common (>1%) variants within a cis window (±1 Mb) of intron cluster start sites were tested using fastQTL. Variant-intron cluster pairs with cis-sQTL p-value < 0.05 were summarized, annotated with AF, p-value, beta, and PIP.
NBDC Dataset ID	hum0343.v2.qtl.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	714 MB (tsv)
Comments (Policies)	NBDC policy
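
The selection rules described in the detection-method rows above (common variants, a 1 Mb cis window, and separate p-value cut-offs for cis and trans results) can be sketched in Python as below. This is an illustration under stated assumptions, not the fastQTL or tensorQTL implementation: the function names and arguments are hypothetical, ">1%" is assumed to mean minor allele frequency, and including the exact 1 Mb boundary in the window is also an assumption.

```python
CIS_WINDOW_BP = 1_000_000   # cis window: within 1 Mb of the feature start site
MIN_MAF = 0.01              # "common" variants: > 1% (assumed to be minor allele frequency)
CIS_P_THRESHOLD = 0.05      # cis-eQTL / cis-sQTL pairs reported at p < 0.05
TRANS_P_THRESHOLD = 5e-8    # trans-eQTL pairs reported at p < 5e-8


def is_common(allele_frequency: float) -> bool:
    """Return True if the minor allele frequency exceeds 1%."""
    maf = min(allele_frequency, 1.0 - allele_frequency)
    return maf > MIN_MAF


def in_cis_window(
    variant_chrom: str,
    variant_pos: int,
    feature_chrom: str,
    feature_start: int,
    window_bp: int = CIS_WINDOW_BP,
) -> bool:
    """Return True if the variant lies on the same chromosome and within
    window_bp of the feature start (boundary inclusion is an assumption)."""
    return (
        variant_chrom == feature_chrom
        and abs(variant_pos - feature_start) <= window_bp
    )


def is_reported(p_value: float, cis: bool) -> bool:
    """Apply the cis or trans p-value cut-off used to select summarized pairs."""
    threshold = CIS_P_THRESHOLD if cis else TRANS_P_THRESHOLD
    return p_value < threshold
```

For example, `in_cis_window("chr1", 1_500_000, "chr1", 1_000_000)` is true because the variant is 500 kb from the feature start, whereas a pair with p = 0.01 would be reported as a cis result but not as a trans result.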

DATA PROVIDER

Principal Investigator: Koichi Fukunaga

Affiliation: Department of Medicine, Pulmonary Division, Keio University School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name	Title	Project Number
Project Promoting Support for Drug Discovery, Japan Agency for Medical Research and Development (AMED)	Development of genetically-designed COVID19 mucosal immune vaccine with molecular needle platform	JP20nk0101612
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED)	Promotion of genetic, immunological, and metabolic research necessary for the development of next-generation vaccines and drugs aiming to prevent the aggravation of coronavirus disease 2019	JP20fk0108415
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED)	Elucidation of pathogenesis and development of therapeutic strategies using genetic, immunological, and metabolic studies against SARS-CoV-2 variants	JP20fk0108452
Japan Program for Infectious Diseases Research and Infrastructure, Japan Agency for Medical Research and Development (AMED)	Elucidation of the pathophysiology of the sequelae of coronavirus disease 2019 using a multidisciplinary approach	JP21wm0325031
Core Research and Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST)	Research on Conquering Coronavirus Disease by Advanced Genomic Analysis and Artificial Intelligence	JPMJCR20H2
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED)	Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources	20ek0410075h0001
KAKENHI Grant-in-Aid for Scientific Research (A)	Elucidation of tissue-specificity of disease biology using trans-layer omics analysis and whole-genome sequencing	19H01021

PUBLICATIONS

	Title	DOI	Dataset ID
1	DOCK2 is involved in the host genetics and biology of severe COVID-19	doi: 10.1038/s41586-022-05163-5	hum0343.v1.covid19.v1 hum0343.v1.count.v1 hum0343.v2.qtl.v1