NBDC Research ID: hum0343.v1


 

SUMMARY

Aims: To construct a prediction system for severe disease through whole genome sequencing, RNA sequencing, and ultra-high-precision HLA analysis of patients with COVID-19, asymptomatic infected patients, and patients suspected of having novel coronavirus infection. In addition, we will use anonymized data to build mathematical models that predict the severity of COVID-19.

Methods: Genome-wide association study (GWAS), RNA-seq

Participants/Materials:

   GWAS: 5682 Japanese individuals (2393 COVID-19 infected patients and 3289 controls)

   RNA-seq: 473 COVID-19 infected patients

Data Set ID | Type of Data | Criteria | Release Date
hum0343.v1.covid19.v1 | GWAS for COVID-19 | Un-restricted Access | 2022/5/26
hum0343.v1.count.v1 | NGS (RNA-seq) for COVID-19 | Un-restricted Access | 2022/5/26

*Release Note

 

MOLECULAR DATA

GWAS

Participants/Materials

[GWAS-1]

   COVID-19 (ICD-10: U071): 2393 cases, Healthy controls: 3289 individuals

[GWAS-2]

   Severe COVID-19: 990 cases and 3289 healthy controls from [GWAS-1]

[GWAS-3]

   COVID-19 under age 65: 1,484 cases and 2,377 healthy controls under age 65 from [GWAS-1]

[GWAS-4]

   Severe COVID-19 under age 65: 440 cases and 2,377 healthy controls under age 65 from [GWAS-3]

Targets: Genome-wide SNPs
Target Loci for Capture Methods: -
Platform: Illumina [Infinium Asian Screening Array]
Source: DNAs extracted from peripheral blood cells
Cell Lines: -
Reagents (Kit, Version): Infinium Asian Screening Array
Genotype Call Methods (software):

   genotyping: GenomeStudio

   haplotype phasing: SHAPEIT4 (autosome), SHAPEIT2 (X-chromosome)

   imputation: Minimac4

Association Analysis (software): PLINK2
Filtering Methods:

Sample QC: We excluded samples with

      (1) sample call rate < 0.97

      (2) excess heterozygosity of genotypes > mean + 3SD

      (3) related samples with PI_HAT > 0.175

      (4) outlier samples from East Asian clusters in principal component analysis with 1000 Genomes Project samples.
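
The four sample-level exclusions above can be sketched as a single filter. This is a minimal illustration only, not the data provider's pipeline: the record keys (`id`, `call_rate`, `het_rate`, `max_pi_hat`, `pca_outlier`) are hypothetical, the heterozygosity cutoff is computed over all input samples, and relatedness is simplified to dropping every sample whose largest pairwise PI_HAT exceeds the threshold (a real analysis would typically drop one member of each related pair).

```python
from statistics import mean, stdev


def sample_qc(samples, min_call_rate=0.97, max_pi_hat=0.175, sd_multiplier=3.0):
    """Return the IDs of samples that pass sample QC criteria (1)-(4).

    Each sample is a dict with hypothetical keys:
      id          - sample identifier
      call_rate   - fraction of genotypes called
      het_rate    - heterozygosity rate of the sample
      max_pi_hat  - largest pairwise PI_HAT with any other sample
      pca_outlier - True if outside the East Asian cluster in PCA
    """
    het_rates = [s["het_rate"] for s in samples]
    # Criterion (2): cutoff is mean + 3 SD of heterozygosity across samples.
    het_cutoff = mean(het_rates) + sd_multiplier * stdev(het_rates)

    kept = []
    for s in samples:
        if s["call_rate"] < min_call_rate:   # (1) low call rate
            continue
        if s["het_rate"] > het_cutoff:       # (2) excess heterozygosity
            continue
        if s["max_pi_hat"] > max_pi_hat:     # (3) related sample (simplified)
            continue
        if s["pca_outlier"]:                 # (4) ancestry outlier
            continue
        kept.append(s["id"])
    return kept
```

The thresholds are exposed as keyword arguments so the same function can be reused with other cutoffs.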

Genotyping QC: We excluded variants with

      (1) variant call rate < 0.99

      (2) significant call rate differences between cases and controls with P < 5.0×10⁻⁸

      (3) deviation from Hardy-Weinberg equilibrium with P < 1.0×10⁻⁶

      (4) minor allele count < 5

Imputation QC: We retained variants with MAF ≥ 0.1% and imputation score (Rsq) > 0.5
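
The variant-level thresholds can be expressed as two predicates, one for genotyped variants and one for imputed variants. This is a hedged sketch rather than the provider's actual code: the dictionary keys (`call_rate`, `p_call_rate_diff`, `p_hwe`, `mac`, `maf`, `rsq`) are hypothetical names chosen for illustration, and the imputation thresholds are read as inclusion criteria.

```python
def passes_genotyping_qc(variant):
    """True if a genotyped variant survives genotyping QC criteria (1)-(4).

    `variant` is a dict with hypothetical keys:
      call_rate        - fraction of samples with a called genotype
      p_call_rate_diff - P value for case/control call rate difference
      p_hwe            - P value of the Hardy-Weinberg equilibrium test
      mac              - minor allele count
    """
    return (
        variant["call_rate"] >= 0.99             # (1) exclude call rate < 0.99
        and variant["p_call_rate_diff"] >= 5.0e-8  # (2) exclude P < 5.0e-8
        and variant["p_hwe"] >= 1.0e-6           # (3) exclude P < 1.0e-6
        and variant["mac"] >= 5                  # (4) exclude MAC < 5
    )


def passes_imputation_qc(variant):
    """True if an imputed variant has MAF >= 0.1% and Rsq > 0.5."""
    return variant["maf"] >= 0.001 and variant["rsq"] > 0.5
```

Note the asymmetry in the comparisons: MAF uses a non-strict bound (≥ 0.1%), while Rsq uses a strict one (> 0.5), following the thresholds as stated.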

Marker Number (after QC)

[GWAS-1] 13,484,569 variants

[GWAS-2] 13,199,053 variants

[GWAS-3] 13,241,602 variants

[GWAS-4] 12,764,136 variants

NBDC Data Set ID

hum0343.v1.covid19.v1

[GWAS-1][GWAS-2][GWAS-3][GWAS-4]


Dictionary file

Total Data Volume

[GWAS-1] 361 MB

[GWAS-2] 354 MB

[GWAS-3] 354 MB

[GWAS-4] 343 MB

Comments (Policies): NBDC policy

 

RNA-seq

Participants/Materials: COVID-19 (ICD-10: U071): 473 cases
Targets: RNA-seq
Target Loci for Capture Methods: -
Platform: Illumina [NovaSeq6000]
Library Source: RNAs extracted from peripheral blood cells
Cell Lines: -
Library Construction (kit name): NEBNext® Poly(A) mRNA Magnetic Isolation Module and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina
Fragmentation Methods: Incubation in the buffer containing Mg2+ at 94°C for 15 minutes
Spot Type: Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers): 100 bp
Mapping Methods:

   Adapter removal: Trimmomatic (v0.39)

   Alignment: STAR (v2.7.9a)

   Annotation: GENCODE v30

Reference Genome Sequence: GRCh38/hg38
Detecting method for read count (software): Gene level quantification and normalization: RSEM (v1.3.3)
QC: median transcripts per kilobase million (TPM) > 10
Gene Number: 5991
NBDC Data Set ID

hum0343.v1.count.v1


Sample Information

Total Data Volume: 6 MB
Comments (Policies): NBDC policy
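
The RNA-seq gene filter (median TPM > 10) can be illustrated as follows. This is a sketch under assumptions: the input is taken to be a mapping from gene identifier to a list of per-sample TPM values, the median is assumed to be computed across all samples, and the function name `filter_genes_by_median_tpm` is hypothetical. The layout of the distributed count file is not described here, so no file parsing is shown.

```python
from statistics import median


def filter_genes_by_median_tpm(tpm_by_gene, threshold=10.0):
    """Keep genes whose median TPM across samples is strictly above threshold.

    tpm_by_gene: dict mapping gene ID -> list of per-sample TPM values
                 (hypothetical in-memory layout).
    """
    return {
        gene: values
        for gene, values in tpm_by_gene.items()
        if median(values) > threshold
    }
```

For example, a gene with TPM values `[5, 12, 20]` has a median of 12 and is kept, while a gene with values `[10, 10, 10]` sits exactly at the threshold and is dropped.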

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

 

DATA PROVIDER

Principal Investigator: Koichi Fukunaga

Affiliation: Department of Medicine, Pulmonary Division, Keio University School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Project Promoting Support for Drug Discovery, Japan Agency for Medical Research and Development (AMED) | Development of genetically-designed COVID19 mucosal immune vaccine with molecular needle platform | JP20nk0101612
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Promotion of genetic, immunological, and metabolic research necessary for the development of next-generation vaccines and drugs aiming to prevent the aggravation of coronavirus disease 2019 | JP20fk0108415
Research Program on Emerging and Re-emerging Infectious Diseases, Japan Agency for Medical Research and Development (AMED) | Elucidation of pathogenesis and development of therapeutic strategies using genetic, immunological, and metabolic studies against SARS-CoV-2 variants | JP20fk0108452
Japan Program for Infectious Diseases Research and Infrastructure, Japan Agency for Medical Research and Development (AMED) | Elucidation of the pathophysiology of the sequelae of coronavirus disease 2019 using a multidisciplinary approach | JP21wm0325031
Core Research and Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST) | Research on Conquering Coronavirus Disease by Advanced Genomic Analysis and Artificial Intelligence | JPMJCR20H2
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | 20ek0410075h0001
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of tissue-specificity of disease biology using trans-layer omics analysis and whole-genome sequencing | 19H01021

 

PUBLICATIONS

Title | DOI | Data Set ID
1

 

USERS (Controlled-Access Data)

Principal Investigator | Affiliation | Data in Use (Data Set ID) | Period of Data Use