NBDC Research ID: hum0197.v10


SUMMARY

Aims: Elucidation of disease biology based on trans-omics analysis, GWAS in the Japanese and trans-ethnic populations

Methods: Metagenome shotgun sequencing, genome-wide association study (GWAS), small RNA-seq and eQTL analyses

Participants/Materials:

Metagenomic data of gut microbiome in the Japanese population (95 + 103 + 227 + 30 + 136 individuals)

Autoimmune pulmonary alveolar proteinosis cases: 198, Control participants: 395

Populations: Biobank Japan (n = 179,000), UK Biobank (n = 361,000), and FinnGen (n = 136,000), Phenotypes: 215

141 Japanese individuals

Metagenomic data of gut microbiome in Inflammatory Bowel Disease (35 Ulcerative Colitis and 39 Crohn's disease) and 40 Healthy controls

Intracranial germ cell tumors cases: 133, Control participants: 762

Populations: Biobank Japan (n = 161,801) and UK biobank (n = 377,583), Phenotypes: 9

 

Data Set ID | Type of Data | Criteria | Release Date
JGAS000205 | Metagenome | Controlled Access (Type I) | 2019/11/15
hum0197.v2.gwas.v1 | GWAS for autoimmune pulmonary alveolar proteinosis | Un-restricted Access | 2020/11/27
JGAS000260 | Metagenome | Controlled Access (Type I) | 2020/11/27
hum0197.v3.gwas.v1 | GWAS for 215 phenotypes | Un-restricted Access | 2021/03/22
JGAS000316 | Metagenome | Controlled Access (Type I) | 2021/10/12
JGAS000415 | Metagenome | Controlled Access (Type I) | 2021/12/10
hum0197.v5.gwas.v1 | GWAS for 10 phenotypes | Un-restricted Access | 2021/12/21
hum0197.v5.finemap.v1 | Fine-mapping for 79 phenotypes | Un-restricted Access | 2021/12/21
JGAS000504 | Read count data of miRNA | Controlled Access (Type I) | 2022/02/08
hum0197.v6.eqtl.v1 | eQTL data | Un-restricted Access | 2022/02/08
JGAS000530 | Metagenome | Controlled Access (Type I) | 2022/05/23
JGAS000531 | Metagenome | Controlled Access (Type I) | 2022/06/03
hum0197.v9.gwas.GCT.v1 | GWAS for intracranial germ cell tumors | Un-restricted Access | 2022/06/10
hum0197.v10.gwas.v1 | GWAS for 9 phenotypes | Un-restricted Access | 2022/06/16

*Release Note

*To access the Controlled-access Data, data users must submit an application for Using NBDC Human Data.

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

 

MOLECULAR DATA

JGAS000205/JGAS000260/JGAS000316/JGAS000415/JGAS000530/JGAS000531

Participants/Materials:

95+103+227+30+136 Japanese individuals

Inflammatory Bowel Disease

   35 Ulcerative Colitis

   39 Crohn's disease

40 Healthy controls

Targets Metagenome
Target Loci for Capture Methods -
Platform Illumina [HiSeq 3000]
Library Source DNA extracted from gut microbiome
Cell Lines -
Library Construction (kit name) KAPA Hyper Prep Kit
Fragmentation Methods Ultrasonic fragmentation (Covaris)
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 150 bp
Japanese Genotype-phenotype Archive Data set ID

JGAD000290 (95 Japanese individuals)

JGAD000363 (103 Japanese individuals)

JGAD000427 (227 Japanese individuals)

JGAD000532 (30 Japanese individuals)

JGAD000649 (Inflammatory Bowel Disease)

JGAD000650 (136 Japanese individuals)

Total Data Volume

JGAD000290: 477 GB (fastq)

JGAD000363: 408 GB (fastq)

JGAD000427: 881.2 GB (fastq)

JGAD000532: 106.7 GB (fastq)

JGAD000649: 374.6 GB (fastq)

JGAD000650: 541.4 GB (fastq)

Comments (Policies) NBDC policy

 

hum0197.v2.gwas.v1

Participants/Materials

Autoimmune pulmonary alveolar proteinosis cases (ICD10: J840): 198

Control participants: 395

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array
Genotype Call Methods (software) GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation
Association Analysis (software) PLINK2
Filtering Methods

Sample QC: We excluded samples with a low genotyping call rate (< 98%) or close genetic relatedness (PI_HAT > 0.175), and retained only samples of estimated East Asian ancestry.

Variant QC: We excluded variants with (1) genotyping call rate < 98%, (2) P value for Hardy–Weinberg equilibrium < 1.0 × 10^-6, (3) minor allele count < 5, or (4) > 10% frequency difference with the imputation reference panel.
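The four variant-level exclusion criteria above can be expressed as a single predicate. The following is a minimal sketch, not the providers' pipeline: the function name and its arguments are hypothetical, and "> 10% frequency difference" is assumed here to mean an absolute allele-frequency difference greater than 0.10.

```python
def passes_variant_qc(call_rate, hwe_p, minor_allele_count, freq, ref_freq):
    """Return True if a variant survives all four exclusion criteria."""
    if call_rate < 0.98:            # (1) genotyping call rate < 98%
        return False
    if hwe_p < 1.0e-6:              # (2) Hardy-Weinberg equilibrium P < 1.0e-6
        return False
    if minor_allele_count < 5:      # (3) minor allele count < 5
        return False
    # (4) Assumption: the 10% threshold is an absolute frequency difference
    if abs(freq - ref_freq) > 0.10:
        return False
    return True


# Hypothetical variants: the first passes, the second fails the HWE check.
print(passes_variant_qc(0.995, 0.2, 40, 0.31, 0.28))
print(passes_variant_qc(0.995, 1e-8, 40, 0.31, 0.28))
```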

Marker Number (after QC) 12,153,232 autosomal variants and 242,876 X-chromosomal variants
NBDC Data Set ID

hum0197.v2.gwas.v1

(Click the Data Set ID to download the file)

Dictionary file

Total Data Volume 390 MB for autosomes (txt.gz) and 19 MB for X chromosome (txt.gz)
Comments (Policies) NBDC policy

 

hum0197.v3.gwas.v1

Participants/Materials Biobank Japan (n = 179,000), UK biobank (n = 361,000), FinnGen (n = 136,000), no. Phenotypes: 215
Targets genome wide SNPs
Target Loci for Capture Methods -
Platform

BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]

UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]

FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays]

Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version)

BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip

UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array

FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays

Genotype Call Methods (software)

BBJ: Eagle, Minimac3

UK Biobank: IMPUTE4

FinnGen: Beagle 4.1

Association Analysis (software)

For binary traits, SAIGE software was used with age, age^2, sex, age×sex, age^2×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM or PLINK software was used with the same covariates.
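The covariate set described above (age, age squared, sex, their interactions, and 20 principal components) can be assembled per individual as follows. This is an illustrative sketch only: the function name is hypothetical, sex is assumed to be coded numerically (for example 0/1), and the resulting row would still need to be written in whatever input format the association software expects.

```python
def build_covariates(age, sex, pcs, n_pcs=20):
    """Return [age, age^2, sex, age*sex, age^2*sex, PC1..PCn] for one individual."""
    if len(pcs) < n_pcs:
        raise ValueError(f"expected at least {n_pcs} principal components")
    age_sq = age ** 2
    # Interaction terms are plain products of the numeric covariates.
    return [age, age_sq, sex, age * sex, age_sq * sex] + list(pcs[:n_pcs])


# Hypothetical individual: age 50, sex coded as 1, placeholder PCs.
row = build_covariates(50, 1, [0.1] * 20)
print(len(row), row[:5])
```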

 

Filtering Methods

BBJ: We included imputed variants with Rsq > 0.7.

UK Biobank: We excluded the variants with (i) INFO score ≤ 0.8, (ii) MAF ≤ 0.0001 (except for missense and protein-truncating variants annotated by VEP, which were excluded if MAF ≤ 1 × 10^-6), and (iii) P_HWE ≤ 1 × 10^-10.

FinnGen: We excluded variants with an imputation INFO score < 0.8 or MAF < 0.0001.
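The three cohort-specific filters differ mainly in their thresholds and in the UK Biobank exception for protein-altering variants. A minimal sketch of that logic is given below; the function names and the consequence labels are hypothetical and are not taken from VEP output or from the providers' code.

```python
# Hypothetical labels standing in for VEP missense / protein-truncating annotations
PROTEIN_ALTERING = {"missense", "protein_truncating"}


def keep_bbj_variant(rsq):
    """BBJ: include imputed variants with Rsq > 0.7."""
    return rsq > 0.7


def keep_ukb_variant(info, maf, hwe_p, consequence=None):
    """UK Biobank: apply the INFO, MAF and HWE exclusions."""
    if info <= 0.8:
        return False
    # Missense and protein-truncating variants use the lower MAF floor.
    maf_floor = 1e-6 if consequence in PROTEIN_ALTERING else 1e-4
    if maf <= maf_floor:
        return False
    if hwe_p <= 1e-10:
        return False
    return True


def keep_finngen_variant(info, maf):
    """FinnGen: exclude variants with INFO < 0.8 or MAF < 0.0001."""
    return not (info < 0.8 or maf < 1e-4)


# A rare variant (MAF 5e-5) is kept in UK Biobank only if protein-altering.
print(keep_ukb_variant(0.95, 5e-5, 0.5))
print(keep_ukb_variant(0.95, 5e-5, 0.5, consequence="missense"))
```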

Marker Number (after QC)

BBJ: 13,530,797 variants

UK Biobank: 13,791,467 variants

FinnGen: 16,859,359 variants

NBDC Data Set ID

hum0197.v3.gwas.v1

(Click the Data Set ID to download the file)

Dictionary file (BBJ, EUR, META)

Total Data Volume

BBJ: ~1.5 GB for autosomes and ~33 MB for chrX

UK Biobank: ~1.5 GB for autosomes and ~15 MB for chrX

FinnGen: ~740 MB for autosomes and ~20 MB for chrX

Comments (Policies) NBDC policy

 

hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1

Participants/Materials Biobank Japan (n = 179,000), no. Phenotypes: 79
Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip
Genotype Call Methods (software) Eagle, Minimac3
Association Analysis (software)

GWAS: For binary traits, SAIGE software was used with age, age^2, sex, age×sex, age^2×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM was used with the same covariates.

Fine-mapping: FINEMAP and SuSiE were used with GWAS summary statistics and in-sample dosage LD, allowing up to 10 causal variants per region.

Filtering Methods

GWAS: We included imputed variants with Rsq > 0.7. For binary traits, variants with MAC < 10 were additionally excluded.

Fine-mapping: We defined fine-mapping regions based on a 3 Mb window around each lead variant and merged regions if they overlapped. We excluded the major histocompatibility complex (MHC) region (chr 6: 25–36 Mb) from analysis due to extensive LD structure in the region. For each method, we only included variants from successfully fine-mapped regions while excluding those from failed regions (e.g., due to conversion failure or available memory restrictions).
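The region definition above (a 3 Mb window per lead variant, merging of overlapping windows, and exclusion of the MHC) amounts to an interval-merging step. The sketch below illustrates it under two stated assumptions: the 3 Mb window is centered on the lead variant (±1.5 Mb), and any merged region overlapping chr 6: 25–36 Mb is dropped. The function name and input format are hypothetical.

```python
HALF_WINDOW = 1_500_000                 # assumption: 3 Mb window centered on the lead variant
MHC = ("6", 25_000_000, 36_000_000)     # chr 6: 25-36 Mb


def finemap_regions(lead_variants):
    """lead_variants: iterable of (chrom, pos). Returns merged (chrom, start, end) regions."""
    intervals = sorted(
        (chrom, max(0, pos - HALF_WINDOW), pos + HALF_WINDOW)
        for chrom, pos in lead_variants
    )
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    # Assumption: drop every merged region that overlaps the MHC.
    return [
        r for r in merged
        if not (r[0] == MHC[0] and r[1] < MHC[2] and r[2] > MHC[1])
    ]


# Hypothetical lead variants: two nearby hits on chr1 merge, the chr6 hit falls in the MHC.
leads = [("1", 10_000_000), ("1", 12_000_000), ("1", 20_000_000), ("6", 30_000_000)]
print(finemap_regions(leads))
```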

Marker Number (after QC) 13,531,752 variants (ref: hg19)
NBDC Data Set ID

hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1

(Click the Data Set ID to download the file)

Dictionary file

Total Data Volume 14 GB
Comments (Policies) NBDC policy

 

JGAS000504

Participants/Materials: 141 Japanese individuals
Targets small RNA-seq
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500]
Library Source RNAs extracted from PBMC
Cell Lines -
Library Construction (kit name) SMARTer smRNA-Seq Kit
Fragmentation Methods -
Spot Type Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods bowtie (GRCh37)
Detecting method for read count (software) featureCounts + miRBase v22
QC We performed adapter trimming using Cutadapt v1.8 and removed reads with a low quality score (Phred quality score < 20 in >20% of total bases) using fastp v0.20.0. Also, we removed reads with a length of >29 bp or <15 bp, which are not expected to be mature miRNAs. Mature miRNAs detected with ≥1 read in at least half of the individuals were included in the dataset.
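The read-length filter and the detection criterion described above can be sketched as two small functions. This is illustrative only and does not reproduce the Cutadapt or fastp steps; the function names and the miRNA identifiers in the example are hypothetical.

```python
def keep_read(length):
    """Keep reads of 15-29 bp; reads >29 bp or <15 bp are removed."""
    return 15 <= length <= 29


def detected_mirnas(counts):
    """counts: dict mapping miRNA name -> list of per-individual read counts.

    Returns the miRNAs with >= 1 read in at least half of the individuals.
    """
    kept = []
    for name, per_sample in counts.items():
        n_detected = sum(1 for c in per_sample if c >= 1)
        if 2 * n_detected >= len(per_sample):
            kept.append(name)
    return kept


# Hypothetical counts for four individuals.
counts = {"miR-a": [3, 0, 1, 0], "miR-b": [0, 0, 0, 5]}
print(detected_mirnas(counts))
```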
miRNA number 343
Japanese Genotype-phenotype Archive Data set ID JGAD000621
Total Data Volume 54.7 KB (txt)
Comments (Policies) NBDC policy

 

hum0197.v6.eqtl.v1

Participants/Materials 141 Japanese individuals
Targets eQTL
Target Loci for Capture Methods -
Platform

small RNA-seq: Illumina [HiSeq 2500]

WGS: Illumina [HiSeq X Ten]

Library Source Read count data of JGAS000504 and whole-genome sequencing data from genomic DNA extracted from whole blood
Cell Lines -
Reagents (Kit, Version)

small RNA-seq: See JGAS000504

WGS: TruSeq DNA PCR-Free Library Preparation Kit

Genotype Call / Detecting read count Methods (software)

See JGAS000504 for read count data.

WGS: Sequenced reads were aligned against the reference human genome with the decoy sequence (GRCh37, human_g1k_v37_decoy) using BWA-MEM v0.7.13.

QC

See JGAS000504 for read count data.

WGS: We removed variants (i) with low genotyping call rates (< 0.90), (ii) with ExcessHet > 60, or (iii) with Hardy–Weinberg P value < 1.0 × 10^-10. Genotype refinement was performed using Beagle v5.1.

Marker Number (after QC)

See JGAS000504 for read count data.

WGS: 12,171,854 variants

eQTL algorithm We analyzed the association between genetic variants with minor allele frequency (MAF) ≥ 0.01 within a cis-window around each miRNA (±1 Mb of the mature miRNA) and normalized expression values using MatrixEQTL v2.3.
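The variant selection that precedes the association test (MAF ≥ 0.01 and position within ±1 Mb of the mature miRNA) can be sketched as below. This only illustrates the cis-window selection and is not a substitute for MatrixEQTL; the function name, the dictionary keys, and the example coordinates are hypothetical.

```python
def cis_variants(variants, mirna_chrom, mirna_start, mirna_end,
                 window=1_000_000, min_maf=0.01):
    """Return variants within +/- window of the mature miRNA that pass the MAF cutoff.

    variants: list of dicts with keys "id", "chrom", "pos", "maf" (hypothetical layout).
    """
    lo = max(0, mirna_start - window)
    hi = mirna_end + window
    return [
        v for v in variants
        if v["chrom"] == mirna_chrom and lo <= v["pos"] <= hi and v["maf"] >= min_maf
    ]


# Hypothetical variants around a miRNA at chr1:5,000,000-5,000,021.
variants = [
    {"id": "v1", "chrom": "1", "pos": 4_200_000, "maf": 0.12},   # in window, common
    {"id": "v2", "chrom": "1", "pos": 5_900_000, "maf": 0.005},  # in window, too rare
    {"id": "v3", "chrom": "1", "pos": 7_000_000, "maf": 0.30},   # outside window
    {"id": "v4", "chrom": "2", "pos": 5_000_010, "maf": 0.30},   # other chromosome
]
print([v["id"] for v in cis_variants(variants, "1", 5_000_000, 5_000_021)])
```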
NBDC Data Set ID

hum0197.v6.eqtl.v1

(Click the Data Set ID to download the file)

Dictionary file

Total Data Volume 1.1 MB (txt)
Comments (Policies) NBDC policy

 

hum0197.v9.gwas.GCT.v1

Participants/Materials

Intracranial germ cell tumors cases (ICD10: C719): 133

Control participants: 762

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array
Genotype Call Methods (software)

GenomeStudio for genotyping

shapeit2 for haplotype phasing

minimac3 for imputation

Association Analysis (software) PLINK2
Filtering Methods

Sample QC:

We excluded individuals (i) with genotyping call rate < 0.97, (ii) in close kinship (PI_HAT > 0.17), or (iii) estimated to be of non-East Asian ancestry.

Variant QC:

We excluded variants with (i) genotyping call rate < 0.99, (ii) minor allele count < 5, (iii) P value for Hardy–Weinberg equilibrium < 1.0 × 10^-5 in controls, or (iv) > 10% allele frequency difference with the imputation reference panel or the allele frequency panel of the Tohoku Medical Megabank Project.

Post-imputation QC:

We excluded imputed variants with Rsq < 0.7 or minor allele frequency < 0.5%.

Marker Number (after QC)

7,803,874 autosomal variants

181,867 X-chromosomal variants

NBDC Data Set ID

hum0197.v9.gwas.GCT.v1

(Click the Data Set ID to download the file)

Dictionary file

Total Data Volume 248 MB (txt)
Comments (Policies) NBDC policy

 

hum0197.v10.gwas.v1

Participants/Materials

Biobank Japan (n=161,801), UK biobank (n=377,583), no. Phenotypes: 9

   Patients: Autoimmune [Rheumatoid arthritis (ICD10: M05), Graves' disease (ICD10: E050), Type 1 diabetes mellitus (ICD10: E10)]

   Allergy [Asthma (ICD10: J45), Atopic dermatitis (ICD10: L20), Pollinosis (ICD10: J301)]

   Controls: individuals with neither autoimmune nor allergic disease

(There is overlap among patients in each disease category)

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform

BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]

UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]

Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version)

BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip

UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array

Genotype Call Methods (software)

BBJ: Eagle, Minimac3

UK Biobank: IMPUTE4

Association Analysis (software)

SAIGE software was used with age, sex, and top five principal components as covariates.

RE2C software was used for the multi-trait meta-analysis adjusting for sample overlap between GWAS summary data.

Filtering Methods

We excluded variants with Rsq < 0.7 or MAF < 0.005.

Marker Number (after QC)

BBJ: 8,374,220 autosomal variants for individual trait / 8,369,174 autosomal variants for meta-analysis

UKB: 10,864,380 autosomal variants for individual trait / 10,858,065 autosomal variants for meta-analysis

BBJ + UK Biobank: 5,965,154 autosomal variants for meta-analysis

NBDC Data Set ID

hum0197.v10.gwas.v1

(Click the Data Set ID to download the files)

Dictionary file

Total Data Volume

BBJ: ~760 MB for individual trait / ~430 MB for multi-trait meta-analysis

UK Biobank: ~1.1 GB for individual trait / ~550 MB for multi-trait meta-analysis

BBJ + UK Biobank: ~310 MB for multi-trait meta-analysis

Comments (Policies) NBDC policy

 

DATA PROVIDER

Principal Investigator: Yukinori Okada

Affiliation: Department of Statistical Genetics, Osaka University Graduate School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Precursory Research for Innovative Medical care (PRIME), Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Crosstalk among microbiome, host, disease, and drug discovery enhanced by statistical genetics | JP19gm6010001
FORCE, Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Elucidation of disease-specific microbiota and personalized medicine by metagenome-wide association studies | JP20gm4010006
Practical Research Project for Rare / Intractable Diseases, Japan Agency for Medical Research and Development (AMED) | Biology and in silico drug repositioning of pulmonary alveolar proteinosis using trans-layer omics analysis | JP20ek0109413
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Nucleic genome drug discovery for autoimmune diseases through in-silico and patient-oriented screening utilizing large-scale disease genetics | JP19ek0410041
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | JP21ek0410075
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Implementation of genomic prediction medicine based on statistical genetics | JP21km0405211
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Next-generation genomics analyses elucidates biology, personalized medicine, and drug discovery of psoriasis | JP21km0405217
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of disease biology and tissue specificity by trans-layer omics analysis and whole-genome sequencing | 19H01021

 

PUBLICATIONS

No. | Title | DOI | Data Set ID
1 | Metagenome-wide association study of gut microbiome revealed novel aetiology of rheumatoid arthritis in the Japanese population. | doi: 10.1136/annrheumdis-2019-215743 | JGAD000290
2 | Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis. | doi: 10.1038/s41467-021-21011-y | hum0197.v2.gwas.v1
3 | A metagenome-wide association study of gut microbiome in patients with multiple sclerosis revealed novel disease pathology. | doi: 10.3389/fcimb.2020.585973 | JGAD000363
4 | A global atlas of genetic associations of 220 deep phenotypes | doi: 10.1101/2020.10.23.20213652 | hum0197.v3.gwas.v1
5 | Metagenome-wide association study revealed disease-specific landscape of the gut microbiome of systemic lupus erythematosus in Japanese | doi: 10.1136/annrheumdis-2021-220687 | JGAD000427
6 | Whole gut virome analysis of 476 Japanese revealed a link between phage and autoimmune disease | doi: 10.1136/annrheumdis-2021-221267 | JGAD000532
7 | Insights from complex trait fine-mapping across diverse populations | doi: 10.1101/2021.09.03.21262975 | hum0197.v5.gwas.v1, hum0197.v5.finemap.v1
8 | Genetic architecture of microRNA expression and its link to complex diseases in the Japanese population. | doi: 10.1093/hmg/ddab361 | JGAD000621, hum0197.v6.eqtl.v1
9 | Multi-trait and cross-population genome-wide association studies across autoimmune and allergic diseases identify shared and distinct genetic components. | doi: 10.1136/annrheumdis-2022-222460 | hum0197.v10.gwas.v1

 

USERS (Controlled-Access Data)

Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Data Set ID) | Period of Data Use
Ilana Brito | Meinig School of Biomedical Engineering, Cornell University | United States of America | Comparative metagenomics of lupus patients' microbiomes | JGAD000290, JGAD000363, JGAD000427, JGAD000532 | 2022/05/12-2024/05/04