NBDC Research ID: hum0197.v10
SUMMARY
Aims: Elucidation of disease biology based on trans-omics analysis, GWAS in the Japanese and trans-ethnic populations
Methods: Metagenome shotgun sequencing, genome-wide association study (GWAS), small RNA-seq and eQTL analyses
Participants/Materials:
Metagenomic data of gut microbiome in the Japanese population (95 + 103 + 227 + 30 + 136 individuals)
Autoimmune pulmonary alveolar proteinosis cases: 198, Control participants: 395
Populations: Biobank Japan (n = 179,000), UK Biobank (n = 361,000), and FinnGen (n = 136,000), Phenotypes: 215
141 Japanese individuals
Metagenomic data of gut microbiome in Inflammatory Bowel Disease (35 Ulcerative Colitis and 39 Crohn's disease) and 40 Healthy controls
Intracranial germ cell tumors cases: 133, Control participants: 762
Populations: Biobank Japan (n = 161,801) and UK Biobank (n = 377,583), Phenotypes: 9
Data Set ID | Type of Data | Criteria | Release Date |
---|---|---|---|
JGAS000205 | Metagenome | Controlled Access (Type I) | 2019/11/15 |
hum0197.v2.gwas.v1 | GWAS for autoimmune pulmonary alveolar proteinosis | Un-restricted Access | 2020/11/27 |
JGAS000260 | Metagenome | Controlled Access (Type I) | 2020/11/27 |
hum0197.v3.gwas.v1 | GWAS for 215 phenotypes | Un-restricted Access | 2021/03/22 |
JGAS000316 | Metagenome | Controlled Access (Type I) | 2021/10/12 |
JGAS000415 | Metagenome | Controlled Access (Type I) | 2021/12/10 |
hum0197.v5.gwas.v1 | GWAS for 10 phenotypes | Un-restricted Access | 2021/12/21 |
hum0197.v5.finemap.v1 | Fine-mapping for 79 phenotypes | Un-restricted Access | 2021/12/21 |
JGAS000504 | Read count data of miRNA | Controlled Access (Type I) | 2022/02/08 |
hum0197.v6.eqtl.v1 | eQTL data | Un-restricted Access | 2022/02/08 |
JGAS000530 | Metagenome | Controlled Access (Type I) | 2022/05/23 |
JGAS000531 | Metagenome | Controlled Access (Type I) | 2022/06/03 |
hum0197.v9.gwas.GCT.v1 | GWAS for intracranial germ cell tumors | Un-restricted Access | 2022/06/10 |
hum0197.v10.gwas.v1 | GWAS for 9 phenotypes | Un-restricted Access | 2022/06/16 |
*Data users need to submit an application for Using NBDC Human Data to access the controlled-access data.
*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention the data in the acknowledgments.
MOLECULAR DATA
JGAS000205/JGAS000260/JGAS000316/JGAS000415/JGAS000530/JGAS000531
Participants/Materials | 95 + 103 + 227 + 30 + 136 Japanese individuals; Inflammatory Bowel Disease: 35 Ulcerative Colitis, 39 Crohn's disease, and 40 healthy controls |
Targets | Metagenome |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 3000] |
Library Source | DNA extracted from gut microbiome |
Cell Lines | - |
Library Construction (kit name) | KAPA Hyper Prep Kit |
Fragmentation Methods | Ultrasonic fragmentation (Covaris) |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 150 bp |
Japanese Genotype-phenotype Archive Data set ID | JGAD000290 (95 Japanese individuals), JGAD000363 (103 Japanese individuals), JGAD000427 (227 Japanese individuals), JGAD000532 (30 Japanese individuals), JGAD000649 (Inflammatory Bowel Disease), JGAD000650 (136 Japanese individuals) |
Total Data Volume | JGAD000290: 477 GB (fastq), JGAD000363: 408 GB (fastq), JGAD000427: 881.2 GB (fastq), JGAD000532: 106.7 GB (fastq), JGAD000649: 374.6 GB (fastq), JGAD000650: 541.4 GB (fastq) |
Comments (Policies) | NBDC policy |
hum0197.v2.gwas.v1
Participants/Materials | Autoimmune pulmonary alveolar proteinosis cases (ICD10: J840): 198; control participants: 395 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) | GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation |
Association Analysis (software) | PLINK2 |
Filtering Methods | Sample QC: We excluded samples with low genotyping call rates (call rate < 98%) or in close genetic relation (PI_HAT > 0.175), and included only samples of estimated East Asian ancestry. Variant QC: We excluded variants with (1) genotyping call rate < 98%, (2) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁶, (3) minor allele count < 5, or (4) > 10% frequency difference with the imputation reference panel. |
Marker Number (after QC) | 12,153,232 autosomal variants and 242,876 X-chromosomal variants after QC. |
NBDC Data Set ID | hum0197.v2.gwas.v1 (Click the Data Set ID to download the file) |
Total Data Volume | 390MB for autosome (txt.gz) and 19MB for X chromosome (txt.gz) |
Comments (Policies) | NBDC policy |
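The variant QC criteria listed under Filtering Methods for the autoimmune pulmonary alveolar proteinosis GWAS can be read as four independent exclusion tests. The following is a minimal sketch of that reading, not the pipeline used by the data provider. The `Variant` class, its field names, and the interpretation of "> 10% frequency difference" as an absolute difference in allele frequency are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Per-variant summary values; all field names are hypothetical."""
    call_rate: float  # fraction of samples with a genotype call
    hwe_p: float      # P value for Hardy-Weinberg equilibrium
    mac: int          # minor allele count
    freq: float       # allele frequency in the study samples
    ref_freq: float   # allele frequency in the imputation reference panel


def passes_variant_qc(v: Variant) -> bool:
    """Return True if the variant survives all four exclusion criteria."""
    if v.call_rate < 0.98:               # (1) genotyping call rate < 98%
        return False
    if v.hwe_p < 1.0e-6:                 # (2) HWE P value < 1.0e-6
        return False
    if v.mac < 5:                        # (3) minor allele count < 5
        return False
    if abs(v.freq - v.ref_freq) > 0.10:  # (4) assumed: absolute difference > 10%
        return False
    return True
```

A variant is kept only when every test passes, so the order of the checks does not change the result.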
hum0197.v3.gwas.v1
Participants/Materials | Biobank Japan (n = 179,000), UK Biobank (n = 361,000), FinnGen (n = 136,000), no. Phenotypes: 215 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]; FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array; FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays |
Genotype Call Methods (software) | BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4; FinnGen: Beagle 4.1 |
Association Analysis (software) | For binary traits, SAIGE was used with age, age², sex, age×sex, age²×sex, and the top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM or PLINK was used with the same covariates. |
Filtering Methods | BBJ: We included imputed variants with Rsq > 0.7. UK Biobank: We excluded variants with (i) INFO score ≤ 0.8, (ii) MAF ≤ 0.0001 (except for missense and protein-truncating variants annotated by VEP, which were excluded if MAF ≤ 1 × 10⁻⁶), or (iii) P_HWE ≤ 1 × 10⁻¹⁰. FinnGen: We excluded variants with an imputation INFO score < 0.8 or MAF < 0.0001. |
Marker Number (after QC) | BBJ: 13,530,797 variants; UK Biobank: 13,791,467 variants; FinnGen: 16,859,359 variants |
NBDC Data Set ID | hum0197.v3.gwas.v1 (Click the Data Set ID to download the file) |
Total Data Volume | BBJ: ~1.5 GB for autosomes and ~33 MB for chrX; UK Biobank: ~1.5 GB for autosomes and ~15 MB for chrX; FinnGen: ~740 MB for autosomes and ~20 MB for chrX |
Comments (Policies) | NBDC policy |
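The covariate set described under Association Analysis (age, age², sex, age×sex, age²×sex, and the top 20 principal components) amounts to 25 values per individual. A minimal sketch of how one such covariate row could be assembled is shown here. The function name `build_covariates` and the 0/1 coding of sex are assumptions for illustration; the record does not state how sex was coded or how covariates were passed to SAIGE, BOLT-LMM, or PLINK.

```python
def build_covariates(age: float, sex: int, pcs: list[float]) -> list[float]:
    """Assemble one covariate row: age, age^2, sex, age*sex, age^2*sex, PC1..PC20.

    `sex` is assumed to be coded 0/1; `pcs` must hold the top 20 principal components.
    """
    if len(pcs) != 20:
        raise ValueError("expected exactly 20 principal components")
    age_sq = age * age
    return [age, age_sq, float(sex), age * sex, age_sq * sex, *pcs]
```

For example, `build_covariates(50, 1, [0.0] * 20)` yields a row of 25 values whose first five entries are 50, 2500, 1, 50, and 2500.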
hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1
Participants/Materials | Biobank Japan (n = 179,000), no. Phenotypes: 79 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip |
Genotype Call Methods (software) | Eagle, Minimac3 |
Association Analysis (software) | GWAS: For binary traits, SAIGE was used with age, age², sex, age×sex, age²×sex, and the top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM was used with the same covariates. Fine-mapping: FINEMAP and SuSiE were used with GWAS summary statistics and in-sample dosage LD, allowing up to 10 causal variants per region. |
Filtering Methods | GWAS: We included imputed variants with Rsq > 0.7. For binary traits, variants with MAC < 10 were additionally excluded. Fine-mapping: We defined fine-mapping regions based on a 3 Mb window around each lead variant and merged regions if they overlapped. We excluded the major histocompatibility complex (MHC) region (chr 6: 25–36 Mb) from analysis due to extensive LD structure in the region. For each method, we only included variants from successfully fine-mapped regions while excluding those from failed regions (e.g., due to conversion failure or available memory restrictions). |
Marker Number (after QC) | 13,531,752 variants (ref: hg19) |
NBDC Data Set ID | hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1 (Click the Data Set ID to download the file) |
Total Data Volume | 14 GB |
Comments (Policies) | NBDC policy |
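The fine-mapping region definition described under Filtering Methods (a 3 Mb window around each lead variant, overlapping regions merged, MHC excluded) is an interval-merging procedure. The sketch below illustrates one possible reading. It assumes the 3 Mb window is centered on the lead variant (±1.5 Mb) and that any merged region overlapping chr 6: 25–36 Mb is dropped; neither detail is stated in the record, and `build_regions` is a hypothetical name.

```python
WINDOW = 3_000_000                   # total window width in bp (assumed centered)
MHC = ("6", 25_000_000, 36_000_000)  # chr 6: 25-36 Mb


def overlaps_mhc(region):
    """True if a (chrom, start, end) region overlaps the MHC interval."""
    chrom, start, end = region
    return chrom == MHC[0] and start < MHC[2] and end > MHC[1]


def build_regions(leads):
    """Turn lead variants [(chrom, pos), ...] into merged (chrom, start, end) regions."""
    half = WINDOW // 2
    # One window per lead variant, sorted so overlapping windows are adjacent.
    intervals = sorted((c, max(0, p - half), p + half) for c, p in leads)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))  # extend the last region
        else:
            merged.append((chrom, start, end))
    # Assumed handling of the MHC: drop regions that overlap it.
    return [r for r in merged if not overlaps_mhc(r)]
```

With lead variants at chr 1: 10 Mb and chr 1: 12 Mb, the two windows overlap and merge into a single region spanning 8.5–13.5 Mb.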
JGAS000504
Participants/Materials | 141 Japanese individuals |
Targets | small RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | RNAs extracted from PBMC |
Cell Lines | - |
Library Construction (kit name) | SMARTer smRNA-Seq Kit |
Fragmentation Methods | - |
Spot Type | Single-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods | Bowtie (GRCh37) |
Detecting method for read count (software) | featureCounts + miRBase v22 |
QC | We performed adapter trimming using Cutadapt v1.8 and removed reads with a low quality score (Phred quality score < 20 in >20% of total bases) using fastp v0.20.0. Also, we removed reads with a length of >29 bp or <15 bp, which are not expected to be mature miRNAs. Mature miRNAs detected with ≥1 read in at least half of the individuals were included in the dataset. |
miRNA number | 343 |
Japanese Genotype-phenotype Archive Data set ID | JGAD000621 |
Total Data Volume | 54.7 KB (txt) |
Comments (Policies) | NBDC policy |
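Two of the QC rules above are simple thresholds: reads longer than 29 bp or shorter than 15 bp are removed, and a mature miRNA is retained only if it has ≥1 read in at least half of the individuals. A minimal sketch of both rules is given here. The function names and the dictionary layout of the count data are assumptions for illustration and do not reflect the format of the deposited file.

```python
def keep_read(length: int) -> bool:
    """Keep reads of 15-29 bp; longer or shorter reads are removed."""
    return 15 <= length <= 29


def detected_mirnas(counts: dict[str, list[int]]) -> list[str]:
    """Return miRNAs with >= 1 read in at least half of the individuals.

    `counts` maps a miRNA name to its per-individual read counts
    (a hypothetical layout chosen for this sketch).
    """
    kept = []
    for name, per_individual in counts.items():
        n_detected = sum(1 for c in per_individual if c >= 1)
        # "At least half" is tested without division to avoid rounding issues.
        if 2 * n_detected >= len(per_individual):
            kept.append(name)
    return kept
```

With 141 individuals, the second rule requires a miRNA to be detected in at least 71 of them.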
hum0197.v6.eqtl.v1
Participants/Materials | 141 Japanese individuals |
Targets | eQTL |
Target Loci for Capture Methods | - |
Platform | small RNA-seq: Illumina [HiSeq 2500]; WGS: Illumina [HiSeq X Ten] |
Library Source | Read count data of JGAS000504 and whole-genome sequencing data from genomic DNA extracted from whole blood |
Cell Lines | - |
Reagents (Kit, Version) | small RNA-seq: See JGAS000504; WGS: TruSeq DNA PCR-Free Library Preparation Kit |
Genotype Call / Detecting read count Methods (software) | See JGAS000504 for read count data. WGS: Sequenced reads were aligned against the reference human genome with the decoy sequence (GRCh37, human_g1k_v37_decoy) using BWA-MEM v0.7.13. |
QC | See JGAS000504 for read count data. WGS: We removed variants (i) with low genotyping call rates (< 0.90), (ii) with ExcessHet > 60, or (iii) with Hardy–Weinberg P value < 1.0 × 10⁻¹⁰. Genotype refinement was performed using Beagle v5.1. |
Marker Number (after QC) | See JGAS000504 for read count data. WGS: 12,171,854 variants |
eQTL algorithm | We analyzed the association between genetic variants with minor allele frequency (MAF) ≥ 0.01 within a cis-window around each miRNA (±1 Mb of the mature miRNA) and normalized expression values using MatrixEQTL v2.3. |
NBDC Data Set ID | hum0197.v6.eqtl.v1 (Click the Data Set ID to download the file) |
Total Data Volume | 1.1 MB (txt) |
Comments (Policies) | NBDC policy |
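The variant selection described under eQTL algorithm (MAF ≥ 0.01, within ±1 Mb of the mature miRNA) can be sketched as a coordinate filter. This is an illustration only and does not use or reproduce the MatrixEQTL interface. The function name, the tuple layouts, and the choice to measure the window from the start and end coordinates of the mature miRNA are assumptions.

```python
CIS_WINDOW = 1_000_000  # +/- 1 Mb around the mature miRNA
MIN_MAF = 0.01


def cis_variants(mirna, variants):
    """Select variant IDs eligible for cis-eQTL testing against one miRNA.

    mirna:    (chrom, start, end) of the mature miRNA (hypothetical layout)
    variants: iterable of (variant_id, chrom, pos, alt_allele_freq)
    """
    chrom, start, end = mirna
    lo, hi = start - CIS_WINDOW, end + CIS_WINDOW
    selected = []
    for vid, vchrom, pos, alt_freq in variants:
        maf = min(alt_freq, 1.0 - alt_freq)  # fold the frequency to the minor allele
        if vchrom == chrom and lo <= pos <= hi and maf >= MIN_MAF:
            selected.append(vid)
    return selected
```

Folding the allele frequency with `min(f, 1 - f)` ensures that variants whose alternate allele is the common one are still evaluated on their minor allele frequency.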
hum0197.v9.gwas.GCT.v1
Participants/Materials | Intracranial germ cell tumors cases (ICD10: C719): 133; control participants: 762 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) | GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation |
Association Analysis (software) | PLINK2 |
Filtering Methods | Sample QC: We excluded individuals (i) with genotyping call rate < 0.97, (ii) in close kinship (PI_HAT > 0.17), or (iii) of estimated non-East Asian ancestry. Variant QC: We excluded variants with (i) genotyping call rate < 0.99, (ii) minor allele count < 5, (iii) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁵ in controls, or (iv) > 10% allele frequency difference with the imputation reference panel or the allele frequency panel of the Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%. |
Marker Number (after QC) | 7,803,874 autosomal variants; 181,867 X-chromosomal variants |
NBDC Data Set ID | hum0197.v9.gwas.GCT.v1 (Click the Data Set ID to download the file) |
Total Data Volume | 248 MB (txt) |
Comments (Policies) | NBDC policy |
hum0197.v10.gwas.v1
Participants/Materials | Biobank Japan (n = 161,801), UK Biobank (n = 377,583), no. Phenotypes: 9. Patients: Autoimmune [rheumatoid arthritis (ICD10: M05), Graves' disease (ICD10: E05), type I diabetes mellitus (ICD10: E10)]; Allergy [asthma (ICD10: J45), atopic dermatitis (ICD10: L20), pollinosis (ICD10: J301)]. Controls: non-autoimmune + non-allergy individuals (there is overlap among patients in each disease category) |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array |
Genotype Call Methods (software) | BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4 |
Association Analysis (software) | SAIGE was used with age, sex, and the top five principal components as covariates. RE2C was used for the multi-trait meta-analysis, adjusting for sample overlap between GWAS summary data. |
Filtering Methods | We excluded the variants with Rsq < 0.7 and MAF < 0.005. |
Marker Number (after QC) | BBJ: 8,374,220 autosomal variants for individual trait / 8,369,174 autosomal variants for meta-analysis; UK Biobank: 10,864,380 autosomal variants for individual trait / 10,858,065 autosomal variants for meta-analysis; BBJ + UK Biobank: 5,965,154 autosomal variants for meta-analysis |
NBDC Data Set ID | hum0197.v10.gwas.v1 (Click the Data Set ID to download the files) |
Total Data Volume | BBJ: ~760 MB for individual trait / ~430 MB for multi-trait meta-analysis; UK Biobank: ~1.1 GB for individual trait / ~550 MB for multi-trait meta-analysis; BBJ + UK Biobank: ~310 MB for multi-trait meta-analysis |
Comments (Policies) | NBDC policy |
DATA PROVIDER
Principal Investigator: Yukinori Okada
Affiliation: Department of Statistical Genetics, Osaka University Graduate School of Medicine
Project / Group Name: -
Funds / Grants (Research Project Number):
Name | Title | Project Number |
---|---|---|
Precursory Research for Innovative Medical care (PRIME), Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Crosstalk among microbiome, host, disease, and drug discovery enhanced by statistical genetics | JP19gm6010001 |
FORCE, Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Elucidation of disease-specific microbiota and personalized medicine by metagenome-wide association studies | JP20gm4010006 |
Practical Research Project for Rare / Intractable Diseases, Japan Agency for Medical Research and Development (AMED) | Biology and in silico drug repositioning of pulmonary alveolar proteinosis using trans-layer omics analysis | JP20ek0109413 |
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Nucleic genome drug discovery for autoimmune diseases through in-silico and patient-oriented screening utilizing large-scale disease genetics | JP19ek0410041 |
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | JP21ek0410075 |
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Implementation of genomic prediction medicine based on statistical genetics | JP21km0405211 |
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Next-generation genomics analyses elucidates biology, personalized medicine, and drug discovery of psoriasis | JP21km0405217 |
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of disease biology and tissue specificity by trans-layer omics analysis and whole-genome sequencing | 19H01021 |
PUBLICATIONS
No. | Title | DOI | Data Set ID |
---|---|---|---|
1 | Metagenome-wide association study of gut microbiome revealed novel aetiology of rheumatoid arthritis in the Japanese population. | doi: 10.1136/annrheumdis-2019-215743 | JGAD000290 |
2 | Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis. | doi: 10.1038/s41467-021-21011-y | hum0197.v2.gwas.v1 |
3 | A metagenome-wide association study of gut microbiome in patients with multiple sclerosis revealed novel disease pathology. | doi: 10.3389/fcimb.2020.585973 | JGAD000363 |
4 | A global atlas of genetic associations of 220 deep phenotypes | doi: 10.1101/2020.10.23.20213652 | hum0197.v3.gwas.v1 |
5 | Metagenome-wide association study revealed disease-specific landscape of the gut microbiome of systemic lupus erythematosus in Japanese | doi: 10.1136/annrheumdis-2021-220687 | JGAD000427 |
6 | Whole gut virome analysis of 476 Japanese revealed a link between phage and autoimmune disease | doi: 10.1136/annrheumdis-2021-221267 | JGAD000532 |
7 | Insights from complex trait fine-mapping across diverse populations | doi: 10.1101/2021.09.03.21262975 | hum0197.v5.gwas.v1, hum0197.v5.finemap.v1 |
8 | Genetic architecture of microRNA expression and its link to complex diseases in the Japanese population. | doi: 10.1093/hmg/ddab361 | JGAD000621, hum0197.v6.eqtl.v1 |
9 | Multi-trait and cross-population genome-wide association studies across autoimmune and allergic diseases identify shared and distinct genetic components. | doi: 10.1136/annrheumdis-2022-222460 | hum0197.v10.gwas.v1 |
USERS (Controlled-Access Data)
Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Data Set ID) | Period of Data Use |
---|---|---|---|---|---|
Ilana Brito | Meinig School of Biomedical Engineering, Cornell University | United States of America | Comparative metagenomics of lupus patients' microbiomes | JGAD000290, JGAD000363, JGAD000427, JGAD000532 | 2022/05/12-2024/05/04 |