NBDC Research ID: hum0197.v21
SUMMARY
Aims: Elucidation of disease biology based on trans-omics analysis, GWAS in the Japanese and trans-ethnic populations, Elucidation of the mechanism of COVID-19 severity, Improving the performance of type 2 diabetes polygenic predictions, Elucidation of the genetic architecture of recurrent pregnancy loss, Elucidation of the association between Jomon component in the Japanese population and phenotypes and diseases
Methods: Metagenome shotgun sequencing, genome-wide association study (GWAS), small RNA-seq and eQTL analyses, whole genome sequencing (WGS)
Participants/Materials:
Metagenomic data of gut microbiome in the Japanese population (95 + 103 + 227 + 30 + 136 individuals)
Autoimmune pulmonary alveolar proteinosis cases: 198, Control participants: 395
Populations: BioBank Japan (n = 179,000), UK Biobank (n = 361,000), and FinnGen (n = 136,000); Phenotypes: 220
141 Japanese individuals
Metagenomic data of gut microbiome in Inflammatory Bowel Disease (35 Ulcerative Colitis and 39 Crohn's disease) and 40 Healthy controls
Intracranial germ cell tumors cases: 133, Control participants: 762
Populations: BioBank Japan (n = 161,801) and UK Biobank (n = 377,583); Phenotypes: 9
PBMC from Japanese population (COVID-19: n = 30 + 43, Healthy controls: n = 31 + 44)
Microbial genome: Metagenome-Assembled Genome (MAG), Viral genome, CRISPR spacers
Metagenomic data of gut microbiome in the Japanese population (88 + 5 individuals) and healthy individuals (n = 73)
BioBank Japan (n=180,215), UK Biobank (n=377,441), and large-scale meta-analysis including the summary statistics of other cohorts [FinnGen, Breast Cancer Association Consortium (BCAC), and Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL)] for breast and prostate cancer (n=648,746 and 482,080), Phenotypes: 15
Hunner-type interstitial cystitis cases: 144, Control participants: 41,516
524 Japanese individuals for gut microbiome–host genome association analysis, 362 Japanese individuals for plasma metabolite–host genome association analysis
The weights of variants existing in the target cohorts, Tohoku Medical Megabank and the second cohort of BBJ, calculated from GWAS results on 27,642 type 2 diabetes cases and 70,242 controls from BioBank Japan and UK Biobank
Recurrent pregnancy loss cases: 1,728, Control participants: 24,315
Autoimmune diseases cases: 2,238, Healthy controls: 2,919
The first cohort of BioBank Japan (n = 171,287)
Dataset ID | Type of Data | Criteria | Release Date |
---|---|---|---|
JGAS000205 | Metagenome | Controlled-access (Type I) | 2019/11/15 |
hum0197.v2.gwas.v1 | GWAS for autoimmune pulmonary alveolar proteinosis | Unrestricted-access | 2020/11/27 |
JGAS000260 | Metagenome | Controlled-access (Type I) | 2020/11/27 |
hum0197.v3.gwas.v1 | GWAS for 215 phenotypes | Unrestricted-access | 2021/03/22 |
JGAS000316 | Metagenome | Controlled-access (Type I) | 2021/10/12 |
JGAS000415 | Metagenome | Controlled-access (Type I) | 2021/12/10 |
hum0197.v5.gwas.v1 | GWAS for 10 phenotypes | Unrestricted-access | 2021/12/21 |
hum0197.v5.finemap.v1 | Fine-mapping for 79 phenotypes | Unrestricted-access | 2021/12/21 |
JGAS000504 | Read count data of miRNA | Controlled-access (Type I) | 2022/02/08 |
hum0197.v6.eqtl.v1 | eQTL data | Unrestricted-access | 2022/02/08 |
JGAS000530 | Metagenome | Controlled-access (Type I) | 2022/05/23 |
JGAS000531 | Metagenome | Controlled-access (Type I) | 2022/06/03 |
hum0197.v9.gwas.GCT.v1 | GWAS for intracranial germ cell tumors | Unrestricted-access | 2022/06/10 |
hum0197.v10.gwas.v1 | GWAS for 9 phenotypes | Unrestricted-access | 2022/06/16 |
JGAS000543 | Raw sequencing data of single-cell RNA-seq | Controlled-access (Type I) | 2022/07/21 |
hum0197.v12 | MAG, Viral genome and CRISPR spacers of Microbial genome | Unrestricted-access | 2022/12/01 |
JGAS000543 (data addition) | clinical data | Controlled-access (Type I) | 2023/02/14 |
JGAS000593 | Raw sequencing data of single-cell RNA-seq, clinical data | Controlled-access (Type I) | 2023/02/14 |
hum0197.v3.gwas.v1 (data addition) | GWAS for 5 phenotypes | Unrestricted-access | 2023/02/16 |
JGAS000600 | Metagenome | Controlled-access (Type I) | 2023/03/29 |
hum0197.v16.gwas.v1 | GWAS for 15 phenotypes | Unrestricted-access | 2023/06/06 |
hum0197.v17.hic-gwas.v1 | GWAS for Hunner-type interstitial cystitis | Unrestricted-access | 2023/06/27 |
hum0197.v18.gwas.v1 | GWAS for gut microbiome, GWAS for plasma metabolites, GWAS for KEGG Gene Ortholog and KEGG Pathway | Unrestricted-access | 2023/10/02 |
hum0197.v19.prs.v1 | The weights of variants calculated from GWAS results on type 2 diabetes | Unrestricted-access | 2024/05/29 |
hum0197.v20.gwas.v1 | GWAS for recurrent pregnancy loss | Unrestricted-access | 2024/05/30 |
hum0197.v21.gwas-ehhv6.v1 | GWAS for autoimmune diseases | Unrestricted-access | 2024/10/28 |
JGAS000741 | | Controlled-access (Type I) | 2024/10/28 |
hum0197.v21.gwas-jomon.v1 | GWAS for the individual Jomon proportions | Unrestricted-access | 2024/10/28 |
* Data users must submit an Application for Using NBDC Human Data to access Controlled-access data.
*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.
MOLECULAR DATA
JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531
Participants/Materials | 95 + 103 + 227 + 30 + 136 Japanese individuals; Inflammatory Bowel Disease: 35 ulcerative colitis, 39 Crohn's disease, 40 healthy controls |
Targets | Metagenome |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 3000, NovaSeq 6000] |
Library Source | DNA extracted from gut microbiome |
Cell Lines | - |
Library Construction (kit name) | KAPA Hyper Prep Kit |
Fragmentation Methods | Ultrasonic fragmentation (Covaris) |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 150 bp |
Japanese Genotype-phenotype Archive Dataset ID | JGAD000290 (95 Japanese individuals), JGAD000363 (103 Japanese individuals), JGAD000427 (227 Japanese individuals), JGAD000532 (30 Japanese individuals), JGAD000649 (Inflammatory Bowel Disease), JGAD000650 (136 Japanese individuals) |
Total Data Volume | JGAD000290: 477 GB (fastq); JGAD000363: 408 GB (fastq); JGAD000427: 881.2 GB (fastq); JGAD000532: 106.7 GB (fastq); JGAD000649: 374.6 GB (fastq); JGAD000650: 541.4 GB (fastq) |
Comments (Policies) | NBDC policy |
Participants/Materials | Autoimmune pulmonary alveolar proteinosis cases (ICD10: J840): 198; Control participants: 395 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) | GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation |
Association Analysis (software) | PLINK2 |
Filtering Methods | Sample QC: We excluded samples with a low genotyping call rate (call rate < 98%) or in close kinship (PI_HAT > 0.175), and included only samples of estimated East Asian ancestry. Variant QC: We excluded variants with (1) genotyping call rate < 98%, (2) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁶, (3) minor allele count < 5, or (4) > 10% frequency difference from the imputation reference panel. |
Marker Number (after QC) | 12,153,232 autosomal variants and 242,876 X-chromosomal variants after QC. |
NBDC Dataset ID | hum0197.v2.gwas.v1 |
Total Data Volume | 390MB for autosome (txt.gz) and 19MB for X chromosome (txt.gz) |
Comments (Policies) | NBDC policy |
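The APAP variant QC above removes variants whose Hardy–Weinberg equilibrium P value is below 1.0 × 10⁻⁶. The page does not say which HWE test was used (PLINK-style tools usually apply an exact test), so the following is only a minimal sketch of the simpler 1-degree-of-freedom chi-square approximation on genotype counts; the function names are hypothetical.

```python
import math


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square P value for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: test undefined, treated as passing
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_hwe(n_aa: int, n_ab: int, n_bb: int, threshold: float = 1.0e-6) -> bool:
    """Keep the variant unless its HWE P value falls below the threshold."""
    return hwe_chisq_p(n_aa, n_ab, n_bb) >= threshold


print(passes_hwe(25, 50, 25))  # True: counts match HWE exactly
print(passes_hwe(50, 0, 50))   # False: no heterozygotes at all
```

For rare variants the chi-square approximation is unreliable, which is one reason exact tests are the usual default.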
Participants/Materials | BioBank Japan (n = 179,000), UK Biobank (n = 361,000), FinnGen (n = 136,000); no. of phenotypes: 220 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]; FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array; FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays |
Genotype Call Methods (software) | BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4; FinnGen: Beagle 4.1 |
Association Analysis (software) | For binary traits, SAIGE was used with age, age², sex, age × sex, age² × sex, and the top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM or PLINK was used with the same covariates. |
Filtering Methods | BBJ: We included imputed variants with Rsq > 0.7. UK Biobank: We excluded variants with (i) INFO score ≤ 0.8, (ii) MAF ≤ 0.0001 (except for missense and protein-truncating variants annotated by VEP, which were excluded if MAF ≤ 1 × 10⁻⁶), and (iii) P_HWE ≤ 1 × 10⁻¹⁰. FinnGen: We excluded variants with an imputation INFO score < 0.8 or MAF < 0.0001. |
Marker Number (after QC) | BBJ: 13,530,797 variants; UK Biobank: 13,791,467 variants; FinnGen: 16,859,359 variants |
NBDC Dataset ID | hum0197.v3.gwas.v1 |
Total Data Volume | BBJ: ~1.5 GB for autosomes and ~33 MB for chrX; UK Biobank: ~1.5 GB for autosomes and ~15 MB for chrX; FinnGen: ~740 MB for autosomes and ~20 MB for chrX |
Comments (Policies) | NBDC policy |
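The covariate set above (age, age², sex, age × sex, age² × sex, and the top 20 principal components) can be assembled per individual as in this minimal sketch. It assumes sex is coded 0/1 and that principal components were computed beforehand; `covariate_row` is a hypothetical helper, not part of SAIGE, BOLT-LMM, or PLINK.

```python
from typing import List, Sequence


def covariate_row(age: float, sex: int, pcs: Sequence[float], n_pcs: int = 20) -> List[float]:
    """Covariates: age, age^2, sex, age*sex, age^2*sex, then the top n_pcs PCs.

    Assumes sex is coded 0/1 and PCs are ordered PC1, PC2, ...
    """
    if sex not in (0, 1):
        raise ValueError("sex is assumed to be coded 0/1")
    if len(pcs) < n_pcs:
        raise ValueError(f"expected at least {n_pcs} principal components")
    age_sq = age * age
    row = [age, age_sq, sex, age * sex, age_sq * sex]
    row.extend(pcs[:n_pcs])
    return [float(v) for v in row]


row = covariate_row(age=60, sex=1, pcs=[0.01 * i for i in range(20)])
print(len(row))  # 25
print(row[:5])   # [60.0, 3600.0, 1.0, 60.0, 3600.0]
```

The interaction terms let the age effect differ between sexes; for sex coded 0 they collapse to zero.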
hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1
Participants/Materials | BioBank Japan (n = 179,000); no. of phenotypes: 79 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip |
Genotype Call Methods (software) | Eagle, Minimac3 |
Association Analysis (software) | GWAS: For binary traits, SAIGE was used with age, age², sex, age × sex, age² × sex, and the top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM was used with the same covariates. Fine-mapping: FINEMAP and SuSiE were used with GWAS summary statistics and in-sample dosage LD, allowing up to 10 causal variants per region. |
Filtering Methods | GWAS: We included imputed variants with Rsq > 0.7. For binary traits, variants with MAC < 10 were additionally excluded. Fine-mapping: We defined fine-mapping regions based on a 3 Mb window around each lead variant and merged overlapping regions. We excluded the major histocompatibility complex (MHC) region (chr 6: 25–36 Mb) from the analysis because of its extensive LD structure. For each method, we included only variants from successfully fine-mapped regions and excluded those from failed regions (e.g., due to convergence failure or memory limits). |
Marker Number (after QC) | 13,531,752 variants (ref: hg19) |
NBDC Dataset ID | hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1 |
Total Data Volume | 14 GB |
Comments (Policies) | NBDC policy |
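The fine-mapping region definition above can be sketched as follows. This assumes the 3 Mb window means ±1.5 Mb around each lead variant, that regions on the same chromosome are merged whenever they overlap, and that any merged region overlapping chr6:25–36 Mb is dropped; the names are hypothetical and this is not the authors' pipeline.

```python
from typing import Dict, Iterable, List, Tuple

# MHC interval excluded from fine-mapping (chr6: 25-36 Mb)
MHC_CHROM, MHC_START, MHC_END = "6", 25_000_000, 36_000_000


def define_regions(
    leads: Iterable[Tuple[str, int]], half_window: int = 1_500_000
) -> List[Tuple[str, int, int]]:
    """Build +/- half_window regions around lead variants, merge overlaps, drop MHC."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, pos in leads:
        by_chrom.setdefault(chrom, []).append((max(1, pos - half_window), pos + half_window))

    regions = []
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:  # overlaps the current region: extend it
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        for start, end in merged:
            in_mhc = chrom == MHC_CHROM and start < MHC_END and end > MHC_START
            if not in_mhc:
                regions.append((chrom, start, end))
    return sorted(regions)


leads = [("1", 10_000_000), ("1", 11_000_000), ("1", 20_000_000), ("6", 30_000_000)]
print(define_regions(leads))
# [('1', 8500000, 12500000), ('1', 18500000, 21500000)]
```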
Participants/Materials: | 141 Japanese individuals |
Targets | small RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | RNAs extracted from PBMC |
Cell Lines | - |
Library Construction (kit name) | SMARTer smRNA-Seq Kit |
Fragmentation Methods | - |
Spot Type | Single-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods | bowtie (GRCh37) |
Detecting method for read count (software) | featureCounts + miRBase v22 |
QC | We performed adapter trimming using Cutadapt v1.8 and removed low-quality reads (Phred quality score < 20 in > 20% of bases) using fastp v0.20.0. We also removed reads longer than 29 bp or shorter than 15 bp, which are not expected to be mature miRNAs. Mature miRNAs detected with ≥ 1 read in at least half of the individuals were included in the dataset. |
miRNA number | 343 |
Japanese Genotype-phenotype Archive Dataset ID | JGAD000621 |
Total Data Volume | 54.7 KB (txt) |
Comments (Policies) | NBDC policy |
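The read-level and miRNA-level filters above reduce to three simple rules. The sketch below assumes per-read Phred scores are already decoded to integers and that counts are stored per miRNA as a list of per-individual read counts; it illustrates the thresholds only and does not reproduce Cutadapt or fastp behavior. All function names are hypothetical.

```python
from typing import Dict, List


def passes_length(seq: str, min_len: int = 15, max_len: int = 29) -> bool:
    """Keep reads of 15-29 bp, the range expected for mature miRNAs."""
    return min_len <= len(seq) <= max_len


def passes_quality(quals: List[int], min_q: int = 20, max_low_frac: float = 0.20) -> bool:
    """Drop a read if more than 20% of its bases have Phred quality < 20."""
    n_low = sum(1 for q in quals if q < min_q)
    return n_low <= max_low_frac * len(quals)


def detected_mirnas(counts: Dict[str, List[int]]) -> List[str]:
    """Keep mature miRNAs with >= 1 read in at least half of the individuals."""
    kept = []
    for mirna, per_individual in counts.items():
        n_detected = sum(1 for c in per_individual if c >= 1)
        if 2 * n_detected >= len(per_individual):
            kept.append(mirna)
    return kept


counts = {"miR-a": [0, 1, 2, 3], "miR-b": [0, 0, 0, 5], "miR-c": [1, 1, 0, 0]}
print(detected_mirnas(counts))  # ['miR-a', 'miR-c']
```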
Participants/Materials | 141 Japanese individuals |
Targets | eQTL |
Target Loci for Capture Methods | - |
Platform | small RNA-seq: Illumina [HiSeq 2500]; WGS: Illumina [HiSeq X Ten] |
Library Source | read count data of JGAS000504 and whole genome sequencing data using genomic DNA extracted from whole blood |
Cell Lines | - |
Reagents (Kit, Version) | small RNA-seq: see JGAS000504; WGS: TruSeq DNA PCR-Free Library Preparation Kit |
Genotype Call / Detecting read count Methods (software) | Read count data: see JGAS000504. WGS: Sequenced reads were aligned to the reference human genome with decoy sequences (GRCh37, human_g1k_v37_decoy) using BWA-MEM v0.7.13. |
QC | Read count data: see JGAS000504. WGS: We removed variants (i) with a low genotyping call rate (< 0.90), (ii) with ExcessHet > 60, or (iii) with a Hardy–Weinberg P value < 1.0 × 10⁻¹⁰. Genotype refinement was performed using Beagle v5.1. |
Marker Number (after QC) | Read count data: see JGAS000504. WGS: 12,171,854 variants |
eQTL algorithm | We analyzed the association between genetic variants with minor allele frequency (MAF) ≥ 0.01 within a cis-window around each miRNA (±1 Mb of the mature miRNA) and normalized expression values using MatrixEQTL v2.3. |
NBDC Dataset ID | hum0197.v6.eqtl.v1 |
Total Data Volume | 1.1 MB (txt) |
Comments (Policies) | NBDC policy |
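The cis-eQTL scan described above can be illustrated with a minimal sketch: keep variants within ±1 Mb of the mature miRNA that have MAF ≥ 0.01, then regress normalized expression on allele dosage. It fits a plain least-squares model without covariates, a simplification of what MatrixEQTL runs; the data layout and function names are assumptions.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def maf(dosages: Sequence[float]) -> float:
    """Minor allele frequency from 0-2 allele dosages."""
    af = sum(dosages) / (2 * len(dosages))
    return min(af, 1.0 - af)


def ols_beta_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Slope and t statistic of y ~ a + b*x (no covariates)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, (beta / se if se > 0 else math.copysign(math.inf, beta))


def cis_eqtl(mirna, expression, variants, window=1_000_000, min_maf=0.01) -> List[tuple]:
    """mirna = (chrom, start, end); variants = [(id, chrom, pos, dosages), ...]."""
    chrom, start, end = mirna
    results = []
    for vid, v_chrom, pos, dosages in variants:
        if v_chrom != chrom or not (start - window <= pos <= end + window):
            continue  # outside the cis-window
        if maf(dosages) < min_maf or len(set(dosages)) < 2:
            continue  # too rare, or no dosage variation to regress on
        beta, t = ols_beta_t(dosages, expression)
        results.append((vid, beta, t))
    return results


expr = [1.0, 3.0, 5.0, 1.1, 2.9, 5.0]
variants = [
    ("rs_near", "1", 1_200_000, [0, 1, 2, 0, 1, 2]),
    ("rs_far", "1", 5_000_000, [0, 1, 2, 0, 1, 2]),
    ("rs_rare", "1", 1_100_000, [0, 0, 0, 0, 0, 0]),
]
print(cis_eqtl(("1", 1_000_000, 1_000_022), expr, variants))  # only rs_near is tested
```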
Participants/Materials | Intracranial germ cell tumors cases (ICD10: C719): 133; Control participants: 762 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) | GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation |
Association Analysis (software) | PLINK2 |
Filtering Methods | Sample QC: We excluded individuals (i) with a genotyping call rate < 0.97, (ii) in close kinship (PI_HAT > 0.17), or (iii) of estimated non-East Asian ancestry. Variant QC: We excluded variants with (i) genotyping call rate < 0.99, (ii) minor allele count < 5, (iii) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁵ in controls, and (iv) > 10% allele frequency difference from the imputation reference panel or the allele frequency panel of the Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%. |
Marker Number (after QC) | 7,803,874 autosomal variants and 181,867 X-chromosomal variants |
NBDC Dataset ID | hum0197.v9.gwas.GCT.v1 |
Total Data Volume | 248 MB (txt) |
Comments (Policies) | NBDC policy |
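One common way to apply a PI_HAT cutoff such as the 0.17 used above is to visit related pairs from most to least related and drop one member of each pair whose members are both still present. The page does not say which member was removed; this sketch assumes the sample with the lower call rate is dropped, and `prune_related` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set, Tuple


def prune_related(
    pairs: Iterable[Tuple[str, str, float]],
    call_rate: Dict[str, float],
    threshold: float = 0.17,
) -> Set[str]:
    """Return samples kept after removing one member of every pair with PI_HAT > threshold."""
    removed: Set[str] = set()
    # Visit the most closely related pairs first.
    for a, b, pi_hat in sorted(pairs, key=lambda p: p[2], reverse=True):
        if pi_hat <= threshold or a in removed or b in removed:
            continue  # unrelated, or the pair is already resolved
        # Assumption: drop whichever sample of the pair has the lower call rate.
        removed.add(a if call_rate[a] < call_rate[b] else b)
    return set(call_rate) - removed


call_rate = {"s1": 0.99, "s2": 0.995, "s3": 0.98, "s4": 0.999}
pairs = [("s1", "s2", 0.5), ("s2", "s3", 0.25), ("s1", "s4", 0.05)]
print(sorted(prune_related(pairs, call_rate)))  # ['s2', 's4']
```

A greedy pass like this is not guaranteed to keep the largest possible unrelated set, but it is simple and deterministic.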
Participants/Materials | BioBank Japan (n = 161,801), UK Biobank (n = 377,583); no. of phenotypes: 9. Patients: Autoimmune [rheumatoid arthritis (ICD10: M05), Graves' disease (ICD10: E050), type 1 diabetes mellitus (ICD10: E10)]; Allergy [asthma (ICD10: J45), atopic dermatitis (ICD10: L20), pollinosis (ICD10: J301)]. Controls: individuals with neither autoimmune nor allergic disease (patients overlap among disease categories) |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array |
Genotype Call Methods (software) | BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4 |
Association Analysis (software) | SAIGE was used with age, sex, and the top five principal components as covariates. RE2C was used for the multi-trait meta-analysis, adjusting for sample overlap between the GWAS summary statistics. |
Filtering Methods | We excluded variants with Rsq < 0.7 and MAF < 0.005. |
Marker Number (after QC) | BBJ: 8,374,220 autosomal variants for individual traits / 8,369,174 autosomal variants for meta-analysis; UK Biobank: 10,864,380 autosomal variants for individual traits / 10,858,065 autosomal variants for meta-analysis; BBJ + UK Biobank: 5,965,154 autosomal variants for meta-analysis |
NBDC Dataset ID | hum0197.v10.gwas.v1 |
Total Data Volume | BBJ: ~760 MB for individual traits / ~430 MB for multi-trait meta-analysis; UK Biobank: ~1.1 GB for individual traits / ~550 MB for multi-trait meta-analysis; BBJ + UK Biobank: ~310 MB for multi-trait meta-analysis |
Comments (Policies) | NBDC policy |
Participants/Materials | COVID-19 (ICD10: U071): 30 + 43 cases; Healthy controls: 31 + 44 individuals |
Targets | scRNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq 6000] |
Library Source | RNAs extracted from PBMC |
Cell Lines | - |
Library Construction (kit name) | Chromium Next GEM Single Cell 5’ Library & Gel Bead Kit v1.1, Chromium Next GEM Chip G Single Cell Kit, Single Index Kit T Set A |
Fragmentation Methods | Enzymatic fragmentation |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 91 bp |
NBDC Dataset ID | |
Total Data Volume | 1.3 + 2.0 TB (fastq, xlsx [clinical data]) |
Comments (Policies) | NBDC policy |
Participants/Materials | Japanese gut microbiome (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531) and public data (DRA006684) |
Targets | Metagenome |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500/3000, NovaSeq 6000] |
Library Source | JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684 |
Cell Lines | - |
MAG methods | De novo assembly was performed with metaSPAdes, followed by binning with DAS Tool (combining MetaBAT2, MaxBin2, and CONCOCT).
DDBJ Sequence Read Archive ID | JGA MAG: 20220531NSUB000031HIGH_JGA_JMAG_GENOME_*.acclist.txt; DRA014186, DRA014188, DRA014191, and DRA014192 (from JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531); TPA MAG: EMNX01000001-EMNX01000025, EMNY01000001-EMNY01000068, EMNZ01000001-EMNZ01000149, EMOA01000001-EMOA01000067; DRA014184 (from DRA006684) |
Total Data Volume | JGA MAG: 153 GB (fasta); DRA014186: 11.5 GB (fasta); DRA014188: 11.9 GB (fasta); DRA014191: 12.2 GB (fasta); DRA014192: 5.75 GB (fasta); TPA MAG: 11.9 MB (fasta); DRA014184: 3.65 GB (fasta) |
Comments (Policies) | NBDC policy |
Participants/Materials | Japanese gut microbiome (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531) and public data (DRA006684) |
Targets | NGS (WGS) |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500/3000, NovaSeq 6000] |
Library Source | JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684 |
Cell Lines | - |
Virus genome construction | De novo assembly was performed with metaSPAdes, and viral contigs were then detected with VirFinder and VirSorter.
DDBJ Sequence Read Archive ID | JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531: BRDB01000001-BRDB01028816; DRA006684: EMNW01000001-EMNW01002579 |
Total Data Volume | JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531: 1.09 GB (fasta); DRA006684: 98.3 MB (fasta) |
Comments (Policies) | NBDC policy |
Participants/Materials | Japanese gut microbiome (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531) and public data (DRA006684) |
Targets | NGS (WGS) |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500/3000, NovaSeq 6000] |
Library Source | JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684 |
Cell Lines | - |
CRISPR spacer construction | MinCED was applied to the MAGs.
DDBJ Sequence Read Archive ID | DRA014186 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531); DRA014184 (DRA006684) |
Total Data Volume | DRA014184: 17.9 MB (fasta); DRA014186: 1.43 MB (fasta) |
Comments (Policies) | NBDC policy |
Participants/Materials | 88 Japanese individuals (shotgun sequencing); 73 healthy individuals (shotgun sequencing): DNA extracted by phenol-chloroform extraction, 73 samples; DNA extracted with the DNeasy PowerSoil Pro kit, 47 samples; 5 Japanese individuals (deep shotgun sequencing) |
Targets | Metagenome |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 3000, NovaSeq 6000] |
Library Source | DNA extracted from gut microbiome |
Cell Lines | - |
Library Construction (kit name) | KAPA Hyper Prep Kit |
Fragmentation Methods | Ultrasonic fragmentation (Covaris) |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 150 bp |
Japanese Genotype-phenotype Archive Dataset ID | JGAD000729 |
Total Data Volume | 2.6 TB (fastq) |
Comments (Policies) | NBDC policy |
Participants/Materials | BioBank Japan (n = 180,215), UK Biobank (n = 377,441), and a large-scale meta-analysis including the summary statistics of other cohorts (FinnGen, BCAC, and PRACTICAL) for breast and prostate cancer (n = 648,746 and 482,080, respectively). Patients: biliary tract (ICD10: C22.1, C23–C24), breast (ICD10: C509), cervical (ICD10: C53), colorectal (ICD10: C18–C20), endometrial (ICD10: C54), esophageal (ICD10: C15), gastric (ICD10: C16), hepatocellular (ICD10: C22.0), lung (ICD10: C34), non-Hodgkin's lymphoma (ICD10: C82–C83), ovarian (ICD10: C56), pancreatic (ICD10: C25), and prostate (ICD10: C61) cancer. Controls: individuals without cancer (patients overlap among disease categories) |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]; FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays]; BCAC: Illumina [iCOGS OncoArray]; PRACTICAL: Illumina [iCOGS OncoArray] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array; FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays; BCAC: Infinium OncoArray-500K v1.0 BeadChip Kit; PRACTICAL: Infinium OncoArray-500K v1.0 BeadChip Kit |
Genotype Call Methods (software) | BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4; FinnGen: Beagle 4.1; BCAC: IMPUTE2; PRACTICAL: IMPUTE2 |
Association Analysis (software) | SAIGE was used with age, sex, and the top five principal components as covariates. RE2C was used for the multi-trait meta-analysis, adjusting for sample overlap between the GWAS summary statistics. |
Filtering Methods | Sample QC and variant QC for each dataset: see the ReadMe file. We excluded variants with Rsq < 0.7 and MAF < 0.01. |
Marker Number (after QC) | BBJ: 13MN (7,398,798), each cancer (7,442,557 (7,420,485–7,444,681)); UK Biobank: 13MN (9,602,853), each cancer (9,620,786 (9,620,343–9,620,935)); BBJ + UK Biobank: 13MN (5,374,018), each cancer (5,696,155 (5,677,934–5,698,357)); BBJ + UK Biobank + FinnGen + BCAC (breast cancer): 5,104,756; BBJ + UK Biobank + FinnGen + PRACTICAL (prostate cancer): 5,105,796; BBJ + UK Biobank + FinnGen + BCAC + PRACTICAL (breast cancer + prostate cancer): 5,100,089. *Each cancer: mean (min–max) |
NBDC Dataset ID | hum0197.v16.gwas.v1 |
Total Data Volume | BBJ: 13MN (287 MB), each cancer (625 (605–633) MB); UK Biobank: 13MN (362 MB), each cancer (841 (814–859) MB); BBJ + UK Biobank: 13MN (202 MB), each cancer (260 (255–264) MB); BBJ + UK Biobank + FinnGen + BCAC (breast cancer): 242 MB; BBJ + UK Biobank + FinnGen + PRACTICAL (prostate cancer): 243 MB; BBJ + UK Biobank + FinnGen + BCAC + PRACTICAL (breast cancer + prostate cancer): 253 MB. *Each cancer: mean (min–max) |
Comments (Policies) | NBDC policy |
Participants/Materials | Hunner-type interstitial cystitis cases (ICD10: N301): 144; Control participants: 41,516 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) | GenomeStudio for genotyping, shapeit4 for haplotype phasing, and minimac4 for imputation |
Association Analysis (software) | SAIGE |
Filtering Methods | Sample QC: We excluded individuals with a low genotyping call rate (call rate < 98%) and included individuals of estimated Japanese ancestry based on PCA. Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻¹⁰, and (4) > 5% allele frequency difference from the imputation reference panel or the allele frequency panel of the Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%. |
Marker Number (after QC) | 7,909,790 variants (hg19) |
NBDC Dataset ID | hum0197.v17.hic-gwas.v1 |
Total Data Volume | 700 MB (txt) |
Comments (Policies) | NBDC policy |
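The pre- and post-imputation thresholds above can be expressed as two predicates. This sketch assumes a variant fails if any single criterion fails (after imputation, either Rsq < 0.7 or MAF < 0.5% is enough), and that frequencies from the imputation reference panel and the Tohoku Medical Megabank panel refer to the same allele as the study frequency; `VariantStats` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantStats:
    call_rate: float                      # fraction of samples with a genotype call
    minor_allele_count: int
    hwe_p: float                          # Hardy-Weinberg equilibrium P value
    af: float                             # allele frequency in this study
    ref_panel_af: Optional[float] = None  # same allele, imputation reference panel
    tommo_af: Optional[float] = None      # same allele, ToMMo frequency panel


def passes_pre_imputation_qc(v: VariantStats, min_call_rate: float = 0.99, min_mac: int = 5,
                             min_hwe_p: float = 1.0e-10, max_af_diff: float = 0.05) -> bool:
    if v.call_rate < min_call_rate or v.minor_allele_count < min_mac or v.hwe_p < min_hwe_p:
        return False
    for panel_af in (v.ref_panel_af, v.tommo_af):
        if panel_af is not None and abs(v.af - panel_af) > max_af_diff:
            return False  # > 5% frequency difference from a panel
    return True


def passes_post_imputation_qc(rsq: float, maf: float,
                              min_rsq: float = 0.7, min_maf: float = 0.005) -> bool:
    return rsq >= min_rsq and maf >= min_maf


print(passes_pre_imputation_qc(VariantStats(0.995, 40, 0.3, 0.21, 0.20, 0.22)))  # True
print(passes_post_imputation_qc(rsq=0.9, maf=0.004))                              # False
```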
Participants/Materials | 524 Japanese individuals (423 species in the gut microbiome); 306 Japanese individuals (306 plasma metabolites); 524 Japanese individuals (KEGG Gene Ortholog and KEGG Pathway) |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | SNP array: Illumina [Infinium Asian Screening Array]; Whole genome sequencing: Illumina [HiSeq X Ten]; Metagenome shotgun sequencing: Illumina [HiSeq 2500/3000, NovaSeq 6000] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | SNP array: Infinium Asian Screening Array; Whole genome sequencing: TruSeq DNA PCR-Free Library Preparation Kit; Metagenome shotgun sequencing: KAPA Hyper Prep Kit |
Genotype Call Methods (software) | SNP array: GenomeStudio (genotyping), shapeit4 (haplotype phasing), minimac4 (imputation); WGS: BWA-MEM v0.7.13 + GATK v3.8-0 |
Association Analysis (software) | PLINK2 |
Filtering Methods | SNP array data: Sample QC: We excluded individuals with a low genotyping call rate (call rate < 98%) and included individuals of estimated Asian ancestry based on PCA. Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻¹⁰, and (4) > 5% allele frequency difference from the imputation reference panel or the allele frequency panel of the Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 1%. WGS: We excluded variants with genotype call rate < 90%, ExcessHet > 60, or Hardy–Weinberg P < 1.0 × 10⁻¹⁰. After imputation with Beagle v5.1, we excluded imputed variants with minor allele frequency < 1%. |
Marker Number (after QC) | Gut microbiome/KEGG (SNP array): 7,213,470 variants (hg19); Metabolome (WGS): 6,840,258 variants (GRCh37) |
NBDC Dataset ID | hum0197.v18.gwas.v1 (gut microbiome, plasma metabolites, KEGG) |
Total Data Volume | Gut microbiome: 206 GB; Metabolome: 90.7 GB; KEGG: 300 MB |
Comments (Policies) | NBDC policy |
Participants/Materials | BioBank Japan: Type 2 diabetes (ICD10: E11): 27,642 cases, Control participants: 70,242; UK Biobank: Type 2 diabetes (ICD10: E11): 27,642 cases, Control participants: 70,242 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array |
Genotype Call Methods (software) | BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4 |
Association Analysis (software) | plink2 |
Filtering Methods | Variants with imputation quality Rsq < 0.3 or minor allele frequency (MAF) < 1% were excluded. The details are described below. |
Marker Number (after QC) | BBJ second cohort: 728,824 variants; ToMMo: 855,161 variants |
NBDC Dataset ID | hum0197.v19.prs.v1 |
Total Data Volume | 180 MB (txt) |
Comments (Policies) | NBDC policy |
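The released variant weights are meant to be combined into a score in the target cohort. A minimal sketch, assuming the usual additive form (score = Σ weight × effect-allele dosage) over variants that are present in the target data and pass the Rsq ≥ 0.3 and MAF ≥ 1% filters above. The dictionary layouts and function names are hypothetical, and strand-ambiguous SNPs are not handled.

```python
from typing import Dict, Tuple

# Hypothetical layouts:
#   weights:   variant_id -> (effect_allele, weight)
#   genotypes: variant_id -> (counted_allele, other_allele, dosage_of_counted, rsq, maf)


def passes_prs_qc(rsq: float, maf: float) -> bool:
    """Variant filter stated above: drop Rsq < 0.3 or MAF < 1%."""
    return rsq >= 0.3 and maf >= 0.01


def polygenic_score(weights: Dict[str, Tuple[str, float]],
                    genotypes: Dict[str, tuple]) -> Tuple[float, int]:
    """Return (score, number of variants used) for one individual."""
    score, n_used = 0.0, 0
    for vid, (effect_allele, w) in weights.items():
        g = genotypes.get(vid)
        if g is None:
            continue  # variant absent from the target cohort
        counted, other, dosage, rsq, maf = g
        if not passes_prs_qc(rsq, maf):
            continue
        if counted == effect_allele:
            score += w * dosage
        elif other == effect_allele:
            score += w * (2.0 - dosage)  # convert to effect-allele dosage
        else:
            continue  # alleles do not match: skip rather than guess
        n_used += 1
    return score, n_used


weights = {"rs1": ("A", 0.1), "rs2": ("G", -0.2), "rs3": ("T", 0.5)}
genotypes = {"rs1": ("A", "C", 2.0, 0.9, 0.2), "rs2": ("C", "G", 0.5, 0.8, 0.3)}
print(polygenic_score(weights, genotypes))  # score of about -0.1 from 2 variants
```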
Participants/Materials | Recurrent pregnancy loss cases (ICD10: N96): 1,728; Control participants: 24,315 |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [Infinium Asian Screening Array] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | Infinium Asian Screening Array |
Genotype Call Methods (software) | GenomeStudio (genotyping), shapeit4 (haplotype phasing), minimac4 (imputation) |
Association Analysis (software) | SAIGE |
Filtering Methods | Sample QC: We excluded individuals with a low genotyping call rate (call rate < 98%) and included individuals of estimated Japanese ancestry based on PCA. Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻¹⁰, and (4) > 5% allele frequency difference from the imputation reference panel or the allele frequency panel of the Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%. |
Marker Number (after QC) | 8,717,430 variants (hg19) |
NBDC Dataset ID | hum0197.v20.gwas.v1 |
Total Data Volume | 465 MB (txt) |
Comments (Policies) | NBDC policy |
Participants/Materials | Autoimmune diseases (ICD10: L400, M0690, M329, J840, G35): 2,238 cases; Control participants: 2,919 |
Targets | WGS |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq 6000/HiSeq X Ten] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Library Construction (kit name) | TruSeq DNA PCR-free Library Prep kit |
Fragmentation Methods | Ultrasonic fragmentation |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 150 bp x 2 |
Methods for removing host sequence/detecting viral sequence (software) | https://github.com/shohei-kojima/integrated_HHV6_recon, https://github.com/shohei-kojima/human_anellovirus_detection |
QC | We conducted principal component analysis (PCA) against HapMap3 data using SNP data of the same individuals to confirm the East Asian genetic background. |
Reference sequence for viral genome | Refer to the GitHub repositories of the software listed above. |
Japanese Genotype-phenotype Archive Dataset ID | JGAD000876 |
Total Data Volume | 181.3 GB (fastq) |
Comments (Policies) | NBDC policy |
Participants/Materials | Systemic lupus erythematosus (ICD10: M329): 8 cases (eHHV-6B-positive: 3 cases; eHHV-6B-negative: 5 cases) |
Targets | scRNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq 6000] |
Library Source | RNAs extracted from PBMC |
Cell Lines | - |
Library Construction (kit name) | Chromium Next GEM Single Cell 5’ Library & Gel Bead Kit v1.1, Chromium Next GEM Chip G Single Cell Kit, Single Index Kit T Set A |
Fragmentation Methods | Enzymatic fragmentation |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 91 bp |
NBDC Dataset ID | JGAD000876 |
Total Data Volume | 181.3 GB (fastq) |
Comments (Policies) | NBDC policy |
Participants/Materials |
Autoimmune diseases (ICD10: L400, M0690, M329, J840, G35): 238 cases (eHHV-6B-positive: 22 cases, eHHV-6B-negative: 216 cases) |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [NovaSeq 6000/HiSeq X Ten] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | TruSeq DNA PCR-free Library Prep kit |
Genotype Call Methods (software) | The FASTQ reads were aligned to T2T-CHM13v2.0 with BWA-MEM (v0.7.27), followed by GATK4 MarkDuplicates and Base Quality Score Recalibration (v4.2.6.1) according to the GATK Best Practices. We then performed per-sample SNP and indel calling with GATK4 HaplotypeCaller and joint genotyping with GATK4 GenomicsDBImport and GenotypeGVCFs. Low-confidence genotypes and missing sites in the WGS data were refined by LD-based genotype refinement using BEAGLE v5.4 with default settings. |
Association Analysis (software) | PLINK v2.0, with the top two principal components and sex as covariates. |
Filtering Methods |
Sample QC: We excluded individuals with conflicting sex assignments between sex inferred from variant genotypes and sex inferred from WGS coverage, an outlying heterozygosity rate (beyond ±3 standard deviations), or cryptic relatedness (PI_HAT > 0.2). We included samples of estimated Japanese ancestry based on PCA. Four cases were excluded. Variant QC: We excluded (1) non-autosomal variants, (2) multi-allelic sites and spanning deletions, and (3) variants with a P value for Hardy–Weinberg equilibrium < 1.0 × 10^−10 in cases and < 1.0 × 10^−6 in controls. |
Marker Number (after QC) | 6,464,509 SNPs |
NBDC Dataset ID |
Total Data Volume | 416 MB (tsv) |
Comments (Policies) | NBDC policy |
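Two of the sample QC rules for the WGS-based eHHV-6B dataset (heterozygosity outliers beyond ±3 SD and pairs with PI_HAT > 0.2) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the input structures and helper names are hypothetical, and dropping the second member of each related pair is an assumption, since the source does not say which individual was removed.

```python
from statistics import mean, stdev
from typing import Dict, List, Set, Tuple


def heterozygosity_outliers(het_rate: Dict[str, float], n_sd: float = 3.0) -> Set[str]:
    """Return sample IDs whose heterozygosity rate lies outside mean +/- n_sd * SD."""
    values = list(het_rate.values())
    mu, sd = mean(values), stdev(values)
    return {s for s, h in het_rate.items() if abs(h - mu) > n_sd * sd}


def related_to_remove(pairs: List[Tuple[str, str, float]], threshold: float = 0.2) -> Set[str]:
    """Greedily drop one sample from each pair with PI_HAT above the threshold.

    pairs: (sample_a, sample_b, pi_hat). Pairs already resolved by an earlier
    removal are skipped. Removing sample_b is an illustrative assumption.
    """
    removed: Set[str] = set()
    for a, b, pi_hat in pairs:
        if pi_hat > threshold and a not in removed and b not in removed:
            removed.add(b)
    return removed
```

A greedy pass keeps the example short; in practice one might prefer to remove the sample with lower coverage or call rate from each pair.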
Participants/Materials | The first cohort of Biobank Japan (n = 171,287) |
Targets | genome wide SNPs |
Target Loci for Capture Methods | - |
Platform | Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip] |
Library Source | DNAs extracted from peripheral blood cells |
Cell Lines | - |
Reagents (Kit, Version) | HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip |
Genotype Call Methods (software) | Eagle, Minimac3 |
Association Analysis (software) |
1) GCTA-fastGWA, adjusting for age, age^2, sex, the top 20 PCs, disease status for 45 diseases, geographic region, and PCA cluster. 2) Fixed-effect meta-analysis, using METAL, of the Mainland summary data (individuals from the Mainland and EA_admix clusters, n = 151,075) and the Ryukyu summary data (individuals from the Ryukyu, Ryukyu admix, and Hokkaido_sub clusters, n = 10,080). |
Filtering Methods |
Sample QC: We excluded (i) individuals with low call rates (< 99%) and (ii) closely related individuals with genetic relatedness ≥ 0.178, calculated from a genetic relationship matrix (GRM) with GCTA (version 1.93.3beta2). We included samples of estimated Japanese ancestry based on PCA. Variant QC: We excluded variants with (i) call rate < 99%, (ii) P value for Hardy–Weinberg equilibrium (HWE) < 1.0 × 10^−6, (iii) fewer than 5 heterozygotes, and (iv) a concordance rate < 99.5% or a non-reference concordance rate between the GWAS array and whole genome sequencing. Post-association QC: Double genomic control correction was applied using METAL. We computed a Z score for each variant from the sign of its beta coefficient and its P value, and retained variants with a positive Z score. |
Marker Number (after QC) | 3,454,970 SNPs |
NBDC Dataset ID |
Total Data Volume | 65 MB (txt) |
Comments (Policies) | NBDC policy |
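The association-stage steps for the Biobank Japan dataset above (fixed-effect meta-analysis of the Mainland and Ryukyu summary data, double genomic control correction, and the positive Z score filter) can be illustrated per variant. This Python sketch assumes inverse-variance weighting (METAL's `SCHEME STDERR`); the source does not state which METAL scheme was used, and all function names are hypothetical. Genomic control divides each 1-df χ² statistic by λ, the median observed χ² over its expected null median (about 0.4549); "double" correction typically applies this once to each input study and again to the meta-analysis output. Inflating standard errors by √λ is shown here as one way to apply it.

```python
import math
from statistics import NormalDist, median
from typing import List, Tuple

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def chi2_from_p(p: float) -> float:
    """1-df chi-square statistic for a two-sided P value (0 < p <= 1)."""
    # Use the lower tail (p / 2) to avoid rounding 1 - p/2 to 1.0 for tiny p.
    return NormalDist().inv_cdf(p / 2.0) ** 2


def p_from_chi2(chi2: float) -> float:
    """Two-sided P value for a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def gc_lambda(pvalues: List[float]) -> float:
    """Genomic inflation factor: median observed chi-square / expected median."""
    return median([chi2_from_p(p) for p in pvalues]) / CHI2_1DF_MEDIAN


def gc_correct_se(se: float, lam: float) -> float:
    """Inflate an SE so that (beta / se)^2 is divided by lambda (never deflate)."""
    return se * math.sqrt(max(lam, 1.0))


def ivw_fixed_effect(studies: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighted fixed-effect estimate from (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, studies)) / total
    return beta, math.sqrt(1.0 / total)


def signed_z(beta: float, p: float) -> float:
    """Z score with magnitude from the P value and sign from beta."""
    return math.copysign(math.sqrt(chi2_from_p(p)), beta)
```

Applied genome-wide, `gc_lambda` would be computed once per input study and once more on the meta-analysis P values, with `gc_correct_se` applied using each λ; a final `signed_z(beta, p) > 0` check mirrors the positive Z score filter.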
DATA PROVIDER
Principal Investigator: Yukinori Okada
Affiliation: Department of Statistical Genetics, Osaka University Graduate School of Medicine
Project / Group Name: -
Funds / Grants (Research Project Number):
Name | Title | Project Number |
---|---|---|
Precursory Research for Innovative Medical care (PRIME), Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Crosstalk among microbiome, host, disease, and drug discovery enhanced by statistical genetics | JP19gm6010001 |
FORCE, Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Elucidation of disease-specific microbiota and personalized medicine by metagenome-wide association studies | JP20gm4010006 |
Practical Research Project for Rare / Intractable Diseases, Japan Agency for Medical Research and Development (AMED) | Biology and in silico drug repositioning of pulmonary alveolar proteinosis using trans-layer omics analysis | JP20ek0109413 |
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Nucleic genome drug discovery for autoimmune diseases through in-silico and patient-oriented screening utilizing large-scale disease genetics | JP19ek0410041 |
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | JP21ek0410075 |
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Implementation of genomic prediction medicine based on statistical genetics | JP21km0405211 |
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Next-generation genomics analyses elucidates biology, personalized medicine, and drug discovery of psoriasis | JP21km0405217 |
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of disease biology and tissue specificity by trans-layer omics analysis and whole-genome sequencing | 19H01021 |
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of immune and allergic disease dynamics by integrative sequencing analysis | 22H00476 |
PUBLICATIONS
No. | Title | DOI | Dataset ID |
---|---|---|---|
1 | Metagenome-wide association study of gut microbiome revealed novel aetiology of rheumatoid arthritis in the Japanese population. | doi: 10.1136/annrheumdis-2019-215743 | JGAD000290 |
2 | Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis. | doi: 10.1038/s41467-021-21011-y | hum0197.v2.gwas.v1 |
3 | A metagenome-wide association study of gut microbiome in patients with multiple sclerosis revealed novel disease pathology. | doi: 10.3389/fcimb.2020.585973 | JGAD000363 |
4 | A global atlas of genetic associations of 220 deep phenotypes | doi: 10.1101/2020.10.23.20213652 | hum0197.v3.gwas.v1 |
5 | Metagenome-wide association study revealed disease-specific landscape of the gut microbiome of systemic lupus erythematosus in Japanese | doi: 10.1136/annrheumdis-2021-220687 | JGAD000427 |
6 | Whole gut virome analysis of 476 Japanese revealed a link between phage and autoimmune disease | doi: 10.1136/annrheumdis-2021-221267 | JGAD000532 |
7 | Insights from complex trait fine-mapping across diverse populations | doi: 10.1101/2021.09.03.21262975 | hum0197.v5.gwas.v1, hum0197.v5.finemap.v1 |
8 | Genetic architecture of microRNA expression and its link to complex diseases in the Japanese population. | doi: 10.1093/hmg/ddab361 | JGAD000621, hum0197.v6.eqtl.v1 |
9 | Multi-trait and cross-population genome-wide association studies across autoimmune and allergic diseases identify shared and distinct genetic components. | doi: 10.1136/annrheumdis-2022-222460 | hum0197.v10.gwas.v1 |
10 | DOCK2 is involved in the host genetics and biology of severe COVID-19 | doi: 10.1038/s41586-022-05163-5 | JGAD000662 |
11 | Prokaryotic and viral genomes recovered from 787 Japanese gut metagenomes revealed microbial features linked to diets, populations, and diseases | doi: 10.1016/j.xgen.2022.100219 | hum0197.v12 |
12 | Reconstruction of the personal information from human genome reads in gut metagenome sequencing data | doi: 10.1038/s41564-023-01381-3 | JGAD000729 |
13 | Pan-cancer and cross-population genome-wide association studies dissect shared genetic backgrounds underlying carcinogenesis | doi: 10.1038/s41467-023-39136-7 | hum0197.v16.gwas.v1 |
14 | Genome-wide association analysis identifies susceptibility loci within the major histocompatibility complex region for Hunner-type interstitial cystitis | doi: 10.1016/j.xcrm.2023.101114 | hum0197.v17.hic-gwas.v1 |
15 | Analysis of gut microbiome, host genetics, and plasma metabolites reveals gut microbiome-host interactions in the Japanese population | doi: 10.1016/j.celrep.2023.113324 | hum0197.v18.gwas.v1 |
16 | Body mass index stratification optimizes polygenic prediction of type 2 diabetes in cross-biobank analyses | doi: 10.1038/s41588-024-01782-y | hum0197.v19.prs.v1 |
17 | Common and rare genetic variants predisposing females to unexplained recurrent pregnancy loss | doi: 10.1038/s41467-024-49993-5 | hum0197.v20.gwas.v1 |
18 | Blood DNA virome associates with autoimmune diseases and COVID-19 | | hum0197.v21.gwas-ehhv6.v1, JGAD000876 |
19 | Genetic Legacy of Ancient Hunter-Gatherer Jomon in Japanese Populations | | hum0197.v21.gwas-jomon.v1 |
USERS (Controlled-access Data)
Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use |
---|---|---|---|---|---|
Ilana Brito | Meinig School of Biomedical Engineering, Cornell University | United States of America | Comparative metagenomics of lupus patients' microbiomes | JGAD000290, JGAD000363, JGAD000427, JGAD000532 | 2022/05/12-2024/05/04 |
Yongxin Li | Department of Chemistry, The University of Hong Kong | Hong Kong | Comparison of gut bacterial diversity and composition in MS/EAE | JGAD000363 | 2022/09/19-2024/07/01 |
Tina Fuchs | Institute for Clinical Chemistry, Medical Faculty Mannheim, Heidelberg University | Germany | Investigating the clonality of VIREM cells in COVID-19 patients | JGAD000662, JGAD000772 | 2024/02/26-2024/12/31 |
Koichi Matsuda | Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, The University of Tokyo | Japan | Disease Cohort Research Network for Disease Marker Exploratory Studies | JGAD000290, JGAD000363, JGAD000427, JGAD000532, JGAD000649, JGAD000650, JGAD000662, JGAD000722, JGAD000729 | 2024/06/17-2029/03/31 |