NBDC Research ID: hum0197.v21

 

SUMMARY

Aims: Elucidation of disease biology based on trans-omics analysis, GWAS in the Japanese and trans-ethnic populations, Elucidation of the mechanism of COVID-19 severity, Improving the performance of type 2 diabetes polygenic predictions, Elucidation of the genetic architecture of recurrent pregnancy loss, Elucidation of the association between Jomon component in the Japanese population and phenotypes and diseases

Methods: Metagenome shotgun sequencing, genome-wide association study (GWAS), small RNA-seq and eQTL analyses, whole genome sequencing (WGS)

Participants/Materials:

Metagenomic data of gut microbiome in the Japanese population (95 + 103 + 227 + 30 + 136 individuals)

Autoimmune pulmonary alveolar proteinosis cases: 198, Control participants: 395

Populations: Biobank Japan (n = 179,000), UK biobank (n = 361,000), and FinnGen (n = 136,000), Phenotypes: 220

141 Japanese individuals

Metagenomic data of gut microbiome in Inflammatory Bowel Disease (35 Ulcerative Colitis and 39 Crohn's disease) and 40 Healthy controls

Intracranial germ cell tumors cases: 133, Control participants: 762

Populations: Biobank Japan (n = 161,801) and UK biobank (n = 377,583), Phenotypes: 9

PBMC from Japanese population (COVID-19: n = 30 + 43, Healthy controls: n = 31 + 44)

Microbial genome: Metagenome-Assembled Genome (MAG), Viral genome, CRISPR spacers

Metagenomic data of gut microbiome in the Japanese population (88 + 5 individuals) and healthy individuals (n = 73)

BioBank Japan (n=180,215), UK Biobank (n=377,441), and large-scale meta-analysis including the summary statistics of other cohorts [FinnGen, Breast Cancer Association Consortium (BCAC), and Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL)] for breast and prostate cancer (n=648,746 and 482,080), Phenotypes: 15

Hunner-type interstitial cystitis cases: 144, Control participants: 41,516

524 Japanese individuals for gut microbiome–host genome association analysis, 362 Japanese individuals for plasma metabolite–host genome association analysis

The weights of variants existing in the target cohorts, Tohoku Medical Megabank and the second cohort of BBJ, calculated from GWAS results on 27,642 type 2 diabetes cases and 70,242 controls from BioBank Japan and UK Biobank

Recurrent pregnancy loss cases: 1,728, Control participants: 24,315

Autoimmune diseases cases: 2,238, Healthy controls: 2,919

The first cohort of BioBank Japan (n = 171,287)

 

Dataset ID / Type of Data / Criteria / Release Date
JGAS000205 Metagenome Controlled-access (Type I) 2019/11/15
hum0197.v2.gwas.v1 GWAS for autoimmune pulmonary alveolar proteinosis Unrestricted-access 2020/11/27
JGAS000260 Metagenome Controlled-access (Type I) 2020/11/27
hum0197.v3.gwas.v1 GWAS for 215 phenotypes Unrestricted-access 2021/03/22
JGAS000316 Metagenome Controlled-access (Type I) 2021/10/12
JGAS000415 Metagenome Controlled-access (Type I) 2021/12/10
hum0197.v5.gwas.v1 GWAS for 10 phenotypes Unrestricted-access 2021/12/21
hum0197.v5.finemap.v1 Fine-mapping for 79 phenotypes Unrestricted-access 2021/12/21
JGAS000504 Read count data of miRNA Controlled-access (Type I) 2022/02/08
hum0197.v6.eqtl.v1 eQTL data Unrestricted-access 2022/02/08
JGAS000530 Metagenome Controlled-access (Type I) 2022/05/23
JGAS000531 Metagenome Controlled-access (Type I) 2022/06/03
hum0197.v9.gwas.GCT.v1 GWAS for intracranial germ cell tumors Unrestricted-access 2022/06/10
hum0197.v10.gwas.v1 GWAS for 9 phenotypes Unrestricted-access 2022/06/16
JGAS000543 Raw sequencing data of single-cell RNA-seq Controlled-access (Type I) 2022/07/21
hum0197.v12 MAG, Viral genome and CRISPR spacers of Microbial genome Unrestricted-access 2022/12/01
JGAS000543 (data addition) clinical data Controlled-access (Type I) 2023/02/14
JGAS000593 Raw sequencing data of single-cell RNA-seq, clinical data Controlled-access (Type I) 2023/02/14
hum0197.v3.gwas.v1 (data addition) GWAS for 5 phenotypes Unrestricted-access 2023/02/16
JGAS000600 Metagenome Controlled-access (Type I) 2023/03/29
hum0197.v16.gwas.v1 GWAS for 15 phenotypes Unrestricted-access 2023/06/06
hum0197.v17.hic-gwas.v1 GWAS for Hunner-type interstitial cystitis Unrestricted-access 2023/06/27
hum0197.v18.gwas.v1 GWAS for gut microbiome; GWAS for plasma metabolite; GWAS for KEGG Gene Ortholog and KEGG Pathway Unrestricted-access 2023/10/02
hum0197.v19.prs.v1 The weights of variants calculated from GWAS results on type 2 diabetes Unrestricted-access 2024/05/29
hum0197.v20.gwas.v1 GWAS for recurrent pregnancy loss Unrestricted-access 2024/05/30
hum0197.v21.gwas-ehhv6.v1 GWAS for autoimmune diseases Unrestricted-access 2024/10/28
JGAS000741 The presence or absence of endogenous herpesvirus 6 and anellovirus load calculated from NGS (WGS) for autoimmune diseases; Raw sequencing data of single-cell RNA-seq Controlled-access (Type I) 2024/10/28
hum0197.v21.gwas-jomon.v1 GWAS for the individual Jomon proportions Unrestricted-access 2024/10/28

*Release Note

* Data users need to submit an application for Using NBDC Human Data to access the Controlled-access Data.

* When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

 

MOLECULAR DATA

JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531

Participants/Materials:

95+103+227+30+136 Japanese individuals

Inflammatory Bowel Disease

   35 Ulcerative Colitis

   39 Crohn's disease

40 Healthy controls

Targets Metagenome
Target Loci for Capture Methods -
Platform Illumina [HiSeq 3000, NovaSeq 6000]
Library Source DNA extracted from gut microbiome
Cell Lines -
Library Construction (kit name) KAPA Hyper Prep Kit
Fragmentation Methods Ultrasonic fragmentation (Covaris)
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 150 bp
Japanese Genotype-phenotype Archive Dataset ID

JGAD000290 (95 Japanese individuals)

JGAD000363 (103 Japanese individuals)

JGAD000427 (227 Japanese individuals)

JGAD000532 (30 Japanese individuals)

JGAD000649 (Inflammatory Bowel Disease)

JGAD000650 (136 Japanese individuals)

Total Data Volume

JGAD000290:477 GB(fastq)

JGAD000363:408 GB(fastq)

JGAD000427:881.2 GB(fastq)

JGAD000532:106.7 GB(fastq)

JGAD000649:374.6 GB (fastq)

JGAD000650:541.4 GB(fastq)

Comments (Policies) NBDC policy

 

hum0197.v2.gwas.v1

Participants/Materials

Autoimmune pulmonary alveolar proteinosis cases (ICD10: J840): 198

Control participants: 395

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array
Genotype Call Methods (software) GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation
Association Analysis (software) PLINK2
Filtering Methods

Sample QC: We excluded samples with low genotyping call rates (call rate < 98%) and in close genetic relation (PI_HAT > 0.175). We included samples of the estimated East Asian ancestry.

Variant QC: We excluded variants with (1) genotyping call rate < 98%, (2) P value for Hardy–Weinberg equilibrium < 1.0 × 10^−6, (3) minor allele count < 5, or (4) > 10% frequency difference with the imputation reference panel.

Marker Number (after QC) 12,153,232 autosomal variants and 242,876 X-chromosomal variants after QC.
NBDC Dataset ID

hum0197.v2.gwas.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 390MB for autosome (txt.gz) and 19MB for X chromosome (txt.gz)
Comments (Policies) NBDC policy
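
The variant QC criteria listed under Filtering Methods for hum0197.v2.gwas.v1 can be read as a per-variant predicate. The following is a minimal sketch only, not the pipeline used to produce the dataset. The field names (`call_rate`, `hwe_p`, `mac`, `af`, `ref_panel_af`) are hypothetical and are not columns of the released files, and the frequency criterion is assumed here to mean an absolute allele frequency difference greater than 0.10.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Per-variant summary statistics (field names are hypothetical)."""
    variant_id: str
    call_rate: float      # fraction of samples with a genotype call
    hwe_p: float          # Hardy-Weinberg equilibrium P value
    mac: int              # minor allele count
    af: float             # allele frequency in the study samples
    ref_panel_af: float   # allele frequency in the imputation reference panel


def passes_variant_qc(v: VariantStats) -> bool:
    """Return True if the variant survives all four exclusion criteria."""
    if v.call_rate < 0.98:                  # (1) low genotyping call rate
        return False
    if v.hwe_p < 1.0e-6:                    # (2) deviation from HWE
        return False
    if v.mac < 5:                           # (3) too few minor alleles
        return False
    if abs(v.af - v.ref_panel_af) > 0.10:   # (4) assumed: absolute AF difference
        return False
    return True


variants = [
    VariantStats("var_ok", 0.995, 0.40, 120, 0.21, 0.19),
    VariantStats("var_low_call", 0.90, 0.40, 120, 0.21, 0.19),
    VariantStats("var_hwe", 0.995, 1e-9, 120, 0.21, 0.19),
    VariantStats("var_af_diff", 0.995, 0.40, 120, 0.45, 0.19),
]
kept = [v.variant_id for v in variants if passes_variant_qc(v)]
print(kept)
```

The same shape applies to the other array-based datasets on this page by swapping in their thresholds.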

 

hum0197.v3.gwas.v1

Participants/Materials Biobank Japan (n = 179,000), UK biobank (n = 361,000), FinnGen (n = 136,000), no. Phenotypes: 220
Targets genome wide SNPs
Target Loci for Capture Methods -
Platform

BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]

UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]

FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays]

Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version)

BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip

UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array

FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays

Genotype Call Methods (software)

BBJ: Eagle, Minimac3

UK Biobank: IMPUTE4

FinnGen: beagle4.1

Association Analysis (software)

For binary traits, SAIGE software was used with age, age^2, sex, age×sex, age^2×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM or PLINK software was used with the same covariates.

 

Filtering Methods

BBJ: We included imputed variants with Rsq > 0.7.

UK Biobank: We excluded the variants with (i) INFO score ≤ 0.8, (ii) MAF ≤ 0.0001 (except for missense and protein-truncating variants annotated by VEP, which were excluded if MAF ≤ 1 × 10^−6), and (iii) P_HWE ≤ 1 × 10^−10.

FinnGen: We excluded variants with an imputation INFO score < 0.8 or MAF < 0.0001.

Marker Number (after QC)

BBJ: 13,530,797 variants

UK Biobank: 13,791,467 variants

FinnGen: 16,859,359 variants

NBDC Dataset ID

hum0197.v3.gwas.v1

(Click the Dataset ID to download the file)

Dictionary file (BBJ, EUR, META)

Total Data Volume

BBJ: ~1.5G for autosome and ~33M for chrX

UK Biobank: ~1.5G for autosome and ~15M for chrX

FinnGen: ~740M for autosome and ~20M for chrX

Comments (Policies) NBDC policy
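
The covariate set described for the association analysis of hum0197.v3.gwas.v1 (age, age squared, sex, the two age-by-sex interaction terms, and the top 20 principal components) amounts to 25 values per individual. The sketch below only illustrates how such a row could be assembled. The function name and the 0/1 coding of sex are assumptions, and it does not reproduce how SAIGE or BOLT-LMM read covariates.

```python
from typing import List, Sequence


def build_covariates(age: float, sex: int, pcs: Sequence[float]) -> List[float]:
    """Return [age, age^2, sex, age*sex, age^2*sex, PC1..PC20] for one individual.

    `sex` is assumed to be coded 0/1 (hypothetical coding).
    """
    if len(pcs) != 20:
        raise ValueError("expected the top 20 principal components")
    age_sq = age * age
    return [age, age_sq, float(sex), age * sex, age_sq * sex, *pcs]


row = build_covariates(age=50.0, sex=1, pcs=[0.0] * 20)
print(len(row), row[:5])
```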

 

hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1

Participants/Materials Biobank Japan (n = 179,000), no. Phenotypes: 79
Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip
Genotype Call Methods (software) Eagle, Minimac3
Association Analysis (software)

GWAS: For binary traits, SAIGE software was used with age, age^2, sex, age×sex, age^2×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM was used with the same covariates.

Fine-mapping: FINEMAP and SuSiE were used with GWAS summary statistics and in-sample dosage LD, allowing up to 10 causal variants per region.

Filtering Methods

GWAS: We included imputed variants with Rsq > 0.7. For binary traits, variants with MAC < 10 were additionally excluded.

Fine-mapping: We defined fine-mapping regions based on a 3 Mb window around each lead variant and merged regions if they overlapped. We excluded the major histocompatibility complex (MHC) region (chr 6: 25–36 Mb) from analysis due to extensive LD structure in the region. For each method, we only included variants from successfully fine-mapped regions while excluding those from failed regions (e.g., due to convergence failure or available memory restrictions).

Marker Number (after QC) 13,531,752 variants (ref: hg19)
NBDC Dataset ID

hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 14 GB
Comments (Policies) NBDC policy
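
The region definition described for the fine-mapping dataset (a 3 Mb window around each lead variant, merging of overlapping windows, and exclusion of the MHC region) can be sketched as interval merging. This is an illustration under stated assumptions, not the authors' code: the window is assumed to be centered on the lead variant (±1.5 Mb), any merged region overlapping chr6:25–36 Mb is assumed to be dropped entirely, and `define_regions` is a hypothetical name.

```python
from typing import Dict, List, Tuple

HALF_WINDOW = 1_500_000                  # assumed: 3 Mb window = lead +/- 1.5 Mb
MHC = ("6", 25_000_000, 36_000_000)      # chr6: 25-36 Mb, excluded from analysis


def define_regions(leads: List[Tuple[str, int]]) -> Dict[str, List[Tuple[int, int]]]:
    """Build windows around lead variants, merge overlaps, drop MHC regions."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, pos in leads:
        window = (max(0, pos - HALF_WINDOW), pos + HALF_WINDOW)
        by_chrom.setdefault(chrom, []).append(window)

    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, windows in by_chrom.items():
        out: List[Tuple[int, int]] = []
        for start, end in sorted(windows):
            if out and start <= out[-1][1]:
                # Overlaps the previous region: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        if chrom == MHC[0]:
            # Assumed handling: discard any region that overlaps the MHC interval.
            out = [(s, e) for s, e in out if e < MHC[1] or s > MHC[2]]
        if out:
            merged[chrom] = out
    return merged


leads = [("1", 10_000_000), ("1", 12_000_000), ("1", 50_000_000), ("6", 30_000_000)]
regions = define_regions(leads)
print(regions)
```

In this toy input the two nearby chromosome 1 leads collapse into a single region, and the chromosome 6 lead is removed because its window lies inside the MHC interval.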

 

JGAS000504

Participants/Materials: 141 Japanese individuals
Targets small RNA-seq
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500]
Library Source RNAs extracted from PBMC
Cell Lines -
Library Construction (kit name) SMARTer smRNA-Seq Kit
Fragmentation Methods -
Spot Type Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods bowtie (GRCh37)
Detecting method for read count (software) featureCounts + miRBase v22
QC We performed adapter trimming using Cutadapt v1.8 and removed reads with a low quality score (Phred quality score < 20 in >20% of total bases) using fastp v0.20.0. Also, we removed reads with a length of >29 bp or <15 bp, which are not expected to be mature miRNAs. Mature miRNAs detected with ≥1 read in at least half of the individuals were included in the dataset.
miRNA number 343
Japanese Genotype-phenotype Archive Dataset ID JGAD000621
Total Data Volume 54.7 KB (txt)
Comments (Policies) NBDC policy
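
Two of the QC rules described for JGAS000504 are simple enough to express directly: the read-length filter (reads longer than 29 bp or shorter than 15 bp are removed) and the inclusion rule for mature miRNAs (at least 1 read in at least half of the individuals). The sketch below uses hypothetical function names and toy counts. It does not cover the adapter trimming or quality filtering done with Cutadapt and fastp.

```python
from typing import Dict, List


def keep_read(length: int) -> bool:
    """Keep reads of 15-29 bp; reads >29 bp or <15 bp are removed."""
    return 15 <= length <= 29


def detected_mirnas(counts: Dict[str, List[int]]) -> List[str]:
    """Return miRNAs with >=1 read in at least half of the individuals."""
    kept = []
    for mirna, per_individual in counts.items():
        n_detected = sum(1 for c in per_individual if c >= 1)
        if 2 * n_detected >= len(per_individual):
            kept.append(mirna)
    return kept


# Toy read counts per individual (miRNA names are placeholders).
counts = {
    "miR-A": [5, 0, 3, 1],   # detected in 3 of 4 individuals
    "miR-B": [0, 0, 2, 0],   # detected in 1 of 4 individuals
    "miR-C": [1, 1, 0, 0],   # detected in exactly half
}
kept_lengths = [n for n in (14, 15, 22, 29, 30) if keep_read(n)]
kept_mirnas = detected_mirnas(counts)
print(kept_lengths, kept_mirnas)
```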

 

hum0197.v6.eqtl.v1

Participants/Materials 141 Japanese individuals
Targets eQTL
Target Loci for Capture Methods -
Platform

small RNA-seq: Illumina [HiSeq 2500]

WGS: Illumina [HiSeq X Ten]

Library Source read count data of JGAS000504 and whole genome sequencing data using genomic DNA extracted from whole blood
Cell Lines -
Reagents (Kit, Version)

small RNA-seq: See JGAS000504

WGS: TruSeq DNA PCR-Free Library Preparation Kit

Genotype Call / Detecting read count Methods (software)

See JGAS000504 for read count data.

WGS: Sequenced reads were aligned against the reference human genome with the decoy sequence (GRCh37, human_g1k_v37_decoy) using BWA-MEM v0.7.13.

QC

See JGAS000504 for read count data.

WGS: We removed variants (i) with low genotyping call rates (< 0.90), (ii) with ExcessHet > 60, or (iii) with Hardy–Weinberg P value < 1.0 × 10^−10. Genotype refinement was performed using Beagle v5.1.

Marker Number (after QC)

See JGAS000504 for read count data.

WGS: 12,171,854 variants

eQTL algorithm We analyzed the association between genetic variants with minor allele frequency (MAF) ≥ 0.01 within a cis-window around each miRNA (±1 Mb of the mature miRNA) and normalized expression values using MatrixEQTL v2.3.
NBDC Dataset ID

hum0197.v6.eqtl.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 1.1 MB (txt)
Comments (Policies) NBDC policy
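
The variant selection described in the eQTL algorithm row (MAF ≥ 0.01 and within ±1 Mb of the mature miRNA) can be sketched as a filter applied before association testing. The code below is a hedged illustration only. The `Variant` type and `cis_variants` function are hypothetical, the window is assumed to extend 1 Mb from the start and end coordinates of the mature miRNA with inclusive boundaries, and the association step performed with MatrixEQTL is not reproduced.

```python
from typing import List, NamedTuple

CIS_WINDOW = 1_000_000   # +/- 1 Mb around the mature miRNA
MIN_MAF = 0.01           # minor allele frequency threshold


class Variant(NamedTuple):
    chrom: str
    pos: int
    maf: float


def cis_variants(variants: List[Variant], chrom: str,
                 mirna_start: int, mirna_end: int) -> List[Variant]:
    """Select variants on the same chromosome, inside the cis-window, with MAF >= 0.01."""
    lo = mirna_start - CIS_WINDOW
    hi = mirna_end + CIS_WINDOW
    return [
        v for v in variants
        if v.chrom == chrom and v.maf >= MIN_MAF and lo <= v.pos <= hi
    ]


variants = [
    Variant("1", 4_100_000, 0.20),   # inside window, common
    Variant("1", 3_900_000, 0.20),   # outside window
    Variant("1", 5_500_000, 0.005),  # inside window, too rare
    Variant("2", 5_000_000, 0.30),   # different chromosome
    Variant("1", 6_000_022, 0.01),   # on the window boundary, MAF at threshold
]
selected = cis_variants(variants, "1", 5_000_000, 5_000_022)
print([v.pos for v in selected])
```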

 

hum0197.v9.gwas.GCT.v1

Participants/Materials

Intracranial germ cell tumors cases (ICD10: C719): 133

Control participants: 762

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array
Genotype Call Methods (software)

GenomeStudio for genotyping

shapeit2 for haplotype phasing

minimac3 for imputation

Association Analysis (software) PLINK2
Filtering Methods

Sample QC:

We excluded individuals (i) with genotyping call rate < 0.97, (ii) in close kinship (PI_HAT > 0.17), or (iii) estimated to be of non-East Asian ancestry.

Variant QC:

We excluded variants with (i) genotyping call rate < 0.99, (ii) minor allele count < 5, (iii) P value for Hardy–Weinberg equilibrium < 1.0 × 10^−5 in controls, and (iv) > 10% allele frequency difference with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project.

Post-imputation QC:

We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%.

Marker Number (after QC)

7,803,874 autosomal variants

181,867 X-chromosomal variants

NBDC Dataset ID

hum0197.v9.gwas.GCT.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 248 MB (txt)
Comments (Policies) NBDC policy

 

hum0197.v10.gwas.v1

Participants/Materials

Biobank Japan (n=161,801), UK biobank (n=377,583), no. Phenotypes: 9

   Patients: Autoimmune [Rheumatoid arthritis (ICD10: M05), Graves' disease (ICD10: E050), type I diabetes mellitus (ICD10: E10)]

                  Allergy [asthma (ICD10: J45), Atopic dermatitis (ICD10: L20), Pollinosis (ICD10: J301)]

   Controls: non-autoimmune + non-allergy individuals

(There is overlap among patients in each disease category)

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform

BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]

UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]

Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version)

BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip

UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array

Genotype Call Methods (software)

BBJ: Eagle, Minimac3

UK Biobank: IMPUTE4

Association Analysis (software)

SAIGE software was used with age, sex, and top five principal components as covariates.

RE2C software was used for the multi-trait meta-analysis adjusting for sample overlap between GWAS summary data.

Filtering Methods

We excluded the variants with Rsq < 0.7 and MAF < 0.005.

Marker Number (after QC)

BBJ: 8,374,220 autosomal variants for individual trait / 8,369,174 autosomal variants for meta-analysis

UKB: 10,864,380 autosomal variants for individual trait / 10,858,065 autosomal variants for meta-analysis

BBJ + UK Biobank: 5,965,154 autosomal variants for meta-analysis

NBDC Dataset ID

hum0197.v10.gwas.v1

(Click the Dataset ID to download the files)

Dictionary file

Total Data Volume

BBJ: ~ 760MB for individual trait / ~ 430MB for multi-trait meta-analysis

UK Biobank: ~ 1.1GB for individual trait / ~ 550MB for multi-trait meta-analysis

BBJ+UK Biobank: ~ 310MB for multi-trait meta-analysis

Comments (Policies) NBDC policy

 

JGAS000543 / JGAS000593

Participants/Materials

COVID-19 (ICD10: U071) : 30 + 43 cases

Healthy controls : 31 + 44 individuals

Targets scRNA-seq
Target Loci for Capture Methods -
Platform Illumina [NovaSeq 6000]
Library Source RNAs extracted from PBMC
Cell Lines -
Library Construction (kit name) Chromium Next GEM Single Cell 5’ Library & Gel Bead Kit v1.1, Chromium Next GEM Chip G Single Cell Kit, Single Index Kit T Set A
Fragmentation Methods Enzymatic fragmentation
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 91 bp
NBDC Dataset ID

JGAD000662

JGAD000722

Total Data Volume 1.3 + 2.0 TB (fastq, xlsx [clinical data])
Comments (Policies) NBDC policy

 

hum0197.v12.MAG.v1

Participants/Materials:

Japanese gut microbiome

   JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, Public data (DRA006684)

Targets Metagenome
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500/3000, NovaSeq 6000]
Library Source JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684
Cell Lines -
MAG methods De novo assembly with metaSPAdes was performed. Then, binning with DAS Tool (MetaBAT2, MaxBin2, CONCOCT) was applied.
DDBJ Sequence Read Archive ID

JGA MAG: 20220531NSUB000031HIGH_JGA_JMAG_GENOME_*.acclist.txt

DRA014186 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531)

DRA014188 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531)

DRA014191 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531)

DRA014192 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531)

TPA MAG: EMNX01000001-EMNX01000025, EMNY01000001-EMNY01000068, EMNZ01000001-EMNZ01000149, EMOA01000001-EMOA01000067

DRA014184 (DRA006684)

Total Data Volume

JGA MAG: 153 GB (fasta)

DRA014186: 11.5 GB (fasta)

DRA014188: 11.9 GB (fasta)

DRA014191: 12.2 GB (fasta)

DRA014192: 5.75 GB (fasta)

TPA MAG: 11.9 MB (fasta)

DRA014184: 3.65 GB (fasta)

Comments (Policies) NBDC policy

 

hum0197.v12.VIRUS.v1

Participants/Materials:

Japanese gut microbiome

   JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, Public data (DRA006684)

Targets NGS (WGS)
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500/3000, NovaSeq 6000]
Library Source JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684
Cell Lines -
Virus genome construction De novo assembly with metaSPAdes was performed. Then, viral contigs were detected with VirFinder and VirSorter.
DDBJ Sequence Read Archive ID

JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531: BRDB01000001-BRDB01028816

DRA006684: EMNW01000001-EMNW01002579

Total Data Volume

JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531: 1.09 GB (fasta)

DRA006684: 98.3 MB (fasta)

Comments (Policies) NBDC policy

 

hum0197.v12.CRISPR.v1

Participants/Materials:

Japanese gut microbiome

   JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, Public data (DRA006684)

Targets NGS (WGS)
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500/3000, NovaSeq 6000]
Library Source JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684
Cell Lines -
CRISPR construction MinCED was applied to the MAGs.
DDBJ Sequence Read Archive ID

DRA014186 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531)

DRA014184 (DRA006684)

Total Data Volume

DRA014184: 17.9 MB (fasta)

DRA014186: 1.43 MB (fasta)

Comments (Policies) NBDC policy

 

JGAS000600

Participants/Materials:

88 Japanese individuals (shotgun sequencing)

73 healthy individuals (shotgun sequencing)

   - DNA extraction was performed with phenol-chloroform extraction: 73 samples

   - DNA extraction with DNeasy PowerSoil Pro kit: 47 samples

5 Japanese individuals (deep shotgun sequencing)

Targets Metagenome
Target Loci for Capture Methods -
Platform Illumina [HiSeq 3000, NovaSeq 6000]
Library Source DNA extracted from gut microbiome
Cell Lines -
Library Construction (kit name) KAPA Hyper Prep Kit
Fragmentation Methods Ultrasonic fragmentation (Covaris)
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 150 bp
Japanese Genotype-phenotype Archive Dataset ID JGAD000729
Total Data Volume 2.6 TB(fastq)
Comments (Policies) NBDC policy

 

hum0197.v16.gwas.v1

Participants/Materials

BioBank Japan (n=180,215), UK Biobank (n=377,441)

large-scale meta-analysis including the summary statistics of other cohorts (FinnGen, BCAC, and PRACTICAL) for breast and prostate cancer (n=648,746 and 482,080)

no. Phenotypes: 15

      Patients: biliary tract (ICD10: C22.1, 23-24), breast (ICD10: C509), cervical (ICD10: C53), colorectal (ICD10: C18-20), endometrial (ICD10: C54), esophageal (ICD10: C15), gastric (ICD10: C16), hepatocellular (ICD10: C22.0), lung (ICD10: C34), non-Hodgkin's lymphoma (ICD10: C82-83), ovarian (ICD10: C56), pancreatic (ICD10: C25), and prostate (ICD10: C61) cancer

      Controls: individuals without cancer

(There is overlap among patients in each disease category)

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform

BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]

UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]

FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays]

BCAC: Illumina [iCOGS OncoArray]

PRACTICAL: Illumina [iCOGS OncoArray]

Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version)

BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip

UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array

FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays

BCAC: Infinium OncoArray-500K v1.0 BeadChip Kit

PRACTICAL: Infinium OncoArray-500K v1.0 BeadChip Kit

Genotype Call Methods (software)

BBJ: Eagle, Minimac3

UK Biobank: IMPUTE4

FinnGen: beagle4.1

BCAC: IMPUTE2

PRACTICAL: IMPUTE2

Association Analysis (software)

SAIGE software was used with age, sex, and top five principal components as covariates.

RE2C software was used for the multi-trait meta-analysis adjusting for sample overlap between GWAS summary data.

Filtering Methods

Sample QC and Variant QC for each dataset: refer to ReadMe file

We excluded the variants with Rsq < 0.7 and MAF < 0.01.

Marker Number (after QC)

BBJ: 13MN (7,398,798), each cancer (7,442,557 (7,420,485-7,444,681))

UK Biobank: 13MN (9,602,853), each cancer (9,620,786 (9,620,343-9,620,935))

BBJ + UK Biobank: 13MN (5,374,018), each cancer (5,696,155 (5,677,934-5,698,357))

BBJ + UK Biobank + FinnGen + BCAC (breast cancer): 5,104,756

BBJ + UK Biobank + FinnGen + PRACTICAL (prostate cancer): 5,105,796

BBJ + UK Biobank + FinnGen + BCAC + PRACTICAL (breast cancer + prostate cancer): 5,100,089

   *mean (min-max) for each cancer

NBDC Dataset ID

hum0197.v16.gwas.v1

(Click the Dataset ID to download the files)

Dictionary file

Total Data Volume

BBJ: 13MN (287 MB), each cancer (625 (605-633) MB)

UK Biobank: 13MN (362 MB), each cancer (841 (814-859) MB)

BBJ + UK Biobank: 13MN (202 MB), each cancer (260 (255-264) MB)

BBJ + UK Biobank + FinnGen + BCAC (breast cancer): 242 MB

BBJ + UK Biobank + FinnGen + PRACTICAL (prostate cancer): 243 MB

BBJ + UK Biobank + FinnGen + BCAC + PRACTICAL (breast cancer + prostate cancer): 253 MB

   *mean (min-max) for each cancer

Comments (Policies) NBDC policy

 

hum0197.v17.hic-gwas.v1

Participants/Materials

Hunner-type interstitial cystitis cases (ICD10: N301): 144

Control participants: 41,516

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array
Genotype Call Methods (software)

GenomeStudio for genotyping

shapeit4 for haplotype phasing

minimac4 for imputation

Association Analysis (software) SAIGE
Filtering Methods

Sample QC: We excluded individuals with low genotyping call rates (call rate < 98%). We included individuals of the estimated Japanese ancestry using PCA.

Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P-value for Hardy–Weinberg equilibrium < 1.0 × 10^−10, and (4) > 5% allele frequency difference compared with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project.

Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%.

Marker Number (after QC) 7,909,790 variants (hg19)
NBDC Dataset ID

hum0197.v17.hic-gwas.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 700 MB (txt)
Comments (Policies) NBDC policy

 

hum0197.v18.gwas.v1

Participants/Materials

524 Japanese individuals (423 species in the gut microbiome)

306 Japanese individuals (306 plasma metabolites)

524 Japanese individuals (KEGG Gene Ortholog and KEGG Pathway)

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform

SNP array: Illumina [Infinium Asian Screening Array]

Whole genome sequencing: Illumina [HiSeq X Ten]

Metagenome shotgun sequencing: Illumina [HiSeq 2500/3000, NovaSeq 6000]

Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version)

SNP array: Infinium Asian Screening Array

Whole genome sequencing: TruSeq DNA PCR-Free Library Preparation Kit

Metagenome shotgun sequencing: KAPA Hyper Prep Kit

Genotype Call Methods (software)

SNP array:

      Genotyping: GenomeStudio

     Haplotype phasing: shapeit4

     Imputation: minimac4

WGS:

     BWA-MEM v0.7.13 + GATK v3.8-0

Association Analysis (software) PLINK2
Filtering Methods

SNP array data:

     Sample QC: We excluded individuals with low genotyping call rates (call rate < 98%). We included individuals of the estimated Asian ancestry using PCA.

      Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P-value for Hardy–Weinberg equilibrium < 1.0 × 10^−10, and (4) > 5% allele frequency difference compared with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project.

     Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 1%.

WGS:

     We excluded variants with genotype call rate < 90%, ExcessHet > 60, or Hardy–Weinberg P < 1.0 × 10^−10.

     After imputation with Beagle v5.1, we excluded imputed variants with minor allele frequency < 1%.

Marker Number (after QC)

Gut microbiome/KEGG (SNP array): 7,213,470 variants (hg19)

Metabolome (WGS): 6,840,258 variants (GRCh37)

NBDC Dataset ID

hum0197.v18.gwas.v1 (Gut microbiome, Plasma metabolites, KEGG)

(Click the link above to download the files)

Dictionary file

Total Data Volume

Gut microbiome: 206 GB

Metabolome: 90.7 GB

KEGG: 300 MB

Comments (Policies) NBDC policy

 

hum0197.v19.prs.v1

Participants/Materials

BioBank Japan

    Type 2 diabetes (ICD10: E11): 27,642 cases

    Control participants: 70,242

UK Biobank

    Type 2 diabetes (ICD10: E11): 27,642 cases

    Control participants: 70,242

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform

BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]

UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]

Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version)

BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip

UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array

Genotype Call Methods (software)

BBJ: Eagle, Minimac3

UK Biobank: IMPUTE4

Association Analysis (software)

plink2

Filtering Methods

Variants with imputation quality of Rsq < 0.3 or minor allele frequency (MAF) < 1% were excluded.

The details are described at the following link:

https://doi.org/10.1038/s41588-024-01782-y

Marker Number (after QC)

BBJ second cohort: 728,824 variants

ToMMo: 855,161 variants

NBDC Dataset ID

hum0197.v19.prs.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 180 MB (txt)
Comments (Policies) NBDC policy
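
Per-variant weights such as those released in hum0197.v19.prs.v1 are typically applied by summing weight × effect-allele dosage over variants for each individual. The sketch below shows that arithmetic only. The variant IDs, weights, and dosages are made up, the function name is hypothetical, skipping variants absent from the target data is an assumption, and the actual column layout of the released file is given in its dictionary file, not here.

```python
from typing import Dict


def polygenic_score(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Sum weight x effect-allele dosage over variants present in both inputs."""
    return sum(w * dosages[vid] for vid, w in weights.items() if vid in dosages)


# Toy values for illustration; not taken from the released weights.
weights = {"var1": 0.5, "var2": -0.25, "var3": 0.125}
dosages = {"var1": 2.0, "var2": 1.0}   # var3 is absent in this individual
score = polygenic_score(weights, dosages)
print(score)
```

Dosages must be counted for the same allele that the weight refers to. Allele alignment between the weight file and the target genotypes is not handled in this sketch.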

 

hum0197.v20.gwas.v1

Participants/Materials

Recurrent pregnancy loss cases (ICD10: N96): 1,728

Control participants: 24,315

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array
Genotype Call Methods (software)

Genotyping: GenomeStudio

Haplotype phasing: shapeit4

Imputation: minimac4

Association Analysis (software) SAIGE
Filtering Methods

Sample QC: We excluded individuals with low genotyping call rates (call rate < 98%). We included individuals of the estimated Japanese ancestry using PCA.

Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P-value for Hardy–Weinberg equilibrium < 1.0 × 10^−10, and (4) > 5% allele frequency difference compared with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project.

Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%.

Marker Number (after QC) 8,717,430 variants (hg19)
NBDC Dataset ID

hum0197.v20.gwas.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 465 MB (txt)
Comments (Policies) NBDC policy

 

JGAS000741 (WGS)

Participants/Materials

Autoimmune diseases (ICD10: L400, M0690, M329, J840, G35): 2,238 cases

Control participants: 2,919

Targets WGS
Target Loci for Capture Methods -
Platform Illumina [NovaSeq 6000/HiSeq X Ten]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Library Construction (kit name) TruSeq DNA PCR-free Library Prep kit
Fragmentation Methods Ultrasonic fragmentation
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 150 bp x 2
Methods for removing host sequence/detecting viral sequence (software)

https://github.com/shohei-kojima/integrated_HHV6_recon

https://github.com/shohei-kojima/human_anellovirus_detection

QC We conducted principal component analysis (PCA) against HapMap3 data using SNP data of the same individuals to confirm the East Asian genetic background.
Reference sequence for viral genome Refer to each software's GitHub repository.
Japanese Genotype-phenotype Archive Dataset ID JGAD000876
Total Data Volume 181.3 GB (fastq)
Comments (Policies) NBDC policy

 

JGAS000741 (scRNA-seq)

Participants/Materials

Systemic lupus erythematosus (ICD10: M329): 8 cases

   eHHV-6B-positive: 3 cases

   eHHV-6B-negative: 5 cases

Targets scRNA-seq
Target Loci for Capture Methods -
Platform Illumina [NovaSeq 6000]
Library Source RNAs extracted from PBMC
Cell Lines -
Library Construction (kit name) Chromium Next GEM Single Cell 5’ Library & Gel Bead Kit v1.1, Chromium Next GEM Chip G Single Cell Kit, Single Index Kit T Set A
Fragmentation Methods Enzymatic fragmentation
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 91 bp
NBDC Dataset ID JGAD000876
Total Data Volume 181.3 GB (fastq)
Comments (Policies) NBDC policy

 

hum0197.v21.gwas-ehhv6.v1

Participants/Materials

Autoimmune diseases (ICD10: L400, M0690, M329, J840, G35): 238 cases

   eHHV-6B-positive: 22 cases

   eHHV-6B-negative: 216 cases

Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [NovaSeq 6000/HiSeq X Ten]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) TruSeq DNA PCR-free Library Prep kit
Genotype Call Methods (software) The FASTQ reads were aligned to T2T-CHM13v2.0 with BWA-MEM (v0.7.27), followed by GATK4 MarkDuplicates and Base Quality Score Recalibration (v4.2.6.1) according to the GATK Best Practices. We then performed per-sample SNP and indel calling using GATK4 HaplotypeCaller, and joint genotyping using GATK4 GenomicsDBImport and GenotypeGVCFs. We conducted LD-based genotype refinement for low-confidence genotypes and missing sites in the WGS data using BEAGLE v5.4 with default settings.
Association Analysis (software) PLINK v2.0 software was used with top two principal components and sex as covariates.
Filtering Methods

Sample QC: Individuals were excluded if they showed conflicting sex assignments between sex genetically inferred from variants and sex inferred from WGS coverage, a heterozygosity rate deviating by more than ±3 standard deviations, or cryptic relatedness (pi-hat > 0.2). We included samples of the estimated Japanese ancestry using PCA. Four cases were excluded.

Variant QC: We excluded (1) non-autosomal variants, (2) multi-allelic sites and spanning deletions, and (3) variants with P-value for Hardy–Weinberg equilibrium < 1e-10 in cases and < 1e-6 in controls.

Marker Number (after QC) 6,464,509 SNPs
NBDC Dataset ID

hum0197.v21.gwas-ehhv6.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 416 MB (tsv)
Comments (Policies) NBDC policy

 

hum0197.v21.gwas-jomon.v1

Participants/Materials The first cohort of Biobank Japan (n = 171,287)
Targets genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]
Library Source DNAs extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip
Genotype Call Methods (software) Eagle, Minimac3
Association Analysis (software)

1) GCTA-fastGWA, adjusting for the following covariates: age, age^2, sex, the top 20 PCs, status of 45 diseases, geographic regions, and PCA clusters.

2) Fixed-effect meta-analysis of Mainland summary data including individuals from the Mainland and EA_admix clusters (n = 151,075) and of Ryukyu summary data including individuals from the Ryukyu, Ryukyu admix, and Hokkaido_sub clusters (n = 10,080) using METAL.
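
For a single variant, a fixed-effect meta-analysis of two sets of summary statistics (here, Mainland and Ryukyu) can be sketched as below. This is a hedged illustration, not the configuration used for this dataset: METAL offers both a sample-size-based and a standard-error-based (inverse-variance) scheme, and this record does not state which one was used. The sketch assumes the inverse-variance scheme, and the function name `fixed_effect_meta` is hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of one variant.

    betas: per-cohort effect estimates
    ses:   per-cohort standard errors (same order as betas)
    Returns (pooled beta, pooled standard error, Z score).
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se
```

With two cohorts of equal precision the pooled effect is the plain average of the two estimates, and the pooled standard error shrinks by a factor of the square root of two.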

Filtering Methods

Sample QC: We excluded (i) individuals with low call rates (< 99%) and (ii) closely related individuals with genetic relatedness ≥ 0.178, calculated from a genetic relationship matrix (GRM) by GCTA (version 1.93.3beta2). We included samples of the estimated Japanese ancestry using PCA.

Variant QC: We excluded variants with (i) call rate < 99%, (ii) P value for Hardy-Weinberg equilibrium (HWE) < 1.0 × 10^−6, (iii) number of heterozygotes < 5, and (iv) a concordance rate < 99.5% or a non-reference concordance rate between GWAS array and whole genome sequencing.

After association test: Double genomic control correction was applied using METAL. We computed a Z score for each variant from the sign of the beta coefficient and the associated P-value, and retained the variants with a positive Z score.
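
The Z-score step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: it assumes a two-sided P-value converted with the standard normal quantile function, and the names `signed_z` and `keep_positive` are hypothetical.

```python
from statistics import NormalDist


def signed_z(beta: float, p_value: float) -> float:
    """Convert a two-sided P-value to a Z score signed by the effect direction."""
    # inv_cdf(p / 2) is the lower-tail quantile (<= 0); negate it to get |Z|.
    magnitude = -NormalDist().inv_cdf(p_value / 2.0)
    return magnitude if beta >= 0 else -magnitude


def keep_positive(rows):
    """Return the IDs of variants whose signed Z score is positive.

    rows: iterable of (variant_id, beta, p_value) tuples.
    """
    return [vid for vid, beta, p in rows if signed_z(beta, p) > 0]
```

Using the lower tail (`p / 2`) rather than `1 - p / 2` avoids losing precision for very small P-values.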

Marker Number (after QC) 3,454,970 SNPs
NBDC Dataset ID

hum0197.v21.gwas-jomon.v1

(Click the Dataset ID to download the file)

Dictionary file

Total Data Volume 65 MB (txt)
Comments (Policies) NBDC policy

 

DATA PROVIDER

Principal Investigator: Yukinori Okada

Affiliation: Department of Statistical Genetics, Osaka University Graduate School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Precursory Research for Innovative Medical care (PRIME), Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Crosstalk among microbiome, host, disease, and drug discovery enhanced by statistical genetics | JP19gm6010001
FORCE, Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED) | Elucidation of disease-specific microbiota and personalized medicine by metagenome-wide association studies | JP20gm4010006
Practical Research Project for Rare / Intractable Diseases, Japan Agency for Medical Research and Development (AMED) | Biology and in silico drug repositioning of pulmonary alveolar proteinosis using trans-layer omics analysis | JP20ek0109413
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Nucleic genome drug discovery for autoimmune diseases through in-silico and patient-oriented screening utilizing large-scale disease genetics | JP19ek0410041
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED) | Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources | JP21ek0410075
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Implementation of genomic prediction medicine based on statistical genetics | JP21km0405211
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Next-generation genomics analyses elucidates biology, personalized medicine, and drug discovery of psoriasis | JP21km0405217
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of disease biology and tissue specificity by trans-layer omics analysis and whole-genome sequencing | 19H01021
KAKENHI Grant-in-Aid for Scientific Research (A) | Elucidation of immune and allergic disease dynamics by integrative sequencing analysis | 22H00476

 

PUBLICATIONS

No. | Title | DOI | Dataset ID
1 | Metagenome-wide association study of gut microbiome revealed novel aetiology of rheumatoid arthritis in the Japanese population. | doi: 10.1136/annrheumdis-2019-215743 | JGAD000290
2 | Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis. | doi: 10.1038/s41467-021-21011-y | hum0197.v2.gwas.v1
3 | A metagenome-wide association study of gut microbiome in patients with multiple sclerosis revealed novel disease pathology. | doi: 10.3389/fcimb.2020.585973 | JGAD000363
4 | A global atlas of genetic associations of 220 deep phenotypes | doi: 10.1101/2020.10.23.20213652 | hum0197.v3.gwas.v1
5 | Metagenome-wide association study revealed disease-specific landscape of the gut microbiome of systemic lupus erythematosus in Japanese | doi: 10.1136/annrheumdis-2021-220687 | JGAD000427
6 | Whole gut virome analysis of 476 Japanese revealed a link between phage and autoimmune disease | doi: 10.1136/annrheumdis-2021-221267 | JGAD000532
7 | Insights from complex trait fine-mapping across diverse populations | doi: 10.1101/2021.09.03.21262975 | hum0197.v5.gwas.v1, hum0197.v5.finemap.v1
8 | Genetic architecture of microRNA expression and its link to complex diseases in the Japanese population. | doi: 10.1093/hmg/ddab361 | JGAD000621, hum0197.v6.eqtl.v1
9 | Multi-trait and cross-population genome-wide association studies across autoimmune and allergic diseases identify shared and distinct genetic components. | doi: 10.1136/annrheumdis-2022-222460 | hum0197.v10.gwas.v1
10 | DOCK2 is involved in the host genetics and biology of severe COVID-19 | doi: 10.1038/s41586-022-05163-5 | JGAD000662
11 | Prokaryotic and viral genomes recovered from 787 Japanese gut metagenomes revealed microbial features linked to diets, populations, and diseases | doi: 10.1016/j.xgen.2022.100219 | hum0197.v12
12 | Reconstruction of the personal information from human genome reads in gut metagenome sequencing data | doi: 10.1038/s41564-023-01381-3 | JGAD000729
13 | Pan-cancer and cross-population genome-wide association studies dissect shared genetic backgrounds underlying carcinogenesis | doi: 10.1038/s41467-023-39136-7 | hum0197.v16.gwas.v1
14 | Genome-wide association analysis identifies susceptibility loci within the major histocompatibility complex region for Hunner-type interstitial cystitis | doi: 10.1016/j.xcrm.2023.101114 | hum0197.v17.hic-gwas.v1
15 | Analysis of gut microbiome, host genetics, and plasma metabolites reveals gut microbiome-host interactions in the Japanese population | doi: 10.1016/j.celrep.2023.113324 | hum0197.v18.gwas.v1
16 | Body mass index stratification optimizes polygenic prediction of type 2 diabetes in cross-biobank analyses | doi: 10.1038/s41588-024-01782-y | hum0197.v19.prs.v1
17 | Common and rare genetic variants predisposing females to unexplained recurrent pregnancy loss | doi: 10.1038/s41467-024-49993-5 | hum0197.v20.gwas.v1
18 | Blood DNA virome associates with autoimmune diseases and COVID-19 | - | hum0197.v21.gwas-ehhv6.v1, JGAD000876
19 | Genetic Legacy of Ancient Hunter-Gatherer Jomon in Japanese Populations | - | hum0197.v21.gwas-jomon.v1

 

USERS (Controlled-access Data)

Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use
Ilana Brito | Meinig School of Biomedical Engineering, Cornell University | United States of America | Comparative metagenomics of lupus patients' microbiomes | JGAD000290, JGAD000363, JGAD000427, JGAD000532 | 2022/05/12-2024/05/04
Yongxin Li | Department of Chemistry, The University of Hong Kong | Hong Kong | Comparison of gut bacterial diversity and composition in MS/EAE | JGAD000363 | 2022/09/19-2024/07/01
Tina Fuchs | Institute for Clinical Chemistry, Medical Faculty Mannheim, Heidelberg University | Germany | Investigating the clonality of VIREM cells in COVID-19 patients | JGAD000662, JGAD000772 | 2024/02/26-2024/12/31
Koichi Matsuda | Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, The University of Tokyo | Japan | Disease Cohort Research Network for Disease Marker Exploratory Studies | JGAD000290, JGAD000363, JGAD000427, JGAD000532, JGAD000649, JGAD000650, JGAD000662, JGAD000722, JGAD000729 | 2024/06/17-2029/03/31