NBDC Research ID: hum0197.v21

SUMMARY

Aims: Elucidation of disease biology based on trans-omics analysis, GWAS in the Japanese and trans-ethnic populations, Elucidation of the mechanism of COVID-19 severity, Improving the performance of type 2 diabetes polygenic predictions, Elucidation of the genetic architecture of recurrent pregnancy loss, Elucidation of the association between Jomon component in the Japanese population and phenotypes and diseases

Methods: Metagenome shotgun sequencing, genome-wide association study (GWAS), small RNA-seq and eQTL analyses, whole genome sequencing (WGS)

Participants/Materials:

Metagenomic data of gut microbiome in the Japanese population (95 + 103 + 227 + 30 + 136 individuals)

Autoimmune pulmonary alveolar proteinosis cases: 198, Control participants: 395

Populations: Biobank Japan (n = 179,000), UK biobank (n = 361,000), and FinnGen (n = 136,000), Phenotypes: 220

141 Japanese individuals

Metagenomic data of gut microbiome in Inflammatory Bowel Disease (35 Ulcerative Colitis and 39 Crohn's disease) and 40 Healthy controls

Intracranial germ cell tumors cases: 133, Control participants: 762

Populations: Biobank Japan (n = 161,801) and UK biobank (n = 377,583), Phenotypes: 9

PBMC from Japanese population (COVID-19: n = 30 + 43, Healthy controls: n = 31 + 44)

Microbial genome: Metagenome-Assembled Genome (MAG), Viral genome, CRISPR spacers

Metagenomic data of gut microbiome in the Japanese population (88 + 5 individuals) and healthy individuals (n = 73)

BioBank Japan (n=180,215), UK Biobank (n=377,441), and large-scale meta-analysis including the summary statistics of other cohorts [FinnGen, Breast Cancer Association Consortium (BCAC), and Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL)] for breast and prostate cancer (n=648,746 and 482,080), Phenotypes: 15

Hunner-type interstitial cystitis cases: 144, Control participants: 41,516

524 Japanese individuals for gut microbiome–host genome association analysis, 362 Japanese individuals for plasma metabolite–host genome association analysis

The weights of variants existing in the target cohorts, Tohoku Medical Megabank and the second cohort of BBJ, calculated from GWAS results on 27,642 type 2 diabetes cases and 70,242 controls from BioBank Japan and UK Biobank

Recurrent pregnancy loss cases: 1,728, Control participants: 24,315

Autoimmune diseases cases: 2,238, Healthy controls: 2,919

The first cohort of BioBank Japan (n = 171,287)

Dataset ID	Type of Data	Criteria	Release Date
JGAS000205	Metagenome	Controlled-access (Type I)	2019/11/15
hum0197.v2.gwas.v1	GWAS for autoimmune pulmonary alveolar proteinosis	Unrestricted-access	2020/11/27
JGAS000260	Metagenome	Controlled-access (Type I)	2020/11/27
hum0197.v3.gwas.v1	GWAS for 215 phenotypes	Unrestricted-access	2021/03/22
JGAS000316	Metagenome	Controlled-access (Type I)	2021/10/12
JGAS000415	Metagenome	Controlled-access (Type I)	2021/12/10
hum0197.v5.gwas.v1	GWAS for 10 phenotypes	Unrestricted-access	2021/12/21
hum0197.v5.finemap.v1	Fine-mapping for 79 phenotypes	Unrestricted-access	2021/12/21
JGAS000504	Read count data of miRNA	Controlled-access (Type I)	2022/02/08
hum0197.v6.eqtl.v1	eQTL data	Unrestricted-access	2022/02/08
JGAS000530	Metagenome	Controlled-access (Type I)	2022/05/23
JGAS000531	Metagenome	Controlled-access (Type I)	2022/06/03
hum0197.v9.gwas.GCT.v1	GWAS for intracranial germ cell tumors	Unrestricted-access	2022/06/10
hum0197.v10.gwas.v1	GWAS for 9 phenotypes	Unrestricted-access	2022/06/16
JGAS000543	Raw sequencing data of single-cell RNA-seq	Controlled-access (Type I)	2022/07/21
hum0197.v12	MAG, Viral genome and CRISPR spacers of Microbial genome	Unrestricted-access	2022/12/01
JGAS000543 (data addition)	clinical data	Controlled-access (Type I)	2023/02/14
JGAS000593	Raw sequencing data of single-cell RNA-seq, clinical data	Controlled-access (Type I)	2023/02/14
hum0197.v3.gwas.v1 (data addition)	GWAS for 5 phenotypes	Unrestricted-access	2023/02/16
JGAS000600	Metagenome	Controlled-access (Type I)	2023/03/29
hum0197.v16.gwas.v1	GWAS for 15 phenotypes	Unrestricted-access	2023/06/06
hum0197.v17.hic-gwas.v1	GWAS for Hunner-type interstitial cystitis	Unrestricted-access	2023/06/27
hum0197.v18.gwas.v1	GWAS for gut microbiome, GWAS for plasma metabolite, GWAS for KEGG Gene Ortholog and KEGG Pathway	Unrestricted-access	2023/10/02
hum0197.v19.prs.v1	The weights of variants calculated from GWAS results on type 2 diabetes	Unrestricted-access	2024/05/29
hum0197.v20.gwas.v1	GWAS for recurrent pregnancy loss	Unrestricted-access	2024/05/30
hum0197.v21.gwas-ehhv6.v1	GWAS for autoimmune diseases	Unrestricted-access	2024/10/28
JGAS000741	The presence or absence of endogenous herpesvirus 6 and anellovirus load calculated from NGS (WGS) for autoimmune diseases; Raw sequencing data of single-cell RNA-seq	Controlled-access (Type I)	2024/10/28
hum0197.v21.gwas-jomon.v1	GWAS for the individual Jomon proportions	Unrestricted-access	2024/10/28

*Release Note

* Data users need to submit an application for using NBDC Human Data to access the controlled-access data.

* When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

MOLECULAR DATA

JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531


Participants/Materials:	95 + 103 + 227 + 30 + 136 Japanese individuals; Inflammatory Bowel Disease: 35 Ulcerative Colitis, 39 Crohn's disease, 40 Healthy controls
Targets	Metagenome
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 3000, NovaSeq 6000]
Library Source	DNA extracted from gut microbiome
Cell Lines	-
Library Construction (kit name)	KAPA Hyper Prep Kit
Fragmentation Methods	Ultrasonic fragmentation (Covaris)
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	150 bp
Japanese Genotype-phenotype Archive Dataset ID	JGAD000290 (95 Japanese individuals); JGAD000363 (103 Japanese individuals); JGAD000427 (227 Japanese individuals); JGAD000532 (30 Japanese individuals); JGAD000649 (Inflammatory Bowel Disease); JGAD000650 (136 Japanese individuals)
Total Data Volume	JGAD000290: 477 GB (fastq); JGAD000363: 408 GB (fastq); JGAD000427: 881.2 GB (fastq); JGAD000532: 106.7 GB (fastq); JGAD000649: 374.6 GB (fastq); JGAD000650: 541.4 GB (fastq)
Comments (Policies)	NBDC policy

hum0197.v2.gwas.v1


Participants/Materials	Autoimmune pulmonary alveolar proteinosis cases (ICD10: J840): 198, Control participants: 395
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Infinium Asian Screening Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array
Genotype Call Methods (software)	GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation
Association Analysis (software)	PLINK2
Filtering Methods	Sample QC: We excluded samples with low genotyping call rates (call rate < 98%) and samples in close genetic relation (PI_HAT > 0.175). We included samples of estimated East Asian ancestry. Variant QC: We excluded variants with (1) genotyping call rate < 98%, (2) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁶, (3) minor allele count < 5, or (4) > 10% frequency difference with the imputation reference panel.
Marker Number (after QC)	12,153,232 autosomal variants and 242,876 X-chromosomal variants after QC.
NBDC Dataset ID	hum0197.v2.gwas.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	390MB for autosome (txt.gz) and 19MB for X chromosome (txt.gz)
Comments (Policies)	NBDC policy
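
The variant QC thresholds in the Filtering Methods row above can be read as a simple per-variant predicate. The following is a minimal sketch in Python, not the pipeline used to build the dataset: the `VariantStats` record and its field names are hypothetical, and the "> 10% frequency difference" criterion is assumed here to mean an absolute allele-frequency difference greater than 0.10.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant summary used only for this illustration."""
    variant_id: str
    call_rate: float           # fraction of samples with a genotype call
    hwe_p: float               # Hardy-Weinberg equilibrium P value
    minor_allele_count: int
    allele_freq: float         # allele frequency in the study samples
    ref_panel_freq: float      # frequency of the same allele in the reference panel


def passes_variant_qc(v: VariantStats) -> bool:
    """Return True when the variant survives all four exclusion criteria."""
    if v.call_rate < 0.98:                 # (1) genotyping call rate < 98%
        return False
    if v.hwe_p < 1.0e-6:                   # (2) HWE P value < 1.0e-6
        return False
    if v.minor_allele_count < 5:           # (3) minor allele count < 5
        return False
    # (4) assumed to be an absolute frequency difference above 0.10
    if abs(v.allele_freq - v.ref_panel_freq) > 0.10:
        return False
    return True


variants = [
    VariantStats("var_ok", 0.995, 0.40, 120, 0.30, 0.28),
    VariantStats("var_low_call", 0.95, 0.40, 120, 0.30, 0.28),
    VariantStats("var_hwe", 0.995, 1.0e-8, 120, 0.30, 0.28),
    VariantStats("var_rare", 0.995, 0.40, 3, 0.01, 0.01),
    VariantStats("var_freq_shift", 0.995, 0.40, 120, 0.30, 0.15),
]
kept = [v.variant_id for v in variants if passes_variant_qc(v)]
```

Only `var_ok` is kept in this toy input; each of the other records trips exactly one of the four criteria.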

hum0197.v3.gwas.v1


Participants/Materials	Biobank Japan (n = 179,000), UK biobank (n = 361,000), FinnGen (n = 136,000), no. Phenotypes: 220
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]; FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array; FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays
Genotype Call Methods (software)	BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4; FinnGen: beagle4.1
Association Analysis (software)	For binary traits, SAIGE software was used with age, age², sex, age×sex, age²×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM or plink software was used with the same covariates.
Filtering Methods	BBJ: We included imputed variants with Rsq > 0.7. UK Biobank: We excluded the variants with (i) INFO score ≤ 0.8, (ii) MAF ≤ 0.0001 (except for missense and protein-truncating variants annotated by VEP, which were excluded if MAF ≤ 1 × 10⁻⁶), and (iii) P value for Hardy–Weinberg equilibrium (PHWE) ≤ 1 × 10⁻¹⁰. FinnGen: We excluded variants with an imputation INFO score < 0.8 or MAF < 0.0001.
Marker Number (after QC)	BBJ: 13,530,797 variants; UK Biobank: 13,791,467 variants; FinnGen: 16,859,359 variants
NBDC Dataset ID	hum0197.v3.gwas.v1 (Click the Dataset ID to download the file) Dictionary file (BBJ, EUR, META)
Total Data Volume	BBJ: ~1.5 GB for autosome and ~33 MB for chrX; UK Biobank: ~1.5 GB for autosome and ~15 MB for chrX; FinnGen: ~740 MB for autosome and ~20 MB for chrX
Comments (Policies)	NBDC policy
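
To make the covariate set in the Association Analysis row concrete (age, squared age, sex, the two age-by-sex interaction terms, and the top 20 principal components), the sketch below builds one covariate row per individual. It is an illustration only: the function name `covariate_row` and the 0/1 coding of sex are assumptions, and the actual association tests were run with the software named in the table.

```python
def covariate_row(age: float, sex: int, pcs: list[float]) -> list[float]:
    """Build [age, age^2, sex, age*sex, age^2*sex, PC1..PCn] for one individual.

    sex is assumed to be coded 0/1 for this illustration.
    """
    age_sq = age * age
    return [age, age_sq, float(sex), age * sex, age_sq * sex, *pcs]


# One hypothetical individual with 20 principal-component scores.
example_pcs = [0.01 * i for i in range(20)]
row = covariate_row(age=50.0, sex=1, pcs=example_pcs)
n_covariates = len(row)  # 5 age/sex terms + 20 principal components
```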

hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1


Participants/Materials	Biobank Japan (n = 179,000), no. Phenotypes: 79
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip
Genotype Call Methods (software)	Eagle, Minimac3
Association Analysis (software)	GWAS: For binary traits, SAIGE software was used with age, age², sex, age×sex, age²×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM was used with the same covariates. Fine-mapping: FINEMAP and SuSiE were used with GWAS summary statistics and in-sample dosage LD, allowing up to 10 causal variants per region.
Filtering Methods	GWAS: We included imputed variants with Rsq > 0.7. For binary traits, variants with MAC < 10 were additionally excluded. Fine-mapping: We defined fine-mapping regions based on a 3 Mb window around each lead variant and merged regions if they overlapped. We excluded the major histocompatibility complex (MHC) region (chr 6: 25–36 Mb) from analysis due to extensive LD structure in the region. For each method, we only included variants from successfully fine-mapped regions while excluding those from failed regions (e.g., due to conversion failure or available memory restrictions).
Marker Number (after QC)	13,531,752 variants (ref: hg19)
NBDC Dataset ID	hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	14 GB
Comments (Policies)	NBDC policy
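
The region definition in the Filtering Methods row (a 3 Mb window around each lead variant, merging of overlapping windows, and exclusion of the MHC region on chr 6: 25–36 Mb) can be sketched as an interval merge. This is a minimal illustration under stated assumptions, not the authors' code: the window is assumed to be centred on the lead variant (±1.5 Mb), and the MHC exclusion is assumed to drop any merged region that overlaps chr 6: 25–36 Mb. The function name and the lead positions are hypothetical.

```python
WINDOW_BP = 3_000_000                       # total window size around each lead variant
MHC = ("6", 25_000_000, 36_000_000)         # excluded region, chr 6: 25-36 Mb


def fine_mapping_regions(leads):
    """leads: iterable of (chromosome, position) tuples for lead variants.

    Returns merged (chromosome, start, end) regions, with MHC-overlapping regions removed.
    """
    half = WINDOW_BP // 2
    windows = sorted((chrom, max(0, pos - half), pos + half) for chrom, pos in leads)

    merged = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))

    def overlaps_mhc(region):
        chrom, start, end = region
        return chrom == MHC[0] and start < MHC[2] and end > MHC[1]

    return [r for r in merged if not overlaps_mhc(r)]


leads = [("1", 10_000_000), ("1", 12_000_000), ("1", 20_000_000), ("6", 30_000_000)]
regions = fine_mapping_regions(leads)
```

In this toy input the first two leads on chromosome 1 are merged into one region, the third stays separate, and the lead on chromosome 6 is dropped because its window falls inside the MHC.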

JGAS000504


Participants/Materials:	141 Japanese individuals
Targets	small RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	RNAs extracted from PBMC
Cell Lines	-
Library Construction (kit name)	SMARTer smRNA-Seq Kit
Fragmentation Methods	-
Spot Type	Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	100 bp
Mapping Methods	bowtie (GRCh37)
Detecting method for read count (software)	featureCounts + miRbase v22
QC	We performed adapter trimming using Cutadapt v1.8 and removed reads with a low quality score (Phred quality score < 20 in >20% of total bases) using fastp v0.20.0. Also, we removed reads with a length of >29 bp or <15 bp, which are not expected to be mature miRNAs. Mature miRNAs detected with ≥1 read in at least half of the individuals were included in the dataset.
miRNA number	343
Japanese Genotype-phenotype Archive Dataset ID	JGAD000621
Total Data Volume	54.7 KB (txt)
Comments (Policies)	NBDC policy
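
The two count-based rules in the QC row above (discard reads longer than 29 bp or shorter than 15 bp, and keep mature miRNAs detected with ≥1 read in at least half of the individuals) are simple enough to express directly. The sketch below is a hedged illustration only; the function names and the toy miRNA labels are hypothetical, and adapter trimming and quality filtering were done with the tools named in the table.

```python
def keep_read(length_bp: int) -> bool:
    """Keep reads of 15-29 bp; longer or shorter reads are removed."""
    return 15 <= length_bp <= 29


def detected_mirnas(counts: dict[str, list[int]]) -> list[str]:
    """Return miRNAs with >= 1 read in at least half of the individuals.

    counts maps a miRNA name to its per-individual read counts.
    """
    kept = []
    for name, per_individual in counts.items():
        n_detected = sum(1 for c in per_individual if c >= 1)
        if 2 * n_detected >= len(per_individual):
            kept.append(name)
    return kept


# Toy read counts for four individuals (hypothetical miRNA labels).
toy_counts = {
    "mirna_a": [3, 0, 1, 0],   # detected in 2 of 4 -> kept
    "mirna_b": [0, 0, 0, 5],   # detected in 1 of 4 -> dropped
    "mirna_c": [1, 1, 1, 1],   # detected in 4 of 4 -> kept
}
kept_mirnas = detected_mirnas(toy_counts)
```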

hum0197.v6.eqtl.v1


Participants/Materials	141 Japanese individuals
Targets	eQTL
Target Loci for Capture Methods	-
Platform	small RNA-seq: Illumina [HiSeq 2500] WGS: Illumina [HiSeq X Ten]
Library Source	read count data of JGAS000504 and whole genome sequencing data using genomic DNA extracted from whole blood
Cell Lines	-
Reagents (Kit, Version)	small RNA-seq: See JGAS000504 WGS: TruSeq DNA PCR-Free Library Preparation Kit
Genotype Call / Detecting read count Methods (software)	See JGAS000504 for read count data. WGS: Sequenced reads were aligned against the reference human genome with the decoy sequence (GRCh37, human_g1k_v37_decoy) using BWA-MEM v0.7.13.
QC	See JGAS000504 for read count data. WGS: We removed the variants (i) with low genotyping call rates (< 0.90), (ii) with ExcessHet > 60, or (iii) with Hardy–Weinberg P value < 1.0 × 10⁻¹⁰. Genotype refinement was performed using Beagle v5.1.
Marker Number (after QC)	See JGAS000504 for read count data. WGS: 12,171,854 variants
eQTL algorithm	We analyzed the association between genetic variants with minor allele frequency (MAF) ≥ 0.01 within a cis-window around each miRNA (±1 Mb of the mature miRNA) and normalized expression values using MatrixEQTL v2.3.
NBDC Dataset ID	hum0197.v6.eqtl.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	1.1 MB (txt)
Comments (Policies)	NBDC policy
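
The variant selection described in the eQTL algorithm row (MAF ≥ 0.01 and a cis-window of ±1 Mb around the mature miRNA) can be sketched as follows. This is an illustration under assumptions, not the MatrixEQTL workflow itself: the window is assumed to extend 1 Mb upstream of the mature miRNA start and 1 Mb downstream of its end, and the function name, tuple layouts, and variant identifiers are hypothetical.

```python
CIS_WINDOW_BP = 1_000_000   # +/- 1 Mb around the mature miRNA
MIN_MAF = 0.01


def cis_variants(mirna, variants):
    """Select variant IDs tested against one miRNA.

    mirna:    (chromosome, start, end) of the mature miRNA
    variants: iterable of (variant_id, chromosome, position, maf)
    """
    chrom, start, end = mirna
    lo, hi = start - CIS_WINDOW_BP, end + CIS_WINDOW_BP
    return [
        vid
        for vid, vchrom, pos, maf in variants
        if vchrom == chrom and lo <= pos <= hi and maf >= MIN_MAF
    ]


mirna = ("1", 5_000_000, 5_000_022)
variants = [
    ("v_inside", "1", 4_200_000, 0.12),
    ("v_rare", "1", 5_100_000, 0.004),     # inside the window but MAF < 0.01
    ("v_far", "1", 7_500_000, 0.30),       # more than 1 Mb away
    ("v_other_chr", "2", 5_000_010, 0.25),
]
selected = cis_variants(mirna, variants)
```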

hum0197.v9.gwas.GCT.v1


Participants/Materials	Intracranial germ cell tumors cases (ICD10: C719): 133, Control participants: 762
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Infinium Asian Screening Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array
Genotype Call Methods (software)	GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation
Association Analysis (software)	PLINK2
Filtering Methods	Sample QC: We excluded individuals (i) with genotyping call rate < 0.97, (ii) in close kinship (PI_HAT > 0.17), or (iii) of estimated non-East Asian ancestry. Variant QC: We excluded variants with (i) genotyping call rate < 0.99, (ii) minor allele count < 5, (iii) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁵ in controls, or (iv) > 10% allele frequency difference with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%.
Marker Number (after QC)	7,803,874 autosomal variants, 181,867 X-chromosomal variants
NBDC Dataset ID	hum0197.v9.gwas.GCT.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	248 MB (txt)
Comments (Policies)	NBDC policy

hum0197.v10.gwas.v1


Participants/Materials	Biobank Japan (n=161,801), UK biobank (n=377,583), no. Phenotypes: 9; Patients: Autoimmune [Rheumatoid arthritis (ICD10: M05), Graves' disease (ICD10: E050), type I diabetes mellitus (ICD10: E10)], Allergy [asthma (ICD10: J45), Atopic dermatitis (ICD10: L20), Pollinosis (ICD10: J301)]; Controls: non-autoimmune + non-allergy individuals (There is overlap among patients in each disease category)
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array
Genotype Call Methods (software)	BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4
Association Analysis (software)	SAIGE software was used with age, sex, and top five principal components as covariates. RE2C software was used for the multi-trait meta-analysis adjusting for sample overlap between GWAS summary data.
Filtering Methods	We excluded the variants with Rsq < 0.7 and MAF < 0.005.
Marker Number (after QC)	BBJ: 8,374,220 autosomal variants for individual trait / 8,369,174 autosomal variants for meta-analysis; UKB: 10,864,380 autosomal variants for individual trait / 10,858,065 autosomal variants for meta-analysis; BBJ + UK Biobank: 5,965,154 autosomal variants for meta-analysis
NBDC Dataset ID	hum0197.v10.gwas.v1 (Click the Dataset ID to download the files) Dictionary file
Total Data Volume	BBJ: ~760 MB for individual trait / ~430 MB for multi-trait meta-analysis; UK Biobank: ~1.1 GB for individual trait / ~550 MB for multi-trait meta-analysis; BBJ + UK Biobank: ~310 MB for multi-trait meta-analysis
Comments (Policies)	NBDC policy

JGAS000543 / JGAS000593


Participants/Materials	COVID-19 (ICD10: U071): 30 + 43 cases, Healthy controls: 31 + 44 individuals
Targets	scRNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [NovaSeq 6000]
Library Source	RNAs extracted from PBMC
Cell Lines	-
Library Construction (kit name)	Chromium Next GEM Single Cell 5’ Library & Gel Bead Kit v1.1, Chromium Next GEM Chip G Single Cell Kit, Single Index Kit T Set A
Fragmentation Methods	Enzymatic fragmentation
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	91 bp
NBDC Dataset ID	JGAD000662 JGAD000722
Total Data Volume	1.3 + 2.0 TB (fastq, xlsx [clinical data])
Comments (Policies)	NBDC policy

hum0197.v12.MAG.v1


Participants/Materials:	Japanese gut microbiome JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, Public data (DRA006684)
Targets	Metagenome
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500/3000, NovaSeq 6000]
Library Source	JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684
Cell Lines	-
MAG methods	De novo assembly with metaspades was performed. Then, binning with dastools (metabat2, maxbin2, concoct) was applied.
DDBJ Sequence Read Archive ID	JGA MAG: 20220531NSUB000031HIGH_JGA_JMAG_GENOME_*.acclist.txt; DRA014186 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531); DRA014188 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531); DRA014191 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531); DRA014192 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531); TPA MAG: EMNX01000001-EMNX01000025, EMNY01000001-EMNY01000068, EMNZ01000001-EMNZ01000149, EMOA01000001-EMOA01000067; DRA014184 (DRA006684)
Total Data Volume	JGA MAG: 153 GB (fasta); DRA014186: 11.5 GB (fasta); DRA014188: 11.9 GB (fasta); DRA014191: 12.2 GB (fasta); DRA014192: 5.75 GB (fasta); TPA MAG: 11.9 MB (fasta); DRA014184: 3.65 GB (fasta)
Comments (Policies)	NBDC policy

hum0197.v12.VIRUS.v1


Participants/Materials:	Japanese gut microbiome JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, Public data (DRA006684)
Targets	NGS (WGS)
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500/3000, NovaSeq 6000]
Library Source	JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684
Cell Lines	-
Virus genome construction	De novo assembly with metaspades was performed. Then, viral contigs were detected with virfinder and virsorter.
DDBJ Sequence Read Archive ID	JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531: BRDB01000001-BRDB01028816; DRA006684: EMNW01000001-EMNW01002579
Total Data Volume	JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531: 1.09 GB (fasta); DRA006684: 98.3 MB (fasta)
Comments (Policies)	NBDC policy

hum0197.v12.CRISPR.v1


Participants/Materials:	Japanese gut microbiome JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, Public data (DRA006684)
Targets	NGS (WGS)
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500/3000, NovaSeq 6000]
Library Source	JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531, DRA006684
Cell Lines	-
CRISPR construction	MINCED was applied to the MAGs.
DDBJ Sequence Read Archive ID	DRA014186 (JGAS000205 / JGAS000260 / JGAS000316 / JGAS000415 / JGAS000530 / JGAS000531); DRA014184 (DRA006684)
Total Data Volume	DRA014184: 17.9 MB (fasta); DRA014186: 1.43 MB (fasta)
Comments (Policies)	NBDC policy

JGAS000600


Participants/Materials:	88 Japanese individuals (shotgun sequencing); 73 healthy individuals (shotgun sequencing) [DNA extraction with phenol-chloroform extraction: 73 samples; DNA extraction with DNeasy PowerSoil Pro kit: 47 samples]; 5 Japanese individuals (deep shotgun sequencing)
Targets	Metagenome
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 3000, NovaSeq 6000]
Library Source	DNA extracted from gut microbiome
Cell Lines	-
Library Construction (kit name)	KAPA Hyper Prep Kit
Fragmentation Methods	Ultrasonic fragmentation (Covaris)
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	150 bp
Japanese Genotype-phenotype Archive Dataset ID	JGAD000729
Total Data Volume	2.6 TB (fastq)
Comments (Policies)	NBDC policy

hum0197.v16.gwas.v1


Participants/Materials	BioBank Japan (n=180,215), UK Biobank (n=377,441); large-scale meta-analysis including the summary statistics of other cohorts (FinnGen, BCAC, and PRACTICAL) for breast and prostate cancer (n=648,746 and 482,080); no. Phenotypes: 15; Patients: biliary tract (ICD10: C22.1, 23-24), breast (ICD10: C509), cervical (ICD10: C53), colorectal (ICD10: C18-20), endometrial (ICD10: C54), esophageal (ICD10: C15), gastric (ICD10: C16), hepatocellular (ICD10: C22.0), lung (ICD10: C34), non-Hodgkin's lymphoma (ICD10: C82-83), ovarian (ICD10: C56), pancreatic (ICD10: C25), and prostate (ICD10: C61) cancer; Controls: individuals without cancer (There is overlap among patients in each disease category)
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]; FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays]; BCAC: Illumina [iCOGS OncoArray]; PRACTICAL: Illumina [iCOGS OncoArray]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array; FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays; BCAC: Infinium OncoArray-500K v1.0 BeadChip Kit; PRACTICAL: Infinium OncoArray-500K v1.0 BeadChip Kit
Genotype Call Methods (software)	BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4; FinnGen: beagle4.1; BCAC: IMPUTE2; PRACTICAL: IMPUTE2
Association Analysis (software)	SAIGE software was used with age, sex, and top five principal components as covariates. RE2C software was used for the multi-trait meta-analysis adjusting for sample overlap between GWAS summary data.
Filtering Methods	Sample QC and Variant QC for each dataset: refer to ReadMe file. We excluded the variants with Rsq < 0.7 and MAF < 0.01.
Marker Number (after QC)	BBJ: 13MN (7,398,798), each cancer (7,442,557 (7,420,485-7,444,681)); UK Biobank: 13MN (9,602,853), each cancer (9,620,786 (9,620,343-9,620,935)); BBJ + UK Biobank: 13MN (5,374,018), each cancer (5,696,155 (5,677,934-5,698,357)); BBJ + UK Biobank + FinnGen + BCAC (breast cancer): 5,104,756; BBJ + UK Biobank + FinnGen + PRACTICAL (prostate cancer): 5,105,796; BBJ + UK Biobank + FinnGen + BCAC + PRACTICAL (breast cancer + prostate cancer): 5,100,089 *mean (min-max) for each cancer
NBDC Dataset ID	hum0197.v16.gwas.v1 (Click the Dataset ID to download the files) Dictionary file
Total Data Volume	BBJ: 13MN (287 MB), each cancer (625 (605-633) MB); UK Biobank: 13MN (362 MB), each cancer (841 (814-859) MB); BBJ + UK Biobank: 13MN (202 MB), each cancer (260 (255-264) MB); BBJ + UK Biobank + FinnGen + BCAC (breast cancer): 242 MB; BBJ + UK Biobank + FinnGen + PRACTICAL (prostate cancer): 243 MB; BBJ + UK Biobank + FinnGen + BCAC + PRACTICAL (breast cancer + prostate cancer): 253 MB *mean (min-max) for each cancer
Comments (Policies)	NBDC policy

hum0197.v17.hic-gwas.v1


Participants/Materials	Hunner-type interstitial cystitis cases (ICD10: N301): 144, Control participants: 41,516
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Infinium Asian Screening Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array
Genotype Call Methods (software)	GenomeStudio for genotyping, shapeit4 for haplotype phasing, and minimac4 for imputation
Association Analysis (software)	SAIGE
Filtering Methods	Sample QC: We excluded individuals with low genotyping call rates (call rate < 98%). We included individuals of the estimated Japanese ancestry using PCA. Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P-value for Hardy–Weinberg equilibrium < 1.0 × 10^−10, and (4) > 5% allele frequency difference compared with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%.
Marker Number (after QC)	7,909,790 variants (hg19)
NBDC Dataset ID	hum0197.v17.hic-gwas.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	700 MB (txt)
Comments (Policies)	NBDC policy

hum0197.v18.gwas.v1


Participants/Materials	524 Japanese individuals (423 species in the gut microbiome); 306 Japanese individuals (306 plasma metabolites); 524 Japanese individuals (KEGG Gene Ortholog and KEGG Pathway)
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	SNP array: Illumina [Infinium Asian Screening Array]; Whole genome sequencing: Illumina [HiSeq X Ten]; Metagenome shotgun sequencing: Illumina [HiSeq 2500/3000, NovaSeq 6000]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	SNP array: Infinium Asian Screening Array; Whole genome sequencing: TruSeq DNA PCR-Free Library Preparation Kit; Metagenome shotgun sequencing: KAPA Hyper Prep Kit
Genotype Call Methods (software)	SNP array: Genotyping: GenomeStudio, Haplotype phasing: shapeit4, Imputation: minimac4; WGS: BWA-MEM v0.7.13 + GATK v3.8-0
Association Analysis (software)	PLINK2
Filtering Methods	SNP array data: Sample QC: We excluded individuals with low genotyping call rates (call rate < 98%). We included individuals of the estimated Asian ancestry using PCA. Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P-value for Hardy–Weinberg equilibrium < 1.0 × 10^−10, and (4) > 5% allele frequency difference compared with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 1%. WGS: We excluded variants with genotype call rate < 90%, ExcessHet > 60, or Hardy–Weinberg P < 1.0 × 10^−10. After imputation with Beagle v5.1, we excluded imputed variants with minor allele frequency < 1%.
Marker Number (after QC)	Gut microbiome/KEGG (SNP array): 7,213,470 variants (hg19); Metabolome (WGS): 6,840,258 variants (GRCh37)
NBDC Dataset ID	hum0197.v18.gwas.v1 (Gut microbiome, Plasma metabolites, KEGG ) (Click the link above to download the files) Dictionary file
Total Data Volume	Gut microbiome: 206 GB; Metabolome: 90.7 GB; KEGG: 300 MB
Comments (Policies)	NBDC policy

hum0197.v19.prs.v1


Participants/Materials	BioBank Japan: Type 2 diabetes (ICD10: E11): 27,642 cases, Control participants: 70,242; UK Biobank: Type 2 diabetes (ICD10: E11): 27,642 cases, Control participants: 70,242
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]; UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip; UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array
Genotype Call Methods (software)	BBJ: Eagle, Minimac3; UK Biobank: IMPUTE4
Association Analysis (software)	plink2
Filtering Methods	Variants with imputation quality of Rsq < 0.3 or minor allele frequency (MAF) < 1% were excluded. The details are described in https://doi.org/10.1038/s41588-024-01782-y
Marker Number (after QC)	BBJ second cohort: 728,824 variants; ToMMo: 855,161 variants
NBDC Dataset ID	hum0197.v19.prs.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	180 MB (txt)
Comments (Policies)	NBDC policy

hum0197.v20.gwas.v1


Participants/Materials	Recurrent pregnancy loss cases (ICD10: N96): 1,728, Control participants: 24,315
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Infinium Asian Screening Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array
Genotype Call Methods (software)	Genotyping: GenomeStudio, Haplotype phasing: shapeit4, Imputation: minimac4
Association Analysis (software)	SAIGE
Filtering Methods	Sample QC: We excluded individuals with low genotyping call rates (call rate < 98%). We included individuals of the estimated Japanese ancestry using PCA. Variant QC: We excluded variants with (1) genotyping call rate < 99%, (2) minor allele count < 5, (3) P-value for Hardy–Weinberg equilibrium < 1.0 × 10^−10, and (4) > 5% allele frequency difference compared with the imputation reference panel or the allele frequency panel of Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%.
Marker Number (after QC)	8,717,430 variants (hg19)
NBDC Dataset ID	hum0197.v20.gwas.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	465 MB (txt)
Comments (Policies)	NBDC policy

JGAS000741 (WGS)


Participants/Materials	Autoimmune diseases (ICD10: L400, M0690, M329, J840, G35): 2,238 cases, Control participants: 2,919
Targets	WGS
Target Loci for Capture Methods	-
Platform	Illumina [NovaSeq 6000/HiSeq X Ten]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Library Construction (kit name)	TruSeq DNA PCR-free Library Prep kit
Fragmentation Methods	Ultrasonic fragmentation
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	150 bp x 2
Methods for removing host sequence/detecting viral sequence (software)	https://github.com/shohei-kojima/integrated_HHV6_recon, https://github.com/shohei-kojima/human_anellovirus_detection
QC	We conducted principal component analysis (PCA) against HapMap3 data using SNP data of the same individuals to confirm the East Asian genetic background.
Reference sequence for viral genome	Refer to the software's GitHub repositories.
Japanese Genotype-phenotype Archive Dataset ID	JGAD000876
Total Data Volume	181.3 GB (fastq)
Comments (Policies)	NBDC policy

JGAS000741 (scRNA-seq)


Participants/Materials	Systemic lupus erythematosus (ICD10: M329): 8 cases (eHHV-6B-positive: 3 cases, eHHV-6B-negative: 5 cases)
Targets	scRNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [NovaSeq 6000]
Library Source	RNAs extracted from PBMC
Cell Lines	-
Library Construction (kit name)	Chromium Next GEM Single Cell 5’ Library & Gel Bead Kit v1.1, Chromium Next GEM Chip G Single Cell Kit, Single Index Kit T Set A
Fragmentation Methods	Enzymatic fragmentation
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	91 bp
NBDC Dataset ID	JGAD000876
Total Data Volume	181.3 GB (fastq)
Comments (Policies)	NBDC policy

hum0197.v21.gwas-ehhv6.v1


Participants/Materials	Autoimmune diseases (ICD10: L400, M0690, M329, J840, G35): 238 cases (eHHV-6B-positive: 22 cases; eHHV-6B-negative: 216 cases)
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [NovaSeq 6000/HiSeq X Ten]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	TruSeq DNA PCR-free Library Prep kit
Genotype Call Methods (software)	The FASTQ reads were aligned to T2T-CHM13v2.0 with BWA-MEM (v0.7.27), followed by GATK4 MarkDuplicates and Base Quality Score Recalibration (v4.2.6.1) according to the GATK Best Practices. Then, we performed per-sample SNP and indel calling using GATK4 HaplotypeCaller and joint genotyping using GATK4 GenomicsDBImport and GenotypeGVCFs. We conducted LD-based genotype refinement for low-confidence genotypes and missing sites in WGS data using BEAGLE v5.4 with default settings.
Association Analysis (software)	PLINK v2.0 software was used with top two principal components and sex as covariates.
Filtering Methods	Sample QC: Individuals were excluded if they showed conflicting sex assignments between genetically inferred sex by variants and WGS coverage, deviating heterozygosity rate (±3 standard deviations), or cryptic relatedness (pi-hat > 0.2). We included samples of the estimated Japanese ancestry using PCA. Four cases were excluded. Variant QC: We excluded (1) non-autosomal variants, (2) multi-allelic sites and spanning deletions, and (3) variants with P-value for Hardy–Weinberg equilibrium < 1e-10 in cases and < 1e-6 in controls.
Marker Number (after QC)	6,464,509 SNPs
NBDC Dataset ID	hum0197.v21.gwas-ehhv6.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	416 MB (tsv)
Comments (Policies)	NBDC policy
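
The heterozygosity criterion in the sample QC above (exclusion beyond ±3 standard deviations) can be illustrated with a short sketch. It assumes that per-sample heterozygosity rates have already been computed and that the deviation is measured from the cohort mean using the sample standard deviation; the record does not specify either detail, and the function and argument names are hypothetical.

```python
from statistics import mean, stdev


def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Return IDs of samples whose heterozygosity rate lies more than
    n_sd sample standard deviations from the cohort mean.

    het_rates maps a (hypothetical) sample ID to its heterozygosity rate.
    """
    values = list(het_rates.values())
    mu = mean(values)
    sd = stdev(values)  # needs at least two samples
    if sd == 0:
        return set()    # no spread, so nothing can be an outlier
    return {sid for sid, rate in het_rates.items()
            if abs(rate - mu) > n_sd * sd}
```

Because a single extreme value inflates the standard deviation, this rule only flags outliers when the cohort is reasonably large; with very few samples no value can reach 3 standard deviations.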

hum0197.v21.gwas-jomon.v1


Participants/Materials	The first cohort of Biobank Japan (n = 171,287)
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip
Genotype Call Methods (software)	Eagle, Minimac3
Association Analysis (software)	1) GCTA-fastGWA with the adjustment of covariates: age, age^2, sex, the top 20 PCs, 45 disease status, geographic regions, and PCA clusters. 2) Fixed-effect meta-analysis of Mainland summary data including individuals from the Mainland and EA_admix clusters (n = 151,075) and of Ryukyu summary data including individuals from the Ryukyu, Ryukyu admix, and Hokkaido_sub clusters (n = 10,080) using METAL.
Filtering Methods	Sample QC: We excluded (i) individuals with low call rates (< 99%), (ii) closely related individuals with genetic relatedness ≥ 0.178 calculated from a genetic relationship matrix (GRM) by GCTA (version 1.93.3beta2). We included samples of the estimated Japanese ancestry using PCA. Variant QC: We excluded variants with (i) call rate < 99%, (ii) P value for Hardy–Weinberg equilibrium (HWE) < 1.0 × 10^−6, (iii) number of heterozygotes < 5, and (iv) a concordance rate < 99.5% or a non-reference concordance rate between GWAS array and whole genome sequencing. After association test: Double genomic control correction was conducted using METAL. We computed a Z score for each variant from the sign of the beta coefficient and the associated P value, and retained the variants with a positive Z score.
Marker Number (after QC)	3,454,970 SNPs
NBDC Dataset ID	hum0197.v21.gwas-jomon.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	65 MB (txt)
Comments (Policies)	NBDC policy
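
The post-association step above derives a Z score from the sign of the beta coefficient and the P value, then keeps variants with a positive Z score. One common way to do this is sketched below. It assumes the P value is two-sided, so that |Z| is the standard normal quantile at 1 − P/2; the record does not state the exact formula, and the function names and the tuple layout are hypothetical.

```python
from statistics import NormalDist


def signed_z(beta, p_value):
    """Z score whose sign follows beta and whose magnitude is derived
    from a two-sided P value (an assumption, see the text above)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if beta == 0:
        return 0.0
    # -inv_cdf(p/2) equals inv_cdf(1 - p/2) but stays accurate for tiny p.
    magnitude = -NormalDist().inv_cdf(p_value / 2.0)
    return magnitude if beta > 0 else -magnitude


def keep_positive_z(variants):
    """Keep (variant_id, beta, p_value) tuples with a positive Z score."""
    return [v for v in variants if signed_z(v[1], v[2]) > 0]
```

For example, a variant with beta = 0.2 and P = 0.05 gets Z of about 1.96 and is retained, while the same P value with beta = −0.2 gives about −1.96 and is dropped.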

DATA PROVIDER

Principal Investigator: Yukinori Okada

Affiliation: Department of Statistical Genetics, Osaka University Graduate School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name	Title	Project Number
Precursory Research for Innovative Medical care (PRIME), Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED)	Crosstalk among microbiome, host, disease, and drug discovery enhanced by statistical genetics	JP19gm6010001
FORCE, Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED)	Elucidation of disease-specific microbiota and personalized medicine by metagenome-wide association studies	JP20gm4010006
Practical Research Project for Rare / Intractable Diseases, Japan Agency for Medical Research and Development (AMED)	Biology and in silico drug repositioning of pulmonary alveolar proteinosis using trans-layer omics analysis	JP20ek0109413
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED)	Nucleic genome drug discovery for autoimmune diseases through in-silico and patient-oriented screening utilizing large-scale disease genetics	JP19ek0410041
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED)	Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources	JP21ek0410075
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED)	Implementation of genomic prediction medicine based on statistical genetics	JP21km0405211
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED)	Next-generation genomics analyses elucidates biology, personalized medicine, and drug discovery of psoriasis	JP21km0405217
KAKENHI Grant-in-Aid for Scientific Research (A)	Elucidation of disease biology and tissue specificity by trans-layer omics analysis and whole-genome sequencing	19H01021
KAKENHI Grant-in-Aid for Scientific Research (A)	Elucidation of immune and allergic disease dynamics by integrative sequencing analysis	22H00476

PUBLICATIONS

	Title	DOI	Dataset ID
1	Metagenome-wide association study of gut microbiome revealed novel aetiology of rheumatoid arthritis in the Japanese population.	doi: 10.1136/annrheumdis-2019-215743	JGAD000290
2	Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis.	doi: 10.1038/s41467-021-21011-y	hum0197.v2.gwas.v1
3	A metagenome-wide association study of gut microbiome in patients with multiple sclerosis revealed novel disease pathology.	doi: 10.3389/fcimb.2020.585973	JGAD000363
4	A global atlas of genetic associations of 220 deep phenotypes	doi: 10.1101/2020.10.23.20213652	hum0197.v3.gwas.v1
5	Metagenome-wide association study revealed disease-specific landscape of the gut microbiome of systemic lupus erythematosus in Japanese	doi: 10.1136/annrheumdis-2021-220687	JGAD000427
6	Whole gut virome analysis of 476 Japanese revealed a link between phage and autoimmune disease	doi: 10.1136/annrheumdis-2021-221267	JGAD000532
7	Insights from complex trait fine-mapping across diverse populations	doi: 10.1101/2021.09.03.21262975	hum0197.v5.gwas.v1 hum0197.v5.finemap.v1
8	Genetic architecture of microRNA expression and its link to complex diseases in the Japanese population.	doi: 10.1093/hmg/ddab361	JGAD000621 hum0197.v6.eqtl.v1
9	Multi-trait and cross-population genome-wide association studies across autoimmune and allergic diseases identify shared and distinct genetic components.	doi: 10.1136/annrheumdis-2022-222460	hum0197.v10.gwas.v1
10	DOCK2 is involved in the host genetics and biology of severe COVID-19	doi: 10.1038/s41586-022-05163-5	JGAD000662
11	Prokaryotic and viral genomes recovered from 787 Japanese gut metagenomes revealed microbial features linked to diets, populations, and diseases	doi: 10.1016/j.xgen.2022.100219	hum0197.v12
12	Reconstruction of the personal information from human genome reads in gut metagenome sequencing data	doi: 10.1038/s41564-023-01381-3	JGAD000363 JGAD000427 JGAD000532 JGAD000650 JGAD000729
13	Pan-cancer and cross-population genome-wide association studies dissect shared genetic backgrounds underlying carcinogenesis	doi: 10.1038/s41467-023-39136-7	hum0197.v16.gwas.v1
14	Single-cell analyses and host genetics highlight the role of innate immune cells in COVID-19 severity	doi: 10.1038/s41588-023-01375-1	JGAD000662 JGAD000722
15	Genome-wide association analysis identifies susceptibility loci within the major histocompatibility complex region for Hunner-type interstitial cystitis	doi: 10.1016/j.xcrm.2023.101114	hum0197.v17.hic-gwas.v1
16	Analysis of gut microbiome, host genetics, and plasma metabolites reveals gut microbiome-host interactions in the Japanese population	doi: 10.1016/j.celrep.2023.113324	hum0197.v18.gwas.v1
17	Body mass index stratification optimizes polygenic prediction of type 2 diabetes in cross-biobank analyses	doi: 10.1038/s41588-024-01782-y	hum0197.v19.prs.v1
18	Common and rare genetic variants predisposing females to unexplained recurrent pregnancy loss	doi: 10.1038/s41467-024-49993-5	hum0197.v20.gwas.v1
19	Blood DNA virome associates with autoimmune diseases and COVID-19		hum0197.v21.gwas-ehhv6.v1 JGAD000876
20	Genetic Legacy of Ancient Hunter-Gatherer Jomon in Japanese Populations		hum0197.v21.gwas-jomon.v1

USERS (Controlled-access Data)

Principal Investigator	Affiliation	Country/Region	Research Title	Data in Use (Dataset ID)	Period of Data Use
Ilana Brito	Meinig School of Biomedical Engineering, Cornell University	United States of America	Comparative metagenomics of lupus patients' microbiomes	JGAD000290, JGAD000363, JGAD000427, JGAD000532	2022/05/12-2024/05/04
Yongxin Li	Department of Chemistry, The University of Hong Kong	Hong Kong	Comparison of gut bacterial diversity and composition in MS/EAE	JGAD000363	2022/09/19-2024/07/01
Tina Fuchs	Institute for Clinical Chemistry, Medical Faculty Mannheim, Heidelberg University	Germany	Investigating the clonality of VIREM cells in COVID-19 patients	JGAD000662, JGAD000772	2024/02/26-2024/12/31
Koichi Matsuda	Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, The University of Tokyo	Japan	Disease Cohort Research Network for Disease Marker Exploratory Studies	JGAD000290, JGAD000363, JGAD000427, JGAD000532, JGAD000649, JGAD000650, JGAD000662, JGAD000722, JGAD000729	2024/06/17-2029/03/31