NBDC Research ID: hum0197.v10

SUMMARY

Aims: Elucidation of disease biology based on trans-omics analysis, GWAS in the Japanese and trans-ethnic populations

Methods: Metagenome shotgun sequencing, genome-wide association study (GWAS), small RNA-seq and eQTL analyses

Participants/Materials:

Metagenomic data of gut microbiome in the Japanese population (95 + 103 + 227 + 30 + 136 individuals)

Autoimmune pulmonary alveolar proteinosis cases: 198, Control participants: 395

Populations: Biobank Japan (n = 179,000), UK biobank (n = 361,000), and FinnGen (n = 136,000), Phenotypes: 215

141 Japanese individuals

Metagenomic data of gut microbiome in Inflammatory Bowel Disease (35 Ulcerative Colitis and 39 Crohn's disease) and 40 Healthy controls

Intracranial germ cell tumors cases: 133, Control participants: 762

Populations: Biobank Japan (n = 161,801) and UK biobank (n = 377,583), Phenotypes: 9

Data Set ID	Type of Data	Criteria	Release Date
JGAS000205	Metagenome	Controlled Access (Type I)	2019/11/15
hum0197.v2.gwas.v1	GWAS for autoimmune pulmonary alveolar proteinosis	Un-restricted Access	2020/11/27
JGAS000260	Metagenome	Controlled Access (Type I)	2020/11/27
hum0197.v3.gwas.v1	GWAS for 215 phenotypes	Un-restricted Access	2021/03/22
JGAS000316	Metagenome	Controlled Access (Type I)	2021/10/12
JGAS000415	Metagenome	Controlled Access (Type I)	2021/12/10
hum0197.v5.gwas.v1	GWAS for 10 phenotypes	Un-restricted Access	2021/12/21
hum0197.v5.finemap.v1	Fine-mapping for 79 phenotypes	Un-restricted Access	2021/12/21
JGAS000504	Read count data of miRNA	Controlled Access (Type I)	2022/02/08
hum0197.v6.eqtl.v1	eQTL data	Un-restricted Access	2022/02/08
JGAS000530	Metagenome	Controlled Access (Type I)	2022/05/23
JGAS000531	Metagenome	Controlled Access (Type I)	2022/06/03
hum0197.v9.gwas.GCT.v1	GWAS for intracranial germ cell tumors	Un-restricted Access	2022/06/10
hum0197.v10.gwas.v1	GWAS for 9 phenotypes	Un-restricted Access	2022/06/16

*Release Note

*Data users need to submit an application for Using NBDC Human Data to access the Controlled-access Data. Learn more

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention the data in the acknowledgments. Learn more

MOLECULAR DATA

JGAS000205/JGAS000260/JGAS000316/JGAS000415/JGAS000530/JGAS000531


Participants/Materials:	95+103+227+30+136 Japanese individuals Inflammatory Bowel Disease 35 Ulcerative Colitis 39 Crohn's disease 40 Healthy controls
Targets	Metagenome
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 3000]
Library Source	DNA extracted from gut microbiome
Cell Lines	-
Library Construction (kit name)	KAPA Hyper Prep Kit
Fragmentation Methods	Ultrasonic fragmentation (Covaris)
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	150 bp
Japanese Genotype-phenotype Archive Data set ID	JGAD000290 (95 Japanese individuals) JGAD000363 (103 Japanese individuals) JGAD000427 (227 Japanese individuals) JGAD000532 (30 Japanese individuals) JGAD000649 (Inflammatory Bowel Disease) JGAD000650 (136 Japanese individuals)
Total Data Volume	JGAD000290: 477 GB (fastq) JGAD000363: 408 GB (fastq) JGAD000427: 881.2 GB (fastq) JGAD000532: 106.7 GB (fastq) JGAD000649: 374.6 GB (fastq) JGAD000650: 541.4 GB (fastq)
Comments (Policies)	NBDC policy

hum0197.v2.gwas.v1


Participants/Materials	Autoimmune pulmonary alveolar proteinosis cases (ICD10: J840): 198 Control participants: 395
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Infinium Asian Screening Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array
Genotype Call Methods (software)	GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation
Association Analysis (software)	PLINK2
Filtering Methods	Sample QC: We excluded samples with low genotyping call rates (call rate < 98%) and samples in close genetic relation (PI_HAT > 0.175). We included samples of estimated East Asian ancestry. Variant QC: We excluded variants with (1) genotyping call rate < 98%, (2) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁶, (3) minor allele count < 5, or (4) > 10% frequency difference with the imputation reference panel.
Marker Number (after QC)	12,153,232 autosomal variants and 242,876 X-chromosomal variants after QC.
NBDC Data Set ID	hum0197.v2.gwas.v1 (Click the Data Set ID to download the file) Dictionary file
Total Data Volume	390MB for autosome (txt.gz) and 19MB for X chromosome (txt.gz)
Comments (Policies)	NBDC policy
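
The variant QC thresholds listed under Filtering Methods for hum0197.v2.gwas.v1 can be read as a simple predicate. The Python sketch below is illustrative only: the function name `passes_variant_qc` and its argument names are hypothetical and do not come from the released files or from any of the tools named above. It also assumes that the "> 10% frequency difference" criterion means an absolute difference in allele frequency.

```python
def passes_variant_qc(call_rate, hwe_p, minor_allele_count, freq, ref_panel_freq):
    """Return True if a variant survives the four exclusion criteria.

    Thresholds follow the Filtering Methods row; argument names are hypothetical.
    """
    if call_rate < 0.98:            # (1) genotyping call rate < 98%
        return False
    if hwe_p < 1.0e-6:              # (2) Hardy-Weinberg equilibrium P < 1.0e-6
        return False
    if minor_allele_count < 5:      # (3) minor allele count < 5
        return False
    # (4) > 10% difference from the imputation reference panel
    #     (assumed here to be an absolute allele-frequency difference)
    if abs(freq - ref_panel_freq) > 0.10:
        return False
    return True
```

A variant is kept only if none of the four criteria applies, so each check can return early.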

hum0197.v3.gwas.v1


Participants/Materials	Biobank Japan (n = 179,000), UK biobank (n = 361,000), FinnGen (n = 136,000), no. Phenotypes: 215
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip] UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array] FinnGen: Thermo Fisher Scientific [FinnGen1 ThermoFisher Array or other genotyping arrays]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array FinnGen: FinnGen1 ThermoFisher Array or other genotyping arrays
Genotype Call Methods (software)	BBJ: Eagle, Minimac3 UK Biobank: IMPUTE4 FinnGen: beagle4.1
Association Analysis (software)	For binary traits, SAIGE software was used with age, age², sex, age×sex, age²×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM or plink software was used with the same covariates.
Filtering Methods	BBJ: We included imputed variants with Rsq > 0.7. UK Biobank: We excluded the variants with (i) INFO score ≤ 0.8, (ii) MAF ≤ 0.0001 (except for missense and protein-truncating variants annotated by VEP, which were excluded if MAF ≤ 1 × 10⁻⁶), and (iii) Hardy–Weinberg equilibrium P value ≤ 1 × 10⁻¹⁰. FinnGen: We excluded variants with an imputation INFO score < 0.8 or MAF < 0.0001.
Marker Number (after QC)	BBJ: 13,530,797 variants UK Biobank: 13,791,467 variants FinnGen: 16,859,359 variants
NBDC Data Set ID	hum0197.v3.gwas.v1 (Click the Data Set ID to download the file) Dictionary file (BBJ, EUR, META)
Total Data Volume	BBJ: ~1.5G for autosome and ~33M for chrX UK Biobank: ~1.5G for autosome and ~15M for chrX FinnGen: ~740M for autosome and ~20M for chrX
Comments (Policies)	NBDC policy
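
The covariate set named under Association Analysis (age, squared age, sex, the two age-by-sex interaction terms, and the top 20 principal components) can be sketched as one row of a design matrix. This is a minimal illustration, not the pipeline used for the dataset: the function name `build_covariates` is hypothetical, and a numeric 0/1 coding for sex is an assumption.

```python
def build_covariates(age, sex, pcs):
    """Return the covariate vector for one individual.

    age: age in years
    sex: numeric coding (0/1 is assumed here)
    pcs: sequence of principal-component scores (20 are expected)
    """
    age_sq = age * age
    # Order: age, age^2, sex, age x sex, age^2 x sex, PC1..PCn
    return [age, age_sq, sex, age * sex, age_sq * sex, *pcs]
```

With 20 principal components the vector has 25 entries per individual.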

hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1


Participants/Materials	Biobank Japan (n = 179,000), no. Phenotypes: 79
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip
Genotype Call Methods (software)	Eagle, Minimac3
Association Analysis (software)	GWAS: For binary traits, SAIGE software was used with age, age², sex, age×sex, age²×sex, and top 20 principal components as covariates. For quantitative traits (biomarkers), BOLT-LMM was used with the same covariates. Fine-mapping: FINEMAP and SuSiE were used with GWAS summary statistics and in-sample dosage LD, allowing up to 10 causal variants per region.
Filtering Methods	GWAS: We included imputed variants with Rsq > 0.7. For binary traits, variants with MAC < 10 were additionally excluded. Fine-mapping: We defined fine-mapping regions based on a 3 Mb window around each lead variant and merged regions if they overlapped. We excluded the major histocompatibility complex (MHC) region (chr 6: 25–36 Mb) from analysis due to extensive LD structure in the region. For each method, we only included variants from successfully fine-mapped regions while excluding those from failed regions (e.g., due to conversion failure or available memory restrictions).
Marker Number (after QC)	13,531,752 variants (ref: hg19)
NBDC Data Set ID	hum0197.v5.gwas.v1 / hum0197.v5.finemap.v1 (Click the Data Set ID to download the file) Dictionary file
Total Data Volume	14 GB
Comments (Policies)	NBDC policy
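
The region definition described under Filtering Methods (a 3 Mb window around each lead variant, merging of overlapping windows, exclusion of the MHC region) can be sketched as follows. This is a hedged illustration, not the original implementation: the name `define_regions` is hypothetical, the window is assumed to be centred on the lead variant (1.5 Mb on each side), and any merged region that overlaps chr 6: 25–36 Mb is assumed to be dropped as a whole.

```python
MHC = ("6", 25_000_000, 36_000_000)  # chr 6: 25-36 Mb, as stated in the table

def define_regions(lead_variants, window=3_000_000):
    """Build merged fine-mapping regions from (chrom, pos) lead variants.

    Returns a list of (chrom, start, end) tuples sorted by chromosome label
    and start position.
    """
    half = window // 2  # assumption: window is centred on the lead variant
    intervals = sorted((c, max(0, p - half), p + half) for c, p in lead_variants)

    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous region: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))

    # Assumption: drop any region that overlaps the MHC interval.
    return [r for r in merged
            if not (r[0] == MHC[0] and r[1] < MHC[2] and r[2] > MHC[1])]
```

For example, lead variants at chr1:10,000,000 and chr1:11,000,000 yield a single merged region, while a lead variant at chr6:30,000,000 yields no region.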

JGAS000504


Participants/Materials:	141 Japanese individuals
Targets	small RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	RNAs extracted from PBMC
Cell Lines	-
Library Construction (kit name)	SMARTer smRNA-Seq Kit
Fragmentation Methods	-
Spot Type	Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	100 bp
Mapping Methods	bowtie (GRCh37)
Detecting method for read count (software)	featureCounts + miRbase v22
QC	We performed adapter trimming using Cutadapt v1.8 and removed reads with a low quality score (Phred quality score < 20 in >20% of total bases) using fastp v0.20.0. Also, we removed reads with a length of >29 bp or <15 bp, which are not expected to be mature miRNAs. Mature miRNAs detected with ≥1 read in at least half of the individuals were included in the dataset.
miRNA number	343
Japanese Genotype-phenotype Archive Data set ID	JGAD000621
Total Data Volume	54.7 KB (txt)
Comments (Policies)	NBDC policy
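
Two of the QC rules in the table above are simple enough to sketch in code: the read-length filter (reads of >29 bp or <15 bp removed) and the detection filter (mature miRNAs with ≥1 read in at least half of the individuals retained). The Python below is an illustration under stated assumptions; `keep_read`, `detected_mirnas`, and the input layout (a dict of per-individual read counts) are hypothetical and are not taken from Cutadapt, fastp, or featureCounts.

```python
def keep_read(length):
    """Keep reads of 15-29 bp; reads of >29 bp or <15 bp are removed."""
    return 15 <= length <= 29

def detected_mirnas(counts, n_individuals):
    """Return miRNAs detected with >=1 read in at least half of the individuals.

    counts: dict mapping a miRNA name to a list of read counts,
            one per individual (hypothetical input layout).
    """
    kept = []
    for name, per_sample in counts.items():
        n_detected = sum(1 for c in per_sample if c >= 1)
        # "At least half" is checked without floating-point division.
        if 2 * n_detected >= n_individuals:
            kept.append(name)
    return kept
```

With 141 individuals, "at least half" under this reading means detection in 71 or more individuals.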

hum0197.v6.eqtl.v1


Participants/Materials	141 Japanese individuals
Targets	eQTL
Target Loci for Capture Methods	-
Platform	small RNA-seq: Illumina [HiSeq 2500] WGS: Illumina [HiSeq X Ten]
Library Source	read count data of JGAS000504 and whole genome sequencing data using genomic DNA extracted from whole blood
Cell Lines	-
Reagents (Kit, Version)	small RNA-seq: See JGAS000504 WGS: TruSeq DNA PCR-Free Library Preparation Kit
Genotype Call / Detecting read count Methods (software)	See JGAS000504 for read count data. WGS: Sequenced reads were aligned against the reference human genome with the decoy sequence (GRCh37, human_g1k_v37_decoy) using BWA-MEM v0.7.13.
QC	See JGAS000504 for read count data. WGS: We removed the variants (i) with low genotyping call rates (<0.90), (ii) with ExcessHet > 60, or (iii) with Hardy–Weinberg P value < 1.0 × 10⁻¹⁰. Genotype refinement was performed using Beagle v5.1.
Marker Number (after QC)	See JGAS000504 for read count data. WGS: 12,171,854 variants
eQTL algorithm	We analyzed the association between genetic variants with minor allele frequency (MAF) ≥ 0.01 within a cis-window around each miRNA (±1 Mb of the mature miRNA) and normalized expression values using MatrixEQTL v2.3.
NBDC Data Set ID	hum0197.v6.eqtl.v1 (Click the Data Set ID to download the file) Dictionary file
Total Data Volume	1.1 MB (txt)
Comments (Policies)	NBDC policy
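
The variant selection described under eQTL algorithm (MAF ≥ 0.01, within ±1 Mb of the mature miRNA) can be sketched as a filter applied before association testing. This is a minimal illustration and not part of MatrixEQTL: the function name `cis_variants` and the tuple layout are hypothetical, and the window is assumed to extend 1 Mb upstream of the mature miRNA start and 1 Mb downstream of its end.

```python
def cis_variants(variants, chrom, mirna_start, mirna_end,
                 window=1_000_000, min_maf=0.01):
    """Select variants to test against one mature miRNA.

    variants: iterable of (chrom, pos, maf) tuples (hypothetical layout).
    Returns the variants on the same chromosome, inside the cis-window,
    with minor allele frequency >= min_maf.
    """
    lo = max(0, mirna_start - window)  # assumed: measured from miRNA start
    hi = mirna_end + window            # assumed: measured from miRNA end
    return [(c, p, maf) for c, p, maf in variants
            if c == chrom and lo <= p <= hi and maf >= min_maf]
```

Only the variants returned by such a filter would then be tested against the normalized expression values of that miRNA.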

hum0197.v9.gwas.GCT.v1


Participants/Materials	Intracranial germ cell tumors cases (ICD10: C719): 133 Control participants: 762
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Infinium Asian Screening Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array
Genotype Call Methods (software)	GenomeStudio for genotyping, shapeit2 for haplotype phasing, and minimac3 for imputation
Association Analysis (software)	PLINK2
Filtering Methods	Sample QC: We excluded individuals (i) with genotyping call rate < 0.97, (ii) in close kinship (PI_HAT > 0.17), or (iii) of estimated non-East Asian ancestry. Variant QC: We excluded variants with (i) genotyping call rate < 0.99, (ii) minor allele count < 5, (iii) P value for Hardy–Weinberg equilibrium < 1.0 × 10⁻⁵ in controls, or (iv) > 10% allele frequency difference with the imputation reference panel or the allele frequency panel of the Tohoku Medical Megabank Project. Post-imputation QC: We excluded imputed variants with Rsq < 0.7 and minor allele frequency < 0.5%.
Marker Number (after QC)	7,803,874 autosomal variants and 181,867 X-chromosomal variants
NBDC Data Set ID	hum0197.v9.gwas.GCT.v1 (Click the Data Set ID to download the file) Dictionary file
Total Data Volume	248 MB (txt)
Comments (Policies)	NBDC policy

hum0197.v10.gwas.v1


Participants/Materials	Biobank Japan (n=161,801), UK biobank (n=377,583), no. Phenotypes: 9 Patients: Autoimmune [Rheumatoid arthritis (ICD10: M05), Graves' disease (ICD10: E050), type I diabetes mellitus (ICD10: E10)] Allergy [asthma (ICD10: J45), Atopic dermatitis (ICD10: L20), Pollinosis (ICD10: J301)] Controls: non-autoimmune + non-allergy individuals (There is overlap among patients in each disease category)
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	BBJ: Illumina [HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip] UK Biobank: Applied Biosystems [UK BiLEVE Axiom Array, UK Biobank Axiom Array]
Library Source	DNAs extracted from peripheral blood cells
Cell Lines	-
Reagents (Kit, Version)	BBJ: HumanOmniExpressExome BeadChip, HumanOmniExpress BeadChip, HumanExome BeadChip UK Biobank: UK BiLEVE Axiom Array, UK Biobank Axiom Array
Genotype Call Methods (software)	BBJ: Eagle, Minimac3 UK Biobank: IMPUTE4
Association Analysis (software)	SAIGE software was used with age, sex, and top five principal components as covariates. RE2C software was used for the multi-trait meta-analysis adjusting for sample overlap between GWAS summary data.
Filtering Methods	We excluded the variants with Rsq < 0.7 and MAF < 0.005.
Marker Number (after QC)	BBJ: 8,374,220 autosomal variants for individual trait / 8,369,174 autosomal variants for meta-analysis UKB: 10,864,380 autosomal variants for individual trait / 10,858,065 autosomal variants for meta-analysis BBJ + UK Biobank: 5,965,154 autosomal variants for meta-analysis
NBDC Data Set ID	hum0197.v10.gwas.v1 (Click the Data Set ID to download the files) Dictionary file
Total Data Volume	BBJ: ~ 760MB for individual trait / ~ 430MB for multi-trait meta-analysis UK Biobank: ~ 1.1GB for individual trait / ~ 550MB for multi-trait meta-analysis BBJ+UK Biobank: ~ 310MB for multi-trait meta-analysis
Comments (Policies)	NBDC policy

DATA PROVIDER

Principal Investigator: Yukinori Okada

Affiliation: Department of Statistical Genetics, Osaka University Graduate School of Medicine

Project / Group Name: -

Funds / Grants (Research Project Number):

Name	Title	Project Number
Precursory Research for Innovative Medical care (PRIME), Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED)	Crosstalk among microbiome, host, disease, and drug discovery enhanced by statistical genetics	JP19gm6010001
FORCE, Advanced Research & Development Programs for Medical Innovation, Japan Agency for Medical Research and Development (AMED)	Elucidation of disease-specific microbiota and personalized medicine by metagenome-wide association studies	JP20gm4010006
Practical Research Project for Rare / Intractable Diseases, Japan Agency for Medical Research and Development (AMED)	Biology and in silico drug repositioning of pulmonary alveolar proteinosis using trans-layer omics analysis	JP20ek0109413
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED)	Nucleic genome drug discovery for autoimmune diseases through in-silico and patient-oriented screening utilizing large-scale disease genetics	JP19ek0410041
Practical Research Project for Allergic Diseases and Immunology, Japan Agency for Medical Research and Development (AMED)	Genomic prediction medicine of rheumatoid arthritis based on comprehensive immune-omics resources	JP21ek0410075
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED)	Implementation of genomic prediction medicine based on statistical genetics	JP21km0405211
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED)	Next-generation genomics analyses elucidates biology, personalized medicine, and drug discovery of psoriasis	JP21km0405217
KAKENHI Grant-in-Aid for Scientific Research (A)	Elucidation of disease biology and tissue specificity by trans-layer omics analysis and whole-genome sequencing	19H01021

PUBLICATIONS

	Title	DOI	Data Set ID
1	Metagenome-wide association study of gut microbiome revealed novel aetiology of rheumatoid arthritis in the Japanese population.	doi: 10.1136/annrheumdis-2019-215743	JGAD000290
2	Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis.	doi: 10.1038/s41467-021-21011-y	hum0197.v2.gwas.v1
3	A metagenome-wide association study of gut microbiome in patients with multiple sclerosis revealed novel disease pathology.	doi: 10.3389/fcimb.2020.585973	JGAD000363
4	A global atlas of genetic associations of 220 deep phenotypes	doi: 10.1101/2020.10.23.20213652	hum0197.v3.gwas.v1
5	Metagenome-wide association study revealed disease-specific landscape of the gut microbiome of systemic lupus erythematosus in Japanese	doi: 10.1136/annrheumdis-2021-220687	JGAD000427
6	Whole gut virome analysis of 476 Japanese revealed a link between phage and autoimmune disease	doi: 10.1136/annrheumdis-2021-221267	JGAD000532
7	Insights from complex trait fine-mapping across diverse populations	doi: 10.1101/2021.09.03.21262975	hum0197.v5.gwas.v1 hum0197.v5.finemap.v1
8	Genetic architecture of microRNA expression and its link to complex diseases in the Japanese population.	doi: 10.1093/hmg/ddab361	JGAD000621 hum0197.v6.eqtl.v1
9	Multi-trait and cross-population genome-wide association studies across autoimmune and allergic diseases identify shared and distinct genetic components.	doi: 10.1136/annrheumdis-2022-222460	hum0197.v10.gwas.v1

USERS (Controlled-Access Data)

Principal Investigator	Affiliation	Country/Region	Research Title	Data in Use (Data Set ID)	Period of Data Use
Ilana Brito	Meinig School of Biomedical Engineering, Cornell University	United States of America	Comparative metagenomics of lupus patients' microbiomes	JGAD000290, JGAD000363, JGAD000427, JGAD000532	2022/05/12-2024/05/04