NBDC Research ID: hum0311.v2

SUMMARY

Aims: BioBank Japan (BBJ) is a biobank established at the Institute of Medical Science, the University of Tokyo, to collect clinical information and biological materials (DNA and serum samples). It enrolled about 200 thousand participants with 47 diseases starting in 2003 (BBJ 1st cohort) and 67 thousand participants with 38 diseases starting in 2013 (BBJ 2nd cohort), both in collaboration with 12 medical centers. This project aims to further the utilization of the materials and the clinical and genomic information managed by BBJ, contributing to precision medicine by storing, managing, and providing the materials and data, and by identifying biomarkers associated with disease risk, prognosis, and drug sensitivity.

Methods: Asian Screening Array (ASA-24v1-0_A2)

Imputation results based on TOPMed r2 (GRCh38)

Participants/Materials: 11,716 + 180,882 patients from BBJ 1st cohort and 42,689 patients from BBJ 2nd cohort

URL: https://biobankjp.org/en/index.html

Data Set ID	Type of Data	Criteria	Release Date
JGAS000412	Genotype data for 11,716 patients from BBJ 1st cohort and 42,689 patients from BBJ 2nd cohort	Controlled Access (Type I)	2021/11/30
JGAS000541	Imputation data and index data for 180,882 patients from BBJ 1st cohort	Controlled Access (Type I)	2022/07/28

*Release Note

*Data users need to submit an application for Using NBDC Human Data to access the Controlled-access Data.

MOLECULAR DATA

JGAS000412


Participants/Materials	11,716 patients from BBJ 1st cohort and 42,689 patients from BBJ 2nd cohort ICD10: C34, C15, C16, C18-C21, C22, C25, C23, C24, C61, C50, C53, C54, C56, C81-C86, C90-C93, I63, G45.9, I67.1, G40, J45, A15-A19, J44.9, J84.1-9, I21.0-9, I20.0, I20.1, I20.8, I20.9, R00, I44.0-3, I45.5-6, I47-I49, I50, I70.9, B18.1, B18.2, K74.6, N04, N20-N23, M80-M81, E10, E11, E88.8, E78.0-5, E78.8-9, E05.0, M05-M06, J30.1, L91.0, L20, L51.1-2, L27.0, D25, N80, R56.0, H40, H25-H26, K05, G12.2, I61, C64
Targets	genome wide SNPs
Target Loci for Capture Methods	-
Platform	Illumina [Asian Screening Array (ASA-24v1-0_A2)]
Library Source	DNAs extracted from peripheral blood cells or saliva
Cell Lines	-
Reagents (Kit, Version)	Infinium Asian Screening Array-24 v1.0 BeadChip Kit
Genotype Call Methods (software)	GenomeStudio
Marker Number (after QC)	657,060 SNVs (GRCh38)
Japanese Genotype-phenotype Archive Data set ID	JGAD000529
Total Data Volume	1,020 GB (idat, csv, plink binary)
Comments (Policies)	NBDC policy

JGAS000541


Participants/Materials	180,882 patients from BBJ 1st cohort ICD10: A15-A16, B16-B17.0, B18.0-B18.1, B17.1, B18.2, C15, C16, C18, C22, C23-C24, C25, C33-C34, C50, C53, C54, C56, C61, C81, D25, E05, E10, E78.0-E78.5, G12, G40-G41, H25-H26, H40-H42, I20, I21-I22, I44-I49, I50, I60, I69.0, I63, I69.3, I70, J30, J41-J44, J45-J46, J80-J84, K05, K74.3-K74.6, L00-L99, L20, M05-M06, M80-M82, N04, N20-N23, N80, R00-R9
Targets	genome wide SNVs
Target Loci for Capture Methods	-
Platform	Illumina [HumanOmniExpressExome, HumanOmniExpress, HumanExome]
Library Source	DNAs extracted from peripheral blood cells or saliva
Cell Lines	-
Reagents (Kit, Version)	HumanOmniExpressExome-8, HumanOmniExpress-12, HumanExome-12 kit
Genotype Call Methods (software)	GenomeStudio software; Eagle software (v2.4.1) without a reference panel; Minimac4 software (v1.0.2)
Reference Genome Sequence	TOPMed reference panel (Version R2 on GRCh38)
Filtering Methods	Before imputation, we excluded SNPs using the following criteria: (1) heterozygosity count for each chip < 5; (2) P-value for Hardy–Weinberg equilibrium (HWE) for each chip < 1.0 x 10^-6 (P-values for chrX SNPs were calculated using female samples); (3) genotype concordance rate with whole-genome sequencing (WGS) for 939 samples < 99.5% and its non-reference discordance rate >= 0.5%; (4) the SNP with the lower call rate when two SNPs shared the same position after merging datasets; (5) call rate < 99%. We also excluded samples using the following criteria: (1) call rate < 98%; (2) inferred sex not matching the clinical information; (3) the sample with the lower call rate among duplicates or monozygotic twins in the dataset; (4) outliers from East Asian clusters in principal component analysis with 1KGp3v5 samples.
Marker Number (after QC)	autosomes: 515,587 SNVs (GRCh38) X-chromosome: 11,140 SNVs (GRCh38)
Japanese Genotype-phenotype Archive Data set ID	JGAD000660
Total Data Volume	11.1 TB (vcf, tbi)
Comments (Policies)	NBDC policy
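
The threshold-based criteria in the Filtering Methods row for JGAS000541 can be read as a set of per-SNP and per-sample checks. The following is a minimal sketch of that reading, not the pipeline used by the data provider. The record type, field names, and function names are hypothetical. The sketch treats the WGS criterion literally as requiring both conditions, which is an assumption, and it omits the criteria that need the whole dataset (same-position SNPs, duplicates or twins, and PCA outliers).

```python
from dataclasses import dataclass

# Thresholds quoted from the Filtering Methods row.
MIN_HET_COUNT = 5               # heterozygosity count for each chip
MIN_HWE_P = 1.0e-6              # HWE P-value for each chip
MIN_WGS_CONCORDANCE = 0.995     # genotype concordance with WGS (939 samples)
MAX_NONREF_DISCORDANCE = 0.005  # non-reference discordance rate
MIN_SNP_CALL_RATE = 0.99
MIN_SAMPLE_CALL_RATE = 0.98


@dataclass
class SnpStats:
    """Hypothetical per-SNP, per-chip summary; field names are illustrative."""
    het_count: int
    hwe_p: float  # for chrX SNPs, computed from female samples only
    wgs_concordance: float
    nonref_discordance: float
    call_rate: float


def snp_exclusion_reasons(s: SnpStats) -> list[str]:
    """Return the threshold criteria under which a SNP would be excluded."""
    reasons = []
    if s.het_count < MIN_HET_COUNT:
        reasons.append("het_count")
    if s.hwe_p < MIN_HWE_P:
        reasons.append("hwe")
    # Assumption: "and" in the source means both conditions must hold.
    if (s.wgs_concordance < MIN_WGS_CONCORDANCE
            and s.nonref_discordance >= MAX_NONREF_DISCORDANCE):
        reasons.append("wgs_concordance")
    if s.call_rate < MIN_SNP_CALL_RATE:
        reasons.append("call_rate")
    return reasons


def sample_passes(call_rate: float, inferred_sex: str, clinical_sex: str) -> bool:
    """Apply the two per-sample checks that need no cross-sample information."""
    return call_rate >= MIN_SAMPLE_CALL_RATE and inferred_sex == clinical_sex
```

For example, `snp_exclusion_reasons(SnpStats(120, 0.4, 0.999, 0.001, 0.998))` returns an empty list, so that SNP would be kept under these checks.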

DATA PROVIDER

Principal Investigator: Koichi Matsuda

Affiliation: Graduate School of Frontier Sciences, The University of Tokyo

Project / Group Name: Management of disease-oriented biobank in Japan for utilization

URL: https://biobankjp.org/en/index.html

Funds / Grants (Research Project Number):

Name	Title	Project Number
Biobank - Construction and Utilization biobank for genomic medicine REalization (B-Cure), Japan Agency for Medical Research and Development (AMED)	Management of disease-oriented biobank in Japan for utilization	JP20km0605001
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED)	Phenotype-wide association study of 180,000 Biobank Japan samples using high density imputation of TOPMED reference panel	JP21km0405215

PUBLICATIONS

	Title	DOI	Data Set ID
1
2

USERS (Controlled-Access Data)

Principal Investigator	Affiliation	Research Title	Data in Use (Data Set ID)	Period of Data Use