NBDC Research ID: hum0174.v4

SUMMARY

Aims: To build a database of genomic structural variants in the Japanese population.

Methods: We sequenced genomic DNA using PacBio, 10x Genomics, and Nanopore sequencing technologies and analyzed genomic structural variants.

Participants/Materials: Japanese individuals (collected by the Japanese B cell DNA bank)

Dataset ID	Type of Data	Criteria	Release Date
JGAS000173	NGS (WGS): Sequence raw data, Structural Variants data for each sample	Controlled-access (Type I)	2020/10/06
JGAS000173 (Data addition)	NGS (WGS)	Controlled-access (Type I)	2020/11/27
JGAS000580	NGS (WGS)	Controlled-access (Type I)	2023/06/29
JGAS000286	NGS (WGS): Sequence raw data, Structural Variants data for each sample	Controlled-access (Type I)	2023/07/06

* Data users must submit an Application for Using NBDC Human Data to access the Controlled-access Data.

MOLECULAR DATA


Participants/Materials:	Purified DNA from Japanese-origin B cell lines: 10 samples
Targets	WGS
Target Loci for Capture Methods	-
Platform	1. PacBio [Sequel] 2. 10x Genomics [Chromium Controller]
Library Source	Purified DNA from Japanese-origin B cell lines
Cell Lines	the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name)	1. Library prep kit for SMRT sequencing (Pacific Biosciences) 2. 10x Genomics Chromium system
Fragmentation Methods	1. Megaruptor, g-tube 2. None
Spot Type	1. Single-end 2. Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	1. 14000 bp 2. 151 bp
QC Methods	1. Qubit, Pulsed-field gel electrophoresis, TapeStation, Bioanalyzer 2. qPCR, Bioanalyzer
Mapping Methods	1. minimap2 2. longranger by 10X Genomics
Depth (average)	1. 29x 2. 19x
Structural Variants Detection Methods	1. Sniffles 2. longranger by 10X Genomics
Polymorphism Number (after QC)	1. 16,870 per sample 2. 11,700 per sample
Japanese Genotype-phenotype Archive Dataset ID	JGAD000251
Total Data Volume	1 TB (fastq, bam [ref: unmapped], bed, vcf [ref: hg38])
Comments (Policies)	NBDC policy
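The per-sample "Polymorphism Number (after QC)" above is a count of structural variant records in each sample's VCF. The minimal sketch below shows one way such a count could be tallied. It assumes that "after QC" means records whose FILTER column is `PASS` and that each record carries an `SVTYPE=` key in INFO. Both are assumptions for illustration and are not the documented QC procedure for this dataset. The function name `count_passing_svs` and the toy records are hypothetical.

```python
from collections import Counter


def count_passing_svs(vcf_lines):
    """Count SV records whose FILTER is PASS, grouped by the SVTYPE INFO key.

    Assumption (hypothetical QC rule): a record "passes QC" when FILTER == "PASS".
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        filt, info = fields[6], fields[7]  # VCF columns 7 (FILTER) and 8 (INFO)
        if filt != "PASS":
            continue
        # INFO is a ';'-separated list of KEY=VALUE entries or bare flags
        info_map = dict(
            item.split("=", 1) if "=" in item else (item, True)
            for item in info.split(";")
        )
        counts[info_map.get("SVTYPE", "UNKNOWN")] += 1
    return counts


# Toy records for illustration only; not taken from this dataset.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10000\tsv1\tN\t<DEL>\t60\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-350",
    "chr1\t20000\tsv2\tN\t<INS>\t60\tPASS\tSVTYPE=INS;SVLEN=500",
    "chr2\t30000\tsv3\tN\t<DEL>\t10\tLowQual\tSVTYPE=DEL;SVLEN=-80",
]
counts = count_passing_svs(example)
print(counts, sum(counts.values()))
```

Here the low-quality deletion is excluded, leaving one deletion and one insertion. The per-sample total is `sum(counts.values())`.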


Participants/Materials:	Purified DNA from Japanese-origin B cell lines: 11 samples
Targets	WGS
Target Loci for Capture Methods	-
Platform	PacBio [Sequel]
Library Source	Purified DNA from Japanese-origin B cell lines
Cell Lines	the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name)	Library prep kit for SMRT sequencing (Pacific Biosciences)
Fragmentation Methods	Megaruptor, g-tube
Spot Type	Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	14000 bp
Japanese Genotype-phenotype Archive Dataset ID	JGAD000251
Total Data Volume	3.44 TB (bam)
Comments (Policies)	NBDC policy


Participants/Materials:	Purified DNA from Japanese-origin B cell lines: 1 sample
Targets	WGS
Target Loci for Capture Methods	MHC, LRC, Chr1, SMN1/SMN2
Platform	Nanopore [PromethION]
Library Source	Purified DNA from Japanese-origin B cell lines
Cell Lines	the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name)	Ultra-Long DNA Sequencing Kit (SQK-ULK001)
Fragmentation Methods	Transposase-based
Spot Type	Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	56.2 kbp to 63.8 kbp (N50)
Mapping Methods	minimap2 (v2.24) with "-x map-ont"
Mapping Quality	-
Reference Genome Sequence	T2T-CHM13v2.0
Coverage (Depth)	81x to 104x (median)
Japanese Genotype-phenotype Archive Dataset ID	JGAD000706
Total Data Volume	1.4 GB (bam)
Comments (Policies)	NBDC policy
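The Nanopore entry above reports read length as an N50 and coverage as a median depth. As a sketch of what these two summary statistics mean, the following Python computes the N50 from a list of read lengths. It also computes the median from per-position depths, such as the values `samtools depth -a` reports for every position. The function names and the toy numbers are hypothetical and are not drawn from this dataset.

```python
import statistics


def n50(read_lengths):
    """Return the N50: the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    lengths = sorted(read_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("read_lengths must be non-empty")


def median_depth(per_position_depths):
    """Median of per-position read depths over the region of interest."""
    return statistics.median(per_position_depths)


# Toy values (kbp and x) for illustration only.
print(n50([10, 20, 30, 40, 50, 70, 80]))      # total 300; 80 + 70 reaches half
print(median_depth([80, 95, 104, 81, 100]))
```

The median is less sensitive than the mean to a few collapsed repeats with very high depth. This may be one reason a median is reported for targeted loci such as MHC or SMN1/SMN2, although the source does not state the rationale.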


Participants/Materials:	Purified DNA from Japanese-origin B cell lines: 177 samples (CCS: 112 samples, CLR: 65 samples)
Targets	WGS
Target Loci for Capture Methods	-
Platform	PacBio [Sequel, Sequel II]
Library Source	Purified DNA from Japanese-origin B cell lines
Cell Lines	the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name)	Library prep kit for SMRT sequencing (Pacific Biosciences)
Fragmentation Methods	Megaruptor, g-tube
Spot Type	Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	14000 bp
QC Methods	Qubit, Pulsed-field gel electrophoresis, TapeStation, Bioanalyzer
Mapping Methods	minimap2
Depth (average)	CCS: 9.5x, CLR: 36x
SNV Call	DeepVariant
SNV Haplotyping	WhatsHap
Structural Variants Detection Methods	pbsv
Diploid Assembly	HiCanu
Japanese Genotype-phenotype Archive Dataset ID	JGAD000392
Total Data Volume	31.8 TB (bam, vcf, fasta)
Comments (Policies)	NBDC policy
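Average depth, as listed above (for example, CLR: 36x), is conventionally the total number of sequenced bases divided by the genome size. The sketch below illustrates that arithmetic. It assumes a round haploid human genome size of 3.1 Gbp, which is a hypothetical figure chosen for illustration and not the reference length used for this dataset. The function names are also hypothetical.

```python
# Hypothetical round figure for the haploid human genome size (bp).
GENOME_SIZE_BP = 3_100_000_000


def average_depth(total_bases, genome_size=GENOME_SIZE_BP):
    """Average depth = total sequenced (or aligned) bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_for_depth(target_depth, genome_size=GENOME_SIZE_BP):
    """Invert the relation: bases needed to reach a target average depth."""
    return target_depth * genome_size


clr_bases = bases_for_depth(36)   # bases implied by 36x under the assumed size
print(clr_bases)                  # 111.6 Gbp
print(average_depth(clr_bases))   # back to 36.0
```

Under the same assumption, 9.5x CCS coverage corresponds to roughly 29.45 Gbp of sequence per sample.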

Principal Investigator: Shinichi Morishita

Affiliation: Graduate School of Frontier Sciences, the University of Tokyo

Project / Group Name: -

Funds / Grants (Research Project Number):

Name	Title	Project Number
Advanced Genome Research and Bioinformatics Study to Facilitate Medical Innovation, Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED)	Informatics for analyzing de novo human genome assemblies	JP16km0405204
Biobank - Construction and Utilization biobank for genomic medicine REalization, Japan Agency for Medical Research and Development (AMED)	Informatics for analyzing de novo human genome assemblies	JP21tm0424219

	Title	DOI	Dataset ID
1	Rapid and ongoing evolution of repetitive sequence structures in human centromeres	doi: 10.1126/sciadv.abd9230	JGAD000251
2	JTK: targeted diploid genome assembler	doi: 10.1093/bioinformatics/btad398	JGAD000706

Principal Investigator	Affiliation	Country/Region	Research Title	Data in Use (Dataset ID)	Period of Data Use