Aims: To build a database of genomic structural variants in the Japanese population

Methods: We sequenced genomic DNA using PacBio, 10x Genomics, and Nanopore sequencing technologies, and analyzed genomic structural variants.

Participants/Materials: Japanese (collected by Japanese B cell DNA bank)


Dataset ID | Type of Data | Criteria | Release Date
JGAS000173 | NGS (WGS): Sequence raw data, Structural Variants data for each sample | Controlled-access (Type I) | 2020/10/06
JGAS000173 (Data addition) | NGS (WGS) | Controlled-access (Type I) | 2020/11/27
JGAS000580 | NGS (WGS) | Controlled-access (Type I) | 2023/06/29
JGAS000286 | NGS (WGS): Sequence raw data, Structural Variants data for each sample | Controlled-access (Type I) | 2023/07/06

Participants/Materials: Purified DNA from Japanese-origin B cell lines (10 samples)
Targets: WGS
Target Loci for Capture Methods: -
Platform:

1. PacBio [Sequel]

2. 10x Genomics [Chromium Controller]

In each numbered list below, item 1 refers to the PacBio data and item 2 to the 10x Genomics data.

Library Source: Purified DNA from Japanese-origin B cell lines
Cell Lines: the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name):

1. the library prep. kit for SMRT sequencing by Pacific Biosciences

2. 10x Genomics Chromium system

Fragmentation Methods:

1. Megaruptor, g-tube

2. None

Spot Type:

1. Single-end

2. Paired-end

Read Length (without Barcodes, Adaptors, Primers, and Linkers):

1. 14000 bp

2. 151 bp

QC Methods:

1. Qubit, Pulsed-field gel electrophoresis, TapeStation, Bioanalyzer

2. qPCR, Bioanalyzer

Mapping Methods:

1. minimap2

2. longranger by 10x Genomics

Depth (average):

1. 29x

2. 19x

Structural Variants Detection Methods:

1. Sniffles

2. longranger by 10x Genomics

Polymorphism Number (after QC):

1. 16870/sample

2. 11700/sample

Japanese Genotype-phenotype Archive Dataset ID: JGAD000251
Total Data Volume: 1 TB (fastq, bam [ref: unmapped], bed, vcf [ref: hg38])
Comments (Policies): NBDC policy
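
The per-sample variant counts above are the submitters' own figures. As a minimal sketch of how a comparable tally might be obtained from a structural-variant VCF, the following counts records per `SVTYPE`. The function name `count_sv_types`, the PASS-only filter, and the toy records are assumptions made for illustration; they do not describe the QC procedure actually applied to this dataset.

```python
from collections import Counter
from typing import Iterable


def count_sv_types(vcf_lines: Iterable[str], pass_only: bool = True) -> Counter:
    """Count structural-variant records per SVTYPE in VCF text lines."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        filt, info = fields[6], fields[7]
        if pass_only and filt not in ("PASS", "."):
            continue  # assumed QC rule: keep only unfiltered records
        svtype = "UNKNOWN"
        for entry in info.split(";"):
            if entry.startswith("SVTYPE="):
                svtype = entry.split("=", 1)[1]
                break
        counts[svtype] += 1
    return counts


# Toy records (hypothetical), not taken from the archived data.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=10500",
    "chr1\t20000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=300",
    "chr2\t30000\tsv3\tN\t<DEL>\t.\tLowQual\tSVTYPE=DEL;END=30100",
]
sv_counts = count_sv_types(example)
print(dict(sv_counts), sum(sv_counts.values()))
```

Passing `pass_only=False` keeps filtered records as well, which is one way to compare counts before and after a filter.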


JGAS000173 (Data addition)

Participants/Materials: Purified DNA from Japanese-origin B cell lines (11 samples)
Targets: WGS
Target Loci for Capture Methods: -
Platform: PacBio [Sequel]
Library Source: Purified DNA from Japanese-origin B cell lines
Cell Lines: the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name): the library prep. kit for SMRT sequencing by Pacific Biosciences
Fragmentation Methods: Megaruptor, g-tube
Spot Type: Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers): 14000 bp
Japanese Genotype-phenotype Archive Dataset ID: JGAD000251
Total Data Volume: 3.44 TB (bam)



Participants/Materials: Purified DNA from Japanese-origin B cell lines (1 sample)
Targets: WGS
Target Loci for Capture Methods: MHC, LRC, Chr1, SMN1/SMN2
Platform: Nanopore [PromethION]
Library Source: Purified DNA from Japanese-origin B cell lines
Cell Lines: the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name): Ultra-Long DNA Sequencing Kit (SQK-ULK001)
Fragmentation Methods: Transposase-based
Spot Type: Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers): 56.2 kbp to 63.8 kbp (N50)
Mapping Methods: minimap2 (v2.24) with "-x map-ont"
Mapping Quality: -
Reference Genome Sequence: T2T-CHM13v2.0
Coverage (Depth): 81x to 104x (median)
Japanese Genotype-phenotype Archive Dataset ID: JGAD000706
Total Data Volume: 1.4 GB (bam)
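
The mapping entry above names minimap2 with the `map-ont` preset and the T2T-CHM13v2.0 reference. The sketch below shows how such an invocation might be assembled as an argument list. The helper name `build_minimap2_command`, the file names, the thread count, and the use of `-a` for SAM output are assumptions; the record states only the tool version and the `-x map-ont` option.

```python
import shlex
from typing import List


def build_minimap2_command(reference: str, reads: str,
                           preset: str = "map-ont",
                           threads: int = 8) -> List[str]:
    """Return an argv list that aligns reads to a reference with minimap2."""
    return [
        "minimap2",
        "-a",                # emit alignments in SAM format (assumed choice)
        "-x", preset,        # preset stated in the record: map-ont
        "-t", str(threads),  # thread count is an assumption
        reference,
        reads,
    ]


# Placeholder file names (hypothetical), not paths from the archive.
cmd = build_minimap2_command("chm13v2.0.fa", "reads.fastq.gz")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run` as is. Piping the SAM output into `samtools sort` is a common way to obtain a BAM file, although the record does not say how the archived BAM was produced.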




Participants/Materials: Purified DNA from Japanese-origin B cell lines (177 samples; CCS: 112 samples, CLR: 65 samples)
Targets: WGS
Target Loci for Capture Methods: -
Platform: PacBio [Sequel, Sequel II]
Library Source: Purified DNA from Japanese-origin B cell lines
Cell Lines: the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name): the library prep. kit for SMRT sequencing by Pacific Biosciences
Fragmentation Methods: Megaruptor, g-tube
Spot Type: Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers): 14000 bp
QC Methods: Qubit, Pulsed-field gel electrophoresis, TapeStation, Bioanalyzer
Mapping Methods: minimap2
Depth (average):

CCS: 9.5x

CLR: 36x

SNV Call: DeepVariant
SNV Haplotyping: WhatsHap
Structural Variants Detection Methods: pbsv
Diploid Assembly: HiCanu
Japanese Genotype-phenotype Archive Dataset ID: JGAD000392
Total Data Volume: 31.8 TB (bam, vcf, fasta)
Comments (Policies): NBDC policy
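
Average depth relates the number of sequenced bases to the genome length (depth is roughly total bases divided by genome length). The sketch below applies that relation to the CCS and CLR depths listed above. The genome length of 3.1 Gbp, the function names, and the treatment of 14000 bp as a fixed read length are all assumptions; real read lengths vary, so the results are only order-of-magnitude estimates.

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # rough haploid genome length (assumption)


def mean_depth(total_bases: int, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Average depth = sequenced bases divided by genome length."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return total_bases / genome_bp


def reads_for_depth(depth: float, read_length_bp: int,
                    genome_bp: int = HUMAN_GENOME_BP) -> int:
    """Smallest number of fixed-length reads that reaches the given depth."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(depth * genome_bp / read_length_bp)


# Depths and read length as listed in the record; everything else is assumed.
clr_reads = reads_for_depth(36, 14_000)
ccs_reads = reads_for_depth(9.5, 14_000)
print(clr_reads, ccs_reads)
```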



Principal Investigator: Shinichi Morishita

Affiliation: Graduate School of Frontier Sciences, the University of Tokyo

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Advanced Genome Research and Bioinformatics Study to Facilitate Medical Innovation, Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Informatics for analyzing de novo human genome assemblies | JP16km0405204
Biobank - Construction and Utilization biobank for genomic medicine REalization, Japan Agency for Medical Research and Development (AMED) | Informatics for analyzing de novo human genome assemblies | JP21tm0424219



No. | Title | DOI | Dataset ID
1 | Rapid and ongoing evolution of repetitive sequence structures in human centromeres. | doi: 10.1126/sciadv.abd9230 | JGAD000251
2 | JTK: targeted diploid genome assembler | doi: 10.1093/bioinformatics/btad398 | JGAD000706


