NBDC Research ID: hum0248.v2

SUMMARY

Aims: To construct a Japanese reference genome sequence

Methods: Sequence data were obtained from each of three healthy Japanese males using the PacBio, Bionano, and Illumina HiSeq platforms. De novo assembly was performed for each individual, and the three assemblies were then integrated by meta-assembly. At sites that were polymorphic among the three assemblies, the allele supported by the majority of assemblies was adopted (majority vote). Finally, the meta-scaffolds were anchored using markers from genetic and radiation hybrid maps and integrated into a pseudo-chromosome sequence set.
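The majority vote mentioned in the Methods can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the function name `majority_allele`, the example sites, and the choice to return `None` when no allele has a strict majority are all assumptions made for illustration, since the record does not say how three-way disagreements were resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_allele(alleles: Sequence[str]) -> Optional[str]:
    """Return the allele carried by a strict majority of assemblies.

    Returns None when no allele has a strict majority (an assumption;
    the record does not describe tie handling).
    """
    allele, count = Counter(alleles).most_common(1)[0]
    return allele if count * 2 > len(alleles) else None


# Hypothetical alleles at three polymorphic sites, listed in the
# order (jg1a, jg1b, jg1c). These values are invented for illustration.
sites = {
    "site1": ("A", "A", "G"),
    "site2": ("C", "T", "T"),
    "site3": ("A", "C", "G"),
}

consensus = {name: majority_allele(alleles) for name, alleles in sites.items()}
print(consensus)
```

With three assemblies, a strict majority means that at least two of them agree on the allele.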

Participants/Materials: three Japanese male individuals

URL: https://www.megabank.tohoku.ac.jp/english/timeline/20190225_01/

Dataset ID	Type of Data	Criteria	Release Date
AP023461-AP024084	Japanese reference genome sequence	Unrestricted-access	2020/10/30
JGAS000259	Japanese reference genome sequence	Controlled-access (Type II)	2020/11/06
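The unrestricted-access Dataset ID in the table above is written as an accession range. The following Python sketch expands such a range into individual accession strings, assuming the range is contiguous and inclusive at both ends (the record does not state this explicitly). The helper name `expand_accession_range` is hypothetical.

```python
import re
from typing import List


def expand_accession_range(span: str) -> List[str]:
    """Expand a range such as 'AP023461-AP024084' into single accessions.

    Assumes both ends share the same letter prefix and digit width,
    and that every number in between is part of the range.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)-([A-Z]+)(\d+)", span)
    if match is None:
        raise ValueError(f"not an accession range: {span!r}")
    prefix, start, end_prefix, end = match.groups()
    if prefix != end_prefix or len(start) != len(end):
        raise ValueError(f"range ends do not match: {span!r}")
    width = len(start)
    # Keep the zero padding of the original accession numbers.
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


accessions = expand_accession_range("AP023461-AP024084")
print(len(accessions), accessions[0], accessions[-1])
```

Under these assumptions the range covers 624 accession numbers.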

*Release Note

*To access the controlled-access data, data users must submit an application for using NBDC Human Data.

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

MOLECULAR DATA

AP023461-AP024084 / JGAS000259


Participants/Materials	3 Japanese male individuals
Targets	WGS
Target Loci for Capture Methods	-
Platform	PacBio [RS II] Bionano [Irys, Saphyr] Illumina [HiSeq 2500]
Library Source	gDNA extracted from peripheral blood cells
Cell Lines	-
Library Construction (kit name)	PacBio: DNA template prep kit 2.0; Bionano: DNA isolated in gel plugs, treated with Proteinase K and RNase, plugs solubilized with GELase, then nick-label-repair with Nt.BspQI and Nb.BssSI for jg1a, or direct labeling and staining for jg1b and jg1c; Illumina: TruSeq DNA PCR-Free HT sample prep kit
Fragmentation Methods	Illumina: Ultrasonic fragmentation (Covaris)
Spot Type	Illumina: Paired-end, Mate pair
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	PacBio: 10 kb (mean: jg1a 10,589 bp; jg1b 10,066 bp; jg1c 9,226 bp); Bionano: >146 kb (mean: jg1a.BspQI 318,216 bp; jg1a.BssSI 228,101 bp; jg1b.DLS 169,138 bp; jg1c.DLS 146,026 bp); Illumina: 162 or 259 bp
QC Methods	PacBio: QC with Falcon software with length_cutoff = 9000, length_cutoff_pr = 15000; Bionano: QC with BionanoSolve software with default settings; Illumina: NA
Genome Sequence Construction Methods	1. Highly contiguous de novo assembly: 1) PacBio long reads were de novo assembled to yield primary contigs; 2) Bionano raw data were also de novo assembled (independently of the PacBio assembly) to yield genome maps; 3) the PacBio-derived contigs were scaffolded by the Bionano genome maps. 2. Polishing the hybrid scaffolds with Illumina short reads (paired-end). 3. Integrating and filling the gaps of the hybrid scaffolds of each individual with the aid of mate-pair Illumina short reads. 4. Meta-assembly with Metassembler software. 5. Anchoring scaffolds to chromosomes with genetic and radiation hybrid maps.
Coverage (Depth)	PacBio: >122× (jg1a 122×; jg1b 123×; jg1c 128×); Bionano: >123× (jg1a.BspQI 123×; jg1a.BssSI 140×; jg1b 160×; jg1c 175×); Illumina paired-end: >26× (jg1a.162PE 29×; jg1a.259PE 26×; jg1b.162PE 31×; jg1b.259PE 28×; jg1c.162PE 31×; jg1c.259PE 26×); Illumina mate-pair: >12× (jg1a 13×; jg1b 12×; jg1c 12×)
Variation Detection Methods	SNVs between hs37d5 and JG1 in the autosomes and X chromosome were called using minimap2 and paftools software
Single Nucleotide Variants Number	2,501,575 SNVs
Structural Variants Detection Methods	Genome-to-genome alignment between GRCh38 and JG1 with minimap2 and paftools software
Structural Variants Number	8,697 insertions and 6,190 deletions >50 bp in length
Mass Submission System ID	AP023461-AP024084
Japanese Genotype-phenotype Archive Dataset ID	JGAD000362
Total Data Volume	AP023461-AP024084: 821 MB (fasta); JGAD000362: 2.4 TB (fasta, fastq, bnx)
Comments (Policies)	AP023461-AP024084: NBDC policy JGAD000362: NBDC policy & hum0184 policy Contact Information of ToMMo Supercomputer:
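The ">50 bp" size criterion given for structural variants in the table above can be sketched as a simple filter on reference and alternate allele lengths. This is an illustrative Python example only, not the paftools implementation: the function name `classify_indel`, the VCF-style (REF, ALT) input, and the example records are assumptions.

```python
from collections import Counter
from typing import Optional, Tuple


def classify_indel(ref: str, alt: str, min_len: int = 50) -> Optional[Tuple[str, int]]:
    """Classify a (REF, ALT) allele pair as a large insertion or deletion.

    Returns (kind, length) when the length difference is strictly
    greater than min_len, otherwise None.
    """
    diff = len(alt) - len(ref)
    if abs(diff) <= min_len:
        return None
    return ("insertion" if diff > 0 else "deletion", abs(diff))


# Hypothetical (REF, ALT) pairs, invented for illustration.
records = [
    ("A", "A" + "T" * 60),   # 60 bp insertion: counted
    ("A" + "C" * 75, "A"),   # 75 bp deletion: counted
    ("A", "AT"),             # 1 bp insertion: too short
    ("A" + "G" * 50, "A"),   # exactly 50 bp: not strictly greater than 50
]

counts = Counter(
    result[0] for result in (classify_indel(r, a) for r, a in records) if result
)
print(counts["insertion"], counts["deletion"])
```

Events of exactly 50 bp are excluded here because the record states the threshold as strictly greater than 50 bp.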

DATA PROVIDER

Principal Investigator: Masayuki Yamamoto

Affiliation: Tohoku University School of Medicine, Tohoku Medical Megabank Organization

Project / Group Name: JRGA (Japanese Reference Genome Assembly)

URL: jMorp: https://jmorp.megabank.tohoku.ac.jp/

Funds / Grants (Research Project Number):

Name	Title	Project Number
Japan Agency for Medical Research and Development (AMED)	Tohoku Medical Megabank Project (Tohoku University) Special account of the Great East Japan Earthquake disaster recovery	JP20km0105001
Japan Agency for Medical Research and Development (AMED)	Tohoku Medical Megabank Project (Tohoku University) General accounting	JP20km0105002
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED)	Facilitation of R&D Platform for AMED Genome Medicine Support	JP20km0405001
KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)	Constructive understanding of multi-scale dynamism of neuropsychiatric disorders	JP19H05200
KAKENHI Grant-in-Aid for Scientific Research (C)	NGS analysis of a large genome cohort by deep learning Research Project	JP19K06625

PUBLICATIONS

	Title	DOI	Dataset ID
1	Construction and Integration of Three De Novo Japanese Human Genome Assemblies toward a Population-Specific Reference	doi:10.1101/861658	AP023461-AP024084 JGAD000362

USERS (Controlled-access Data)

Principal Investigator	Affiliation	Country/Region	Research Title	Data in Use (Dataset ID)	Period of Data Use