NBDC Research ID: hum0248.v2

 

SUMMARY

Aims: To construct a Japanese reference genome sequence

Methods: Sequence data were obtained from each of three healthy Japanese males using the PacBio, Bionano, and Illumina HiSeq platforms. De novo assembly was performed for each individual, and the three assemblies were then integrated by meta-assembly. At sites that were polymorphic among the three assemblies, the variant carried by the majority of the assemblies was adopted. Finally, the meta-scaffolds were anchored by markers from genetic and radiation hybrid maps and integrated as a pseudo-chromosome sequence set.
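The majority rule described in the Methods can be illustrated with a minimal Python sketch. The function name, the example alleles, and the handling of sites without a majority are assumptions made for illustration; this record does not describe how the actual pipeline resolved such sites.

```python
from collections import Counter


def majority_allele(alleles):
    """Return the allele carried by most assemblies at one polymorphic site.

    `alleles` holds one allele string per assembly (three in this dataset).
    Returns None when no allele has a strict majority, for example when
    all three assemblies differ (an assumption of this sketch).
    """
    allele, count = Counter(alleles).most_common(1)[0]
    return allele if count * 2 > len(alleles) else None


# Hypothetical alleles observed in jg1a, jg1b and jg1c at two sites.
print(majority_allele(["A", "G", "A"]))  # A
print(majority_allele(["A", "G", "T"]))  # None
```

With three assemblies, a strict majority means that at least two of them agree.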

Participants/Materials: three Japanese male individuals

URL: https://www.megabank.tohoku.ac.jp/english/timeline/20190225_01/

 

Dataset ID | Type of Data | Criteria | Release Date
AP023461-AP024084 | Japanese reference genome sequence | Unrestricted-access | 2020/10/30
JGAS000259 | Japanese reference genome sequence | Controlled-access (Type II) | 2020/11/06
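The unrestricted-access entry is given as an accession range. As a small sketch, the range can be expanded into individual accession strings as follows; the function name is hypothetical, and the code assumes that every number between the two endpoints is a valid accession of this dataset, which the range notation suggests but this record does not state explicitly.

```python
import re


def expand_accession_range(spec):
    """Expand a range such as 'AP023461-AP024084' into a list of accessions."""
    first, last = spec.split("-")
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"unsupported accession range: {spec}")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # keep the zero padding of the numeric part
    start, end = int(m_first.group(2)), int(m_last.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("AP023461-AP024084")
print(len(accessions), accessions[0], accessions[-1])  # 624 AP023461 AP024084
```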

*Release Note 

*To access the controlled-access data, data users must submit an application for using NBDC Human Data.

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention the data in the acknowledgments.

 

MOLECULAR DATA

AP023461-AP024084 / JGAS000259

Participants/Materials: 3 Japanese male individuals
Targets: WGS
Target Loci for Capture Methods: -
Platform

PacBio [RS II]

Bionano [Irys, Saphyr]

Illumina [HiSeq 2500]

Library Source: gDNA extracted from peripheral blood cells
Cell Lines: -
Library Construction (kit name)

PacBio: DNA template prep kit 2.0

Bionano: DNA was isolated in gel plugs, treated with Proteinase K and RNase, and released by solubilizing the plugs with GELase. It was then labeled by nick-label-repair with Nt.BspQI and Nb.BssSI for jg1a, or by direct labeling and staining for jg1b and jg1c.

Illumina: TruSeq DNA PCR-Free HT sample prep kit

Fragmentation Methods Illumina: Ultrasonic fragmentation (Covaris)
Spot Type Illumina: Paired-end, Mate pair
Read Length (without Barcodes, Adaptors, Primers, and Linkers)

PacBio: 10 kb

    - jg1a: 10,589 bp (mean)

    - jg1b: 10,066 bp (mean)

    - jg1c: 9,226 bp (mean)

Bionano: >146 kb

    - jg1a.BspQI: 318,216 bp (mean)

    - jg1a.BssSI: 228,101 bp (mean)

    - jg1b.DLS: 169,138 bp (mean)

    - jg1c.DLS: 146,026 bp (mean)

Illumina: 162 or 259 bp

QC Methods

PacBio: QC with Falcon software with length_cutoff = 9000, length_cutoff_pr = 15000

Bionano: QC with BionanoSolve software with default settings

Illumina: NA
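To make the two Falcon parameters above concrete: in Falcon, `length_cutoff` sets the minimum length of raw reads used as seeds for error correction, and `length_cutoff_pr` sets the minimum length of the corrected reads (preads) passed on to assembly. The Python sketch below only mimics that filtering. It is not Falcon code, the read lengths are invented, and treating the cutoff as inclusive is an assumption.

```python
LENGTH_CUTOFF = 9000      # value from the QC settings above
LENGTH_CUTOFF_PR = 15000  # value from the QC settings above


def select_reads(lengths, cutoff):
    """Keep read lengths that reach the cutoff (inclusive bound is an assumption)."""
    return [n for n in lengths if n >= cutoff]


# Hypothetical raw read lengths in bp; seeds are the reads that get error-corrected.
raw_read_lengths = [4200, 9000, 12500, 21000]
seed_reads = select_reads(raw_read_lengths, LENGTH_CUTOFF)

# Hypothetical lengths of error-corrected reads (preads) entering the assembly step.
pread_lengths = [8000, 15000, 19500]
assembly_preads = select_reads(pread_lengths, LENGTH_CUTOFF_PR)

print(seed_reads)       # [9000, 12500, 21000]
print(assembly_preads)  # [15000, 19500]
```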

Genome Sequence Construction Methods

1. Highly contiguous de novo assembly:

  1) PacBio long reads were de novo assembled to yield primary contigs

  2) Bionano raw data were also de novo assembled (independent of the PacBio assembly) to yield genome maps

  3) The PacBio-derived contigs were scaffolded by the Bionano genome maps

2. Polishing the hybrid scaffolds with Illumina short reads (paired-end)

3. Integrating the hybrid scaffolds of each individual and filling their gaps with the aid of mate-pair Illumina short reads

4. Meta-assembly with Metassembler software

5. Anchoring scaffolds to chromosomes with genetic and radiation hybrid maps

Coverage(Depth)

PacBio: >122×

    - jg1a: 122×

    - jg1b: 123×

    - jg1c: 128×

Bionano: >123×

    - jg1a.BspQI: 123×

    - jg1a.BssSI: 140×

    - jg1b: 160×

    - jg1c: 175×

Illumina paired end: >26×

    - jg1a.162PE: 29×

    - jg1a.259PE: 26×

    - jg1b.162PE: 31×

    - jg1b.259PE: 28×

    - jg1c.162PE: 31×

    - jg1c.259PE: 26×

Illumina mate-pair: >12×

    - jg1a: 13×

    - jg1b: 12×

    - jg1c: 12×
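Depth figures such as those above are conventionally the total number of sequenced bases divided by the genome size. The sketch below shows that arithmetic and inverts it to estimate a read count from a depth and a mean read length. The genome size of 3.1 Gb is an assumption (this record does not state the value used), and the function names are hypothetical.

```python
GENOME_SIZE_BP = 3.1e9  # assumed human genome size; not stated in this record


def depth(total_bases, genome_size=GENOME_SIZE_BP):
    """Sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


def reads_needed(target_depth, mean_read_length, genome_size=GENOME_SIZE_BP):
    """Approximate number of reads that yields `target_depth` at a given mean length."""
    return target_depth * genome_size / mean_read_length


# jg1a PacBio: 122x depth with a mean read length of 10,589 bp (values listed above).
jg1a_reads = reads_needed(122, 10_589)
print(f"{jg1a_reads / 1e6:.1f} million reads")
```

Under these assumptions, the jg1a PacBio data would correspond to a few tens of millions of reads; the actual read counts are not given in this record.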

Variation Detection Methods: SNVs between hs37d5 and JG1 in the autosomes and X chromosome were called using minimap2 and paftools software
Single Nucleotide Variants Number: 2,501,575 SNVs
Structural Variants Detection Methods: genome-by-genome alignment with minimap2 and paftools software between GRCh38 and JG1
Structural Variants Number: 8,697 insertions and 6,190 deletions >50 bp in length
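The variant categories counted above can be told apart from the reference and alternate alleles alone. The following Python sketch illustrates that classification. It does not parse real minimap2 or paftools output, the function name and example alleles are hypothetical, and the strict ">50 bp" length threshold is taken from the text above.

```python
def classify_variant(ref, alt, min_sv_len=50):
    """Classify a variant from its reference and alternate allele strings.

    Single-base substitutions are SNVs; length changes of more than
    `min_sv_len` bp are structural insertions or deletions; everything
    else (for example short indels) is reported as "other".
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    diff = len(alt) - len(ref)
    if diff > min_sv_len:
        return "insertion"
    if -diff > min_sv_len:
        return "deletion"
    return "other"


# Hypothetical alleles for illustration only.
print(classify_variant("A", "G"))             # SNV
print(classify_variant("A", "A" + "T" * 51))  # insertion
print(classify_variant("A" + "C" * 60, "A"))  # deletion
print(classify_variant("A", "AT"))            # other
```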
Mass Submission System ID: AP023461-AP024084
Japanese Genotype-phenotype Archive Dataset ID: JGAD000362
Total Data Volume

AP023461-AP024084: 821 MB (fasta)

JGAD000362: 2.4 TB (fasta, fastq, bnx)

Comments (Policies)

AP023461-AP024084: NBDC policy

JGAD000362: NBDC policy & hum0184 policy

Contact Information of ToMMo Supercomputer:

 

DATA PROVIDER

Principal Investigator: Masayuki Yamamoto

Affiliation: Tohoku University School of Medicine, Tohoku Medical Megabank Organization

Project / Group Name: JRGA (Japanese Reference Genome Assembly)

URL: jMorp: https://jmorp.megabank.tohoku.ac.jp/

Funds / Grants (Research Project Number):

Name | Title | Project Number
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) Special account of the Great East Japan Earthquake disaster recovery | JP20km0105001
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) General accounting | JP20km0105002
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Facilitation of R&D Platform for AMED Genome Medicine Support | JP20km0405001
KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area) | Constructive understanding of multi-scale dynamism of neuropsychiatric disorders | JP19H05200
KAKENHI Grant-in-Aid for Scientific Research (C) | NGS analysis of a large genome cohort by deep learning Research Project | JP19K06625

 

PUBLICATIONS

No. | Title | DOI | Dataset ID
1 | Construction and Integration of Three De Novo Japanese Human Genome Assemblies toward a Population-Specific Reference | doi:10.1101/861658 | AP023461-AP024084, JGAD000362


 

USERS (Controlled-access Data)

Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use