NBDC Research ID: hum0248.v2
SUMMARY
Aims: To construct a Japanese reference genome sequence
Methods: Sequence data were obtained from each of three healthy Japanese males using the PacBio, Bionano, and Illumina HiSeq platforms. De novo assembly was performed for each individual, and the three assemblies were then integrated by meta-assembly. At sites that were polymorphic among the three assemblies, the variant supported by the majority of the assemblies was adopted (majority vote). Finally, the meta-scaffolds were anchored using markers from genetic and radiation hybrid maps and integrated into a set of pseudo-chromosome sequences.
Participants/Materials: three Japanese male individuals
URL: https://www.megabank.tohoku.ac.jp/english/timeline/20190225_01/
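The majority-vote step described in the Methods can be illustrated with a minimal Python sketch. It is not the pipeline used to build JG1. The site names, the alleles, and the decision to return `None` on a three-way tie are all assumptions made for illustration.

```python
from collections import Counter

def majority_allele(alleles):
    """Return the allele carried by most assemblies, or None if there is a tie.

    `alleles` holds one allele per assembly (three in this example).
    Returning None on a tie is an assumption; the source does not say
    how ties were resolved.
    """
    if not alleles:
        raise ValueError("at least one allele is required")
    ranked = Counter(alleles).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single allele has the most support
    return ranked[0][0]

# Hypothetical polymorphic sites: one allele per assembly (jg1a, jg1b, jg1c).
sites = {
    "site1": ("A", "A", "G"),
    "site2": ("C", "T", "T"),
    "site3": ("A", "C", "G"),
}

consensus = {name: majority_allele(alleles) for name, alleles in sites.items()}
print(consensus)
```

With three assemblies, a site has a clear majority whenever at least two of them agree, so only a three-way disagreement is left unresolved by this sketch.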
Dataset ID | Type of Data | Criteria | Release Date |
---|---|---|---|
AP023461-AP024084 | Japanese reference genome sequence | Unrestricted-access | 2020/10/30 |
JGAS000259 | Japanese reference genome sequence | Controlled-access (Type II) | 2020/11/06 |
*To access the controlled-access data, data users must submit an application for using NBDC Human Data.
*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.
MOLECULAR DATA
AP023461-AP024084 / JGAS000259
Participants/Materials | 3 Japanese male individuals |
Targets | WGS |
Target Loci for Capture Methods | - |
Platform | PacBio [RS II], Bionano [Irys, Saphyr], Illumina [HiSeq 2500] |
Library Source | gDNA extracted from peripheral blood cells |
Cell Lines | - |
Library Construction (kit name) | PacBio: DNA template prep kit 2.0. Bionano: DNA isolation in gel plugs, treatment with Proteinase K and RNase, plug solubilization with GELase, then nick-label-repair with Nt.BspQI and Nb.BssSI for jg1a, or direct labeling and staining for jg1b and jg1c. Illumina: TruSeq DNA PCR-Free HT sample prep kit |
Fragmentation Methods | Illumina: Ultrasonic fragmentation (Covaris) |
Spot Type | Illumina: Paired-end, Mate pair |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | PacBio: 10 kb (mean: jg1a 10,589 bp; jg1b 10,066 bp; jg1c 9,226 bp). Bionano: >146 kb (mean: jg1a.BspQI 318,216 bp; jg1a.BssSI 228,101 bp; jg1b.DLS 169,138 bp; jg1c.DLS 146,026 bp). Illumina: 162 or 259 bp |
QC Methods | PacBio: QC with Falcon software (length_cutoff = 9000, length_cutoff_pr = 15000). Bionano: QC with BionanoSolve software with default settings. Illumina: NA |
Genome Sequence Construction Methods | 1. Highly contiguous de novo assembly: (1) PacBio long reads were de novo assembled to yield primary contigs; (2) Bionano raw data were also de novo assembled (independently of the PacBio assembly) to yield genome maps; (3) the PacBio-derived contigs were scaffolded with the Bionano genome maps. 2. Polishing of the hybrid scaffolds with Illumina short reads (paired-end). 3. Integration and gap filling of the hybrid scaffolds of each individual with the aid of Illumina mate-pair short reads. 4. Meta-assembly with Metassembler software. 5. Anchoring of scaffolds to chromosomes with genetic and radiation hybrid maps. |
Coverage (Depth) | PacBio: >122× (jg1a 122×; jg1b 123×; jg1c 128×). Bionano: >123× (jg1a.BspQI 123×; jg1a.BssSI 140×; jg1b 160×; jg1c 175×). Illumina paired-end: >26× (jg1a.162PE 29×; jg1a.259PE 26×; jg1b.162PE 31×; jg1b.259PE 28×; jg1c.162PE 31×; jg1c.259PE 26×). Illumina mate-pair: >12× (jg1a 13×; jg1b 12×; jg1c 12×) |
Variation Detection Methods | SNVs between hs37d5 and JG1 in the autosomes and X chromosome were called using minimap2 and paftools software |
Single Nucleotide Variants Number | 2,501,575 SNVs |
Structural Variants Detection Methods | Genome-to-genome alignment between GRCh38 and JG1 with minimap2 and paftools software |
Structural Variants Number | 8,697 insertions and 6,190 deletions >50 bp in length |
Mass Submission System ID | AP023461-AP024084 |
Japanese Genotype-phenotype Archive Dataset ID | JGAD000362 |
Total Data Volume | AP023461-AP024084: 821 MB (fasta). JGAD000362: 2.4 TB (fasta, fastq, bnx) |
Comments (Policies) | AP023461-AP024084: NBDC policy. JGAD000362: NBDC policy & hum0184 policy |
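The table above reports SNVs separately from insertions and deletions longer than 50 bp. As a rough illustration of how called variants could be sorted into those classes, the following Python sketch classifies a variant from its reference and alternate alleles. It does not reproduce paftools output or the authors' procedure. The function name, the VCF-style allele representation, and the example records are assumptions.

```python
from collections import Counter

def classify_variant(ref, alt, min_sv_len=50):
    """Classify a variant from VCF-style REF/ALT allele strings (assumed input).

    Returns "SNV" for a single-base substitution, "insertion" or "deletion"
    when the length difference exceeds `min_sv_len` bp (the >50 bp threshold
    quoted in the table), and "other" for everything else, such as short indels.
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    length_change = len(alt) - len(ref)
    if length_change > min_sv_len:
        return "insertion"
    if length_change < -min_sv_len:
        return "deletion"
    return "other"

# Hypothetical records as (REF, ALT) pairs; real input would come from a variant file.
records = [
    ("A", "G"),             # single-base substitution
    ("A", "A" + "T" * 60),  # 60 bp inserted
    ("A" + "C" * 75, "A"),  # 75 bp deleted
    ("AT", "A"),            # 1 bp deletion, below the threshold
]

tally = Counter(classify_variant(ref, alt) for ref, alt in records)
print(dict(tally))
```

The threshold is strict, so an indel of exactly 50 bp falls into "other", matching the ">50 bp" wording in the table.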
DATA PROVIDER
Principal Investigator: Masayuki Yamamoto
Affiliation: Tohoku University School of Medicine, Tohoku Medical Megabank Organization
Project / Group Name: JRGA (Japanese Reference Genome Assembly)
URL: jMorp: https://jmorp.megabank.tohoku.ac.jp/
Funds / Grants (Research Project Number):
Name | Title | Project Number |
---|---|---|
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) Special account of the Great East Japan Earthquake disaster recovery | JP20km0105001 |
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) General accounting | JP20km0105002 |
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Facilitation of R&D Platform for AMED Genome Medicine Support | JP20km0405001 |
KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area) | Constructive understanding of multi-scale dynamism of neuropsychiatric disorders | JP19H05200 |
KAKENHI Grant-in-Aid for Scientific Research (C) | NGS analysis of a large genome cohort by deep learning Research Project | JP19K06625 |
PUBLICATIONS
 | Title | DOI | Dataset ID |
---|---|---|---|
1 | Construction and Integration of Three De Novo Japanese Human Genome Assemblies toward a Population-Specific Reference | doi:10.1101/861658 | AP023461-AP024084, JGAD000362 |
2 | | | |
USERS (Controlled-access Data)
Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use |
---|---|---|---|---|---|