NBDC Research ID: hum0184.v2
SUMMARY
Aims: Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Tohoku Medical Megabank Organization (IMM) were founded to establish an advanced medical system that supports reconstruction after the Great East Japan Earthquake. These organizations are developing a biobank that includes medical and genome information to support health and welfare in the Tohoku area. In the first stage, part of our mission was to sequence 4,000 individuals to construct a Japanese whole-genome reference panel.
Methods: Whole genome sequencing
Participants/Materials: 4,566 Japanese general residents
URL: https://jmorp.megabank.tohoku.ac.jp/
Dataset ID | Type of Data | Criteria | Release Date |
---|---|---|---|
JGAS000239 | NGS (WGS) | Controlled-access (Type II) | 2020/09/01 |
JGAS000239 (Dataset addition) | bam/gvcf data of NGS(WGS) | Controlled-access (Type I) | 2022/02/18 |
*Data users need to submit an application for Using NBDC Human Data to access the Controlled-access Data.
MOLECULAR DATA
Participants/Materials: | 4,566 Japanese general residents |
Targets | WGS |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500, NovaSeq 6000] |
Library Source | DNA extracted from peripheral blood cells |
Cell Lines | - |
Library Construction (kit name) | TruSeq DNA PCR-Free Library Prep Kit |
Fragmentation Methods | Ultrasonic fragmentation (Covaris LE220) |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | HiSeq 2500: 162 bp / 259 bp; NovaSeq 6000: 150 bp |
Japanese Genotype-phenotype Archive Dataset ID | |
Total Data Volume | JGAD000338: 260 TB (fastq); JGAD000339: 230 TB (bam, gvcf, vcf [ref: GRCh37/hg19 (hs37d5)]) |
Comments (Policies) |
Participants/Materials: | 4,566 Japanese general residents |
Targets | WGS |
Source | fastq files of JGAD000338 |
QC |
Data with bad base quality and high %GC content were removed.
Alignment: Data matching any of the following conditions were removed.
- Low mapping rate
- Different insert size
- Gender information mismatch between metadata and genotype data
- Suspected sex chromosome aberration
Genotyping: GATK's best practices include a variant filtering step following Variant Quality Score Recalibration (VQSR).
- DP/GQ (DP < 5, GQ < 20, DP > 60, GQ < 95)
- Heterozygosity (F >= 0.05)
- Hardy-Weinberg equilibrium (p < 10^-6)
- Repeat & Low Complexity
Principal Component Analysis (PCA): PCA was performed together with individuals included in the 1000 Genomes Project, and outliers from the Japanese cluster were removed.
After these filtering steps, variants located in the regions listed as the HighConfidenceRegion (Genome-In-A-Bottle project) were flagged. |
Deduplication | Picard 2.10.6 |
Calibration for re-alignment and base quality | GATK 3.7 |
Mapping Methods | BWA mem 0.7.12 |
Mapping Quality | Reads with MAPQ < 20 were excluded at variant calling with GATK 3.7 HaplotypeCaller |
Reference Genome Sequence | GRCh37/hg19 (hs37d5) |
Coverage (Depth) | HiSeq 2500: 31.8x, NovaSeq 6000: 28.0x |
Detecting Methods for Variation | GATK 3.7 HaplotypeCaller |
SNV Numbers (after QC) | 76,768,387 (Autosomal Chromosomes); 2,898,518 (X Chromosome) |
INDEL Numbers (after QC) | 10,202,908 (Autosomal Chromosomes); 410,435 (X Chromosome) |
Japanese Genotype-phenotype Archive Dataset ID | JGAD000625: Whole genome sequencing analyzed data included in the JGAD000117 were mapped to the GRCh37 reference genome sequence, and variant detection was carried out using the GATK (Genome Analysis Toolkit) standards. This project is an initiative of the GEnome Medical alliance Japan (GEM Japan, GEM-J). |
Total Data Volume | 230 TB (bam, vcf) |
Comments (Policies) |
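The Hardy-Weinberg equilibrium criterion in the QC row (p < 10^-6) can be illustrated with a minimal Python sketch. The record does not state which test was used, so the 1-degree-of-freedom chi-square test below is an assumption, as is the reading that variants with a p-value under the threshold are the ones removed. The function names `hwe_chi2_pvalue` and `passes_hwe` are hypothetical and are not part of the ToMMo pipeline.

```python
import math

# Threshold taken from the QC row; the choice of test is an assumption.
HWE_P_THRESHOLD = 1e-6


def hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt):
    """P-value of a 1-d.f. chi-square test for Hardy-Weinberg equilibrium.

    Arguments are genotype counts at one biallelic site.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0  # no called genotypes: nothing to test
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no deviation can be measured
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_hom_ref, n_het, n_hom_alt, threshold=HWE_P_THRESHOLD):
    """Return True if the site is kept (p-value not below the threshold)."""
    return hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt) >= threshold


if __name__ == "__main__":
    print(passes_hwe(25, 50, 25))   # genotype counts in exact equilibrium
    print(passes_hwe(500, 0, 500))  # no heterozygotes despite p = q = 0.5
```

An exact test is often preferred for rare variants, where the chi-square approximation is poor; the sketch uses chi-square only because it needs nothing beyond the standard library.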
DATA PROVIDER
Principal Investigator: Masayuki Yamamoto
Affiliation: Tohoku Medical Megabank Organization
Project / Group Name: Tohoku Medical Megabank Project
URL: https://www.megabank.tohoku.ac.jp/english/
Funds / Grants (Research Project Number):
Name | Title | Project Number |
---|---|---|
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) Special Account of the Great East Japan Earthquake Disaster Recovery | JP20km0105001 |
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) General Accounting | JP20km0105002 |
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Iwate Medical University) Special Account of the Great East Japan Earthquake Disaster Recovery | JP20km0105003 |
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Iwate Medical University) General Accounting | JP20km0105004 |
PUBLICATIONS
No. | Title | DOI | Dataset ID |
---|---|---|---|
1 | 3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome | doi: 10.1038/s41439-019-0059-5 | hum0015.v3.3.5kjpnv2.v1 |
USERS (Controlled-access Data)
Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use |
---|---|---|---|---|---|