NBDC Research ID: hum0364.v1

 

SUMMARY

Aims: Next-generation sequencing (NGS) technologies enable whole genome sequencing (WGS) analysis. In the past decade, applications of NGS and the development of analysis algorithms have successfully identified various types of genetic variation. However, due to the short read length and the high sequencing error rate in repeat regions, identifying mutations and polymorphisms in microsatellite (MS) regions has been difficult. In this study, we analyzed approximately nine million MS regions using a previously developed MS caller (MIVcall) in three large publicly available human genome sequencing data sets, including the Simons Genome Diversity Project (SGDP) and the Human Genome Diversity Project (HGDP).

Methods: MS regions were detected using the MIVcall method.

Participants/Materials: Publicly available whole genome sequencing data of healthy individuals were downloaded and used for MS analysis. After evaluating the quality of the data, we analyzed 276 samples from SGDP, 693 samples from HGDP, and one Jomon individual (F23).

Dataset ID Type of Data Criteria Release Date
hum0364.v1.wgs-ms.v1 NGS (WGS) Unrestricted-access 2022/08/29

*Release Note

When research results that include data downloaded from NHA/DRA/JGA are published or presented, the data user must cite the papers related to the data or mention the data in the acknowledgments.

 

MOLECULAR DATA

hum0364.v1.wgs-ms.v1

Participants/Materials

Publicly available WGS data of healthy individuals

   SGDP: 276 samples

   HGDP: 693 samples

   Jomon people: 1 sample

Targets WGS
Target Loci for Capture Methods -
Platform -
Library Source Publicly available WGS data: SGDP, HGDP, Jomon
Cell Lines -
Library Construction (kit name) -
Fragmentation Methods -
Spot Type -
Read Length (without Barcodes, Adaptors, Primers, and Linkers) -
QC/Filtering Methods MS covered by fewer than ten reads were considered to have insufficient depth, and samples were excluded if more than 4% of their MS had insufficient depth.
Deduplication BAM files generated by the SGDP and HGDP projects were used in this study.
Mapping Methods BAM files generated by the SGDP and HGDP projects were used in this study.
Reference Genome Sequence GRCh37
Coverage (Depth) The average number of reads covering each MS was 31.7 in HGDP and SGDP combined and 27.1 in F23.
Detecting Methods for Variation MIVcall method
Detecting Methods for Structural Variation -
SNV Numbers (after QC) -
SV Numbers (after QC) 733900 (Polymorphic MS in the autosome)
NBDC Dataset ID

hum0364.v1.wgs-ms.v1

(Click the Dataset ID to download the file)

README

Total Data Volume 9.37 GB (vcf)
Comments (Policies) NBDC policy
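
The sample-level rule in the QC/Filtering Methods row above can be illustrated with a minimal Python sketch. Only the two thresholds (fewer than ten reads, more than 4% of MS) come from this record. The function names, the input layout (a list of per-MS read depths for each sample), and the toy data are hypothetical and are not part of MIVcall or of the distributed files.

```python
from typing import Dict, List

MIN_READS = 10                 # from the record: fewer than ten reads = insufficient depth
MAX_LOW_DEPTH_FRACTION = 0.04  # from the record: exclude samples above 4%


def low_depth_fraction(depths: List[int], min_reads: int = MIN_READS) -> float:
    """Return the fraction of MS loci covered by fewer than min_reads reads."""
    if not depths:
        raise ValueError("no MS depths supplied")
    low = sum(1 for d in depths if d < min_reads)
    return low / len(depths)


def passing_samples(depths_by_sample: Dict[str, List[int]]) -> List[str]:
    """Keep samples whose low-depth fraction does not exceed the threshold."""
    return [
        sample
        for sample, depths in depths_by_sample.items()
        if low_depth_fraction(depths) <= MAX_LOW_DEPTH_FRACTION
    ]


# Hypothetical toy input: read depths at 100 MS loci per sample.
toy = {
    "sample_A": [30] * 97 + [5] * 3,   # 3% of MS below ten reads: kept
    "sample_B": [30] * 90 + [5] * 10,  # 10% of MS below ten reads: excluded
}
print(passing_samples(toy))  # prints ['sample_A']
```

In this sketch a sample with exactly 4% low-depth MS is kept, since the record excludes samples only when more than 4% of MS have insufficient depth.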

 

DATA PROVIDER

Principal Investigator: Akihiro Fujimoto

Affiliation: School of Integrated Health Sciences, Faculty of Medicine, The University of Tokyo

Project / Group Name: -

URL: http://www.humgenet.m.u-tokyo.ac.jp/index.en.html

Funds / Grants (Research Project Number):

Name Title Project Number
KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area) Deciphering Origin and Establishment of Japonesians mainly based on genome sequence data 18H05511
Grant-in-Aid for Scientific Research on Innovative Areas from Japan Society for the Promotion of Science (JSPS) Comprehensive analysis of microsatellite polymorphism in the human population 18H02680
Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) Development of advanced data analysis methods for genome sequencing JP20km0405207

 

PUBLICATIONS

Title DOI Dataset ID
1