About NBDC Human Database
An enormous amount of human data is being generated with advances in next-generation sequencing and other analytical technologies. We therefore need rules and mechanisms for organizing and storing such data and for effectively utilizing them to make progress in the life sciences.
To promote the sharing and utilization of human data while protecting personal information, the Database Center for Life Science (DBCLS) of the Joint Support-Center for Data Science Research, Research Organization of Information and Systems (ROIS-DS) created a platform for sharing various data generated from human specimens. The data are made publicly accessible in cooperation with the DNA Data Bank of Japan.
You can apply to use or submit human data through this website.
Violators of the guidelines who have not submitted a report on the deletion of Controlled-access data shall be disclosed here.
The National Bioscience Database Center (NBDC), the Japan Science and Technology Agency (JST)
NBDC Data Sharing Subcommittee Office
NBDC Human Data Review Board Office
5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
Tel. +81-3-5214-8491
Fax. +81-3-5214-8470
In the Japanese Genotype-phenotype Archive (JGA), most whole genome sequencing (WGS) data are registered in FASTQ or BAM format because these formats are versatile. Data users therefore have to download the WGS data and process them themselves.
To make the data easier to use, germline WGS data registered in JGA were processed with a defined workflow to generate alignment results (CRAM), per-sample variant calls (gVCF), and per-dataset variant calls (aggregated VCF). The processed data have been registered in JGA and linked to the original data, and data users approved by the NBDC Data Access Committee can also download the processed data.
The germline WGS data were processed using the computational resources of the Personal Genome Analysis Section of the National Institute of Genetics Supercomputer System and the JGA data analysis workflow developed by the Department of NBDC Program, Japan Science and Technology Agency. The analysis workflow is based on the GATK Best Practices workflow "Germline short variant discovery (SNPs + Indels)" and is implemented in the Common Workflow Language (CWL). The source code of the workflow is available on GitHub at https://github.com/ddbj/jga-analysis.
Perform JGA analysis per-sample workflow (version 1.0.0).
Perform JGA analysis QC (version 1.0.0).
Perform JGA analysis multi-samples workflow (version 1.0.0).
Figure: A flow of germline WGS data processing
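The three steps above and the output types named earlier can be summarized as a small lookup. This is only an illustrative sketch: the text does not state which step emits which file type, so the assignment below (CRAM and gVCF from the per-sample workflow, aggregated VCF from the multi-samples workflow) is an assumption, and `STAGES` and `stage_for` are hypothetical names that are not part of the jga-analysis repository.

```python
# Ordered stages of the germline WGS processing, as listed above.
# ASSUMPTION: the stage-to-output assignment is inferred from the text,
# not taken from the jga-analysis source code.
STAGES = [
    ("per-sample workflow", ["CRAM", "gVCF"]),
    ("QC", []),  # the text does not name the QC outputs
    ("multi-samples workflow", ["aggregated VCF"]),
]


def stage_for(output_kind):
    """Return the name of the stage assumed to produce output_kind."""
    for name, outputs in STAGES:
        if output_kind in outputs:
            return name
    raise KeyError(f"no stage produces {output_kind!r}")


if __name__ == "__main__":
    for kind in ("CRAM", "gVCF", "aggregated VCF"):
        print(kind, "<-", stage_for(kind))
```

Keeping the stages in an ordered list rather than a dictionary preserves the processing order shown in the figure.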
For the convenience of data users, the NBDC Human Database provides alignment data, variant call data, and statistical data generated by running a defined workflow on data deposited as controlled-access data (original data). Users who have been permitted by the Human Data Review Board to use the original data may also use the processed data if they wish. The processed data are linked to the original data and are marked "Processed by JGA" in the titles of the Analysis and Dataset. When publishing analysis results that include processed data, please cite the accession number of the original data in the article.
- Whole genome sequencing analysis data (germline)
- Imputation reference panel
* In data processing conducted by the DBCLS and the Bioinformation and DDBJ Center, the data are not used for any purpose other than activities that promote the use of the NBDC Human Database.
JGA dataset | Study title | No. of samples | Date of data processing | Remarks |
---|---|---|---|---|
JGAD000252 | Cancer genomics for elucidation of molecular mechanisms of carcinogenesis and progression in lung cancer (hum0068) | 21 | Per-sample: 2021-12-27, QC: 2022-01-06, Joint-call: 2022-01-26 | This dataset includes whole genome sequencing (WGS) data of tumor and matched control samples. WGS data of the matched control (germline) were processed. Dataset ID of the processed data: JGAD000670 |
JGAD000235 | Genome sequencing analysis for colorectal cancer (hum0159) | 10 | Per-sample: 2021-12-27, QC: 2022-01-06, Joint-call: 2022-01-31 | This dataset includes WGS data of tumor and matched control samples. WGS data of the matched control (germline) were processed. Dataset ID of the processed data: JGAD000689 |
JGAD000234 | Genome sequencing analysis for hepatoblastoma (hum0161) | 33 | Per-sample: 2021-12-27, QC: 2022-01-06, Joint-call: 2022-01-31 | This dataset includes WGS data of tumor and matched control samples. WGS data of the matched control (germline) were processed. Dataset ID of the processed data: JGAD000688 |
JGAD000335 | Collection and transfer of human tumor samples and research using genomic information / Transfer of existing samples and research using genomic information / Research of searching gene mutations in gastrointestinal chronic inflammatory diseases (hum0201) | 14 | Per-sample: 2021-12-27, QC: 2022-01-06, Joint-call: 2022-01-06 | This dataset includes WGS data of tumor and matched control samples. WGS data of the matched control (germline) were processed. Dataset ID of the processed data: JGAD000687 |
JGAD000220 | Bio Bank Japan project (hum0014) | 1,026 | Per-sample: 2022-01-16, QC: 2022-02-24 | Data processing was performed on the WGS data of 1,026 individuals registered to the Biobank Japan Project from FY 2003 to FY 2007. Dataset ID of the processed data: JGAD000690 |
JGAD000220 | Bio Bank Japan project (hum0014) | 1,026 | Joint-call: 2023-01-30 (autosome and chrX PAR regions; BOTHSEX), 2023-02-25 (chrX non-PAR regions; FEMALE samples), 2023-03-02 (chrXY non-PAR regions; MALE samples) | Data processing was performed on the WGS data of 1,026 individuals registered to the Biobank Japan Project from FY 2003 to FY 2007. Joint-call was performed for genotype calls. Dataset ID of the processed data: JGAD000758 |
* To use the processed data, you must submit a data use application for the original data (the dataset IDs of the original data are shown in the leftmost column of the table above).
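Because access to processed data is requested through the original dataset, it can help to look up which original dataset a processed dataset ID belongs to. The sketch below hard-codes the ID pairs from the table above; the names `PROCESSED_BY_ORIGINAL` and `original_for` are hypothetical, and this is not an NBDC or JGA API.

```python
# Original dataset ID -> processed dataset IDs, copied from the table above.
# JGAD000220 has two processed datasets (per-sample/QC and joint-call).
PROCESSED_BY_ORIGINAL = {
    "JGAD000252": ["JGAD000670"],
    "JGAD000235": ["JGAD000689"],
    "JGAD000234": ["JGAD000688"],
    "JGAD000335": ["JGAD000687"],
    "JGAD000220": ["JGAD000690", "JGAD000758"],
}


def original_for(processed_id):
    """Return the original dataset ID to apply for, given a processed dataset ID."""
    for original, processed in PROCESSED_BY_ORIGINAL.items():
        if processed_id in processed:
            return original
    raise KeyError(f"unknown processed dataset: {processed_id}")


if __name__ == "__main__":
    # Both processed Biobank Japan datasets map back to the same original.
    print(original_for("JGAD000690"))
    print(original_for("JGAD000758"))
```

A plain dictionary is enough here because the table is short; a longer list would more naturally be read from a file than typed in by hand.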