NBDC Research ID: hum0495.v2

 

SUMMARY

Aims: Atrial fibrillation (AF) is common in older adults, and AF-associated ischemic stroke can lead to reduced quality of life or a bedridden state. This study will collect multi-layered data—including genetic, clinical, physiological, and electrocardiographic information—from AF patients and healthy controls, and will develop AI-based algorithms to stratify the risks of AF onset and stroke. Our goal is to establish a foundation applicable to primary screening through health checkups and IoT devices, personalized preemptive medicine, and drug discovery or new drug development.

Methods: [hum0495.v1.gwas.v1] We performed whole exome sequencing and processed the sequencing data according to the Genome Analysis Toolkit (GATK) Best Practices. We then performed gene-based association tests: burden tests, the sequence kernel association test (SKAT), and SKAT-O.

                 [JGAS000866] Genotyping was performed using SNP arrays, followed by imputation with the 1000 Genomes reference panel, and GWAS was performed using covariate-adjusted logistic regression.

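Participants/Materials: [hum0495.v1.gwas.v1] 1,176 patients with paroxysmal atrial fibrillation (PAF) and 1,172 non-PAF controls; [JGAS000866] 1,038 PAF patients and 744 non-PAF controls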

 

Dataset ID | Type of Data | Criteria | Release Date
hum0495.v1.gwas.v1 | GWAS for PAF using whole exome sequencing data | Unrestricted-access | 2025/02/13
JGAS000866 | GWAS for PAF | Controlled-access (Type I) | 2026/02/27

*Release Note

* Data users must submit an Application for Using NBDC Human Data to access the Controlled-access Data.

* When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or acknowledge the data in the acknowledgments.

 

MOLECULAR DATA

hum0495.v1.gwas.v1

Participants/Materials

PAF (ICD10: I48.0): 1,176 cases

non-PAF (control) : 1,172 individuals

Targets Exome / Genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [NovaSeq 6000]
Library Source DNA extracted from peripheral blood cells
Cell Lines -
Library Construction (kit name) SureSelectXT Kit
Fragmentation Methods Ultrasonic fragmentation (Covaris)
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 150 bp
Genotype Call Methods (software) GATK HaplotypeCaller
Association Analysis & Meta Analysis (software) Burden test, SKAT, SKAT-O (R package SKAT)
Filtering Methods

Sample QC:

(1) Samples with a call rate < 0.97 were excluded.

(2) Samples with sex mismatches were excluded.

(3) One sample from each pair of second-degree or closer relatives (kinship coefficient > 0.088) was removed.

(4) Samples with outliers in sample size, heterozygosity and missing rates were excluded.

Variant QC:

(1) genotype quality >= 20

(2) depth >=10

(3) allele balance

(4) variant call rate >= 0.97

(5) Hardy-Weinberg equilibrium P-value > 1 × 10^-8

(6) PCA

Marker Number (after QC) 518,621
NBDC Dataset ID

hum0495.v1.gwas.v1

(Click the gwas number to download files)

Dictionary file

Total Data Volume 822 KB (tsv)
Comments (Policies) NBDC policy
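The numeric variant QC thresholds listed for hum0495.v1.gwas.v1 can be sketched as a simple filter. This is a minimal illustration, not the pipeline used for the dataset. It assumes that the genotype quality and depth thresholds are applied per genotype and that the variant call rate is then computed over the genotypes that survive; the record does not state this order. Allele balance and PCA are omitted because no thresholds are given for them. All names in the code are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Thresholds taken from the Variant QC list above.
MIN_GQ = 20                    # genotype quality >= 20
MIN_DP = 10                    # depth >= 10
MIN_VARIANT_CALL_RATE = 0.97   # variant call rate >= 0.97
MIN_HWE_P = 1e-8               # Hardy-Weinberg equilibrium P-value > 1e-8

# One genotype call summarised as (GQ, DP); None means no call was made.
Genotype = Optional[Tuple[int, int]]


def genotype_passes(genotype: Genotype) -> bool:
    """Keep a genotype only if it was called with sufficient quality and depth."""
    if genotype is None:
        return False
    gq, dp = genotype
    return gq >= MIN_GQ and dp >= MIN_DP


def variant_passes(genotypes: Sequence[Genotype], hwe_p: float) -> bool:
    """Apply the call-rate and HWE thresholds to one variant.

    Assumption: the call rate is the fraction of samples whose genotype
    passes the GQ and DP thresholds. hwe_p is a precomputed HWE P-value.
    """
    if not genotypes:
        return False
    kept = sum(1 for g in genotypes if genotype_passes(g))
    call_rate = kept / len(genotypes)
    return call_rate >= MIN_VARIANT_CALL_RATE and hwe_p > MIN_HWE_P


# Made-up example: 98 of 100 genotypes pass GQ/DP, so the call rate is 0.98.
example = [(30, 25)] * 98 + [(12, 25), None]
keep_variant = variant_passes(example, hwe_p=0.4)
```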

 

JGAS000866

Participants/Materials

PAF (ICD10: I48.0): 1,038 cases

non-PAF (control) : 744 individuals

Targets Genome wide SNPs
Target Loci for Capture Methods -
Platform Illumina [Infinium Asian Screening Array]
Source DNA extracted from peripheral blood cells
Cell Lines -
Reagents (Kit, Version) Infinium Asian Screening Array-24 v1.0 BeadChip
Genotype Call Methods (software)

genotyping: GenomeStudio

haplotype phasing: SHAPEIT2

imputation: Minimac3

Imputation reference: 1000 Genomes panel

Association Analysis (software) PLINK v1.9
Filtering Methods

Sample QC: We excluded samples with

(1) sample call rate < 0.97

(2) sex mismatches

(3) heterozygosity outside the mean ± 3 SD

(4) relatedness with PI_HAT > 0.185

(5) outlier status relative to the East Asian cluster in principal component analysis with 1000 Genomes Project samples.

Variant QC: We excluded variants with

(1) variant call rate < 0.95

(2) Hardy-Weinberg equilibrium P < 1.0 × 10^-6

(3) minor allele frequency < 0.01

(4) non-autosomal variants

Post-imputation QC: We excluded variants with minor allele frequency < 0.01 and variants with imputation score (Rsq) < 0.3

Marker Number (after QC) 8,094,202 SNVs
Japanese Genotype-phenotype Archive Dataset ID JGAD001009
Total Data Volume 971.9 MB (csv)
Comments (Policies) NBDC policy
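The variant exclusion rules listed for JGAS000866 can be expressed as two small checks, one before and one after imputation. This is a hedged sketch rather than the actual PLINK workflow. It assumes that chromosomes are labelled "1" to "22" as strings and that, after imputation, failing either the frequency or the Rsq threshold is enough to exclude a variant. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: autosomes are labelled "1".."22" (no "chr" prefix).
AUTOSOMES = {str(i) for i in range(1, 23)}


@dataclass
class VariantStats:
    chrom: str        # chromosome label
    call_rate: float  # fraction of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg equilibrium P-value
    maf: float        # minor allele frequency


def pre_imputation_exclusion(v: VariantStats) -> Optional[str]:
    """Return the reason a genotyped variant is excluded, or None if it is kept."""
    if v.chrom not in AUTOSOMES:
        return "non-autosomal"
    if v.call_rate < 0.95:
        return "low call rate"
    if v.hwe_p < 1.0e-6:
        return "HWE deviation"
    if v.maf < 0.01:
        return "low MAF"
    return None


def post_imputation_exclusion(maf: float, rsq: float) -> Optional[str]:
    """Return the reason an imputed variant is excluded, or None if it is kept.

    Assumption: either criterion alone excludes the variant.
    """
    if maf < 0.01:
        return "low MAF"
    if rsq < 0.3:
        return "low imputation quality"
    return None


# Made-up examples of each outcome.
kept = pre_imputation_exclusion(VariantStats("1", 0.99, 0.5, 0.20))
sex_linked = pre_imputation_exclusion(VariantStats("X", 0.99, 0.5, 0.20))
poorly_imputed = post_imputation_exclusion(maf=0.05, rsq=0.2)
```

Returning a reason string instead of a boolean makes it easy to tally how many variants each rule removes.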

 

DATA PROVIDER

Principal Investigator: Toshihiro Tanaka

Affiliation: Department of Human Genetics and Disease Diversity, Institute of Science Tokyo

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Project for Medical Device and Healthcare, Japan Agency for Medical Research and Development (AMED) | Establishment of intelligent infrastructure of prevention and detection of atrial fibrillation | JP21he2102002

 

PUBLICATIONS

No. | Title | DOI | Dataset ID
1 | Rare genetic variants involved in increased risk of paroxysmal atrial fibrillation in a Japanese population | doi: 10.1038/s41598-025-97794-7 | hum0495.v1.gwas.v1

 

USERS (Controlled-access Data)

Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use