BLAST information

Overall description:

We used BLAST to align the probe sequences on the GWAS arrays to the reference genome.
The uploaded files were generated for the Biobank Japan genotype data (Study ID: JGAS00000000114).
The results can be used to convert genotypes to the forward strand of the reference.

Methodological description:

Version of BLAST: 2.2.26
Version of reference: GRCh37.3

We ran BLAST using the following command:

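# -F F : disable low-complexity filtering of the query
# -i   : query file (TopGenomicSequence from the manifest file)
# -p   : BLAST program (blastn: nucleotide query vs. nucleotide database)
# -m 0 : pairwise alignment output
# -e   : E-value cutoff
# -d   : database (NCBI build 37.3 reference)
# -o   : output file
blastall \
-F F \
-i ${TopGenomicSequence_in_manifest_file} \
-p blastn \
-m 0 \
-e 1e-40 \
-d ${ncbi_build37.3} \
-o ${result}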

We then excluded alignments based on the following criteria:

  1. Number of mismatched bases > 3.
  2. Number of insertions or deletions between the probe and reference sequences > 3.
  3. Difference in length between the probe and aligned sequences > 3.
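
The three criteria above can be expressed as a single predicate. This is a minimal sketch, not the pipeline that produced the uploaded files: the function name and its parameters are hypothetical, the counts are assumed to have been parsed from the BLAST output already, and it assumes that an alignment is excluded when it meets any one criterion and that the length difference is taken as an absolute value.

```python
def is_excluded(mismatches: int, gaps: int, probe_len: int, aligned_len: int,
                threshold: int = 3) -> bool:
    """Return True if an alignment fails at least one exclusion criterion.

    All names here are illustrative; `threshold` defaults to the cutoff of 3
    given in the criteria list.
    """
    too_many_mismatches = mismatches > threshold          # criterion 1
    too_many_gaps = gaps > threshold                      # criterion 2
    # Assumption: "difference in length" is treated as an absolute difference.
    length_differs = abs(probe_len - aligned_len) > threshold  # criterion 3
    return too_many_mismatches or too_many_gaps or length_differs
```

Because every comparison is strict, an alignment with exactly 3 mismatches, 3 gaps, and a 3-base length difference is retained.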

Uploaded files:

File name                                               No. of variants
HumanOmniExpressExome-8v1_A_FullBlastInformation.txt    951,116
HumanOmniExpressExome-8v1-2_A_FullBlastInformation.txt  964,193
HumanOmniExpress-12v1_J_FullBlastInformation.txt        731,442
HumanExome-12v1_A_FullBlastInformation.txt              247,870
HumanExome-12v1-1_A_FullBlastInformation.txt            242,901

Columns:

No.  Column          Description
1    #ID             Variant ID in the Illumina manifest file
2    MapType         Mapping status (see the MapType table below)
3    BlastChr        Chromosome estimated by BLAST
4    BlastPos        Chromosomal position estimated by BLAST
5    IlmnAllele1     Allele1 in the Illumina manifest file (shown as ‘A’ in FinalReport)
6    IlmnAllele2     Allele2 in the Illumina manifest file (shown as ‘B’ in FinalReport)
7    ForwardAllele1  Allele1 estimated by BLAST (forward strand of the reference, shown as ‘A’ in FinalReport)
8    ForwardAllele2  Allele2 estimated by BLAST (forward strand of the reference, shown as ‘B’ in FinalReport)
9    NewID           Variant ID in the source database (1000 Genomes Project phase 3 or dbSNP144)
10   Source          Source of NewID (1000 Genomes Project phase 3 or dbSNP144)
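
The columns above are enough to translate the ‘A’/‘B’ calls in a FinalReport into forward-strand alleles. The following is a hedged sketch only: it assumes the files are tab-delimited with the header row shown in the column table, the function names are hypothetical, and the example rows are made up for illustration rather than taken from any uploaded file. Restricting the lookup to singlemap variants is a design choice of this sketch, not something this document prescribes.

```python
import csv
import io


def load_forward_alleles(handle):
    """Map variant ID -> (ForwardAllele1, ForwardAllele2) for singlemap rows.

    Assumption: the file is tab-delimited and its first line is the header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        if row["MapType"] == "singlemap":
            table[row["#ID"]] = (row["ForwardAllele1"], row["ForwardAllele2"])
    return table


def to_forward(genotype_ab, alleles):
    """Translate an A/B genotype such as 'AB' into forward-strand alleles."""
    lookup = {"A": alleles[0], "B": alleles[1]}
    return tuple(lookup[call] for call in genotype_ab)


# Made-up rows for illustration. The placeholder values in the multimap row
# are hypothetical; how such rows are encoded is not described here.
EXAMPLE = (
    "#ID\tMapType\tBlastChr\tBlastPos\tIlmnAllele1\tIlmnAllele2\t"
    "ForwardAllele1\tForwardAllele2\tNewID\tSource\n"
    "var1\tsinglemap\t1\t12345\tA\tG\tT\tC\tvar1_new\tdbSNP144\n"
    "var2\tmultimap\t.\t.\tA\tC\t.\t.\t.\t.\n"
)

forward = load_forward_alleles(io.StringIO(EXAMPLE))
print(to_forward("AB", forward["var1"]))
```

With a real file, `io.StringIO(EXAMPLE)` would be replaced by an open file handle, for example `open(path, newline="")`.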

Categorization of mapping status (MapType):

We categorized the BLAST result for each probe as follows:

MapType    Definition
unmap      Probe sequence was not aligned to the reference sequence
multimap   Probe sequence was aligned to the reference sequence twice or more
singlemap  Probe sequence was uniquely aligned to the reference sequence
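
The categorization can be sketched as a function of the number of alignments found for a probe. The function name and its input are hypothetical, and this document does not state whether the count is taken before or after the exclusion criteria are applied, so the sketch leaves that choice to the caller.

```python
def map_type(n_alignments: int) -> str:
    """Return the MapType label for a probe given its number of alignments."""
    if n_alignments < 0:
        raise ValueError("n_alignments must be non-negative")
    if n_alignments == 0:
        return "unmap"       # no alignment to the reference
    if n_alignments == 1:
        return "singlemap"   # exactly one alignment
    return "multimap"        # two or more alignments
```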

Links:

Japan Genotype-phenotype Archive (JGA)
National Bioscience Database Center (NBDC)
The Biobank Japan
RIKEN Center for Integrative Medical Sciences
BLAST