NBDC Research ID: hum0214.v4


 

SUMMARY

Aims: To elucidate the regulation of gene expression in each immune cell subset and its contribution to autoimmune diseases.

Methods:

JGAS000220: Various immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK, Neu) were collected from 21 patients with systemic sclerosis, 26 patients with ANCA-associated vasculitis, and 28 healthy controls, and total RNA was extracted from each subset. RNA-seq was performed on each sample.

E-GEAD-397 / E-GEAD-398 / E-GEAD-420: Whole blood and 28 immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono (or CD16n_Mono), CD16p_Mono, Int_Mono, NC_Mono, mDC, pDC, LDG, Neu) were collected from the study population. Whole genome sequencing was performed on the whole blood samples, and RNA-seq was performed on each immune cell subset sample. After the gene expression data were filtered and normalized, eQTL analysis was performed in each immune cell type.

JGAS000296: 24 peripheral blood immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono, Int_Mono, NC_Mono, mDC, pDC) were collected from 50 patients with systemic sclerosis and 48 healthy controls. RNA-seq was performed on each immune cell subset sample. After gene expression quantification, samples were filtered.

Participants/Materials: Systemic Sclerosis, Systemic Lupus Erythematosus, Myositis, Mixed Connective Tissue Disease, Sjögren’s Syndrome, Rheumatoid Arthritis, Behçet’s Disease, Adult Onset Still’s Disease, ANCA-associated Vasculitis, Takayasu’s Arteritis, healthy individuals

URL: https://www.h.u-tokyo.ac.jp/english/centers-services/clinical-divisions/allergy-and-rheumatology/index.html

 

Data Set ID | Type of Data | Criteria | Release Date
JGAS000220 | NGS (RNA-seq: Systemic sclerosis) | Controlled Access (Type I) | 2020/10/09
JGAS000220 | NGS (RNA-seq: ANCA-associated Vasculitis) | Controlled Access (Type I) | 2021/03/05
E-GEAD-397 | Read count data from RNA-seq | Unrestricted Access | 2021/04/28
E-GEAD-398 | Conditional eQTL summary data (significant associations) | Unrestricted Access | 2021/04/28
E-GEAD-420 | Nominal eQTL data (including non-significant associations) | Unrestricted Access | 2021/04/28
JGAS000296 | NGS (RNA-seq) | Controlled Access (Type I) | 2022/01/21


*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention the data in the acknowledgments.

*Data users must submit an application for Using NBDC Human Data to access the Controlled-access Data.

 

MOLECULAR DATA

JGAS000220 (RNA-seq: Systemic sclerosis)

Participants/Materials

Systemic Sclerosis (ICD10: M340): 21 cases

13 healthy controls

Targets RNA-seq
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500]
Library Source Total RNAs extracted from 19 immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK)
Cell Lines -
Library Construction (kit name) SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods SMART-seq v4 Ultra Low Input RNA Kit
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods STAR (hg38)
QC Methods Adaptor sequences and low-quality 3′ bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of bases) were removed. A sample was removed before further analysis if its uniquely mapped rate was less than 80% or its number of uniquely mapped reads was less than 5.00 × 10^6. For each sample, correlation coefficients of the expression data were calculated against every other sample belonging to the same cell subset, and their average (Di) was computed. Samples with Di less than 0.9 were removed.
Gene Number 26353
Japanese Genotype-phenotype Archive Data set ID JGAD000309
Total Data Volume 125 MB (count data, txt)
Comments (Policies) NBDC policy
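The sample-level filters in the QC Methods field above can be sketched in plain Python. This is a minimal illustration, not the submitters' code: it assumes Pearson correlation (the field does not name the coefficient), assumes the expression values passed in are already normalized, and all function and constant names are hypothetical.

```python
from statistics import mean

# Thresholds as stated in the QC Methods field (hypothetical constant names).
MIN_UNIQUE_RATE = 0.80         # minimum uniquely mapped rate
MIN_UNIQUE_READS = 5_000_000   # minimum number of uniquely mapped reads
MIN_D = 0.90                   # minimum mean correlation D_i


def passes_mapping_qc(unique_reads, total_reads):
    """True when both alignment thresholds are met."""
    return (unique_reads / total_reads >= MIN_UNIQUE_RATE
            and unique_reads >= MIN_UNIQUE_READS)


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (assumed coefficient)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_correlations(profiles):
    """profiles: {sample_id: [expression per gene]} for ONE cell subset.
    Returns D_i, the mean correlation of sample i with every other sample."""
    return {
        i: mean(pearson(profiles[i], profiles[j]) for j in profiles if j != i)
        for i in profiles
    }


def passes_correlation_qc(profiles):
    """Sample IDs whose D_i is at least MIN_D."""
    return {i for i, d in mean_correlations(profiles).items() if d >= MIN_D}
```

With three mutually proportional profiles and a fourth whose gene ordering is partly shuffled, the fourth sample has D_i = 0.8 and is dropped, while the other three keep a mean above 0.9. Note that every D_i is computed against all samples, including the outlier, so a single poor sample also pulls down the scores of good ones.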

 

JGAS000220 (RNA-seq: ANCA-associated Vasculitis)

Participants/Materials

ANCA-associated Vasculitis (ICD10: M318): 26 cases

28 healthy controls (including the 13 individuals above)

Targets RNA-seq
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500]
Library Source Total RNAs extracted from 20 immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK, Neu)
Cell Lines -
Library Construction (kit name) SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods SMART-seq v4 Ultra Low Input RNA Kit
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods STAR (hg38)
QC Methods Adaptor sequences and low-quality 3′ bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of bases) were removed. A sample was removed before further analysis if its uniquely mapped rate was less than 80% or its number of uniquely mapped reads was less than 5.00 × 10^6. For each sample, correlation coefficients of the expression data were calculated against every other sample belonging to the same cell subset, and their average (Di) was computed. Samples with Di less than 0.9 were removed.
Gene Number 26353
Japanese Genotype-phenotype Archive Data set ID JGAD000310
Total Data Volume 125 MB (count data, txt)
Comments (Policies) NBDC policy

 

E-GEAD-397 (RNA-seq)

Participants/Materials

Systemic Lupus Erythematosus (ICD10: M329): 62 cases

Myositis (ICD10: M339, M332): 65 cases

Systemic Sclerosis (ICD10: M340): 67 cases

Mixed Connective Tissue Disease (ICD10: M351): 19 cases

Sjögren’s Syndrome (ICD10: M350): 18 cases

Rheumatoid Arthritis (ICD10: M0690): 25 cases

Behçet’s Disease (ICD10: M352): 23 cases

Adult Onset Still’s Disease (ICD10: M0610): 18 cases

ANCA-associated Vasculitis (ICD10: M318): 26 cases

Takayasu’s Arteritis (ICD10: M314): 16 cases

92 healthy controls

Targets RNA-seq
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500]
Library Source Total RNAs extracted from 28 immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono (or CD16n_Mono), CD16p_Mono, Int_Mono, NC_Mono, mDC, pDC, LDG, Neu)
Cell Lines -
Library Construction (kit name) SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods SMART-seq v4 Ultra Low Input RNA Kit
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods STAR (GRCh38)
QC Methods Adaptor sequences were trimmed from the sequenced reads using cutadapt (v1.16), and low-quality bases (Phred quality score < 20) at the 3′ ends were trimmed using the fastx-toolkit (v0.0.14). Reads containing more than 20% low-quality bases were removed. The remaining reads were aligned to the GRCh38 reference sequence using STAR (v2.5.3) in two-pass mode with GENCODE version 27 annotations. We excluded samples with a uniquely mapped read rate < 90% (< 70% for plasmablasts and < 85% for the other B cell subsets) or unique read counts < 6 × 10^6. Expression was quantified using HTSeq (v0.11.2). For QC of the expression data in each cell population, we filtered out low-count genes (< 10 counts in > 90% of samples), normalized between samples with the trimmed mean of M values (TMM) method implemented in edgeR, converted the data to log-transformed counts per million (CPM), removed batch effects using ComBat, and computed Spearman's correlations of expression levels between each sample and the remaining samples from the same cell subset. We excluded samples with a mean correlation coefficient less than 0.9.
Gene Number 53344
Genomic Expression Archive Data set ID E-GEAD-397
Total Data Volume 1.3 GB (clinical data and count data of autosomal genes, txt)
Comments (Policies) NBDC policy
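The low-count gene filter and the correlation-based sample exclusion described for E-GEAD-397 can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the pipeline used for the data set: TMM normalization and ComBat batch correction are omitted, tied values receive average ranks, and all function names are hypothetical.

```python
from statistics import mean


def filter_low_count_genes(counts, min_count=10, max_low_frac=0.90):
    """counts: {gene: [raw count per sample]} for one cell subset.
    A gene is dropped when more than max_low_frac of samples have < min_count."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c < min_count for c in row) / len(row) <= max_low_frac
    }


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def samples_passing_correlation(expr, threshold=0.90):
    """expr: {gene: [expression per sample]}. Returns indices of samples whose
    mean Spearman correlation with the remaining samples is >= threshold."""
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    profiles = [[expr[g][s] for g in genes] for s in range(n_samples)]
    return {
        s
        for s in range(n_samples)
        if mean(spearman(profiles[s], profiles[t])
                for t in range(n_samples) if t != s) >= threshold
    }
```

Because Spearman's correlation between two samples depends only on the ranks of genes within each sample, per-sample monotone steps such as library-size scaling, TMM factors, or log-CPM conversion leave it unchanged; only steps that adjust each gene across samples, such as ComBat, can alter it.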

 

E-GEAD-398 / E-GEAD-420 (eQTL)

Participants/Materials

Systemic Lupus Erythematosus (ICD10: M329): 62 cases

Myositis (ICD10: M339, M332): 65 cases

Systemic Sclerosis (ICD10: M340): 67 cases

Mixed Connective Tissue Disease (ICD10: M351): 19 cases

Sjögren’s Syndrome (ICD10: M350): 18 cases

Rheumatoid Arthritis (ICD10: M0690): 24 cases

Behçet’s Disease (ICD10: M352): 23 cases

Adult Onset Still’s Disease (ICD10: M0610): 18 cases

ANCA-associated Vasculitis (ICD10: M318): 25 cases

Takayasu’s Arteritis (ICD10: M314): 16 cases

79 healthy controls

Targets eQTL
Target Loci for Capture Methods -
Platform Illumina [HiSeq X Ten]
Library Source DNAs extracted from whole blood
Cell Lines -
Library Construction (kit name) TruSeq DNA PCR-Free Library prep kit
Fragmentation Methods TruSeq DNA PCR-Free Library prep kit
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 151 bp
Mapping Methods BWA-MEM (GRCh38)
QC Methods WGS data were processed following the GATK (v4.0.6.0) Best Practices workflow. Samples with genotyping call rates < 99% were removed. Missing genotypes were imputed with BEAGLE (v5.1). Variants with call rate < 85%, Hardy-Weinberg equilibrium (HWE) P-value < 1.0 × 10^-6, or minor allele frequency < 1% were excluded.
Gene Number

E-GEAD-398: 19441

E-GEAD-420: 22381

Detection method of eQTL Genes expressed at low levels (< 5 counts in more than 80% of samples or < 0.5 CPM in more than 80% of samples) were filtered out in each cell subset. The remaining autosomal expression data were normalized between samples with TMM, converted to CPM, and then normalized across samples using an inverse normal transform. The Probabilistic Estimation of Expression Residuals (PEER) method was applied to the normalized expression data to infer hidden covariates. The top two genetic principal components, sample collection phase, clinical diagnosis, sex, and the PEER latent factors were used as covariates in the eQTL analysis. Mem_CD8 cells, which were collected in Phase 1 and divided into CM_CD8 and EM_CD8 in Phase 2, were analyzed jointly with EM_CD8 because the majority of the Mem_CD8 population consisted of EM_CD8. For the conditional eQTL analysis in each cell subset, we used a QTLtools permutation pass with 10,000 permutations to obtain gene-level nominal P-value thresholds corresponding to FDR < 0.05, and subsequently performed forward-backward stepwise regression with a QTLtools conditional pass. For the nominal eQTL analysis, we used a QTLtools nominal pass and tested associations for variants located within 1 Mbp of the TSS of each gene.
Genomic Expression Archive Data set ID

E-GEAD-398

E-GEAD-420 (2021/6/9: Added columns for REF/ALT alleles. Slope indicates the effect size of the alternative allele.)

Total Data Volume

E-GEAD-398: 3.9 GB (conditional eQTL summary data [FDR<0.05], txt)

E-GEAD-420: 38 GB (nominal eQTL data [full], txt)

Comments (Policies) NBDC policy
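Several steps in the eQTL record above (variant filtering, the inverse normal transform, the 1 Mbp cis window, and the sign convention of the slope) can be illustrated with a deliberately simplified Python sketch. It is hypothetical and reduced: it uses a 1-df chi-square HWE test (the record does not say which HWE test was used), assumes the rank offset c = 0.5 for the inverse normal transform, fits a regression without the genetic PCs, phase, diagnosis, sex, or PEER covariates, and does not reproduce the QTLtools permutation or conditional passes. All function names are illustrative.

```python
import math
from statistics import NormalDist, mean


def hwe_chi2_p(n_ref_hom, n_het, n_alt_hom):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium
    (an approximation; exact HWE tests are also widely used)."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # REF allele frequency
    q = 1 - p
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def variant_passes(dosages, min_call_rate=0.85, min_maf=0.01, min_hwe_p=1e-6):
    """dosages: ALT-allele counts (0, 1, 2) per sample, None if uncalled."""
    called = [d for d in dosages if d is not None]
    if not called or len(called) / len(dosages) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) < min_maf:
        return False  # also guarantees no zero expected counts in the HWE test
    counts = (called.count(0), called.count(1), called.count(2))
    return hwe_chi2_p(*counts) >= min_hwe_p


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def inverse_normal_transform(values, c=0.5):
    """Map each rank r to Phi^-1((r - c) / (n - 2c + 1)); c = 0.5 is an assumption."""
    n = len(values)
    phi_inv = NormalDist().inv_cdf
    return [phi_inv((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


def cis_variants(tss, positions, window=1_000_000):
    """Variant positions within `window` bp of the TSS (same chromosome assumed)."""
    return [pos for pos in positions if abs(pos - tss) <= window]


def alt_allele_slope(dosages, expression):
    """Least-squares slope of expression on ALT dosage, without covariates."""
    mx, my = mean(dosages), mean(expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    sxx = sum((x - mx) ** 2 for x in dosages)
    return sxy / sxx
```

For example, `alt_allele_slope([0, 1, 2], [1.0, 1.5, 2.0])` gives 0.5: each additional ALT allele is associated with 0.5 units higher normalized expression. A positive slope therefore means higher expression with the alternative allele, matching the sign convention noted for E-GEAD-420.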

 

JGAS000296 (RNA-seq)

Participants/Materials

Systemic Sclerosis (ICD10: M340): 50 cases

48 healthy controls

(some individuals overlap with those in E-GEAD-397)

Targets RNA-seq
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500]
Library Source Total RNAs extracted from 24 immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono, Int_Mono, NC_Mono, mDC, pDC)
Cell Lines -
Library Construction (kit name) SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods SMART-seq v4 Ultra Low Input RNA Kit
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 100 bp
Mapping Methods STAR (hg38)
QC Methods Adaptor sequences and low-quality 3′ bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of bases) were removed. A sample was removed before further analysis if its uniquely mapped rate was less than 80% or its number of uniquely mapped reads was less than 5.00 × 10^6. For each sample, correlation coefficients of the expression data were calculated against every other sample belonging to the same cell subset, and their average (Di) was computed. Samples with Di less than 0.9 were removed.
Gene Number 26353
Japanese Genotype-phenotype Archive Data set ID JGAD000406
Total Data Volume 172.8 MB (count data, txt)
Comments (Policies) NBDC policy

 

DATA PROVIDER

Principal Investigator: Keishi Fujio

Affiliation: Department of Allergy and Rheumatology, Graduate School of Medicine, The University of Tokyo

Project / Group Name: Immune cell multi-omics analysis of immune-mediated diseases

URL: https://www.h.u-tokyo.ac.jp/english/centers-services/clinical-divisions/allergy-and-rheumatology/index.html

Funds / Grants (Research Project Number):

Name | Title | Project Number
Collaborative research fund with Chugai Pharmaceutical Co., Ltd. | - | -

 

PUBLICATIONS

Title | DOI | Data Set ID
1. Integrated bulk and single-cell RNA-sequencing identified disease-relevant monocytes and a gene network module underlying systemic sclerosis | doi: 10.1016/j.jaut.2020.102547 | JGAD000309, E-GEAD-344
2. Identifying the most influential gene expression profile in distinguishing ANCA-associated vasculitis from healthy controls | doi: 10.1016/j.jaut.2021.102617 | JGAD000310
3. Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases | doi: 10.1016/j.cell.2021.03.056 | E-GEAD-397, E-GEAD-398, E-GEAD-420
4. Dysregulation of the gene signature of effector regulatory T cells in the early phase of systemic sclerosis | doi: 10.1093/rheumatology/keac031 | JGAD000406

 

USERS (Controlled-Access Data)

Principal Investigator | Affiliation | Research Title | Data in Use (Data Set ID) | Period of Data Use