NBDC Research ID: hum0214.v4

SUMMARY

Aims: To elucidate the regulation of gene expression in each immune cell subset and its contribution to autoimmune diseases.

Methods:

JGAS000220: Various immune cell subsets were collected from 21 systemic sclerosis patients, 26 ANCA-associated vasculitis patients, and 28 healthy controls (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK, Neu), and total RNA was extracted from each subset. RNA-seq was performed on each sample.

E-GEAD-397 / E-GEAD-398 / E-GEAD-420: Whole blood and 28 immune cell subsets were collected from the study population (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono (or CD16n_Mono), CD16p_Mono, Int_Mono, NC_Mono, mDC, pDC, LDG, Neu). Whole-genome sequencing was performed on whole blood samples, and RNA-seq was performed on each immune cell subset sample. After filtering and normalization of the gene expression data, eQTL analysis was performed for each immune cell type.

JGAS000296: 24 peripheral blood immune cell subsets were collected from 50 systemic sclerosis patients and 48 healthy controls (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono, Int_Mono, NC_Mono, mDC, pDC). RNA-seq was performed on each immune cell subset sample. After gene expression quantification, samples were filtered.

Participants/Materials: Systemic Sclerosis, Systemic Lupus Erythematosus, Myositis, Mixed Connective Tissue Disease, Sjögren’s Syndrome, Rheumatoid Arthritis, Behçet’s Disease, Adult Onset Still’s Disease, ANCA-associated Vasculitis, Takayasu’s Arteritis, healthy individuals

URL: https://www.h.u-tokyo.ac.jp/english/centers-services/clinical-divisions/allergy-and-rheumatology/index.html

Dataset ID	Type of Data	Criteria	Release Date
JGAS000220	NGS (RNA-seq: Systemic sclerosis)	Controlled Access (Type I)	2020/10/09
JGAS000220	NGS (RNA-seq: ANCA-associated Vasculitis)	Controlled Access (Type I)	2021/03/05
E-GEAD-397	Read count data from RNA-seq	Un-restricted Access	2021/04/28
E-GEAD-398	Conditional eQTL summary data (significant associations)	Un-restricted Access	2021/04/28
E-GEAD-420	Nominal eQTL data (including non-significant associations)	Un-restricted Access	2021/04/28
JGAS000296	NGS (RNA-seq)	Controlled Access (Type I)	2022/01/21


*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention the data in the acknowledgments.

*Data users must submit an application for Using NBDC Human Data to access the controlled-access data.

MOLECULAR DATA

JGAS000220 (RNA-seq: Systemic sclerosis)


Participants/Materials	Systemic Sclerosis (ICD10: M340): 21 cases, 13 healthy controls
Targets	RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	Total RNAs extracted from 19 immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK)
Cell Lines	-
Library Construction (kit name)	SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods	SMART-seq v4 Ultra Low Input RNA Kit
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	100 bp
Mapping Methods	STAR (hg38)
QC Methods	Adaptor sequences and 3′ low-quality bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of the bases) were removed. If the uniquely mapped rate was less than 80%, or the number of uniquely mapped reads was less than 5.00 × 10^6, the sample was removed before further analysis. For each sample, correlation coefficients of the expression data were calculated against every other sample belonging to the same cell subset and then averaged (Di). Samples for which Di was less than 0.9 were removed.
Gene Number	26353
Japanese Genotype-phenotype Archive Dataset ID	JGAD000309
Total Data Volume	125 MB (count data, txt)
Comments (Policies)	NBDC policy
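The read-level QC rules listed under QC Methods above can be sketched as follows. This is a minimal illustration, not the provider's pipeline: it assumes Phred+33 quality encoding, omits adaptor trimming, and assumes the "> 20% low-quality bases" check is applied after 3′ trimming; the helper names are hypothetical.

```python
# Hypothetical sketch of the read-level QC; assumes Phred+33 encoding.
MIN_QUAL = 20        # bases with Phred < 20 count as low quality
MIN_LEN = 50         # reads shorter than 50 bp are discarded
MAX_LOW_FRAC = 0.20  # discard reads with > 20% low-quality bases


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq: str, qual: str) -> tuple[str, str]:
    """Remove consecutive low-quality bases (Q < MIN_QUAL) from the 3' end."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < MIN_QUAL:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(seq: str, qual: str) -> bool:
    """Apply the length filter and the low-quality-fraction filter."""
    if len(seq) < MIN_LEN:
        return False
    scores = phred_scores(qual)
    low = sum(1 for q in scores if q < MIN_QUAL)
    return low / len(scores) <= MAX_LOW_FRAC
```

For example, a 60-base read whose last five bases have quality `#` (Phred 2) is trimmed to 55 bases by `trim_3prime` and then passes `keep_read`.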

JGAS000220 (RNA-seq: ANCA-associated Vasculitis)


Participants/Materials	ANCA-associated Vasculitis (ICD10: M318): 26 cases, 28 healthy controls (including the 13 healthy controls above)
Targets	RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	Total RNAs extracted from 20 immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK, Neu)
Cell Lines	-
Library Construction (kit name)	SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods	SMART-seq v4 Ultra Low Input RNA Kit
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	100 bp
Mapping Methods	STAR (hg38)
QC Methods	Adaptor sequences and 3′ low-quality bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of the bases) were removed. If the uniquely mapped rate was less than 80%, or the number of uniquely mapped reads was less than 5.00 × 10^6, the sample was removed before further analysis. For each sample, correlation coefficients of the expression data were calculated against every other sample belonging to the same cell subset and then averaged (Di). Samples for which Di was less than 0.9 were removed.
Gene Number	26353
Japanese Genotype-phenotype Archive Dataset ID	JGAD000310
Total Data Volume	125 MB (count data, txt)
Comments (Policies)	NBDC policy
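The sample-level filters in the QC Methods row above (mapping thresholds, then the mean within-subset correlation Di) might look like this. The source does not say which correlation coefficient was used for these datasets, so Pearson correlation is an assumption here (the E-GEAD-397 pipeline uses Spearman); the function names are hypothetical, and at least two samples per subset are assumed.

```python
import math
from statistics import mean

MIN_UNIQUE_RATE = 0.80    # uniquely mapped rate threshold
MIN_UNIQUE_READS = 5.0e6  # uniquely mapped read count threshold
MIN_DI = 0.9              # minimum mean within-subset correlation (Di)


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def passes_mapping(unique_rate, unique_reads):
    """Mapping-based sample filter."""
    return unique_rate >= MIN_UNIQUE_RATE and unique_reads >= MIN_UNIQUE_READS


def filter_subset(expr, mapping_stats):
    """Return the sorted IDs of samples kept for ONE cell subset.

    expr: sample ID -> expression vector (same gene order in every sample)
    mapping_stats: sample ID -> (uniquely mapped rate, uniquely mapped reads)
    """
    mapped = [s for s in expr if passes_mapping(*mapping_stats[s])]
    kept = []
    for s in mapped:
        # Di: mean correlation of sample s with every other mapped sample
        d_i = mean(pearson(expr[s], expr[t]) for t in mapped if t != s)
        if d_i >= MIN_DI:
            kept.append(s)
    return sorted(kept)
```

Samples that fail the mapping thresholds are dropped before Di is computed, so they cannot pull down the Di of the remaining samples.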

E-GEAD-397 (RNA-seq)


Participants/Materials	Systemic Lupus Erythematosus (ICD10: M329): 62 cases; Myositis (ICD10: M339, M332): 65 cases; Systemic Sclerosis (ICD10: M340): 67 cases; Mixed Connective Tissue Disease (ICD10: M351): 19 cases; Sjögren’s Syndrome (ICD10: M350): 18 cases; Rheumatoid Arthritis (ICD10: M0690): 25 cases; Behçet’s Disease (ICD10: M352): 23 cases; Adult Onset Still’s Disease (ICD10: M0610): 18 cases; ANCA-associated Vasculitis (ICD10: M318): 26 cases; Takayasu’s Arteritis (ICD10: M314): 16 cases; 92 healthy controls
Targets	RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	Total RNAs extracted from 28 immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono (or CD16n_Mono), CD16p_Mono, Int_Mono, NC_Mono, mDC, pDC, LDG, Neu)
Cell Lines	-
Library Construction (kit name)	SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods	SMART-seq v4 Ultra Low Input RNA Kit
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	100 bp
Mapping Methods	STAR (GRCh38)
QC Methods	Adaptor sequences were trimmed from the sequenced reads using cutadapt (v1.16). In addition, low-quality bases (Phred quality score < 20) at the 3′ ends were trimmed using the fastx-toolkit (v0.0.14). Reads containing more than 20% low-quality bases were removed. Reads were then aligned to the GRCh38 reference sequence using STAR (v2.5.3) in two-pass mode with GENCODE version 27 annotations. We excluded samples with a uniquely mapped read rate < 90% (< 70% for plasmablasts and < 85% for the other B cell subsets) or a unique read count < 6 × 10^6. Expression was quantified using HTSeq (v0.11.2). For QC of the expression data, within each cell population we filtered out low-count genes (< 10 counts in > 90% of samples), normalized between samples with the trimmed mean of M values (TMM) method implemented in edgeR, converted the data to log-transformed counts per million (CPM), removed batch effects using ComBat, and computed Spearman’s correlations of expression levels between each sample and the remaining samples from the same cell subset. We excluded samples with a mean correlation coefficient less than 0.9.
Gene Number	53344
Genomic Expression Archive Dataset ID	E-GEAD-397
Total Data Volume	1.3 GB (clinical data and count data of autosomal genes, txt)
Comments (Policies)	NBDC policy
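The low-count gene filter and log-CPM conversion described in the QC Methods row above can be sketched roughly as below. This is a simplified illustration only: it uses raw column sums as library sizes (edgeR's TMM would rescale them), adds a plain pseudocount rather than edgeR's prior-count scheme, and does not implement ComBat. Function names are hypothetical.

```python
import math


def filter_low_count_genes(counts, min_count=10, max_frac_low=0.9):
    """counts: gene -> list of raw counts (one per sample).

    Drops genes with a count < min_count in more than max_frac_low of samples.
    """
    kept = {}
    for gene, row in counts.items():
        frac_low = sum(c < min_count for c in row) / len(row)
        if frac_low <= max_frac_low:
            kept[gene] = row
    return kept


def log_cpm(counts, prior=1.0):
    """log2 counts per million using plain column sums as library sizes.

    NOTE: TMM normalization (edgeR) is NOT applied here; it would adjust
    the library sizes before this step.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [math.log2((counts[g][j] + prior) / lib[j] * 1e6) for j in range(n_samples)]
        for g in genes
    }
```

Filtering before computing CPM matters because the library sizes are then taken over the retained genes only.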

E-GEAD-398 / E-GEAD-420 (eQTL)


Participants/Materials	Systemic Lupus Erythematosus (ICD10: M329): 62 cases; Myositis (ICD10: M339, M332): 65 cases; Systemic Sclerosis (ICD10: M340): 67 cases; Mixed Connective Tissue Disease (ICD10: M351): 19 cases; Sjögren’s Syndrome (ICD10: M350): 18 cases; Rheumatoid Arthritis (ICD10: M0690): 24 cases; Behçet’s Disease (ICD10: M352): 23 cases; Adult Onset Still’s Disease (ICD10: M0610): 18 cases; ANCA-associated Vasculitis (ICD10: M318): 25 cases; Takayasu’s Arteritis (ICD10: M314): 16 cases; 79 healthy controls
Targets	eQTL
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq X Ten]
Library Source	DNAs extracted from whole blood
Cell Lines	-
Library Construction (kit name)	TruSeq DNA PCR-Free Library prep kit
Fragmentation Methods	TruSeq DNA PCR-Free Library prep kit
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	151 bp
Mapping Methods	BWA-MEM (GRCh38)
QC Methods	WGS data were processed following the standardized GATK (v4.0.6.0) Best Practices. Samples with genotype call rates < 99% were removed. We used BEAGLE (v5.1) to impute missing genotypes. Variants with a call rate < 85%, an HWE P-value < 1.0 × 10^-6, or a minor allele frequency < 1% were excluded.
Gene Number	E-GEAD-398: 19441; E-GEAD-420: 22381
Detection method of eQTL	Lowly expressed genes (< 5 counts in more than 80% of samples or < 0.5 CPM in more than 80% of samples) were filtered out in each cell subset. The remaining autosomal expression data were normalized between samples with TMM, converted to CPM, and then normalized across samples using an inverse normal transform. The Probabilistic Estimation of Expression Residuals (PEER) method was applied to the normalized expression data to infer hidden covariates. The top two genetic principal components, sample collection phase, clinical diagnosis, sex, and latent factors were used as covariates for the eQTL analysis. Mem CD8 cells, which were collected in Phase 1 and divided into CM CD8 and EM CD8 in Phase 2, were analyzed jointly with EM CD8 because the majority of the Mem CD8 population consisted of EM CD8 cells. For the conditional eQTL analysis of each cell subset, we used a QTLtools permutation pass with 10,000 permutations to obtain gene-level nominal P-value thresholds corresponding to FDR < 0.05, and then performed forward-backward stepwise regression with a QTLtools conditional pass. For the nominal eQTL analysis, we used a QTLtools nominal pass and tested associations for variants located within 1 Mb of each gene's TSS.
Genomic Expression Archive Dataset ID	E-GEAD-398, E-GEAD-420 (2021/6/9: columns for the REF/ALT alleles were added; the slope indicates the effect size of the alternative allele.)
Total Data Volume	E-GEAD-398: 3.9 GB (conditional eQTL summary data [FDR < 0.05], txt); E-GEAD-420: 38 GB (nominal eQTL data [full], txt)
Comments (Policies)	NBDC policy
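Three steps from the QC Methods and eQTL detection rows above can be sketched with the standard library: the per-variant call-rate, MAF, and HWE filters; a rank-based inverse normal transform of one gene's expression; and selection of variants within 1 Mb of a TSS. These are illustrative assumptions, not the provider's code. The source does not name its HWE test, so a 1-df chi-square approximation is used here; the rank offset of 0.5 in the transform is one common choice, not one stated in the source; genotypes are assumed to be coded as ALT-allele dosage (0/1/2, None for missing). All function names are hypothetical.

```python
import math
from statistics import NormalDist

# --- Variant QC (genotypes: ALT-allele dosage 0/1/2, None = missing) ---

def call_rate(genos):
    """Fraction of samples with a non-missing genotype."""
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_freq(genos):
    """Minor allele frequency among called genotypes."""
    called = [g for g in genos if g is not None]
    alt = sum(called) / (2 * len(called))
    return min(alt, 1 - alt)


def hwe_chisq_p(genos):
    """HWE P-value from a 1-df chi-square test (an approximation)."""
    called = [g for g in genos if g is not None]
    n = len(called)
    obs = [called.count(0), called.count(1), called.count(2)]
    p = (2 * obs[0] + obs[1]) / (2 * n)  # REF allele frequency
    q = 1 - p
    exp = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def keep_variant(genos, min_call=0.85, min_hwe_p=1e-6, min_maf=0.01):
    """Apply the call-rate, MAF and HWE thresholds (checked in that order)."""
    if call_rate(genos) < min_call or minor_allele_freq(genos) < min_maf:
        return False
    return hwe_chisq_p(genos) >= min_hwe_p

# --- Expression normalization and cis window for eQTL mapping ---

def inverse_normal_transform(values, offset=0.5):
    """Rank-based INT of one gene across samples: Phi^-1((rank - offset) / n).

    Tied values receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / n) for r in ranks]


def cis_variants(tss, variants, window=1_000_000):
    """IDs of (id, position) variants on the gene's chromosome within `window` bp of the TSS."""
    return [vid for vid, pos in variants if abs(pos - tss) <= window]
```

The call-rate check runs first so that a variant with no called genotypes is rejected before the MAF calculation would divide by zero.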

JGAS000296 (RNA-seq)


Participants/Materials	Systemic Sclerosis (ICD10: M340): 50 cases, 48 healthy controls (including individuals who are also in E-GEAD-397)
Targets	RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	Total RNAs extracted from 24 immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono, Int_Mono, NC_Mono, mDC, pDC)
Cell Lines	-
Library Construction (kit name)	SMART-seq v4 Ultra Low Input RNA Kit
Fragmentation Methods	SMART-seq v4 Ultra Low Input RNA Kit
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	100 bp
Mapping Methods	STAR (hg38)
QC Methods	Adaptor sequences and 3′ low-quality bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of the bases) were removed. If the uniquely mapped rate was less than 80%, or the number of uniquely mapped reads was less than 5.00 × 10^6, the sample was removed before further analysis. For each sample, correlation coefficients of the expression data were calculated against every other sample belonging to the same cell subset and then averaged (Di). Samples for which Di was less than 0.9 were removed.
Gene Number	26353
Japanese Genotype-phenotype Archive Dataset ID	JGAD000406
Total Data Volume	172.8 MB (count data, txt)
Comments (Policies)	NBDC policy

DATA PROVIDER

Principal Investigator: Keishi Fujio

Affiliation: Department of Allergy and Rheumatology, Graduate School of Medicine, The University of Tokyo

Project / Group Name: Immune cell multi-omics analysis of immune-mediated diseases

URL: https://www.h.u-tokyo.ac.jp/english/centers-services/clinical-divisions/allergy-and-rheumatology/index.html

Funds / Grants (Research Project Number):

Name	Title	Project Number
Collaborative research fund with Chugai Pharmaceutical Co., Ltd.	-	-

PUBLICATIONS

	Title	DOI	Dataset ID
1	Integrated bulk and single-cell RNA-sequencing identified disease-relevant monocytes and a gene network module underlying systemic sclerosis	doi: 10.1016/j.jaut.2020.102547	JGAD000309 E-GEAD-344
2	Identifying the most influential gene expression profile in distinguishing ANCA-associated vasculitis from healthy controls	doi: 10.1016/j.jaut.2021.102617	JGAD000310
3	Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases	doi: 10.1016/j.cell.2021.03.056	E-GEAD-397 E-GEAD-398 E-GEAD-420
4	Dysregulation of the gene signature of effector regulatory T cells in the early phase of systemic sclerosis	doi: 10.1093/rheumatology/keac031	JGAD000406

USERS (Controlled-Access Data)

Principal Investigator	Affiliation	Research Title	Data in Use (Dataset ID)	Period of Data Use