NBDC Research ID: hum0214.v5
SUMMARY
Aims: To elucidate the regulation of gene expression in each immune cell subset and its contribution to autoimmune diseases.
Methods:
JGAS000220: Various immune cell subsets from 21 systemic sclerosis patients, 26 ANCA-associated vasculitis patients, and 28 healthy controls were collected (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK, Neu) and total RNA was extracted from each subset. RNA-seq was performed for each sample.
E-GEAD-397 / E-GEAD-398 / E-GEAD-420: Whole blood and 28 immune cell subsets from the study population were collected (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono (or CD16n_Mono), CD16p_Mono, Int_Mono, NC_Mono, mDC, pDC, LDG, Neu). Whole genome sequencing was performed on the whole blood samples. RNA-seq was performed on each immune cell subset sample. After filtering and normalization of the gene expression data, eQTL analysis was performed in each immune cell type.
JGAS000296: 24 peripheral blood immune cell subsets from 50 systemic sclerosis patients and 48 healthy controls were collected (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono, Int_Mono, NC_Mono, mDC, pDC). RNA-seq was performed on each immune cell subset sample. After gene expression quantification, samples were filtered.
JGAS000220 (JGAD000371, JGAD000372, JGAD000373): 19 immune cell subsets from the study population were collected (Naive_CD4, Mem_CD4, Fr._II_eTreg, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CD16n_Mono, CD16p_Mono, mDC, pDC). RNA-seq was performed on each immune cell subset sample. ATAC-seq of 15 immune cell subsets was also performed.
Participants/Materials: Systemic Sclerosis, Systemic Lupus Erythematosus, Myositis, Mixed Connective Tissue Disease, Sjögren’s Syndrome, Rheumatoid Arthritis, Behçet’s Disease, Adult Onset Still’s Disease, ANCA-associated Vasculitis, Takayasu’s Arteritis, healthy individuals
Data Set ID | Type of Data | Criteria | Release Date |
---|---|---|---|
JGAS000220 | NGS (RNA-seq: Systemic sclerosis) | Controlled Access (Type I) | 2020/10/09 |
JGAS000220 | NGS (RNA-seq: ANCA-associated Vasculitis) | Controlled Access (Type I) | 2021/03/05 |
E-GEAD-397 | Read count data from RNA-seq | Un-restricted Access | 2021/04/28 |
E-GEAD-398 | Conditional eQTL summary data (significant associations) | Un-restricted Access | 2021/04/28 |
E-GEAD-420 | Nominal eQTL data (including non-significant associations) | Un-restricted Access | 2021/04/28 |
JGAS000296 | NGS (RNA-seq) | Controlled Access (Type I) | 2022/01/21 |
JGAS000220 | NGS (RNA-seq: Systemic Lupus Erythematosus) | Controlled Access (Type I) | 2022/03/09 |
JGAS000220 | NGS (ATAC-seq) | Controlled Access (Type I) | 2022/03/09 |
*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or include them in the acknowledgments.
*Data users must submit an Application for Using NBDC Human Data to access the controlled-access data.
MOLECULAR DATA
JGAS000220 (RNA-seq: Systemic sclerosis)
Participants/Materials | Systemic Sclerosis (ICD10: M340): 21 cases; 13 healthy controls |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | Total RNAs extracted from 19 immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK) |
Cell Lines | - |
Library Construction (kit name) | SMART-seq v4 Ultra Low Input RNA Kit |
Fragmentation Methods | SMART-seq v4 Ultra Low Input RNA Kit |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods | STAR (hg38) |
QC Methods | The adaptor sequences and 3’ low-quality bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of the bases) were removed. If the uniquely mapped rate was less than 80%, or the number of uniquely mapped reads was less than 5.00 x 10^6, the sample was removed before further analysis. For each sample, the correlation coefficients of the expression data with the other samples belonging to the same cell subset were calculated, and their average (Di) was computed. Samples for which Di was less than 0.9 were removed. |
Gene Number | 26353 |
Japanese Genotype-phenotype Archive Data set ID | JGAD000309 |
Total Data Volume | 125 MB (count data, txt) |
Comments (Policies) | NBDC policy |
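The sample-level QC described above can be sketched in a few lines. This is a minimal illustration, not the data provider's pipeline: it assumes that the read-count criterion is a lower bound of 5.00 x 10^6 uniquely mapped reads, that Di is the mean Pearson correlation with the other samples of the same cell subset (the description does not name the correlation type), and that the function and variable names are hypothetical.

```python
import numpy as np

MIN_UNIQUE_RATE = 0.80    # uniquely mapped rate threshold from the QC description
MIN_UNIQUE_READS = 5.0e6  # assumed lower bound on uniquely mapped reads
MIN_MEAN_CORR = 0.9       # threshold on Di


def mean_correlation(expr):
    """Return Di for each sample; expr is a genes x samples array for one cell subset."""
    corr = np.corrcoef(expr, rowvar=False)  # samples x samples (Pearson, an assumption)
    n = corr.shape[0]
    # Average over the other samples, excluding the self-correlation on the diagonal.
    return (corr.sum(axis=1) - np.diag(corr)) / (n - 1)


def keep_samples(unique_rate, unique_reads, expr):
    """Boolean mask of samples that pass the mapping and correlation filters."""
    unique_rate = np.asarray(unique_rate, dtype=float)
    unique_reads = np.asarray(unique_reads, dtype=float)
    mapped_ok = (unique_rate >= MIN_UNIQUE_RATE) & (unique_reads >= MIN_UNIQUE_READS)
    keep = mapped_ok.copy()
    # Di is computed only among samples that survived the mapping filter.
    if mapped_ok.sum() >= 2:
        di = mean_correlation(expr[:, mapped_ok])
        keep[mapped_ok] = di >= MIN_MEAN_CORR
    return keep
```

Applying the mapping filter before computing Di follows the wording "removed before further analysis"; a single badly mapped sample would otherwise lower Di for every other sample in the subset.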
JGAS000220 (RNA-seq: ANCA-associated Vasculitis)
Participants/Materials | ANCA-associated Vasculitis (ICD10: M318): 26 cases; 28 healthy controls (including the 13 individuals above) |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | Total RNAs extracted from 20 immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Fr._II_eTreg, Naive_CD8, Mem_CD8, mDC, pDC, CD16p_Mono, CD16n_Mono, NK, Neu) |
Cell Lines | - |
Library Construction (kit name) | SMART-seq v4 Ultra Low Input RNA Kit |
Fragmentation Methods | SMART-seq v4 Ultra Low Input RNA Kit |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods | STAR (hg38) |
QC Methods | The adaptor sequences and 3’ low-quality bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of the bases) were removed. If the uniquely mapped rate was less than 80%, or the number of uniquely mapped reads was less than 5.00 x 10^6, the sample was removed before further analysis. For each sample, the correlation coefficients of the expression data with the other samples belonging to the same cell subset were calculated, and their average (Di) was computed. Samples for which Di was less than 0.9 were removed. |
Gene Number | 26353 |
Japanese Genotype-phenotype Archive Data set ID | JGAD000310 |
Total Data Volume | 125 MB (count data, txt) |
Comments (Policies) | NBDC policy |
E-GEAD-397 (RNA-seq)
Participants/Materials | Systemic Lupus Erythematosus (ICD10: M329): 62 cases; Myositis (ICD10: M339, M332): 65 cases; Systemic Sclerosis (ICD10: M340): 67 cases; Mixed Connective Tissue Disease (ICD10: M351): 19 cases; Sjögren’s Syndrome (ICD10: M350): 18 cases; Rheumatoid Arthritis (ICD10: M0690): 25 cases; Behçet’s Disease (ICD10: M352): 23 cases; Adult Onset Still’s Disease (ICD10: M0610): 18 cases; ANCA-associated Vasculitis (ICD10: M318): 26 cases; Takayasu’s Arteritis (ICD10: M314): 16 cases; 92 healthy controls |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | Total RNAs extracted from 28 immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono (or CD16n_Mono), CD16p_Mono, Int_Mono, NC_Mono, mDC, pDC, LDG, Neu) |
Cell Lines | - |
Library Construction (kit name) | SMART-seq v4 Ultra Low Input RNA Kit |
Fragmentation Methods | SMART-seq v4 Ultra Low Input RNA Kit |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods | STAR (GRCh38) |
QC Methods | From sequenced reads, adaptor sequences were trimmed using cutadapt (v1.16). In addition, 3′ ends with low-quality bases (Phred quality score < 20) were trimmed using the fastx-toolkit (v0.0.14). Reads containing more than 20% low-quality bases were removed. Subsequently, reads were aligned against the GRCh38 reference sequence using STAR (v2.5.3) in two-pass mode with Gencode version 27 annotations. We excluded samples with uniquely mapped read rates < 90% (with the exception of < 70% for plasmablasts and < 85% for the other B cell subsets) or unique read counts < 6 × 10^6. Expression was quantified using HTSeq (v0.11.2). For QC of the expression data, in each cell population, we filtered low-count genes (< 10 in > 90% of samples), normalized between samples with a trimmed mean of M values (TMM) implemented in edgeR software, converted to log-transformed counts per million (CPM), removed batch effects using ComBat software and computed inter-sample Spearman’s correlations of expression levels between each sample and the remaining samples from the same cell subset. We excluded samples with mean correlation coefficients less than 0.9. |
Gene Number | 53344 |
Genomic Expression Archive Data set ID | E-GEAD-397 |
Total Data Volume | 1.3 GB (clinical data and count data of autosomal genes, txt) |
Comments (Policies) | NBDC policy |
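Two of the QC steps described for E-GEAD-397, the subset-specific mapping-rate thresholds and the low-count gene filter, can be illustrated as follows. This is a simplified sketch with hypothetical function names: the CPM conversion here uses raw library sizes and a pseudocount of 1 (both assumptions), and it omits the TMM normalization (edgeR) and batch correction (ComBat) that the actual pipeline applied.

```python
import numpy as np

OTHER_B_SUBSETS = {"Naive_B", "USM_B", "SM_B", "DN_B"}


def min_unique_rate(subset):
    """Minimum uniquely mapped read rate for a cell subset, per the QC description."""
    if subset == "Plasmablast":
        return 0.70
    if subset in OTHER_B_SUBSETS:
        return 0.85
    return 0.90


def passes_gene_filter(counts, min_count=10, max_low_fraction=0.9):
    """Mask of genes to keep; counts is a genes x samples array for one cell population.

    A gene is dropped when its count is < min_count in more than 90% of samples.
    """
    low_fraction = (np.asarray(counts) < min_count).mean(axis=1)
    return low_fraction <= max_low_fraction


def log_cpm(counts, pseudocount=1.0):
    """log2 counts per million using raw library sizes (no TMM scaling; a simplification)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    return np.log2(counts / lib_size * 1e6 + pseudocount)
```

A typical use would be `kept = counts[passes_gene_filter(counts)]` followed by `log_cpm(kept)` within each cell population.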
E-GEAD-398 / E-GEAD-420 (eQTL)
Participants/Materials | Systemic Lupus Erythematosus (ICD10: M329): 62 cases; Myositis (ICD10: M339, M332): 65 cases; Systemic Sclerosis (ICD10: M340): 67 cases; Mixed Connective Tissue Disease (ICD10: M351): 19 cases; Sjögren’s Syndrome (ICD10: M350): 18 cases; Rheumatoid Arthritis (ICD10: M0690): 24 cases; Behçet’s Disease (ICD10: M352): 23 cases; Adult Onset Still’s Disease (ICD10: M0610): 18 cases; ANCA-associated Vasculitis (ICD10: M318): 25 cases; Takayasu’s Arteritis (ICD10: M314): 16 cases; 79 healthy controls |
Targets | eQTL |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq X Ten] |
Library Source | DNAs extracted from whole blood |
Cell Lines | - |
Library Construction (kit name) | TruSeq DNA PCR-Free Library prep kit |
Fragmentation Methods | TruSeq DNA PCR-Free Library prep kit |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 151 bp |
Mapping Methods | BWA-MEM(GRCh38) |
QC Methods | WGS data processing was performed based on the standardized best-practice method proposed by GATK (v4.0.6.0). Samples with genotyping call rates < 99% were removed. We used BEAGLE (v5.1) to impute missing genotypes. Variants with call rate < 85%, HWE P-value < 1.0 × 10^-6 or minor allele frequency < 1% were excluded. |
Gene Number | E-GEAD-398: 19441; E-GEAD-420: 22381 |
Detection method of eQTL | Genes expressed at low levels (< 5 counts in more than 80% of samples or < 0.5 CPM in more than 80% of samples) were filtered out in each cell subset. The residual autosomal expression data were normalized between samples with TMM, converted to CPM and then normalized across samples using an inverse normal transform. A Probabilistic Estimation of Expression Residuals (PEER) method was applied to the normalized expression data to infer hidden covariates. The top 2 genetic principal components, sample collection phase, clinical diagnosis, sex and latent factors were used as covariates for eQTL analysis. Mem CD8s, which were collected in Phase 1 and divided into CM CD8 and EM CD8 in Phase 2, were analyzed jointly with EM CD8 for eQTL analysis because the majority of the Mem CD8 population consisted of EM CD8. For the conditional eQTL analysis of each cell subset, we used a QTLtools permutation pass with 10,000 permutations to obtain gene-level nominal P value thresholds corresponding to FDR < 0.05. We subsequently performed forward-backward stepwise regression eQTL analysis with a QTLtools conditional pass. For nominal eQTL analysis, we used a QTLtools nominal pass and tested for the association of the variants located within 1 Mbp of the TSS of the genes. |
Genomic Expression Archive Data set ID | E-GEAD-420 (2021/6/9: Added columns for REF/ALT allele. Slope indicates the effect size of alternative alleles.) |
Total Data Volume | E-GEAD-398: 3.9 GB (conditional eQTL summary data [FDR < 0.05], txt); E-GEAD-420: 38 GB (nominal eQTL data [full], txt) |
Comments (Policies) | NBDC policy |
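Three of the rules in the eQTL description above (variant QC thresholds, the inverse normal transform, and the 1 Mbp cis window) are simple enough to sketch directly. The code below is illustrative only: the function names are hypothetical, and the rank offset `(rank - 0.5) / n` used in the transform is an assumption, since the description does not state which variant of the inverse normal transform was used. The association testing itself was done with QTLtools and is not reproduced here.

```python
from statistics import NormalDist

import numpy as np


def passes_variant_qc(call_rate, hwe_p, maf):
    """Keep variants with call rate >= 85%, HWE P >= 1.0e-6 and MAF >= 1%."""
    return call_rate >= 0.85 and hwe_p >= 1.0e-6 and maf >= 0.01


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def inverse_normal_transform(values):
    """Map one gene's expression across samples to standard normal quantiles."""
    ranks = average_ranks(values)
    quantiles = (ranks - 0.5) / len(ranks)  # offset of 0.5 is an assumption
    standard_normal = NormalDist()
    return np.array([standard_normal.inv_cdf(q) for q in quantiles])


def in_cis_window(variant_pos, tss_pos, window=1_000_000):
    """True when the variant lies within 1 Mbp of the gene's TSS."""
    return abs(variant_pos - tss_pos) <= window
```

The transform is applied per gene across samples, so that every gene enters the association test with the same, approximately normal, distribution.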
JGAS000296 (RNA-seq)
Participants/Materials | Systemic Sclerosis (ICD10: M340): 50 cases; 48 healthy controls (including the same cases as in E-GEAD-397) |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | Total RNAs extracted from 24 immune cell subsets (Naive_CD4, Mem_CD4, Fr._I_nTreg, Fr._II_eTreg, Fr._III_T, Th1, Th2, Th17, Tfh, NK, Naive_CD8, EM_CD8, CM_CD8, TEMRA_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CL_Mono, Int_Mono, NC_Mono, mDC, pDC) |
Cell Lines | - |
Library Construction (kit name) | SMART-seq v4 Ultra Low Input RNA Kit |
Fragmentation Methods | SMART-seq v4 Ultra Low Input RNA Kit |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 100 bp |
Mapping Methods | STAR (hg38) |
QC Methods | The adaptor sequences and 3’ low-quality bases (Phred quality score < 20) were trimmed. Short reads (< 50 bp) and reads containing many low-quality bases (Phred quality score < 20 in > 20% of the bases) were removed. If the uniquely mapped rate was less than 80%, or the number of uniquely mapped reads was less than 5.00 x 10^6, the sample was removed before further analysis. For each sample, the correlation coefficients of the expression data with the other samples belonging to the same cell subset were calculated, and their average (Di) was computed. Samples for which Di was less than 0.9 were removed. |
Gene Number | 26353 |
Japanese Genotype-phenotype Archive Data set ID | JGAD000406 |
Total Data Volume | 172.8 MB (count data, txt) |
Comments (Policies) | NBDC policy |
JGAS000220 (RNA-seq: Systemic Lupus Erythematosus)
Participants/Materials | Systemic Lupus Erythematosus (ICD10: M329): 107 cases; 92 healthy controls |
Targets | RNA-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | Total RNAs extracted from 19 immune cell subsets (Naive_CD4, Mem_CD4, Fr._II_eTreg, Th1, Th2, Th17, Tfh, NK, Naive_CD8, Mem_CD8, Naive_B, USM_B, SM_B, DN_B, Plasmablast, CD16n_Mono, CD16p_Mono, mDC, pDC) |
Cell Lines | - |
Library Construction (kit name) | Smart-Seq v2 for the test cohort and SMART-seq v4 Ultra Low Input RNA Kit for the validation cohort |
Fragmentation Methods | Smart-Seq v2 for the test cohort and SMART-seq v4 Ultra Low Input RNA Kit for the validation cohort |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 76 bp (test cohort); 100 bp (validation cohort) |
Mapping Methods | STAR (GRCh38) |
QC Methods | FASTQ files were aligned to the human genome (GRCh38; GenBank assembly GCA_000001405.18) using STAR (v2.5). HTSeq-count (v0.6.1) was used to generate gene counts. In the quality-control analysis, low-quality bases (Phred quality score < 20) were trimmed using the fastx-toolkit (v0.0.14). As the level of mitochondrial transcription is an indicator of cell stress, we applied a cutoff percentage of mitochondrial gene transcripts of < 8%. For detecting outlier samples, Spearman’s correlation for each subset was calculated, and samples with an average r^2 < 0.8 were omitted as outliers. |
Gene Number | 26354 (test cohort); 26485 (validation cohort) |
Japanese Genotype-phenotype Archive Data set ID | |
Total Data Volume | 250 MB (count data, txt) |
Comments (Policies) | NBDC policy |
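The two sample filters described for this data set, the mitochondrial transcript cutoff and the Spearman-based outlier check, might look like the following. This is a hedged sketch rather than the authors' code: it assumes mitochondrial genes can be recognized by an `MT-` gene-symbol prefix, that "average r^2" means the mean squared Spearman coefficient against the other samples of the same subset, and all function names are hypothetical.

```python
import numpy as np


def mito_fraction(counts, gene_names, mito_prefix="MT-"):
    """Fraction of each sample's counts from mitochondrial genes (genes x samples input)."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: mitochondrial genes are identified by their gene-symbol prefix.
    is_mito = np.array([name.startswith(mito_prefix) for name in gene_names])
    return counts[is_mito].sum(axis=0) / counts.sum(axis=0)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    return np.bincount(inverse, weights=ranks)[inverse] / counts[inverse]


def mean_spearman_r2(expr):
    """Mean squared Spearman correlation of each sample with the others in its subset."""
    ranked = np.apply_along_axis(average_ranks, 0, np.asarray(expr, dtype=float))
    r2 = np.corrcoef(ranked, rowvar=False) ** 2  # Pearson on ranks equals Spearman
    n = r2.shape[0]
    return (r2.sum(axis=1) - np.diag(r2)) / (n - 1)


def keep_samples(counts, gene_names, max_mito=0.08, min_r2=0.8):
    """Mask of samples with mitochondrial fraction < 8% and mean r^2 >= 0.8."""
    return (mito_fraction(counts, gene_names) < max_mito) & (mean_spearman_r2(counts) >= min_r2)
```

Unlike a staged pipeline, `keep_samples` evaluates both criteria on the full sample set at once; whether the original analysis applied them sequentially is not stated.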
JGAS000220 (ATAC-seq)
Participants/Materials | Systemic Lupus Erythematosus (ICD10: M329): 8 cases; 8 healthy controls |
Targets | ATAC-seq |
Target Loci for Capture Methods | - |
Platform | Illumina [HiSeq 2500] |
Library Source | DNAs extracted from 15 immune cell subsets (Naive_B, SM_B, USM_B, DN_B, Plasmablast, Th1, Th2, Th17, Tfh, Naive_CD4, Mem_CD4, Naive_CD8, CD16p_Mono, CD16n_Mono, NK) |
Cell Lines | - |
Library Construction (kit name) | Fast-ATAC-seq protocol |
Fragmentation Methods | Fast-ATAC-seq protocol |
Spot Type | Paired-end |
Read Length (without Barcodes, Adaptors, Primers, and Linkers) | 102 bp |
Japanese Genotype-phenotype Archive Data set ID | JGAD000373 |
Total Data Volume | 27 GB (tdf, bed) |
Comments (Policies) | NBDC policy |
DATA PROVIDER
Principal Investigator: Keishi Fujio
Affiliation: Department of Allergy and Rheumatology, Graduate School of Medicine, The University of Tokyo
Project / Group Name: Immune cell multi-omics analysis of immune-mediated diseases
Funds / Grants (Research Project Number):
Name | Title | Project Number |
---|---|---|
Collaborative research fund with Chugai Pharmaceutical Co., Ltd. | - | - |
Practical Research Project for Rare / Intractable Diseases, Japan Agency for Medical Research and Development (AMED) | Identification of therapeutic targets and development of intervention strategy for systemic lupus erythematosus based on the comprehensive analysis of genome and transcriptome. | JP17ek0109103 |
PUBLICATIONS
No. | Title | DOI | Data Set ID |
---|---|---|---|
1 | Integrated bulk and single-cell RNA-sequencing identified disease-relevant monocytes and a gene network module underlying systemic sclerosis | doi: 10.1016/j.jaut.2020.102547 | JGAD000309 |
2 | Identifying the most influential gene expression profile in distinguishing ANCA-associated vasculitis from healthy controls | doi: 10.1016/j.jaut.2021.102617 | JGAD000310 |
3 | Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases | doi: 10.1016/j.cell.2021.03.056 | E-GEAD-397, E-GEAD-398, E-GEAD-420 |
4 | Dysregulation of the gene signature of effector regulatory T cells in the early phase of systemic sclerosis | doi: 10.1093/rheumatology/keac031 | JGAD000406 |
5 | Immune cell multiomics analysis reveals contribution of oxidative phosphorylation to B-cell functions and organ damage of lupus | doi: 10.1136/annrheumdis-2021-221464 | JGAD000371, JGAD000372, JGAD000373 |
USERS (Controlled-Access Data)
Principal Investigator | Affiliation | Research Title | Data in Use (Data Set ID) | Period of Data Use |
---|---|---|---|---|