Method development and application for spatial proteome and glycoproteome profiling

Peiwu Huang

Principal supervisor: Prof. CAI Zongwei; Thesis submitted to the Department of Chemistry

Abstract

Tissues are heterogeneous ecosystems composed of various cell types. In tumor tissues, for example, malignant cancer cells are surrounded by various non-malignant stromal cells. Proteins, especially N-linked glycoproteins, are key players in the tumor microenvironment: they respond to many extracellular stimuli and are involved in regulating intercellular signaling. Spatially resolved profiling of the human proteome and glycoproteome in heterogeneous tissues is therefore important for exploring intercellular signaling networks and discovering protein biomarkers for diseases such as cancer. In this study, we aimed to develop new liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based analytical methods for spatially resolved proteome and glycoproteome profiling of tissue samples, and to apply them to profiling potential biomarkers for pancreatic cancer. We first systematically optimized LC and MS parameters in concert to increase peptide sequencing efficiency in data-dependent proteomics. Taking advantage of its hybrid design with multiple mass analyzers and fragmentation strategies, we used an Orbitrap Fusion mass spectrometer to systematically compare the popular high-high approach, which uses the Orbitrap for both MS1 and MS2 scans, with the high-low approach, which uses the Orbitrap for MS1 scans and the ion trap for MS2 scans. The high-high approach outperformed the high-low approach, with more fully saturated scan cycles and a higher MS2 identification rate. We then optimized MS parameters for the high-high approach, investigating the influence of isolation window and injection time on scan speed and MS2 identification rate, and exploring how to set the dynamic exclusion time according to the chromatographic peak width. Furthermore, we found that at higher peptide loading amounts the Orbitrap analyzer, rather than the analytical column, was easily saturated, which limited the dynamic range of MS1-based quantification.
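Why the dynamic exclusion time should track the chromatographic peak width can be illustrated with a toy simulation. The sketch below is a hypothetical model, not the acquisition logic of the Orbitrap Fusion or the settings used in this study: each precursor is assumed to elute as a box-shaped window of fixed base width, one top-N survey scan occurs every `cycle_s` seconds, the 10 ppm exclusion tolerance is an assumed value, and the names `Precursor` and `simulate_dda` are illustrative. It counts how often each precursor is sent for MS2 when the exclusion time is shorter than, or matched to, the peak width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    """Hypothetical precursor ion with a simple box-shaped elution profile."""
    mz: float       # precursor m/z
    apex_rt: float  # retention time at the peak apex (s)
    width: float    # chromatographic base peak width (s)

    def elutes_at(self, rt: float) -> bool:
        """True while rt lies strictly inside the elution window."""
        return abs(rt - self.apex_rt) < self.width / 2


def simulate_dda(precursors, cycle_s, top_n, exclusion_s, ppm=10.0, run_s=120.0):
    """Count MS2 selections per precursor in a toy top-N DDA run with dynamic exclusion."""
    excluded = []  # (m/z, expiry time) pairs currently on the exclusion list
    ms2_counts = {p: 0 for p in precursors}
    for i in range(int(run_s / cycle_s) + 1):
        rt = i * cycle_s  # time of this survey (MS1) scan
        # Drop exclusion entries whose exclusion time has expired.
        excluded = [(mz, t_end) for mz, t_end in excluded if t_end > rt]
        # Eluting precursors that do not match an excluded m/z within the ppm tolerance.
        candidates = [
            p for p in precursors
            if p.elutes_at(rt)
            and not any(abs(p.mz - mz) / mz * 1e6 <= ppm for mz, _ in excluded)
        ]
        # Toy ranking: closeness to the apex stands in for precursor intensity.
        candidates.sort(key=lambda p: abs(rt - p.apex_rt))
        for p in candidates[:top_n]:
            ms2_counts[p] += 1
            excluded.append((p.mz, rt + exclusion_s))  # exclude for exclusion_s seconds
    return ms2_counts


if __name__ == "__main__":
    peaks = [
        Precursor(500.25, 30.0, 20.0),
        Precursor(650.33, 60.0, 20.0),
        Precursor(720.41, 90.0, 20.0),
    ]
    for exclusion in (5.0, 20.0):
        counts = simulate_dda(peaks, cycle_s=2.0, top_n=10, exclusion_s=exclusion)
        print(exclusion, [counts[p] for p in peaks])
    # prints: "5.0 [3, 3, 3]" then "20.0 [1, 1, 1]"
```

With a 20 s peak width and a 2 s cycle, a 5 s exclusion selects each precursor three times across its elution window, whereas a 20 s exclusion selects each once and frees MS2 scans for other peptides. Setting the exclusion much longer than the peak width carries the opposite risk of blocking later-eluting peptides with a similar m/z.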
Finally, using the optimized LC-MS parameters, we identified more than 9,000 proteins and 110,000 unique peptides with 10 hours of effective LC gradient time. This study illustrates the importance of synchronizing LC-MS precursor targeting with high-resolution fragment detection for highly efficient data-dependent proteomics.
Understanding tumor heterogeneity through spatially resolved proteome profiling is important for biomedical research. Laser capture microdissection (LCM) is a powerful technology for exploring local cell populations without losing spatial information. Here, we designed an immunohistochemistry (IHC)-based workflow for cell type-resolved proteome analysis of tissue samples. First, the targeted cell type was stained by IHC with an antibody against a cell type-specific marker to improve the accuracy and efficiency of LCM. Second, to increase protein recovery from chemically crosslinked IHC-stained tissues, we optimized a decrosslinking procedure that combines seamlessly with SISPROT, an integrated spintip-based sample preparation technology. This newly developed approach, termed IHC-SISPROT, performs comparably to traditional H&E staining-based proteomic analysis. Combined with data-independent proteomic analysis, IHC-SISPROT achieved high sensitivity and reproducibility. The workflow identified 6,660 and 6,052 protein groups from cancer cells and cancer-associated fibroblasts (CAFs), respectively, using only a 5 mm², 12 μm-thick hepatocellular carcinoma tissue section. Bioinformatic analysis revealed enrichment of cell type-specific ligands and receptors, suggesting potentially new communication between cancer cells and CAFs mediated by these signaling proteins. IHC-SISPROT is therefore a sensitive and accurate proteomic approach for spatial profiling of cell type-specific proteomes in tissues.
N-linked glycoproteins are promising candidates for diagnostic and prognostic biomarkers and therapeutic targets. They are often located at the plasma membrane or in the extracellular space, with distinct cell type distributions in the tissue microenvironment. Because tissue sections harvested by LCM yield only low-microgram amounts of protein, and glycoproteins are of low abundance, region- and cell type-resolved glycoproteome analysis of tissue sections remains challenging. Here we designed a fully integrated spintip-based glycoproteomic approach (FISGlyco) that performs glycoprotein enrichment, digestion, deglycosylation, and desalting in a single spintip device. This design significantly reduces sample loss, shortens the total processing time to 4 hours, and greatly improves detection sensitivity and label-free quantification precision. A total of 607 N-glycosylation sites were identified and quantified from only 5 μg of mouse brain protein. Combined with LCM, FISGlyco enabled the first region-resolved N-glycoproteome profiling of four mouse brain regions (isocortex, hippocampus, thalamus, and hypothalamus), identifying 1,875, 1,794, 1,801, and 1,417 N-glycosites, respectively. FISGlyco could serve as a generic approach for region- and even cell type-specific glycoproteome analysis of tissue sections.
Pancreatic ductal adenocarcinoma (PDAC) is a devastating disease with a five-year survival rate of around 8%. The lack of effective biomarkers and targeted therapies is one of the major reasons for this urgent clinical situation. To explore potential protein biomarkers and drug targets located in the intercellular space of the pancreatic tumor microenvironment, we established a chemical proteomic approach for deep glycoproteome profiling of clinical PDAC tissue samples, building on the new proteomic methods described above.
Using a long-chain biotin-hydrazide probe with reduced steric hindrance, the new method outperformed the traditional hydrazide chemistry method in sensitivity, time efficiency, and glycoproteome coverage. The method was applied to enrich and validate LIF and its receptors as potential biomarkers for PDAC. In addition, to map the full glycoproteome of the pancreatic tumor microenvironment for diagnostic and therapeutic value, we collected 114 pancreatic tissues, including 30 PDAC tumor tissues, 30 adjacent non-tumor (NT) tissues, 32 chronic pancreatitis tissues, and 22 normal pancreatic tissues, and systematically profiled their glycoprotein expression patterns using the developed glycoproteomic strategy. This yielded the deepest PDAC glycoproteome to date, covering the majority of previously reported glycoprotein biomarkers and drug targets for PDAC. Importantly, we discovered many new glycoproteins differentially expressed between PDAC and normal tissue types. Moreover, LCM-based cell type-resolved proteome profiling of 13 PDAC tissue samples covered more than 8,000 proteins for both pancreatic stromal cells and pancreatic cancer cells in each sample. This work therefore provides a valuable resource for screening novel, cancer-specific glycoprotein biomarkers for pancreatic cancer with spatial resolution.