Friday, November 22, 2024

Relativity, Amazon, and NASA continue work on Cape infrastructure

Relativity, Amazon, and NASA continue work...

AASWOMEN Newsletter for November 22, 2024

AAS Committee on the Status of...

A proteogenomic analysis of cervical cancer reveals therapeutic and biological insights

BiochemistryA proteogenomic analysis of cervical cancer reveals therapeutic and biological insights


Clinical sample of PUMC-CC cohort

Based on the sample selection parameters of TCGA CC cohort14, this study implemented the following criteria: 1) All tumors and NATs were collected before treatment, and were stored in liquid nitrogen within 30 min after isolation. 2) Each piece of tissue was sectioned and Hematoxylin and eosin (H&E) stained for tumor cellularity analysis before sequencing, and all H&E staining results were scanned and pathologically analyzed. 3) All tumor samples were completed independently by two pathologists to ensure histological accuracy and the NATs contained no tumor cells. 4) The percent tumor nuclei and percent necrosis were also assessed. Tumors with at least 50% tumor cell nuclei and less than 10% necrosis were reserved. All patients signed separate informed consent forms for sampling, research, and publication. This study was conducted in accordance with the guidelines and regulations set forth by the Ethics Committee of National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (approval No. 22/374-3576). The treatment-naive CC patients were recruited from Cancer Hospital Chinese Academy of Medical Sciences between June 2019 and July 2021. 139 tumors and 33 NATs (30 NATs with paired tumors) were collected, and each sample was homogenized via cryo-pulverization and aliquoted to subsequent genomic, transcriptomic, proteomic, phosphoproteomic, and acetylproteomic analyses.

Each sample was assigned a new research ID, and the patient’s name or medical record number used during hospitalization was de-identified. The clinical information of these enrolled patients were collected, including SCC, FIGO stage, tumor size, tumor grade, histology type, age diagnosis, HPV type, primary treatment, lymph node status and PFS information. PFS was defined as the time from the start of treatment to disease progression, relapse, or death. The follow-up deadline for these enrolled patients was July 2023. 1 patient did not receive antitumor treatment. The surgical patients generally have a good prognosis, no progression was observed in 62 out of the 73 surgical patients at the time of the follow-up cutoff. In addition, the standard treatment for locally advanced CC is cisplatin-based concurrent chemoradiotherapy, while radiotherapy alone is indicated in older patients or those with insufficient renal function90. Cisplatin remains the radiosensitizing agent for patients with locally advanced CC when used concomitantly with radiotherapy in the National Comprehensive Cancer Network guidelines91. In this study, 65 patients (46.8% of the cohort) received primary treatment with either concurrent chemoradiotherapy (n = 54) or radiotherapy alone (n = 11). These patients, collectively referred to as radical radiotherapy patients92,93, were combined for the PFS analysis.

The 124 patients in the independent validation cohort were stage IIB to stage IVB CC patients who received concurrent chemoradiotherapy or radiotherapy alone at the Cancer Hospital of the Chinese Academy of Medical Sciences from November 2015 to December 2017. The clinicopathological parameters and FPS information for these cases were detailed in Supplementary Data 10.

Cell line

SiHa and HEK293T cells were kept in Dr. Daming Gao’s lab. SiHa cells were cultured in MEM-NEAA with 10% FBS, 100 units of penicillin and 100 mg/mL streptomycin. HEK293T cells were cultured in DMEM with 10% FBS, 100 units of penicillin, and 100 mg/mL streptomycin.

Proteogenomic workflow

The proteogenomic research was carried out through the workflow shown in Supplementary Fig. 1a. To reduce the impact of intra-tumor heterogeneity on multi-omics analysis, tumors, and NATs were pulverized using the CryoPrepTM CP02 (Covaris) and then divided into two parts: the first part was snap-frozen in liquid nitrogen and stored at −80 °C for the following proteomic, phosphoproteomic and acetylproteomics analyses as well as DNA extraction; the second part was added to the RNAlaterTM Stabilization Solution (Invitrogen, AM7020) and kept in −80 °C, and then used for RNA extraction.

DNA/RNA extraction, WES, and RNA-seq

Genomic DNA was extracted from tumors and NATs using Magnetic Universal Genomic DNA Kit (TIANGEN, DP705) according to manufacturer’s protocol. Matched blood DNA was also extracted using the above kit. The quality of isolated genomic DNA was verified by Qubit® DNA Assay Kit in Qubit® 3.0 Flurometer (Invitrogen) and NanoDrop 2000 (Thermo Fisher Scientific) and the integrity was assessed by TapeStation (Agilent Technologies). A total amount of 0.2 μg DNA per sample was used as input material for the DNA library preparations. Sequencing library was generated using NEB Next® UltraTM DNA Library Prep Kit for Illumina (NEB) following manufacturer’s recommendations and index codes were added to each sample. DNA libraries were sequenced on the Illumina NovaSeq 6000 platform and 150 bp paired-end reads were generated. WES was conducted with a mean coverage depth of 156X (range: 112-253X) for tumors,157X (range: 125-216X) for NATs, and 126X (range: 83-205X) for blood samples.

Total RNA was extracted and purified from fresh frozen tissues using the TRIzol® reagent (Invitrogen, 15596026CN). RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies). Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in First Strand Synthesis Reaction Buffer (5X). Paired-end libraries were sequenced on the NovaSeq 6000 Illumina platform, for each tumor and NAT sample, RNA-seq resulted in an average of 64.1 M and 64.4 M high-quality reads, respectively.

MS methods

Protein extraction and tryptic digestion

For protein extraction, ~50 mg of cryo-pulverized tumors or NATs were homogenized in 500 μL lysis buffer (4% SDS, 0.1 M DTT, 0.1 M Tris-HCl, pH 7.6) and sonicated at 20% amplitude for 2 min (5 s on, 5 s off) using Ultrasonic Homogenizer (NingBo Scientz Biotechnology Co., LTD, JY92-IIDN). The proteins were then denatured and reduced at 95 °C for 5 min. Lysates were centrifuged at 12,000 g for 10 min to remove the insoluble debris and protein concentrations of the clarified lysates were determined using tryptophan-based fluorescence quantification method94. Protein lysates (1 mg) were diluted to equal concentration and alkylated with iodoacetamide (IAA) for 45 min at room temperature in the dark. Then, protein lysates were precipitated by adding ice-cold acidified acetone/ethanol buffer with five times the volume of lysis buffer mixed and put them overnight at −20 °C. Precipitated proteins were collected by centrifugation at 18,000 g for 40 min at 4 °C, washed with 1 mL ice-cold acetone and 1 mL 70% ethanol for 40 min at 4 °C. The proteins were then subjected to proteolytic digestion with sequencing grade modified trypsin (Promega) at 1:50 enzyme-to-substrate ratio overnight at 37 °C. The resulting digests were acidified with FA to achieve a final volumetric concentration of 1%, and then subjected to desalting using Waters Sep-Pak® Vac 1cc (50 mg) tC18 cartridges.

TMT 16-plex labeling of peptides

Desalted peptides from each sample were labeled with TMTpro 16-plex reagents (Thermo Fisher Scientific). Tumors and NATs were co-randomized to 12 TMT sets. The NATs were labeled using channels that closely matched the paired tumors within the same TMT set. In addition, paired samples were evenly distributed among the 12 TMT sets to ensure a balance of NATs across all sets. A mixed sample was prepared by pooling an aliquot from 30 CC tumors and NATs and labeled with channel 134N in all of the TMTpro 16-plex sets as internal reference. For each TMT labeling experiment, dried peptides (250 μg) from each sample were dissolved in 250 μL 100 mM TEAB. 5 mg of TMT reagent was dissolved in 205 μL anhydrous acetonitrile, and then 50 μL of each TMT reagent was added to the corresponding peptides. After 1-h incubation at room temperature, 12.5 μL 5% hydroxylamine was added to quench the labeling reaction for 15 min at room temperature. The labeled peptides were pooled, dried down via Speed-Vac, and subsequently desalted on a reversed-phase tC18 SepPak column (Waters).

Peptide fractionation

In the TMT-based proteome analysis, the labeled peptides were pooled together to create a highly complex mixture. To reduce sample complexity and increase the depth of protein identification, high-pH reverse phase liquid chromatography (RPLC) was used for peptide fractionation, as previously described33. For each TMT set, about 4 mg TMTpro 16-plex labeled peptides were fractionated using a 4.6 mm × 250 mm Waters XBridge BEH300 C18 column with 3.5 μm size beads (Waters). Peptides were separated using an Agilent 1260 HPLC instrument via high-pH RPLC with solvent A (10 mM ammonium formate, pH 10) and a non-linear increasing concentration of solvent B (90% ACN, 10 mM ammonium formate, pH 10) at a flow rate of 0.7 mL/min. The 110-min separation gradient was set as follows: 1–5% B in 2 min; 5–25% B in 35 min; 25–40% B in 43 min; 40–55% B in 6 min; 55–95% B in 3 min; 95% B for 4 min; 95–1% B in 1 min; 1% B for 16 min. Peptides were separated and collected every minute for a total of 96 fractions from 3 min to 99 min, with fractions combined into 24 fractions by a stepwise concatenation strategy. 5% of each of the 24 fractions was allocated and dried down in a Speed-Vac for global proteome analysis. The remaining 95% sample was then utilized for phosphopeptides enrichment. In DIA-based proteome analysis, since the samples are analyzed individually, each sample is directly analyzed without prior fractionation.

Phosphopeptide enrichment

High-Select Fe-NTA kit (Thermo Fisher Scientific, A32992) was used for phosphopeptide enrichment according to the manufacturer’s instructions. Briefly, the 24 fractionated peptides were dissolved in 200 μL 80% ACN/0.1% TFA and incubated with 50 μL Fe3+-NTA agarose beads for 20 min at room temperature. Then, the mixture was transferred into the filter tip (Axygen, TF-200-L-R-S) and clarified peptide flow-throughs with unbound peptides were collected by centrifugation. The resins with phosphopeptides were washed with 200 μL 80% ACN/0.1% TFA for 3 times and 200 μL H2O for 3 times. The bound phosphopeptides were eluted twice with 200 μL 50% ACN/5% NH3·H2O and dried down via Speed-Vac. All centrifugation steps above were conducted at 50 g at room temperature. The eluted peptides were desalted using C18 stage tips and dried down.

Acetylpeptide enrichment

PTMScan Acetyl-Lysine Motif [Ac-K] Kit (CST, 13416) was used for acetylpeptide enrichment according to the manufacturer’s instructions. In brief, tryptic peptides from the flow-through of IMAC were concatenated into 4 fractions and dried down via Speed-Vac. The dried peptides were reconstituted in 1.4 mL of the immunoaffinity purification (IAP) buffer (50 mM MOPS/NaOH pH 7.2, 10 mM Na2HPO4 and 50 mM NaCl) and incubated with freshly prepared acetyl-lysine motif antibody agarose beads for 2 h at 4 °C. After removing the supernatant, peptide-bound beads were washed 2 times with 1 mL of ice-cold IAP buffer followed by 3 times with 1 mL HPLC H2O. Then, acetylated peptides were eluted by incubation 2 times with 55 μL of 0.15% TFA at room temperature for 10 min. The eluted peptides were desalted using C18 stage tips and dried down.

LC-MS/MS analysis for TMT-based proteomics

Peptides were resolved with 0.1% formic acid (FA) and separated on a nanoflow Easy nLC 1200 UHPLC system (Thermo Fisher Scientific) with an in-house packed 20 cm × 75 μm interbal diameter C18 column (3 μm ReproSil-Pur C18-AQ beads, Dr. Maisch GmbH). The column was heated to 50 °C using a home-made column heater. The flow rate was set at 300 nL/min with 0.1% FA in H2O (Solvent A) and 0.1% FA in 80% acetonitrile (Solvent B). The 120-min separation gradient was used for proteome analysis and set as followings: 2–5% B in 1 min; 5–27% B in 94 min; 27–35% B in 15 min; 35–52% B in 3 min; 52–100% B in 1 min; and 100% B in 6 min. For phoshoproteome analysis, the same LC and column setup was used except for a 70-min LC gradient (2–4% B in 1 min; 4–25% B in 51 min; 25–32% B in 8 min; 32–50% B in 3 min; 50–100% B in 1 min; and 100% B in 6 min). For acetylproteome analysis, the same LC and column setup was used except for a 180-min LC gradient (4–35% B in 152 min; 35–45% B in 20 min; 45–100% B in 2 min; and 100% B in 6 min).

For proteomic and phosphoproteome analysis, samples were analyzed with a Q-Exactive HF mass spectrometer (Thermo Fisher Scientific) equipped with a nanoflow ionization source. Data-dependent acquisition was performed using Xcalibur software in positive ion mode at a spray voltage of 2300 V. The MS1 spectra were measured with a resolution of 120,000@m/z 200, an AGC target of 3e6, a maximum injection time of 50 ms and a mass range of 300 to 1700 m/z. The data-dependent mode cycle was set to trigger MS2 scan on up to the top 15 most abundant precursors per cycle at an MS2 resolution of 60,000 @ m/z 200, an AGC target of 1e5, a maximum injection time of 120 ms, an isolation window of 0.7 m/z, an high collision dissociation (HCD) collision energy of 30, and a fixed first mass of 110.0 m/z. The dynamic exclusion time was set as 40 s (30 s for phosphoproteome) and precursor ions with charge 1, 7, 8 and >8 were excluded for MS2 analysis.

For acetylproteome analysis, samples were analyzed with a Q-Exactive HF-X mass spectrometer (Thermo Fisher Scientific) equipped with a nanoflow ionization source. Data-dependent acquisition was performed using Xcalibur software in positive ion mode at a spray voltage of 1800 V. The MS1 spectra was measured with a resolution of 120,000 @ m/z 200, an AGC target of 3e6, a maximum injection time of 50 ms and a mass range of 350 to 1700 m/z. The data-dependent mode cycle was set to trigger MS2 scan on up to the top 15 most abundant precursors per cycle at an MS2 resolution of 45,000 @ m/z 200, an AGC target of 1e5, a maximum injection time of 120 ms, an isolation window of 0.7 m/z, an HCD collision energy of 30, and a fixed first mass of 110.0 m/z. The dynamic exclusion time was set as 30 s and precursor ions with charge 1, 7, 8 and >8 were excluded for MS2 analysis.

DIA analysis

Unlabeled, digested peptides from each sample were separated on a nanoElute LC system (Bruker) with an analytical column (20 cm × 75 μm, 1.6 μm C18 resin, IonOpticks). The column was heated to 50 °C using a column heater (Bruker). The flow rate was set at 300 nL/min with 0.1% FA in H2O (Solvent A) and 0.1% FA in acetonitrile (Solvent B). The 90-min separation gradient was used for proteome analysis and set as followings: 2–22% B in 75 min; 22–37% B in 5 min; 37–80% B in 5 min; and 80% B in 5 min. Samples were analyzed using a timsTOF Pro mass spectrometer (Bruker) equipped with a nano-electrospray ion source (Bruker). Source capillary voltage was set to 1500 V in positive ion mode, and dry gas flow to 3 L/min at 180 °C. The DIA MS data were acquired using the diaPASEF method95 consisting of 14 cycles, which including a total of 28 mass-width windows (25 Da width, from 452 to 1152 Da) with 4 mobility windows each, making a total of 56 windows covering the ion mobility range (1/K0) from 0.76 to 1.29 Vs/cm2. The MS and MS/MS spectra were acquired from 100 to 1700 m/z. The TIMS-MS survey scan was acquired between 0.7 and 1.3 Vs/cm2. The TIMS-MS survey scan was acquired between 0.7–1.3 Vs/cm2 and 100–1700 m/z. The acquisition time of each PASEF scan was set to 100 ms with a near 100% duty cycle, which led to a total cycle time of around 1.59 s.

Database searching of MS data

For TMT-based proteomics, all MS raw files were searched against the human Swiss-Prot database containing 20,360 sequences plus 277 Swiss-Prot HPV protein sequences using MaxQuant (version 1.6.17.0)96. TMTpro 16-plex based MS2 reporter ion quantification was chosen with reporter mass tolerance set as 0.003 Da. The purities of TMT labeling channels were corrected according to the kit LOT number (WC320807). The PIF (precursor intensity fraction) filter value was set at 0.5. Enzyme digestion specificity was set to Trypsin and maximum 2 missed cleavages were allowed. Oxidized methionine, protein N-term acetylation, lysine acetylation, asparagine, and glutamine (NQ) deamidation were set as variable modifications. Carbamidomethyl cysteine was set as fixed modification. For phosphorylation data analysis, phospho (STY) was also chosen as a variable modification. The tolerances of first search and main search for peptides were set at 20 ppm and 4.5 ppm, respectively. A cutoff of 1% FDR was applied at the peptide, protein, and site level. A minimum of 7 amino acids was required for peptide identification. For phosphosite and acetylsite localization, the localization probability >0.75 was considered as a confident identification. Contaminants, reverse sequences, and only considered identified sequences identified from the MaxQuant runs were removed except for KRT family proteins since KRT is known to be highly expressed in squamous cell carcinoma.

For DIA proteomics, raw data files were analyzed using DIA-NN software (version 1.8.0)97. The Swiss-Prot database was used for a library-free search with precursor FDR set as 1%. Deep learning-based spectra and retention time prediction were enabled. The fragment m/z was set from 200 to 1800 and the peptide length was set from 7 to 30. Trypsin/P was enabled and maximum number of missed cleavages set to 1. N-terminal methionine excision was enabled and cysteine carbamidomethylation was set as a fixed modification. The median number of points measured across chromatographic peak is 7.

Somatic mutation calling and filtering

WES sequencing reads after exclusion of low-quality reads were mapped to GRCh38 reference genome with BWA (Version: 0.7.17, http://bio-bwa.sourceforge.net/)98, PCR duplicates were removed by Picard (version 2.27.2, http://broadinstitute.github.io/picard/), the base quality score was recalibrated by the BaseRecalibrator tool from GATK (version v4.2.6.1, https://software.broadinstitute.org/gatk/). For patients with blood WES data, somatic variants were detected using Mutect2 on tumor exome data and matched blood data. For those without matched blood data, the genetic variants database of the 1000 Genome Project was used as the panel of normal. Specifically, germline variants were filtered from the database of the 1000 Genome Project, NHLBI-ESP 6500 Exome Sequencing Project, Exome Aggregation Consortium (EXAC), and Genome Aggregation Database (gnomAD). Further filtering was done to obtain high-confidence somatic mutations using the criteria: a minimum of 8X coverage, Variant Allele Fraction (VAF) \(\ge\) 5% and at least 3 variant supporting reads in the sample. Then, FilterMutectCalls. Annovar (version 2017 Jul 17, https://annovar.openbioinformatics.org/en/latest/) was used functionally annotate genetic variants based on the annotation of the human genome (GRCh38), version 32 (Ensembl 98)99, and mutations in the non-coding regions (3′UTR, 5′UTR, Intron, gene intergenic etc.) were removed.

Analysis of significantly mutated genes

The filtered mutations (including SNV and indel) were further used to identify significantly mutated genes by MutSigCV (version 1.41 http://www.broadinstitute.org/cancer/cga/MutSig) with default parameters. Genes with P < 0.05 were identified as significantly mutated genes100.

CNAs analysis

CNAs were called using cnvkit (version 0.9.7, https://cnvkit.readthedocs.io/en/stable/pipeline.html)101 on exome data with default parameters. All blood samples were generated to build the copy-number reference. Confident focal SCNAs across all tumor samples were identified by GISTIC2 with parameters amplifications threshold: 0.3, deletions threshold: −0.3, focal length cutoff: 0.5, value threshold: 0.25, genetic: yes, run broad analysis: yes, focal length cutoff: 0.5, confidence level: 0.99, arm peel: yes, joined segment size: 100, gene collapse method: extreme and max sample segs: 3000. Significantly amplified and deleted chromosome arms were identified with a threshold of FDR < 0.25102.

Comparisons of frequently mutated genes between cervical squamous cell carcinoma and adenocarcinoma

The somatic mutation data called by Mutect from GDC TCGA CC were obtained from https://xenabrowser.net/datapages/, and the cancer type information was retrieved from TCGA CC paper14. Fisher’s exact test was used to evaluate the statistical significance of the difference in mutation rates between patient groups.

RNA-seq data analysis

After removal of adaptor contamination and low-quality reads, sequencing reads were aligned using STAR (version 2.7.2a) to human reference sequence (GRCh38 assembly) and featureCounts was used to produce a matrix of read counts across all genes based on the GENCODE gene annotation (version 32)103. An average of 68 million paired-end reads were sequenced per sample. The ratios of uniquely aligned reads exceeded 88% in all samples. Then, Transcripts Per Million (TPM) normalized read count of each gene was calculated based on the gene length and read count mapped to this gene. Subsequently, the Log2 transformed TPM values were used for downstream analysis.

HPV detection

After being aligned to hg38 reference genome using STAR, the unmapped RNA-seq reads were then aligned to the reference genome of 440 HPV subtypes. HPV detection and genotyping algorithm HPV-EM was used to identify the exact genotypic makeup of each sample34. Here subtypes with mapped read count less than 10 were not considered. The dominant infected HPV subtype was determined based on the criterion that its mapped read count was more than ten times higher than the sum of the other subtypes. For the three tumor samples (TN33, TB103-1, and TB98-1) lacking transcriptome data, HPV infection status was determined based on clinical detection records. HPV-EM was also utilized to measure HPV gene expression, providing read counts for HPV genes. HPV gene expression was subsequently normalized to TPM using the total read count aligned to the human genome.

Inference of cell type score

ESTIMATE was used to infer the stromal and immune scores from the RNA expression data104. In addition, the signature-based method Xcell was used to dissect the relative infiltration levels of different immune cell types105.

Proteomic data analysis

Analysis of TMT quantitative proteomic data

Here we used reverse-zMAP developed based on our previous MAP model106 to process cancer proteomics data generated using iTRAQ/TMT platforms with internal reference samples. Briefly, all the clinical samples were separately compared to the internal reference sample generated in the same MS run using a modified MAP algorithm and a rescaled Log2 ratio of MS intensities, termed z-statistic, was calculated for each protein to describe its relative abundance change between the two samples. In details, for each MS run, we systematically performed pairwise comparison for each tissue sample against the reference sample and the Log2 ratio of its protein intensities is calculated as

$${M}_{{ij}}={{Log}}_{2}\left(\frac{{S}_{{ij}}}{{R}_{{ik}}}\right),$$

(1)

in which \({S}_{{ij}}\) denotes the MS intensity of protein \(i\) in sample \({j}\) and \({R}_{{ik}}\) represents its intensity in the reference sample of this MS run, i.e., run k. Then, a similar sliding window analysis to that performed in MAP was applied to derive a linear model for the Log2 ratios in each window.

$${M}_{j}={\mu }_{j}+\,{\sigma }_{j}*\,\widehat{q}$$

(2)

Next, natural cubic spline fitting was separately applied to characterize the dependence of the fitted \({\mu }_{j}\) and \({{\sigma }_{j}}^{2}\) on the Log2 protein intensity across all windows. Finally, for each protein, \({\mu }_{{ij}}\) and \({\sigma }_{{ij}}\) were calculated based on the fitted natural cubic spline function, respectively, and then, similar to the z-statistic defined in MAP, we defined the z-statistic of protein \(i\) in sample \({j}\) as

$${Z}_{{ij}}=\frac{{M}_{{ij}}-{\mu }_{{ij}}}{{\sigma }_{{ij}}},$$

(3)

in which the dependence of observed Log2 ratios on the protein intensities was effectively mitigated and the contribution of systematic and technical errors was rescaled to follow the standard normal distribution across all samples. Thus, it enables a better characterization of relative protein expression abundance and can be reliably used for downstream analyses.

Analysis of DIA quantitative proteomic data

For DIA proteomic data, protein expression profiles from all samples were normalized together, to minimize the impact of missing values, and we used the proteins that were detected in all samples to calculate the normalization factors. Since there are no internal reference samples in the DIA dataset, we take the geometric mean of all DIA proteomic profiles as a pseudo-reference sample. Then, a similar parallel pairwise comparison analysis to that performed on TMT data was applied to compare each DIA proteomic profile against the pseudo reference sample and calculate a z-statistic for each protein as the basis for downstream analysis.

Analysis of phosphoproteomic and acetylproteomic data

The phosphoproteomic and acetylproteomic data were normalized using the median centering method across adjacent phosphorylation/acetylation sites. Then, similar to the above analysis, we also utilized a sliding window approach on the M-A plot to first calculate the median modification intensity change and average modification intensity within each window. Subsequently, we applied natural cubic spline fitting to fit the dependence between the median modification intensity change and the average modification intensity across all windows. Then, the Log2 ratio of each protein’s modification intensities was adjusted by the estimated median modification intensity derived from the fitted natural cubic spline function. The median ratio of all the modification sites of each protein was calculated as a measure of its modification level.

Imputation of missing values

DreamAI ensemble algorithm107 (implemented using the DreamAI R package, https://github.com/WangLab-MSSM/DreamAI) was applied to impute the missing values in proteomic, phosphoproteomic and acetylproteomic data. Imputation was only performed on the proteins, phosphosites and acetylsites with a missing rate <50%.

Batch effect and data quality analysis of proteomic data

The batch effect in TMT data after transforming the original MS intensities into z-statistics was assessed by performing unsupervised PCA. The leading PCs of the global proteomic data clearly separated the tumors from NATs, and no obvious batch effect was observed among the 12 TMT batches. We also performed unsupervised PCA on the DIA proteomic data, and no obvious batch effect was observed among the 3 DIA batches, too.

Subgrouping analysis of transcriptomic, proteomic, phosphoproteomic and acetylproteomic data

For TMT proteomic data, we selected the foremost 3000 proteins demonstrating the highest variability across tumor samples. Among these, 1674 proteins were identified without any missing values across all tumor samples. Subsequently, consensus clustering was performed on this set of 1674 highly variable proteins using the ConsensusClusterPlus R package108. The detailed parameter settings were number of repetitions = 1000 bootstraps; pItem = 0.8 (resampling 80% of any sample); pFeature = 1; distance = “euclidean”; and k-means clustering with up to 5 clusters.

For phosphoproteomic and acetylproteomic data, top 3000 most variable phosphoproteins/phosphoproteins within tumor samples were selected based on the protein-level modification abundance matrix, and among them, phosphoproteins/phosphoproteins without missing values were used as input for the ConsensusClusterPlus R package to perform sample clustering, with the same parameters applied as above.

For RNA-seq data, we applied HyperChIP to model the mean-variance curve and subsequently utilized scaled variances that consider the mean-variance relationship for gene ranking109. Before constructing the model, we excluded genes with an average expression value of Log2 (TPM + 1) < 2 in the tumor samples. Standardized z-score matrix of Log2 (TPM + 1) for the top 3000 hypervariable genes were used to perform consensus clustering using identical parameters as above. In summary, the tumor samples were classified into three clusters at transcriptome and PTMs levels as well.

Differential protein expression/modification analysis between patient subgroups

ANOVA test was used to detect differentially expressed proteins, phosphosites and acetylsites among the three subgroups, with Bonferroni-adjusted P-values < 0.05 as the cutoff. Then, to detect proteins/PTM sites specifically expressed/modified in each subgroup, hierarchical clustering was applied to divide the differentially expressed proteins and PTM sites into 3 or 4 clusters using seaborn.clustermap (python module) with parameter metric = “correlation”; method = “average” and scipy.cluster.hierarchy.fcluster (Python module) with parameter criterion = “maxclust”. Pathway enrichment analysis of the proteins/PTM sites in each cluster was performed using GSEApy.enrichr (Python module) with the parameter: gene_sets = “KEGG_2021_Human”. Pathways with BH adjusted P < 0.05 were considered as significantly regulated.

Association between proteomic subgroup and clinical outcome

To compare the survival outcomes among the three proteomic subgroups, log-rank test was employed. Then, Kaplan-Meier survival curves were generated using the KaplanMeierFitter function in Python to visualize the survival difference between three subgroups of patients.

Tumor-NAT samples differential expression analysis

Differential expression analysis was performed for 30 paired CC tumors and NATs using the Student’s t test. P-values were adjusted using the BH method. Each feature was required to be non-missing in at least 50% of the paired samples. Proteins or phosphosites/acetylsites (collapsed into gene-level) differentially expressed between tumors and NATs (BH adjusted P < 0.01, Log2 FC > 1 or < −1) were further used for over-representation analysis by WebGestalt110.

Multi-omics data integration

Analysis of mRNA-protein expression correlation

Spearman’s correlation coefficient was used to measure the correlation between each mRNA-protein pair across all 132 tumors (using the RNA-seq and TMT proteomic data). In addition, a P-value was calculated for each correlation coefficient, which was then adjusted for multiple testing using the BH method. The correlation between a mRNA-protein pair was called significant if the adjusted P-value was found to be <0.05. A total number of 6089 mRNA-protein-matched genes were examined, with a median Spearman’s correlation coefficient of r = 0.40. Moreover, the mRNA and protein levels were found to be positively correlated for most (95.9%) mRNA-protein pairs, and 81.7% showed significant positive correlation.

Analysis of the cis/trans effect of CNAs

The effects of SCNAs on mRNA, protein, and phosphoprotein abundance levels in either cis (within the same aberrant locus) or trans (remote locus) mode were visualized by multiOmicsViz (R package). Spearman’s correlation coefficients and the associated P values after adjustment for multiple testing were calculated for all possible CNA-mRNA/protein/phosphoprotein pairs. BH adjusted P < 0.01 was used to identify the significant correlations. A total of 130 tumor samples with both mRNA, WES, and proteomic data were included in this analysis.

Identification of CNG-Cis and CNL-Cis genes

Among the 8035 genes located within amplification foci, 337 genes showed significantly higher abundance in tumor samples compared with NATs at both mRNA and protein levels (Wilcoxon rank-sum test, BH adjusted P < 0.05), and 227 genes also represented significant correlation of copy number with corresponding RNA levels, including 178 that displayed concordant protein expression (Spearman’s correlation >0, BH adjusted P < 0.05). These 178 genes were identified as CNG-Cis genes. The same screening method was used to identify 28 CNL-Cis genes that were downregulated in tumor samples.

Analysis of the effects of arm-level CNAs

Arm-level CNAs were detected using Log2 (tumor/blood) = 0.3/−0.3 (corresponding to arm-gain/loss, respectively) as the cutoff. Then, for each detected arm gain/loss event, the tumor samples were accordingly divided into two subgroups and the proteins whose expression was significantly associated with this event were identified using Student’s t test (with FDR < 0.05 as the cutoff). KEGG pathway enrichment analysis of the arm gain/loss-associated proteins supported by both TMT and DIA data was performed using GSEApy.enricher (python module) with the parameter gene_sets = “KEGG_2021_Human”111. Pathways with FDR < 0.05 were regarded as arm gain/loss-associated pathways.

Identification of radioresponse-related biomarkers

For each candidate protein, we stratified the 62 radical radiotherapy CC patients into two groups using its median expression level as the cutoff. Then, Cox proportional hazards regression for PFS data was performed to identify radioresponse-related biomarkers of CC. Proteins detected in at least more than half of the samples are used here. The filter criteria for biomarker were: P-value of Cox proportional hazards regression <0.05; Log (HR) upper 95% < 0 and Log (HR) lower 95% > 0 for candidate favorable and unfavorable proteins, respectively. Finally, 37 favorable and 55 unfavorable radioresponse-related biomarkers were identified, supported by both TMT and DIA proteomic data. Kaplan-Meier curves (log-rank test) were used to visualize the prognosis difference between radical radiotherapy patients with high and low protein expression.

Risk scoring model based on radioresponse-related biomarkers

We identified 92 radioresponse-related biomarkers, comprising 55 prognostically unfavorable proteins and 37 favorable ones. Unfavorable score and favorable score of each sample were calculated according to the following formulas.

$${{unfavorable\; score}}_{j}=\frac{\mathop{\sum }_{i=1}^{{m}_{j}}\left({Z}_{{ij}}-{median}\left({Z}_{i,j=1,2,\cdots {n}_{i}}\right)\right)}{{m}_{j}}$$

(4)

$${{favorable\; score}}_{j}=\frac{\mathop{\sum }_{i=1}^{{p}_{j}}\left({Z}_{{ij}}-{median}\left({Z}_{i,j=1,2,\cdots {q}_{i}}\right)\right)}{{p}_{j}}$$

(5)

\({Z}_{ij}\) donate z-statistic of protein \(i\) in sample \(j\). \({m}_{j}\) and \({p}_{j}\) represent the number of prognostically unfavorable or favorable proteins detected in sample \(j\), respectively. \({n}_{i}\) and \({q}_{i}\) represent the number of samples in which unfavorable or favorable protein \(i\) was detected, respectively.

Then we calculated the risk score for each sample, the risk scoring model based on above formulas was successfully constructed.

$${r{isk\; score}}_{j}={{unfavorable\; score}}_{j}-{{favorable\; score}}_{j}$$

(6)

62 radical radiotherapy patients were divided into high-risk and low-risk groups based on the median of the risk scores. Next, we validated the prognostic predictive ability of the risk scoring model using the TCGA CC. Firstly, we performed z-score transformation on the FPKM matrix at the gene level. Subsequently, risk scores were calculated for TCGA samples. 291 TCGA CC samples were divided into high-risk and low-risk groups based on the median of the risk scores. Kaplan-Meier analysis (log-rank test) was utilized to illustrate the prognosis disparity between the two groups.

Identification of subgroup-specific kinases

We applied KSEA to estimate the change in a kinase’s activity based on the collective phosphorylation changes of its identified substrates using the method proposed in KSEA app112. Here all Kinase-Substrate annotations from PhosphoSitePlus and from NetworKIN were used. To identify subgroup-specific kinases, we first performed a Student’s t test for each phosphosite between its phosphorylation levels in the samples of a specific subgroup and all other samples. Then, a kinase’s normalized KSEA score is calculated as:

$${KSEA\; Score}=\frac{(\bar{t}-\bar{p})\sqrt{m}}{\delta },$$

(7)

in which \(\bar{t}\) denotes the mean t-statistic of the known substrate phosphosites of this kinase, \(\bar{p}\) represents the mean t-statistic of all phosphosites in the dataset, m denotes the total number of known substrate phosphosites of this kinase, δ denotes the standard deviation of the t-statistics across all phosphosites in the dataset. Subsequently, the corresponding P-value is determined by assessing the one-tailed probability of having a more extreme score than the one measured based on the normal distribution. Finally, subgroup-specific kinases were selected based on: protein expression levels were found to be significantly higher in a specific subgroup than in other tumor samples (Student’s t test, P-value < 0.05); P-value of KSEA score < 0.05.

Pathway enrichment of HPV gene-associated proteins

Spearman’s correlation coefficient was calculated between HPV E6/L1 mRNA level and protein abundance of human genes, which was subsequently used to rank the human genes. Then, the resulting ranked gene list was used as input for the GSEApy.prerank (Python module) to perform pathway enrichment analysis using the following detailed settings: gene_sets = “KEGG_2021_Human”; min_size = 5; max_size = 1000; and permutation_num = 1000. Pathways with FDR < 0.05 were regarded as HPV gene-associated pathways.

Subgrouping analysis based on HPV gene expression

CC tumors were stratified into two subgroups based on the mRNA expression levels of HPV E2 and E5 genes by hierarchical clustering (using the Python function scipy.cluster.hierarchy.linkage with parameters method = “ward” and metric = “euclidean”). The resulting linkage matrix was then used as input for scipy.cluster.hierarchy.fcluster to create flat clusters, with the criterion parameter set to “maxclust” and “t = 2”. Then, we employed either Chi-square test or Fisher’s exact test to test the association between the detected subgroups and the clinical features of patients, depending on the number of categories in each clinical feature. For continuous clinical features, we used the Student’s t test. P-values were adjusted for multiple testing using the BH method.

Functional experiments

siRNA Interference

The siRNAs were synthesized by GenePharma. All siRNA transfections were performed with Lipofectamine 2000 (Invitrogen, 11668019) at 50 nM final concentration according to the manufacturer’s protocol. The siRNA sequence information was shown in Supplementary Data 11.

Plasmids

The HA-tagged coding sequence of human HATs was cloned into pCDNA3.1 vector. The Flag-tagged coding sequence of human FOSL2-WT or the relevant FOSL2-K222R, FOSL2-K222Q mutant were cloned into the lentiviral vector pLEX-MCS-CMV-puro (Addgene) to generate corresponding expression plasmids. The Flag-tagged coding sequence of human PRKCB were cloned into pLVX-IRES-Neo (Clontech) vector.

Cell transfection and lentiviral production

Cells were transfected with plasmid vectors using Lipofectamine 2000 according to the manufacturer’s instructions. To generate the lentivirus, HEK293T cells were co-transfected with psPAX2, pMD2.G, and recombinant lentiviral vectors. Forty-eight hours after transfection, the lentiviral supernatants were harvested and passed through a 0.45 μM filter. The lentiviruses were added to media supplemented with 10 μg/mL polybrene (Beyotime, C0351-1 mL) to transduce SiHa cells following the manufacturer’s instructions.

Constructions of stable cell lines

To establish FOSL2-WT or FOSL2-K222R, FOSL2-K222Q mutant SiHa cell line models, SiHa cells were infected with respective lentivirus and further selected with 0.6 µg/mL puromycin (Beyotime Biotechnology, ST551-10mg). To establish inducible PRKCB-overexpressing SiHa cell strains, SiHa cells were infected with pLVX-IRES-Neo-PRKCB lentivirus and further selected with 0.8 mg/mL G418 (Gibco, 10131035).

Cell proliferation assay

For cell growth assays, experimental and control cells were plated in 96-well plate (2*103 cells per well). For measurement, CCK-8 solution (Beyotime Biotechnology, C0039) at the final concentration of 10% was added to the wells, and absorbance at 450 nm was measured 2 h after incubation to represent the relative cell numbers.

Colony formation assay

To evaluate the clonogenic capacity of experimental and control cells, 500 cells were plated in six-well plate in triplicated. After 10–12 days of continuous culture, the cells were fixed with methanol and stained with 1% crystal violet (Sigma, C0775). The colonies were counted, and each assay was repeated at least three times independently.

Cell cycle analysis

For cell cycle assays, experimental and control cells were seeded in 6-well plate (2*105 cells per well). The cells were collected by trypsinization and fixed in ice-cold 70% ethanol overnight at 4 °C. The cell cycle detection kit (Nanjing KeyGen Biotech, KGA512) and BD LSR II flowcytometer (BD Biosciences) were applied to detect cell cycle distribution. The cell cycle result profiles were analyzed using ModFit LT 3.1 (Verity Software House).

IHC

The fresh tumors and NATs were fixed in 10% neutral buffered formalin, embedded in paraffin, and used to prepare sections that were 4 μm thick. The sections were deparaffinized, rehydrated, and then immersed in a 3% hydrogen peroxide solution for 10 min. For antigen retrieval, the sections were heated in citrate buffer (pH 6.0) at 95 °C for 25 min and then cooled at room temperature for 60 min. After each incubation step, the sections were washed with PBS (pH 7.4). Subsequently, the sections were incubated overnight at 4 °C with the anti-PRKCB antibody (Abcam, ab195039). Immunostaining was carried out following the instructions provided by the manufacturer (PV-9000 Polymer Detection System, Zhongshan Golden Bridge). The sections were counterstained with hematoxylin, dehydrated using graded ethanol, and sealed with neutral resin. Two investigators independently assessed the IHC staining of PRKCB on the sections.

Immunoblot and immunoprecipitation

Cells were lysed in EBC lysis buffer (50 mM Tris HCl, pH 8.0, 120 mM NaCl, 0.5% NP-40) supplemented with protease inhibitors (Selleck Chemicals) and phosphatase inhibitors (Selleck Chemicals), followed by pulse sonication for 10 s. 30 mg total protein were separated by 10% SDS-PAGE gel and blotted with indicated primary antibodies. For immunoprecipitation (IP), the whole cell lysates (WCL) were immunoprecipitated with anti-Flag M2 agarose beads for 2 h in the presence of 2 μM TSA and 10 mM nicotinamide (NAM). The pellet was then washed with NETN buffer (20 mM Tris-HCl, pH 8.0, 100 mM NaCl, 0.5% NP-40, 1 mM EDTA) for four times and analyzed by immunoblotting or MS.

qRT-PCR

Total RNA was extracted following the manufacturer’s instructions using the RNApure Tissue & Cell Kit (Cwbiotech, CW0560S). Subsequently, the isolated RNA served as a template for reverse-transcription reactions employing the HiFiScript cDNA Synthesis Kit (Cwbiotech, CW2569M). qRT-PCR analysis was conducted using the TB Green® Premix Ex Taq™ II (TaKaRa, RR820A) and CFX96 Real-Time System (Bio-Rad). β-actin was served as an internal control. The relative quantification of gene expression was analyzed by the 2Ct method. The primers used for qRT-PCR analyses are as following:

WNT5A Forward: 5′-GCCAGTATCAATTCCGACATCG-3′,

Reverse: 5′-TCACCGCGTATGTGAAGGC-3′

FOSL2 Forward: 5’-CAGAAATTCCGGGTAGATATGCC-3′

Reverse: 5′-GGTATGGGTTGGACATGGAGG-3′

β-actin Forward: 5′-AGAGCTACGAGCTGCCTGAC-3′,

Reverse: 5′-AGCACTGTGTTGGCGTACAG-3′

Xenograft tumorigenesis assay

Five-week-old female BALB/c nude mice (HFK Bioscience) were maintained in pathogen-free conditions. All animals were acclimated for 1 week before experiments. 3*106 experimental and control SiHa cells in 100 μL PBS were subcutaneously inoculated at the flank of randomly grouped nude mice. Tumor size was measured every 3 days with a caliper and tumor volumes were calculated by the formula: volume = (width)2*length*0.52. The maximum tumor burden allowed by the ethics committee did not exceed 1500 mm3. When the tumor burden reached 1500 mm3, the mice were euthanized, and the tumors were dissected for further analysis. After the mice were euthanized, the transplanted tumors were weighed and photographed. All animal procedures were performed according to the guidelines approved by the National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College and adhered to the National Institutes of Health Guide for the care and use of laboratory animals.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Check out our other content

Most Popular Articles