Abstract
Polysorbate 20 (PS20) and polysorbate 80 (PS80) are essential surfactants used to stabilize biopharmaceutical products, yet their highly heterogeneous mixtures and susceptibility to oxidation and enzymatic hydrolysis complicate routine analysis. We developed a hierarchical generative model that reconstructs entire liquid chromatography–mass spectrometry (LC–MS) measurements to automatically interpret complex polysorbate datasets. By embedding domain knowledge of base structures, oxyethylene chain lengths, fatty acid esterification, and isotope patterns, the model resolves individual subspecies and provides molecular-level composition. Applied to PS20 and PS80, the approach distinguishes oxidative from hydrolytic degradation and yields pathway-specific fingerprints. Model outputs agree closely with manual integration while delivering greater depth and automation. This transforms polysorbate analysis from labor-intensive peak-by-peak work into an objective, comprehensive characterization tool suited for quality control, batch selection and degradation monitoring throughout development and manufacturing.
Introduction
The rapid development of artificial intelligence (AI) in recent years has transformed numerous industries, including pharmaceutical development. AI-assisted analytics offer opportunities to enhance efficiency and precision in drug development. Generative models, a subset of AI, can design new molecules with desired properties, speeding up research.(Mardikoraem et al., 2023) Moreover, automating analysis processes with AI makes it possible to handle large numbers of samples or complex data structures effectively. Generative modeling has emerged as a powerful technique for analyzing complex datasets across various fields, including analytical chemistry. Polysorbates (PS) 20 and 80, by far the most widely used surfactants for stabilizing biologics,(Wuchner et al., 2022a, Wuchner et al., 2022b) are heterogeneous mixtures comprising more than 600 individual substances,(Evers et al., 2020) making them ideal candidates for generative modeling approaches. LC-MS analysis of PS generates very large amounts of data. Generative modeling therefore offers a novel and comprehensive method for extracting detailed information about polysorbate composition and degradation patterns.
Unlike traditional analytical methods that identify and quantify specific peaks or regions of interest, generative modeling recreates the entire LC-MS dataset by optimizing underlying parameters, which are defined according to the problem at hand. This comprehensive approach considers all aspects of the data simultaneously, yielding a more robust analysis. By incorporating domain knowledge of polysorbate structures, mass spectrum patterns, chromatographic behavior, and potential degradation products, the model generates synthetic LC-MS data that closely match the observed data. This approach enables detailed extraction of polysorbate sample composition, revealing both the presence and the relative quantities of the various species and their degradation products.
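The idea of reconstructing a measurement by optimizing underlying parameters can be illustrated with a deliberately small toy example. The sketch below fits non-negative abundances of three overlapping peaks to a one-dimensional, noise-free trace. It is not the hierarchical model described in this paper: the Gaussian peak shape, the peak positions, the optimizer (projected gradient descent), and all function names (gaussian_peak, forward_model, fit_abundances) are assumptions chosen only for illustration.

```python
import math

def gaussian_peak(axis, center, width):
    """Unit-height Gaussian peak sampled on the given axis (assumed peak shape)."""
    return [math.exp(-0.5 * ((x - center) / width) ** 2) for x in axis]

def forward_model(abundances, templates):
    """Synthetic signal: abundance-weighted sum of the per-species templates."""
    n_points = len(templates[0])
    return [
        sum(a * t[i] for a, t in zip(abundances, templates))
        for i in range(n_points)
    ]

def fit_abundances(observed, templates, steps=2000, lr=0.01):
    """Minimize the squared error between synthetic and observed signal.

    Uses projected gradient descent so that abundances stay non-negative.
    """
    abundances = [0.0] * len(templates)
    for _ in range(steps):
        synthetic = forward_model(abundances, templates)
        residual = [s - o for s, o in zip(synthetic, observed)]
        for k, template in enumerate(templates):
            grad = 2.0 * sum(r * t for r, t in zip(residual, template))
            abundances[k] = max(0.0, abundances[k] - lr * grad)
    return abundances

# Hypothetical example: three co-eluting species on a retention-time axis.
axis = [i * 0.1 for i in range(101)]
templates = [gaussian_peak(axis, c, 0.5) for c in (4.0, 5.0, 6.0)]
true_abundances = [3.0, 1.0, 2.0]
observed = forward_model(true_abundances, templates)
fitted = fit_abundances(observed, templates)
```

Because the whole trace is fitted at once, the overlapping peaks are separated without integrating each one by hand. A real LC-MS model would add the m/z dimension, isotope patterns, and noise, none of which are represented here.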
According to the pharmacopoeia, polysorbates comprise a hydrophilic headgroup (either sorbitan or isosorbide) ethoxylated with up to four or two polyoxyethylene (POE) chains, respectively. The lengths of these POE moieties follow a normal distribution centered at around 26 oxyethylene units for sorbitan and 13 units for isosorbide.(Evers et al., 2020) The headgroups are further esterified with up to four (sorbitan) or two (isosorbide) hydrophobic fatty acids (FAs). The FA composition adds a further layer of complexity, ranging from caproic (C6:0) to linoleic acid (C18:2) for PS20 and from myristic (C14:0) to linolenic acid (C18:3) for PS80 (Table S1). Polysorbate nomenclature derives from the predominant fatty acid present: PS20 primarily contains polyoxyethylene sorbitan monolaurate (C12:0) (Fig. 1), while PS80 contains mainly polyoxyethylene sorbitan monooleate (C18:1) (Table S1).(Kerwin, 2008) Notably, these principal structures constitute only about 20 % (w/w) of the total material,(Hewitt et al., 2011) with significant variation depending on manufacturer and quality grade. Weber et al. (2023) provide a comprehensive overview of the diverse fatty acids and their potential esterification products in PS20 and PS80.
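The building-block description above translates directly into elemental compositions and expected m/z values. The following sketch assembles a species from a headgroup, a number of oxyethylene units, and esterified fatty acids (each ester bond releasing one water molecule), then computes the m/z of an ammonium adduct. The choice of [M+NH4]+ adducts, the small fatty acid table, and the function names (species_formula, formula_mass, mz_ammonium) are assumptions made for illustration and are not taken from the paper.

```python
# Monoisotopic masses in Da (standard values, rounded).
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "N": 14.00307401}
ELECTRON = 0.00054858

HEADGROUP = {
    "sorbitan": {"C": 6, "H": 12, "O": 5},
    "isosorbide": {"C": 6, "H": 10, "O": 4},
}
OXYETHYLENE = {"C": 2, "H": 4, "O": 1}
WATER = {"H": 2, "O": 1}
# Illustrative subset only; the full fatty acid list is in Table S1.
FATTY_ACID = {
    "C12:0": {"C": 12, "H": 24, "O": 2},  # lauric acid
    "C18:1": {"C": 18, "H": 34, "O": 2},  # oleic acid
}

def species_formula(head, n_eo, fatty_acids):
    """Elemental composition of headgroup + n_eo EO units + esterified FAs."""
    counts = dict(HEADGROUP[head])
    for element, n in OXYETHYLENE.items():
        counts[element] += n * n_eo
    for fa in fatty_acids:
        for element, n in FATTY_ACID[fa].items():
            counts[element] += n
        for element, n in WATER.items():  # one water lost per ester bond
            counts[element] -= n
    return counts

def formula_mass(formula):
    """Monoisotopic mass of an elemental composition."""
    return sum(MASS[element] * n for element, n in formula.items())

def mz_ammonium(formula, charge=1):
    """m/z of an [M + z NH4]^z+ ion (the adduct type is an assumption)."""
    adduct = MASS["N"] + 4 * MASS["H"] - ELECTRON
    return (formula_mass(formula) + charge * adduct) / charge

# POE sorbitan monolaurate with 20 EO units gives C58H114O26.
ps20_main = species_formula("sorbitan", 20, ["C12:0"])
# Species around the distribution centers named in the text.
sorbitan_26 = mz_ammonium(species_formula("sorbitan", 26, ["C12:0"]), charge=2)
isosorbide_13 = mz_ammonium(species_formula("isosorbide", 13, ["C12:0"]))
```

Neighboring POE chain lengths differ by one C2H4O unit, so singly charged ions of a series are spaced by about 44.026 in m/z and doubly charged ions by about 22.013. This regular spacing is the kind of structural knowledge a model can exploit.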

Polysorbate heterogeneity encompasses four distinct categories: (i) base molecule diversity (sorbitan/isosorbide), (ii) POE chain length variation, (iii) degree of esterification, and (iv) fatty acid composition.(Kishore, 2018) This complexity increases further when considering the various degradation products that can form during storage of the pharmaceutical product.(Borisov et al., 2015; Kishore et al., 2011)
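To give a feel for how quickly these four categories multiply, the sketch below enumerates candidate compositions as (headgroup, number of EO units, esterified fatty acids). The maximum number of esters per headgroup follows the text; the EO windows and the three-member fatty acid list are hypothetical placeholders, and the result is a list of candidate compositions, not of substances actually detected.

```python
from itertools import combinations_with_replacement

# Up to four esters on sorbitan and two on isosorbide (from the text).
MAX_ESTERS = {"sorbitan": 4, "isosorbide": 2}
# Assumed EO windows around the distribution centers (26 and 13 units).
EO_WINDOW = {"sorbitan": range(16, 37), "isosorbide": range(8, 19)}
# Illustrative subset of fatty acids; the full list is in Table S1.
FATTY_ACIDS = ["C12:0", "C14:0", "C16:0"]

def enumerate_candidates():
    """List every (headgroup, n_eo, fatty_acids) combination in the windows."""
    candidates = []
    for head, max_esters in MAX_ESTERS.items():
        for n_eo in EO_WINDOW[head]:
            for degree in range(max_esters + 1):
                # Order of fatty acids is ignored, repeats are allowed.
                for acids in combinations_with_replacement(FATTY_ACIDS, degree):
                    candidates.append((head, n_eo, acids))
    return candidates

candidates = enumerate_candidates()
n_candidates = len(candidates)
```

Even this restricted setup produces several hundred candidates, before positional isomers or degradation products are considered, which illustrates why peak-by-peak manual assignment becomes impractical.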
The complexity of polysorbate mixtures and their degradation pathways (oxidation vs. enzymatic hydrolysis) makes selecting appropriate analytical strategies challenging (Doshi et al., 2021; Felix et al., 2025; Gopalrathnam et al., 2018; Kozuch et al., 2023; Schultz-Fademrecht et al., 2024). While determining PS content may suffice for well-characterized processes or products, troubleshooting often requires substantially more sophisticated analytical approaches, and identifying the degradation pathway alone may require multiple analytical methods.(Carle et al., 2024) Numerous analytical techniques for elucidating PS degradation have been published in recent years, including fluorescence detection, nuclear magnetic resonance (NMR), and reversed-phase high-performance liquid chromatography (RP-HPLC) coupled with charged aerosol detection (CAD), evaporative light scattering detection (ELSD), or mass spectrometry (MS).(Bhargava et al., 2021; Borisov et al., 2015; Doshi et al., 2020a; Dwivedi et al., 2020; Evers et al., 2020; Kranz et al., 2019; Lippold et al., 2017; Penfield and Rumbelow, 2020; Zhang et al., 2015) MS offers a distinct advantage over other detection methods: it distinguishes PS species by their m/z values, providing an additional dimension that enables selective determination of co-eluting substances. This capability makes MS particularly valuable for in-depth exploration of polysorbate chemistry and identification of degradation pathways (Ayorinde et al., 2000; Evers et al., 2020; Liu et al., 2022).
Though generative modeling has been applied across various fields, its application to LC-MS analysis of polysorbates introduces a novel approach that could significantly enhance characterization of these critical compounds in biopharmaceutical formulations. This paper presents an algorithm that extracts and characterizes distinctive fingerprints of PS20 and PS80 from different vendors. A subsequent study will demonstrate methods for identifying various degradation pathways and determining oxidation degrees. To our knowledge, this represents the first algorithm capable of automatically extracting individual PS20 and PS80 sub-species and their oxidation products.
Materials
The following materials were used: PS20 high purity (HP) (Croda Inc., Mill Hall, PA, USA); PS20 super refined (SR) (Croda, Columbus Circle, Edison, NJ, USA); PS20 China grade (CG) (Nanjing Well Health Technology Co., Building 5, R&D Zone 5, No. 64 Suning Avenue, Xuanwu District, Nanjing, China); PS80 HP (Croda Inc., Mill Hall, PA, USA); PS80 SR (Croda, Columbus Circle, Edison, NJ, USA); PS80 CG (Nanjing Well Health Technology Co., Building 5, R&D Zone 5, No. 64 Suning Avenue, Xuanwu District, Nanjing, China); acetonitrile ≥99.95 % (Carl Roth GmbH + Co. KG, Karlsruhe, Germany); methanol ≥99.8 % (Fisher Scientific GmbH, Schwerte, Germany); ammonium formate (Merck KGaA, Darmstadt, Germany); monoclonal antibody 1 (mAb1; Boehringer Ingelheim Pharma GmbH & Co KG, Biberach an der Riß, Germany); Milli-Q water from an IQ 7000 Ultrapure Lab Water System (Merck KGaA, Darmstadt, Germany); L-histidine HCl (Ajinomoto Health & Nutrition North America, Itasca, IL, USA); L-histidine (Ajinomoto Health & Nutrition North America, Itasca, IL, USA).
Peter Roelants, Reza Ranjbar Choubeh, Nico Verbeeck, Rabindranath Andujar, Torsten Schultz-Fademrecht, Patrick Garidel, Viktor Gross, Extraction of the polysorbate 20 and 80 fingerprint via generative modeling, International Journal of Pharmaceutics: X, 2025, 100433, ISSN 2590-1567, https://doi.org/10.1016/j.ijpx.2025.100433.