Artificial Intelligence/Machine Learning Assisted Image Analysis for Characterizing Biotherapeutics
The scientific challenge
Therapeutic protein drugs (for example, monoclonal antibodies) have revolutionized treatment options in various cancers, autoimmune diseases, infectious diseases, and genetic disorders, but, due to their complexity, characterization of these products presents major challenges. One of these challenges arises from subvisible particles consisting of protein aggregates that can be generated under various stress conditions (or due to the presence of container leachables or silicone oil in pre-filled syringes) (1). Although they comprise a small fraction of the total protein, these particles may in some cases increase risks of undesirable immune responses in patients. The root causes of protein aggregation can be elusive; the various stresses encountered during manufacturing, shipping, storage, and administration of protein drugs (Figure 1) promote aggregation by different molecular mechanisms, creating particles with a wide variety of sizes, shapes, and compositions (1, 2).
Among the imaging-based techniques that have been explored for characterizing these particles, flow imaging microscopy (FIM) has shown particular promise. From a single sample, FIM can record large collections of complex images of individual subvisible particles. Although these image sets are rich in structural information, manual extraction of this information is cumbersome. Current vendor-provided methods for FIM image analysis use only human-defined features such as aspect ratio, compactness, or pixel intensity, and thus most of the complex morphological information encoded in an FIM image is underutilized. One possible way to overcome the shortcomings of current optical image analysis techniques applied to therapeutic proteins is the application of artificial intelligence/machine learning (AI/ML) approaches, specifically convolutional neural networks (CNNs or ConvNets). CNNs are a class of artificial neural networks that have proven useful in many areas of image analysis. CNNs enable automatic extraction of data-driven features (i.e., measurable characteristics or properties) encoded in images (Figure 2). The complex features extracted by CNNs (e.g., fingerprints specific to stressed proteins) can potentially be used to monitor the morphological features of particles in biotherapeutics and to track the consistency of particles in a drug product (2-4).
In a case study, CDER researchers and collaborators at Ursa Analytics, University of Colorado, and the National Institute of Standards and Technology implemented CNNs (Figure 2) to further the understanding of certain product quality attributes for a model therapeutic protein formulation. The study demonstrated that advances in microscopy analyses, and a tool to monitor changes in product quality attributes, can be achieved through CNN-based approaches, and the study’s findings may be applicable to evaluation of a variety of biopharmaceutical drug products.
Figure 1: Schematic describing the various stresses during manufacturing, shipping, storage, and administration of protein therapeutics. The mechanisms by which particulates (e.g., aggregates) are generated can depend upon the applied stress.
In the interest of disclosing potential conflicts of interest: the computational tools used in this study are being developed into a commercial software package called “ParticleSentryAI”, which leverages some of the techniques described in this article.
Convolutional Neural Networks
CNNs are a class of artificial neural networks that have proven invaluable in tasks such as image recognition and classification. A schematic workflow of a representative CNN is illustrated in Figure 2 (basic components are described in greater detail in the legend). These networks can be trained with vast quantities of input data using supervised learning or using a fingerprinting approach, both of which have important applications in analyzing complex visual data such as FIM images.
Figure 2. Basic CNN workflow. Here a CNN is used as an image “classifier”, i.e., the network is intended to process an image of a single particle and predict which of two classes that particle comes from: “Stressed” or “Non-Stressed”. (Note that for the stressed condition, the model protein solution is shaken for 7 days at ambient temperature; the non-stressed protein solution is kept at ambient temperature without shaking.) To train this classifier (i.e., to estimate its most discriminatory parameters), a large collection of images properly labeled as stressed or non-stressed was used. The first step is pre-processing of these FIM images (resizing, normalization, segmentation, etc.) to generate image batches for efficient processing. The CNN then sequentially passes the batches of images through several “convolutional layers.” Within each convolutional layer, a “filter” (which is itself a small 2D image) is convolved with the input image (see endnote i). The parameters of the filters are determined by optimizing a measure that is specific to the task at hand (e.g., a binary cross-entropy loss in the image classification task shown here; see endnote ii). Once all model parameters are estimated (or “learned”), the CNN can process new images in a feedforward fashion. That is, in each convolutional layer, a new set of filters (whose parameters were determined in the “learning phase”) is convolved with the input images from the previous layer, producing new “activation images” that serve as input images for the next layer (usually with a smaller size and an increased number of channels compared to the images of the previous layer). After passing through all the convolutional layers, the resulting activation images are typically passed to a fully connected artificial neural network to extract the final “data-driven” features.
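The convolution step described in the Figure 2 legend can be illustrated with a minimal sketch in plain Python. This is not the code used in the study: the toy image, the hand-written edge filter, and the choice of ReLU as the function applied to each sum are all assumptions made for illustration. In a trained CNN the filter values are learned rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    No padding is used, so the output is smaller than the input.
    (As in most CNN libraries, the kernel is not flipped.)
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(activation):
    """Assumed nonlinearity: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in activation]


# Hypothetical 4x4 "particle image": dark on the left, bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]

# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
edge_filter = [[-1, 1],
               [-1, 1]]

activation = relu(convolve2d(image, edge_filter))
```

The resulting 3x3 activation image is nonzero only in the middle column, where the window straddles the edge. It is also smaller than the input, which matches the size reduction between layers noted in the legend.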
Supervised learning techniques allow CNNs to extract feature information from raw images and correlate these features with the experimental conditions that generate particle images with different morphologies. Supervised learning relies on pre-defined labels (e.g., ‘non-stressed’ and ‘stressed’ in the example shown in Figure 3) associated with individual images for training of the network. Once trained, the CNN can be used to predict which of the pre-defined labels best applies to a new image that was not used in training. This approach is useful in root-cause analysis when the conditions that cause protein aggregation are precisely known a priori (2-4).
Figure 3. Supervised image classification. In the tables above, the ground truth labels are shown in the rows and the model-predicted labels are shown in the columns. A) Single-image classification: data were analyzed using supervised image classification for the non-stressed and stressed protein formulations. The numbers depict the fraction of 10,000 test images assigned to each of the pre-defined categories (non-stressed and stressed). B) Multiple-image classification: pooling a subset of images (25 pooled images, Npool = 25) increases the fraction of correctly predicted classes to more than 98% for each of the pre-defined categories (non-stressed and stressed); this pooling technique is described in (4).
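A short calculation suggests why pooling can raise accuracy so sharply. The sketch below assumes a simple majority vote over Npool images that are classified independently with the same single-image accuracy. Both the voting rule and the independence assumption are illustrative simplifications and may differ from the pooling technique described in (4). The single-image accuracy of 0.75 is a hypothetical value, not a result from Figure 3.

```python
from math import comb


def pooled_accuracy(p_single, n_pool):
    """Probability that a majority of n_pool independent votes is correct.

    p_single is the (assumed) accuracy for one image; n_pool should be odd
    so that ties cannot occur.
    """
    majority = n_pool // 2 + 1
    return sum(
        comb(n_pool, k) * p_single ** k * (1 - p_single) ** (n_pool - k)
        for k in range(majority, n_pool + 1)
    )


# Hypothetical single-image accuracy, pooled over 25 images (Npool = 25).
single = 0.75
pooled = pooled_accuracy(single, 25)
```

Under these assumptions, a classifier that is right three times out of four on single images is right more than 98% of the time on pools of 25. A classifier that only guesses (accuracy 0.5) gains nothing from pooling.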
The fingerprint approach (3, 4) was originally motivated by processes (or events) during manufacturing that can introduce aggregation-inducing stresses whose precise nature is not known a priori, so that traditional supervised learning approaches are not applicable. The CNN used in the fingerprint approach is trained using labeled samples, but instead of aiming to predict classes, the network is optimized to reduce the dimension of the spatially correlated image pixel intensities, resulting in a new lower dimensional (e.g., 2D) representation of each image. Because full images can readily be mapped to this lower dimensional representation by the CNN, as illustrated in Figure 4, the representation can be used to help analyze or “curate” the complex morphology encoded in a heterogeneous collection of FIM images.
The “fingerprint” is the probability density estimate of the lower dimensional 2D representation of a collection of images from a user-specified reference condition (e.g., drug product images measured under a known shaking protocol). This fingerprint enables one to carry out quantitative product characterization applications through CNN image analysis (3, 4). This analysis is accomplished by combining the fingerprint with formal goodness-of-fit statistical hypothesis testing to determine if a new collection of particle images is consistent with the expected distribution from a fingerprint of a known reference condition (3, 4). One advantage of the fingerprint approach is its ability to detect new, unanticipated particle populations that were not considered in the initial CNN model (3, 4).
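The idea of testing a new image collection against a fingerprint can be sketched as follows. This is an illustration only and not the method of (3, 4): a 2D histogram stands in for the density estimate, a Pearson chi-square statistic with a Monte Carlo p-value stands in for the goodness-of-fit test, and synthetic Gaussian point clouds stand in for CNN embeddings. The grid settings, sample sizes, and function names are all hypothetical.

```python
import random
from collections import Counter

BINS = 6             # grid cells per axis (assumed)
LO, HI = -4.0, 4.0   # embedding range covered by the grid (assumed)


def cell(point):
    """Map a 2D embedding point to a grid cell, clipping at the grid edges."""
    def axis(v):
        return min(max(int((v - LO) / (HI - LO) * BINS), 0), BINS - 1)
    return axis(point[0]), axis(point[1])


def fingerprint(reference, pseudo=0.5):
    """Histogram density estimate: probability mass per grid cell."""
    counts = Counter(cell(p) for p in reference)
    total = len(reference) + pseudo * BINS * BINS
    return {(i, j): (counts[(i, j)] + pseudo) / total
            for i in range(BINS) for j in range(BINS)}


def chi_square(sample, fp):
    """Pearson statistic comparing observed cell counts with the fingerprint."""
    counts = Counter(cell(p) for p in sample)
    n = len(sample)
    return sum((counts[c] - n * q) ** 2 / (n * q) for c, q in fp.items())


def p_value(sample, reference, fp, n_sim=500, seed=0):
    """Monte Carlo p-value: how often a same-size resample of the reference
    deviates from the fingerprint at least as much as the new sample does."""
    rng = random.Random(seed)
    observed = chi_square(sample, fp)
    hits = sum(
        chi_square(rng.choices(reference, k=len(sample)), fp) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


def cloud(center, n, rng):
    """Synthetic stand-in for the 2D embeddings of n particle images."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            for _ in range(n)]


rng = random.Random(1)
reference = cloud((0.0, 0.0), 2000, rng)   # reference condition
fp = fingerprint(reference)
p_same = p_value(cloud((0.0, 0.0), 200, rng), reference, fp)
p_shifted = p_value(cloud((1.5, 1.5), 200, rng), reference, fp)
```

In this toy setting, a collection drawn from a shifted population yields a very small p-value and would be flagged as inconsistent with the reference fingerprint. No label for the shifted population is needed, which reflects the advantage noted above for detecting unanticipated particle populations.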
Figure 4. Fingerprint analysis of proteins (non-stressed and stressed protein formulation) and the ethylene tetrafluoroethylene (ETFE) protein surrogate is shown here. The image is reproduced from (4) and in the legend “Shaken” is equivalent to “stressed”. NIST ETFE is the standard undiluted stock of ETFE; “Unstressed” and “stressed” correspond to identical globulin formulations that were unstressed and mechanically agitated, respectively. The collages shown on the side panels correspond to the nearest ≈ 50 particles from a selected class (stressed or unstressed protein or ETFE particles) corresponding to the embedding point indicated by the arrows.
In the case study discussed here (Figure 4), darker, seemingly denser, particles are more typical of stressed protein aggregates whereas lighter “fluffier” particles are more commonly encountered in the unstressed protein aggregate images. This representation can be used during manufacturing (for product evaluation) or in exploratory data analysis.
Implementing AI/ML in monitoring changes in product quality attributes during manufacturing of therapeutic proteins
The pre-market research case study used FIM to evaluate the impact of applied stressors (e.g., freeze-thawing, agitation) on model protein formulations (globulins and monoclonal antibodies) (4). As a comparator with which one could validate image measurements, the investigators used abraded ETFE particles that mimic proteinaceous aggregates generated in a protein solution and are stable for up to three years (5).
The ETFE protein surrogate particles tested were found to be statistically repeatable and stable over time when analyzed through the CNN fingerprint approach. These qualities make the ETFE reference standard a promising tool in helping to robustly validate a new “CNN-based” analytical method. For example, the ETFE standard helped quantify the potential sensitivity and precision of the fingerprint algorithm (4). Changes in experimental parameters such as instrument focus, formulation solution refractive index, and illumination intensity can all affect microscopy image quality, so having a reliable reference standard such as ETFE to check instrument calibration is crucial for testing product quality based on CNN image analysis. The fingerprinting algorithm reproducibly detected complex distinguishing “textural features” or “fingerprint features” of protein particles that are otherwise neglected using the standard instrumental measurements. This approach is also used for adjusting tunable hypothesis testing parameters used in real world applications (4).
How does this work help advance drug product quality?
Application of artificial intelligence/machine learning (AI/ML) in the form of CNNs has enabled processing of large collections of images with high efficiency and accuracy by distinguishing complex “textural features” that are not readily delineated with existing image processing software. The methodology applied in this study is applicable to a range of pharmaceutical and biopharmaceutical products for monitoring changes in product attributes (e.g., particles/aggregates) during manufacturing. The analytical procedure explored in the CDER research study (i.e., flow microscopy combined with CNN image analysis) can detect small shifts in protein aggregate populations due to stresses resulting from unknown process upsets, providing potential new strategies for monitoring product quality attributes. In addition, use of a reference standard such as ETFE that is stable over time and possesses optical properties similar to those of protein aggregates is useful for validating and evaluating the robustness of the analytical procedure.
References
1. Carpenter JF, Randolph TW, Jiskoot W, Crommelin DJ, Middaugh CR, Winter G, et al. Overlooking subvisible particles in therapeutic protein products: gaps that may compromise product quality. J Pharm Sci. 2009 Apr;98(4):1201-5. Epub 2008 Aug 16. doi: 10.1002/jps.21530. PubMed PMID: 18704929; PubMed Central PMCID: PMC3928042.
2. Calderon CP, Daniels AL, Randolph TW. Deep Convolutional Neural Network Analysis of Flow Imaging Microscopy Data to Classify Subvisible Particles in Protein Formulations. J Pharm Sci. 2018 Apr;107(4):999-1008. Epub 2017 Dec 23. doi: 10.1016/j.xphs.2017.12.008. PubMed PMID: 29269269.
3. Daniels AL, Calderon CP, Randolph TW. Machine learning and statistical analyses for extracting and characterizing "fingerprints" of antibody aggregation at container interfaces from flow microscopy images. Biotechnol Bioeng. 2020 Nov;117(11):3322-35. Epub 2020 July 28. doi: 10.1002/bit.27501. PubMed PMID: 32667683; PubMed Central PMCID: PMC7855730.
4. Calderon CP, Ripple DC, Srinivasan C, Ma Y, Carrier MJ, Randolph TW, et al. Testing Precision Limits of Neural Network-Based Quality Control Metrics in High-Throughput Digital Microscopy. Pharm Res. 2022 Feb;39(2):263-79. Epub 2022 Jan 26. doi: 10.1007/s11095-021-03130-9. PubMed PMID: 35080706.
5. Ripple DC, Telikepalli S, Steffens KL, Carrier MJ, Montgomery CB, Ritchie NW, et al. Reference Material 8634: Ethylene Tetrafluoroethylene for Particle Size Distribution and Morphology. Special Publication (NIST-SP). 2019 May. doi: 10.6028/NIST.SP.260-193.
i In a convolution, values in specific areas of the input layer are multiplied by corresponding values in smaller matrices (filters) that have been positioned over a portion of the input and the products are summed. This sum is entered into a mathematical function to produce input for the next layer of the network. (See Lecture 5, Convolutional Neural Networks (https://www.youtube.com/watch?v=bNb2fEVKeEo&t=6s) for a helpful tutorial on how CNNs work).
ii Binary cross entropy is a metric to evaluate how well the CNN has performed in its classification task. The CNN output consists of probabilities that each of the elements to be classified belongs to a given class label. As the probabilities for the known class labels approach one, the cross entropy approaches zero; this scenario represents the “ideal” CNN classifier. For example, in a two-category classification task, if the CNN calculated a probability of one that every stressed image was a stressed image, and a probability of zero that every unstressed image was a stressed image, the cross entropy would be zero and the CNN would be performing perfectly (in practice, the probabilities of a good classifier are not exactly one, as shown in Figure 3).
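The behavior described in endnote ii can be checked numerically with the standard definition of binary cross entropy, averaged over images. The labels and probabilities below are made-up values for illustration (1 = stressed, 0 = unstressed), and the small clipping constant `eps` is an implementation convenience to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross entropy.

    labels: true classes (1 = stressed, 0 = unstressed).
    probs:  predicted probability that each image is "stressed".
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [1, 1, 0, 0]                                              # hypothetical ground truth
perfect = binary_cross_entropy(labels, [1.0, 1.0, 0.0, 0.0])       # "ideal" classifier
good = binary_cross_entropy(labels, [0.9, 0.8, 0.1, 0.2])          # confident and correct
uninformed = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])    # always answers 0.5
```

The loss is essentially zero for the ideal classifier, small for the confident and correct one, and equal to ln 2 for the classifier that always outputs 0.5.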