Abstract
Background/Objectives: Tablet development requires simultaneous optimization of multiple quality attributes under limited experimental budgets, yet formulation–property relationships are highly nonlinear in mixture systems. To support pre-formulation decision-making prior to extensive tablet prototyping, this study proposes an AI framework that organizes formulation and process data together with raw-material property records into a reusable database, and enriches conventional composition/process features with physically motivated mixture descriptors derived from raw-material properties and formulation/process settings.
Methods: Mixture-level scalar descriptors are constructed by composition-weighted aggregation of material properties, and particle size distribution (PSD) is incorporated via a compact set of summary statistics computed from composition-weighted mixture PSDs. Three feature sets are compared: (i) Materials + Processes (MP), (ii) MP with scalar Descriptors (MPD), and (iii) MPD with PSD summaries (MPDD). Five target properties are modeled: hardness, disintegration time, flow function, cohesion, and thickness. We train and evaluate Random Forest, Extra Trees Regressor, Lasso, Partial Least Squares, Support Vector Regression, and a multi-branch neural network that processes the three feature blocks separately and concatenates them for prediction. For interpolation assessment, repeated Train/Dev/Test splitting (5:3:2) across multiple random seeds is used, and the effect of feature augmentation is quantified by paired RMSE improvements with bootstrap confidence intervals and paired Wilcoxon signed-rank tests. To assess robustness under practical formulation updates, rolling-origin time-series splits are employed and Applicability Domain indicators are computed to characterize out-of-distribution coverage.
Results: Across interpolation evaluations, mixture-descriptor augmentation (MPD/MPDD) improves hardness and disintegration time in most settings, whereas gains for flow function are smaller and cohesion/thickness show mixed effects under limited sample sizes.
Conclusions: Under extrapolation-oriented evaluation, the descriptors can improve hardness but may degrade disintegration-time prediction under covariate shift, emphasizing the need for careful descriptor selection and dimensionality control when deploying pre-formulation predictors.
Background
In product development, multiobjective optimization to satisfy multiple target properties (specifications) simultaneously is commonplace; however, exploring candidate conditions requires iterative prototyping and experimentation, which imposes substantial time and cost burdens. In mixture systems in particular, interactions among raw materials often induce strongly nonlinear responses, making the search space high-dimensional and prone to nonconvexity.
In this study, we focus on tablets (compressed tablets) manufactured by compressing powders using a tableting machine. Tablets have been widely used not only in pharmaceuticals but also in dietary supplements because they can accurately contain a single dose and provide a stable, portable dosage form. However, tablet manufacturing requires fundamental powder properties, including flowability to ensure stable die filling and compressibility/compactability to reduce voids and form interparticle bonds during compression. Many active pharmaceutical ingredient powders exhibit poor tableting performance as-is, and it has been reported that fewer than 20% of active ingredients are suitable for direct compression [1]. Consequently, granulation-based approaches have been widely adopted to improve flowability and compressibility prior to tableting, including wet granulation using a binder solution and dry granulation in which primary particles are agglomerated under compressive stress [2]. In addition, advances in instrumentation and measurement technologies for tableting equipment have enabled monitoring and control of compaction pressure and fill depth, contributing to quality assurance and stable production. Furthermore, the development of compaction simulators that can reproduce production-scale compression cycles with small material quantities has facilitated understanding of compaction behavior and evaluation of scale-up [3].
From a formulation-design perspective, tablets comprise diverse excipients in addition to the active ingredient, such as fillers, binders, disintegrants, and lubricants; excipient design is essential both for achieving powder properties suitable for tableting and for ensuring manufacturability and tablet quality [4]. Moreover, improvements in flowability and compressibility and gains in productivity have been pursued through the development of high-functionality grades for direct compression, composite excipients, and the practical implementation of co-processed excipients [5,6]. In quality control, content uniformity is one of the most critical quality attributes and is strictly managed to ensure dose reproducibility and safety [7]. Thus, tablet development constitutes a complex system in which raw materials, composition, and process conditions interact, requiring design and process optimization to satisfy multiple properties simultaneously.
More broadly, tablet pre-formulation and early formulation/process development involve the integration of multiple factors, including active pharmaceutical ingredient (API) properties, excipient functionalities, particle size distribution, particle morphology, moisture-related properties, and processing conditions. These factors influence downstream critical quality attributes such as hardness, disintegration time, and thickness, as well as powder-processability attributes such as flowability and cohesion. Broadly, these variables can be categorized into material-related variables (e.g., composition, particle size distribution, physicochemical properties, and solid-state characteristics) and process-related variables (e.g., granulation conditions and compression settings). In this study, we focus on variables that are typically available in early-stage development datasets and use them to construct predictive models via mixture-level descriptor engineering.
In recent years, data-driven approaches, including machine learning and deep learning, have attracted attention as means to enhance design and decision making by learning patterns and relationships from large-scale data that are difficult for humans to identify. In materials science, research has advanced under the umbrella of materials informatics [8]; in chemistry and drug discovery, the QSAR/QSPR framework has been established within chemoinformatics, and descriptor-based prediction using numerical representations of structure and properties has been widely adopted [9]. In addition, the concept of process informatics, which links material properties and process conditions in an informatics framework for optimization, has been proposed [10]. For mixture systems such as tablets, appropriately aggregating component-level information into mixture-level features is important for both predictive performance and interpretability.
Regarding AI applications in tablet formulations, studies have reported learning the relationships between formulation/process conditions and critical quality attributes (CQAs) to predict properties such as hardness and disintegration time (DT). For example, Akseli et al. proposed a framework that combines non-destructive ultrasonic measurements with machine learning to estimate tablet fracture strength and disintegration behavior from tableting conditions and formulation factors [11]. In large-scale studies using curated formulation databases, deep neural networks and optimized ensemble models have demonstrated high predictive performance for DT and hardness, with some reports achieving R² > 0.95 [12,13].
However, many existing approaches incorporate post-compression properties, such as tablet hardness, friability, and wetting time, as input variables. Although such integration can improve predictive accuracy, it implicitly assumes that physical tablets have already been manufactured and characterized; consequently, its direct applicability to pre-manufacturing formulation screening and early-stage decision making is limited.
In contrast, efforts have also been reported to predict formulation-level properties using compositional descriptors and raw-material characteristics without relying on post-manufacturing measurements [14,15]. To enable prediction at the design stage, an appropriate representation of formulation mixtures is essential; however, many existing models do not explicitly model mixture-level physical agglomeration or particle size distribution (PSD) effects and instead directly encode excipient composition as a concentration vector.
In light of the above, this study introduces a feature-engineering strategy for mixture systems that considers (i) aggregation of physicochemical properties of raw materials according to mixing ratios and (ii) mixture properties that summarize mixture PSDs into statistically compact descriptors. By explicitly aligning feature construction with physical mixing behavior, we aim to strengthen predictive reliability at the true pre-formulation stage through controlled statistical comparisons and performance-improvement assessment using bootstrap confidence intervals. Furthermore, with an emphasis on prediction in extrapolative regions that is important in the context of process analytical technology (PAT) and quality by design (QbD), we quantify performance differences and coverage via stratified evaluation based on the applicability domain (AD) and assess extrapolation risk.
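The two feature-construction ideas above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the material property values, the size grid, and the choice of D10/D50/D90 and span as PSD summaries are all assumptions made for illustration, since this section does not specify which summary statistics are used.

```python
from typing import Dict, List, Sequence


def weighted_descriptor(fractions: Dict[str, float], prop: Dict[str, float]) -> float:
    """Composition-weighted mean of a single raw-material property."""
    total = sum(fractions.values())
    return sum(w * prop[m] for m, w in fractions.items()) / total


def mixture_psd(fractions: Dict[str, float], psds: Dict[str, Sequence[float]]) -> List[float]:
    """Composition-weighted mixture PSD; all materials must share one size grid."""
    total = sum(fractions.values())
    n_bins = len(next(iter(psds.values())))
    return [sum(w * psds[m][i] for m, w in fractions.items()) / total
            for i in range(n_bins)]


def psd_quantile(sizes: Sequence[float], psd: Sequence[float], q: float) -> float:
    """Particle size at cumulative fraction q, by linear interpolation between bins."""
    cum = 0.0
    for i, frac in enumerate(psd):
        prev = cum
        cum += frac
        if cum >= q:
            if i == 0 or frac == 0.0:
                return sizes[i]
            return sizes[i - 1] + (q - prev) / frac * (sizes[i] - sizes[i - 1])
    return sizes[-1]


# Hypothetical two-component formulation (mass fractions) and placeholder properties.
fractions = {"MCC": 0.7, "lactose": 0.3}
bulk_density = {"MCC": 0.30, "lactose": 0.60}   # g/mL, illustrative values only
sizes = [10.0, 50.0, 100.0, 200.0]              # upper bin edges in micrometres (assumed grid)
psds = {"MCC": [0.1, 0.4, 0.4, 0.1], "lactose": [0.0, 0.2, 0.5, 0.3]}

mix_bulk_density = weighted_descriptor(fractions, bulk_density)
mix = mixture_psd(fractions, psds)
d10, d50, d90 = (psd_quantile(sizes, mix, q) for q in (0.1, 0.5, 0.9))
span = (d90 - d10) / d50  # one possible compact width statistic

print(mix_bulk_density, d10, d50, d90, span)
```

In this sketch the scalar descriptors would correspond to the additional block in MPD and the PSD summaries to the additional block in MPDD; a linear mixing rule is the simplest possible aggregation and will not capture interaction effects between materials.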
The objective of this study is to develop and validate a machine-learning framework for tablet development that enables pre-formulation screening using only raw-material information, composition, and process conditions, without relying on post-compression measurements as inputs. We compare three feature sets: MP (Materials + Processes: composition and process conditions only), MPD (MP plus composition-weighted scalar mixture descriptors), and MPDD (MPD plus PSD summary statistics from mixture particle-size distributions). On this basis, we address the following research questions (RQs):
- RQ1 (Performance gain by feature augmentation): To what extent does augmenting MP with mixture descriptors and PSD summaries (MP→MPD/MPDD) improve predictive performance across target properties?
- RQ2 (Robustness under deployment-like shift): Under deployment-like temporal distribution shift (rolling-origin time-series split), are the improvements observed in interpolation evaluation preserved, and for which targets?
- RQ3 (Risk screening via applicability domain): Can applicability-domain (AD) indicators identify low-coverage regions where prediction errors increase, enabling AD-aware screening of risky predictions?
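As an illustration of how the paired comparison behind RQ1 might be quantified, the following Python sketch computes the mean per-seed RMSE improvement of an augmented feature set over MP, with a percentile bootstrap confidence interval. The function name, the number of resamples, and the RMSE values are hypothetical placeholders, not results or settings from this study.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    rmse_base: Sequence[float],
    rmse_aug: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean paired RMSE improvement (base minus augmented) with a percentile bootstrap CI."""
    # One difference per random seed; positive values mean the augmented set is better.
    diffs = [b - a for b, a in zip(rmse_base, rmse_aug)]
    rng = random.Random(seed)
    # Resample the paired differences with replacement and record each resample mean.
    boots = sorted(mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot))
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return mean(diffs), lo, hi


# Placeholder per-seed test RMSEs (illustrative numbers only, not from the paper).
rmse_mp = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
rmse_mpd = [9.6, 9.5, 9.9, 9.8, 9.4, 9.9, 9.7, 9.8]

point, lo, hi = paired_bootstrap_ci(rmse_mp, rmse_mpd)
print(f"mean improvement = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero would suggest a consistent improvement across seeds. The paired Wilcoxon signed-rank test mentioned in the abstract could be applied to the same per-seed differences, for example with `scipy.stats.wilcoxon`.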
To answer these questions, the following section describes the dataset construction, feature-set design (MP/MPD/MPDD), model development, and evaluation protocols for both interpolation and extrapolation-oriented settings.
2. Materials and Methods
2.1. Sample Preparation and Measurement
The tablet samples comprised fillers, binders, disintegrants, lubricants, and other excipients. Examples of the materials used are shown in Table 1. Granulated materials were treated as raw materials in the same manner as primary powders. The data on formulations, process conditions, and powder/tablet properties used in this study were not obtained from public sources such as open databases; rather, all samples were prepared by the authors and all values were measured using the corresponding instruments. For sample preparation, each material was weighed using an analytical balance and uniformly blended according to the formulation. The blended powder was then filled and compressed using a single-punch tablet press for testing (N-30E, OKADA SEIKO, Okayama, Japan), and tablets were produced by compaction at the specified compaction force. The compaction force was set to 4400–19,600 N (450–2000 kgf).
Table 1. Examples of raw materials used in the formulations prepared in this study.
| Category | Number of Materials Used | Examples |
|---|---|---|
| Diluents | 8 | Microcrystalline cellulose (MCC); Lactose |
| Disintegrants | 9 | Partially pregelatinized starch; Sodium carboxymethyl cellulose |
| Polyols | 4 | Maltitol; Erythritol |
| Lubricants | 4 | Magnesium stearate; Calcium stearate |
| Binders | 2 | Hydroxypropyl methylcellulose (HPMC); Hydroxypropyl cellulose (HPC) |
| Glidants | 2 | Colloidal silicon dioxide |
In this study, we modeled five target variables relevant to tablet development: three post-compression tablet properties (hardness, disintegration time, and thickness) and two pre-compression powder properties (flow function and cohesion). The definition and measurement method for each property are described below. These target variables are important indicators for evaluating tablet performance, quality, and manufacturability (including stability of filling and compression), and they are also key metrics in tablet design. Because the number of observations differs by target variable, the sample size is also provided for each.
- Hardness [N], n = 1209: A measure of mechanical strength against external forces, directly related to chipping and capping during transport/handling and to edge defects and cracking during manufacturing. Excessive hardness, however, can impair disintegration and dissolution. Measured using a digital hardness tester (KHT-40N, FUJIWARA, Wakayama, Japan).
- Disintegration time [min], n = 882: The time required for a tablet to disintegrate under specified conditions, which particularly affects initial dissolution and absorption for immediate-release formulations. It is strongly influenced by the type and amount of disintegrant, lubricant level, hardness/compactness, PSD, and hydrophilic excipients. In this study, it was measured in water at 37 °C using a disintegration tester (NT-600, TOYAMA Industry, Toyama City, Japan).
- Flow function [-], n = 151: A dimensionless index derived from shear-cell testing; higher values indicate better flowability. It is directly related to uniform die filling and to mass variability (a prerequisite for content uniformity). It varies with PSD, particle shape, surface roughness, moisture content, and electrostatic charging, as well as with the addition of lubricants and glidants. At the design stage, target flowability is ensured through the selection of granulation method and binder and through particle-size design, and is reflected in process settings such as hopper angle and feeder speed. Measured using a powder flow tester (Brookfield, Toronto, ON, Canada).
- Cohesion [kPa], n = 145: A measure of the strength of adhesion/agglomeration between particles. High cohesion can promote bridging, rat-holing, classification, and segregation, thereby reducing flowability. It is sensitive to particle-size reduction, nonuniform moisture, surface energy, and electrostatic effects. In formulation design, cohesion is controlled via particle-size optimization, granulation, and addition of glidants to ensure stable compression behavior and uniform filling. Measured using a powder flow tester (Brookfield).
- Thickness [mm], n = 60: Tablet thickness after compression reflects the fill mass and degree of compaction and affects swallowability, disintegration/dissolution behavior, and manufacturing stability. It correlates with weight and hardness, and changes in geometry can also alter drug diffusion behavior. Thickness depends on die/punch specifications and compression conditions; control within specification is also important for appearance conformity, packaging compatibility, and manufacturing stability. Measured using a micrometer (PK-1012CPX, Mitutoyo, Kawasaki, Japan).
Because sample sizes differ substantially across target variables, the results for the smaller datasets, particularly Flow function, Cohesion, and Thickness, should be interpreted as exploratory and with greater statistical uncertainty than those for Hardness and Disintegration Time. To clarify dataset structure, we additionally audited each target variable using formulation-level uniqueness defined on a material-composition basis. For the tablet-property targets Hardness, Disintegration Time, and Thickness, we also audited formulation–process-condition uniqueness defined on a combined material-composition and process-condition basis. Hardness (n = 1209) comprised 388 unique formulations and 998 unique formulation–process conditions, and disintegration time (n = 882) comprised 371 unique formulations and 698 unique formulation–process conditions. Thus, some observations shared identical formulation–process conditions in these two targets. Thickness contained repeated formulations (23 unique formulations across 60 observations), but all observations were unique once process conditions were included. For the pre-compression powder properties Flow function and Cohesion, uniqueness was evaluated on a material-composition basis only; under this definition, Flow function (n = 151) comprised 136 unique formulations and Cohesion (n = 145) comprised 134 unique formulations. Because some observations originate from identical formulation–process conditions, the observations in the dataset are not fully independent of one another. We explicitly acknowledge this as a limitation of the present study. Detailed counts are provided in the Supporting Information (Table S1).
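The uniqueness audit described above can be sketched in a few lines of Python. This is only an assumed reconstruction of the bookkeeping: the record layout, the field names (`composition`, `process`, `compaction_force_N`), and the rounding tolerance are hypothetical, and the four records are toy data rather than entries from the study's database.

```python
from typing import Dict, List, Tuple


def formulation_key(composition: Dict[str, float]) -> Tuple[Tuple[str, float], ...]:
    """Order-independent key for a formulation; zero-fraction materials are ignored."""
    # Rounding guards against spurious mismatches from floating-point noise.
    return tuple(sorted((m, round(w, 6)) for m, w in composition.items() if w > 0))


def audit(records: List[dict]) -> Dict[str, int]:
    """Count observations, unique formulations, and unique formulation-process conditions."""
    formulations = {formulation_key(r["composition"]) for r in records}
    conditions = {
        (formulation_key(r["composition"]), tuple(sorted(r["process"].items())))
        for r in records
    }
    return {
        "n": len(records),
        "unique_formulations": len(formulations),
        "unique_formulation_process": len(conditions),
    }


# Toy records: formulation A appears three times (twice at the same force), B once.
form_a = {"MCC": 0.7, "lactose": 0.3}
form_b = {"MCC": 0.5, "lactose": 0.5}
records = [
    {"composition": form_a, "process": {"compaction_force_N": 5000}},
    {"composition": form_a, "process": {"compaction_force_N": 5000}},
    {"composition": form_a, "process": {"compaction_force_N": 10000}},
    {"composition": form_b, "process": {"compaction_force_N": 5000}},
]

summary = audit(records)
print(summary)
```

The same keys could also serve as group labels if one wanted to keep replicate observations of a formulation on the same side of a train/test split, although this section does not state whether such grouping was applied.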
Hamaguchi, M.; Adachi, T.; Arai, N. Application of AI in Tablet Development: An Integrated Machine Learning Framework for Pre-Formulation Property Prediction. Pharmaceutics 2026, 18, 452. https://doi.org/10.3390/pharmaceutics18040452