A hybrid approach to aqueous solubility prediction using COSMO-RS and machine learning

This paper (Mac Fhionnlaoich, N., Zeglinski, J., Simon, M., Wood, B., Davin, S., & Glennon, B. (2024). A hybrid approach to aqueous solubility prediction using COSMO-RS and machine learning. Chemical Engineering Research and Design. https://doi.org/10.1016/j.cherd.2024.07.050) addresses a significant challenge in pharmaceutical development: the accurate prediction of aqueous solubility for drug candidates. Solubility is a critical property that influences a drug's bioavailability, efficacy, and overall success in therapeutic applications.

Traditional approaches to solubility prediction fall into empirical, semi-empirical, and theoretical categories, each with its own limitations:

  • Empirical methods, while straightforward, are often time-consuming and require extensive experimental data.

  • Semi-empirical models, such as those based on Hansen solubility parameters or quantitative structure–property relationships (QSPR), offer a compromise between accuracy and efficiency but still rely on some experimental data.

  • Theoretical models, including molecular dynamics and COSMO-RS, provide deep insights into solute-solvent interactions but are computationally intensive and not always suitable for absolute predictions without calibration.

In response to these challenges, the authors propose a hybrid approach that leverages both thermodynamic modeling and machine learning to predict solubility more effectively, while minimizing the need for an extensive training database and eliminating the requirement for solute-specific experimental data.

The study utilizes COSMO-RS (Conductor-like Screening Model for Real Solvents), a quantum chemistry-based model that accurately describes molecular interactions at the electrostatic and steric levels. This framework is used to derive conformer-specific molecular descriptors—such as dielectric energy corrections, hydrogen bond donor and acceptor moments, and molecular volume—which are crucial for understanding aqueous solvation behavior.
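
To make the idea of conformer-specific descriptors more concrete, the minimal Python sketch below shows one way such values could be held and flattened into a feature vector for a downstream model. The class name, field names, units, and numbers are all illustrative assumptions. They are not COSMO-RS output and are not taken from the paper.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class ConformerDescriptors:
    """Hypothetical container for the descriptors of a single conformer.

    Field names and units are assumptions made for illustration only.
    """
    dielectric_energy_correction: float  # assumed unit: kcal/mol
    hb_donor_moment: float               # assumed unit: arbitrary
    hb_acceptor_moment: float            # assumed unit: arbitrary
    molecular_volume: float              # assumed unit: cubic angstroms

    def to_feature_vector(self) -> List[float]:
        # The field order fixes the input order seen by a downstream model.
        return list(astuple(self))


# Placeholder numbers, not computed values.
example = ConformerDescriptors(-0.8, 1.2, 3.4, 180.0)
features = example.to_feature_vector()
print(features)
```

Keeping the descriptors in a typed record makes the input order explicit, which matters once the vector is fed to a model that has no notion of feature names.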

These descriptors are then used as inputs for a fully connected feed-forward neural network with three hidden layers. The neural network is trained using a subset of the AquaSol database, which contains experimentally determined solubility data for various compounds.
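
As a rough illustration of this architecture, the sketch below implements the forward pass of a fully connected feed-forward network with three hidden layers in plain Python. The source states only the network type and the number of hidden layers. The layer widths, the ReLU activation, the weight initialization, the four-value input, and the placeholder numbers are assumptions, and the weights here are random rather than trained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases): small random weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise rectified linear unit (an assumed activation)."""
    return [max(0.0, v) for v in values]


def build_network(n_inputs, hidden_sizes, rng):
    """Stack the hidden layers and add a single linear output unit."""
    sizes = [n_inputs, *hidden_sizes, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def predict(network, x):
    """Forward pass: ReLU after each hidden layer, linear output."""
    *hidden_layers, output_layer = network
    for weights, biases in hidden_layers:
        x = relu(dense(x, weights, biases))
    weights, biases = output_layer
    return dense(x, weights, biases)[0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
network = build_network(n_inputs=4, hidden_sizes=(16, 16, 16), rng=rng)
descriptors = [-0.8, 1.2, 3.4, 180.0]  # placeholder, unscaled values
prediction = predict(network, descriptors)
print(prediction)
```

In practice the inputs would be scaled and the weights fitted to experimental solubility data before the output could be read as a solubility estimate. The untrained output printed here has no physical meaning.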

The hybrid approach capitalizes on the strengths of both thermodynamic models and machine learning: COSMO-RS provides robust, theory-driven descriptors, while the neural network excels at modeling the complex, non-linear relationships between these descriptors and solubility.

One of the key advantages of this approach is its ability to make accurate solubility predictions without requiring solute-specific experimental data, which is particularly beneficial in the early stages of drug discovery when such data is often unavailable. This is achieved using quantum chemistry-derived descriptors that capture essential molecular interactions, reducing the need for extensive data collection and experimental validation.

The results demonstrate that the hybrid model offers high predictive power, even when trained on a relatively small dataset. The authors highlight the model's potential for improving the efficiency and accuracy of solubility predictions, which could lead to more effective drug candidate screening and ultimately reduce the time and cost associated with pharmaceutical development.

In conclusion, the paper presents a promising methodology that integrates the theoretical rigor of thermodynamic models with the flexibility and power of machine learning. This approach not only enhances the interpretability and generalization capabilities of solubility models but also represents a step forward in predictive modeling in the pharmaceutical industry.

By reducing dependency on experimental data, the model facilitates faster and more accurate assessments of drug candidates, paving the way for more efficient drug discovery and development processes.

View the full paper:

Mac Fhionnlaoich, N., Zeglinski, J., Simon, M., Wood, B., Davin, S., & Glennon, B. (2024). A hybrid approach to aqueous solubility prediction using COSMO-RS and machine learning. Chemical Engineering Research and Design. https://doi.org/10.1016/j.cherd.2024.07.050
