Deciphering molecular insights of HDAC6 inhibition through SHAP-based interpretation of optimized machine learning models
DOI:
https://doi.org/10.59882/1859-364X/311
Abstract
Histone deacetylase 6 (HDAC6) is an important target for cancer treatment; however, developing potent and selective inhibitors remains a considerable challenge. Machine learning (ML) can accelerate drug discovery, but ML models are often difficult to interpret. This study aimed to build optimized ML models that predict HDAC6 inhibitory activity and to interpret them with SHapley Additive exPlanations (SHAP). Bioactivity data (IC50 values) for HDAC6 inhibitors were curated from ChEMBL and BindingDB, and each compound was classified as active or inactive by comparison with the reference compound SAHA. Five ML algorithms (Decision Tree, Random Forest, Support Vector Machine, XGBoost, and AdaBoost) were trained on five molecular fingerprints: MACCS Keys, Morgan2, ECFP2, ECFP4, and ECFP6. Hyperparameters were tuned to optimize model performance. The best-performing model, ECFP6-RF (Random Forest on ECFP6 fingerprints), achieved high predictive performance on the test set (Accuracy: 90.20%, Precision: 91.53%, AUC-ROC: 96.25%) with limited overfitting (train-test gap < 8%). SHAP analysis of the ECFP6-RF model identified the structural features that contributed most strongly to predicted HDAC6 inhibition. Notably, fragments associated with the hydroxamic acid zinc-binding group and with specific aliphatic and aromatic linkers were highly influential, consistent with established structure-activity relationships (SAR). This work demonstrates that optimized ML models combined with explainable AI (XAI) can provide interpretable insights into the molecular determinants of HDAC6 inhibitory activity.
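To illustrate the idea behind the SHAP attributions mentioned above, the following minimal sketch computes exact Shapley values by brute force for a single compound described by four fingerprint bits. It is not the study's pipeline: the scoring function `toy_model`, its coefficients, and the meaning assigned to each bit are hypothetical, and the all-zero baseline is an assumption made for simplicity. In practice the SHAP library uses far more efficient algorithms for tree ensembles and a background dataset rather than a single baseline vector.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline input.

    Features absent from a coalition take their baseline value.
    Cost grows as 2^n, so this is only feasible for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(bits):
    """Hypothetical activity score over four fingerprint bits (illustration only).

    bit 0: zinc-binding group fragment, bit 1: linker fragment,
    bit 2: cap-group fragment (only matters together with bit 0),
    bit 3: a fragment the model ignores.
    """
    return 0.1 + 0.5 * bits[0] + 0.2 * bits[1] + 0.1 * bits[0] * bits[2]


x = [1, 1, 1, 0]          # fingerprint bits of the compound being explained
baseline = [0, 0, 0, 0]   # assumed reference input: no fragments present
phi = shapley_values(toy_model, x, baseline)

for bit, value in enumerate(phi):
    print(f"bit {bit}: contribution {value:+.3f}")
```

The contributions sum to the difference between the model's score for the compound and its score for the baseline, which is the additivity property that makes SHAP values readable as per-fragment shares of a prediction. The interaction term between bits 0 and 2 is split equally between those two bits.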