Current predictor: Aerobic biodegradation

Last update: June 26, 2022

About


Dataset:
The classification model takes SMILES strings (converted to fingerprints) as input and predicts a binary class (0 or 1). Only ready biodegradability data from 28-day tests based on the closed bottle test, closed respirometer, or CO2 evolution principles were used.
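The data selection described above amounts to a simple filter over test records. This is a minimal sketch only: the `Record` fields, the exact spelling of the principle strings, and the convention that label 1 means "readily biodegradable" are assumptions, not the predictor's actual schema. The records in the usage lines are toy inputs.

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative only.
@dataclass
class Record:
    smiles: str
    duration_days: int
    principle: str
    readily_biodegradable: bool

# Test principles kept in the dataset (string spellings are assumed).
ALLOWED_PRINCIPLES = {"closed bottle test", "closed respirometer", "CO2 evolution"}

def build_dataset(records):
    """Return (SMILES, label) pairs for 28-day tests with an allowed principle.

    Assumed label convention: 1 = readily biodegradable, 0 = not.
    """
    return [
        (r.smiles, int(r.readily_biodegradable))
        for r in records
        if r.duration_days == 28 and r.principle in ALLOWED_PRINCIPLES
    ]

# Toy records, not real measurements.
records = [
    Record("CCO", 28, "closed bottle test", True),
    Record("Clc1ccccc1Cl", 28, "CO2 evolution", False),
    Record("CCCC", 14, "closed bottle test", True),  # wrong duration: dropped
    Record("CC(=O)O", 28, "DOC die-away", True),     # principle not listed: dropped
]
print(build_dataset(records))  # [('CCO', 1), ('Clc1ccccc1Cl', 0)]
```

The SMILES strings kept here would then be converted to fingerprints before training.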

ML algorithms:
A total of 14 ML algorithms were compared to find the best one: K-nearest neighbors, Linear support vector machine (SVM), Radial basis function SVM (RBF SVM), Gaussian process, Neural network (multi-layer perceptron classifier), Decision tree, Random forest, Bagging, Adaptive boosting (AdaBoost), Gradient boosting, XGBoost, Extra trees, Gaussian Naive Bayes, and Quadratic discriminant analysis.

XGBoost gave the best performance and was selected for the predictor.
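Comparing candidate algorithms is typically done by scoring each one with the same cross-validation folds and keeping the highest scorer. The sketch below assumes balanced accuracy as the metric (the metric actually used is not stated) and uses two hypothetical toy stand-ins, `fit_majority` and `fit_1nn`, in place of the real candidates, which in practice would be estimators such as scikit-learn classifiers and xgboost's `XGBClassifier`. Fingerprints are represented as sets of on-bit indices.

```python
import random
from statistics import mean

def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical stand-in "algorithms": each fits on training data and returns a predictor.
def fit_majority(X, y):
    label = max((0, 1), key=y.count)  # most frequent training class
    return lambda x: label

def fit_1nn(X, y):
    def predict(x):
        nearest = max(range(len(X)), key=lambda i: tanimoto(x, X[i]))
        return y[nearest]
    return predict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in (0, 1):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        if hits:
            recalls.append(sum(hits) / len(hits))
    return mean(recalls)

def cv_score(fit, X, y, k=3, seed=0):
    """Mean balanced accuracy over k shuffled folds (same folds for every candidate)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(balanced_accuracy([y[i] for i in test], [predict(X[i]) for i in test]))
    return mean(scores)

def select_best(candidates, X, y, k=3):
    scores = {name: cv_score(fit, X, y, k) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

# Toy data: two clusters of on-bit sets, labelled 1 and 0.
X = [frozenset({1, 2, 3, i}) for i in range(10, 16)] + [frozenset({4, 5, 6, j}) for j in range(20, 26)]
y = [1] * 6 + [0] * 6
best, scores = select_best({"majority": fit_majority, "1-NN (Tanimoto)": fit_1nn}, X, y)
print(best)  # 1-NN (Tanimoto)
```

Fixing the shuffle seed keeps the folds identical across candidates, so score differences reflect the algorithms rather than the splits.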

Chemical representation:
MACCS fingerprints

Other notes:
Data balancing was performed because the two classes were imbalanced. Bayesian optimization was used to tune the model hyperparameters. The model's applicability domain was determined from chemical similarity, calculated as the Tanimoto index between fingerprints.
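A Tanimoto-based applicability-domain check can be sketched as follows: compute the similarity between the query's fingerprint and every training fingerprint, and treat the query as inside the domain when the highest similarity reaches a cutoff. Fingerprints are represented here as sets of on-bit indices (for MACCS, the positions of the keys that are present). The `threshold=0.5` default is a hypothetical value, since the cutoff used by the predictor is not stated.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto index |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(a & b) / union

def applicability_domain(query, training_fps, threshold=0.5):
    """Return (inside_domain, max_similarity); threshold=0.5 is hypothetical."""
    best = max((tanimoto(query, fp) for fp in training_fps), default=0.0)
    return best >= threshold, best

training = [frozenset({1, 5, 9, 12}), frozenset({2, 3, 4})]
print(applicability_domain(frozenset({1, 5, 9, 20}), training))  # (True, 0.6)
print(applicability_domain(frozenset({2, 7}), training))         # (False, 0.25)
```

Using the maximum similarity to any single training compound is one common convention; averaging over the k most similar compounds is a possible alternative.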