Current predictor: Aerobic biodegradation

Last update: June 26, 2022

About


Dataset:
The regression model takes the following inputs: SMILES strings (converted to fingerprints), time (days), guideline (e.g., OECD 301F), principle (e.g., closed respirometer), endpoint (e.g., ready or inherent), and reliability (e.g., 1 or 2). The output is the biodegradation percentage.
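
As an illustration of how these inputs could be combined into a single numeric row for a regressor, here is a minimal Python sketch. It assumes one-hot encoding for the categorical inputs, which is not stated above. The category lists, the function names (one_hot, build_feature_row), and the column order are hypothetical, and the fingerprint is taken as an already-computed list of 0/1 bits.

```python
from typing import List, Sequence

# Illustrative vocabularies only; the real dataset may contain other values.
GUIDELINES = ["OECD 301C", "OECD 301F"]
PRINCIPLES = ["closed respirometer", "CO2 evolution"]
ENDPOINTS = ["ready", "inherent"]


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Return a one-hot vector for `value` over the ordered `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def build_feature_row(
    fingerprint_bits: Sequence[int],
    time_days: float,
    guideline: str,
    principle: str,
    endpoint: str,
    reliability: int,
) -> List[float]:
    """Concatenate fingerprint bits, test time, and encoded test metadata.

    Assumed column order (hypothetical): fingerprint, time, guideline,
    principle, endpoint, reliability.
    """
    row: List[float] = list(fingerprint_bits)
    row.append(float(time_days))
    row.extend(one_hot(guideline, GUIDELINES))
    row.extend(one_hot(principle, PRINCIPLES))
    row.extend(one_hot(endpoint, ENDPOINTS))
    row.append(float(reliability))  # reliability kept as a plain number here
    return row
```

The target paired with each such row would be the measured biodegradation percentage at that time point.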

ML algorithms:
Twelve ML algorithms were compared, including Ridge, Lasso, k-nearest neighbors, support vector regression, decision tree, random forest, extra trees, bagging, adaptive boosting (AdaBoost), gradient boosting, and XGBoost.

XGBoost gave the best performance among them.

Chemical representation:
MACCS fingerprints

Other notes:
Model hyperparameters were tuned with Bayesian optimization. The model's applicability domain was determined from chemical similarity, calculated as the Tanimoto index between fingerprints.
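
The Tanimoto index between two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The Python sketch below shows that calculation and one possible applicability-domain check. The rule used here (highest similarity to any training compound compared against a cutoff), the 0.5 threshold, and the function names are assumptions made for illustration, not the tool's documented procedure. Fingerprints are represented as sets of on-bit positions.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto index: |a AND b| / |a OR b| for sets of on-bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here when both fingerprints are empty
    return len(a & b) / union


def max_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto index between the query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Set[int], training: Iterable[Set[int]],
              threshold: float = 0.5) -> bool:
    """Hypothetical rule: in domain if the nearest training compound
    is at least `threshold` similar to the query."""
    return max_similarity(query, training) >= threshold
```

For example, fingerprints with on-bits {1, 2, 3} and {2, 3, 4} share two bits out of four distinct bits, giving a Tanimoto index of 0.5.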