Accelerate Environmental Remediation Through AI


Existing machine learning models built in our lab, covering biodegradation, adsorption, and oxidation processes for organic/inorganic contaminant removal

Aerobic biodegradation

Aerobic biodegradation in water


Adsorption in water


Oxidation processes

To be updated...


Datasets used to develop the above models

Aerobic biodegradation classification

Contains 6,170 data points, with SMILES strings as the inputs and the class (0 or 1) as the output. Only ready biodegradation data with a test time of 28 days and a test principle of closed bottle test, closed respirometer, or CO2 evolution were considered.
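The selection rule above can be sketched with pandas. This is only an illustration: the column names (`smiles`, `endpoint`, `time_day`, `principle`, `class`) and the example rows are hypothetical, and the actual dataset files may be organized differently.

```python
import pandas as pd

# Hypothetical rows; column names and values are illustrative assumptions.
records = pd.DataFrame(
    {
        "smiles": ["CCO", "c1ccccc1", "CC(=O)O", "CCCCCC"],
        "endpoint": ["ready", "ready", "inherent", "ready"],
        "time_day": [28, 14, 28, 28],
        "principle": [
            "closed bottle test",
            "closed respirometer",
            "CO2 evolution",
            "DOC die away",
        ],
        "class": [1, 0, 1, 0],
    }
)

# Test principles named in the dataset description.
ALLOWED_PRINCIPLES = {"closed bottle test", "closed respirometer", "CO2 evolution"}


def select_ready_28_day(df: pd.DataFrame) -> pd.DataFrame:
    """Keep ready-biodegradation rows at 28 days with an allowed principle."""
    mask = (
        (df["endpoint"] == "ready")
        & (df["time_day"] == 28)
        & df["principle"].isin(ALLOWED_PRINCIPLES)
    )
    # The classification task only needs the SMILES input and the class label.
    return df.loc[mask, ["smiles", "class"]].reset_index(drop=True)


selected = select_ready_28_day(records)
print(selected)
```

In this toy table only the first row passes all three conditions; the others fail on time, endpoint, or principle respectively.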

Aerobic biodegradation regression

Contains 12,750 data points, with SMILES strings, time (days), guideline (e.g., OECD 301F), principle (e.g., closed respirometer), endpoint (e.g., ready or inherent), and reliability (e.g., 1 or 2) as the inputs and the biodegradation percentage as the output.
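As a rough sketch of how the mixed numeric and categorical inputs listed above might be prepared for a regression model, the example below one-hot encodes the categorical fields with pandas. The column names and the example rows (including the percentage values) are hypothetical, and the encoding choice is an assumption rather than the method used for the lab's models. The SMILES strings are left out here because they need a separate molecular encoding.

```python
import pandas as pd

# Hypothetical rows; column names and values are illustrative assumptions.
raw = pd.DataFrame(
    {
        "smiles": ["CCO", "c1ccccc1O", "CC(=O)O"],
        "time_day": [28, 14, 7],
        "guideline": ["OECD 301F", "OECD 301D", "OECD 301F"],
        "principle": [
            "closed respirometer",
            "closed bottle test",
            "closed respirometer",
        ],
        "endpoint": ["ready", "ready", "inherent"],
        "reliability": [1, 2, 1],
        "biodegradation_pct": [85.0, 40.0, 12.5],
    }
)

CATEGORICAL = ["guideline", "principle", "endpoint"]


def build_feature_table(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode the categorical inputs; keep numeric inputs unchanged."""
    inputs = df.drop(columns=["smiles", "biodegradation_pct"])
    return pd.get_dummies(inputs, columns=CATEGORICAL, dtype=int)


features = build_feature_table(raw)
target = raw["biodegradation_pct"]
print(features.columns.tolist())
```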


Useful frameworks/libraries used for the development of these models

Python

The most widely used programming language for machine learning.

Jupyter Notebook

One of the most widely used web applications for machine learning, which allows users to create and share documents that contain live code, equations, visualizations, and narrative text.

scikit-learn

One of the most useful tools providing dozens of ML models for classification, regression, clustering, and so on. It is a simple and efficient tool for predictive data analysis.
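A minimal classification sketch with scikit-learn is shown below. The data are synthetic binary features standing in for molecular fingerprints, and the random forest is an arbitrary choice for illustration; neither reflects the lab's actual datasets or models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples with 64 binary features each.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64))
# Toy labeling rule (an assumption): class 1 if at least 4 of the first 8 bits are set.
y = (X[:, :8].sum(axis=1) >= 4).astype(int)

# Hold out 25% of the samples for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```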

pandas

One of the most popular tools for working with Excel files, CSV files, and DataFrames. It is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.

RDKit

One of the most widely used tools for working with organic chemicals. It allows users to draw chemicals, calculate molecular fingerprints, perform similarity calculations, and more.
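The fingerprint and similarity calculations mentioned above might look like the following sketch, which assumes the RDKit Python package is the tool in use. The Morgan fingerprint settings (radius 2, 2048 bits) and the two example molecules are illustrative choices, not values taken from the lab's models.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan (circular) fingerprint generator; radius and size are assumed values.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def fingerprint(smiles: str):
    """Parse a SMILES string and return its Morgan fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return generator.GetFingerprint(mol)


ethanol = fingerprint("CCO")
propanol = fingerprint("CCCO")

# Tanimoto similarity ranges from 0 (no shared bits) to 1 (identical bit sets).
similarity = DataStructs.TanimotoSimilarity(ethanol, propanol)
print(f"Tanimoto similarity: {similarity:.2f}")
```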

Matplotlib

One of the most widely used libraries for creating static, animated, and interactive visualizations in Python.

TensorFlow

An end-to-end open source platform for machine learning, widely used for developing deep neural network models.

PyTorch

An open source machine learning framework that accelerates the path from research prototyping to production deployment.