Accelerate Environmental Remediation Through AI

Models

Existing machine learning models built in our lab, covering biodegradation, adsorption, and oxidation processes for removing organic and inorganic contaminants

Aerobic biodegradation

Aerobic biodegradation in water

Adsorption

Adsorption in water

Oxidation

Oxidation processes

To be updated...

Datasets

The datasets used to develop the models above

Aerobic biodegradation classification

Contains 6,170 data points, with SMILES strings as the input and the class (0 or 1) as the output. Only ready biodegradation data with a test duration of 28 days and a test principle of closed bottle test, closed respirometer, or CO2 evolution were considered.
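The selection criteria for the classification set can be sketched as a pandas filter. This is a minimal illustration only: the column names (`smiles`, `endpoint`, `time_day`, `principle`, `label`), the category strings, and the toy rows and labels are hypothetical placeholders, not the lab's actual schema or data, and the duration column is assumed to be in days.

```python
import pandas as pd

# Hypothetical raw records; column names, values, and labels are placeholders.
raw = pd.DataFrame(
    {
        "smiles": ["CCO", "ClC(Cl)(Cl)Cl", "CC(=O)O", "CCCCCCCCCCCCCCCC", "c1ccccc1"],
        "endpoint": ["ready", "ready", "inherent", "ready", "ready"],
        "time_day": [28, 28, 28, 14, 28],
        "principle": [
            "closed bottle test",
            "CO2 evolution",
            "closed respirometer",
            "closed respirometer",
            "DOC die-away",
        ],
        "label": [1, 0, 1, 1, 1],
    }
)

# Test principles retained for the classification set.
ALLOWED_PRINCIPLES = {"closed bottle test", "closed respirometer", "CO2 evolution"}

# Keep only ready-biodegradation records at 28 days with an allowed principle.
mask = (
    (raw["endpoint"] == "ready")
    & (raw["time_day"] == 28)
    & (raw["principle"].isin(ALLOWED_PRINCIPLES))
)
clf_data = raw.loc[mask, ["smiles", "label"]].reset_index(drop=True)
print(clf_data)
```

Only the first two toy rows survive: the others fail the endpoint, duration, or principle condition respectively.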

Aerobic biodegradation regression

Contains 12,750 data points, with SMILES strings, time (days), guideline (e.g., OECD 301F), principle (e.g., closed respirometer), endpoint (e.g., ready or inherent), and reliability (e.g., 1 or 2) as the inputs and the biodegradation percentage as the output.
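Because the regression inputs mix a molecular structure with categorical test conditions, the conditions need a numeric encoding before model training. The sketch below one-hot encodes guideline, principle, and endpoint with `pandas.get_dummies` and keeps time and reliability as numeric columns. The column names and rows are hypothetical, and one-hot encoding is an assumed choice, not necessarily the encoding used for the lab's model.

```python
import pandas as pd

# Hypothetical records mirroring the regression inputs; values are illustrative only.
reg = pd.DataFrame(
    {
        "smiles": ["CCO", "CCO", "c1ccccc1"],
        "time_day": [7, 28, 28],
        "guideline": ["OECD 301F", "OECD 301F", "OECD 302B"],
        "principle": ["closed respirometer", "closed respirometer", "DOC die-away"],
        "endpoint": ["ready", "ready", "inherent"],
        "reliability": [1, 1, 2],
        "biodeg_pct": [45.0, 80.0, 60.0],
    }
)

categorical = ["guideline", "principle", "endpoint"]
numeric = ["time_day", "reliability"]

# One-hot encode the categorical test conditions; numeric columns pass through unchanged.
X_conditions = pd.get_dummies(reg[categorical + numeric], columns=categorical, dtype=float)
y = reg["biodeg_pct"].to_numpy()

# The SMILES column still needs a molecular featurization (e.g. fingerprints),
# whose columns would be concatenated with X_conditions before training.
print(X_conditions.columns.tolist())
```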

Tools

Frameworks and libraries used to develop these models

Python3

The most widely used programming language for machine learning.

Jupyter Notebook

One of the most widely used web applications for machine learning. It allows users to create and share documents that contain live code, equations, visualizations, and narrative text.

Scikit-learn

A widely used library providing dozens of ML models for classification, regression, clustering, and more, along with simple and efficient tools for predictive data analysis.
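A minimal sketch of the scikit-learn fit/evaluate/predict workflow for a binary biodegradability-style classifier. The random forest is one example estimator chosen here for illustration, not necessarily the model the lab uses, and the features and labels are synthetic stand-ins for molecular fingerprints.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for fingerprint features: 200 molecules x 64 binary bits.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64))
# Synthetic labels driven by two bits, so there is a learnable signal.
y = (X[:, 0] | X[:, 1]).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validated ROC AUC as an estimate of generalization performance.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"5-fold ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all data, then predict the probability of class 1 for new feature vectors.
model.fit(X, y)
proba = model.predict_proba(X[:3])[:, 1]
```

Any other scikit-learn estimator with the same `fit`/`predict_proba` interface can be swapped in without changing the rest of the workflow.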

Pandas

One of the most popular tools for working with tabular data, such as Excel or CSV files, through its DataFrame structure. It is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.
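A minimal sketch of loading a dataset with pandas and applying two basic cleaning steps. The inline CSV text and its column names are hypothetical; in practice `pd.read_csv` would receive a file path, and `pd.read_excel` works similarly for `.xlsx` files (it requires an engine such as openpyxl).

```python
import io

import pandas as pd

# Inline CSV standing in for a dataset file (contents hypothetical).
csv_text = """smiles,label
CCO,1
CCO,1
ClC(Cl)(Cl)Cl,0
,1
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.dropna(subset=["smiles"])           # drop rows without a structure
df = df.drop_duplicates(subset=["smiles"])  # keep the first record per SMILES
print(df["label"].value_counts())
```

Keeping the first record per SMILES is a simplification; duplicates with conflicting labels would need an explicit resolution policy.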

RDKit

One of the most widely used toolkits for cheminformatics. It allows users to draw molecules, calculate molecular fingerprints, perform similarity calculations, and more.
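The fingerprint and similarity features mentioned above can be sketched as follows: SMILES strings are parsed into molecules, converted to Morgan (circular) bit-vector fingerprints, and compared with Tanimoto similarity. The radius of 2 and the 2048-bit length are common defaults assumed here, not settings confirmed for the lab's models.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan fingerprint generator: radius 2, 2048 bits (assumed, commonly used settings).
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def fingerprint(smiles: str):
    """Parse a SMILES string and return its Morgan bit-vector fingerprint."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    return morgan_gen.GetFingerprint(mol)


fp_ethanol = fingerprint("CCO")
fp_propanol = fingerprint("CCCO")
fp_benzene = fingerprint("c1ccccc1")

# Tanimoto similarity: shared on-bits divided by the union of on-bits.
sim_close = DataStructs.TanimotoSimilarity(fp_ethanol, fp_propanol)
sim_far = DataStructs.TanimotoSimilarity(fp_ethanol, fp_benzene)
print(f"ethanol vs 1-propanol: {sim_close:.2f}, ethanol vs benzene: {sim_far:.2f}")
```

Bit vectors like these can be converted to NumPy arrays and used directly as model inputs.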

Matplotlib

One of the most widely used libraries for creating static, animated, and interactive visualizations in Python.
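A common way to inspect a regression model such as the biodegradation-percentage model is a parity plot of predicted versus observed values. The sketch below draws one with hypothetical values; it selects the non-interactive `Agg` backend so it also runs as a script without a display (inside Jupyter Notebook that line can be omitted).

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical observed vs. predicted biodegradation percentages (illustrative values).
observed = np.array([10.0, 35.0, 60.0, 72.0, 90.0])
predicted = np.array([15.0, 30.0, 55.0, 80.0, 85.0])

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(observed, predicted)
ax.plot([0, 100], [0, 100], linestyle="--", color="gray")  # y = x reference line
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.set_xlabel("Observed biodegradation (%)")
ax.set_ylabel("Predicted biodegradation (%)")
ax.set_title("Parity plot")
fig.tight_layout()

# Save to an in-memory PNG; fig.savefig("parity.png") would write a file instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Points close to the dashed line indicate predictions that agree with the observed values.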

TensorFlow

An end-to-end open source platform for machine learning, widely used for developing deep neural network models.

PyTorch

An open source machine learning framework that accelerates the path from research prototyping to production deployment.
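A minimal PyTorch sketch of a feedforward neural network classifier that takes 2048-bit molecular fingerprints as input, showing one training step and inference. The architecture, layer sizes, dropout rate, and learning rate are illustrative assumptions, and the batch is random synthetic data; this is not the lab's model.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical feedforward classifier for 2048-bit fingerprints.
model = nn.Sequential(
    nn.Linear(2048, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 1),  # one logit per molecule
)
loss_fn = nn.BCEWithLogitsLoss()  # combines sigmoid and binary cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic batch: 32 random binary fingerprints with random 0/1 labels.
x = torch.randint(0, 2, (32, 2048)).float()
y = torch.randint(0, 2, (32, 1)).float()

# One training step.
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# Inference: convert logits to probabilities of class 1.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(x)).squeeze(1)
```

In real training, this step would be repeated over mini-batches from a `DataLoader` for several epochs, with a held-out set for validation.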