Sample datasets
ChemPlot ships with several sample datasets so that you can start exploring the library's features right away. Load one with the load_data() function:
from chemplot import load_data
df = load_data("BBBP")
In this case we are loading the BBBP dataset, used in the previous sections of this
manual. load_data() returns a pandas DataFrame built from the sample dataset
named by its argument.
ChemPlot contains the following sample datasets:
| ID | Name | Type | Size |
|---|---|---|---|
| C_1478_CLINTOX_2 | | Classification | 1478 |
| C_1513_BACE_2 | BACE (Inhibitor) [5] | Classification | 1513 |
| C_2039_BBBP_2 | BBBP (Blood-brain barrier penetration) [6] | Classification | 2039 |
| C_41127_HIV_3 | HIV [7] | Classification | 41127 |
| R_642_SAMPL | SAMPL (Hydration free energy) [8] | Regression | 642 |
| R_1513_BACE | BACE (Binding affinity) [5] | Regression | 1513 |
| R_4200_LOGP | LOGP (Lipophilicity) [9] | Regression | 4200 |
| R_1291_LOGS | LOGS (Aqueous Solubility) [10] | Regression | 1291 |
| R_9982_AQSOLDB | AQSOLDB (Aqueous Solubility) [11] | Regression | 9982 |
The dataset IDs are constructed in the following way:
Name Formatting: type_size_name_num_of_classes.csv
type: R -> Numerical (regression) and C -> Categorical (classification)
size: Number of instances in the dataset
name: Name of dataset
num_of_classes: Number of classes (Categorical only)
You can retrieve the datasets by passing their ID to load_data().
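The ID format described above can also be unpacked programmatically. The following is a minimal sketch and not part of ChemPlot: `DatasetID` and `parse_dataset_id` are hypothetical names introduced here for illustration. It assumes that every ID follows the type_size_name_num_of_classes pattern, with the last field present only for categorical datasets.

```python
from typing import NamedTuple, Optional


class DatasetID(NamedTuple):
    """Components of a sample dataset ID (hypothetical helper type)."""
    task: str                    # "Regression" or "Classification"
    size: int                    # number of instances in the dataset
    name: str                    # dataset name, e.g. "BBBP"
    num_classes: Optional[int]   # None for regression datasets


def parse_dataset_id(dataset_id: str) -> DatasetID:
    """Split an ID such as 'C_2039_BBBP_2' into its components."""
    # Accept the ID with or without the ".csv" suffix.
    stem = dataset_id[:-4] if dataset_id.lower().endswith(".csv") else dataset_id
    parts = stem.split("_")

    if parts[0] == "C":
        # Categorical: type, size, name, num_of_classes
        if len(parts) < 4:
            raise ValueError(f"Malformed categorical ID: {dataset_id!r}")
        return DatasetID("Classification", int(parts[1]),
                         "_".join(parts[2:-1]), int(parts[-1]))

    if parts[0] == "R":
        # Numerical: type, size, name
        if len(parts) < 3:
            raise ValueError(f"Malformed numerical ID: {dataset_id!r}")
        return DatasetID("Regression", int(parts[1]),
                         "_".join(parts[2:]), None)

    raise ValueError(f"Unknown dataset type in ID: {dataset_id!r}")


print(parse_dataset_id("C_2039_BBBP_2"))
# DatasetID(task='Classification', size=2039, name='BBBP', num_classes=2)
```

A regression ID such as "R_642_SAMPL" yields `num_classes=None`, since the class count is only encoded for categorical datasets.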
Note
The first 8 datasets in the table are edited versions of datasets from the MoleculeNet repository [12].
You can print the available sample datasets to the console using the following function:
from chemplot import info_data
df = info_data()
References: