Sample datasets

ChemPlot provides several sample datasets so you can start exploring the library's features right away. These datasets can be loaded with the load_data() function:

from chemplot import load_data

df = load_data("BBBP")

In this case we are loading the BBBP dataset, used in the previous sections of this manual. load_data() returns a pandas DataFrame built from the sample dataset named by its argument. ChemPlot contains the following sample datasets:

ID	Name	Type	Size
C_1478_CLINTOX_2	Clintox (Toxicity) [1] [2] [3] [4]	Classification	1478
C_1513_BACE_2	BACE (Inhibitor) [5]	Classification	1513
C_2039_BBBP_2	BBBP (Blood-brain barrier penetration) [6]	Classification	2039
C_41127_HIV_3	HIV [7]	Classification	41127
R_642_SAMPL	SAMPL (Hydration free energy) [8]	Regression	642
R_1513_BACE	BACE (Binding affinity) [5]	Regression	1513
R_4200_LOGP	LOGP (Lipophilicity) [9]	Regression	4200
R_1291_LOGS	LOGS (Aqueous Solubility) [10]	Regression	1291
R_9982_AQSOLDB	AQSOLDB (Aqueous Solubility) [11]	Regression	9982

The dataset IDs are constructed in the following way:

Name Formatting: type_size_name_num_of_classes.csv

type: R -> Regression (numerical target), C -> Classification (categorical target)
size: Number of instances in the dataset
name: Name of dataset
num_of_classes: Number of classes (Categorical only)
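
The naming scheme above can be unpacked programmatically. The following is a minimal sketch, not part of ChemPlot: the DatasetID tuple and the parse_dataset_id() helper are hypothetical names introduced here for illustration, and the sketch assumes that the size and class-count fields are always integers.

```python
from typing import NamedTuple, Optional


class DatasetID(NamedTuple):
    """Fields encoded in a sample dataset ID (hypothetical helper type)."""
    task: str                   # "Classification" or "Regression"
    size: int                   # number of instances
    name: str                   # dataset name
    num_classes: Optional[int]  # None for regression datasets


def parse_dataset_id(dataset_id: str) -> DatasetID:
    """Split an ID of the form type_size_name[_num_of_classes][.csv]."""
    # Accept the file-name form as well as the bare ID.
    if dataset_id.endswith(".csv"):
        dataset_id = dataset_id[:-4]
    parts = dataset_id.split("_")
    kind = parts[0]
    if kind == "C":
        if len(parts) < 4:
            raise ValueError(f"classification ID needs 4 fields: {dataset_id!r}")
        # The last field is the class count; everything between is the name.
        return DatasetID("Classification", int(parts[1]),
                         "_".join(parts[2:-1]), int(parts[-1]))
    if kind == "R":
        if len(parts) < 3:
            raise ValueError(f"regression ID needs 3 fields: {dataset_id!r}")
        return DatasetID("Regression", int(parts[1]), "_".join(parts[2:]), None)
    raise ValueError(f"unknown dataset type {kind!r} in {dataset_id!r}")


print(parse_dataset_id("C_2039_BBBP_2"))
print(parse_dataset_id("R_642_SAMPL"))
```

A helper like this can be handy for filtering the table by task type or size before deciding which dataset to load.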

You can retrieve the datasets by passing their ID to load_data().

Note

The first 8 datasets in the table are edited versions of datasets from the MoleculeNet repository [12].

To print the list of available sample datasets to the console, use the info_data() function:

from chemplot import info_data

df = info_data()


References: