API documentation
ChemPlot's principal class is Plotter. It receives a list of molecules
as a parameter and provides the functions used to plot the data
in two dimensions. All the main functions of ChemPlot are part of Plotter.
There are, however, two more functions outside of Plotter, which can be
used to access the sample datasets.
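The typical workflow implied by the reference below (construct a Plotter, reduce the dimensions, then plot) can be sketched as follows. This is a minimal sketch, not an official example: the helper name `plot_chemical_space` is hypothetical, and it assumes that a Plotter built without a target can be embedded and plotted with the default arguments.

```python
def plot_chemical_space(smiles, filename=None):
    """Embed a list of SMILES with PCA and draw a static scatter plot."""
    # Imported lazily so the helper can be defined without ChemPlot installed.
    from chemplot import Plotter

    # No target is passed, so structural similarity is requested explicitly.
    plotter = Plotter.from_smiles(smiles, sim_type="structural")

    # A reduction method has to run before anything can be plotted.
    plotter.pca()

    # Returns the matplotlib axes; the figure is also saved if filename is set.
    return plotter.visualize_plot(filename=filename)
```

A call such as `plot_chemical_space(smiles_list, filename="pca.png")` would then produce the plot, where `smiles_list` is a list of SMILES strings large enough for the chosen reduction method.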
chemplot.Plotter
- class chemplot.Plotter(encoding_list, target, target_type, sim_type, get_desc, get_fingerprints)[source]
A class used to plot the ECFP fingerprints of the molecules used to instantiate it.
- Parameters:
__sim_type (string) – similarity type, structural or tailored
__target_type (string) – target type, R (regression) or C (classification)
__target (list) – list containing the target values. Is empty if a target does not exist
__mols (rdkit.Chem.rdchem.Mol) – list of valid molecules that can be plotted
__df_descriptors (Dataframe) – dataframe containing the descriptors representation of each molecule
__df_2_components (Dataframe) – dataframe containing the two-dimensional representation of each molecule
__plot_title (string) – title of the plot reflecting the dimensionality reduction algorithm used
__data (list) – list of the scaled descriptors to which the dimensionality reduction algorithm is applied
pca_fit (sklearn.decomposition.PCA) – PCA object created when the corresponding algorithm is applied to the data
tsne_fit (sklearn.manifold.TSNE) – t-SNE object created when the corresponding algorithm is applied to the data
umap_fit (umap.umap_.UMAP) – UMAP object created when the corresponding algorithm is applied to the data
df_plot_xy (Dataframe) – dataframe containing the coordinates that have been plotted
- classmethod from_smiles(smiles_list, target=[], target_type=None, sim_type=None)[source]
Class method to construct a Plotter object from a list of SMILES.
- Parameters:
smiles_list (list) – List of the SMILES representation of the molecules to plot.
target (list) – target values
target_type (string) – target type, R (regression) or C (classification)
sim_type (string) – similarity type structural or tailored
- Returns:
A Plotter object for the molecules given as input.
- Return type:
- classmethod from_inchi(inchi_list, target=[], target_type=None, sim_type=None)[source]
Class method to construct a Plotter object from a list of InChI strings.
- Parameters:
inchi_list (list) – List of the InChI representation of the molecules to plot.
target (list) – target values
target_type (string) – target type, R (regression) or C (classification)
sim_type (string) – similarity type structural or tailored
- Returns:
A Plotter object for the molecules given as input.
- Return type:
- pca(**kwargs)[source]
Calculates the first 2 PCA components of the molecular descriptors.
- Parameters:
kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.decomposition.PCA
- Returns:
The dataframe containing the PCA components.
- Return type:
Dataframe
- tsne(perplexity=None, pca=False, random_state=None, **kwargs)[source]
Calculates the first 2 t-SNE components of the molecular descriptors.
- Parameters:
perplexity (int) – perplexity value for the t-SNE model
pca (boolean) – indicates if the features must be preprocessed by PCA
random_state (int) – random seed that can be passed as a parameter for reproducing the same results
kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.manifold.TSNE
- Returns:
The dataframe containing the t-SNE components.
- Return type:
Dataframe
- umap(n_neighbors=None, min_dist=None, pca=False, random_state=None, **kwargs)[source]
Calculates the first 2 UMAP components of the molecular descriptors.
- Parameters:
n_neighbors (int) – Number of neighbours used in the UMAP model.
min_dist (float) – Value between 0.0 and 0.99, indicates how close to each other the points can be displayed.
pca (boolean) – indicates if the features must be preprocessed by PCA
random_state (int) – random seed that can be passed as a parameter for reproducing the same results
kwargs (key, value mappings) – Other keyword arguments are passed down to umap.UMAP
- Returns:
The dataframe containing the UMAP components.
- Return type:
Dataframe
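The three reduction methods above share a similar calling pattern, which can be illustrated with a small dispatcher. This is only a sketch: the helper `embed` is hypothetical and not part of ChemPlot, and it relies solely on the signatures listed above (note that `pca()` does not list a `random_state` argument).

```python
def embed(plotter, method="tsne", random_state=0, **kwargs):
    """Run one Plotter reduction method and return its 2D dataframe."""
    if method == "pca":
        # pca() only forwards keyword arguments to sklearn.decomposition.PCA.
        return plotter.pca(**kwargs)
    if method == "tsne":
        # A fixed random_state makes the t-SNE embedding reproducible.
        return plotter.tsne(random_state=random_state, **kwargs)
    if method == "umap":
        # Same idea for UMAP; n_neighbors and min_dist can go in kwargs.
        return plotter.umap(random_state=random_state, **kwargs)
    raise ValueError(f"unknown reduction method: {method!r}")
```

For instance, `embed(plotter, "umap", n_neighbors=15, min_dist=0.5)` would forward both values to `Plotter.umap`; the specific numbers here are arbitrary illustrations, not recommended settings.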
- cluster(n_clusters=5, **kwargs)[source]
Computes the clusters present in the embedded chemical space.
- Parameters:
n_clusters (int) – Number of clusters that will be computed
kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.cluster.KMeans
- Returns:
The dataframe containing the 2D embedding.
- Return type:
Dataframe
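Since clustering operates on the embedded space, it is presumably meant to follow one of the reduction methods. The sketch below shows that ordering; the helper `plot_clusters` is hypothetical, and the requirement that a reduction method has already been called is an assumption drawn from the description above rather than a documented guarantee.

```python
def plot_clusters(plotter, n_clusters=5, filename=None):
    """Cluster an already-embedded Plotter and draw the clusters."""
    # Assumption: pca(), tsne() or umap() was called on `plotter` beforehand,
    # so that an embedded chemical space exists to be clustered.
    plotter.cluster(n_clusters=n_clusters)

    # clusters=True shows the clusters instead of the target values.
    return plotter.visualize_plot(clusters=True, filename=filename)
```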
- visualize_plot(size=20, kind='scatter', remove_outliers=False, is_colored=True, colorbar=False, clusters=False, filename=None, title=None)[source]
Generates a plot for the given molecules embedded in two dimensions.
- Parameters:
size (int) – Size of the plot
kind (string) – Type of plot
remove_outliers (boolean) – Boolean value indicating if the outliers must be identified and removed
is_colored (boolean) – Indicates if the points must be colored according to target
colorbar (boolean) – Indicates if the plot legend must be represented as a colorbar. Only considered when the target_type is “R”.
clusters (boolean or list or int) – If True the clusters are shown instead of possible targets. Pass a list or an int to only show selected clusters (indexed by int).
filename (string) – Indicates the file where to save the plot
title (string) – Title of the plot.
- Returns:
The matplotlib axes containing the plot.
- Return type:
Axes
- interactive_plot(size=700, kind='scatter', remove_outliers=False, is_colored=True, clusters=False, filename=None, show_plot=False, title=None)[source]
Generates an interactive Bokeh plot for the given molecules embedded in two dimensions.
- Parameters:
size (int) – Size of the plot
kind (string) – Type of plot
remove_outliers (boolean) – Boolean value indicating if the outliers must be identified and removed
is_colored (boolean) – Indicates if the points must be colored according to target
clusters (boolean) – Indicates whether to add a tab with the clusters, if these have been computed
filename (string) – Indicates the file where to save the Bokeh plot
show_plot (boolean) – Immediately display the current plot.
title (string) – Title of the plot.
- Returns:
The bokeh figure containing the plot.
- Return type:
Figure
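The two plotting methods take largely the same arguments, so one embedding can be exported both ways. The following is a sketch under assumptions: the helper `save_plots` is hypothetical, and the `.png` and `.html` extensions are guesses at suitable output formats for the matplotlib and Bokeh plots respectively.

```python
def save_plots(plotter, stem, title=None):
    """Save a static and an interactive plot of the same embedding."""
    # Static matplotlib plot; the extension is an assumed, typical choice.
    axes = plotter.visualize_plot(
        kind="scatter", filename=f"{stem}.png", title=title
    )

    # Interactive Bokeh plot; show_plot=False saves without displaying it.
    figure = plotter.interactive_plot(
        kind="scatter", filename=f"{stem}.html", title=title, show_plot=False
    )
    return axes, figure
```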