API documentation
ChemPlot's principal class is Plotter. It receives a list of molecules
as a parameter and provides functions for plotting the data in two
dimensions. All the main functions of ChemPlot are methods of Plotter.
There are, however, two more functions outside of Plotter, which can be
used to access the sample datasets.
chemplot.Plotter
- class chemplot.Plotter(encoding_list, target, target_type, sim_type, get_desc, get_fingerprints)
A class used to plot the ECFP fingerprints of the molecules used to instantiate it.
- Parameters
__sim_type (string) – similarity type structural or tailored
__target_type (string) – target type R (regression) or C (classification)
__target (list) – list containing the target values. Is empty if a target does not exist
__mols (rdkit.Chem.rdchem.Mol) – list of valid molecules that can be plotted
__df_descriptors (Dataframe) – dataframe containing the descriptors representation of each molecule
__df_2_components (Dataframe) – dataframe containing the two-dimensional representation of each molecule
__plot_title (string) – title of the plot reflecting the dimensionality reduction algorithm used
__data (list) – list of the scaled descriptors to which the dimensionality reduction algorithm is applied
pca_fit (sklearn.decomposition.PCA) – PCA object created when the corresponding algorithm is applied to the data
tsne_fit (sklearn.manifold.TSNE) – t-SNE object created when the corresponding algorithm is applied to the data
umap_fit (umap.umap_.UMAP) – UMAP object created when the corresponding algorithm is applied to the data
df_plot_xy (Dataframe) – dataframe containing the coordinates that have been plotted
- classmethod from_smiles(smiles_list, target=[], target_type=None, sim_type=None)
Class method to construct a Plotter object from a list of SMILES.
- Parameters
smiles_list (list) – List of the SMILES representation of the molecules to plot.
target (list) – target values
target_type (string) – target type R (regression) or C (classification)
sim_type (string) – similarity type structural or tailored
- Returns
A Plotter object for the molecules given as input.
- Return type
Plotter
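As a minimal sketch of the constructor described above, the helper below checks that the target list matches the SMILES list and then calls Plotter.from_smiles with the keyword arguments from the signature. The helper name plotter_from_smiles, the three molecules, and the target values are illustrative assumptions and not part of ChemPlot.

```python
# Illustrative molecules (ethanol, benzene, acetic acid) with made-up targets.
SMILES = ["CCO", "c1ccccc1", "CC(=O)O"]
TARGET = [0.5, 1.2, 0.8]


def plotter_from_smiles(smiles_list, target=None, target_type=None, sim_type=None):
    """Hypothetical helper: validate the inputs, then build a Plotter."""
    target = [] if target is None else list(target)
    # An empty target is allowed; otherwise expect one value per molecule.
    if target and len(target) != len(smiles_list):
        raise ValueError("target must contain one value per SMILES string")

    # Imported lazily so the check above works without chemplot installed.
    from chemplot import Plotter

    return Plotter.from_smiles(
        list(smiles_list),
        target=target,
        target_type=target_type,
        sim_type=sim_type,
    )
```

For a regression target, a call could look like plotter_from_smiles(SMILES, TARGET, target_type="R", sim_type="structural").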
- classmethod from_inchi(inchi_list, target=[], target_type=None, sim_type=None)
Class method to construct a Plotter object from a list of InChI strings.
- Parameters
inchi_list (list) – List of the InChI representation of the molecules to plot.
target (list) – target values
target_type (string) – target type R (regression) or C (classification)
sim_type (string) – similarity type structural or tailored
- Returns
A Plotter object for the molecules given as input.
- Return type
Plotter
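The InChI constructor takes the same keyword arguments, so a sketch for a classification target might look as follows. The helper names, the class labels, and the prefix check are assumptions made for illustration; only Plotter.from_inchi and its parameters come from the reference above.

```python
# Standard InChI strings for three small molecules, with made-up class labels.
INCHI = [
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",      # ethanol
    "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",     # benzene
    "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)",  # acetic acid
]
LABELS = [0, 1, 0]


def malformed_inchi(inchi_list):
    """Return the entries that lack the 'InChI=' prefix."""
    return [s for s in inchi_list if not s.startswith("InChI=")]


def plotter_from_inchi(inchi_list, labels):
    """Hypothetical helper: reject obviously wrong input, then build a Plotter."""
    bad = malformed_inchi(inchi_list)
    if bad:
        raise ValueError(f"not InChI strings: {bad}")

    # Imported lazily so the prefix check works without chemplot installed.
    from chemplot import Plotter

    return Plotter.from_inchi(
        inchi_list, target=labels, target_type="C", sim_type="structural"
    )
```

The prefix check only catches SMILES passed by mistake; it does not verify that a string is a chemically valid InChI.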
- pca(**kwargs)
Calculates the first 2 PCA components of the molecular descriptors.
- Parameters
kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.decomposition.PCA
- Returns
The dataframe containing the PCA components.
- Return type
Dataframe
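Because extra keyword arguments are forwarded to sklearn.decomposition.PCA, solver options can be set directly in the call. The following is a minimal sketch, assuming plotter is an already constructed Plotter. The helper name is hypothetical, and the guard against n_components rests on the assumption that the method itself fixes the embedding at two components.

```python
def pca_embedding(plotter, **pca_kwargs):
    """Hypothetical helper: run plotter.pca() with extra PCA options."""
    if "n_components" in pca_kwargs:
        # Assumption: pca() always computes two components on its own.
        raise ValueError("pca() already computes the first two components")
    # Example option: svd_solver="full" is accepted by sklearn's PCA.
    return plotter.pca(**pca_kwargs)
```

A call such as pca_embedding(plotter, svd_solver="full") would then return the dataframe with the PCA components.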
- tsne(perplexity=None, pca=False, random_state=None, **kwargs)
Calculates the first 2 t-SNE components of the molecular descriptors.
- Parameters
perplexity (int) – perplexity value for the t-SNE model
pca (boolean) – indicates if the features must be preprocessed by PCA
random_state (int) – random seed that can be passed as a parameter for reproducing the same results
kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.manifold.TSNE
- Returns
The dataframe containing the t-SNE components.
- Return type
Dataframe
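Since t-SNE results depend strongly on the perplexity, it can be useful to compare several values under a fixed seed. The sketch below assumes plotter is an existing Plotter; the helper name tsne_sweep and the choice of pca=True are illustrative assumptions.

```python
def tsne_sweep(plotter, perplexities, seed=0):
    """Hypothetical helper: return {perplexity: dataframe} for each value."""
    embeddings = {}
    for perplexity in perplexities:
        # A fixed random_state makes each run reproducible.
        embeddings[perplexity] = plotter.tsne(
            perplexity=perplexity, pca=True, random_state=seed
        )
    return embeddings
```

This reference does not state whether successive calls return independent dataframes, so copying each result before the next call may be prudent.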
- umap(n_neighbors=None, min_dist=None, pca=False, random_state=None, **kwargs)
Calculates the first 2 UMAP components of the molecular descriptors.
- Parameters
n_neighbors (int) – Number of neighbours used in the UMAP model.
min_dist (float) – Value between 0.0 and 0.99, indicates how close to each other the points can be displayed.
pca (boolean) – indicates if the features must be preprocessed by PCA
random_state (int) – random seed that can be passed as a parameter for reproducing the same results
kwargs (key, value mappings) – Other keyword arguments are passed down to umap.UMAP
- Returns
The dataframe containing the UMAP components.
- Return type
Dataframe
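The documented range for min_dist can be checked before the call. This is a minimal sketch, assuming plotter is an existing Plotter; the helper name and the default values chosen here are assumptions, not ChemPlot defaults.

```python
def umap_embedding(plotter, n_neighbors=15, min_dist=0.1, seed=42):
    """Hypothetical helper: validate min_dist, then run plotter.umap()."""
    # The parameter description above gives 0.0 to 0.99 as the valid range.
    if not 0.0 <= min_dist <= 0.99:
        raise ValueError("min_dist must lie between 0.0 and 0.99")
    return plotter.umap(
        n_neighbors=n_neighbors, min_dist=min_dist, random_state=seed
    )
```

Smaller min_dist values let points sit closer together in the plot, while larger values spread them out.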
- cluster(n_clusters=5, **kwargs)
Computes the clusters present in the embedded chemical space.
- Parameters
n_clusters (int) – Number of clusters that will be computed
kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.cluster.KMeans
- Returns
The dataframe containing the 2D embedding.
- Return type
Dataframe
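The description refers to clusters in the embedded space, which suggests that one of the embedding methods is run first; that ordering is an assumption in the sketch below. The helper name is hypothetical, and n_init is a sklearn.cluster.KMeans option passed through the keyword arguments.

```python
def embed_and_cluster(plotter, n_clusters=5, seed=0):
    """Hypothetical helper: compute a t-SNE embedding, then cluster it."""
    # Assumption: clustering operates on the most recent 2D embedding.
    plotter.tsne(random_state=seed)
    # n_init is forwarded to sklearn.cluster.KMeans through **kwargs.
    return plotter.cluster(n_clusters=n_clusters, n_init=10)
```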
- visualize_plot(size=20, kind='scatter', remove_outliers=False, is_colored=True, colorbar=False, clusters=False, filename=None, title=None)
Generates a plot for the given molecules embedded in two dimensions.
- Parameters
size (int) – Size of the plot
kind (string) – Type of plot
remove_outliers (boolean) – Boolean value indicating if the outliers must be identified and removed
is_colored (boolean) – Indicates if the points must be colored according to target
colorbar (boolean) – Indicates if the plot legend must be represented as a colorbar. Only considered when the target_type is “R”.
clusters (boolean or list or int) – If True the clusters are shown instead of possible targets. Pass a list or an int to only show selected clusters (indexed by int).
filename (string) – File in which to save the plot
title (string) – Title of the plot.
- Returns
The matplotlib axes containing the plot.
- Return type
Axes
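The clusters argument accepts three kinds of value, which the sketch below maps onto a single optional parameter. It assumes plotter is an existing Plotter for which an embedding and cluster() have already been computed; the helper name plot_clusters is hypothetical.

```python
def plot_clusters(plotter, selected=None, filename=None):
    """Hypothetical helper: plot every cluster, or only the selected ones."""
    # clusters=True shows all clusters; a list or an int selects by index.
    # "is None" is used so that cluster index 0 is passed through unchanged.
    clusters = True if selected is None else selected
    return plotter.visualize_plot(
        kind="scatter", clusters=clusters, filename=filename
    )
```

For example, plot_clusters(plotter, selected=[0, 2]) would show only the clusters with indices 0 and 2, and the returned matplotlib axes can be customized further.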
- interactive_plot(size=700, kind='scatter', remove_outliers=False, is_colored=True, clusters=False, filename=None, show_plot=False, title=None)
Generates an interactive Bokeh plot for the given molecules embedded in two dimensions.
- Parameters
size (int) – Size of the plot
kind (string) – Type of plot
remove_outliers (boolean) – Boolean value indicating if the outliers must be identified and removed
is_colored (boolean) – Indicates if the points must be colored according to target
clusters – Indicates whether to add a tab with the clusters, if these have been computed
filename (string) – File in which to save the Bokeh plot
show_plot (boolean) – Immediately display the current plot.
title (string) – Title of the plot.
- Returns
The bokeh figure containing the plot.
- Return type
Figure
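A sketch of saving the interactive plot to a file without opening it immediately, assuming plotter is an existing Plotter with a computed embedding. The helper name, the title, and the .html file name are illustrative assumptions; this reference does not state which extension the method expects.

```python
def save_interactive_plot(plotter, filename="chemical_space.html", show=False):
    """Hypothetical helper: write the Bokeh plot to a file and return it."""
    return plotter.interactive_plot(
        size=700,
        kind="scatter",
        filename=filename,
        show_plot=show,  # set to True to display the plot right away
        title="Chemical space",
    )
```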