API documentation

ChemPlot's principal class is Plotter. It receives a list of molecules as a parameter and provides functions for plotting the data in two dimensions. All of ChemPlot's main functions are methods of Plotter. Two additional functions outside Plotter give access to the sample datasets.

chemplot.Plotter

class chemplot.Plotter(encoding_list, target, target_type, sim_type, get_desc, get_fingerprints)[source]

A class used to plot the ECFP fingerprints of the molecules used to instantiate it.

Parameters
  • __sim_type (string) – similarity type structural or tailored

  • __target_type (string) – target type R (regression) or C (classification)

  • __target (list) – list containing the target values. Is empty if a target does not exist

  • __mols (list of rdkit.Chem.rdchem.Mol) – list of valid molecules that can be plotted

  • __df_descriptors (Dataframe) – dataframe containing the descriptor representation of each molecule

  • __df_2_components (Dataframe) – dataframe containing the two-dimensional representation of each molecule

  • __plot_title (string) – title of the plot reflecting the dimensionality reduction algorithm used

  • __data (list) – list of the scaled descriptors to which the dimensionality reduction algorithm is applied

  • pca_fit (sklearn.decomposition.PCA) – PCA object created when the corresponding algorithm is applied to the data

  • tsne_fit (sklearn.manifold.TSNE) – t-SNE object created when the corresponding algorithm is applied to the data

  • umap_fit (umap.umap_.UMAP) – UMAP object created when the corresponding algorithm is applied to the data

  • df_plot_xy (Dataframe) – dataframe containing the coordinates that have been plotted

classmethod from_smiles(smiles_list, target=[], target_type=None, sim_type=None)[source]

Class method to construct a Plotter object from a list of SMILES.

Parameters
  • smiles_list (list) – List of the SMILES representations of the molecules to plot.

  • target (list) – target values

  • target_type (string) – target type R (regression) or C (classification)

  • sim_type (string) – similarity type structural or tailored

Returns

A Plotter object for the molecules given as input.

Return type

Plotter
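As a minimal sketch of how from_smiles might be called, the helper below checks that the target list is either empty or as long as the SMILES list, then forwards the arguments shown in the signature above. The name build_plotter and its plotter_cls argument are hypothetical and exist only so the sketch can be exercised without ChemPlot installed. The length check is an assumption about sensible input, not documented ChemPlot behaviour.

```python
def build_plotter(smiles_list, target=(), target_type=None, sim_type=None,
                  plotter_cls=None):
    """Build a Plotter from SMILES after a basic sanity check (hypothetical helper)."""
    smiles_list = list(smiles_list)
    target = list(target)
    # Assumption: a non-empty target must supply one value per molecule.
    if target and len(target) != len(smiles_list):
        raise ValueError(
            f"expected {len(smiles_list)} target values, got {len(target)}"
        )
    if plotter_cls is None:
        # Imported lazily so the helper can be defined without ChemPlot installed.
        from chemplot import Plotter as plotter_cls
    return plotter_cls.from_smiles(
        smiles_list, target=target, target_type=target_type, sim_type=sim_type
    )
```

For example, build_plotter(["CCO", "CCN", "CCC"], target=[0.1, 0.4, 0.9], target_type="R", sim_type="structural") would request a regression-type plotter using structural similarity.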

classmethod from_inchi(inchi_list, target=[], target_type=None, sim_type=None)[source]

Class method to construct a Plotter object from a list of InChI strings.

Parameters
  • inchi_list (list) – List of the InChI representations of the molecules to plot.

  • target (list) – target values

  • target_type (string) – target type R (regression) or C (classification)

  • sim_type (string) – similarity type structural or tailored

Returns

A Plotter object for the molecules given as input.

Return type

Plotter
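Since from_inchi expects identifiers rather than SMILES, a quick check before construction can catch mixed-up inputs. The sketch below relies only on the fact that InChI identifiers begin with the literal prefix "InChI=". The helper name check_inchi_list is hypothetical, and ChemPlot may well perform its own validation.

```python
def check_inchi_list(inchi_list):
    """Return the list unchanged if every entry looks like an InChI identifier."""
    inchi_list = list(inchi_list)
    # InChI identifiers start with "InChI="; anything else is likely SMILES or a typo.
    bad = [text for text in inchi_list if not text.startswith("InChI=")]
    if bad:
        raise ValueError(f"not InChI identifiers: {bad}")
    return inchi_list

# Usage (requires ChemPlot; my_inchi and my_targets are placeholder names):
#   from chemplot import Plotter
#   plotter = Plotter.from_inchi(check_inchi_list(my_inchi), target=my_targets,
#                                target_type="R", sim_type="tailored")
```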

pca(**kwargs)[source]

Calculates the first 2 PCA components of the molecular descriptors.

Parameters

kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.decomposition.PCA

Returns

The dataframe containing the PCA components.

Return type

Dataframe
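The following sketch runs pca and then reads how much variance the two components capture. It assumes that pca_fit, listed among the class attributes above, is reachable as a public attribute after pca has been called. explained_variance_ratio_ is a standard attribute of a fitted sklearn.decomposition.PCA object. The helper name pca_summary is hypothetical.

```python
def pca_summary(plotter):
    """Run PCA and return the embedding plus the variance ratio of each component."""
    components = plotter.pca()  # dataframe with the first two PCA components
    # Assumption: pca_fit holds the fitted sklearn PCA object once pca() has run.
    ratios = [float(r) for r in plotter.pca_fit.explained_variance_ratio_]
    return components, ratios
```

A low combined ratio suggests that the two-dimensional PCA view discards much of the descriptor variance, in which case t-SNE or UMAP may be worth trying.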

tsne(perplexity=None, pca=False, random_state=None, **kwargs)[source]

Calculates the first 2 t-SNE components of the molecular descriptors.

Parameters
  • perplexity (int) – perplexity value for the t-SNE model

  • pca (boolean) – indicates if the features must be preprocessed by PCA

  • random_state (int) – random seed that can be passed as a parameter for reproducing the same results

  • kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.manifold.TSNE

Returns

The dataframe containing the t-SNE components.

Return type

Dataframe
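When perplexity is passed explicitly, it has to suit the size of the dataset: recent scikit-learn releases reject a perplexity that is not smaller than the number of samples. The sketch below caps a requested value accordingly and fixes random_state for reproducibility. The helper names and the default of 30 (scikit-learn's own default) are assumptions for illustration; leaving perplexity as None lets ChemPlot choose a value itself.

```python
def safe_perplexity(n_molecules, requested=30):
    """Return a perplexity strictly below the number of molecules."""
    if n_molecules < 2:
        raise ValueError("t-SNE needs at least two molecules")
    return min(requested, n_molecules - 1)


def run_tsne(plotter, n_molecules, requested=30, seed=42):
    """Call Plotter.tsne with a capped perplexity and a fixed random seed."""
    return plotter.tsne(
        perplexity=safe_perplexity(n_molecules, requested),
        random_state=seed,
    )
```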

umap(n_neighbors=None, min_dist=None, pca=False, random_state=None, **kwargs)[source]

Calculates the first 2 UMAP components of the molecular descriptors.

Parameters
  • n_neighbors (int) – Number of neighbours used in the UMAP model.

  • min_dist (float) – Value between 0.0 and 0.99 indicating how close to each other the points can be displayed.

  • pca (boolean) – indicates if the features must be preprocessed by PCA

  • random_state (int) – random seed that can be passed as a parameter for reproducing the same results

  • kwargs (key, value mappings) – Other keyword arguments are passed down to umap.UMAP

Returns

The dataframe containing the UMAP components.

Return type

Dataframe
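Because the UMAP layout depends strongly on n_neighbors, it can help to compare several values with the seed held fixed. The sketch below does that and enforces the documented 0.0 to 0.99 range for min_dist. The helper name umap_sweep and the particular neighbour values are illustrative assumptions.

```python
def umap_sweep(plotter, neighbor_values=(5, 15, 50), min_dist=0.1, seed=42):
    """Compute one UMAP embedding per n_neighbors value, keyed by that value."""
    if not 0.0 <= min_dist <= 0.99:
        raise ValueError("min_dist must lie between 0.0 and 0.99")
    embeddings = {}
    for n_neighbors in neighbor_values:
        # Each call returns the dataframe with the two UMAP components.
        embeddings[n_neighbors] = plotter.umap(
            n_neighbors=n_neighbors, min_dist=min_dist, random_state=seed
        )
    return embeddings
```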

cluster(n_clusters=5, **kwargs)[source]

Computes the clusters present in the embedded chemical space.

Parameters
  • n_clusters (int) – Number of clusters that will be computed

  • kwargs (key, value mappings) – Other keyword arguments are passed down to sklearn.cluster.KMeans

Returns

The dataframe containing the 2D embedding.

Return type

Dataframe
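Since clustering operates on the embedded space, a dimensionality reduction presumably has to be run first. The sketch below uses pca for the embedding and then calls cluster. The helper name embed_and_cluster is hypothetical, and the ordering requirement is inferred from the description above rather than confirmed.

```python
def embed_and_cluster(plotter, n_clusters=5):
    """Embed the molecules with PCA, then cluster the two-dimensional embedding."""
    # Assumption: cluster() needs an existing embedding, so reduce first.
    plotter.pca()
    return plotter.cluster(n_clusters=n_clusters)
```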

visualize_plot(size=20, kind='scatter', remove_outliers=False, is_colored=True, colorbar=False, clusters=False, filename=None, title=None)[source]

Generates a plot for the given molecules embedded in two dimensions.

Parameters
  • size (int) – Size of the plot

  • kind (string) – Type of plot

  • remove_outliers (boolean) – Boolean value indicating if the outliers must be identified and removed

  • is_colored (boolean) – Indicates if the points must be colored according to target

  • colorbar (boolean) – Indicates if the plot legend must be represented as a colorbar. Only considered when the target_type is “R”.

  • clusters (boolean or list or int) – If True, the clusters are shown instead of possible targets. Pass a list or an int to show only selected clusters (indexed by int).

  • filename (string) – File in which to save the plot

  • title (string) – Title of the plot.

Returns

The matplotlib axes containing the plot.

Return type

Axes
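Because visualize_plot returns the axes, the figure can be adjusted further before it is written to disk. The sketch below relabels the axes and saves through the standard matplotlib calls set_xlabel, set_ylabel and Figure.savefig, assuming the returned object is an ordinary matplotlib Axes. The helper name, the labels and the title are illustrative.

```python
def plot_and_label(plotter, path, xlabel="Component 1", ylabel="Component 2"):
    """Draw the static scatter plot, relabel its axes, and save it to path."""
    ax = plotter.visualize_plot(
        kind="scatter", remove_outliers=True, title="Chemical space"
    )
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    # Save via the parent figure so the custom labels are included.
    ax.figure.savefig(path, dpi=300, bbox_inches="tight")
    return ax
```

If no post-processing is needed, passing filename to visualize_plot directly is the shorter route.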

interactive_plot(size=700, kind='scatter', remove_outliers=False, is_colored=True, clusters=False, filename=None, show_plot=False, title=None)[source]

Generates an interactive Bokeh plot for the given molecules embedded in two dimensions.

Parameters
  • size (int) – Size of the plot

  • kind (string) – Type of plot

  • remove_outliers (boolean) – Boolean value indicating if the outliers must be identified and removed

  • is_colored (boolean) – Indicates if the points must be colored according to target

  • clusters (boolean) – Indicates whether to add a tab with the clusters, if these have been computed

  • filename (string) – File in which to save the Bokeh plot

  • show_plot (boolean) – Immediately display the current plot.

  • title (string) – Title of the plot.

Returns

The bokeh figure containing the plot.

Return type

Figure
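A short sketch of saving the interactive plot without opening it follows. Standalone Bokeh output is an HTML document, so the helper appends an .html suffix when one is missing. Whether ChemPlot adjusts the file name itself is not stated here, so treat that step as an assumption. The helper name save_interactive is hypothetical.

```python
def save_interactive(plotter, path, with_clusters=False, title=None):
    """Write the interactive Bokeh plot to an HTML file and return the figure."""
    # Assumption: the output should carry an .html suffix.
    if not path.endswith(".html"):
        path += ".html"
    return plotter.interactive_plot(
        clusters=with_clusters, filename=path, show_plot=False, title=title
    )
```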

Utils

chemplot.load_data(name)[source]

Returns one of the sample datasets.

Parameters

name (string) – Name of the sample dataset

Returns

The Dataframe of the sample dataset

Return type

Dataframe

chemplot.info_data()[source]

Prints the metadata relative to the available sample datasets.
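The two utility functions can be combined as sketched below: info_data prints what is available, and load_data returns the chosen dataset. The helper name load_sample and its chemplot_module argument are hypothetical, the latter added only so the sketch can be exercised without ChemPlot installed.

```python
def load_sample(name, chemplot_module=None):
    """Print the sample-dataset metadata, then load the dataset called name."""
    if chemplot_module is None:
        # Imported lazily so the helper can be defined without ChemPlot installed.
        import chemplot as chemplot_module
    chemplot_module.info_data()  # prints metadata; useful for finding valid names
    return chemplot_module.load_data(name)
```

Any concrete dataset name passed to load_sample is an assumption until checked against the names that info_data prints.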