RanDepict package

Submodules

RanDepict.randepict module

class RanDepict.randepict.RandomDepictor(seed=None, hand_drawn=None, *, config=None)

Bases: Augmentations, CDKFunctionalities, IndigoFunctionalities, PikachuFunctionalities, RDKitFuntionalities

This class contains everything necessary to generate a variety of random depictions from given SMILES strings. Calling a RandomDepictor instance with a SMILES str returns an np.array that represents an RGB image of the corresponding chemical structure.

batch_depict_save(smiles_list, images_per_structure, output_dir, augment, ID_list, shape=(299, 299), processes=4, seed=42)

Batch generation of chemical structure depictions without using fingerprints. The images are saved at a given path.

Return type:

None

Args:

smiles_list (List[str]): List of SMILES str
images_per_structure (int): Number of images to create per SMILES
output_dir (str): Output directory
augment (bool): Indicates whether or not to use augmentations
ID_list (List[str]): List of IDs (should be as long as smiles_list)
shape (Tuple[int, int], optional): Output image shape. Defaults to (299, 299).
processes (int, optional): Number of parallel processes. Defaults to 4.
seed (int, optional): Seed for random decisions. Defaults to 42.

batch_depict_save_with_fingerprints(smiles_list, images_per_structure, output_dir, ID_list, indigo_proportion=0.15, rdkit_proportion=0.25, pikachu_proportion=0.25, cdk_proportion=0.35, aug_proportion=0.5, shape=(299, 299), processes=4, seed=42)

Batch generation of chemical structure depictions using fingerprints. This takes longer than batch_depict_save, but it ensures the diversity of the depictions and augmentations. The images are saved in the given output_dir.

Return type:

None

Args:

smiles_list (List[str]): List of SMILES str
images_per_structure (int): Number of images to create per SMILES
output_dir (str): Output directory
ID_list (List[str]): IDs (length: len(smiles_list) * images_per_structure)
indigo_proportion (float): Indigo proportion. Defaults to 0.15.
rdkit_proportion (float): RDKit proportion. Defaults to 0.25.
pikachu_proportion (float): PIKAChU proportion. Defaults to 0.25.
cdk_proportion (float): CDK proportion. Defaults to 0.35.
aug_proportion (float): Augmentation proportion. Defaults to 0.5.
shape (Tuple[int, int]): Output image shape. Defaults to (299, 299).
processes (int, optional): Number of parallel processes. Defaults to 4.
seed (int, optional): Seed for random decisions. Defaults to 42.
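
The ID_list length requirement and the default toolkit proportions can be illustrated in plain Python. This is a sketch that does not call RanDepict: the "mol<i>_<n>" naming pattern, the ordering of the IDs and the rounding are assumptions made for illustration. Only the required list length and the default proportions come from the description above.

```python
# Illustrative sketch only; it does not call RanDepict.
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
images_per_structure = 4

# One ID per generated image: len(smiles_list) * images_per_structure entries.
# The "mol<i>_<n>" pattern and the ordering are hypothetical choices.
ID_list = [
    f"mol{i}_{n}"
    for i in range(len(smiles_list))
    for n in range(images_per_structure)
]

# Default toolkit proportions taken from the signature; they sum to 1.
proportions = {"indigo": 0.15, "rdkit": 0.25, "pikachu": 0.25, "cdk": 0.35}
total_images = len(ID_list)

# Approximate number of images per toolkit (the rounding is an assumption).
images_per_toolkit = {
    name: round(total_images * share) for name, share in proportions.items()
}
```

If custom proportions are passed, it seems sensible to keep the four toolkit proportions summing to 1, as the defaults do.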

batch_depict_with_fingerprints(smiles_list, images_per_structure, indigo_proportion=0.15, rdkit_proportion=0.25, pikachu_proportion=0.25, cdk_proportion=0.35, aug_proportion=0.5, shape=(299, 299), processes=4, seed=42)

Batch generation of chemical structure depictions using fingerprints. This takes longer than batch_depict_save, but it ensures the diversity of the depictions and augmentations.

Return type:

None

Args:

smiles_list (List[str]): List of SMILES str
images_per_structure (int): Number of images to create per SMILES
indigo_proportion (float): Indigo proportion. Defaults to 0.15.
rdkit_proportion (float): RDKit proportion. Defaults to 0.25.
pikachu_proportion (float): PIKAChU proportion. Defaults to 0.25.
cdk_proportion (float): CDK proportion. Defaults to 0.35.
aug_proportion (float): Augmentation proportion. Defaults to 0.5.
shape (Tuple[int, int]): Output image shape. Defaults to (299, 299).
processes (int, optional): Number of parallel processes. Defaults to 4.
seed (int, optional): Seed for random decisions. Defaults to 42.

central_square_image(im)

This function takes an image (np.array) and adds white padding so that the result is square, with a side length equal to the longest side of the original image.

Return type:

array

Args:

im (np.array): Input image

Returns:

np.array: Output image
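
The padding step can be sketched without any dependencies. The function below is a hypothetical stand-in, not the RanDepict implementation: it works on a grayscale image stored as nested lists instead of an RGB np.array, and it assumes that the original content is centred in the padded square (as the method name suggests).

```python
from typing import List

WHITE = 255  # pixel value used for padding in this sketch


def pad_to_square(image: List[List[int]]) -> List[List[int]]:
    """Pad a grayscale image (rows of pixel values) with white to a square."""
    height = len(image)
    width = len(image[0]) if height else 0
    side = max(height, width)

    # Offsets that place the original content in the centre of the square.
    top = (side - height) // 2
    left = (side - width) // 2

    # Start from an all-white square and copy the original rows into it.
    square = [[WHITE] * side for _ in range(side)]
    for y, row in enumerate(image):
        square[top + y][left:left + width] = row
    return square


wide = pad_to_square([[0, 0, 0, 0]])   # 1 x 4 input becomes 4 x 4
tall = pad_to_square([[0], [0], [0]])  # 3 x 1 input becomes 3 x 3
```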

depict_from_fingerprint(smiles, fingerprints, schemes, shape=(299, 299), seed=42)

This function takes a SMILES representation of a molecule, a list of one or two fingerprints and a list of the corresponding fingerprint schemes, and generates a chemical structure depiction that fits the fingerprint.

If only one fingerprint/scheme is given, it is assumed to contain information for a depiction without augmentations. If two are given, the first one is assumed to contain information about the depiction and the second one information about the augmentations.

All this function does is set the class attributes so that random_choice() knows not to pick parameters randomly.

Return type:

array

Args:

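smiles (str): SMILES representation of molecule
fingerprints (List[np.array]): List of one or two fingerprints
schemes (List[Dict]): List of one or two fingerprint schemes
shape (Tuple[int, int]): Desired output image shape. Defaults to (299, 299).
seed (int): Seed for remaining random decisions. Defaults to 42.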

Returns:

np.array: Chemical structure depiction

depict_save(smiles, images_per_structure, output_dir, augment, ID, shape=(299, 299), seed=42)

This function takes a SMILES str, the number of images to create per SMILES str and the path of an output directory. It then creates images_per_structure depictions of the chemical structure represented by the SMILES str and saves them as PNG images in output_dir. If augment == True, it adds augmentations to the structure depictions. If an ID is given, it is used as the base filename. Otherwise, the SMILES str is used.

Args:

smiles (str): SMILES representation of molecule
images_per_structure (int): Number of images to create per SMILES
output_dir (str): Output directory path
augment (bool): Add augmentations (if True)
ID (str): ID (used for name of saved image)
shape (Tuple[int, int], optional): Image shape. Defaults to (299, 299).
seed (int, optional): Seed. Defaults to 42.
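
The filename rule described above (ID if given, otherwise the SMILES str) can be sketched as follows. The helper name planned_output_paths and the "_<n>.png" suffix are assumptions for illustration. The source only states that the ID or the SMILES str serves as the base filename and that PNG images are written.

```python
import os
from typing import List, Optional


def planned_output_paths(
    smiles: str,
    images_per_structure: int,
    output_dir: str,
    ID: Optional[str] = None,
) -> List[str]:
    """Return hypothetical output paths for the images of one structure."""
    # Use the ID as base filename if one is given, otherwise the SMILES str.
    name = ID if ID else smiles
    # Assumption: images are numbered 0..n-1 with a "_<n>.png" suffix.
    return [
        os.path.join(output_dir, f"{name}_{n}.png")
        for n in range(images_per_structure)
    ]


with_id = planned_output_paths("CCO", 2, "out", ID="ethanol")
without_id = planned_output_paths("CCO", 1, "out")
```

Because SMILES strings can contain characters such as "/" and "\", passing an explicit ID is likely the safer choice for filenames.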

depict_save_from_fingerprint(smiles, fingerprints, schemes, output_dir, filename, shape=(299, 299), seed=42)

This function takes a SMILES representation of a molecule, a list of one or two fingerprints and a list of the corresponding fingerprint schemes, generates a chemical structure depiction that fits the fingerprint and saves the resulting image at a given path.

If only one fingerprint/scheme is given, it is assumed to contain information for a depiction without augmentations. If two are given, the first one is assumed to contain information about the depiction and the second one information about the augmentations.

All this function does is set the class attributes so that random_choice() knows not to pick parameters randomly.

Return type:

None

Args:

smiles (str): SMILES representation of molecule
fingerprints (List[np.array]): List of one or two fingerprints
schemes (List[Dict]): List of one or two fingerprint schemes
output_dir (str): Output directory
filename (str): Filename
shape (Tuple[int, int]): Output image shape. Defaults to (299, 299).
seed (int): Seed for remaining random decisions. Defaults to 42.

Returns:

np.array: Chemical structure depiction

classmethod from_config(config_file)

Return type:

RandomDepictor

get_depiction_functions(smiles)

PIKAChU, RDKit and Indigo can run into problems if certain R group variables are present in the input molecule, and PIKAChU cannot handle isotopes. Hence, the depiction functions that rely on these toolkits are removed based on the input SMILES str (if necessary).

Return type:

List[Callable]

Args:

smiles (str): SMILES representation of a molecule

Returns:

List[Callable]: List of depiction functions
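
The filtering described above can be sketched with placeholder functions. Everything in this block is hypothetical: the depict_* functions are stand-ins, the regular expressions are simplified assumptions about how R groups (here: [R], [R1], ...) and isotopes (here: a bracket atom with a leading mass number such as [13C]) might be detected, and any R group is treated as problematic although the source only says "certain R group variables".

```python
import re
from typing import Callable, List


# Placeholder depiction functions; the real ones would return an image.
def depict_cdk(smiles: str) -> str:
    return "cdk"


def depict_rdkit(smiles: str) -> str:
    return "rdkit"


def depict_indigo(smiles: str) -> str:
    return "indigo"


def depict_pikachu(smiles: str) -> str:
    return "pikachu"


def select_depiction_functions(smiles: str) -> List[Callable[[str], str]]:
    """Return the depiction functions assumed to be safe for this SMILES."""
    excluded = set()
    # Assumption: a bracketed R label such as [R] or [R1] marks an R group.
    if re.search(r"\[R\d*\]", smiles):
        excluded.update({depict_rdkit, depict_indigo, depict_pikachu})
    # Assumption: a bracket atom starting with digits carries an isotope.
    if re.search(r"\[\d+[A-Za-z]", smiles):
        excluded.add(depict_pikachu)
    all_functions = [depict_cdk, depict_rdkit, depict_indigo, depict_pikachu]
    return [func for func in all_functions if func not in excluded]


plain = select_depiction_functions("CCO")
isotope = select_depiction_functions("[13C]CO")
r_group = select_depiction_functions("CC[R1]")
```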

has_r_group(smiles)

Determines whether or not a given SMILES str contains an R group

Return type:

bool

Args:

smiles (str): SMILES representation of molecule

Returns:

bool

random_choice(iterable, log_attribute=False)

This function takes an iterable, calls random.choice() on it, increases the seed by 1 and returns the result. This way, results produced by RanDepict are reproducible.

Additionally, this function handles the generation of depictions and augmentations from given fingerprints by handling all random decisions according to the fingerprint template.

Args:

iterable (List): Iterable to pick from
log_attribute (str, optional): ID for fingerprint. Defaults to False.

Returns:

Any: “Randomly” picked element
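
The seed handling described above can be sketched with the standard library alone. SeededChooser is a hypothetical stand-in, not the RanDepict class, and it leaves out the fingerprint handling. The sketch also assumes that the seed is applied before each pick and incremented afterwards.

```python
import random
from typing import Any, Sequence


class SeededChooser:
    """Hypothetical stand-in for the seed handling described above."""

    def __init__(self, seed: int = 42) -> None:
        self.seed = seed

    def random_choice(self, options: Sequence[Any]) -> Any:
        # Re-seed before every pick, then advance the seed by 1, so that
        # each call is deterministic but uses a different seed.
        random.seed(self.seed)
        self.seed += 1
        return random.choice(options)


styles = ["thin", "bold", "dashed", "wavy"]
first = SeededChooser(seed=1)
second = SeededChooser(seed=1)
picks_first = [first.random_choice(styles) for _ in range(5)]
picks_second = [second.random_choice(styles) for _ in range(5)]
```

Two choosers created with the same seed produce the same sequence of picks, which is the property that makes the generated depictions reproducible.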

random_depiction(smiles, shape=(299, 299))

This function takes a SMILES str and depicts it using RDKit, Indigo, CDK or PIKAChU. The depiction method and the specific parameters for the depiction are chosen completely randomly. The purpose of this function is to enable generating a diverse variety of chemical structure depictions.

Return type:

array

Args:

smiles (str): SMILES representation of molecule
shape (Tuple[int, int], optional): Image shape. Defaults to (299, 299).

Returns:

np.array: Chemical structure depiction

random_depiction_with_coordinates(smiles, augment=False, shape=(512, 512))

This function takes a SMILES str and depicts it using RDKit, Indigo or CDK. PIKAChU cannot be used here, as it does not depict given coordinates but always generates them during the depiction process. The depiction method and the specific parameters for the depiction are chosen completely randomly. The purpose of this function is to enable generating a diverse variety of chemical structure depictions.

The depiction (np.array) and the cxSMILES (str) that encodes the coordinates of the depicted molecule are returned.

Return type:

Tuple[array, str]

Args:

smiles (str): SMILES representation of a molecule
augment (bool, optional): Whether to add augmentations to the image. Defaults to False.
shape (Tuple[int, int], optional): Image shape. Defaults to (512, 512).

Returns:

Tuple[np.array, str]: structure depiction, cxSMILES

Module contents

RanDepict Python Package. This repository contains RanDepict, an easy-to-use utility to generate a wide variety of chemical structure depictions (random depiction styles and image augmentations).

Example:

>>> from RanDepict import RandomDepictor
>>> smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
>>> with RandomDepictor() as depictor:
...     image = depictor(smiles)

See RanDepictNotebook.ipynb for more examples.

For comments, bug reports or feature ideas, please raise an issue on the GitHub repository.