gChem: Chemistry Tools for Google Spreadsheets

Researchers: Richard L Apodaca, Jean-Claude Bradley, Andrew SID Lang
Note: All scripts are released under the MIT license

Introduction

In 2011, Richard Apodaca had the idea to enhance Google Spreadsheets with common functions useful for cheminformatics and made the first custom script that gave Google Spreadsheets access to the Chemical Identification Resolver, a webservice that allows one to convert one chemical structure identifier into another (e.g. common name into SMILES). Jean-Claude Bradley and Andrew Lang then extended the functionality to include many of their previously developed ONS web-services. Here we present step-by-step tutorials on how to develop your own custom cheminformatics scripts.

Master Spreadsheet

For those of you who would just like to view the finished product, please feel free to make a copy of the master spreadsheet. [edit link]

Adding a Custom Function to Google Spreadsheets

1. Make a copy of the gchem quickstart spreadsheet.

2. Open up the spreadsheet and let's use the inbuilt custom function to pull a 2D-image of a structure using a ChemSpider ID. Select cell B2 (click on it) and then go to the custom menu gONS and select the custom function getCSImage. [You will need to give the script access and permissions to run. You only have to do this once.]

20150722figure1.png

If everything works correctly, you should then see the following image appear:

20150722figure2.png

Feel free to explore the function by changing the number or adding a new number (2424 for example) to the first column of a new row. Change the size of the cell to change the size of the image. If you don't enter a number, you will see the following image.
NotFound.png
More on this later.

3. The code that makes this happen can be found under the menu item Tools > Script editor
/**
 * Menu For All Functions - Requires identifier to be in the first column and the same row as selected cell
 */
 
function onOpen() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var onsmenu = [ {name: "getCSImage", functionName: "InsertgetCSImage"}];
  ss.addMenu("gONS", onsmenu);
}
 
/**
 * Function that adds the custom formula to the cell contents
 */
 
function InsertgetCSImage() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=image(getCSImageURL(' + sheet.getRange(r, 1).getA1Notation() + '))');
}
 
/**
 * Function that gets the URL to the ChemSpider image
 */
 
function getCSImageURL(id) {
  var csid;
  csid = id;
  if (isNaN(csid)) {
    return 'http://oruopennotebookscience.wikispaces.com/file/view/NotFound.png/555614381/NotFound.png';
  }
  else
  {
    return 'http://www.chemspider.com/image/' + csid;
  }
};
The first function onOpen is a special function that executes automatically every time you open your spreadsheet. Within the onOpen function is the code that creates the new menu gONS. The InsertgetCSImage function inserts an image into your currently highlighted cell (ac.getRow() and ac.getColumn()) using the already built in function image. Within the InsertgetCSImage function we call the getCSImageURL function to make sure that the user passed a ChemSpider ID (CSID). If the cell to the left of the highlighted cell contains a number, it is assumed to be a CSID and it returns the URL of the appropriate image, otherwise it will return the URL of the NotFound.png image.

Two: gChem - Adding a New Menu gChem - The Chemical Identification Resolver

1. Make a copy of the gChem CIR sheet. Again you will have to give it permissions the first time you use the scripts.

2. Open up the spreadsheet and let's use the inbuilt custom functions under the gChem menu to populate the spreadsheet. Your spreadsheet should turn from this:
20151014figure3.png
to this:
20151014figure4.png
3. Now edit the SMILES and watch all the entries automatically update.
4.The new code added to created another menu:
var menuEntries = [ {name: "getImage", functionName: "InsertgetImage"},
                      {name: "getChemSpiderID", functionName: "InsertgetChemSpiderID"},
                      {name: "getCAS", functionName: "InsertgetCAS"},
                      {name: "getSMILES", functionName: "InsertgetSMILES"},
                      {name: "getSDF", functionName: "InsertgetSDF"},
                      {name: "getInChIKey", functionName: "InsertgetInChIKey"},
                      {name: "getInChI", functionName: "InsertgetInChI"},
                      {name: "getSynonyms", functionName: "InsertgetSynonyms"}];
  ss.addMenu("gChem", menuEntries);
The menu commands execute the following functions that hit the Chemical Identification Resolver
function InsertgetImage() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=image(getImageURL(' + sheet.getRange(r, 1).getA1Notation() + '))');
}
 
function InsertgetCAS() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=getCAS(' + sheet.getRange(r, 1).getA1Notation() + ')');
}
 
function InsertgetSMILES() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=getSMILES(' + sheet.getRange(r, 1).getA1Notation() + ')');
}
 
function InsertgetSDF() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=getSDF(' + sheet.getRange(r, 1).getA1Notation() + ')');
}
 
function InsertgetInChIKey() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=getInChIKey(' + sheet.getRange(r, 1).getA1Notation() + ')');
}
 
function InsertgetInChI() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=getInChI(' + sheet.getRange(r, 1).getA1Notation() + ')');
}
 
function InsertgetSynonyms() {
 var sheet = SpreadsheetApp.getActiveSheet();
 var ac = sheet.getActiveCell();
 var r = ac.getRow();
 var c = ac.getColumn();
 sheet.getRange(r, c).setFormula('=getSynonyms(' + sheet.getRange(r, 1).getA1Notation() + ')');
}
 
/**
 * Beginning of gChem Reference Functions
 */
 
function getImageURL(id) {
  return 'http://cactus.nci.nih.gov/chemical/structure/' + id + '/image?width=150&height=150&format=png';
};
 
function getCAS(id) {
 var content = lookup(id, 'cas');
 
 return content ? content : 'NOT FOUND';
};
 
function getSMILES(id) {
  var content = lookup(id, 'smiles');
 
  return content ? content : 'NOT FOUND';
};
 
function getSDF(id) {
  var content = lookup(id, 'sdf');
 
  return content ? content : 'NOT FOUND';
}
 
function getCSID(id) { // returns first ChemSpider ID
  var content = lookup(id, 'chemspider_id');
 
  if (!content) {
    return 'NOT FOUND';
  }
 
  var csids = content.split('\n');
  return csids[0];
};
 
function getInChIKey(id) {
  var content = lookup(id, 'stdinchikey');
 
  return content ? content.split('=')[1] : 'NOT FOUND';
};
 
function getInChI(id) {
  var content = lookup(id, 'stdinchi');
 
  return content ? content : 'NOT FOUND';
};
 
function getSynonyms(id) {
  var content = lookup(id, 'names');
 
  if (!content) {
    return 'NOT FOUND';
  }
 
  var names = content.split('\n');
 
  if (names.length <= 5) {
    return content;
  }
 
  names.splice(5);
 
  return names.join('\n') + '\n...';
};
 
/**
 * Main lookup function that queries the CIR service
 */
 
function lookup(id, representation) {
  var url = 'http://cactus.nci.nih.gov/chemical/structure/' + encodeURIComponent(id) + '/' + representation;
  var result;
 
  try {
    var response = UrlFetchApp.fetch(url);
    result = response.getContentText();
  } catch (error) {
    // do nothing
  }
 
  return result;
};
The main function that actually does the work is the lookup function which is at the bottom of this section of code.

Three: Adding a New Menu gCDK - CDK Descriptors

The Chemistry Development Kit is an Open Source program and webservice that allows one to calculate many physical/chemical properties including fingerpriints directly from structure. It is a simple matter to add these functions to the spreadsheet. I've added some to a new menu "gCDK," see if you can add one more. Click to download the latest spreadsheet: gChem - CDK.

Four: Adding Other Webservices - gONS

Your turn. Try adding new functionality to the spreadsheet under the gONS menu (or if you prefer create your own menu) by hitting your favourite API, e.g. the PubChem API, the OPSIN API, etc.

OLCC

These gChem tutorials have been used as part of the Cheminformatics OLCC : A blended learning approach to teaching undergraduate chemistry students critical informatics skills; an NSF funded project led by Robert Belford at UALR