CDK Web Services

A posting by Dr. Murray Rust on the qsar-devel mailing list about the use of web services piqued my interest in the topic. The links he provided gave instructions on setting up and developing such services using Gridsphere. A quick scan of the instructions left me a little puzzled, so I decided to set up the environment from scratch.

I wanted to see how easy it would be to set up and develop a web service that would provide some CDK functionality. It turns out that most of the time went on setting up my environment rather than actually developing the web service! As an example I've set up a Tomcat app server providing Axis-based web services (instructions).

A few points regarding the use of Axis to provide CDK web services: Since all the services I provide are really part of a single Axis web application I placed all the CDK distribution jars as well as dependency jars in the WEB-INF/lib directory of the Axis web application. Note that the CDK dependency jars are located in $CDK_HOME/jars.

You can go here to see the services that have been made available. There are currently a number of cheminformatics-related services, along with command line Java clients and examples of PHP-based access.

To compile the Java client programs you will require the Axis libraries. To use the PHP clients you will need to have the SOAP package installed. Note that v0.9.3 of the SOAP package (the latest beta version) causes the SDG (structure diagram) web service to fail, so I'm currently using v0.9.0.

Toxicological Hazard Prediction | Topological Polar Surface Area | 2D Structure Diagrams | 3D Coordinates | Molecular Weight & Formula | Tanimoto Similarity | Molecular Descriptors | Fingerprints

CAVEATS

[Update 26/04/2006] Added a ToxTree service
[Update 07/03/2006] Added a TPSA service
[Update 06/03/2006] Updated setup description. Also added the latest version of the descriptor service
[Update 03/03/2006] Added structure diagram, molecular weight and formula webservices
[Update 01/03/2006] Updated most of the service and client code to be in sync with the latest CDK. Also updated the fact that the descriptor WS is currently not working

ToxTree - Toxicological Hazard Prediction
This web service is based on the ToxTree program developed by Nina Jeliazkova. The aim of the program is to predict the toxicological hazard class of a molecule based on its structure using the Cramer decision tree method (Cramer G. M. et al., Food Cosmet. Toxicol., 1978, 16, 255-276). Currently the web service does not take into account the component-of-food rule and the found-in-body rule (it uses NO for both questions).

SMILES

You can also get the tox class for a SMILES string by doing

http://156.56.90.245:8080/tox/services/toxTreeWS?method=getCramerClass&smiles=c1ccccc1
[Service code]
Topological Polar Surface Area
The CDK includes a molecular descriptor to calculate the Topological Polar Surface Area [1]. Since it's generally useful I made it available as a separate web service. The code will add hydrogens to satisfy valencies and will also perform aromaticity detection.

SMILES

You can also get a value of the TPSA for a SMILES string by doing

http://156.56.90.245:8080/cdkws/services/Descriptors?method=getTPSA&s=O=C=O
[Service code & Client code]
Structure Diagrams
In many cases it is useful to obtain a 2D representation of a molecular structure. This web service takes a SMILES string and returns a JPEG image of the 2D structure using the StructureDiagramGenerator of the CDK. The call to the service can specify the width, height and scaling factor for the resultant image; if all three are set to zero, default values are used.

A command line client to access this service can be used as

java -cp $CLASSPATH:./ CDKsdgClient CC=OC http://156.56.90.245:8080/cdkws/services/StructureDiagram
Note that the client doesn't let you specify the image width and height, but that is a trivial change to the source code. The program will generate a file called img.jpeg in the current working directory. You can also access the service from the form provided below.
SMILES
Width
Height
Scale (Between 0 and 1)
[Service code & Client code]
Molecular Weight & Formula
Since the CDK can generate a molecular formula in plain text format as well as HTML'ized text format, I've put up 3 services that calculate the molecular weight, the molecular formula (plain) and the molecular formula (HTML). You can use the form below to supply a SMILES string and get back the MW and molecular formulae. Note that the service adds H's to satisfy valencies, so you don't need to specify H's.

SMILES
Since the input to this service is simply a single string, you can also use it directly from your browser, for example (using carbon dioxide as the query molecule)
http://156.56.90.245:8080/cdkws/services/Utility?method=getMolecularFormula&s=O=C=O
http://156.56.90.245:8080/cdkws/services/Utility?method=getHTMLMolecularFormula&s=O=C=O
http://156.56.90.245:8080/cdkws/services/Utility?method=getMolecularWeight&s=O=C=O
[Service code & Client code]
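To illustrate the difference between the plain and HTML forms of a formula mentioned above, here is a rough sketch that turns a plain formula string into an HTML one by wrapping the element counts in subscript tags. This is only my guess at what "HTML'ized" amounts to; the class and method names are made up, it is not the CDK code, and it ignores charges and isotopes.

```java
public class FormulaHtmlSketch {

    /**
     * Wraps every run of digits in <sub>...</sub>.
     * Assumes a simple formula such as "C2H6O" with no charge or isotope labels.
     */
    public static String toHtml(String plainFormula) {
        return plainFormula.replaceAll("(\\d+)", "<sub>$1</sub>");
    }

    public static void main(String[] args) {
        // Carbon dioxide, the query molecule used in the URLs above
        System.out.println(toHtml("CO2")); // prints CO<sub>2</sub>
    }
}
```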
Fingerprint Similarity
The CDKsim service is a simple extension of the fingerprint service described below and takes two SMILES and evaluates the Tanimoto coefficient [1, 2] between them. The service can be accessed using a command line client and usage is simply
java CDKsimClient CCC=OCCOCCC CCCC
The client program has the fingerprint length and search depth hard coded at 1024 and 6 respectively. The Tanimoto coefficient between the two molecules will be printed to stdout.
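For reference, the Tanimoto coefficient between two bit fingerprints is the number of bits set in both divided by the number of bits set in either. Below is a minimal sketch of that calculation using plain java.util.BitSet. It is not the service code; the class name and the toy fingerprints are made up for illustration, and a real fingerprint would come from the CDK.

```java
import java.util.BitSet;

public class TanimotoSketch {

    /** Tanimoto coefficient: |A AND B| / |A OR B| for two bit fingerprints. */
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                       // bits set in both
        BitSet either = (BitSet) a.clone();
        either.or(b);                        // bits set in at least one
        int union = either.cardinality();
        if (union == 0) {
            // Undefined for two empty fingerprints; rejecting them is a choice of this sketch
            throw new IllegalArgumentException("both fingerprints are empty");
        }
        return (double) common.cardinality() / union;
    }

    public static void main(String[] args) {
        // Toy 1024-bit fingerprints, not real CDK output
        BitSet fp1 = new BitSet(1024);
        fp1.set(1); fp1.set(5); fp1.set(9);
        BitSet fp2 = new BitSet(1024);
        fp2.set(5); fp2.set(9); fp2.set(20);
        System.out.println(tanimoto(fp1, fp2)); // 2 common bits / 4 bits in either = 0.5
    }
}
```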

To access the service without the client, the form below may be used to specify two SMILES. The Bit Length setting indicates the length of the fingerprint that will be evaluated for each molecule and the Depth setting indicates the search depth that is used in the algorithm.

Target: Bit Length
Query: Depth
An alternative version of this service can be found here which allows you to evaluate a similarity matrix for a set of SMILES (up to 100).

[Service code & Client code]

3D Coordinate Generation
The CDK was recently enhanced with the addition of a 3D coordinate generator. Thus, given a SMILES string, it can generate a reasonable set of 3D coordinates. The algorithm can use either the MM2 or MMFF94 force fields for the geometry optimization. The implementation does have some problems, such as the inability to handle condensed ring systems, but for most molecules it does a reasonable job.

The service provided here accepts a valid SMILES string and generates the 3D coordinates of the molecule returning a string version of the molecule in SDF format. To use the service from the command line use the client code and do

java CDKstruct3DClient CCC
By default it will try to connect to http://localhost:8080/axis/services/CDKstruct3D but you can specify an alternate host (run it with no arguments to get usage information). Note that if you want to run this service on your own host make sure to set the heap space for the Java VM to more than 64MB.

Due to a slow machine that I need to do work on, this service is currently not available

[Service code & Client code]

Descriptor Calculation
The CDK can calculate a variety of molecular and atomic descriptors for QSAR modeling, and a useful application is to export this functionality as a web service. The CDKdesc service provides this. The code for the service can be found here and is based on the descriptor calculator application from the CDK distribution. The service takes two String arguments (WSDL). The first argument is the molecular structure in SDF format contained in a string. The second argument can be one of the following, indicating which descriptors are to be calculated
  • topological
  • geometrical
  • electronic
  • molecular
  • constitutional
  • all (evaluate all available descriptors)
Currently this service only evaluates the above types of molecular descriptors and does not consider atomic descriptors. The service returns an SD file with the descriptors included as properties.

You can get a client program that will read in an SDF formatted structure file along with the type specification as described above and will print the calculated descriptors in SDF format on standard output. The usage is

java CDKdescClient molecule.sdf topological,electronic http://156.56.90.245:8080/axis/services/CDKdesc
The client code can be tested with this example SD file. No error checking is done, so you need to maintain the order shown here and avoid spelling errors.
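Since neither the client nor the service checks the type specification, it can be worth validating it yourself before making the call. The following is a small sketch of such a check. It is not part of the client code; the class and method names are hypothetical, and it assumes the service expects the lowercase names listed above, separated by commas as in the usage example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DescriptorTypeCheck {

    // Descriptor classes listed in the text above
    static final Set<String> KNOWN = new LinkedHashSet<>(Arrays.asList(
            "topological", "geometrical", "electronic",
            "molecular", "constitutional", "all"));

    /** Splits a spec such as "topological,electronic" and rejects unknown names. */
    public static List<String> parse(String spec) {
        List<String> types = new ArrayList<>();
        for (String part : spec.split(",")) {
            String type = part.trim();
            // Case-sensitive on purpose: assumes the service only knows the lowercase names
            if (!KNOWN.contains(type)) {
                throw new IllegalArgumentException(
                        "Unknown descriptor type '" + type + "', expected one of " + KNOWN);
            }
            types.add(type);
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(parse("topological,electronic")); // prints [topological, electronic]
    }
}
```

A misspelling such as "topologcal" then fails on the client side with a readable message instead of producing whatever the service does with an unknown type.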

A caveat regarding the service: if you specify all, ensure that the SD file contains 3D coordinates, as some descriptors (i.e., the non-topological ones) expect 3D coordinates to be present and will probably throw an Exception otherwise.

If you have difficulties in getting the client to work (or are just lazy) you can see the result of the descriptor calculation (basically a dump of the returned SD file) for a molecule by uploading an SD file below. Currently you can't choose combinations of descriptor types and the file size is limited to 512Kb.

Load File:
Descriptor Class:
[Service code & Client code]
Fingerprints
The service named CDKws provides 2 methods, both of which return the fingerprint of a molecule specified as a SMILES string. One of the methods returns a String representation and the other returns a Vector whose elements are the bit positions that are on. The web service itself calls the CDK method Fingerprinter.getFingerprint(), and the source is here.
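To make the "bit positions that are on" form concrete, here is a minimal sketch that lists the set bits of a java.util.BitSet. It is not the service code: the class name is made up, the fingerprint is a toy one rather than CDK output, and it returns a List instead of a Vector.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BitPositionsSketch {

    /** Returns the indices of all bits that are set, in increasing order. */
    public static List<Integer> onBits(BitSet fingerprint) {
        List<Integer> positions = new ArrayList<>();
        // nextSetBit returns -1 once there are no more set bits
        for (int i = fingerprint.nextSetBit(0); i >= 0; i = fingerprint.nextSetBit(i + 1)) {
            positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Toy 1024-bit fingerprint with three bits switched on
        BitSet fp = new BitSet(1024);
        fp.set(3); fp.set(17); fp.set(1023);
        System.out.println(onBits(fp)); // prints [3, 17, 1023]
    }
}
```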

One way to check that this is all working is to enter the following link in your browser:

http://156.56.90.245:8080/cdkws/services/Utility?method=getFingerprintString&s=CC=CCOC
You'll get an XML page in which you should be able to see the String representation of the fingerprint. However, using a browser for this purpose is not too handy, so I also wrote a client program that will contact the web service, supply a SMILES string and print out the return values of the two methods. Usage is
java CDKwsClient CC=CCOC

15/12/2004: The service code has been updated to allow the user to specify bit string length and depth of search. So to get fingerprints of 256 bit length and generated using a search depth of 4 you can do
http://156.56.90.245:8080/cdkws/services/Utility?method=getFingerprintString&s=CC=CCOC&length=256&depth=4
If the length and depth parameters are not sent in the request then the service uses the CDK default values (1024 for bit length and 6 for search depth). You can also call the method with just the length parameter set. The getFingerprintVector method has also been updated similarly. The resultant WSDL seems to be unnecessarily complex. Maybe it'd be better to just force the caller to specify the length and depth for each call?
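If you build these request URLs in a program rather than typing them into a browser, keep in mind that SMILES can contain characters that mean something else in a URL (for example, the # of a triple bond starts a URL fragment). Below is a rough sketch of assembling the URL with the optional length and depth parameters. The base URL and the parameter names s, length and depth are taken from the examples above; the class and method names are hypothetical, and I'm assuming the server decodes percent-encoded query values in the usual way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FingerprintUrlSketch {

    static final String BASE = "http://156.56.90.245:8080/cdkws/services/Utility";

    /**
     * Builds a getFingerprintString request URL.
     * Pass null for length and depth to let the service use its defaults.
     */
    public static String fingerprintUrl(String smiles, Integer length, Integer depth) {
        if (length == null && depth != null) {
            // The text only mentions sending neither, length alone, or both
            throw new IllegalArgumentException("depth given without length");
        }
        StringBuilder url = new StringBuilder(BASE)
                .append("?method=getFingerprintString&s=")
                .append(encode(smiles));
        if (length != null) url.append("&length=").append(length);
        if (depth != null) url.append("&depth=").append(depth);
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(fingerprintUrl("CC=CCOC", 256, 4));
        System.out.println(fingerprintUrl("C#N", null, null));
    }
}
```

Here the = in CC=CCOC is sent as %3D and the # in C#N as %23.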

[Service code & Client code]