CDK Web Services
I wanted to see how easy it would be to set up and develop a web service that would provide some CDK functionality. It turns out that most of the time went into setting up my environment rather than actually developing the web service! As an example I've set up a Tomcat app server providing Axis-based web services (instructions).
A few points regarding the use of Axis to provide CDK web services: since all the services I provide are really part of a single Axis web application, I placed all the CDK distribution jars, as well as the dependency jars, in the WEB-INF/lib directory of the Axis web application. Note that the CDK dependency jars are located in $CDK_HOME/jars.
You can go here to see the services that have been made available. There are currently a number of cheminformatics-related services, along with command-line Java clients and examples of PHP-based access.
To compile the Java client programs you will require the Axis libraries. To use the PHP clients you will need to have the SOAP package installed. Note that v0.9.3 of the SOAP package (the latest beta version) causes the SDG web service to fail, so I'm currently using v0.9.0.
Toxicological Hazard Prediction | Topological Polar Surface Area | 2D Structure Diagrams | 3D Coordinates | Molecular Weight & Formula | Tanimoto Similarity | Molecular Descriptors | Fingerprints
CAVEATS
- If you plan on deploying multiple web applications which need to
use CDK functionality, then it might be better to place the CDK jars (and dependency
jars) in $CATALINA_HOME/shared/lib. This way you won't need
multiple copies of these files. However note that if you use this approach the
descriptor web service provided below will not work as written.
- Axis does not like it when there are multiple XML parsers in the CLASSPATH. It comes with Xerces, so make sure that gnujaxp.jar from the CDK dependencies is not placed in the WEB-INF/lib directory.
- If you plan to run the structure diagram service locally you
will need an X server to be running, since it uses AWT
functionality. One solution is to use Xvfb. So I added the
following lines to $CATALINA_HOME/bin/catalina.sh:
/usr/bin/X11/Xvfb :2 -screen 1 800x600x16 &
export DISPLAY=:2.1
- To run the 3D coordinate generator, you'll need to set the heap
space for the Java VM to at least 128MB (the default is 64MB). Thus
I added the following line to $CATALINA_HOME/bin/catalina.sh
export JAVA_OPTS=-Xmx128M
- Not really a caveat, but it makes life easier: to get the PHP clients running, install the SOAP library
(v0.9.0 works) by doing
pear install --alldeps channel://pear.php.net/SOAP-0.9.0
[Update 26/04/2006] Added a ToxTree service
[Update 07/03/2006] Added a TPSA service
[Update 06/03/2006] Updated setup description. Also added the latest
version of the descriptor service
[Update 03/03/2006] Added structure diagram, molecular
weight and formula webservices
[Update 01/03/2006] Updated most of the service and client
code to be in sync with the latest CDK. Also noted that
the descriptor WS is currently not working
You can also get the tox class for a SMILES string by doing
http://156.56.90.245:8080/tox/services/toxTreeWS?method=getCramerClass&smiles=c1ccccc1
[Service code]
You can also get a value of the TPSA for a SMILES string by doing
http://156.56.90.245:8080/cdkws/services/Descriptors?method=getTPSA&s=O=C=O
[Service code & Client code]
A command line client to access this service can be used as
java -cp $CLASSPATH:./ CDKsdgClient CC=OC http://156.56.90.245:8080/cdkws/services/StructureDiagram
Note that you can't specify the image width and height, but it's a trivial change to the source code. The program will generate a file called img.jpeg in the current working directory. You can also access the service from the form provided below.
[Service code & Client code]
Since the input to this service is simply a single string, you can also use it directly from your browser, for example (using carbon dioxide as the query molecule):
http://156.56.90.245:8080/cdkws/services/Utility?method=getMolecularFormula&s=O=C=O
http://156.56.90.245:8080/cdkws/services/Utility?method=getHTMLMolecularFormula&s=O=C=O
http://156.56.90.245:8080/cdkws/services/Utility?method=getMolecularWeight&s=O=C=O
[Service code & Client code]
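The browser URLs above pass the SMILES string raw in the query string. That is fine for carbon dioxide, but some SMILES characters have a special meaning in URLs (for example, # for a triple bond starts a URL fragment), so a client that builds these URLs itself may want to percent-encode the s parameter. Here is a minimal sketch, assuming Java 10 or later and assuming the service accepts standard percent-encoded parameters (I have not checked this against the running service). The class and method names are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the CDK web service or client code.
class CdkUrlBuilder {

    // Builds a GET-style request URL with the SMILES percent-encoded.
    static String buildUrl(String endpoint, String method, String smiles) {
        String encoded = URLEncoder.encode(smiles, StandardCharsets.UTF_8);
        return endpoint + "?method=" + method + "&s=" + encoded;
    }

    public static void main(String[] args) {
        String endpoint = "http://156.56.90.245:8080/cdkws/services/Utility";
        // Carbon dioxide, as in the browser examples.
        System.out.println(buildUrl(endpoint, "getMolecularWeight", "O=C=O"));
        // Acetonitrile: the '#' must be encoded or it is read as a fragment marker.
        System.out.println(buildUrl(endpoint, "getMolecularFormula", "CC#N"));
    }
}
```

The encoded form of O=C=O is O%3DC%3DO, which a servlet container would normally decode back before the service sees it.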
java CDKsimClient CCC=OCCOCCC CCCC
The client program has the fingerprint length and search depth hard coded at 1024 and 6 respectively. The Tanimoto coefficient between the two molecules will be printed to stdout.
To access the service without the client, the form below may be used to specify two SMILES. The Bit Length setting indicates the length of the fingerprint that will be evaluated for each molecule and the Depth setting indicates the search depth that is used in the algorithm.
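For reference, the Tanimoto coefficient between two bit fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below illustrates that definition with java.util.BitSet. It is not the service code (which uses the CDK), and the class name, the bit positions and the handling of two empty fingerprints are my own assumptions.

```java
import java.util.BitSet;

// Illustration of the Tanimoto definition only; not the CDK implementation.
class TanimotoSketch {

    // Tanimoto = |A and B| / |A or B| for two bit fingerprints.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                 // bits set in both fingerprints
        BitSet either = (BitSet) a.clone();
        either.or(b);                  // bits set in at least one fingerprint
        int union = either.cardinality();
        if (union == 0) {
            return 0.0;                // assumption: two empty fingerprints score 0
        }
        return (double) common.cardinality() / union;
    }

    public static void main(String[] args) {
        // Made-up bit positions, standing in for two 1024-bit fingerprints.
        BitSet a = new BitSet(1024);
        a.set(1); a.set(3); a.set(5); a.set(7);
        BitSet b = new BitSet(1024);
        b.set(3); b.set(5); b.set(9);
        System.out.println(tanimoto(a, b));
    }
}
```

In this toy example two bits are shared and five distinct bits are set overall, so the coefficient is 2/5.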
An alternative version of this service can be found here, which allows you to evaluate a similarity matrix for a set of SMILES (up to 100).

The service provided here accepts a valid SMILES string and generates the 3D coordinates of the molecule, returning a string version of the molecule in SDF format. To use the service from the command line, use the client code and do
java CDKstruct3DClient CCC
By default it will try to connect to http://localhost:8080/axis/services/CDKstruct3D but you can specify an alternate host (run it with no arguments to get usage information). Note that if you want to run this service on your own host, make sure to set the heap space for the Java VM to more than 64MB.
Due to a slow machine that I need to do work on, this service is currently not available.
- topological
- geometrical
- electronic
- molecular
- constitutional
- all (evaluate all available descriptors)
You can get a client program that will read in an SDF-formatted structure file along with the type specification as described above, and will print the calculated descriptors in SDF format on standard output. The usage is
java CDKdescClient molecule.sdf topological,electronic http://156.56.90.245:8080/axis/services/CDKdesc
The client code can be tested with this example SD file. No error checking is done, so you need to maintain the order shown here and avoid spelling errors.
A few caveats regarding the service: If you specify all, ensure that the SD file contains 3D coordinates, as some descriptors (i.e., the non-topological ones) expect 3D coordinates to be present and will otherwise probably throw an Exception.
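Since the client does no error checking on the type specification, one option is to validate it before sending anything. The sketch below checks a comma-separated specification against the six type names listed above. It is not part of the actual client; the class name is hypothetical, and I am assuming the names must match exactly (lower case, no abbreviations).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check; the real CDKdescClient does no validation.
class DescriptorTypeCheck {

    // The type names accepted by the descriptor service, as listed above.
    static final Set<String> KNOWN_TYPES = new LinkedHashSet<>(Arrays.asList(
            "topological", "geometrical", "electronic",
            "molecular", "constitutional", "all"));

    // Splits a spec such as "topological,electronic" and rejects unknown names.
    static List<String> parse(String spec) {
        List<String> types = new ArrayList<>();
        for (String raw : spec.split(",")) {
            String type = raw.trim();
            if (!KNOWN_TYPES.contains(type)) {
                throw new IllegalArgumentException(
                        "Unknown descriptor type: '" + type + "'");
            }
            if (!types.contains(type)) {
                types.add(type);       // keep first occurrence, drop duplicates
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(parse("topological,electronic"));
        try {
            parse("topologcal");       // misspelled on purpose
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A misspelled type then fails on the client side with a readable message instead of producing an error somewhere in the service.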
If you have difficulties in getting the client to work (or are just lazy) you can see the result of the descriptor calculation (basically a dump of the returned SD file) for a molecule by uploading an SD file below. Currently you can't choose combinations of descriptor types and the file size is limited to 512Kb.
[Service code & Client code]

One way to check that this is all working is to enter the following link in your browser:
http://156.56.90.245:8080/cdkws/services/Utility?method=getFingerprintString&s=CC=CCOC
You'll get an XML page in which you should be able to see the String representation of the fingerprint. However, using a browser for this purpose is not very handy, so I also wrote a client program that will contact the web service, supply a SMILES string and print out the return values of the two methods. Usage is
java CDKwsClient CC=CCOC
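If you would rather pull the value out of the XML page yourself, something like the following could work. It is only a sketch: I am assuming the response wraps the result in an element whose name ends in Return (for example getFingerprintStringReturn), so check the element names against what your server actually sends. The class name and the sample XML are made up, and the fingerprint text is a placeholder, not real output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; the element naming is an assumption about the response.
class ReturnValueExtractor {

    // Returns the text of the first element whose local name ends in "Return",
    // or null if the document contains no such element.
    static String extractReturn(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList all = doc.getElementsByTagNameNS("*", "*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.getLocalName().endsWith("Return")) {
                    return element.getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        // Made-up response with a placeholder value, for illustration only.
        String sample =
                "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soapenv:Body><getFingerprintStringResponse>"
                + "<getFingerprintStringReturn>PLACEHOLDER</getFingerprintStringReturn>"
                + "</getFingerprintStringResponse></soapenv:Body></soapenv:Envelope>";
        System.out.println(extractReturn(sample));
    }
}
```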
15/12/2004: The service code has been updated to allow the user to specify bit string length and depth of search. So to get 256-bit fingerprints generated using a search depth of 4, you can do
http://156.56.90.245:8080/cdkws/services/Utility?method=getFingerprintString&s=CC=CCOC&length=256&depth=4
If the length and depth parameters are not sent in the request then the service uses the CDK default values (1024 for bit length and 6 for search depth). You can also call the method with just the length parameter set. The getFingerprintVector method has also been updated similarly. The resultant WSDL seems to be unnecessarily complex. Maybe it'd be better to just force the caller to specify the length and depth for each call?
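One way to get the "always specify both" behaviour without losing convenience would be to push the defaults into the client, so that every request carries explicit length and depth values. The sketch below does this when building the request URL. The class and method names are hypothetical, it assumes Java 10 or later, and percent-encoding the SMILES is my addition (the URLs above send it raw).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; not part of the actual service or client.
class FingerprintRequest {

    static final String ENDPOINT = "http://156.56.90.245:8080/cdkws/services/Utility";
    static final int DEFAULT_LENGTH = 1024;  // CDK default bit length, as stated above
    static final int DEFAULT_DEPTH = 6;      // CDK default search depth, as stated above

    // Always sends both parameters, so the service would need only one signature.
    static String url(String smiles, int length, int depth) {
        if (length <= 0 || depth <= 0) {
            throw new IllegalArgumentException("length and depth must be positive");
        }
        return ENDPOINT + "?method=getFingerprintString"
                + "&s=" + URLEncoder.encode(smiles, StandardCharsets.UTF_8)
                + "&length=" + length
                + "&depth=" + depth;
    }

    // Convenience overload: the defaults are filled in on the client side.
    static String url(String smiles) {
        return url(smiles, DEFAULT_LENGTH, DEFAULT_DEPTH);
    }

    public static void main(String[] args) {
        System.out.println(url("CC=CCOC"));
        System.out.println(url("CC=CCOC", 256, 4));
    }
}
```

The trade-off is that the defaults then live in every client rather than in one place on the server.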