CDK Descriptor Calculator GUI (v 1.4.8)
(screenshot, screenshot)
Features
- Automatically detects descriptor classes defined in the CDK QSAR descriptor dictionary
- Groups of descriptors and individual descriptors can be selected for evaluation
- Input can be SDF or SMI formats
- Output can be a variety of delimited text formats, annotated SDF or ARFF
- Can evaluate fingerprints (hashed, MACCS, EState)
- See here for a summary overview of the available descriptors and their names.
Download
- CDKDescUI.jar (1.4.8) - with all dependencies included
- Sources
Updates in the new version
- version 1.4.8
- Updated to CDK 2.0
- Allow for compounds with >2 components
- version 1.4.6
- Updated to CDK 1.5.10
- Fixed bug in file format detection
- Condensed output in batch mode
- version 1.4.5
- Updated to latest CDK master
- version 1.3.9
- Updated to latest CDK master (as of Nov 28, 2014)
- version 1.3.8
- Updated to latest CDK master
- version 1.3.7
- Updated to latest CDK master
- Updated logging to use JDK logging framework
- version 1.3.4
- Fixed a bug that missed a dependency on the fragmenter module at build time
- version 1.3.3
- Descriptor selection files will be generated with a ".xml" suffix if not specified
- The default descriptor selection on startup is all available descriptors
- Synced with the latest CDK master
- version 1.3.2
- Added a command line option to specify whether explicit H's should be added before descriptor or fingerprint calculation in the batch mode
- version 1.3.1
- Fixed the main driver so that we only initialize the GUI if we are not in batch mode. Thanks to Theo Walker for pointing this out.
- version 1.3.0
- Updated to latest stable CDK release (1.3.x) and includes a new volume descriptor and hybridization fingerprinter
- version 1.1.1
- Updated so that if calculation is cancelled, all data calculated to that point is still output
- Synced with latest CDK master
- version 1.0.5
- The command line batch mode now supports descriptor selections, rather than only all descriptors or certain types of descriptors. To make a selection file, start in GUI mode, select individual descriptors and then save the selected descriptors from the menu option. The selection file should then be specified using the -s option. Note that specifying a selection file overrides the -t option. Also updated to latest CDK master
- version 1.0.3
- The command line batch mode now supports all the fingerprint options of the GUI
- version 1.0.2
- Updated to the latest CDK and include a missing dependency that caused previous versions to crash
- version 1.0.1
- The GUI now supports drag 'n drop so that dragging a file onto the UI (say on the descriptor list or the tabs) will automatically fill in the input file text field with the file name
- The program now has a command line batch mode. This is slightly limited in that it only allows one to calculate groups of descriptors (all, topological, geometric, etc.) rather than specifying individual descriptor classes. In addition, output is currently fixed to tab delimited. To see the options, run the program as:
java -jar CDKDescUI.jar -h
- Synced to the latest CDK master
- version 0.99
- All "Browse" buttons let the OS X user specify the file name
- version 0.98
- Synced with the latest CDK master
- Updated "Save Selection" file chooser dialog to allow the user to specify the file name by hand
- version 0.94
- Synced with the latest CDK 1.2.x
- version 0.93
- Synced to latest CDK
- Added an option to add explicit H's before calculation. By default this is set to TRUE
- version 0.92
- Synced to latest CDK
- Fixes some atom typing issues with N-oxides
- version 0.91
- BCUT performance is back to fast, but charge weighted BCUT's will be inaccurate for pi systems
- MACCS and EState fingerprints are significantly faster
- version 0.90
- Updated to latest CDK
- Correctly handles 3D molecules that are flat
- ARFF output format is supported
- Can now evaluate various fingerprints
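The ARFF output mentioned above can be pictured with a small sketch. This is not the tool's own writer: the class `ArffSketch`, the method `toArff`, and the column names are hypothetical, and it assumes every descriptor is numeric and that a value that could not be calculated is written as ARFF's missing-value marker `?`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CDKDescUI: renders a descriptor table as ARFF text.
class ArffSketch {
    // names: descriptor column names; rows: one array per molecule,
    // with Double.NaN standing in for a value that could not be calculated.
    static String toArff(String relation, List<String> names, List<double[]> rows) {
        String nl = System.lineSeparator();
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append(nl).append(nl);
        for (String name : names) {
            // Assumption: all descriptor columns are declared numeric.
            sb.append("@attribute ").append(name).append(" numeric").append(nl);
        }
        sb.append(nl).append("@data").append(nl);
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                // ARFF marks a missing value with '?'.
                sb.append(Double.isNaN(row[i]) ? "?" : Double.toString(row[i]));
            }
            sb.append(nl);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative column names and values only.
        System.out.print(toArff("descriptors",
                Arrays.asList("desc1", "desc2"),
                Arrays.asList(new double[] {12.0, 1.5}, new double[] {7.0, Double.NaN})));
    }
}
```

The header declares one `@attribute` per descriptor column, and each line after `@data` is one molecule.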
- version 0.85
- Updated to latest CDK. As a result, the GUI is much more robust with respect to descriptor calculation errors. If a molecule gives an error with a certain descriptor, its values for that descriptor are NA. At all times, the resultant descriptor matrix is rectangular and not ragged
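A minimal sketch of the "rectangular, not ragged" idea: every molecule gets one cell per descriptor, and a failed calculation yields NA instead of a missing cell. The names `NaMatrixSketch` and `buildRows`, the `Title` header, and the toy descriptors are hypothetical; the real tool evaluates CDK descriptor classes, which are replaced here by plain functions on strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of CDKDescUI.
class NaMatrixSketch {
    // Returns tab-delimited rows: a header plus one row per molecule.
    static List<String> buildRows(List<String> molecules,
                                  Map<String, ToDoubleFunction<String>> descriptors) {
        List<String> rows = new ArrayList<>();
        rows.add("Title\t" + String.join("\t", descriptors.keySet()));
        for (String mol : molecules) {
            StringBuilder row = new StringBuilder(mol);
            for (ToDoubleFunction<String> d : descriptors.values()) {
                String cell;
                try {
                    cell = Double.toString(d.applyAsDouble(mol));
                } catch (RuntimeException e) {
                    // A failing descriptor still occupies its column.
                    cell = "NA";
                }
                row.append('\t').append(cell);
            }
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Toy stand-ins for descriptors; LinkedHashMap keeps column order stable.
        Map<String, ToDoubleFunction<String>> descriptors = new LinkedHashMap<>();
        descriptors.put("length", s -> s.length());
        descriptors.put("failsOnBrackets", s -> {
            if (s.contains("[")) throw new IllegalStateException("unsupported");
            return 1.0;
        });
        for (String row : buildRows(Arrays.asList("CCO", "[Na+]"), descriptors)) {
            System.out.println(row);
        }
    }
}
```

Because the error is caught per cell rather than per molecule, every row has the same number of columns as the header.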
- version 0.84
- It is now possible to save the currently selected set of descriptors to an XML file. This is handy if you plan to repeatedly evaluate a subset of all the descriptors, and avoids having to select specific descriptors each time. It is also possible to load a previously saved selection
- version 0.83
- Descriptor exception dialog now includes the name of the problematic descriptor
- The descriptor columns in the output file are now listed in alphabetic order
- Error messages cleaned up and exception dialog has a slightly improved UI
- The output file now uses NA's consistently to indicate a descriptor value that could not be calculated
- version 0.80
- Fixed a bug arising from some missing classes
- version 0.79
- Updated to latest CDK trunk
- Fixes a problem with CPSA values being NaN with charged molecules
- version 0.78
- Fixed a bug which caused molecules read from SD files to have no aromaticity. Also fixed a bug which caused molecules with 2 disconnected components to be skipped. Now, we process the larger of the two components.
- version 0.77
- Synced to the latest CDK
- version 0.76
- Updated to the latest CDK (BCUT calculations are significantly faster)
- version 0.75
- Updated to the latest CDK
- version 0.74
- Updated to latest CDK
- Atom typing errors are caught and reported.
- version 0.73
- SMILES format detection is a little more robust
- AlogP is now an included descriptor
- version 0.72
- Synced with the latest CDK
- Added aromaticity checking, so that we know it's always done
- version 0.71
- Synced with the latest CDK
- version 0.70
- Thanks to Tobias Kind for extensive bug reports
- Correctly process SMILES files with or without names for the molecules
- Autoselecting descriptors based on file format is a little more robust
- Updated status messages on completion and cancellation
- Updated to the latest CDK
- version 0.60
- Automatically deselects descriptor classes that cannot be calculated for a SMILES file. Basically only the topological descriptors remain selected
- version 0.52
- Handles SMILES files properly
- Updated to the latest CDK
- Fixed the counter to report the correct number of SMILES processed
- version 0.51
- Updated to the latest CDK
- Perform aromaticity perception before evaluating descriptors, so we don't have to worry whether the descriptor will do it
- version 0.50
- Updated to the latest CDK
- version 0.49
- Speed improvement since we no longer count the number of molecules before starting calculations. Much better for very large SD files
- Progress bar is now indeterminate and the count of molecules processed is shown in a status label
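The change above amounts to reading the SD file in a single pass and counting records as they are processed, rather than scanning the file once just to get a total. A rough sketch, assuming records are terminated by the standard `$$$$` delimiter line; `SdfStreamSketch` and `forEachRecord` are hypothetical names, and the tool itself presumably reads molecules through the CDK rather than as raw text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of CDKDescUI.
class SdfStreamSketch {
    // Hands each SD record to the handler as soon as it is complete and
    // returns the number of records processed. Only one record is held in memory.
    // Trailing text without a closing "$$$$" line is ignored.
    static int forEachRecord(BufferedReader in, Consumer<String> handler) {
        int count = 0;
        StringBuilder record = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                record.append(line).append('\n');
                if (line.startsWith("$$$$")) { // SD file record delimiter
                    handler.accept(record.toString());
                    record.setLength(0);
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String sdf = "mol1\nM  END\n$$$$\nmol2\nM  END\n$$$$\n";
        int[] seen = {0};
        int total = forEachRecord(new BufferedReader(new StringReader(sdf)),
                rec -> System.out.println("processed " + (++seen[0]))); // running count for a status label
        System.out.println("total records: " + total);
    }
}
```

Since the total is unknown until the end, a running count in a status label is the natural replacement for a determinate progress bar.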
- version 0.48
- Provides more intuitive descriptor names in the tree view
- Updated to the latest CDK
- version 0.47
- Updated to the latest CDK
- version 0.46
- If a molecule has 2 disconnected components, the larger one is used for calculations. This is based on the assumption that we are working with a salt. However it is not very rigorous as it does not look at the identity of the components. In addition, it still skips molecules with more than 2 components
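To make the "keep the larger component" rule concrete, here is a crude sketch that works directly on a SMILES string, where disconnected components are separated by `.`. It is only an illustration: `LargestComponentSketch`, `atomEstimate`, and `largestComponent` are hypothetical names, the atom count is a rough estimate from the SMILES text, and the tool itself presumably partitions the molecule graph through the CDK. Unlike version 0.46, the sketch does not stop at two components.

```java
// Hypothetical helper, not part of CDKDescUI.
class LargestComponentSketch {
    // Rough atom count: one atom per bracket expression, plus one per
    // upper-case letter or aromatic symbol (b, c, n, o, p, s) outside brackets.
    static int atomEstimate(String fragment) {
        int n = 0;
        boolean inBracket = false;
        for (char c : fragment.toCharArray()) {
            if (c == '[') {
                inBracket = true;
                n++;
            } else if (c == ']') {
                inBracket = false;
            } else if (!inBracket
                    && (Character.isUpperCase(c) || "bcnops".indexOf(c) >= 0)) {
                n++;
            }
        }
        return n;
    }

    // Returns the dot-separated fragment with the largest estimate; ties keep the first.
    static String largestComponent(String smiles) {
        String best = "";
        int bestCount = -1;
        for (String fragment : smiles.split("\\.")) {
            int count = atomEstimate(fragment);
            if (count > bestCount) {
                best = fragment;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sodium acetate: the acetate fragment is kept, the counter-ion dropped.
        System.out.println(largestComponent("CC(=O)[O-].[Na+]"));
    }
}
```

As the changelog entry notes, choosing by size alone is not rigorous, since it never looks at what the discarded component actually is.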
- version 0.45
- Added a check for disconnected molecules. If such a molecule is found, we skip all descriptor calculations for it
- Any exceptions caught during descriptor processing are reported at the end in a dialog, and its contents can be saved
- version 0.44
- Updated to the latest CDK SVN
- version 0.43
- Updated to the latest CDK SVN
- version 0.42
- Updated to the latest CDK SVN
- version 0.41
- Updated to the latest CDK
- In case a descriptor does not have a class entry in the OWL dictionary, the descriptor is not included in the UI
- Updated build script
- version 0.40
- Better and more informative error handling
- Synchronized with the latest CDK CVS
- version 0.30
- Input can now be in SDF format or SMI format. Automatically detected
- Annotated SDF output is now enabled
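How the tool detects the input format is not described here. One plausible approach, shown purely as an assumption, is to look for markers that only an SD file contains (the `M  END` line closing a molfile block, or the `$$$$` record delimiter) and otherwise treat the input as SMILES. `FormatGuessSketch` and `guess` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CDKDescUI.
class FormatGuessSketch {
    enum Format { SDF, SMI }

    // Assumption: the caller passes the first lines of the input file.
    static Format guess(List<String> lines) {
        for (String line : lines) {
            // Either marker appears in SD files but not in a SMILES file.
            if (line.startsWith("$$$$") || line.trim().equals("M  END")) {
                return Format.SDF;
            }
        }
        return Format.SMI;
    }

    public static void main(String[] args) {
        System.out.println(guess(Arrays.asList("CCO ethanol", "c1ccccc1 benzene")));
        System.out.println(guess(Arrays.asList("ethanol", "", "", "M  END", "$$$$")));
    }
}
```

A content-based check like this avoids relying on the file extension, which can be wrong or missing.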
- version 0.27
- Updated to use the IDescriptor interface
- version 0.26
- Minor bugfix in specification parsing. Also the dependency free jar is updated to use the latest descriptors from the CDK QSAR package
- version 0.25
- Updated to use the new BO based descriptor specification format
- version 0.24
- Updated to show tooltips for each descriptor entry. The text of the tooltip is the descriptor definition provided in the CDK descriptor algorithm dictionary
- version 0.23
- Updated to make use of the new OWL based descriptor dictionary. To use the small jar or compile from the sources, you should use a recently synced copy of the CDK sources.
- version 0.22
- Added the ability to interrupt descriptor calculation
- version 0.21
- Added an informative About dialog. Also include the license (GPL v2) text.
- version 0.20
- Added CSV and tab delimited output formats
- Platform independent line separator for the CSV, tab and space delimited output formats
- The descriptor data is automatically saved
Building the sources
The project was written using JDK 1.5. You'll also probably need to change the paths in cdkdescui.properties. After that, the available ant targets are:
- clean - self-explanatory
- jar - creates a jar file containing the application. Requires the CDK libraries and dependencies to be in the CLASSPATH
- bigjar - creates a standalone jar file with all dependencies included
Stuff to do
- Get the preferences to work
- Add icons to the descriptors indicating whether they are 3D or 2D
- Add timeout settings
- Implement the plugin mechanism
- Get CML output working
- Add descriptor reduction
- Report descriptor calculation errors
- Descriptor descriptions (help file)
- In the case of SMI formatted molecules, add the ability to evaluate only topological descriptors if 2D coords are available, or all descriptors if 3D coords can be generated