CDK Descriptor Calculator GUI (v 1.4.8)
Features
- Automatically detects descriptor classes defined in the CDK QSAR descriptor dictionary
- Groups of descriptors and individual descriptors can be selected for evaluation
- Input can be SDF or SMI formats
- Output can be a variety of delimited text formats, annotated SDF or ARFF
- Can evaluate fingerprints (hashed, MACCS, EState)
- See here for a summary overview of the available descriptors and their names.
Download
- CDKDescUI.jar (1.4.8) - with all dependencies included
- Sources
Updates in the new version
- version 1.4.8
- Updated to CDK 2.0
- Allow for compounds with >2 components
- version 1.4.6
- Updated to CDK 1.5.10
- Fixed bug in file format detection
- Condensed output in batch mode
- version 1.4.5
- Updated to latest CDK master
- version 1.3.9
- Updated to latest CDK master (as of Nov 28, 2014)
- version 1.3.8
- Updated to latest CDK master
- version 1.3.7
- Updated to latest CDK master
- Updated logging to use JDK logging framework
- version 1.3.4
- Fixed a bug in the build that missed a dependency on the fragmenter module
- version 1.3.3
- Descriptor selection files will be generated with a ".xml" suffix if not specified
- The default descriptor selection on startup is all available descriptors
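The ".xml" suffix rule from 1.3.3 could be implemented as a small helper like the one below. This is a hypothetical sketch, not code from the CDKDescUI sources. The class name `SelectionFileNames` is invented, and it assumes "not specified" means the chosen file name has no extension at all.

```java
import java.io.File;

// Hypothetical helper (not from the CDKDescUI sources): append ".xml" when the
// chosen selection file name has no extension of its own.
class SelectionFileNames {
    static File withXmlSuffix(File chosen) {
        String name = chosen.getName();
        // A dot at position 0 marks a hidden file, not an extension.
        if (name.lastIndexOf('.') > 0) {
            return chosen;                       // user supplied a suffix; keep it
        }
        return new File(chosen.getParentFile(), name + ".xml");
    }
}
```

Under this reading, `mysel` becomes `mysel.xml` while `mysel.txt` is left untouched. Forcing `.xml` whenever the suffix differs would be an equally plausible design.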
- Synced with the latest CDK master
- version 1.3.2
- Added a command line option to specify whether explicit H's should be added
before descriptor or fingerprint calculation in batch mode
- version 1.3.1
- Fixed the main driver so that we only initialize the GUI if we are not in
batch mode. Thanks to Theo Walker for pointing this out.
- version 1.3.0
- Updated to latest stable CDK release (1.3.x) and includes a new volume descriptor and hybridization fingerprinter
- version 1.1.1
- Updated so that if calculation is cancelled, all data calculated to that point is still output
- Synced with latest CDK master
- version 1.0.5
- The command line batch mode now supports descriptor selections, rather than all or
certain types of descriptors. To make a selection file, start in GUI mode, select
individual descriptors and then save the selected descriptors from the menu option. The
selection file should then be specified using the -s option. Note that specifying
a selection file overrides the -t option. Also updated to latest CDK master
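The precedence rule from 1.0.5, where a selection file passed with `-s` overrides `-t`, can be sketched as follows. Only the `-s` and `-t` flags come from this changelog. The class name, the "-flag value" pairing and the fallback to all descriptors when neither flag is given are assumptions for illustration, not the actual batch-mode parser.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of option precedence: -s (selection file) wins over
// -t (descriptor type). Not the real CDKDescUI argument parser.
class BatchOptions {
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        // Assumes "-flag value" pairs; a trailing flag without a value is ignored.
        for (int i = 0; i + 1 < args.length; i += 2) {
            opts.put(args[i], args[i + 1]);
        }
        return opts;
    }

    /** Describes which descriptors would be evaluated for the given arguments. */
    static String descriptorSource(String[] args) {
        Map<String, String> opts = parse(args);
        if (opts.containsKey("-s")) {
            return "selection:" + opts.get("-s");   // -t is ignored when -s is present
        }
        if (opts.containsKey("-t")) {
            return "type:" + opts.get("-t");
        }
        return "all";                                // assumed default
    }
}
```

With both `-t sometype` and `-s mysel.xml` on the command line, this sketch reports `selection:mysel.xml`.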
- version 1.0.3
- The command line batch mode now supports all the fingerprint options of the GUI
- version 1.0.2
- Updated to the latest CDK and include a missing dependency that caused previous versions to crash
- version 1.0.1
- version 0.99
- All "Browse" buttons let the OS X user specify the file name
- version 0.98
- Synced with the latest CDK master
- Updated "Save Selection" file chooser dialog to allow the user to specify the file name by hand
- version 0.94
- Synced with the latest CDK 1.2.x
- version 0.93
- Synced to latest CDK
- Added an option to add explicit H's before calculation. By default this is set to TRUE
- version 0.92
- Synced to latest CDK
- Fixes some atom typing issues with N-oxides
- version 0.91
- BCUT performance is back to fast, but charge-weighted BCUTs will be inaccurate for pi systems
- MACCS and EState fingerprints are significantly faster
- version 0.90
- Updated to latest CDK
- Correctly handles 3D molecules that are flat
- ARFF output format is supported
- Can now evaluate various fingerprints
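ARFF is the plain-text format read by Weka: a `@relation` line, one `@attribute` declaration per column, then comma-separated rows after `@data`, with `?` marking a missing value. The sketch below writes a descriptor matrix in that shape. It is a hypothetical illustration rather than the CDKDescUI writer, and it assumes every descriptor is numeric and that attribute names contain no spaces or quotes.

```java
import java.util.List;

// Minimal ARFF writer sketch: every attribute is declared numeric and a null
// value is written as ARFF's missing-value marker '?'. Attribute names are
// assumed to contain no spaces or quotes, so they are not escaped.
class ArffSketch {
    static String toArff(String relation, List<String> names, List<Double[]> rows) {
        String nl = System.lineSeparator();
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append(nl).append(nl);
        for (String name : names) {
            sb.append("@attribute ").append(name).append(" numeric").append(nl);
        }
        sb.append(nl).append("@data").append(nl);
        for (Double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(row[i] == null ? "?" : row[i].toString());
            }
            sb.append(nl);
        }
        return sb.toString();
    }
}
```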
- version 0.85
- Updated to latest CDK. As a result, the GUI is much more robust with respect to
descriptor calculation errors. If a molecule gives an error with a certain
descriptor, its values for that descriptor are reported as NA. The resulting descriptor matrix
is therefore always rectangular and never ragged
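The NA policy from 0.85 can be sketched as follows. Molecules are placeholder strings here, and descriptors are modelled as hypothetical `ToDoubleFunction<String>` values keyed by name; a real run would use CDK descriptor and molecule objects. A descriptor that throws contributes `NA` for that molecule instead of losing the column, so every row has the same width.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Sketch of the NA policy: a descriptor that throws yields "NA" for that cell,
// so every row has exactly one entry per descriptor.
class RectangularTable {
    static List<List<String>> evaluate(List<String> molecules,
                                       Map<String, ToDoubleFunction<String>> descriptors) {
        List<List<String>> rows = new ArrayList<>();
        for (String mol : molecules) {
            List<String> row = new ArrayList<>();
            for (ToDoubleFunction<String> descriptor : descriptors.values()) {
                try {
                    row.add(Double.toString(descriptor.applyAsDouble(mol)));
                } catch (RuntimeException e) {
                    row.add("NA");   // keep the column; mark the value as not calculable
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Passing a `LinkedHashMap` keeps the column order stable, which matters when the rows are later written under a header line.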
- version 0.84
- It is now possible to save the currently selected set of descriptors to an XML file.
This is handy if you plan to repeatedly evaluate a subset of all the descriptors, and
avoids having to select specific descriptors each time. It is also possible to
load a previously saved selection
- version 0.83
- Descriptor exception dialog now includes the name of the problematic descriptor
- The descriptor columns in the output file are now listed in alphabetic order
- Error messages cleaned up and exception dialog has a slightly improved UI
- The output file now uses NA's consistently to indicate a descriptor value that could not be calculated
- version 0.80
- Fixed a bug arising from some missing classes
- version 0.79
- Updated to latest CDK trunk
- Fixes a problem with CPSA values being NaN with charged molecules
- version 0.78
- Fixed a bug which caused molecules read from SD files to have no
aromaticity. Also fixed a bug which caused molecules with 2
disconnected components to be skipped. Now, we process the larger of
the two components.
- version 0.77
- version 0.76
- Updated to the latest CDK (BCUT calculations are significantly faster)
- version 0.75
- Updated to the latest CDK
- version 0.74
- Updated to latest CDK
- Atom typing errors are caught and reported.
- version 0.73
- SMILES format detection is a little more robust
- AlogP is now an included descriptor
- version 0.72
- Synced with the latest CDK
- Added aromaticity checking, so that we know it's always done
- version 0.71
- Synced with the latest CDK
- version 0.70
- Thanks to Tobias Kind for extensive bug reports
- Correctly process SMILES files with or without names for the molecules
- Autoselecting descriptors based on file format is a little more robust
- Updated status messages on completion and cancellation
- Updated to the latest CDK
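Reading SMILES files "with or without names" amounts to splitting each line into a SMILES string and an optional name. The sketch below assumes the two fields are separated by whitespace and that unnamed molecules receive a generated name. The `SmiLine` class and the `mol<N>` naming scheme are hypothetical, not what the tool actually emits.

```java
// Sketch: split one .smi line into a SMILES string and an optional name.
// Everything after the first run of whitespace is kept as the name, so
// names containing spaces survive. Unnamed lines get a hypothetical "mol<N>".
class SmiLine {
    final String smiles;
    final String name;

    SmiLine(String smiles, String name) {
        this.smiles = smiles;
        this.name = name;
    }

    static SmiLine parse(String line, int index) {
        String[] parts = line.trim().split("\\s+", 2);
        String name = parts.length > 1 ? parts[1].trim() : "mol" + index;
        return new SmiLine(parts[0], name);
    }
}
```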
- version 0.60
- Automatically deselects descriptor classes that cannot be calculated for
a SMILES file. Basically only the topological descriptors remain selected
- version 0.52
- Handles SMILES files properly
- Updated to the latest CDK
- Fixed the counter to report the correct number of SMILES processed
- version 0.51
- Updated to the latest CDK
- Perform aromaticity perception before evaluating descriptors, so we
don't have to worry whether the descriptor will do it
- version 0.50
- Updated to the latest CDK
- version 0.49
- Speed improvement since we no longer count the number
of molecules before starting calculations. Much better for
very large SD files
- Progress bar is now indeterminate and the count of molecules
processed is shown in a status label
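The 0.49 change, processing molecules as they are read instead of counting them first, can be sketched as a single pass over an SD file that hands each record on as soon as its `$$$$` terminator appears. The class and callback are hypothetical; a real implementation would parse each record into a molecule (for example with a CDK reader) rather than pass raw text along.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Single-pass SD file sketch: each record goes to the handler as soon as its
// "$$$$" terminator is read, so the file is never pre-scanned to count
// molecules. Text after the last "$$$$" (an unterminated record) is ignored.
class SdfStream {
    static int forEachRecord(BufferedReader in, Consumer<String> handler) {
        StringBuilder record = new StringBuilder();
        int processed = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().equals("$$$$")) {
                    handler.accept(record.toString());
                    processed++;          // a status label would show this running count
                    record.setLength(0);
                } else {
                    record.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return processed;
    }
}
```

Because the total is unknown until the end of the file, a single pass cannot drive a determinate progress bar; the running count pairs naturally with an indeterminate bar and a status label.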
- version 0.48
- Provides more intuitive descriptor names in the tree view
- Updated to the latest CDK
- version 0.47
- Updated to the latest CDK
- version 0.46
- If a molecule has 2 disconnected components, the larger one is
used for calculations. This is based on the assumption
that we are working with a salt. However it is not
very rigorous as it does not look at the identity of
the components. In addition, it still skips molecules
with more than 2 components
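The fragment rule from 0.46 can be sketched as a search for connected components that keeps the biggest one; the sketch handles any number of components. It assumes the molecule is given as a hypothetical adjacency list of atom indices and that "larger" means more atoms. Like the original heuristic, it does not consider what the fragments are.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of "keep the largest fragment" salt handling on an adjacency list
// (atom index -> bonded atom indices). Fragment identity is ignored.
class LargestFragment {
    static List<Integer> largestComponent(List<List<Integer>> adj) {
        int n = adj.size();
        boolean[] seen = new boolean[n];
        List<Integer> best = new ArrayList<>();
        for (int start = 0; start < n; start++) {
            if (seen[start]) continue;
            // Breadth-first search collects one connected component.
            List<Integer> comp = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            seen[start] = true;
            while (!queue.isEmpty()) {
                int atom = queue.poll();
                comp.add(atom);
                for (int nb : adj.get(atom)) {
                    if (!seen[nb]) {
                        seen[nb] = true;
                        queue.add(nb);
                    }
                }
            }
            if (comp.size() > best.size()) best = comp;  // ties keep the first fragment found
        }
        return best;
    }
}
```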
- version 0.45
- Added a check for disconnected molecules. If such a molecule is found, we skip
all descriptor calculations for it
- Any exceptions caught during descriptor processing are reported at the end
in a nice dialog, contents can be saved
- version 0.44
- Updated to the latest CDK SVN
- version 0.43
- Updated to the latest CDK SVN
- version 0.42
- Updated to the latest CDK SVN
- version 0.41
- Updated to the latest CDK
- If a descriptor does not have a class entry in the OWL dictionary, the descriptor
is not included in the UI
- Updated build script
- version 0.40
- Better and more informative error handling
- Synchronized with the latest CDK CVS
- version 0.30
- Input can now be in SDF format or SMI format. Automatically detected
- Annotated SDF output is now enabled
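Automatic SDF/SMI detection can be approximated by sniffing the first few lines for molfile markers. The heuristic below is a hypothetical sketch, not the detection logic CDKDescUI uses: it reports SDF if a line contains a `V2000` or `V3000` counts-line tag or starts with `M  END` or `$$$$`, and otherwise assumes SMILES.

```java
import java.util.List;

// Hypothetical content-based format sniffing over the first few lines of a file.
class FormatSniffer {
    enum Format { SDF, SMI }

    static Format guess(List<String> firstLines) {
        for (String line : firstLines) {
            if (line.contains("V2000") || line.contains("V3000")
                    || line.startsWith("M  END") || line.startsWith("$$$$")) {
                return Format.SDF;   // a molfile marker was found
            }
        }
        return Format.SMI;           // no marker: assume one SMILES per line
    }
}
```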
- version 0.27
- Updated to use the IDescriptor interface
- version 0.26
- Minor bugfix in specification parsing. Also the dependency free jar is updated to use the
latest descriptors from the CDK QSAR package
- version 0.25
- Updated to use the new BO based descriptor specification format
- version 0.24
- Updated to show tooltips for each descriptor entry. The text of the
tooltip is the descriptor definition provided in the CDK descriptor algorithm
dictionary
- version 0.23
- Updated to make use of the new OWL based descriptor dictionary. To use the small jar
or compile from the sources, you should use a recently synced copy of the CDK sources.
- version 0.22
- Added the ability to interrupt descriptor calculation
- version 0.21
- Added an informative About dialog. Also include
the license (GPL v2) text.
- version 0.20
- Added CSV and tab delimited output formats
- Platform independent line separator for the CSV, tab and space delimited output formats
- The descriptor data is automatically saved
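A platform-independent line separator can be sketched with `System.lineSeparator()`, which returns `\n` on Unix-like systems and `\r\n` on Windows (on the JDK 1.5 the project targets, `System.getProperty("line.separator")` plays the same role). The `DelimitedWriter` class is hypothetical, and it assumes no field contains the delimiter, so no CSV quoting is done.

```java
import java.util.List;

// Sketch of delimited output: the same routine serves CSV (","), tab ("\t")
// and space (" ") formats, ending each row with the platform line separator.
class DelimitedWriter {
    static String write(List<List<String>> rows, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join(delimiter, row));
            sb.append(System.lineSeparator());   // "\n" on Unix, "\r\n" on Windows
        }
        return sb.toString();
    }
}
```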
Building the sources
The project was written using JDK 1.5. You'll probably also need to change the paths in
cdkdescui.properties. After that, the available ant targets are:
- clean - self-explanatory
- jar - creates a jar file containing the application. Requires the CDK libraries and dependencies to
be in the CLASSPATH
- bigjar - creates a standalone jar file with all dependencies included
Stuff to do
- Get the preferences to work
- Add icons to the descriptors indicating whether they are 3D or 2D
- Add timeout settings
- Implement the plugin mechanism
- Get CML output working
- Add descriptor reduction
- Report descriptor calculation errors
- Descriptor descriptions (help file)
- In the case of SMI formatted molecules, add the ability to evaluate
only topological descriptors if 2D coords are available, or all
descriptors if 3D coords can be generated.