Sunday, October 23, 2005

BioJava: ExternalProcess class

The BioWeka project contributed a set of Java classes to the BioJava project for executing external programs from within Java programs.

The ExternalProcess class is intended for applications that call an external program many times (e.g. in a loop with varying command-line arguments) and that need high throughput. To avoid disk I/O, the program's input and output are not written to files but are written to and read from its standard input, output and error streams directly. To this end, the class handles input and output in multiple threads, and these threads are managed by a thread pool.
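The design described above can be sketched with nothing but the Java standard library. This is not the BioJava ExternalProcess API; the class name StreamedProcessSketch, its Result type and the run method are made up for illustration. The sketch assumes a pool with at least three free threads per running process, so that stdin, stdout and stderr are all serviced at the same time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the BioJava class: runs an external program and
// exchanges all data through its standard streams, never through files.
final class StreamedProcessSketch {

    /** Exit code plus everything the program wrote to stdout and stderr. */
    static final class Result {
        final int exitCode;
        final String stdout;
        final String stderr;

        Result(int exitCode, String stdout, String stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    private final ExecutorService pool;

    /** The pool is shared across calls, so threads are reused in a loop. */
    StreamedProcessSketch(ExecutorService pool) {
        this.pool = pool;
    }

    Result run(List<String> command, String input) throws Exception {
        Process process = new ProcessBuilder(command).start();

        // Feed stdin in its own thread so a full pipe cannot block the caller.
        Runnable feedInput = () -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                // The program may exit before it has read all of its input.
            }
        };
        // Drain stdout and stderr concurrently; otherwise the program could
        // stall as soon as one of the two pipes fills up.
        Callable<String> readOut = () -> readAll(process.getInputStream());
        Callable<String> readErr = () -> readAll(process.getErrorStream());

        Future<?> writer = pool.submit(feedInput);
        Future<String> out = pool.submit(readOut);
        Future<String> err = pool.submit(readErr);

        int exitCode = process.waitFor();
        writer.get();
        return new Result(exitCode, out.get(), err.get());
    }

    private static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // Uses the running JVM's own launcher as a program that is
            // certain to exist on this machine.
            String java = System.getProperty("java.home") + "/bin/java";
            Result result = new StreamedProcessSketch(pool).run(List.of(java, "-version"), "");
            System.out.println("exit code: " + result.exitCode);
            System.out.println(result.stderr);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the pool is passed in rather than created per call, a loop that starts the same program thousands of times reuses the same three threads instead of creating new ones for every invocation.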

ProTag web site

ProTag now has its own web site: http://protag.sourceforge.net. There is also a project site, http://sourceforge.net/projects/protag, which hosts the ProTag downloads and user forums and will offer access to the source code within a few days.

Wednesday, October 19, 2005

Bioinformatics Wiki

Some fellow students started a Wiki about Bioinformatics: Bioinfo Wiki.

RNA Secondary Structure Sequence Specification

Philipp Seibel from the University of Würzburg contributed specifications for RNA secondary structure sequences and alignments to the open BioTypes 1.1 schema.

Monday, October 17, 2005

BioWeka 0.4

The new release 0.4 of BioWeka combines the packages BioWeka, XML-Stylesheets, FoldRec and BioJava 1.4 Extensions in a single distribution: bioweka-0.4.zip. This ZIP file contains:


  • the bioweka-0.4.jar library, its source code and documentation
  • the updated GenericPropertiesCreator.props file
  • the required JAR libraries for BioJava, Apache Commons (only those which are required by BioJava) and JAligner
  • the FoldRec and BioJava 1.4 Extensions libraries (these BioWeka packages are not separately available any longer)
  • the most recent AAindex database
  • the substitution matrices from the NCBI for the usage with the aligners GlobalAligner and LocalAligner (alias JAligner)
  • a patch for the converter.pl perl script of the InterProScan standalone application
  • the latest release (0.4) of XML-stylesheets including the XSLT stylesheet to load MAGE-ML files directly into Weka (this BioWeka package is not separately available any longer)
  • the align_learn.pl perl script for converting sequence alignments in the FASTA format to the ARFF format

Note: Weka is not included and must be downloaded separately.

In addition, there is some new functionality:

  • The AlignmentScorer filter performs an all-against-all alignment using a specified Aligner and stores the scores within the data set.
  • Such a data set can be used in conjunction with the new ScoreClassifier class to classify sequences based on the precomputed alignment scores. This replaces the SimpleAlignmentClassifier and reduces the time needed to cross-validate an alignment-based classification.
  • The Save filter exports ARFF files into another format (e.g. FASTA) from within the Explorer GUI.
  • Some minor bug fixes ...
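Why precomputed scores speed up cross-validation can be illustrated with a small sketch. It does not use the BioWeka or Weka APIs: the class PrecomputedScoreSketch, the toy scoring function (a stand-in for a real aligner) and the nearest-neighbour rule are all assumptions made for illustration, and the actual ScoreClassifier may work differently. The point is only that the all-against-all matrix is computed once and every cross-validation fold then reduces to table lookups.

```java
// Hypothetical sketch: all-against-all scores are computed once, then
// leave-one-out cross-validation only reads the score matrix.
final class PrecomputedScoreSketch {

    /** Toy stand-in for an aligner: counts identical characters at equal positions. */
    static int toyScore(String a, String b) {
        int length = Math.min(a.length(), b.length());
        int score = 0;
        for (int i = 0; i < length; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                score++;
            }
        }
        return score;
    }

    /** Scores every sequence against every other one; the matrix is symmetric. */
    static int[][] allAgainstAll(String[] sequences) {
        int n = sequences.length;
        int[][] scores = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                int score = toyScore(sequences[i], sequences[j]);
                scores[i][j] = score;
                scores[j][i] = score;
            }
        }
        return scores;
    }

    /**
     * Leave-one-out cross-validation with a nearest-neighbour rule: each
     * sequence receives the label of the best-scoring other sequence.
     * No alignment is recomputed here.
     */
    static double leaveOneOutAccuracy(int[][] scores, String[] labels) {
        int correct = 0;
        for (int query = 0; query < labels.length; query++) {
            int best = -1;
            for (int other = 0; other < labels.length; other++) {
                if (other == query) {
                    continue; // the held-out sequence may not vote for itself
                }
                if (best < 0 || scores[query][other] > scores[query][best]) {
                    best = other;
                }
            }
            if (best >= 0 && labels[best].equals(labels[query])) {
                correct++;
            }
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        String[] sequences = {"AAAA", "AAAT", "GGGG", "GGGC"};
        String[] labels = {"x", "x", "y", "y"};
        int[][] scores = allAgainstAll(sequences);
        System.out.println(leaveOneOutAccuracy(scores, labels));
    }
}
```

With n sequences the expensive scoring step runs n(n+1)/2 times in total, regardless of how many folds or repetitions the evaluation uses.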

Tuesday, October 11, 2005

AAindexStreamReader for the BioJava project

Originally developed for the BioWeka project, the AAindexStreamReader class (among other classes) has become part of the BioJava project; it is currently available only through CVS. The purpose of this class is to load Amino Acid Index Database files (AAindex1 files) into a set of SymbolPropertyTable objects. The AAindex database defines over 500 different property tables for the twenty amino acids. Using these property tables, one can, for example, calculate the average hydrophobicity of an amino acid sequence, i.e. of a protein.
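The average-hydrophobicity calculation mentioned above can be sketched as follows. The example deliberately avoids the BioJava API: a plain Map stands in for a SymbolPropertyTable, and the class and method names are hypothetical. The values are the Kyte-Doolittle hydropathy scale, which to my knowledge is stored in AAindex as entry KYTJ820101.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map plays the role of one AAindex property table.
final class HydrophobicitySketch {

    /** Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code. */
    static final Map<Character, Double> KYTE_DOOLITTLE = new HashMap<>();

    static {
        String residues = "ARNDCQEGHILKMFPSTWYV";
        double[] values = {
            1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
            3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2
        };
        for (int i = 0; i < residues.length(); i++) {
            KYTE_DOOLITTLE.put(residues.charAt(i), values[i]);
        }
    }

    /** Mean of the per-residue property values over the whole sequence. */
    static double averageHydrophobicity(String sequence) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("Sequence must not be empty");
        }
        double sum = 0.0;
        for (char residue : sequence.toUpperCase().toCharArray()) {
            Double value = KYTE_DOOLITTLE.get(residue);
            if (value == null) {
                throw new IllegalArgumentException("Unknown residue: " + residue);
            }
            sum += value;
        }
        return sum / sequence.length();
    }

    public static void main(String[] args) {
        System.out.println(averageHydrophobicity("GAVL"));
    }
}
```

Any other AAindex property table could be swapped in by filling the map with that table's twenty values.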

Saturday, October 08, 2005

ProTag is "a really cool tool"

Jeff Perkel posted a short article about ProTag on The Scientist, in which he calls ProTag "a really cool tool". Thanks for the compliment!

Tuesday, October 04, 2005

Web Servicing the Biological Office - Application Note

An application note about "Web Servicing the Biological Office" (ProTag, ProThesaurus and LiMB) was published in the proceedings of the ECCB '05 conference in Madrid.

The PowerPoint presentation can be found here.