JBoss ModeShape: A federating JCR repository

Some interesting stuff is happening in the JCR community: Apache Jackrabbit 2.0.0 is out (with JCR 2.0 support), and an interesting project called JBoss ModeShape is almost reaching its final 1.0 release. ModeShape recently came to my attention and it seems like an interesting project. In this post I will give a short introduction to ModeShape and its features.

What's ModeShape?

ModeShape is a Java Content Repository implementation which will support both JSR-170 and JSR-283. It's not trying to be just another isolated content repository, but a repository with a strong focus on content federation. In other words: ModeShape's main goal is to provide a single JCR interface for accessing and searching content coming from different back-end systems. These systems can even be of different kinds. You might think of a ModeShape repository containing information from a relational database, a file system and perhaps even another Java content repository, such as Hippo CMS 7's content repository. You can configure these sources of information with the help of ModeShape's connector framework.

Connectors

One of ModeShape's key concepts is the connector. A connector allows you to connect to a certain type of back-end system and transparently expose its information inside the ModeShape repository. The current 1.0.0 beta release already ships with a couple of out-of-the-box connectors:


  • In-Memory Connector
  • File System Connector
  • JPA Connector
  • Federation Connector
  • Subversion Connector
  • JBoss Cache Connector
  • Infinispan Connector
  • JDBC Metadata Connector 

That's already quite a few, but for the upcoming release they also plan to expand the set of connectors with, for instance, a JCR connector. I find that one quite interesting myself, because it would allow you to expose other JCR implementations like Hippo CMS 7 (Apache Jackrabbit) in combination with other systems through one JCR interface.

There are many other content solutions out there, so if you can't find a connector that suits your needs, you can of course write one yourself and perhaps donate it to the ModeShape project.

Sequencers

Another of ModeShape's interesting features is the concept of sequencers. With sequencers you can extract additional information from a certain item inside the repository and store that extracted information in the repository as well. ModeShape has quite a few sequencers out of the box:


  • Compact Node Type (CND) Sequencer
  • XML Document Sequencer
  • ZIP File Sequencer
  • Microsoft Office Document Sequencer
  • Java Source File Sequencer
  • Java Class File Sequencer
  • Image Sequencer
  • MP3 Sequencer
  • DDL File Sequencer
  • Text Sequencers

The example below configures the image sequencer, which can gather information from certain types of images stored inside the repository. The ImageMetadataSequencer class is used here to extract metadata such as size and dimensions from images that have one of the specified extensions, and the extracted information is stored elsewhere in the repository (under /images in this example).

JcrConfiguration config = ...
config.sequencer("Image Sequencer")
.usingClass("org.modeshape.sequencer.image.ImageMetadataSequencer")
.loadedFromClasspath()
.setDescription("Sequences image files to extract the characteristics of the image")
.sequencingFrom("//(*.(jpg|jpeg|gif|bmp|psd)[*])/jcr:content[@jcr:data]")
.andOutputtingTo("/images/$1");
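
To make the '$1' in the output path a bit more concrete, here is a rough analogy in plain Java. This is not ModeShape's own path-expression matcher: the regular expression is a hand-written approximation of the input expression above (it ignores the same-name-sibling index and the jcr:data predicate), and the names SequencerPathSketch and outputPathFor are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SequencerPathSketch {

    // Assumption: a regex that roughly mirrors the sequencer's input path
    // expression. Group 1 is the node whose name ends in an image extension.
    private static final Pattern INPUT = Pattern.compile(
            "^(?:/[^/]+)*/([^/]+\\.(jpg|jpeg|gif|bmp|psd))/jcr:content$");

    /** Returns the output path for a matching node path, or null if the path does not match. */
    static String outputPathFor(String nodePath) {
        Matcher m = INPUT.matcher(nodePath);
        // Group 1 plays the role of $1 in the output expression "/images/$1".
        return m.matches() ? "/images/" + m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(outputPathFor("/files/photos/holiday.jpg/jcr:content"));
        System.out.println(outputPathFor("/files/docs/readme.txt/jcr:content"));
    }
}
```

The point is only that the first parenthesized group of the input expression (the image node) is reused as the last segment of the output location.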

Conclusion

With other mature JCR implementations out there, I think ModeShape's strongest point is its focus on content federation. Providing a single JCR interface for content stored in different systems is a great initiative, because the JCR API is quite easy to learn and to use. I see a bright future for ModeShape, since companies are sharing more and more in-house information on the web these days. I will try to keep a close eye on ModeShape and see how it evolves.

Creating an IntelliJ launcher on Ubuntu 9.10

Over the last couple of months I've slowly switched from Eclipse to IntelliJ 9 as my main IDE for Java development. After having used Eclipse for more than 5 years, I got pointed to IntelliJ by friends from JTEAM, with whom I'm working on one of my projects. They challenged me to start using IntelliJ, because I would eventually be impressed and would never want to switch back.

As I'm working on my Dell/Linux laptop, I used to start IntelliJ from the command line as instructed in the readme. Starting it from the command line started bugging me after a while, so I wanted to create a launcher for it. Creating a launcher seemed quite simple at first, but getting it to work was something else.

After a while I figured out that by using the following line in my application launcher:


/bin/sh -c "export JDK_HOME=/path/to/java&&/path/to/intellij/bin/idea.sh"


I was able to get my IntelliJ launcher to work. Note that you will still have to change the JDK_HOME path and the IntelliJ installation directory, because they will probably be different on your own system.
I hope this post helps all of you out there trying to do the same thing.

Content management and the semantic web

I came across the term 'semantic web' a couple of years ago, when one of the original creators of Apache Cocoon went off to work on the SIMILE Project at MIT. I didn't pay much attention to the concept of the 'semantic web' back then, because I had just started learning Apache Cocoon and still had a lot to learn.
But over the last couple of months I've been doing some research on the currently available standards for providing semantic data on the web with a strong focus on RDFa.

Content management

Working at Hippo, a CMS vendor based in the Netherlands & USA, makes me think in terms of content and publishing strategies. Publishing information to the web is one of our core businesses, but I've learned over the last couple of months that we can enrich our publishing platform even more by providing semantic data. I started my journey by looking around to see whether other CMS vendors are paying attention to semantic web standards. I noticed that only a few of the large number of content management vendors actually put effort into providing semantic web functionality for their end users. I think that's a shame, because semantic data can enrich your pages a lot.
This post should give you some insight into how you could create a website with embedded metadata (with Hippo), but let's first start with some basics.


What's the idea behind the semantic web?

The current web is very well suited for being read by people like you and me. Computers, however, can only analyze the words on a page; they cannot see the semantics of a piece of information on that page the way we as people do.
If you make the information on your page machine-readable, a computer is able to analyze your page and extract much more information from it than when it is just a piece of text. That's where semantic web standards can help out.
Standards for providing semantic data on the web are not new and some of them have been available for quite some time. Probably the two best known are RDF and Microformats. Recently, however, RDFa has been getting a lot of attention from Google, Yahoo and now also the UK government.


What is RDFa?


RDFa is short for “Resource Description Framework in attributes”. This sounds a bit descriptive, but it means that RDFa provides a set of XHTML attributes, which in their turn provide a way of translating visual data on a page into machine-readable hints. So let's take a look at an example of how a simple web page is currently structured.



<html>
  <body>
    <h1>Content management and the semantic web</h1>
    <h2>Jeroen Reijn</h2>
    <p>some information</p>
  </body>
</html> 

As you can see in the above XHTML fragment, we have a page with a title, a subtitle and a small snippet of text inside the body of the page. By rendering this HTML fragment in the browser the visitor of this page will recognize this piece of text as being the title and author of the current article on the page. A machine however would need a bit more information to be sure the content can be identified as a title and author. That's where RDFa can help out. By using vocabularies, you can give meaning to specific pieces of content on a page.
Let's see what the above XHTML fragment would look like if we would use RDFa.


<html>
  <body xmlns:dc="http://purl.org/dc/elements/1.1/"> 
    <h1 property="dc:title">Content management and the semantic web</h1>
    <h2 property="dc:creator">Jeroen Reijn</h2>
    <p>some information</p>
  </body>
</html>

As shown in the example, the Dublin Core vocabulary is declared on the page first. This is required to be able to use the properties of the vocabulary later on. Once the vocabulary is in place, we can give meaning to fragments on the page. In the HTML fragment above the h1 is marked as the Dublin Core title and the h2 as the Dublin Core creator. With these properties in place a machine, like a search engine crawler, can now also store this as additional metadata of the page.
One of the main advantages of RDFa is that your content can be processed in a more efficient way, which in turn may help your page rank higher than it otherwise would.
Big search engines like Google and Yahoo already scan your website for RDFa embedded information, so why not use it?
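
To give an idea of what 'machine-readable' means here, the sketch below reads the property attributes from the XHTML fragment above using only the XML parser that ships with the JDK. It is a minimal illustration and not a real RDFa processor: it does not resolve the dc prefix against its namespace, it assumes well-formed XHTML, and the class and method names (RdfaPropertyReader, readProperties) are made up for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RdfaPropertyReader {

    /** Collects "property attribute -> text content" pairs from a well-formed XHTML string. */
    static Map<String, String> readProperties(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            Map<String, String> result = new LinkedHashMap<>();
            // Walk every element and keep the ones that carry a property attribute.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                if (el.hasAttribute("property")) {
                    result.put(el.getAttribute("property"), el.getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<h1 property=\"dc:title\">Content management and the semantic web</h1>"
                + "<h2 property=\"dc:creator\">Jeroen Reijn</h2>"
                + "<p>some information</p></body></html>";
        readProperties(page).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Without the property attributes the same code would find nothing, which is exactly the difference between the two fragments.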

How to use RDFa in your (Hippo) website?

Hippo CMS is a content (centered) management system. It differs from other CMSs in that the information inside the Hippo CMS content repository is not stored or identified as pages, but rather as content, and in most cases even as reusable content. To be more precise: information stored inside the content repository is stored as JCR nodes and/or properties.
Since the data is just content and not bound to any front-end technology, you can either publish it as XML, (X)HTML with some help from the Hippo Site Toolkit (HST) or any other format you might like.
Now let's take the above HTML fragment as an example and see what this would look like on a content level. One of the most important things to mention here is that a JCR repository has the concept of nodetype definitions, in which you can configure what your data model looks like. You could compare it with an XML Schema or DTD for a piece of XML, but then for the nodes and properties available in a JCR repository.


Let's first start with our content definition or in content management terms the document type. We will need three fields:

  • Title
  • Author
  • Body (rich-text field)

If you create a document type with the Hippo CMS template editor, the resulting nodetype definition will end up looking like this:


<'myproject'='http://www.myproject.org/nt/myproject/1.0'>
<'hippostd'='http://www.onehippo.org/jcr/hippostd/nt/2.0'>
<'hippo'='http://www.onehippo.org/jcr/hippo/nt/2.0'>

[myproject:text] > hippostd:publishable, hippostd:publishableSummary, hippo:document
- myproject:title (string)
- myproject:author (string)
+ myproject:body (hippostd:html)

As you can see, all three fields are available and can be used later on by any client that can read from the Java content repository. To be able to render this type of information as XHTML, we will be using the Hippo Site Toolkit. The Hippo Site Toolkit maps JCR nodes to simple Java beans, which gives you an easier development cycle without having to learn the entire JCR API.

A Java bean representation of the JCR 'myproject:text' nodetype will look like this:

import org.hippoecm.hst.content.beans.Node; 

import org.hippoecm.hst.content.beans.standard.HippoDocument;
import org.hippoecm.hst.content.beans.standard.HippoHtml;


@Node(jcrType="myproject:text")
public class TextBean extends HippoDocument{

    public String getTitle() {
        return getProperty("myproject:title");
    }
    
    public String getAuthor() {
        return getProperty("myproject:author");
    }

    public HippoHtml getBody(){
        return getHippoHtml("myproject:body");
    }

}

As you can see, the Java bean is quite straightforward and easy to read.
Now if we want to render the information on a web page, we can for instance use JSPs with expression language to get the information from the Java bean. The JSP needed for outputting the RDFa-enabled web page can be as simple as this:

<%@ page language="java" %>
<%@ taglib uri="http://www.hippoecm.org/jsp/hst/core" prefix='hst'%>
<html>
  <body xmlns:dc="http://purl.org/dc/elements/1.1/"> 
    <h1 property="dc:title">${document.title}</h1>
    <h2 property="dc:creator">${document.author}</h2>
    <hst:html hippohtml="${document.body}"/>
  </body>
</html>

As you can see, it's that easy to use RDFa inside your website if you have a template-independent CMS like Hippo.

It gets even better

Using RDFa for simple text can already be a great improvement for your website, but support for other RDFa vocabularies is added on a regular basis. Google recently announced support for RDFa-enabled pages with videos (or other media) on them. You can provide extra information about your media files to the Google crawler, like the URL of the thumbnail that belongs to your video, which can be presented when your video shows up as one of the results of a Google search. The possibilities are enormous, so I can see a lot of good things coming from using RDFa in the near future.

I think the role that content management systems can play for RDFa should not be underestimated, since most websites these days are backed by some sort of content management system.

For more information on RDFa see:

Apache Cocoon and Javascript minification

A couple of days ago somebody on the Apache Cocoon user list sent a message about on-the-fly minification of, for instance, Javascript files. This topic has been quite popular over the past years, since web applications have become richer and Javascript files have become larger.

The ideal situation would be to compress your static files (CSS or Javascript) at build time, so that it does not cost you any processing power while your application is running. I quite often use the Maven 2 YUI compressor plugin while building my projects, but in case you can't use this plugin you could think about a different solution. Since I've been using Cocoon for more than 5 years, I thought I'd give it another try and write a nice Cocoon reader that does this minification for you.

There are multiple minification and obfuscation frameworks out there. Some have a greater compression ratio than others, but for me the best-known ones are probably:
  1. Dojo Shrinksafe - Rhino based compressor from the Dojo Toolkit
  2. YUI Compressor - Rhino based compressor by Yahoo
  3. JSMin - a whitespace compressor by Douglas Crockford

Since Apache Cocoon comes with a version of Rhino and both #1 and #2 include their own version of Rhino, using them could lead to nasty conflicts because of two different versions of the library on the same classpath. Therefore I chose to write a reader based on JSMin, which does a lot of whitespace compression for you.

The implementation of this reader was quite simple and if you're interested, you can get the source here. Do keep in mind that you will also need to have the JSMin.java file on the classpath, otherwise it will not work.
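
To give a feeling for what whitespace-style minification does, here is a deliberately naive sketch in plain Java. It is neither JSMin nor the Cocoon reader mentioned above: it only strips comments outside quoted strings, trims every line and drops empty lines. It does not understand regular expression literals or template strings, and the name NaiveJsMinifier is made up for this example.

```java
final class NaiveJsMinifier {

    /** Strips comments (outside quoted strings), trims lines and drops blank lines. */
    static String minify(String source) {
        StringBuilder stripped = new StringBuilder(source.length());
        int n = source.length();
        int i = 0;
        while (i < n) {
            char c = source.charAt(i);
            if (c == '"' || c == '\'') {
                // Copy a quoted string verbatim, honouring backslash escapes.
                int end = i + 1;
                while (end < n && source.charAt(end) != c) {
                    if (source.charAt(end) == '\\') {
                        end++; // skip the escaped character as well
                    }
                    end++;
                }
                end = Math.min(end + 1, n);
                stripped.append(source, i, end);
                i = end;
            } else if (c == '/' && i + 1 < n && source.charAt(i + 1) == '/') {
                // Line comment: skip up to (but not including) the line break.
                while (i < n && source.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && source.charAt(i + 1) == '*') {
                // Block comment: replace it by a space, or by a line break if it spans lines.
                int close = source.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2;
                stripped.append(source.substring(i, end).indexOf('\n') >= 0 ? '\n' : ' ');
                i = end;
            } else {
                stripped.append(c);
                i++;
            }
        }
        StringBuilder out = new StringBuilder();
        for (String line : stripped.toString().split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(trimmed);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(minify("// add two numbers\nfunction add(a, b) {\n    return a + b; /* sum */\n}\n"));
    }
}
```

Line breaks are kept on purpose, so that Javascript's automatic semicolon insertion behaves the same before and after; a real minifier is a lot more careful (and a lot more aggressive) than this.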

Japanese and Java resource bundles

At Hippo we have a project, built with JavaServer Faces, for which I occasionally do some maintenance. A while ago I had an issue in our JIRA bug tracker that reported an error for the Japanese version of the website. The error came from a component that reads information from a resource bundle properties file, which is stored on the local filesystem. In this case it was the Japanese version of the resource bundle (ApplicationResource_jp.properties), which is used by the web application to display some Japanese labels.

The error wasn't very clear since it only gave the following exception:

java.util.MissingResourceException:
Can't find resource for bundle java.util.PropertyResourceBundle, key 'somekey'


Looking in my project, I could clearly see that the resource bundle was there and after a quick peek at the resource bundle file itself, I could see that the requested key was also present.

After trying some different options I came to the conclusion that my web application was unable to read the actual .properties file from the classpath. By searching some more, I found out that the Java compiler and other Java tools can only process files which contain Latin-1 and/or Unicode-encoded (\udddd notation) characters. Since I was seeing Japanese characters when opening the properties file, it was clearly the case that this file did not meet those requirements.

Solving this issue was quite simple in the end, since the Sun JDK comes with a utility to help you out with files that contain characters that are not Latin-1. The utility is called 'native2ascii' and can be run from the command line quite easily by typing:

$ native2ascii [inputfile] [outputfile]

Once I did that, the application was working like a charm again!
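
For the curious, the kind of conversion native2ascii performs can be sketched in a few lines of Java. This is my own minimal approximation, not the tool's actual source: it replaces every character outside the ASCII range with its backslash-u escape, which is the notation that java.util.Properties understands. The class name UnicodeEscaper is made up for this example.

```java
import java.io.StringReader;
import java.util.Properties;

final class UnicodeEscaper {

    /** Replaces every character above U+007F with a four-digit backslash-u escape. */
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7F) {
                // Characters outside the BMP are two chars here, so they
                // become two escapes (a surrogate pair), which Properties accepts.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The value is the Japanese word "konnichiwa", written with escapes
        // so that this source file itself stays plain ASCII.
        String original = "greeting=\u3053\u3093\u306b\u3061\u306f";
        String escaped = escapeNonAscii(original);
        System.out.println(escaped);

        // Round trip: Properties decodes the escapes back into real characters.
        Properties props = new Properties();
        props.load(new StringReader(escaped));
        System.out.println(("greeting=" + props.getProperty("greeting")).equals(original));
    }
}
```

In a real project you would of course just run native2ascii (or let your build tool do it) instead of rolling your own.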

JCR: Sorting on child node properties

A JCR repository, like Apache Jackrabbit (basis for Hippo CMS 7's content repository), mainly consists of nodes and properties.
As described in the JCR specification, a Java Content Repository should support 2 different query syntaxes: XPath and SQL. Once you get the hang of the syntax, performing a search on a JCR repository is quite easy, but today I ran into a situation where I was not able to perform the query I wanted. In this post I'll try to describe what my problem was and how the same result can still be achieved.

The content model


Let's first start with my content model. The actual node definition for my project looks something like this:


[myproject:metadata]
- myproject:creator (string)
- myproject:language (string)
- myproject:publicationDate (date)
- myproject:availableUntil (date)
- myproject:lastModified (date)
- myproject:keywords (string)
- myproject:contributor (string)

[myproject:news] > hippostd:publishable, hippostd:publishableSummary, hippo:document
- myproject:title (string)
+ myproject:introduction (hippostd:html)
+ myproject:body (hippostd:html)
+ myproject:metadata (myproject:metadata)


I ran into a situation where I wanted to search for nodes of type 'myproject:news', but sorted on the 'myproject:publicationDate' property of the 'myproject:metadata' subnode. Writing an XPath query for this is quite easy if you're familiar with the XPath syntax.

Let's start out with a very simple search and just search for nodes of the type 'myproject:news', which in XPath looks like:


//element( *, myproject:news)


Now if we want to order these nodes based on, for instance, the myproject:title property, the same XPath query looks like:


//element( *, myproject:news) order by @myproject:title descending


Now if we want to sort on the 'myproject:publicationDate' property of the myproject:metadata subnode, I would expect the same XPath query to be:


//element( *, myproject:news) order by myproject:metadata/@myproject:publicationDate descending


Unfortunately this query did not actually sort the result on the publicationDate property as I had expected. I first looked for typos, but the syntax of my query turned out to be fine: the child axis in order by clauses was simply not yet supported by Jackrabbit itself.

Then I found this JIRA issue[1] in the Jackrabbit bugtracker describing this problem and there appears to be a patch available. I'm still wondering how much of a performance impact this might have for large repositories, where you might want to sort on a property of a child node 'n'-levels deep underneath the actual node.

Until that fix is available, if you want to sort the nodes of a specific nodetype on certain properties, you will have to add those sortable properties to the nodetype you are searching for; you can't put them on a subnode.
It seems that the patch, which should fix this problem, has already been committed to the Jackrabbit trunk and should be available from Jackrabbit 1.6.0, as marked in the Jackrabbit JIRA.
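
Another option, which I did not use in my project, is to run the query without the order by clause and sort the results on the client side. The sketch below shows the idea with plain Java objects standing in for the JCR nodes; the News and Metadata classes are hypothetical stand-ins for 'myproject:news' and its 'myproject:metadata' subnode, not Jackrabbit or Hippo API.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ChildPropertySort {

    /** Hypothetical stand-in for the myproject:metadata child node. */
    static final class Metadata {
        final LocalDate publicationDate;

        Metadata(LocalDate publicationDate) {
            this.publicationDate = publicationDate;
        }
    }

    /** Hypothetical stand-in for a myproject:news node. */
    static final class News {
        final String title;
        final Metadata metadata;

        News(String title, Metadata metadata) {
            this.title = title;
            this.metadata = metadata;
        }
    }

    /** Returns a copy of the list, sorted descending on the child node's publication date. */
    static List<News> newestFirst(List<News> unsorted) {
        List<News> sorted = new ArrayList<>(unsorted);
        sorted.sort(Comparator.comparing((News n) -> n.metadata.publicationDate).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<News> items = Arrays.asList(
                new News("older item", new Metadata(LocalDate.of(2009, 1, 5))),
                new News("newer item", new Metadata(LocalDate.of(2009, 7, 20))));
        for (News item : newestFirst(items)) {
            System.out.println(item.metadata.publicationDate + " " + item.title);
        }
    }
}
```

Keep in mind that this requires loading the complete result set into memory, so it is only reasonable for small result sets.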

Mozilla Lightning in Ubuntu Jaunty

I just did a fresh install of Ubuntu Jaunty Jackalope (9.04). After reinstalling Thunderbird (my favorite mail client), I was unable to see my Google calendars with Thunderbird's Lightning extension.

The Lightning extension seemed to work, since I saw a calendar, but I was unable to actually add one of my existing calendars.

Google gave me a quick answer, which was located at the Ubuntu forums.

To fix this a couple of simple steps needed to be taken:

First remove the Lightning extension from Thunderbird: Tools->Addons, select Lightning, and uninstall. Close Thunderbird.

Now install libstdc++5:

$ sudo apt-get install libstdc++5

Now open Thunderbird again, and go back to Tools->Addons. Click the 'Install' button, browse to the extension file, lightning-0.9-tb-linux.xpi, and open it. At the Software Installation prompt, click 'Install Now' after the short countdown, and then restart Thunderbird.

Once Thunderbird has restarted, you should be able to add your calendars again.