Showing posts with label hippo. Show all posts

09 February 2012

Get in control with Spring Insight!

Ever wondered what your application was doing? Why that specific page was so slow? I've asked myself this question numerous times and always had to change some log level or attach a profiler to get actual feedback on what was going on inside my application.

The other day, while commuting from home to work, I discovered the Spring Insight project. From what I've seen so far, Spring Insight is a set of inspections (plugins) whose results are displayed visually in a web application. To get an idea of what Spring Insight can do for you, be sure to check out the introduction screencast.

tc Server Developer Edition Screencast


Out of the box, Spring Insight comes with a set of plugins/inspections for different kinds of frameworks/libraries, such as:
  • Spring Web, Spring core
  • JDBC
  • Servlets
  • Hibernate
  • Grails
More plugins are available, and it's quite easy to create your own, which is what the rest of this post is about.

Writing your own Spring Insight plugin

Working with Hippo CMS-driven web applications every day, I had the idea of creating a Spring Insight plugin for the Hippo Site Toolkit (HST for short). The HST consists of a set of components that interact with the Hippo content repository. During a single request multiple components can be called, and each component goes through multiple processing phases. So my initial idea for the Spring Insight plugin was to show:
  1. The amount of time taken for each processing phase of an HST component
  2. The time it takes to perform an HstQuery to the repository
Because the default Spring Insight plugins are open source, I was able to write my first plugin in about 30 minutes. A large part of those 30 minutes was spent learning AspectJ, because I'd never used it before.

Getting started

In this post we will focus on creating an inspection for the execution of HST queries. In the Insight web application I would like to see the details of an HstQuery and the time it took to perform the actual query.
With AspectJ you can pick a join point and inspect, for instance, the execution of that join point. In our case I would like to inspect the HstQuery.execute() method. By defining the pointcut on the HstQuery interface, we make sure that any class implementing HstQuery will be able to show its data within the Insight web application.

Let's first take a look at what such an inspection looks like.


package com.jeroenreijn.insight.hst;

import com.springsource.insight.collection.AbstractOperationCollectionAspect;
import com.springsource.insight.intercept.operation.Operation;
import com.springsource.insight.intercept.operation.OperationType;

import org.aspectj.lang.JoinPoint;
import org.hippoecm.hst.content.beans.query.HstQuery;
import org.hippoecm.hst.content.beans.query.HstQueryResult;
import org.hippoecm.hst.content.beans.query.exceptions.QueryException;

/**
 * Aspect for collecting HstQuery executions.
 */
public aspect HstQueryOperationAspect extends AbstractOperationCollectionAspect {

    private static final OperationType TYPE = OperationType.valueOf("query_execute");

    public pointcut collectionPoint(): execution(HstQueryResult HstQuery.execute());

    public Operation createOperation(JoinPoint jp) {
        HstQuery query = (HstQuery) jp.getTarget();
        Operation op = new Operation()
                .type(TYPE)
                .label("HstQuery");
        op.sourceCodeLocation(getSourceCodeLocation(jp));
        try {
            op.put("query", query.getQueryAsString(false));
            op.put("limit", query.getLimit());
            op.put("offset", query.getOffset());
        } catch (QueryException e) {
            // ignore for now
        }
        return op;
    }

}

The most important part of the above collection aspect is the collectionPoint pointcut, where we define which operation we would like to collect information from. In this case we define an inspection on the HstQuery.execute() method.
Next to the collection point you will also see the createOperation() method, which allows you to collect information from the current state of the collection point. In the above code snippet we take the actual HstQuery object and get some information from it: the actual JCR XPath query, the limit set on the query and the offset. That's all for the information collection part of our plugin.
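If you have never used AspectJ, the idea behind this aspect can be approximated in plain Java. The sketch below is not Spring Insight and not AspectJ: it swaps in a JDK dynamic proxy (java.lang.reflect.Proxy) that intercepts execute() on an interface, captures a label and a few attributes before the call (roughly what createOperation() does) and measures how long the call took. The Query interface and RecordedOperation class are hypothetical stand-ins for HstQuery and Insight's Operation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryTimingSketch {

    /** Hypothetical stand-in for HstQuery. */
    public interface Query {
        String getQueryAsString();
        int getLimit();
        int getOffset();
        List<String> execute();
    }

    /** Hypothetical stand-in for Spring Insight's Operation: a label plus named attributes. */
    public static final class RecordedOperation {
        public final String label;
        public final Map<String, Object> attributes = new LinkedHashMap<>();
        public long durationNanos;

        RecordedOperation(String label) {
            this.label = label;
        }
    }

    /** Every intercepted execute() call ends up here. */
    public static final List<RecordedOperation> COLLECTED = new ArrayList<>();

    /** Wraps a Query so that calls to execute() are recorded and timed. */
    public static Query collect(Query target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.getName().equals("execute")) {
                // Not our "collection point": delegate without recording anything.
                return method.invoke(target, args);
            }
            // Comparable to createOperation(): capture the query state before the call.
            RecordedOperation op = new RecordedOperation("Query");
            op.attributes.put("query", target.getQueryAsString());
            op.attributes.put("limit", target.getLimit());
            op.attributes.put("offset", target.getOffset());
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } finally {
                op.durationNanos = System.nanoTime() - start;
                COLLECTED.add(op);
            }
        };
        return (Query) Proxy.newProxyInstance(
                Query.class.getClassLoader(), new Class<?>[] {Query.class}, handler);
    }

    /** Runs one execute() through the proxy and returns what was collected for it. */
    public static RecordedOperation demo() {
        Query plain = new Query() {
            public String getQueryAsString() { return "//element(*,custom:document)"; }
            public int getLimit() { return 10; }
            public int getOffset() { return 0; }
            public List<String> execute() { return Arrays.asList("doc1", "doc2"); }
        };
        collect(plain).execute();
        return COLLECTED.get(COLLECTED.size() - 1);
    }

    public static void main(String[] args) {
        RecordedOperation op = demo();
        System.out.println(op.label + " " + op.attributes + " took " + op.durationNanos + " ns");
    }
}
```

The main practical difference is that a proxy only sees calls made through the wrapper, whereas the AspectJ pointcut is woven into every class implementing HstQuery, so no application code has to change.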

Now that we've created the aspect for the HstQuery, let's create a view for this inspection. You can create a FreeMarker template for each inspection if you want. For the HstQuery I've created the following template.

<#ftl strip_whitespace=true>
<#import "/insight-1.0.ftl" as insight />

<@insight.group label="HST Query">
    <@insight.entry name="Query" value=operation.query />
    <@insight.entry name="Limit" value=operation.limit />
    <@insight.entry name="Offset" value=operation.offset/>
</@insight.group>

In the above template we reference the values that we've put as attributes on our Operation object. All we have to do now is wire the operation and the view together inside the plugin configuration.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:insight="http://www.springframework.org/schema/insight-idk"
 xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
  http://www.springframework.org/schema/insight-idk http://www.springframework.org/schema/insight-idk/insight-idk-1.0.xsd">

  <insight:plugin name="hst" version="${project.version}" publisher="Jeroen Reijn" />

  <insight:operation-view operation="query_execute" template="com/jeroenreijn/insight/hst/query.ftl" />

  <insight:operation-group group="Hippo" operation="query_execute" />

</beans>


Now that we've finished our plugin, we package it and drop it into the collection-plugins folder of our Spring Insight instance. Next we fire up VMware vFabric™ tc Server and make some requests to the web application we would like to get information about. Once that's done, point the browser to '/insight' to see the information collected by Spring Insight. The image below shows exactly the information that we set out to display.




In this example request you can see, from the top of the call stack, the chain of filters that the request went through and all of the HST components. For each component you can now see the class, the window name (as you can also see in the CMS console) and the render path (the JSP or FreeMarker template) used to render the component. You can also expand an HST component when it contains an HstQuery.

Such a plugin can help us identify slow pages caused by slow JCR queries, or components that do extensive (unnecessary) processing.

Summary

Spring Insight is a very interesting project. A quick scan for troublesome code is relatively fast, but for now it can only be done with VMware vFabric™ tc Server, so you cannot run it in your own preferred application container, such as Tomcat, Jetty or JBoss. I've personally added Spring Insight to my default set of tools for figuring out performance issues when I need to review a project.

All of the above code, and instructions on how to install this HST Spring Insight plugin, can be found on the plugin project page on GitHub.

12 December 2011

Make your date range queries in Jackrabbit go faster!

As you might know, Hippo CMS uses Apache Jackrabbit as the core of its content repository. One of Jackrabbit's features is search, and the execution of most queries is delegated to Apache Lucene. If you want to keep your queries as fast as possible, you sometimes have to analyze how your repository behaves with the content in your project.

The problem

One of our customers noticed unexpected memory peaks and slow response times while their system was running during the day. They were using Jackrabbit 2.2.5 at the time. To get more insight into what was going on, we started by looking at the server logs. They were using the Jackrabbit search functionality quite heavily, so we replayed the server request logs on the acceptance environment. During the replay we set the query log level to debug, so we could see how long each query took. We soon discovered that the slowest queries were range queries, and we also noticed that memory usage peaked during the execution of such a range query.

User view of a date range
In this specific case the problematic queries were date range queries. A date range query is a query that, for instance, searches for documents between day x and day y.

As an example: give me all documents that were created between 2007-01-01 and 2011-01-01. This is a very typical search filter, which you will see in all kinds of applications.

In plain Jackrabbit a date range query (XPath notation) will look something like this:
//element(*,custom:document)[@custom:date >=xs:dateTime('2007-01-01T00:00:00.000Z') and @custom:date <=xs:dateTime('2011-01-01T00:00:00.000Z')] order by @custom:date descending

If you use the Hippo HST, you would probably have used Filter.addGreaterOrEqualThan and passed a Date object as an argument, which is automatically converted into the above syntax.

The analysis

To get some insight into what was causing this behaviour, I created a unit test that performed a variety of range queries on a set of 100,000 simple documents/nodes. For this test I created a very simple nodetype definition that holds a date/date-time property in different formats. The nodetype looks like this:

[custom:document]
- custom:date (date) 
- custom:dateasstringwithhoursandminutesandseconds (string)
- custom:dateasstringwithhoursandminutes (string)
- custom:dateasstring (string)

Looking at the above nodetype definition, each node contains four properties: a normal JCR date (and time) property, and three string properties that store the same date-time in progressively coarser formats.

To create a real-world scenario, the unit test generates documents starting from 01-01-2001. With every new document the test adds 1 hour and 3 seconds to the date field. After creating 100,000 nodes it ends up somewhere around 2012-07-03 03:19:57. The test then sleeps for about 60 seconds (to give Lucene time to finish its indexing) before it starts doing the range queries.

While looking for a solution, I created four different versions of the range query, each narrowing the date format further towards the actual date (without the time). In the test case the four kinds of queries are performed, and each one is repeated 5 times before moving on to the next type. The range queries performed are:
  • Normal range query (with JCR date-time format like mentioned above)
  • Range query with date as string with format yyyyMMddHHmmss
  • Range query with date as string with format yyyyMMddHHmm
  • Range query with date as string with format yyyyMMdd

Queries 1 to 3 took an average of 3500 ms (3.5 seconds) and used about 380MB of memory per query. That's huge and slow for just a simple query! You can imagine this might eventually lead to OutOfMemory errors.
Memory usage over time while performing the queries (graph from VisualVM)

However, the fourth query is actually quite fast and takes less memory! It's a really significant difference: the fourth query takes about 180 ms on average and uses about 40-50MB. That's still a lot of memory (in my opinion), but because these queries finish much faster, their memory is freed much earlier, so the total amount of memory in use at any moment might not be that large.

Looking at the graph on the right, you can see there is no difference between queries 1 to 3, but query 4 (which is in fact what a date range should do) showed a really large improvement in overall performance and memory usage. So in the end it turned out that the 'time' in the default JCR date format was causing our issues. Because the time was part of the date value, the number of unique values for the date property in the Lucene index had become larger than needed, causing the slowdown.
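The difference in unique values can be checked with a bit of arithmetic. The sketch below assumes the same generation scheme as the test (100,000 values, 1 hour and 3 seconds apart), starting at 2001-01-01T00:00 as a LocalDateTime (no time zone), and counts how many distinct strings each of the four formats produces. It only counts values; it does not model Lucene or Jackrabbit.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

public class DateCardinality {

    static final int DOCUMENTS = 100_000;
    static final long STEP_SECONDS = 3_603; // 1 hour and 3 seconds between documents
    static final LocalDateTime START = LocalDateTime.of(2001, 1, 1, 0, 0, 0);

    /** Number of distinct strings when every generated date is formatted with the given pattern. */
    public static int countDistinct(String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        Set<String> distinct = new HashSet<>();
        for (int i = 0; i < DOCUMENTS; i++) {
            distinct.add(START.plusSeconds(i * STEP_SECONDS).format(formatter));
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        String[] patterns = {
            "yyyy-MM-dd'T'HH:mm:ss.SSS", // roughly the JCR date-time
            "yyyyMMddHHmmss",
            "yyyyMMddHHmm",
            "yyyyMMdd"
        };
        for (String pattern : patterns) {
            System.out.println(pattern + " -> " + countDistinct(pattern) + " distinct values");
        }
    }
}
```

Because consecutive documents are more than an hour apart, the first three formats still produce one distinct value per document, while yyyyMMdd produces one value per calendar day (a few thousand in total). That matches the observation that queries 1 to 3 behave the same and only query 4 is much cheaper.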

A solution

As a solution, we added a derived data function that extracts the simplified date property. For range queries we now query on the 'yyyyMMdd' formatted date and order the results by the original date property, so that the time is still taken into account and the sort order is correct. The simple date format also helps when trying to find documents/nodes that belong to a certain date: that has turned into a simple 'equals' instead of a range from 0:00 till 23:59:59.
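Here is a minimal sketch of the query side of this solution, assuming the derived yyyyMMdd value is stored in the custom:dateasstring property from the nodetype above. It shows the day key the derived data function would compute, a day-range query ordered by the original custom:date property, and the single-day 'equals' query. A yyyyMMdd string sorts lexicographically in the same order as the dates it represents, which is why a string range works as a date range. The class and method names are hypothetical, and the Hippo derived data function itself is not shown.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DayKeyQueries {

    private static final DateTimeFormatter DAY_KEY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** The value a derived data function would store next to custom:date (time of day dropped). */
    public static String toDayKey(LocalDateTime dateTime) {
        return dateTime.format(DAY_KEY);
    }

    /** Range on the coarse day key, sorted on the precise original date. */
    public static String dayRangeQuery(LocalDate from, LocalDate to) {
        return "//element(*,custom:document)"
                + "[@custom:dateasstring >= '" + from.format(DAY_KEY) + "'"
                + " and @custom:dateasstring <= '" + to.format(DAY_KEY) + "']"
                + " order by @custom:date descending";
    }

    /** All documents of one day: an equality check instead of a 0:00 till 23:59:59 range. */
    public static String singleDayQuery(LocalDate day) {
        return "//element(*,custom:document)"
                + "[@custom:dateasstring = '" + day.format(DAY_KEY) + "']"
                + " order by @custom:date descending";
    }

    public static void main(String[] args) {
        System.out.println(toDayKey(LocalDateTime.of(2007, 3, 14, 15, 9, 26)));
        System.out.println(dayRangeQuery(LocalDate.of(2007, 1, 1), LocalDate.of(2011, 1, 1)));
        System.out.println(singleDayQuery(LocalDate.of(2010, 7, 3)));
    }
}
```

Sorting on custom:date rather than on the day key keeps documents from the same day in their correct time order.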

If you are currently using Apache Jackrabbit with these kinds of queries, you might want to rethink your current content model. A small change might give you a huge performance boost!


16 February 2011

HIPPOs RESTful JAX-RS Component Support & Spring Android

The new Hippo CMS 7.5 release brings some quite interesting features. The most interesting new feature for me was support for RESTful components within the Hippo Site Toolkit (HST-2 v2.20.01). Being able to expose data in a RESTful manner opens up a whole new set of possibilities for external application developers.

As you might have read in my previous post, I'm building a sample application to get acquainted with the Android platform. My previous post was mainly focussed on layouts and ListViews, but this time I will be focussing on information retrieval from an external REST service. That's why I've used the default REST service that comes with the online Hippo GoGreen demo as my source of information.  The GoGreen REST service exposes a list of 'top products' with additional information about the products that can be used nicely for this demo project, but first let's start at the beginning.

Getting started with RESTful HST-2 components

From what I've seen in the documentation and in the GoGreen source code, there are two different methods of exposing data with the RESTful components.
  1. The data can be exposed based on the primary JCR nodetype of a resource inside the Hippo repository. The HST-2 sitemap determines the URLs of the items based on their relative path inside the repository. This approach uses the JaxrsRestContentPipeline.
  2. A sitemap item (or mount) can be configured as a JaxrsRestPlainPipeline. By doing so, the HST will try to match the request within a JAX-RS based resource provider component that handles all the (relative) URL matching from there on.
    In this example I will use the JaxrsRestPlainPipeline approach, which is also used by the Hippo GoGreen demo to create the 'top products' resource. The response output of a REST pipeline can be in all kinds of different formats. For this example we will use JSON, but you can also use XML instead.

    Configuration


    The first step in the process of setting up our own REST service is to create an HST mount. The configuration for our mount should look something like this:

    <sv:node sv:name="restapi">
      <sv:property sv:name="jcr:primaryType" sv:type="Name">
        <sv:value>hst:mount</sv:value>
      </sv:property>
      <sv:property sv:name="hst:alias" sv:type="String">
        <sv:value>restapi</sv:value>
      </sv:property>
      <sv:property sv:name="hst:authenticated" sv:type="Boolean">
        <sv:value>false</sv:value>
      </sv:property>
      <sv:property sv:name="hst:isSite" sv:type="Boolean">
        <sv:value>false</sv:value>
      </sv:property>
      <sv:property sv:name="hst:mountpoint" sv:type="String">
        <sv:value>/hst:hst/hst:sites/rest-live</sv:value>
      </sv:property>
      <sv:property sv:name="hst:mountsite" sv:type="String">
        <sv:value>site</sv:value>
      </sv:property>
      <sv:property sv:name="hst:namedpipeline" sv:type="String">
        <sv:value>JaxrsRestContentPipeline</sv:value>
      </sv:property>
      <sv:property sv:name="hst:roles" sv:type="String">
        <sv:value>everybody</sv:value>
      </sv:property>
      <sv:property sv:name="hst:showport" sv:type="Boolean">
        <sv:value>true</sv:value>
      </sv:property>
      <sv:property sv:name="hst:subjectbasedsession" sv:type="Boolean">
        <sv:value>true</sv:value>
      </sv:property>
      <sv:property sv:name="hst:types" sv:type="String">
        <sv:value>rest</sv:value>
      </sv:property>
    </sv:node>
    

    As you can see, there is a lot to configure for a mount, but I do not want to go into much detail here. The next step is to set up an HST sitemap for this mount. In the configuration above, our mount uses the JaxrsRestContentPipeline as its default named pipeline. Since we want to use a JaxrsRestPlainPipeline, we can override the pipeline by specifying the hst:namedpipeline property on an HST sitemap item for this mount, for example on the sitemap item called 'topproducts'.

    <sv:node sv:name="topproducts">
      <sv:property sv:name="jcr:primaryType" sv:type="Name">
        <sv:value>hst:sitemapitem</sv:value>
      </sv:property>
      <sv:property sv:name="hst:namedpipeline" sv:type="String">
        <sv:value>JaxrsRestPlainPipeline</sv:value>
      </sv:property>
    </sv:node>
    

    Spring Configuration

    Now that we've stored the HST-2 configuration in the repository, the next step is to register our new component as a plain resource provider in our website's Spring configuration. We can do this by creating a file called custom-jaxrs-resources.xml in the src/main/resources/META-INF/hst-assembly/overrides/ folder of our Hippo site project, with the following content.

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
      
      <import resource="classpath:/org/hippoecm/hst/site/optional/jaxrs/SpringComponentManager-rest-jackson.xml" />
      <import resource="classpath:/org/hippoecm/hst/site/optional/jaxrs/SpringComponentManager-rest-plain-pipeline.xml" />
      <import resource="classpath:/org/hippoecm/hst/site/optional/jaxrs/SpringComponentManager-rest-content-pipeline.xml" />
      
      <!-- Custom JAX-RS REST Plain Resource Providers to be overridden. -->
      <bean id="customRestPlainResourceProviders" class="org.springframework.beans.factory.config.ListFactoryBean">
        <property name="sourceList">
          <list>
            <bean class="org.apache.cxf.jaxrs.lifecycle.SingletonResourceProvider">
              <constructor-arg>
                <bean class="com.onehippo.gogreen.jaxrs.services.TopProductsResource" />
              </constructor-arg>
            </bean>
          </list>
        </property>
      </bean>
          
    </beans>
    

    With this configuration in place, the HST-2 knows about our custom resource and the TopProductsResource can start creating the response.

    Now let's take a look at our TopProductsResource.

    @Path("/topproducts/") 
    public class TopProductsResource extends AbstractResource {
      @GET
      @Path("/topproducts/")
      public List<ProductLinkRepresentation> getProductResources(@Context HttpServletRequest servletRequest, @Context HttpServletResponse servletResponse, @Context UriInfo uriInfo,
                @QueryParam("sortby") @DefaultValue("hippogogreen:rating") String sortBy, 
                @QueryParam("sortdir") @DefaultValue("descending") String sortDirection,
                @QueryParam("max") @DefaultValue("10") String maxParam) {
            
          List<ProductLinkRepresentation> productRepList = new ArrayList<ProductLinkRepresentation>();
          HstRequestContext requestContext = getRequestContext(servletRequest);
            
          try {
              Node mountContentNode = getNodeFromMount(requestContext);
              HstQueryResult result = getHstQueryResult(sortBy, sortDirection, maxParam, requestContext, mountContentNode);
              HippoBeanIterator iterator = result.getHippoBeans();
    
              while (iterator.hasNext()) {
                  Product productBean = (Product) iterator.nextHippoBean();
                    
                  if (productBean != null) {
                    ProductLinkRepresentation productRep = new ProductLinkRepresentation(requestContext).represent(productBean);
                    productRepList.add(productRep);
                  }
              }
          } catch (Exception e) {
            // pass the exception without a {} placeholder so an SLF4J logger prints the stack trace
            log.warn("Failed to retrieve top products.", e);
            throw new WebApplicationException(e);
          }
            
          return productRepList;
      }
    }
    

    The TopProductsResource has a @Path("/topproducts/") annotation on the class level. This is what causes requests to '/topproducts' to be handled by this specific resource. As you can see, the only other thing the resource does is perform the query in the getProductResources() method. Take a look at the full source code for more details on the TopProductsResource class.

    Response output


    Now that we've set up the configuration and put the component in place, let's take a look at the actual response. You can see the response of the TopProductsResource by going to the following URL:

    http://www.demo.onehippo.com/restapi/topproducts?_type=json

    Note: the URL might not be available at the time you try it, because the GoGreen demo is restarted every 30 minutes with a fresh set of content. If the URL does not work try again in 5 minutes.

    Since we specified the response type as JSON, the actual response should look something like what is shown below. For readability I've removed some properties, but I guess you get the idea.

    [
      {
        "productLink": "http://www.demo.onehippo.com/restapi/products/food/2010/07/organic-cotton-reusable-lunch-bag./",
        "price": 34,
        "rating": 5,
        "smallThumbnail": "http://www.demo.onehippo.com/binaries/smallthumbnail/content/gallery/products/2010/06/organic-lunch-bag.jpg",
        "localizedName": "Organic Cotton Reusable Lunch Bag",
        "primaryNodeTypeName": "hippogogreen:product"
      },
      {
        "productLink": "http://www.demo.onehippo.com/restapi/products/food/2010/07/birch-wood-compostable-cutlery./",
        "price": 5,
        "rating": 4.25,
        "smallThumbnail": "http://www.demo.onehippo.com/binaries/smallthumbnail/content/gallery/products/2010/07/wooden-cutlery.png",
        "localizedName": "Birch Wood Compostable Cutlery",
        "primaryNodeTypeName": "hippogogreen:product"
      }
    ]
    

    As you can see the response is quite simple and contains an array of product items with their properties.
    If you want to know more about RESTful Component support there is a nice page on the HST-2 wiki. Now let's move on with the Android part of this post.

    Spring Android

    Android version 2.2 has native support for handling JSON. I tried that, but then I discovered Spring Android. Spring Android is quite new and gives you an easy-to-use REST client. I chose Spring Android because it takes less code to handle requests than doing it the native Android way with the default HttpClient. Combining Spring Android with Jackson makes working with JSON really easy: all you have to do is create a mapping class, so that Jackson knows how to map the response array.

    To be able to work with the JSON response we will need the following three libraries in our Android project.
    • spring-android-rest-template-1.0.0.M2.jar
    • jackson-core-asl-1.7.1.jar
    • jackson-mapper-asl-1.7.1.jar

    Using Spring Android


    For my Android application I've created a service class called ProductService.

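    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    
    import org.onehippo.gogreen.android.data.Product;
    import org.springframework.http.converter.HttpMessageConverter;
    import org.springframework.http.converter.json.MappingJacksonHttpMessageConverter;
    import org.springframework.web.client.RestTemplate;
    
    public class ProductService {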
      private static final String RESTAPI_BASE_URI = "http://www.demo.onehippo.com/restapi";
      private static final String RESTAPI_RESPONSE_TYPE = "_type=json";
    
      public static ArrayList<Product> getAllProductsFromHippo() {
        ArrayList<Product> products = new ArrayList<Product>();
        RestTemplate restTemplate = new RestTemplate();
        
        List<HttpMessageConverter<?>> messageConverters = restTemplate.getMessageConverters();
        // add the Jackson mapper for easy mapping of JSON to POJOs
        messageConverters.add(new MappingJacksonHttpMessageConverter());
    
        String url = RESTAPI_BASE_URI + "/topproducts./?" + RESTAPI_RESPONSE_TYPE;
    
        Product[] productsFromHippo = restTemplate.getForObject(url, Product[].class);
        products.addAll(Arrays.asList(productsFromHippo));
        return products;
      }
    }
    

    As you can see, the getAllProductsFromHippo method uses the Spring Android RestTemplate in combination with the MappingJacksonHttpMessageConverter to map the JSON response to an array of Product objects. Let's have a closer look at the Product class.

    package org.onehippo.gogreen.android.data;
    
    import org.codehaus.jackson.annotate.JsonIgnoreProperties;
    import org.codehaus.jackson.annotate.JsonProperty;
    
    @JsonIgnoreProperties(ignoreUnknown = true)
    public class Product {
    
      @JsonProperty
      private String localizedName;
    
      public String getLocalizedName() {
          return localizedName;
      }
    
      public void setLocalizedName(final String localizedName) {
          this.localizedName = localizedName;
      }
    }
    

    The Product class is quite simple. It only contains the localized name (for now). To make sure the mapping succeeds, I've also added the JsonIgnoreProperties annotation, so that unknown properties are ignored during the mapping phase.
    Now if we provide the list of Product items to the Android ArrayAdapter used by our ListView, we will see all the items returned by the HST-2 REST service.

    Resources used

    The following resources were used to create this post:

    10 January 2011

    Tip of the day: sharing information between HST components

    If you're working as a web developer with Hippo CMS, I guess you have written quite a few HST components. I presume that by now you will have a basic understanding of what HST components can and can't do.
    I've had the situation myself where I wanted to share some information between components on a single page. I first thought I could simply achieve this by adding an attribute to the request, but that didn't work. To show you what you can do, let's start off with a bit of background about what actually goes on inside the HST when an incoming request is processed.

    HST request processing


    Let's first have a look at a page definition. In a traditional HST page definition you have a tree of components. The figure below describes a normal page layout definition.


    As you can see, the page definition in the above figure has a root component (the page definition itself) with three child components: component 1, component 2 and component 3. The following flow chart shows what the HST does when it handles a request.

    This flow chart of course does not show all the steps taken by the HST, but it should give you a good impression of what's going on.

    First, the HST looks up the correct page definition based on the HST sitemap. Once the correct page definition has been found, the HST starts processing.

    The entire component tree for the current page definition is fetched and once the tree is there, the HST will use the AggregationValve to run down the component tree and invoke the doBeforeRender() methods of all of the available components.

    Once the entire component tree has been processed in the 'before render' phase, the HST processes the doRender() methods of all the components, so that the output of every single component is generated and aggregated into the end result.

    The important thing to know here is that each invoked component has its own individual HstRequest object. So if you want to share information, you cannot simply use the HstRequest, because anything you put on it is not visible when the next component is processed.
    However, there is an object attached to these individual HstRequest objects: the HstRequestContext. The HstRequestContext holds some quite useful information that can support your components, and you can also add your own information to it by setting an attribute.
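To make the difference between the per-component request and the shared request context concrete, here is a heavily simplified model in plain Java. It is not HST code: the Component, Request and RequestContext types are hypothetical, the component tree is flattened into a list, and it only mimics the two phases described above (all doBeforeRender() calls first, then all doRender() calls) with a fresh request per component and one context for the whole page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AggregationSketch {

    /** One per page request and shared by all components (plays the role of HstRequestContext). */
    public static final class RequestContext {
        public final Map<String, Object> attributes = new HashMap<>();
    }

    /** One per component invocation (plays the role of HstRequest); other components never see it. */
    public static final class Request {
        public final Map<String, Object> attributes = new HashMap<>();
        public final RequestContext context;

        Request(RequestContext context) {
            this.context = context;
        }
    }

    public interface Component {
        void doBeforeRender(Request request);
        String doRender(Request request);
    }

    /** Phase 1: doBeforeRender() on every component; phase 2: doRender() on every component. */
    public static String process(List<Component> components) {
        RequestContext context = new RequestContext();
        List<Request> requests = new ArrayList<>();
        for (Component component : components) {
            Request request = new Request(context); // fresh request per component, shared context
            component.doBeforeRender(request);
            requests.add(request);
        }
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < components.size(); i++) {
            page.append(components.get(i).doRender(requests.get(i)));
        }
        return page.toString();
    }

    /** Two components: the first sets a request attribute and a context attribute, the second reads both. */
    public static String demo() {
        Component first = new Component() {
            public void doBeforeRender(Request request) {
                request.attributes.put("local", "mine");
                request.context.attributes.put("shared", "ours");
            }
            public String doRender(Request request) {
                return "[first local=" + request.attributes.get("local") + "]";
            }
        };
        Component second = new Component() {
            public void doBeforeRender(Request request) {
                // nothing to prepare
            }
            public String doRender(Request request) {
                return "[second local=" + request.attributes.get("local")
                        + " shared=" + request.context.attributes.get("shared") + "]";
            }
        };
        return process(Arrays.asList(first, second));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [first local=mine][second local=null shared=ours]
    }
}
```

The second component sees null for the attribute the first one put on its own request, but it does see the value placed on the shared context.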


    Enough theory for now. If you would like a deeper knowledge of HST request processing, the process is described in much more detail on the HST2 wiki.


    Now for some code


    As an example let's take the usage of banners on a page. Let's say we have a boolean flag configured somewhere, which will define if banners should be shown on our page.

    Let's presume that a banner can be shown by multiple components on a page. If we take figure 1, we could say that a banner could appear above component 2 and underneath component 3.

    Now we could let both component 2 and component 3 figure out whether the banners should be shown, but whichever of the two is executed first could also share that information, so that the other component does not have to read the configuration again. The resulting code is quite simple. Let's have a look.

    @Override
    public void doBeforeRender(HstRequest request, HstResponse response) {
      boolean isBannerEnabled;
      HstRequestContext requestContext = request.getRequestContext();
      //let's see if the flag has been set on the request context
      if(requestContext.getAttribute(IS_BANNER_ENABLED_ATTRIBUTE)!=null){
        isBannerEnabled = (Boolean)requestContext.getAttribute(IS_BANNER_ENABLED_ATTRIBUTE);
      } else {
        //nothing on the request context, so lets figure it out
        isBannerEnabled = isBannerEnabled(request);
        //put the result on the request context so all other components can benefit
        requestContext.setAttribute(IS_BANNER_ENABLED_ATTRIBUTE, isBannerEnabled);
      }
      //put on the request for the current component
      request.setAttribute(IS_BANNER_ENABLED_ATTRIBUTE, isBannerEnabled);
    }
    
    /**
     * Simply return true for this example.
     */
    boolean isBannerEnabled(HstRequest request) {
      return true;
    }
    

    As you can see, the actual code is really simple. All you have to do is store the information on the HstRequestContext. That's it. Well, these were my two cents for today. Have fun and try to leverage the power of the HST2.

    21 October 2010

    Unit testing your HST2 components with EasyMock

    Quality is an important aspect of every software development project. Writing unit tests is just one part of keeping an eye on quality. In this post I will try to explain how you can unit test your Hippo Site Toolkit (HST2) components, so you can be sure that the component still behaves as expected even after multiple maintenance cycles.

    A mocking framework

    Unit testing is the testing of software units (for instance HST2 components) in isolation. However, most units do not work alone; they collaborate with other units, the way an HST2 component collaborates with a running JCR repository and a live HttpServletRequest (wrapped inside an HstRequest). To test a unit in isolation, we have to simulate these collaborations in our tests.
    One way of working around such collaborations is by using Mock objects. A Mock Object is a test-oriented replacement for such a collaborator. It is configured to simulate the object that it replaces in a simple way. For this post I use EasyMock, but there are other mocking frameworks out there.
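    To make the idea of a mock object concrete without any framework, here is a hand-written one in plain Java. `ContentRepository`, `TitleComponent` and `RecordingRepository` are hypothetical names for illustration only; a mocking framework like EasyMock generates this kind of stand-in for you and adds the record/replay/verify steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HandRolledMockExample {

    /** Hypothetical collaborator that would normally talk to a live JCR repository. */
    interface ContentRepository {
        String findTitle(String path);
    }

    /** Hypothetical unit under test: it only depends on the interface. */
    static class TitleComponent {
        private final ContentRepository repository;

        TitleComponent(ContentRepository repository) {
            this.repository = repository;
        }

        String renderTitle(String path) {
            String title = repository.findTitle(path);
            return title == null ? "(untitled)" : title.toUpperCase(Locale.ROOT);
        }
    }

    /** The mock: returns a canned answer and records every call it receives. */
    static class RecordingRepository implements ContentRepository {
        final List<String> requestedPaths = new ArrayList<>();
        private final String cannedTitle;

        RecordingRepository(String cannedTitle) {
            this.cannedTitle = cannedTitle;
        }

        @Override
        public String findTitle(String path) {
            requestedPaths.add(path);
            return cannedTitle;
        }
    }

    public static void main(String[] args) {
        RecordingRepository mock = new RecordingRepository("hello");
        TitleComponent component = new TitleComponent(mock);

        String result = component.renderTitle("/content/news/item");

        // Check the output and that the collaborator was called as expected.
        System.out.println(result);              // prints "HELLO"
        System.out.println(mock.requestedPaths); // prints "[/content/news/item]"
    }
}
```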

    Setting up your environment

    If you are a regular reader, you might have noticed that I've been using Maven 2 in most of my posts, so this time will not be different. To be able to test your HST components, you will need to add the following dependencies to your project/module pom.xml.

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.5</version>
      <scope>test</scope>
    </dependency>
    
    <dependency>
      <groupId>org.easymock</groupId>
      <artifactId>easymock</artifactId>
      <version>2.5.2</version>
      <scope>test</scope>
    </dependency>
    
    <dependency>
      <groupId>org.easymock</groupId>
      <artifactId>easymockclassextension</artifactId>
      <version>2.5.2</version>
      <scope>test</scope>
      <exclusions>
        <exclusion>
          <groupId>cglib</groupId>
          <artifactId>cglib-nodep</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    
    

    Now that we have set up all the needed dependencies, let's create an HST component to get started.

    Basic HST2 Component


    So let's start off with a simple, basic HST2 component. The component below tries to get the HippoBean (which wraps a JCR node) for the current request and puts the bean on the request as an attribute.

    import org.hippoecm.hst.component.support.bean.BaseHstComponent;
    import org.hippoecm.hst.content.beans.standard.HippoBean;
    import org.hippoecm.hst.core.component.HstComponentException;
    import org.hippoecm.hst.core.component.HstRequest;
    import org.hippoecm.hst.core.component.HstResponse;
    
    public class AbstractBaseHstComponent extends BaseHstComponent{
    
        @Override
        public void doBeforeRender(HstRequest request, HstResponse response) 
            throws HstComponentException {
    
            HippoBean bean = getContentBean(request);
            if(bean!=null) {
                request.setAttribute("document",bean);
            }
        }
    }
    

    Looks quite simple right? Now let's move on to the test.

    The actual test

    Now that we've seen what our component looks like, let's take a look at how we can test this class. The component doesn't do a lot, but there are a couple of things that we want to test:

    • that the getContentBean method is called
    • when the bean is not null the bean is set as an attribute on the request
    • there is an attribute on the request with the name document
    • the document from the request attribute is the same as the one put on the request

    So now let's translate that into some code.

    Before we can actually test our doBeforeRender method, we need to do some setup.

    MockHstRequest fakeRequest;
    MockHstResponse fakeResponse;
    AbstractBaseHstComponent component;
    
    @Before
    public void setUp() throws Exception {
        fakeRequest = new MockHstRequest();
        fakeResponse = new MockHstResponse();
        component = createMockBuilder(AbstractBaseHstComponent.class).
                    addMockedMethod("getContentBean", HstRequest.class).
                    createMock();
    }
    

    Now looking at this setUp() method, you will notice that at first we create mocked versions of a request and response. These objects are necessary because they are parameters for our method under test. In a normal environment these objects will be created by the servlet container, but since we're unit testing we have to create these ourselves.

    Next, we create a partial mock of our AbstractBaseHstComponent. We do this because we need to mock the getContentBean method, which in a live environment interacts with a JCR repository. The logic for getting the bean based on the repository configuration is not relevant for our test, so we mock just that method while the rest of the component runs as normal.
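    If you would rather not use EasyMock's class extension, the same kind of partial mock can be written by hand: subclass the component and override only the method that touches the repository. The plain-Java sketch below mirrors AbstractBaseHstComponent, but `Component`, `Request` and `getContentBean` here are simplified hypothetical stand-ins, not the HST types.

```java
import java.util.HashMap;
import java.util.Map;

public class PartialMockExample {

    /** Simplified stand-in for HstRequest: just an attribute map. */
    static class Request {
        final Map<String, Object> attributes = new HashMap<>();
    }

    /** Simplified stand-in for the component under test. */
    static class Component {
        /** In the real component this would query the JCR repository. */
        protected Object getContentBean(Request request) {
            throw new IllegalStateException("needs a live repository");
        }

        public void doBeforeRender(Request request) {
            Object bean = getContentBean(request);
            if (bean != null) {
                request.attributes.put("document", bean);
            }
        }
    }

    public static void main(String[] args) {
        final Object bean = new Object();
        final int[] calls = {0};

        // Hand-written partial mock: only getContentBean is replaced,
        // doBeforeRender keeps its real behaviour.
        Component component = new Component() {
            @Override
            protected Object getContentBean(Request request) {
                calls[0]++;
                return bean;
            }
        };

        Request request = new Request();
        component.doBeforeRender(request);

        System.out.println(calls[0]);                                   // prints "1"
        System.out.println(request.attributes.get("document") == bean); // prints "true"
    }
}
```

    The trade-off is that you write and maintain the override yourself, whereas EasyMock also verifies that the expected calls actually happened.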

    Now let's have a look at the total test case and the actual test method.

    import org.hippoecm.hst.content.beans.standard.HippoBean;
    import org.hippoecm.hst.content.beans.standard.HippoDocument;
    import org.hippoecm.hst.core.component.HstRequest;
    import org.junit.Before;
    import org.junit.Test;
    import org.onehippo.hst.mock.MockHstRequest;
    import org.onehippo.hst.mock.MockHstResponse;
    import static org.easymock.classextension.EasyMock.*;
    import static org.junit.Assert.*;
    
    /**
     * Test for {@link com.jeroenreijn.site.components.AbstractBaseHstComponent}
     */
    public class AbstractBaseHstComponentTest {
    
        MockHstRequest fakeRequest;
        MockHstResponse fakeResponse;
        AbstractBaseHstComponent component;
    
        @Before
        public void setUp() throws Exception {
            fakeRequest = new MockHstRequest();
            fakeResponse = new MockHstResponse();
            component = createMockBuilder(AbstractBaseHstComponent.class).
                    addMockedMethod("getContentBean", HstRequest.class).
                    createMock();
        }
    
        @Test
        public void testDocumentOnRequestAfterDoBeforeRender() throws Exception {
    
            HippoBean bean = new HippoDocument();
    
            //record the expected behavior
            expect(component.getContentBean(fakeRequest)).andReturn(bean);
    
            //stop recording and switch the mocked Object to replay state.
            replay(component);
    
            component.doBeforeRender(fakeRequest,fakeResponse);
    
            //verify the specified behavior has been used
            verify(component);
    
            //JUnit expects the expected value first and the actual value second
            assertSame(bean, fakeRequest.getAttribute("document"));
        }
    }
    
    

    As you might notice, the testDocumentOnRequestAfterDoBeforeRender() method tests the doBeforeRender method and checks all of the above requirements.

    The next step


    Even though you can create a mock of most objects quite easily, it's much better to have ready-made mock objects for most HST2 classes. Therefore I've attached a patch to JIRA that adds more mock classes for testing, so you do not have to mock explicit methods yourself. The patch also creates a test Maven artifact, which you can use when testing your own HST components.
    Let me know if you run into any issues or have ideas for improvement. It can make all our lives better.

    P.S. I've just noticed that Shane Smith of iProfs wrote a similar post using Mockito.

    10 June 2010

    An introduction to Hippo CMS 7 updater modules

    Once your Hippo CMS project is in production, you or your customer will sooner or later want to add extra features to the website or portal. This might mean that the data model has to change.
    The data model for a piece of content in Hippo CMS is stored as a JCR nodetype definition.
    As you might know the editor templates, which are related to the data model, can be edited live in the CMS. When you're done editing the editor templates, you can use the 'Update all content' button to persist the changes in your existing content model.
    This might be a nice way of doing things during development, but performing such an operation on a live clustered environment can be quite tricky and you might want to do it in a more controlled and tested way.
    As of Hippo CMS 7.2 it's possible to perform these changes by writing updater modules in plain Java. In this post I will try to explain the concept of updater modules and will show you how to write these updater modules and use them for updating your data model.

    Writing an Updater module


    When you start writing an updater module you can start out with the following simple class file:

    import org.hippoecm.repository.ext.UpdaterModule;
    import org.hippoecm.repository.ext.UpdaterContext;
    
    public class MyProjectUpdater implements UpdaterModule {
    
        public void register(final UpdaterContext context) {
            // register the name, tags and visitors for this updater here
        }
    
    }
    

    As you can see in the above code snippet, MyProjectUpdater implements the UpdaterModule interface, which requires you to implement the register() method. On your classpath you will need the hippo-ecm-api library, which comes with the Hippo CMS 7 war package, or you can get it from the Maven 2 repository.

    Updaters and versioning


    Performing such an update on your data model is usually specific to the current release of your project. The engine behind the updater modules can be instructed to trigger certain updater modules only when certain requirements (like the version of your project) are met. You instruct the updater engine to trigger a specific updater module by registering a start tag on the UpdaterContext. In the following example we will:

    • register a unique name for our updater module
    • register a start tag for which this updater module should be triggered
    • register an end tag to which this version should change once the update was successful

    import org.hippoecm.repository.ext.UpdaterModule;
    import org.hippoecm.repository.ext.UpdaterContext;
    
    public class MyProjectUpdater implements UpdaterModule {
    
        public void register(final UpdaterContext context) {
            context.registerName("myproject-updater-v1-to-v1_1");
            context.registerStartTag("myproject-v1");
            context.registerEndTag("myproject-v1_1");
        }
    
    }
    

    With the updater module above we update our project from version 1 to version 1.1.
    So far our updater module does not make any radical changes: it only changes the registered version of our project in the repository. You can find the currently registered version(s) in the Hippo repository with the Hippo CMS Console view, on the path:

    /hippo:configuration/hippo:initialize/@hippo:version
    

    If you don't have a project specific version yet, I would recommend creating one, because it will help you with using these updater modules.
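    The start/end tag contract can be modelled in a few lines of Java. The sketch below is only my assumption of how such an engine behaves, not Hippo's implementation: `VersionTags`, `Updater` and `runIfApplicable` are hypothetical, and the repository's version property is modelled as a plain `Set<String>`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class VersionTags {

    /** Hypothetical description of an updater: its name plus start and end tag. */
    static final class Updater {
        final String name;
        final String startTag;
        final String endTag;

        Updater(String name, String startTag, String endTag) {
            this.name = name;
            this.startTag = startTag;
            this.endTag = endTag;
        }
    }

    /**
     * Runs the updater only when the repository carries its start tag;
     * afterwards the start tag is swapped for the end tag.
     */
    static boolean runIfApplicable(Set<String> repositoryVersions, Updater updater) {
        if (!repositoryVersions.contains(updater.startTag)) {
            return false; // not meant for this version of the repository
        }
        // ... in a real engine the registered visitors would run here ...
        repositoryVersions.remove(updater.startTag);
        repositoryVersions.add(updater.endTag);
        return true;
    }

    public static void main(String[] args) {
        Set<String> versions = new LinkedHashSet<>();
        versions.add("myproject-v1");
        Updater updater = new Updater("myproject-updater-v1-to-v1_1",
                "myproject-v1", "myproject-v1_1");

        System.out.println(runIfApplicable(versions, updater)); // prints "true"
        System.out.println(versions);                           // prints "[myproject-v1_1]"
        // A second run is a no-op because the start tag is gone.
        System.out.println(runIfApplicable(versions, updater)); // prints "false"
    }
}
```

    This also shows why a project-specific version tag helps: without one, there is nothing for the start tag to match against.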

    Now let's continue with some more interesting stuff.

    Visitors


    You might want to change the data model with some simple operations like: adding a field, removing a field or introducing some new nodetypes. The hippo repository provides several visitors for doing changes inside the repository while performing an update. By default Hippo CMS 7.3 comes with 4 types of visitors. The following diagram shows you the class hierarchy for the ItemVisitor interface.


    As you can see the following visitors are available:
    • NodeTypeVisitor - visits nodes of a specific primary type
    • PathVisitor - visits nodes based on their path in the repository
    • QueryVisitor - visits nodes found based on a JCR query
    • NamespaceVisitor - visits specified namespaces

    Of course you can also write your own visitor if you want, but I guess the provided visitors are the most commonly used.

    How to use a visitor in your module

    Now that we've seen the available visitors, let's see how we can use them. I think the most common use for updaters is updating your data model without any extra processing involved. Let's say our current data model (CND) version 1.0 looks like this:

    <hippo='http://www.onehippo.org/jcr/hippo/nt/2.0'>
    <hippostd='http://www.onehippo.org/jcr/hippostd/nt/2.0'>
    <hippostdpubwf='http://www.onehippo.org/jcr/hippostdpubwf/nt/1.0'>
    <myproject='http://www.myproject.org/jcr/nt/1.0'>
    
    [myproject:basedocument] > hippo:document, hippostdpubwf:document, hippostd:publishableSummary
    
    [myproject:news] > myproject:basedocument
    - myproject:title (string)
    + myproject:text (hippostd:html)
    

    We want to move to version 1.1, where we add a new subtitle field. The new nodetype definition now looks like this:

    <hippo='http://www.onehippo.org/jcr/hippo/nt/2.0'>
    <hippostd='http://www.onehippo.org/jcr/hippostd/nt/2.0'>
    <hippostdpubwf='http://www.onehippo.org/jcr/hippostdpubwf/nt/1.0'>
    <myproject='http://www.myproject.org/jcr/nt/1.1'>
    
    [myproject:basedocument] > hippo:document, hippostdpubwf:document, hippostd:publishableSummary
    
    [myproject:news] > myproject:basedocument
    - myproject:title (string)
    - myproject:subtitle (string)
    + myproject:text (hippostd:html)
    


    Now if we want to update our namespace with an updater module our actual code will look like this:

    import java.io.InputStreamReader;
    
    import org.hippoecm.repository.ext.UpdaterItemVisitor;
    import org.hippoecm.repository.ext.UpdaterModule;
    import org.hippoecm.repository.ext.UpdaterContext;
    
    public class MyProjectUpdater implements UpdaterModule {
    
        public void register(final UpdaterContext context) {
            context.registerName("myproject-updater-v1-to-v1_1");
            context.registerStartTag("myproject-v1");
            context.registerEndTag("myproject-v1_1");
            
            context.registerVisitor(new UpdaterItemVisitor.NamespaceVisitor(context, "myproject", "-",
            new InputStreamReader(getClass().getClassLoader().getResourceAsStream("myproject.cnd"))));
    
        }
    
    }
    

    The updater module above registers a namespace visitor on the UpdaterContext. The visitor reloads the compact node type definition (CND) from the classpath and updates the namespace to the new version. This is all you have to do to bump a namespace from version 1.0 to 1.1.

    If you actually want to change content during the update, you can use one of the other visitors, like the NodeTypeVisitor. Let's say we want to change a certain property of all documents of type 'myproject:news'; the updater might then look like this:

    import javax.jcr.Node;
    import javax.jcr.RepositoryException;
    
    import org.hippoecm.repository.ext.UpdaterItemVisitor;
    import org.hippoecm.repository.ext.UpdaterModule;
    import org.hippoecm.repository.ext.UpdaterContext;
    
    public class MyProjectUpdater implements UpdaterModule {
    
        public void register(final UpdaterContext context) {
            context.registerName("myproject-updater-v1-to-v1_1");
            context.registerStartTag("myproject-v1");
            context.registerEndTag("myproject-v1_1");
            
            context.registerVisitor(new UpdaterItemVisitor.NodeTypeVisitor("myproject:news") {
                @Override
                protected void leaving(Node node, int level) throws RepositoryException {
                   if (node.hasProperty("myproject:property")) {
                       node.setProperty("myproject:property", "new value");
                   }
                } 
            });
        }
    
    }
    

    The important part of the updater in this case is that we override the leaving() method, which is called when the visitor is done with a node (after its child items have been visited), just before it moves on. Our implementation changes the value of a certain property, if the node has it, and moves on.
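    The entering/leaving order is easiest to see on a tiny tree. In the JCR API, `javax.jcr.util.TraversingItemVisitor` calls `entering()` before visiting a node's children and `leaving()` after them. The plain-Java sketch below imitates that order and the "only act on one node type" idea of a NodeTypeVisitor; `SimpleNode` and `visit` are hypothetical stand-ins, not the JCR or Hippo classes.

```java
import java.util.ArrayList;
import java.util.List;

public class TraversalOrderExample {

    /** Hypothetical stand-in for a JCR node: name, primary type and child nodes. */
    static class SimpleNode {
        final String name;
        final String primaryType;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, String primaryType) {
            this.name = name;
            this.primaryType = primaryType;
        }

        SimpleNode add(SimpleNode child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Depth-first walk in the same order as TraversingItemVisitor:
     * "entering" before a node's children, "leaving" after them.
     * Only nodes of the given type are "updated", mimicking a NodeTypeVisitor.
     */
    static void visit(SimpleNode node, String nodeType, List<String> log) {
        log.add("entering " + node.name);
        for (SimpleNode child : node.children) {
            visit(child, nodeType, log);
        }
        if (node.primaryType.equals(nodeType)) {
            log.add("leaving " + node.name + " (updated)");
        } else {
            log.add("leaving " + node.name);
        }
    }

    public static void main(String[] args) {
        SimpleNode root = new SimpleNode("content", "nt:unstructured")
                .add(new SimpleNode("news1", "myproject:news"))
                .add(new SimpleNode("news2", "myproject:news"));

        List<String> log = new ArrayList<>();
        visit(root, "myproject:news", log);
        log.forEach(System.out::println);
    }
}
```

    Running it prints "entering content", "entering news1", "leaving news1 (updated)", "entering news2", "leaving news2 (updated)" and finally "leaving content", one per line: by the time leaving() runs for a node, its whole subtree has already been processed.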

    If you want to see more examples of how to use certain types of visitors please let me know, but I hope that the two examples above can help you get started with writing updater modules. Now let's see how to get the repository to run your updater module.

    Adding the updater module to your deployment


    Now that we've seen how to write an updater module, the next step is to get the repository to run it. The Hippo CMS 7 repository knows about the existence of updater modules, but you need to tell the repository where to find them. Making an updater module available to the repository is done in a similar fashion as adding a daemon module: the class name of the updater module needs to be added to the MANIFEST.MF, which will end up in your jar. Maven 2 can help you achieve this with the maven-jar-plugin. See the following plugin configuration from my pom.xml file.

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
          </manifest>
          <manifestEntries>
            <Hippo-Modules>com.myproject.repository.update.MyProjectUpdater</Hippo-Modules>
          </manifestEntries>
        </archive>
      </configuration>
    </plugin>
    

    Now when you add the jar with our updater module to the CMS web application archive and start the CMS, the repository will scan the manifest files for Hippo-Modules entries and pick up the UpdaterModule implementations listed there. The updater modules will then be registered and triggered when needed.
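    To check what ends up in your jar's manifest, you can read the Hippo-Modules entry with the JDK's `java.util.jar.Manifest` class. This is a hypothetical helper for inspecting your build output, not the repository's own loading code; splitting the value on whitespace to allow several modules is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Manifest;

public class HippoModulesManifest {

    /** Parses manifest text, e.g. the contents of META-INF/MANIFEST.MF. */
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns the class names listed in the Hippo-Modules main attribute.
     * Splitting on whitespace is an assumption for the multi-module case.
     */
    static List<String> moduleClassNames(Manifest manifest) {
        String value = manifest.getMainAttributes().getValue("Hippo-Modules");
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        Manifest manifest = parse("Manifest-Version: 1.0\n"
                + "Hippo-Modules: com.myproject.repository.update.MyProjectUpdater\n");
        System.out.println(moduleClassNames(manifest));
        // prints "[com.myproject.repository.update.MyProjectUpdater]"
    }
}
```

    For a built jar you would get the manifest with `new JarFile(path).getManifest()` instead of parsing text.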

    The updater modules are quite powerful and it's great that you can test them on your test environment, so you can make sure that when you perform an update in production it will succeed.


    24 May 2010

    Giving Hippo CMS 7 some WebDAV support

    A while ago a user on the Hippo CMS 7 forum donated a patch that provided a servlet for simple WebDAV support. He donated his code as a proof of concept and hoped that somebody with deeper knowledge of Hippo and its node types could pick it up and continue working on it. Since the patch was left alone for quite some time, I picked it up, turned it into a Hippo Forge project and called it 'Hippo CMS 7 WebDAV Support'.

    Now and then I find some time to work on this project, and at the moment I have a working quickstart, which you can check out. I personally think it's far from finished, because there is still a lot of work remaining, but it is already usable. In this post I will explain how the WebDAV plugin works, the current status of the project and some future plans. Let's start with the basics.

    What is WebDAV?


    WebDAV is short for Web-based Distributed Authoring and Versioning. In short WebDAV is an extension on top of the default HTTP protocol and it allows computer users to edit and store files on a remote server. All the major operating systems provide support for WebDAV and will allow you to easily store and edit files on a remote server as if they were on your own computer.
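    Because WebDAV is an HTTP extension, a client talks to it with extra HTTP methods such as PROPFIND (RFC 4918). The sketch below only builds such a request with the Java 11 `java.net.http` client, without sending it. The address is hypothetical: the `cms` context path is an assumption, while `/webdav/*` follows the servlet mapping from step 3.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class PropfindRequestExample {

    // Minimal PROPFIND body asking for all properties (RFC 4918 "allprop").
    static final String ALLPROP_BODY =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";

    /** Builds a request that lists a folder and its direct children. */
    static HttpRequest listFolder(URI folder) {
        return HttpRequest.newBuilder(folder)
                .method("PROPFIND", HttpRequest.BodyPublishers.ofString(ALLPROP_BODY))
                .header("Depth", "1") // the folder itself plus its immediate children
                .header("Content-Type", "application/xml; charset=utf-8")
                .timeout(Duration.ofSeconds(10))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical address of a local CMS with the WebDAV servlet mapped to /webdav/*.
        HttpRequest request = listFolder(URI.create("http://localhost:8080/cms/webdav/"));
        System.out.println(request.method() + " " + request.uri());
        // prints "PROPFIND http://localhost:8080/cms/webdav/"
    }
}
```

    Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` would return a multistatus XML response describing the folder contents; that is the same kind of exchange your operating system's WebDAV client performs when you browse a remote folder.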

    Enable WebDAV support for your Hippo CMS 7 project


    Using WebDAV in combination with Hippo CMS 7 is actually quite easy. For now, you only need to do four things to enable WebDAV support for your project.

    1. Add the WebDAV support maven dependency to your project

    <dependency>
      <groupId>org.onehippo.forge.addon.webdav</groupId>
      <artifactId>webdav-addon</artifactId>
      <version>${webdav.addon.version}</version>
    </dependency>
    

    2. Add the WebDAV support servlet definition to your web.xml

    <servlet>
     <servlet-name>WebDAVServlet</servlet-name>
     <servlet-class>org.onehippo.forge.addon.webdav.HippoWebdavServlet</servlet-class>
     <init-param>
      <param-name>repository-address</param-name>
      <param-value>vm://</param-value>
     </init-param>
     <init-param>
      <param-name>resource-path-prefix</param-name>
      <param-value>/webdav</param-value>
      <description>defines the prefix for spooling resources out of the repository.</description>
     </init-param>
     <init-param>
      <param-name>resource-config</param-name>
      <param-value>/WEB-INF/config.xml</param-value>
      <description>
       Defines various dav-resource configuration parameters.
      </description>
     </init-param>        
     <load-on-startup>5</load-on-startup>
    </servlet>
    

    3. Add the WebDAV servlet mapping to your web.xml
    <servlet-mapping>
     <servlet-name>WebDAVServlet</servlet-name>
     <url-pattern>/webdav/*</url-pattern>        
    </servlet-mapping>
    

    4. Add the WebDAV support configuration file to your project's WEB-INF directory.

    This configuration file can be found on the WebDAV support documentation site. It's quite easy to read and you should put it into your /webapp/WEB-INF/ folder if possible.

    In action


    The following video will show you how easy it is to upload multiple files into the CMS.

    Hippo CMS 7 WebDAV support from Jeroen Reijn on Vimeo.


    This video is also available on YouTube.

    Current status


    The current status is that the WebDAV addon has default support for the Hippo assets folder. This was actually quite easy to develop. This can also be used to copy all assets from a CMS 6 instance directly into a running CMS 7 instance. All other folders are not WebDAV enabled yet, but I have some plans for the other folders in the future.

    For the short-term roadmap, 'pretty URL support' is the first thing I want to work on. I could have hardcoded it, but since I want to make it configurable like in the CMS, this will be my main focus for the next two weeks. If you have ideas or want to help out, please let me know!

    05 April 2010

    Metadata extraction with Apache Tika

    At Hippo I work with and for customers that have quite a lot of content. The projects I work on have between 5,000 and 500,000 documents gathered in one content repository. This can be just textual content, but most of the time it is a variety of content types, such as images, PDFs and Microsoft Office document formats. By default Apache Jackrabbit, the layer underneath Hippo Repository, indexes this kind of content by using extractors, so that the information can be found with the Hippo CMS 7 search or from any application connected to the Hippo Repository that searches the content repository. Being able to search on content found within a file is interesting, but there is so much more that you can do with this kind of information.

    Content metadata


    Having all this content inside the repository is nice, but a certain piece of content uploaded to the repository can contain much more information than the file itself and sometimes this metadata is ignored in content management systems. As an example you might want to be able to see the number of pages of a PDF document inside your CMS or view the EXIF information of an image stored inside your content repository. There are a number of parser libraries out there that can extract information from a specific file format, but you can get quite lost. Within the ASF there is also a very nice toolkit called Apache Tika, which provides parsers for a lot of different file formats.

    What is Apache Tika?


    Apache Tika is a subproject of the Apache Lucene project and is a toolkit for extracting content and metadata from different kinds of file formats. The content extraction logic is not located inside Tika itself; Tika defines a standard API and makes use of existing libraries like POI and PDFBox for its content extraction. At the time of writing, the current release of Tika is version 0.6, and the following file formats are already supported:
    • HyperText Markup Language
    • XML and derived formats
    • Microsoft Office document formats
    • OpenDocument Format
    • Portable Document Format
    • Electronic Publication Format
    • Rich Text Format
    • Compression and packaging formats
    • Text formats
    • Audio formats
    • Image formats
    • Video formats
    • Java class files and archives
    • The mbox format
    As you can see, this is already quite a lot. The team behind Tika is working hard on improving the current parsers and adding more formats for upcoming releases. Tika is already used by a number of other Apache projects, like Jackrabbit and Solr. Now let's see how we can use Tika ourselves.

    Getting started


    I always work with Maven as my build system, so let's start off with a piece of pom.xml. First, add the Tika parsers dependency to our pom.xml.

    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.6</version>
    </dependency>
    

    By depending on tika-parsers Maven will automatically gather the required parser libraries, which are needed to parse certain file formats. Since my Java code example will be based on a unit test, we will also need to add JUnit as a dependency to our pom.xml.

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.7</version>
      <scope>test</scope>
    </dependency>
    

    In this post I want to see what kind of EXIF information can be retrieved from an image by using Tika. One of my hobbies is photography and therefore I have tons of images, which contain a lot of metadata about, for instance, the ISO speed or the dimensions of the image. Now let's write some actual code to see how Tika works.

    The actual code


    As I mentioned before, my example code is written as a JUnit test. If you are not familiar with writing tests or JUnit itself please have a look at the JUnit website.
    To be able to run this test, I've added one of my images on the test classpath, so my test class will be able to find the image resource. The following piece of code shows my entire test class.

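    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    
    import org.apache.tika.Tika;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.parser.jpeg.JpegParser;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;
    
    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNotNull;
    
    public class ImageMetaDataTest {
    
        private static final String fileName = "IMG_2659.JPG";
    
        private Tika tika;
        private InputStream stream;
    
        @Before
        public void setUp() {
            tika = new Tika();
            InputStream resource = this.getClass().getResourceAsStream(fileName);
            assertNotNull("Test image not found on the classpath: " + fileName, resource);
            // A BufferedInputStream supports mark/reset, so type detection
            // can peek at the first bytes without consuming them
            stream = new BufferedInputStream(resource);
        }
    
        @Test
        public void testImageMetadataCameraModel() throws IOException, SAXException,
                TikaException {
    
            Metadata metadata = new Metadata();
            // we only need the metadata, so the extracted text is ignored
            ContentHandler handler = new DefaultHandler();
            Parser parser = new JpegParser();
            ParseContext context = new ParseContext();
    
            String mimeType = tika.detect(stream);
            metadata.set(Metadata.CONTENT_TYPE, mimeType);
    
            parser.parse(stream, handler, metadata, context);
            assertEquals("The expected Model is not correct",
                    "Canon EOS 350D DIGITAL", metadata.get("Model"));
        }
    
        @After
        public void close() throws IOException {
            if (stream != null) {
                stream.close();
            }
        }
    
    }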
    
    

    As you can see, the code is quite small. The most important part of the above example is using the JpegParser to parse the .JPG file, which fills the Metadata object with the information it finds.
    I think this simple test case shows how easy the Tika API is to use. Of course in the above test case I only test for the camera model, but the Metadata object holds much more information than just that.

    Viewing all the fields found in the metadata of the image can be achieved quite easily by using for instance the following method.

    private void listAvailableMetaDataFields(final Metadata metadata) {
      for (String name : metadata.names()) {
        System.out.println(name + " : " + metadata.get(name));
      }
    }
    

    The output of this method looks something like this:


    Easy Shooting Mode : Manual
    Image Type : Canon EOS 350D DIGITAL
    Model : Canon EOS 350D DIGITAL
    Metering Mode : Evaluative
    Quality : Fine
    Shutter/Auto Exposure-lock Buttons : AF/AE lock
    ISO Speed Ratings : 400


    It's as easy as that.

    Looking ahead


    I'm currently looking into the possibilities of integrating Apache Tika into Hippo CMS 7, to enrich the system with much more metadata than is currently available. I think this can become quite a powerful addition in combination with the faceted navigation feature introduced in Hippo CMS 7.3. I've already started working on some code, which I hope to provide as a patch in the near future.

    04 February 2010

    JBoss ModeShape: A federating JCR repository

    Some interesting things are happening in the JCR community: Apache Jackrabbit 2.0.0 is out (with JCR 2.0), and an interesting project called JBoss ModeShape is almost reaching its final 1.0 release. ModeShape recently came to my attention and it seems an interesting project. In this post I will give a short introduction to ModeShape and its features.

    What's ModeShape?

    ModeShape is a Java Content Repository implementation which will support both JSR-170 and JSR-283. It's not trying to be just another isolated content repository, but a repository with a strong focus on content federation. In other words: ModeShape's main goal is to provide a single JCR interface for accessing and searching content coming from different back-end systems. These systems can even be of different sorts. You might think of a ModeShape repository containing information from a relational database, a file system and perhaps even another Java Content Repository, like Hippo CMS 7's content repository. You can configure these sources of information with the help of ModeShape's connector framework.
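    To illustrate the federation idea, here is a plain-Java sketch of one path space backed by several sources, each mounted at a prefix. It is not ModeShape's connector API: `ContentSource`, `FederatedRepository` and the mount prefixes are hypothetical names, and a real federated repository has to deal with much more, such as searching across sources.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class FederationSketch {

    /** Hypothetical back-end: a database, a file system, another JCR repository... */
    interface ContentSource {
        Optional<String> read(String relativePath);
    }

    /** Exposes several sources under one path space, each mounted at a prefix. */
    static class FederatedRepository {
        // Sorted keys: among prefixes matching the same path, the longer one sorts later.
        private final TreeMap<String, ContentSource> mounts = new TreeMap<>();

        void mount(String prefix, ContentSource source) {
            mounts.put(prefix, source);
        }

        /** Delegates to the most specific mounted source for the given path. */
        Optional<String> read(String path) {
            for (String prefix : mounts.descendingKeySet()) {
                if (path.startsWith(prefix)) {
                    return mounts.get(prefix).read(path.substring(prefix.length()));
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Map<String, String> productTable = new HashMap<>(); // pretend relational table
        productTable.put("products/42", "Coffee mug");

        FederatedRepository repository = new FederatedRepository();
        repository.mount("/db/", path -> Optional.ofNullable(productTable.get(path)));
        repository.mount("/files/", path -> path.equals("readme.txt")
                ? Optional.of("Hello from disk") : Optional.<String>empty());

        System.out.println(repository.read("/db/products/42"));   // prints "Optional[Coffee mug]"
        System.out.println(repository.read("/files/readme.txt")); // prints "Optional[Hello from disk]"
        System.out.println(repository.read("/other/thing"));      // prints "Optional.empty"
    }
}
```

    The client only sees one path space; which back-end answers is decided by the mount configuration, which is roughly the role ModeShape's connectors play.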

    Connectors

    One of ModeShape's key concepts is the concept of connectors. A connector will allow you to connect to a certain type of back-end system and transparently expose the information inside the ModeShape repository. In the current 1.0.0 beta release there are already a couple of out of the box connectors available:


    • In-Memory Connector
    • File System Connector
    • JPA Connector
    • Federation Connector
    • Subversion Connector
    • JBoss Cache Connector
    • Infinispan Connector
    • JDBC Metadata Connector 

    That's already quite a few, but for the upcoming release they also plan to expand the set of connectors with, for instance, a JCR connector. I find that quite interesting myself, because it would allow you to expose other JCR implementations like Hippo CMS 7 (Apache Jackrabbit) in combination with other systems through one JCR interface.

    There are many other content solutions out there, so if you can't find a connector that suits your need, you can of course write one yourself and perhaps donate it to the ModeShape project.

    Sequencers

    Another interesting ModeShape feature is the concept of sequencers. With sequencers you can gather additional information from a certain item inside the repository and store that extracted information in the repository. ModeShape has quite a few sequencers out of the box:


    • Compact Node Type (CND) Sequencer
    • XML Document Sequencer
    • ZIP File Sequencer
    • Microsoft Office Document Sequencer
    • Java Source File Sequencer
    • Java Class File Sequencer
    • Image Sequencer
    • MP3 Sequencer
    • DDL File Sequencer
    • Text Sequencers

    The example below configures the Image Sequencer, which can gather information from certain types of images stored inside the repository. The ImageMetadataSequencer class is used here to extract metadata like size and dimensions from images with one of the specified extensions, and the extracted information is stored elsewhere in the repository (under /images).

    JcrConfiguration config = ...
    config.sequencer("Image Sequencer")
    .usingClass("org.modeshape.sequencer.image.ImageMetadataSequencer")
    .loadedFromClasspath()
    .setDescription("Sequences image files to extract the characteristics of the image")
    .sequencingFrom("//(*.(jpg|jpeg|gif|bmp|psd)[*])/jcr:content[@jcr:data]")
    .andOutputtingTo("/images/$1");
    

    Conclusion

    With other mature JCR implementations out there, I think ModeShape's strongest point is its focus on content federation. Providing a single JCR interface for content stored in different systems is a great initiative, because the JCR API is quite easy to learn and to use. I see a bright future for ModeShape, since companies are sharing more and more in-house information on the web these days. I myself will try to keep a close eye on ModeShape and see how it evolves.

    17 December 2009

    Content management and the semantic web

    I came across the term 'semantic web' a couple of years ago, when one of the original creators of Apache Cocoon went off to work on the SIMILE Project at MIT. I didn't pay much attention to the concept back then, because I had just started learning Apache Cocoon and still had a lot to learn.
    But over the last couple of months I've been researching the currently available standards for providing semantic data on the web, with a strong focus on RDFa.

    Content management

    Working at Hippo, a CMS vendor based in the Netherlands and the USA, makes me think in terms of content and publishing strategies. Publishing information to the web is one of our core businesses, but over the last couple of months I've learned that we can enrich our publishing platform even more by providing semantic data. I started my journey by looking at whether other CMS vendors pay attention to semantic web standards. I noticed that only a few of the many content management vendors actually put effort into providing semantic web functionality for their end users. I think that's a shame, because semantic data can enrich your pages a lot.
    This post should give you an insight into how you could create a website with embedded metadata (with Hippo), but let's first start with some basics.


    What's the idea behind the semantic web?

    The current web is very well suited for being read by people like you and me. Computers, however, can only analyze the words on a page; they cannot see the semantics of a piece of information on that page the way we as people do.
    If you make the information on your page machine-readable, a computer can analyze your page and extract much more from it than just a piece of text. That's where semantic web standards can help out.
    Standards for providing semantic data on the web are not new and some of them have been available for quite some time. Probably the two best known are RDF and Microformats. Recently, however, RDFa has been getting a lot of attention from Google, Yahoo and now also the UK government.


    What is RDFa?


    RDFa is short for “Resource Description Framework in attributes”. This sounds a bit abstract, but it means that RDFa provides a set of XHTML attributes, which in turn provide a way of translating visual data on a page into machine-readable hints. So let's take a look at how a simple web page is currently structured.



    <html>
      <body>
        <h1>Content management and the semantic web</h1>
        <h2>Jeroen Reijn</h2>
        <p>some information</p>
      </body>
    </html> 
    

    As you can see in the above XHTML fragment, we have a page with a title, a subtitle and a small snippet of text inside the body of the page. When this fragment is rendered in the browser, a visitor will recognize these pieces of text as the title and author of the current article. A machine, however, needs a bit more information to be sure the content can be identified as a title and an author. That's where RDFa can help out: by using vocabularies, you can give meaning to specific pieces of content on a page.
    Let's see what the above XHTML fragment would look like if we would use RDFa.


    <html>
      <body xmlns:dc="http://purl.org/dc/elements/1.1/"> 
        <h1 property="dc:title">Content management and the semantic web</h1>
        <h2 property="dc:creator">Jeroen Reijn</h2>
        <p>some information</p>
      </body>
    </html>
    

    As shown in the example, the Dublin Core vocabulary is declared on the page first (the xmlns:dc attribute on the body element). This is needed to be able to use the properties from that vocabulary later on. Once the vocabulary is in place, we can give meaning to fragments on the page. In the HTML fragment above, the h1 is marked as the Dublin Core title property and the h2 as the Dublin Core creator property. With these properties in place, a machine such as a search engine crawler can also store them as additional metadata of the page.
    One of the main advantages of RDFa is that your content can be processed in a more meaningful way, which in turn can make your page rank higher than it might have before.
    Big search engines like Google and Yahoo already scan your website for RDFa embedded information, so why not use it?
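    As a rough illustration of what a machine does with these hints, the sketch below parses the RDFa example from above with the JDK's DOM parser and turns each property attribute into a full property IRI by resolving its prefix against the xmlns declarations. It is a deliberately simplified, hypothetical reader (the class name RdfaPropertyReader is made up), not a complete RDFa processor: it assumes well-formed XHTML, takes one prefixed name per property attribute and ignores attributes such as about, content and typeof.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical, simplified RDFa reader for illustration only.
final class RdfaPropertyReader {

    // Returns expanded property IRI -> element text, in document order.
    static Map<String, String> readProperties(String xhtml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets us resolve prefixes such as dc:
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }

        Map<String, String> properties = new LinkedHashMap<>();
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            String curie = element.getAttribute("property"); // "" when the attribute is absent
            int colon = curie.indexOf(':');
            if (colon < 0) {
                continue; // no property attribute, or no prefix to resolve
            }
            // Resolve the prefix against the nearest xmlns:prefix declaration.
            String namespace = element.lookupNamespaceURI(curie.substring(0, colon));
            if (namespace != null) {
                properties.put(namespace + curie.substring(colon + 1), element.getTextContent().trim());
            }
        }
        return properties;
    }
}
```

    For the example page this yields the IRIs http://purl.org/dc/elements/1.1/title and http://purl.org/dc/elements/1.1/creator with the heading texts as values, which is the kind of extra metadata a crawler can store.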

    How to use RDFa in your (Hippo) website?

    Hippo CMS is a content (centered) management system and it differs from other CMSs in that the information inside the Hippo CMS content repository is not stored or identified as pages, but rather as content, in most cases even reusable content. To be more precise: information inside the content repository is stored as JCR nodes and/or properties.
    Since the data is just content and not bound to any front-end technology, you can either publish it as XML, (X)HTML with some help from the Hippo Site Toolkit (HST) or any other format you might like.
    Now let's take the above HTML fragment as an example and see what it would look like on a content level. One of the most important things to mention here is that a JCR repository has the concept of nodetype definitions, in which you configure what your data model looks like. You could compare it with, for instance, an XML Schema or DTD for a piece of XML, but then for the nodes and properties available in a JCR repository.


    Let's first start with our content definition or in content management terms the document type. We will need three fields:

    • Title
    • Author
    • Body (rich-text field)
    If you create a document type with the Hippo CMS template editor, the resulting nodetype definition will look like this:


    <'myproject'='http://www.myproject.org/nt/myproject/1.0'>
    <'hippostd'='http://www.onehippo.org/jcr/hippostd/nt/2.0'>
    <'hippo'='http://www.onehippo.org/jcr/hippo/nt/2.0'>

    [myproject:text] > hippostd:publishable, hippostd:publishableSummary, hippo:document
    - myproject:title (string)
    - myproject:author (string)
    + myproject:body (hippostd:html)

    As you can see, all three fields are available and can be used later on by any client that can read from the Java content repository. To render this type of information as XHTML, we will be using the Hippo Site Toolkit. The Hippo Site Toolkit maps JCR nodes to simple Java beans, which makes for an easier development cycle without having to learn the entire JCR API.

    A Java bean representation of the JCR 'myproject:text' nodetype will look like this:

    import org.hippoecm.hst.content.beans.Node;
    import org.hippoecm.hst.content.beans.standard.HippoDocument;
    import org.hippoecm.hst.content.beans.standard.HippoHtml;

    // Maps nodes of type myproject:text to this bean
    @Node(jcrType="myproject:text")
    public class TextBean extends HippoDocument {

        // myproject:title (string property)
        public String getTitle() {
            return getProperty("myproject:title");
        }

        // myproject:author (string property)
        public String getAuthor() {
            return getProperty("myproject:author");
        }

        // myproject:body (hippostd:html child node)
        public HippoHtml getBody() {
            return getHippoHtml("myproject:body");
        }
    }

    As you can see, the Java bean is quite straightforward and easy to read.
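    The HST takes care of this node-to-bean mapping for you. To show the idea behind an annotation such as @Node, here is a stripped-down plain-Java sketch. It is hypothetical and not how the HST is implemented: a node is modelled as a Map of property names to values, the made-up @JcrType annotation stands in for @Node, and the mapper picks the bean class by the node's jcr:primaryType.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HST's @Node annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface JcrType {
    String value();
}

// Hypothetical base class: a bean backed by a node's property map.
abstract class NodeBean {
    private Map<String, Object> properties;

    void bind(Map<String, Object> properties) {
        this.properties = properties;
    }

    @SuppressWarnings("unchecked")
    protected <T> T getProperty(String name) {
        return (T) properties.get(name);
    }
}

@JcrType("myproject:text")
class SimpleTextBean extends NodeBean {
    public String getTitle() { return getProperty("myproject:title"); }
    public String getAuthor() { return getProperty("myproject:author"); }
}

// Hypothetical mapper: chooses a bean class by the node's primary type.
final class SimpleContentMapper {
    private final Map<String, Class<? extends NodeBean>> beansByType = new HashMap<>();

    void register(Class<? extends NodeBean> beanClass) {
        JcrType type = beanClass.getAnnotation(JcrType.class);
        if (type == null) {
            throw new IllegalArgumentException(beanClass + " has no @JcrType annotation");
        }
        beansByType.put(type.value(), beanClass);
    }

    NodeBean map(Map<String, Object> node) {
        Class<? extends NodeBean> beanClass = beansByType.get(node.get("jcr:primaryType"));
        if (beanClass == null) {
            return null; // no bean registered for this node type
        }
        try {
            NodeBean bean = beanClass.getDeclaredConstructor().newInstance();
            bean.bind(node);
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + beanClass, e);
        }
    }
}
```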
    Now if we want to render the information on a webpage, we can use, for instance, JSPs with expression language to get the information from the Java bean. The JSP needed for outputting the RDFa-enabled webpage can be as simple as this:

    <%@ page language="java" %>
    <%@ taglib uri="http://www.hippoecm.org/jsp/hst/core" prefix='hst'%>
    <html>
      <body xmlns:dc="http://purl.org/dc/elements/1.1/"> 
        <h1 property="dc:title">${document.title}</h1>
        <h2 property="dc:creator">${document.author}</h2>
        <hst:html hippohtml="${document.body}"/>
      </body>
    </html>
    
    As you can see, it's that easy to use RDFa inside your website if you have a template-independent CMS like Hippo.

    It gets even better

    Using RDFa for simple text can already be a great improvement for your website, but support for other RDFa vocabularies is added on a regular basis. Google recently announced support for RDFa-enabled pages with videos (or other media) on them. You can provide extra information about your media files to the Google crawler, like the URL of the thumbnail that belongs to your video, which can be presented when your video shows up in Google search results. The possibilities are enormous, so I can see a lot of good things coming from using RDFa in the near future.

    I think the role that content management systems can play for RDFa should not be underestimated, since most websites these days are backed by some sort of content management system.

    For more information on RDFa see:

    15 September 2009

    Apache Cocoon and Javascript minification

    A couple of days ago somebody sent a message to the Apache Cocoon user mailing list about on-the-fly minification of, for instance, Javascript files. This topic has been quite popular over the past years, since web applications have become richer and Javascript files have become larger.

    The ideal situation would be to compress your static files (CSS or Javascript) at build time, so this will not cost you any processing power while your application is running. I myself quite often use the Maven 2 YUI Compressor plugin while building my projects, but in case you can't use this plugin you could think about a different solution. Since I've been using Cocoon for more than 5 years, I thought I'd give it a try and write a Cocoon reader that does this minification for you.

    There are multiple minification and obfuscation frameworks out there. Some achieve a better compression ratio than others, but the best known ones are probably:
    1. Dojo Shrinksafe - Rhino based compressor from the Dojo Toolkit
    2. YUI Compressor - Rhino based compressor by Yahoo
    3. JSMin - a whitespace compressor by Douglas Crockford
    Since Apache Cocoon ships with its own version of Rhino and both #1 and #2 include their own version of Rhino as well, this could lead to nasty conflicts caused by two different versions of the same library on the classpath. Therefore I chose to write a reader based on JSMin, which does a lot of whitespace compression for you.

    The implementation of this reader was quite simple and if you're interested, you can get the source here. Do keep in mind that the JSMin.java file also has to be on the classpath, otherwise it will not work.
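    To show what whitespace compression in the spirit of JSMin involves, here is a small, self-contained Java sketch. It is not JSMin's code or API, and the class name is hypothetical. It removes comments, keeps string literals intact, keeps a space only where two identifier characters (or two + or two - signs) would otherwise merge, and keeps line breaks so that automatic semicolon insertion is not affected. It does not understand regular expression literals or template literals, so treat it as an illustration rather than a replacement for JSMin.

```java
// Hypothetical, simplified JSMin-style minifier; not JSMin's actual code.
final class SimpleJsMinifier {

    private static final int NONE = 0;
    private static final int SPACE = 1;
    private static final int NEWLINE = 2;

    // Characters that would merge into a single token if written side by side.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '$' || c == '\\' || c > 126;
    }

    static String minify(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int pending = NONE; // separator owed before the next real character
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                pending = Math.max(pending, c == '\n' ? NEWLINE : SPACE);
                i++;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip to the newline, which the branch above then handles.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
                pending = Math.max(pending, SPACE);
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: acts as whitespace, or as a line break if it spans lines.
                int end = src.indexOf("*/", i + 2);
                int stop = end < 0 ? n : end + 2;
                boolean multiLine = src.substring(i, stop).indexOf('\n') >= 0;
                pending = Math.max(pending, multiLine ? NEWLINE : SPACE);
                i = stop;
            } else {
                writeSeparator(out, pending, c);
                pending = NONE;
                if (c == '"' || c == '\'') {
                    // Copy string literals verbatim, honouring backslash escapes.
                    int start = i++;
                    while (i < n && src.charAt(i) != c) {
                        if (src.charAt(i) == '\\') {
                            i++;
                        }
                        i++;
                    }
                    i = Math.min(i + 1, n);
                    out.append(src, start, i);
                } else {
                    out.append(c);
                    i++;
                }
            }
        }
        return out.toString();
    }

    private static void writeSeparator(StringBuilder out, int pending, char next) {
        if (pending == NONE || out.length() == 0) {
            return;
        }
        char last = out.charAt(out.length() - 1);
        if (pending == NEWLINE) {
            out.append('\n'); // keep line breaks so automatic semicolon insertion is unaffected
        } else if ((isWordChar(last) && isWordChar(next)) || (last == next && (next == '+' || next == '-'))) {
            out.append(' ');
        }
    }
}
```

    For example, "var a = 1;  // one" followed by a line "var b = a + +a;" comes out as two lines, "var a=1;" and "var b=a+ +a;": the space between the two plus signs survives because removing it would turn them into an increment operator.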

    07 August 2009

    Japanese and Java resource bundles

    At Hippo we have a project built with JavaServer Faces, for which I occasionally do some maintenance. A while ago an issue came in through our JIRA bug tracker reporting an error on the Japanese version of the website. The error came from a component that reads information from a resource bundle properties file stored on the local filesystem, in this case the Japanese version of the resource bundle (ApplicationResource_jp.properties), which the web application uses to display some Japanese labels.

    The error wasn't very clear since it only gave the following exception:

    java.util.MissingResourceException:
    Can't find resource for bundle java.util.PropertyResourceBundle, key 'somekey'


    Looking in my project, I could clearly see that the resource bundle was there and after a quick peek at the resource bundle file itself, I could see that the requested key was also present.

    After trying some different options I came to the conclusion that my web application was unable to read the actual .properties file from the classpath correctly. Searching some more, I found out that the Java compiler and other Java tools can only process files that contain Latin-1 and/or Unicode-escaped (\udddd notation) characters. Since I was seeing Japanese characters when opening the properties file, this file clearly did not meet those requirements.

    Solving this issue was quite simple in the end, since the Sun JDK comes with a utility that helps you out with files containing characters that are not Latin-1. The utility is called native2ascii and can easily be run from the command line:

    $ native2ascii [inputfile] [outputfile]

    Once I did that the application was working like a charm again!
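    For reference, the conversion native2ascii performs can be sketched in a few lines of Java: every character outside ASCII is replaced by its \udddd escape, which java.util.Properties and PropertyResourceBundle decode again when loading. The class name is hypothetical. As an aside, on Java 9 and later properties-based resource bundles are read as UTF-8 by default, so the conversion mainly matters on older JDKs.

```java
// Hypothetical helper mimicking what native2ascii does to a properties file.
final class AsciiEscaper {

    // Replaces every non-ASCII character with the four-hex-digit escape
    // notation that java.util.Properties understands when loading.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // chars are UTF-16 code units, so supplementary characters become two escapes
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```

    Loading the escaped text with Properties.load gives back the original Japanese label, which is exactly the round trip the resource bundle needs.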

    10 June 2009

    JCR: Sorting on child node properties

    A JCR repository, like Apache Jackrabbit (the basis for Hippo CMS 7's content repository), mainly consists of nodes and properties.
    As described in the JCR specification, a Java Content Repository should support 2 different query syntaxes: XPath and SQL. Once you get the hang of the syntax, searching a JCR repository is quite easy, but today I ran into a situation where I was not able to perform the query I wanted. In this post I'll describe what my problem was and how the same result can still be achieved.

    The content model


    Let's first start with my content model. The actual node definitions for my project look something like this:


    [myproject:metadata]
    - myproject:creator (string)
    - myproject:language (string)
    - myproject:publicationDate (date)
    - myproject:availableUntil (date)
    - myproject:lastModified (date)
    - myproject:keywords (string)
    - myproject:contributor (string)

    [myproject:news] > hippostd:publishable, hippostd:publishableSummary, hippo:document
    - myproject:title (string)
    + myproject:introduction (hippostd:html)
    + myproject:body (hippostd:html)
    + myproject:metadata (myproject:metadata)


    I ran into a situation where I wanted to search for nodes of type 'myproject:news', sorted on the 'myproject:publicationDate' property of the 'myproject:metadata' subnode. Writing an XPath query for this is quite easy if you're familiar with the XPath syntax.

    Let's start out with a very simple search for nodes of the type 'myproject:news', which in XPath looks like:


    //element( *, myproject:news)


    Now if we want to order these nodes by, for instance, the myproject:title property, the XPath query looks like:


    //element( *, myproject:news) order by @myproject:title descending


    Now if we want to sort on the 'myproject:publicationDate' property of the myproject:metadata subnode, I would expect the XPath query to be:


    //element( *, myproject:news) order by myproject:metadata/@myproject:publicationDate descending


    Unfortunately this query did not actually sort the result on the publicationDate property as I had expected. At first I searched for typos, but the syntax of my query was fine: Jackrabbit itself simply did not yet support the child axis in order by clauses.

    Then I found this JIRA issue[1] in the Jackrabbit bug tracker describing this problem, and there appears to be a patch available. I'm still wondering how much of a performance impact this might have on large repositories, where you might want to sort on a property of a child node 'n' levels deep underneath the actual node.

    If you want to sort on properties of a specific nodetype, you will have to add the sortable properties to the nodetype you are searching for; you can't put them on a subnode.
    It seems that the patch which should fix this problem has already been committed to the Jackrabbit trunk and should be available from Jackrabbit 1.6.0, as marked in the Jackrabbit JIRA.
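    To make the intended ordering concrete, here is what 'order by myproject:metadata/@myproject:publicationDate descending' means, expressed as an in-memory Java comparator. This only illustrates the semantics with hypothetical News and Metadata classes; it is not a JCR query. Putting news items without a metadata child or date last is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Comparator;
import java.util.List;

final class NewsOrdering {

    // Hypothetical stand-in for a myproject:metadata child node.
    static final class Metadata {
        final Calendar publicationDate;

        Metadata(Calendar publicationDate) {
            this.publicationDate = publicationDate;
        }
    }

    // Hypothetical stand-in for a myproject:news node with its metadata child.
    static final class News {
        final String title;
        final Metadata metadata;

        News(String title, Metadata metadata) {
            this.title = title;
            this.metadata = metadata;
        }
    }

    // Intended effect of: order by myproject:metadata/@myproject:publicationDate descending
    static List<News> byPublicationDateDescending(List<News> news) {
        List<News> sorted = new ArrayList<>(news);
        sorted.sort(Comparator.comparing(
                (News n) -> n.metadata == null ? null : n.metadata.publicationDate,
                Comparator.nullsLast(Comparator.<Calendar>reverseOrder())));
        return sorted;
    }
}
```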

    30 March 2009

    Apache Camel: open source integration framework

    I'm currently working on a project where we are looking at creating an integration layer for external applications to connect to our back-end applications. In our case, one of the back-end applications is Hippo CMS 7's repository.

    I've been reading up on ESBs like Apache ServiceMix and Synapse, but even though both projects look very interesting, they are a bit too much for what I want to do. There was one project, though, that seems to be exactly what I want: Apache Camel.

    About Apache Camel

    Apache Camel is an open source Java framework that focuses on making integration easier. One of the great things is that Camel comes with a lot of default components and connectors.
    Even though I was quite new to the integration concept, I was able to get my first Camel project up and running within 30 minutes or so, which I think is quite fast. All you need is a bit of Java/Spring knowledge to get going.

    The basic concepts

    While using an integration framework like Camel, you will have to keep four key terms in mind:

    • Endpoint: where the message comes in or leaves the integration layer
    • Route: how a message goes from endpoint A to endpoint B
    • Filter: the chained components that are involved in the process of handling a message that comes from endpoint A and goes to endpoint B. It could be that the content of the message needs to be transformed from SOAP to for instance ATOM.
    • Pipe: the way the message travels from endpoint A through filters to endpoint B
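    The four terms can be illustrated with a tiny plain-Java model. This is not Camel's API (Camel has its own Endpoint, Route and Processor types); the MiniRoute class below is hypothetical and only shows how a message travels from a 'from' endpoint, through a chain of filters, to a 'to' endpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical model of a route; not Camel's API.
final class MiniRoute {

    // Filters: processing steps applied to every message, in order.
    private final List<UnaryOperator<String>> filters = new ArrayList<>();

    // Endpoint where messages leave the route ("to").
    private final Consumer<String> toEndpoint;

    MiniRoute(Consumer<String> toEndpoint) {
        this.toEndpoint = toEndpoint;
    }

    // Adds a filter to the chain and returns this route for chaining.
    MiniRoute filter(UnaryOperator<String> step) {
        filters.add(step);
        return this;
    }

    // Pipe: takes each message from the "from" endpoint, runs it through
    // every filter and hands the result to the "to" endpoint.
    void run(Iterable<String> fromEndpoint) {
        for (String message : fromEndpoint) {
            String current = message;
            for (UnaryOperator<String> filter : filters) {
                current = filter.apply(current);
            }
            toEndpoint.accept(current);
        }
    }
}
```

    In this model a transformation such as SOAP to ATOM would simply be one more filter in the chain; Camel provides real components and processors for that kind of step.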

    One of the things I'm looking at Camel for is converting RSS feed entries into JCR nodes. If I were to draw an endpoint diagram describing my route, it would look something like the image below.


    With Camel, the endpoints and routes can be configured in a few lines of Java code or with Spring XML configuration. I started out with the Spring XML configuration and it was actually quite easy to get going. Here is an example where I poll my own RSS feed and send the items to a mock 'feeds' endpoint.
    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
    http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-2.5.xsd
    http://camel.apache.org/schema/spring
    http://camel.apache.org/schema/spring/camel-spring.xsd">
    
      <camelContext xmlns="http://camel.apache.org/schema/spring">
        <route>
          <from uri="rss://http://blog.jeroenreijn.com/feeds/posts/default?alt=rss" />
          <to uri="mock:feeds"/>
        </route>
      </camelContext>
    
    </beans>
    

    As you can see, that's just a couple of lines of code. It's really that simple to do things in Camel. Of course this configuration does not store anything in a JCR repository, but as an example I think it's quite easy to grasp. For those of you who want to play around with Camel as well, I'll try to explain all the steps I took to get a working web application example from here on. As I'm using Maven 2 for building my projects, you should be able to reproduce my setup quite easily.

    Setting up your maven project

    First we'll add the Camel dependencies to our Maven project descriptor (pom.xml).
    <dependencies>
      <dependency>
        <groupId>org.apache.camel</groupId>
        <artifactId>camel-core</artifactId>
        <version>${camel-version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.camel</groupId>
        <artifactId>camel-spring</artifactId>
        <version>${camel-version}</version>
      </dependency>
      <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
        <version>${spring-version}</version>
      </dependency>
      <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-web</artifactId>
        <version>${spring-version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.camel</groupId>
        <artifactId>camel-rss</artifactId>
        <version>${camel-version}</version>
      </dependency>
    </dependencies>
    
    As you can see, I explicitly added the camel-rss component, so that my Camel application knows how to handle RSS feeds. Camel does not have its own RSS parser, but uses Rome in the background for handling RSS feeds. The Camel project is set up in such a way that you can include any component you want by adding the needed component dependency to your pom.xml. If you're thinking about using Camel, make sure you check out the components page, which shows all of the currently available components.

    Camel uses Spring, so we need to add the Spring ContextLoaderListener to the local web.xml in src/main/webapp/WEB-INF/.
    <?xml version="1.0" encoding="UTF-8"?>
    <web-app xmlns="http://java.sun.com/xml/ns/j2ee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee
    http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd"
    version="2.4">
    
      <listener>
        <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
      </listener>
    </web-app>
    
    The last step in our process is defining our endpoints. In my case I chose to use the Spring XML configuration for defining them.

    Add a file called applicationContext.xml to your src/main/webapp/WEB-INF/ folder; this is the default location from which the ContextLoaderListener loads its configuration.
    Once the file is created you can define your routes like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
    http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-2.5.xsd
    http://camel.apache.org/schema/spring
    http://camel.apache.org/schema/spring/camel-spring.xsd">
    
      <camelContext xmlns="http://camel.apache.org/schema/spring">
        <route>
          <from uri="rss://http://blog.jeroenreijn.com/feeds/posts/default?alt=rss" />
          <to uri="mock:feeds"/>
        </route>
      </camelContext>
    
    </beans>
    
    In this example I'm using my own RSS feed, but you can of course use any feed url you like.
    For testing purposes you can add a log4j.properties file in src/main/resources/, so you can see the output of the Camel RSS component in your console. Here is the configuration I used while writing this blog post.


    # The logging properties used for Eclipse testing; we want to see debug output on the console.
    log4j.rootLogger=INFO, out

    log4j.logger.org.apache.camel=DEBUG

    # uncomment the following line to set the log level for Spring separately
    # log4j.logger.org.springframework=INFO

    # CONSOLE appender, used by the root logger above
    log4j.appender.out=org.apache.log4j.ConsoleAppender
    log4j.appender.out.layout=org.apache.log4j.PatternLayout
    log4j.appender.out.layout.ConversionPattern=[%30.30t] %-30.30c{1} %-5p %m%n




    Well, that's it. Now the only thing you need to do is fire up an application container like Jetty and see what's going on in the console.

    $ mvn jetty:run

    If Jetty is running and everything is setup correctly you should be able to see some debug information come by that looks like:


    SyndFeedImpl.author=noreply@blogger.com (Jeroen Reijn)
    SyndFeedImpl.authors=[]
    SyndFeedImpl.title=Jeroen Reijn
    SyndFeedImpl.description=
    SyndFeedImpl.feedType=rss_2.0
    SyndFeedImpl.encoding=null
    SyndFeedImpl.entries[0].contributors=[]


    As you can see, the RSS feed is parsed and converted into a SyndFeed object.
    From there on you can use this object and perform any operation on it.
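    Camel leaves the parsing to Rome, so you normally never touch the raw XML. Purely to illustrate what that parsing step amounts to, the sketch below uses only the JDK's DOM parser, not Rome or Camel, to collect the item titles from an RSS 2.0 document. The class name is hypothetical, and the sketch ignores namespaces, Atom feeds and everything except item titles.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration only; Camel's RSS component uses Rome instead.
final class RssItemTitles {

    // Returns the title of every item element in an RSS 2.0 document, in feed order.
    static List<String> itemTitles(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Only look inside the item, so the channel title is skipped.
                NodeList title = item.getElementsByTagName("title");
                if (title.getLength() > 0) {
                    titles.add(title.item(0).getTextContent().trim());
                }
            }
            return titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }
    }
}
```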

    I must admit that while playing around with Camel and RSS feeds, I noticed that the RSS (and Atom) component did not handle extra request parameters correctly, so I added a patch to the Camel JIRA, hoping it will be included in the next release of Camel.
    If you have issues with the RSS component and request parameters, you might want to build the Camel SVN trunk and apply my patch (CAMEL-1496).
    This is only necessary if you want to parse a feed that has, for instance, a unique id added to the feed URL as a request parameter.

    Well, that's it! This post will get a follow-up, where I will show you how to use Camel to actually store the RSS feed entries in a JCR repository.

    Here are a couple of good articles to read before starting with Camel:

    This blog post was inspired by an article over at Gridshore, where Jettro wrote a post on using Spring Integration as an integration framework. Since I'm pretty much Apache-minded, I have been looking around for other open source integration frameworks within the ASF, which brought me to Apache Camel.