12 December 2011

Make your date range queries in Jackrabbit go faster!

As you might know, Hippo CMS uses Apache Jackrabbit as the core of its content repository. One of Jackrabbit's features is 'search', and the execution of most queries is delegated to Apache Lucene. If you want to keep queries as fast as possible, you sometimes have to analyze how your repository behaves with the content in your project.

The problem

One of our customers noticed unexpected memory peaks and slow response times while their system was running during the day. They were using Jackrabbit 2.2.5 at the time. To get some more insight into what was going on, we started by looking at the server logs. They were using the Jackrabbit search functionality quite heavily, so we replayed the server request logs on the acceptance environment. During the replay we set the log level of the Query to debug, so we were able to see how long each query took. We soon discovered that the longest queries were 'range queries', and we also noticed that memory usage peaked during the execution of such a range query.

User view of a date range
In this specific case the problematic queries were date range queries. A date range query is a query that, for instance, searches for documents between day x and day y.

As an example: give me all documents that were created between 2007-01-01 and 2010-01-01. This is a very typical search filter, which you will see in all kinds of applications.

In plain Jackrabbit a date range query (xpath notation) will look something like this:
//element(*,custom:document)[@custom:date >=xs:dateTime('2007-01-01T00:00:00.000Z') and @custom:date <=xs:dateTime('2010-01-01T00:00:00.000Z')] order by @custom:date descending

Now if you were using the Hippo HST, you would probably have used Filter.addGreaterOrEqualThan and passed along a Date object as an argument, which is automatically converted into the above syntax.
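If you build the XPath statement yourself instead of going through the HST, a small helper can produce the query shown above. This is only a sketch: the DateRangeXPath class and its build method are names I made up for illustration, and formatting the dates in UTC is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustrative helper (not part of Jackrabbit or the HST) that builds a date range XPath query. */
class DateRangeXPath {

    // Produces xs:dateTime literals such as 2007-01-01T00:00:00.000Z (UTC is an assumption).
    private static final DateTimeFormatter XS_DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    static String build(String nodeType, String property, Instant from, Instant to) {
        return "//element(*," + nodeType + ")"
                + "[@" + property + " >= xs:dateTime('" + XS_DATE_TIME.format(from) + "')"
                + " and @" + property + " <= xs:dateTime('" + XS_DATE_TIME.format(to) + "')]"
                + " order by @" + property + " descending";
    }

    public static void main(String[] args) {
        System.out.println(build("custom:document", "custom:date",
                Instant.parse("2007-01-01T00:00:00Z"), Instant.parse("2010-01-01T00:00:00Z")));
    }
}
```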

The analysis

To get some insight into what was causing this behaviour, I created a unit test that performed a variety of range queries on a set of 100,000 simple documents/nodes. For this test I created a very simple nodetype definition that holds a date/date-time property in different kinds of formats. The nodetype looks like this:

[custom:document]
- custom:date (date) 
- custom:dateasstringwithhoursandminutesandseconds (string)
- custom:dateasstringwithhoursandminutes (string)
- custom:dateasstring (string)

So looking at the above nodetype definition, we have a node with 4 properties: a normal JCR date (and time) property, and next to that 3 string properties that hold the same date at decreasing precision (down to seconds, down to minutes, and date only).

Trying to create a real-world scenario, the unit test generates documents starting from 01-01-2001. With every new document the test adds 1 hour and 3 seconds to the date field. After creating 100,000 nodes it ends up somewhere around 2012-07-03 03:19:57. The test then sleeps for about 60 seconds (to give Lucene time to finish its indexing) before it starts doing the range queries.

In finding a solution I created 4 different versions of the range query, each narrowing the date format further towards the actual date (without the time). In the test case 4 different kinds of queries are performed, each repeated 5 times before moving on to the next type of query. The range queries performed are:
  • Normal range query (with JCR date-time format like mentioned above)
  • Range query with date as string with format yyyyMMddHHmmss
  • Range query with date as string with format yyyyMMddHHmm
  • Range query with date as string with format yyyyMMdd

Queries 1 to 3 took an average of 3500 ms (3.5 seconds) with a large memory footprint of about 380MB per query. That's huge and slow for just a simple query! You can imagine this might end up leading to OutOfMemory errors.
Memory usage over time while performing the queries (graph comes from VisualVM)

However, the fourth query is actually quite fast and takes less memory! It's a really significant difference: the fourth query takes about 180 ms on average, roughly 19 times faster than the others, and uses about 40-50MB. That's still a lot of memory (in my opinion), but since these queries finish a lot faster, the total amount of memory in use might not be that large, because the memory is freed much earlier in the process.

Looking at the graph you will see there is no difference between queries 1 to 3, but option 4 (which is in fact what a date range should do) showed a really large improvement in overall performance and memory usage. So in the end it turns out that the 'time' in the default JCR date format was giving us the issues. Because the time was added to the date value, the number of unique values for the date property in the Lucene index had become larger than needed, causing the slowdown.
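To get a feeling for how many unique values each format produces, the test data can be simulated without a repository. The sketch below is not the original unit test: the DistinctDateTerms class is a hypothetical name, and it assumes a start of 2001-01-01 00:00:00 with a step of 1 hour and 3 seconds, as described above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

/** Counts how many distinct values a date format yields for the simulated test data. */
class DistinctDateTerms {

    static int distinctValues(String pattern, int count) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        LocalDateTime current = LocalDateTime.of(2001, 1, 1, 0, 0, 0); // assumed start date
        Set<String> values = new HashSet<>();
        for (int i = 0; i < count; i++) {
            values.add(formatter.format(current));
            // every next document is 1 hour and 3 seconds later
            current = current.plusHours(1).plusSeconds(3);
        }
        return values.size();
    }

    public static void main(String[] args) {
        for (String pattern : new String[] {"yyyyMMddHHmmss", "yyyyMMddHHmm", "yyyyMMdd"}) {
            System.out.println(pattern + ": " + distinctValues(pattern, 100_000));
        }
    }
}
```

Under these assumptions both the seconds and the minutes format give 100,000 distinct values, because two consecutive documents never fall within the same minute, while the yyyyMMdd format gives only 4,171. That would fit the observation that queries 1 to 3 behaved the same and only the fourth was faster.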

A solution

We solved this by adding a derived data function that extracts the simplified date property. For range queries we now do the range on the 'yyyyMMdd' formatted date and order the results by the original date property, so that the time is taken into account and the sort order is correct. Using the simple date format also helps when trying to find documents/nodes that belong to a certain date: that is now a simple 'equals' instead of a range from 0:00 till 23:59:59.
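As a rough sketch of this approach (not the actual derived data function), the day value and the resulting queries could look like this. The DayPrecisionQuery class and its methods are hypothetical, the property names come from the test nodetype above, and normalising to UTC before formatting is an assumption.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative helper for querying on a derived day-precision property. */
class DayPrecisionQuery {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Derives the value to store next to the original date property (UTC is an assumption). */
    static String toDayString(ZonedDateTime dateTime) {
        return DAY.format(dateTime.withZoneSameInstant(ZoneOffset.UTC));
    }

    /** Range on the derived string property, ordered by the original date property. */
    static String rangeQuery(String fromDay, String toDay) {
        return "//element(*,custom:document)"
                + "[@custom:dateasstring >= '" + fromDay + "' and @custom:dateasstring <= '" + toDay + "']"
                + " order by @custom:date descending";
    }

    /** Documents of a single day: a plain 'equals' instead of a range. */
    static String singleDayQuery(String day) {
        return "//element(*,custom:document)[@custom:dateasstring = '" + day + "']"
                + " order by @custom:date descending";
    }

    public static void main(String[] args) {
        String day = toDayString(ZonedDateTime.parse("2011-12-12T23:30:00Z"));
        System.out.println(singleDayQuery(day));
        System.out.println(rangeQuery("20070101", "20100101"));
    }
}
```

Because the yyyyMMdd strings all have the same length, comparing them as strings gives the same order as comparing the dates themselves.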

If you are currently using Apache Jackrabbit and are using these kinds of queries, you might want to rethink your current content model. A small change might give you a huge performance boost!


04 August 2011

Replacing the MacBookPro mid 2009 HDD part 2

As you might have read in my previous upgrade post, I initially wanted to upgrade my MacBookPro5,4 mid 2009 model with an SSD. Since my initial search for a compatible SSD I've read tons of articles on the web describing this same issue, but without a solid answer. It's been quite a challenge to figure out what to do, and I almost gave up on the idea of being able to use an SSD until I found this excellent blog post by David Leach.

I've read stories similar to David's, where 5 attempts with different SSDs kept failing, with results ranging from the SSD not being recognized to getting those colorful beachballs.

David's post and the people in his comments gave me the confidence to go for a Kingston V+100 128 GB SSD, as it seemed to be the one and only drive that worked for him and other readers of his blog.

So last week I ordered one and it arrived the next day! Since I had a day off I had plenty of time to install it. Only when you need to go from 250GB to 128GB do you notice how much 'noise' is on your laptop's hard drive. The 128GB version was already quite expensive and I thought that the 256GB was not really worth it. A 256GB version is around € 420, which is in my opinion a bit too much money for an upgrade. It's a fact that traditional HDDs are a lot cheaper than SSDs at the moment, but the performance upgrade is really worth it.

Just after I cloned the original HDD to the SSD with SuperDuper! I tried a cold start of the MBP with the original HDD installed. The bootup time was clocked at 1 minute and 19 seconds. That's quite long for a fresh start. Of course I don't restart all the time, but you get the initial idea. So after I replaced the HDD with the SSD I set the timer again and it really blew me away! The cold start with the SSD took only 21 seconds! That's really amazing.

Now for an actual real-world test I fired up IntelliJ with four projects open. IntelliJ was up within 30 seconds (including indexing) for all four projects. AWESOME! How's that for a productivity boost.

Some benchmarks


In my previous post I posted some benchmarks of the original stock Apple HDD vs the WD Scorpio Blue, where there was only a very small improvement in overall read and write speeds.

For those of you who did not read my other post here is the performance of the Scorpio Blue 5400 rpm drive.

That looks nice right? But here is the Kingston V+100 SSD benchmark.


As you can see, the read and write speeds have greatly improved. The average read speed is now almost four times the previous speed, which you really notice when booting up or starting an application. I've been using the SSD for more than a week now during my daily work. I can tell you: I will never go back! The laptop is now really fast and responsive, as I would expect a laptop to be.

I hope others struggling with their MacBookPro mid 2009 model will be able to find this post and get the most out of their MacBookPro. It's really worth the upgrade!

31 July 2011

Getting started with MongoDB and Spring Data

Last month I finally found some time to play around with a NoSQL database. Getting hands on experience with a NoSQL database has been on my list for quite some time, but due to busy times at work I was unable to find the energy to get things going.

A little background information


Most of you have probably heard the term NoSQL before. The term is used in situations where you do not have a traditional relational database for storing information. There are many different sorts of NoSQL databases. To make a small summary these are probably the most well-known:


The above types cover most of the differences, but for each type there are a lot of different implementations. For a better overview you might want to take a look at the NOSQL database website.

For my own experiment I chose to use MongoDB, since I had read a lot about it and it seemed quite easy to get started with.

MongoDB is as they describe it on their website:
A scalable, high-performance, open source, document-oriented database.
The document-oriented aspect was one of the reasons why I chose MongoDB to start with. It allows you to store rich content with data structures inside your datastore.

Getting started with MongoDB


To begin with, I looked at the Quick start page for Mac OS X, and I recommend you do that too (unless you use a different OS). It will get you going, and within a couple of minutes you'll have MongoDB up and running on your local machine.

MongoDB stores its data in a default location. Of course you can configure that, so I started MongoDB with the --dbpath parameter. This parameter allows you to specify your own storage location. It will look something like this:

$ ./mongodb-xxxxxxx/bin/mongod --dbpath=/Users/jreijn/Development/temp/mongodb/

If you do that you eventually will get a message saying:


Mon Jul 18 22:19:58 [initandlisten] waiting for connections on port 27017
Mon Jul 18 22:19:58 [websvr] web admin interface listening on port 28017


At this point MongoDB is running and we can proceed to the next step: using Spring Data to interact with MongoDB.

Getting started with Spring Data

The primary goal of the Spring Data project is to make it easier for developers to work with (No)SQL databases. The Spring Data project already has support for a number of the above-mentioned types of NoSQL databases.
Since we're now using MongoDB, there is a specific sub project that handles MongoDB interaction. To be able to use this in our project we first need to add a Maven dependency to our pom.xml.

<dependency>
  <groupId>org.springframework.data</groupId>
  <artifactId>spring-data-mongodb</artifactId>
  <version>${spring.data.mongo.version}</version>
</dependency>

Looks easy right? Just one single Maven dependency. Of course in the end the spring-data-mongodb artifact depends on other artifacts which it will bring into your project. Now onto some Java code!

For my first experiment I used a simple Person domain object that I'm going to query and persist inside the database. The Person class is quite simple and looks as follows.

package com.jeroenreijn.mongodb.example.domain;

import org.springframework.data.annotation.Id;
import org.springframework.data.document.mongodb.mapping.Document;

/**
 * A simple POJO representing a Person
 *
 */
@Document
public class Person {

    @Id
    private String personId;

    private String name;
    private String homeTown;
    private int age;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getPersonId() {
        return personId;
    }

    public void setPersonId(final String personId) {
        this.personId = personId;
    }

    public String getName() {
        return name;
    }
    public void setName(final String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(final int age) {
        this.age = age;
    }

    public String getHomeTown() {
        return homeTown;
    }

    public void setHomeTown(final String homeTown) {
        this.homeTown = homeTown;
    }

    @Override
    public String toString() {
        return "Person [id=" + personId + ", name=" + name + ", age=" + age + ", home town=" + homeTown + "]";
    }

}

Now if you look at the class more closely you will see some Spring Data specific annotations like @Id and @Document. The @Document annotation identifies a domain object that is going to be persisted to MongoDB. Now that we have a persistable domain object we can move on to the real interaction.
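To picture what ends up in the database, you can think of a persisted Person as a set of key/value pairs in which the @Id property is stored under MongoDB's _id key. The sketch below is a hand-written stand-in for that mapping step, purely for illustration: PersonDocumentSketch and toDocument are made-up names and this is not how Spring Data implements the conversion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hand-written illustration of a Person as a document; Spring Data does the real mapping for you. */
class PersonDocumentSketch {

    static Map<String, Object> toDocument(String personId, String name, String homeTown, int age) {
        Map<String, Object> document = new LinkedHashMap<>();
        if (personId != null) {
            document.put("_id", personId); // the @Id property maps to MongoDB's _id key
        }
        document.put("name", name);
        if (homeTown != null) {
            document.put("homeTown", homeTown); // assumption: unset values are simply left out
        }
        document.put("age", age);
        return document;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(null, "John", null, 42));
    }
}
```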

For easy connectivity with MongoDB we can make use of Spring Data's MongoTemplate class. Here is a simple PersonRepository object that handles all 'Person' related interaction with MongoDB by means of the MongoTemplate.

package com.jeroenreijn.mongodb.example;

import java.util.Iterator;
import java.util.List;

import com.jeroenreijn.mongodb.example.domain.Person;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.document.mongodb.MongoTemplate;
import org.springframework.stereotype.Repository;

/**
 * Repository for {@link Person}s
 *
 */
@Repository
public class PersonRepository {

    static final Logger logger = LoggerFactory.getLogger(PersonRepository.class);

    @Autowired
    MongoTemplate mongoTemplate;

    public void logAllPersons() {
        List<Person> results = mongoTemplate.findAll(Person.class);
        logger.info("Total amount of persons: {}", results.size());
        logger.info("Results: {}", results);
    }

    public void insertPersonWithNameJohnAndRandomAge() {
        // get a random age between 1 and 100 (inclusive);
        // Math.random() can return 0.0, so Math.ceil alone could yield 0
        int age = 1 + (int) (Math.random() * 100);

        Person p = new Person("John", age);

        mongoTemplate.insert(p);
    }

    /**
     * Create a {@link Person} collection if the collection does not already exist
     */
    public void createPersonCollection() {
        if (!mongoTemplate.collectionExists(Person.class)) {
            mongoTemplate.createCollection(Person.class);
        }
    }

    /**
     * Drops the {@link Person} collection if the collection already exists
     */
    public void dropPersonCollection() {
        if (mongoTemplate.collectionExists(Person.class)) {
            mongoTemplate.dropCollection(Person.class);
        }
    }
}


If you look at the above code you will see the MongoTemplate in action. There is quite a long list of method calls which you can use for inserting, querying and so on. The MongoTemplate in this case is @Autowired from the Spring configuration, so let's have a look at the configuration.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-3.0.xsd">

  <!-- Activate annotation configured components -->
  <context:annotation-config/>

  <!-- Scan components for annotations within the configured package -->
  <context:component-scan base-package="com.jeroenreijn.mongodb.example">
    <context:exclude-filter type="annotation" expression="org.springframework.context.annotation.Configuration"/>
  </context:component-scan>

  <!-- Define the MongoTemplate which handles connectivity with MongoDB -->
  <bean id="mongoTemplate" class="org.springframework.data.document.mongodb.MongoTemplate">
    <constructor-arg name="mongo" ref="mongo"/>
    <constructor-arg name="databaseName" value="demo"/>
  </bean>

  <!-- Factory bean that creates the Mongo instance -->
  <bean id="mongo" class="org.springframework.data.document.mongodb.MongoFactoryBean">
    <property name="host" value="localhost"/>
  </bean>

  <!-- Use this post processor to translate any MongoExceptions thrown in @Repository annotated classes -->
  <bean class="org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor"/>

</beans>

The MongoTemplate is configured with a reference to a MongoFactoryBean (which handles the actual database connectivity) and is set up with the database name used for this example.

Now that we have all components in place, let's get something in and out of MongoDB.

package com.jeroenreijn.mongodb.example;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

/**
 * Small MongoDB application that uses spring data to interact with MongoDB.
 * 
 */
public class MongoDBApp {

  static final Logger logger = LoggerFactory.getLogger(MongoDBApp.class);

  public static void main( String[] args ) {
    logger.info("Bootstrapping MongoDemo application");

    ConfigurableApplicationContext context = new ClassPathXmlApplicationContext("META-INF/spring/applicationContext.xml");

    PersonRepository personRepository = context.getBean(PersonRepository.class);

    // cleanup person collection before insertion
    personRepository.dropPersonCollection();

    //create person collection
    personRepository.createPersonCollection();

    for(int i=0; i<20; i++) {
      personRepository.insertPersonWithNameJohnAndRandomAge();
    }

    personRepository.logAllPersons();
    logger.info("Finished MongoDemo application");
  }
}

All this application does for now is set up a connection with MongoDB, insert 20 persons (documents), fetch them all and write the information to the log. As a first experiment this was quite fun to do.

Conclusion

As you can see, with Spring Data it's quite easy to get some basic functionality working within only a couple of minutes. All the sources mentioned above and a working project can be found on my GitHub profile. It was a fun first experiment and I have already started working on a slightly more advanced project, which combines Spring Data, MongoDB, HTML5 and CSS3. It will be on GitHub shortly, together with another blog post here, so be sure to come back.

26 June 2011

MacBook Pro: Replacing the internal HDD

A while ago I thought it would be nice to give my MacBook Pro (MBP) a performance upgrade by putting in a Solid State Drive (SSD). I had pretty high expectations, since I was under the impression that Apple delivers computers of really high quality. Unfortunately it turned out that the mid 2009 model range (which I have) has a lot of problems with third party drives. To be more precise, I'm currently running a MacBookPro5,4 Intel Core 2 Duo 2.53 GHz.

Some background


Initially I had bought a 120 GB OCZ Vertex 2 SSD for about € 170,- because any SSD should give my MBP an enormous boost. After figuring out how to start fresh on a new disk, I took the bottom off my MBP and replaced the stock HDD with this SSD. Eager to start using the SSD, I powered on the laptop and was confronted with a gray screen and a non-booting laptop. Hmmm.. now what..

Booting while using the 'Option' key, which should allow me to choose from which medium to start, did not help either. I had inserted the original DVD that came with the MBP, so I could start from there, but even though I was able to select the DVD drive, my SSD was not recognized at all.
I put the stock drive back in and tried attaching the SSD through an external USB case, but that did not help at all, so I got pretty frustrated.
Attaching the USB case to an old desktop machine running the latest version of Ubuntu made the drive actually show up, so it wasn't the drive that was broken. 
It had to be something with my MBP or the hardware inside the MBP, but how hard could it be to create compatible hardware in the years 2009-2011.. It wasn't as if my MBP was 10 years old or something.

Finding the exact cause of the problem did not prove to be that simple. It took me quite a while before I figured out that the mid 2009 range was apparently famous for having hardware issues with third party drives. Even the Apple discussion forums were not very helpful if you did not know what to look for. In the end I found out that the cause of the problem was the negotiation between the Nvidia controller in the mid 2009 range and the SSD. Now that I knew what to look for, I found hundreds of people trying to replace their internal drive with another one without success. All hope went away, because after that I read about a lot of problems with a variety of drives, so I returned the drive and postponed my wish for an SSD.

Two months later


This weekend I noticed that the internal HDD of my PS3 was getting full. I downloaded two free games from the 'Welcome back' program, but was unable to install them on my PS3 due to insufficient storage capacity.
The PS3 version I bought a couple of years ago only had a 40GB HDD, but fortunately Sony explicitly mentions that you can upgrade the PS3 HDD without any problems if you take a 5400 rpm version with a maximum height of 9.5 mm. So this Saturday I went to the local computer store and came home with a WD Scorpio Blue 320 GB with an 8 MB cache, which cost me only € 42,-. I had read some good reviews on this particular drive and wanted to give it a go. The WD Scorpio Blue is a low power 2.5 inch laptop drive, but appears to be one of the fastest 5400 rpm drives available.

With this new 2.5 inch drive in my hand I got tempted to see if my MBP would be able to recognize this drive, where it failed to detect the SSD two months ago. Just for the fun of it I placed it in the external USB case and Voilà! it appeared inside Apple's Disk Utility.
This triggered me to try it inside the MBP as well, just to see if it would recognize the drive. To be able to create a bootable drive I used the built-in 'Restore' functionality of Disk Utility and cloned my old drive to this new drive.

Some figures


Before switching to the new drive I ran an Xbench test for the stock drive to see what the 'old' performance was like. So here are the results of the stock FUJITSU drive.


As you can see this is quite nice for a 5400 rpm drive, so after I replaced the stock drive with the WD Scorpio Blue I did another disk test and this is what the WD Scorpio Blue did.




Even though it's not as big an improvement as a 7200 rpm drive or an SSD would be, I'm quite happy with the results of the new drive! Both read and write performance have increased, so for now I think I'll just stick with the 320 GB WD drive in my MBP and will put the 'old' stock drive into my PS3, where 250GB should be plenty of disk space. I'll test drive it for a week, just to be sure that I don't run into any issues.

27 March 2011

Simple XML processing with Apache Cocoon 3

It's been a while since I last used Apache Cocoon. I can still remember the days when I was using Cocoon for all my web development projects. My first introduction to Cocoon was when I started at Hippo about 8 years ago. In comparison to other frameworks, I sometimes miss the simplicity of the Cocoon pipeline concept when I have to work with XML. Especially processing larger XML files is a pain in most IDEs.

The Cocoon team has been working on Cocoon 3 for a while now, so I wanted to give it a try. I have worked with both Cocoon 2.1 and 2.2; version 3 was created with two new goals in mind:
  • slim down the framework
  • make it easier to use/combine with other frameworks

From what I've seen so far, the result is that you can now do a lot more with just plain Java. Where you would have needed a sitemap XML file before, you can now do a lot with just a few lines of Java. Even though Cocoon 3 is still in alpha stage, it already looks quite promising.

Getting started

To be able to process XML with Cocoon 3, all we need are the following two Maven dependencies.

<dependencies>
  <dependency>
    <groupId>org.apache.cocoon.pipeline</groupId>
    <artifactId>cocoon-pipeline</artifactId>
    <version>3.0.0-alpha-2</version>
  </dependency>

  <dependency>
    <groupId>org.apache.cocoon.sax</groupId>
    <artifactId>cocoon-sax</artifactId>
    <version>3.0.0-alpha-2</version>
  </dependency>
</dependencies>

These dependencies drag in just two more dependencies (commons-logging and cocoon-xml), so the end result will be quite small, which is really nice compared to, for instance, Cocoon 2.1, which came with quite some baggage.

The code

Now let's have a look at some Java code. To play around with Cocoon 3, I'm going to use the RSS feed of this blog. Let's see how Cocoon's new Java based coding works. To be able to process the XML result of the RSS feed, I've created the RSSFeedInfoGenerator. The RSSFeedInfoGenerator is a simple class that will parse a provided RSS feed url.

/**
 * RSS Feed info generator
 */
public class RSSFeedInfoGenerator {

  private static final String DEFAULT_RSS_URL = "http://blog.jeroenreijn.com/feeds/posts/default?alt=rss";

  public static void main(String[] args) {
    RSSParser parser = new RSSParser();
    if(args!=null && args.length > 0) {
      parser.setFeedURL(args[0]);
    } else {
      parser.setFeedURL(DEFAULT_RSS_URL);
    }
    parser.parse();
  }
}

So now there is a start from which we can actually build the RSS parser and use Cocoon for processing the XML. Now let's take a look at the actual RSSParser.

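import java.io.FileNotFoundException;
import java.net.MalformedURLException;
import java.net.URL;

import org.apache.cocoon.pipeline.NonCachingPipeline;
import org.apache.cocoon.pipeline.Pipeline;
import org.apache.cocoon.sax.SAXPipelineComponent;
import org.apache.cocoon.sax.component.CleaningTransformer;
import org.apache.cocoon.sax.component.XMLGenerator;
import org.apache.cocoon.sax.component.XMLSerializer;
import org.apache.cocoon.sax.component.XSLTTransformer;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**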
 * Rss parser that uses Cocoon 3 pipelines for generating
 * and transforming the RSS feed to a simple XML response.
 */
public class RSSParser {

  private static final Log LOG = LogFactory.getLog(RSSParser.class);
  private String feedURL;

  /**
   * Parse the provided feed URL and generate the Feed INFO.
   */
  public void parse() {
    try {
       Pipeline<SAXPipelineComponent> pipeline = new NonCachingPipeline<SAXPipelineComponent>();
       XSLTTransformer xsltTransformer = new XSLTTransformer(this.getClass().getResource("simplify-rss.xsl"));

       pipeline.addComponent(new XMLGenerator(new URL(getFeedURL())));
       pipeline.addComponent(new CleaningTransformer());
       pipeline.addComponent(xsltTransformer);
       pipeline.addComponent(new XMLSerializer().setIndent(true));
       pipeline.setup(System.out);
       pipeline.execute();

    } catch (MalformedURLException e) {
       LOG.error("An exception occurred while parsing the RSS URL: " + e.getMessage());
    } catch (FileNotFoundException e) {
       LOG.error("An exception occurred while parsing the RSS URL: " + e.getMessage());
    } catch (Exception e) {
       LOG.error("An exception occurred trying to parse the RSS feed: " + e.getMessage());
    }
  }

  public String getFeedURL() {
    return feedURL;
  }

  public void setFeedURL(final String feedURL) {
    this.feedURL = feedURL;
  }
}


As you can see, I first created a Pipeline, which in this case is SAX based. In a Cocoon pipeline you can add multiple components, so we add a Generator, two Transformers and a Serializer. The normal XML version of the RSS feed is quite large, so to keep the XML result for this example small, we use an XSL template to remove everything but the title and lastBuildDate from the RSS feed. Let's have a look at the XSL template.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="rss">
    <info>
      <xsl:copy-of select="channel/title"/>
      <xsl:copy-of select="channel/lastBuildDate"/>
    </info>
  </xsl:template>

  <xsl:template match="@*|node()|text()|comment()|processing-instruction()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()|text()|comment()|processing-instruction()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Looks quite simple right? Now when we run the above code, the RSSParser will output an XML snippet to the terminal/console which looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<info>
  <title>Jeroen Reijn</title>
  <lastBuildDate>Mon, 21 Mar 2011 22:54:06 +0000</lastBuildDate>
</info>
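
If you want to try out what the stylesheet does without setting up Cocoon, the transformation step on its own can be reproduced with the XSLT support that ships with the JDK. This is only a sketch of that single step, not a replacement for the pipeline above: the SimplifyRssSketch class is a made-up name, the stylesheet is a trimmed-down copy of the one above, and the feed in main is invented sample data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies a trimmed-down simplify-rss stylesheet with plain JAXP (no Cocoon involved). */
class SimplifyRssSketch {

    // Keeps only the channel title and lastBuildDate, like the stylesheet above.
    static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"rss\"><info>"
            + "<xsl:copy-of select=\"channel/title\"/>"
            + "<xsl:copy-of select=\"channel/lastBuildDate\"/>"
            + "</info></xsl:template>"
            + "</xsl:stylesheet>";

    static String simplify(String rssXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(rssXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not transform the RSS feed", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample feed, just enough structure to exercise the stylesheet.
        String rss = "<rss version=\"2.0\"><channel><title>Example feed</title>"
                + "<lastBuildDate>Mon, 21 Mar 2011 22:54:06 +0000</lastBuildDate>"
                + "<item><title>First post</title></item></channel></rss>";
        System.out.println(simplify(rss));
    }
}
```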

Final thoughts

I was able to put this example together in about 20 minutes. That's quite fast if you compare it to the old style of processing in Cocoon.
I think the Cocoon team is well on its way to reaching these goals. Because you are now able to write the processing logic with just some Java, it is easy to integrate with any existing Java based framework.
I'm curious what else will get into the first official Cocoon 3 release, because it's already quite powerful. From now on I will be using Cocoon 3 when I need to process large (and small) XML files. With the new Java based model it's easy to create a small but powerful processor.

For those interested in the source code, you can find the code on GitHub.