11 March 2014

Looking back at 2013

This post has been in draft mode for a couple of months now. I don't know why it actually took this long to publish. Last year was a great year and a lot of things happened, so here goes.
 

Professionally


2013 was the year that I celebrated 10 years at Hippo. It has been a great journey so far and I can still remember the first couple of days when I started in our tiny office in the north of Amsterdam. Back then there were 7 or 8 of us, but now there are about 80. I could be off a bit, because I've honestly lost count.

Over the last year I helped out several new and existing customers: I kick-started a new project at the University of Leiden, returned to the Dutch Ministry of General Affairs to set up a large-scale multi-tenant platform based on Hippo CMS, and kick-started the development team of a big broadcasting company.

In close collaboration with the US office I also had quite a few calls and some interesting architectural discussions with new customers. We see a lot of traction for Hippo CMS in the US, so I think 2014 will bring many new and interesting challenges.

Speaking

One of my personal goals for the second half of 2013 was to give Hippo more visibility by speaking at conferences, meetups, etc. It turned out a lot better than I expected: I gave 5 public presentations between June and December, at conferences like the GOTO conference and NoSQL Matters.

Here are two of the talks I gave:
Here you can see the recording of my talk about how we use Couchbase and Elasticsearch for real-time visitor analysis.


Personal

On the personal front 2013 was a year with ups and downs. The biggest downside was my health: nothing really serious, but I think I was sick as often last year as in the previous 5 years combined.

One definitely positive thing was seeing my 1-year-old son grow up and watching him explore the world. As a 'new' parent, seeing your son go from crawling to walking to running is amazing! For him life is still one big adventure. Too bad that as adults we can't remember much of our first couple of years. It must be insanely impressive.

Also in 2013 I picked up running again. Running was my passion when I was 18: I ran at least once a week, with a minimum distance of 8 km (about 5 miles). Getting started again took some effort. A lot of exercise, of course, but running apps kept me motivated.
Anyway, when I started last year I knew it was going to be a tough challenge, because my fitness was close to zero. I had a goal to work towards: in September we were going to run the Dam tot Dam loop with a big group of Hippos. The total distance of the Dam tot Dam loop is 16.1 km (10 miles), so getting into shape was very important. Because of my health issues my preparation was not as successful as I had hoped at the beginning of the year, but I managed to complete the 16.1 km after all and was really happy with that! It gave me a goal for 2014.



The Hippo running crew before we started the race (when we still had a lot of energy!)


Looking forward

So what's next? Well, I don't know yet, but business-wise I think it will be a great year with the upcoming 7.9 version of Hippo CMS, and personally I will keep training and make another attempt to break my Dam tot Dam loop record!

19 February 2014

Using Markdown with the Maven site plugin

I find that generating Maven project documentation is always a bit cumbersome with the default XDOC or APT ("Almost Plain Text") syntaxes. This probably has to do with my getting used to Markdown on GitHub, where it is more or less the de facto standard.

While writing some documentation for a new Hippo CMS plugin the other day, I noticed that the Maven site plugin already supports the Markdown syntax and that it's actually quite easy to set up, but the doxia-module-markdown documentation is a bit limited. With this post I hope to shed some more light and help you get going with using Markdown for writing documentation.

First up, we need to define the maven-site-plugin in our project's pom.xml file. Starting with version 3.3, the maven-site-plugin already includes the doxia-module-markdown. However, for this post I want to use the latest version of that module (1.5 at the moment), so I have to declare it explicitly as a plugin dependency in my POM file.

<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-site-plugin</artifactId>
    <version>3.3</version>
    <dependencies>
      <dependency>
        <groupId>org.apache.maven.doxia</groupId>
        <artifactId>doxia-module-markdown</artifactId>
        <version>1.5</version>
      </dependency>
    </dependencies>
  </plugin>
</plugins>

Next, create the directory src/site/markdown, which will hold our Markdown files. Make sure the files have the .md extension.

Now let's start with a simple file called index.md, which goes into the markdown folder. To prove that the Markdown syntax gets rendered, we can use the following snippet as its content.

A First Level Header
====================

A Second Level Header
---------------------

Now is the time for all good men to come to
the aid of their country. This is just a
regular paragraph.

The quick brown fox jumped over the lazy
dog's back.

### Header 3

> This is a blockquote.
> 
> This is the second paragraph in the blockquote.
>
> ## This is an H2 in a blockquote 
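If you'd rather script these two steps (creating src/site/markdown and adding index.md), a minimal Java sketch could look like the one below. The class name is made up, and the content is a shortened version of the snippet above; the maven-site-plugin only cares that the .md files end up in that folder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: scaffolds src/site/markdown/index.md under a project directory.
class MarkdownSiteScaffold {

    // A shortened version of the sample Markdown content used in this post.
    static final String SAMPLE = String.join("\n",
            "A First Level Header",
            "====================",
            "",
            "A Second Level Header",
            "---------------------",
            "",
            "> This is a blockquote.",
            "");

    /** Creates src/site/markdown if needed, writes index.md and returns its path. */
    static Path scaffold(Path projectDir) throws IOException {
        Path markdownDir = projectDir.resolve("src/site/markdown");
        Files.createDirectories(markdownDir);          // no-op if it already exists
        Path index = markdownDir.resolve("index.md");  // the .md extension matters
        Files.write(index, SAMPLE.getBytes(StandardCharsets.UTF_8));
        return index;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory, e.g. the folder containing pom.xml.
        Path project = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Wrote " + scaffold(project));
    }
}
```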
 
Now start the maven-site-plugin from the command-line:

$ mvn site:run

and point your browser to http://localhost:8080/ and see the beautiful result!

A concrete implementation can be found on the Hippo forge.



05 September 2013

The mystery of the Bootstrap application during a Maven build on Mac OS

In my day-to-day job I'm a Java coder working on a MacBook Pro running OS X (Mountain Lion), and recently one thing started to really annoy me. During an Apache Maven build an application occasionally pops up in my OS X dock, and while I'm browsing the web or composing an e-mail, the focus moves to the newly started application. In my case these applications are usually called Bootstrap or ForkedBooter.

I asked around a little whether any of my fellow coders experienced this as well, and it seems they did, but nobody had taken the time to figure out what was going on. The answers are out there on the web, but you really need to know what to search for before finding a proper answer.

ForkedBooter

If you see the ForkedBooter application pop up in your dock this is most likely due to the maven-surefire-plugin which is being executed during the test phase of the Maven build lifecycle.

It's actually quite easy to stop this application from popping up in the OS X dock by telling Maven to run Java in headless mode. To do so I've added the following line to the .bash_profile file in my user's home directory, which in my case is /Users/jreijn.

export MAVEN_OPTS="-Xms256m -Xmx512m -XX:PermSize=64m -XX:MaxPermSize=256m -Djava.awt.headless=true"

The -Djava.awt.headless=true system property tells Maven, and the plugins that pick up MAVEN_OPTS, to run Java in headless mode.
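To double-check that the setting actually reaches a JVM, for example the one surefire forks for your tests, you could print what AWT thinks. This is just a small sketch: the HeadlessCheck class is made up, but GraphicsEnvironment.isHeadless() is the standard JDK method behind this setting. As far as I know the JDK caches that result on first use, so the property has to be present when the JVM starts, which is exactly what a -D option gives you.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical helper: reports whether this JVM runs in AWT headless mode.
class HeadlessCheck {

    /** The raw value of the java.awt.headless system property, or "(not set)". */
    static String headlessProperty() {
        return System.getProperty("java.awt.headless", "(not set)");
    }

    /** True if AWT considers this JVM headless (no display, keyboard or mouse). */
    static boolean isHeadless() {
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("java.awt.headless = " + headlessProperty());
        System.out.println("GraphicsEnvironment.isHeadless() = " + isHeadless());
    }
}
```

Running it with java -Djava.awt.headless=true HeadlessCheck should report headless mode without anything appearing in the dock.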

This should resolve the ForkedBooter popping up in the OS X dock.

Bootstrap

The Bootstrap application showing up in the dock is more specific: it originates from Apache Tomcat being started somewhere during the Maven build (org.apache.catalina.startup.Bootstrap is Tomcat's startup class). In my case this was because at Hippo we use the cargo-maven2-plugin to fire up Apache Tomcat and run the CMS and site web applications inside a Tomcat instance.

There are several ways of solving this. One option I found was to edit Tomcat's conf/catalina.properties file and add the following line at the end of the file.

java.awt.headless=true
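Why does a single line in catalina.properties help at all? As far as I know, Tomcat loads that file at startup and registers every entry as a JVM system property, so the effect is the same as passing -Djava.awt.headless=true. The sketch below mimics that behaviour with java.util.Properties; it is not Tomcat's actual code, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Simplified, hypothetical sketch of the catalina.properties behaviour:
// every key=value entry is loaded and copied into the JVM system properties.
class CatalinaPropertiesSketch {

    /** Loads key=value pairs (comments are skipped) and registers each as a system property. */
    static int registerAsSystemProperties(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        for (String name : props.stringPropertyNames()) {
            System.setProperty(name, props.getProperty(name));
        }
        return props.size();
    }

    public static void main(String[] args) throws IOException {
        // The same line we appended to conf/catalina.properties.
        registerAsSystemProperties(new StringReader("java.awt.headless=true\n"));
        System.out.println(System.getProperty("java.awt.headless")); // true
    }
}
```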

For a standalone Tomcat instance this solution is fine, but you could also add the setting (as -Djava.awt.headless=true) to the catalina.sh or startup.sh scripts.

When you use the Maven Cargo plugin, however, the container might be reinstalled on every build, which overwrites your changes. Again, there are two (or more) approaches to solve this problem.

The first approach would be to add the catalina.properties file to your local project and copy it over when cargo installs the container.

<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.cargo</groupId>
      <artifactId>cargo-maven2-plugin</artifactId>
      <configuration>
        <configuration>
          <configfiles>
            <configfile>
              <file>${project.basedir}/conf/catalina.properties</file>
              <todir>conf/</todir>
              <tofile>catalina.properties</tofile>
            </configfile>
          </configfiles>
        </configuration>
      </configuration>
    </plugin>
  </plugins>
</build>

The problem with this approach is that you keep a local copy inside your project, which you have to re-check whenever you upgrade the Cargo plugin or the container, and which might not work when you switch to a container other than Tomcat.

The other, simpler approach, which works across multiple containers, is to add a system property to the Cargo plugin configuration.

<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.cargo</groupId>
      <artifactId>cargo-maven2-plugin</artifactId>
      <configuration>
        <container>
          <systemProperties>
            <java.awt.headless>true</java.awt.headless>
          </systemProperties>
        </container>
      </configuration>
    </plugin>
  </plugins>
</build>

This way the system property is added to the Java run-time when starting up Tomcat from cargo and the Bootstrap application does not pop up anymore inside the OS X dock.

I hope this post helps those of you who have been searching for the same answers without finding them.

References


Stack Overflow: Any idea why org.apache.catalina.startup.Bootstrap pops up in dock on Mac?
Jenkins JIRA: on OSX a java icon jump on dock for all starting maven build and takes focus

22 July 2013

Real-time visitor analysis with Couchbase, Elasticsearch and Kibana

At Hippo we recently started using Couchbase as the storage solution for our targeting/relevance module. Couchbase is a high-performance NoSQL database which, since version 2.0, can be used as a (JSON) document database. Couchbase is really fast when it comes to simple CRUD operations, but it lacks some search capabilities, like geospatial search (still in 'experimental' mode) and free-text search, which you might find in other document-oriented NoSQL databases like MongoDB.

However, these missing search capabilities can be compensated for quite easily by combining Couchbase with Elasticsearch through the Couchbase-Elasticsearch transport plugin. The plugin uses Couchbase's built-in cross data center replication mechanism (XDCR), which is normally used to replicate data between Couchbase clusters. In effect, it makes Elasticsearch act like just another Couchbase cluster.

In this post we will go through all the steps needed to set up Couchbase, Elasticsearch and Kibana for 'real-time' visitor analysis.

If you are familiar with Logstash, you might wonder why we use Couchbase as additional storage for our request data. Well, that's because with Hippo CMS we store more than just the request log information. We also store information about a visitor across multiple requests, with regard to (content) characteristics and persona-based matching. We need a cluster-wide, high-performance database for that, which is why we use Couchbase as a first layer of storage.

Setting up Couchbase

As mentioned before, at Hippo we use Couchbase as our storage solution. For installation instructions please see the official Couchbase download page. Couchbase uses data buckets for storage, and there are two kinds of buckets available: 'couchbase' buckets and 'memcached' buckets. For this specific use case you will need to create a bucket of type 'couchbase' called 'targeting'. Buckets of type 'couchbase' allow you to store JSON documents and, for instance, perform map-reduce functions on the data in a bucket.

In this bucket we will be storing request data. An example of a request document could look similar to this:

{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://www.mydomain.com/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://www.mydomain.com/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}

The above snippet is taken from the request log of our documentation website. As you can see, our relevance/targeting module collects data about visitors (like geo data, the channel a visitor is browsing, etc.), and this data is stored in Couchbase as a JSON document.
Now that we have this data in our database, we would like to slice it and see what our visitors are doing over time.
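Two fields in this document deserve a closer look, because we will map them explicitly later on: timestamp holds milliseconds since the Unix epoch, and remoteAddr is an IPv4 address. Elasticsearch's ip type indexes an IPv4 address in numeric form, so it can be sorted and range-queried. The sketch below (class name hypothetical) shows both representations for the values in the example document.

```java
import java.time.Instant;

// Hypothetical sketch: what the "timestamp" and "remoteAddr" values represent.
class RequestDocFields {

    /** The timestamp is epoch milliseconds; render it as an ISO-8601 UTC instant. */
    static String toIso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    /** Packs a dotted IPv4 address into an unsigned 32-bit number held in a long. */
    static long ipv4ToLong(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            value = (value << 8) | octet; // shift previous octets left, append this one
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toIso(1371419505909L));   // 2013-06-16T21:51:45.909Z
        System.out.println(ipv4ToLong("127.0.0.1")); // 2130706433
    }
}
```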

Elasticsearch

Elasticsearch is probably one of the most rapidly adopted technologies. It is used by companies like GitHub, Stack Overflow and Foursquare. For those of you not yet familiar with Elasticsearch: it's a distributed (JSON-based) document store with advanced query capabilities, often used for distributed search and analytics.

If you already have Elasticsearch installed on your machine you can skip this step; if you don't, let's start by installing it. For ease of use I've written down all the manual command-line steps, so they are easy to reproduce.
$ curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.2.tar.gz
This will download Elasticsearch 0.90.2 to your local directory. Now let's unpack it and move on to installing the Couchbase plugin.
$ tar xzvf elasticsearch-0.90.2.tar.gz 

Adding the Couchbase transport plugin

Since we're using the latest version of Elasticsearch, we need to build the Couchbase transport plugin from GitHub manually. This part is a little more tricky, but still quite easy to do. Let's first do a checkout of the transport plugin from GitHub.
$ git clone https://github.com/couchbaselabs/elasticsearch-transport-couchbase.git
Since the transport plugin depends on a SNAPSHOT of the couchbase-capi-server module we will need to build that locally as well.
$ git clone https://github.com/couchbaselabs/couchbase-capi-server.git
Now let's build the capi server first.
$ cd couchbase-capi-server/ && mvn clean install
This installs the dependency into your local Maven repository for later use.
Now switch back to the transport plugin directory and create the plugin package:
$ cd ../elasticsearch-transport-couchbase/
$ mvn clean package
Now that both dependencies have been built, we can continue with installing the plugin into Elasticsearch. Let's first switch back to our Elasticsearch install directory.
$ cd ../elasticsearch-0.90.2/
Due to an issue with the Elasticsearch 0.90.2 plugin manager, we need to install the plugin without it. Let's first manually create the transport-couchbase plugin directory.
$ mkdir plugins/transport-couchbase
$ cd plugins/transport-couchbase
Next we need to unzip the plugin archive we just created.
$ unzip [/path/to/your/githubcheckout/]elasticsearch-transport-couchbase/target/releases/elasticsearch-transport-couchbase-1.0.1-SNAPSHOT.zip
Now we also need to set a proper username and password for the connection from Couchbase (needed for replication).
$ cd ../../ 
$ echo "couchbase.password: password" >> config/elasticsearch.yml
$ echo "couchbase.username: Administrator" >> config/elasticsearch.yml 
Now that everything is in place we can fire up Elasticsearch.
$ ./bin/elasticsearch -f
During startup you should see something similar to:
[2013-07-22 10:57:43,940][INFO ][transport.couchbase] [Toro] bound_address {inet[0.0.0.0/0.0.0.0:9091]}, publish_address {inet[/10.10.100.156:9091]} 
If transport.couchbase shows up in the startup log, the Couchbase transport plugin started correctly and is listening on port 9091 (you will need this later on).
That's it for getting the connection up and running. For more information about configuring the transport plugin please read the official documentation.

Storing data in Elasticsearch

Let's first create a new index to hold our data.
$ curl -XPUT 'http://localhost:9200/targeting'
If you follow the official Couchbase transport plugin documentation, you will also need to import the Couchbase mapping file, so that Elasticsearch knows how to index (or not index) certain fields. The transport plugin comes with a mapping for Couchbase documents which marks all documents coming from Couchbase as indexed but not stored within Elasticsearch. This default exists because Couchbase itself is really fast at retrieving documents, so Elasticsearch does not need to store them. That's fine for most use cases, but we would like to view our data in Kibana later on, so we need to create our own simple mapping for documents coming from Couchbase.

Now that the index has been created, we need to add the mapping for our documents of type couchbaseDocument.
$ curl -XPUT 'http://localhost:9200/targeting/couchbaseDocument/_mapping' -d '{
    "couchbaseDocument": {
        "properties": {
            "doc": {
                "properties": {
                    "timestamp": {
                        "type": "date"
                    },
                    "remoteAddr": {
                        "type": "ip"
                    },
                    "collectorData": {
                        "properties": {
                            "channel": {
                                "type": "string",
                                "index": "not_analyzed"
                            },
                            "audience": {
                                "properties": {
                                    "terms": {
                                        "type": "string",
                                        "index": "not_analyzed"
                                    }
                                }
                            },
                            "categories": {
                                "properties": {
                                    "terms": {
                                        "type": "string",
                                        "index": "not_analyzed"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    } 
}'
You should see:
{"ok":true,"acknowledged":true}
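If you'd rather do this setup from Java than with curl, the JDK's own HTTP client (Java 11 and later) can send the same two PUT requests. This is a sketch that assumes Elasticsearch is running on localhost:9200; the class name is made up, and to keep it short the mapping body only contains the timestamp and remoteAddr fields, so paste the full mapping from above for real use.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch (Java 11+): the two curl calls above, sent with java.net.http.
class TargetingIndexSetup {

    static final String BASE = "http://localhost:9200/targeting";

    // Shortened mapping: only timestamp (date) and remoteAddr (ip).
    static final String MAPPING = "{\"couchbaseDocument\":{\"properties\":{\"doc\":{\"properties\":{"
            + "\"timestamp\":{\"type\":\"date\"},"
            + "\"remoteAddr\":{\"type\":\"ip\"}}}}}}";

    /** PUT /targeting : creates the index. */
    static HttpRequest createIndex() {
        return HttpRequest.newBuilder(URI.create(BASE))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** PUT /targeting/couchbaseDocument/_mapping with the JSON mapping as request body. */
    static HttpRequest putMapping(String mappingJson) {
        return HttpRequest.newBuilder(URI.create(BASE + "/couchbaseDocument/_mapping"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(mappingJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (HttpRequest request : new HttpRequest[] {createIndex(), putMapping(MAPPING)}) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(request.method() + " " + request.uri() + " -> " + response.body());
        }
    }
}
```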
 
The above mapping maps certain fields of our example request log document and tells Elasticsearch how to index those specific fields. With the mapping in place we can go into the Couchbase Admin Console and create a reference to our Elasticsearch cluster. Keep in mind that port 9091 is the port we've seen before when starting Elasticsearch with the couchbase-transport plugin.



Next we need to set up replication from Couchbase to Elasticsearch. The target bucket in this case is the Elasticsearch index called 'targeting', which we created a few minutes ago.


When you press Replicate, Couchbase will start replicating your existing data into the targeting index in Elasticsearch. Now let's move on to our final step: setting up Kibana 3.


Analytics with Kibana 3


Kibana 3 is an open source (ASL 2.0) analytics and search interface, which you can use for any kind of timestamped data sets stored in Elasticsearch. It's really easy to use and gives you a visually attractive representation of the data in graphs, charts and (world)maps. With Kibana you can easily create your own dashboard that represents a nice overview of your dataset.

Kibana 3 is a very easy to install HTML + JavaScript application. It only requires a web server and a connection to Elasticsearch. For more information on how to install Kibana, see the introduction page. If you run Elasticsearch on a port other than 9200, you will need to change the config.js file and point it to your Elasticsearch instance. I'll skip the installation and move on to showing the data in our own dashboard.


Once the data is in Elasticsearch you can start adding panels to your dashboard. Adding a pie chart takes only a couple of clicks. Here is an example of how to add a pie chart panel based on the available channels (sites) within Hippo CMS.
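Under the hood, a pie chart like this is just a count per distinct value of one field, in this case collectorData.channel. Kibana gets those counts from Elasticsearch; the in-memory sketch below (hypothetical class and sample data) only illustrates what is being computed. It is also why the mapping marks channel as not_analyzed: with the default analyzer, 'English Website' would be split into separate terms, and the chart would show 'english' and 'website' slices instead.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a "terms" count per channel, done in memory.
class ChannelCounts {

    /** Counts occurrences of each channel name; TreeMap keeps the output sorted by name. */
    static Map<String, Long> countByChannel(List<String> channels) {
        Map<String, Long> counts = new TreeMap<>();
        for (String channel : channels) {
            counts.merge(channel, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the channel of each request in some time window.
        List<String> channels = List.of("English Website", "Dutch Website", "English Website");
        System.out.println(countByChannel(channels)); // {Dutch Website=1, English Website=2}
    }
}
```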



Once you've added a set of panels, you might end up with an entire dashboard of live streaming data.
One of the nice features of Kibana is that it can load a dashboard configuration from GitHub by using a gist URL. A dashboard I created for Hippo targeting data can be found at: https://gist.github.com/jreijn/5830593
It's a nice example of a dashboard, based on request log information from our online documentation website.
Well, that's it. In this post we've seen how to set up Couchbase, Elasticsearch and Kibana to perform real-time visitor analysis for your website or web application. Last month at the Hippo GetTogether 2013 I gave a live demonstration of what I've written here. When the video becomes available I'll add a short update or write a new post, so you can hear more about our relevance module and the integration with Couchbase.


19 February 2013

Hippo Fridays @ Hippo

At Hippo we have a concept we call 'Hippo Fridays'. Hippo Fridays are monthly Fridays on which all Hippo developers can share knowledge, try out new things, work on improvements or hack on their own pet projects. We've been having Hippo Fridays for more than a year, and even if it's only one day a month, they are always great fun!

The other day, while leaving the office, I overheard one of my colleagues ask what the actual outcome of these Hippo Fridays is. Does something end up in the product? Well, let me share what has come out of the more recent Hippo Fridays and will end up in the upcoming Hippo CMS 7.8 release.


HTML5 History API in the CMS and Console

With the upcoming Hippo CMS 7.8 release both the CMS and the Console will make use of the HTML5 History API. This might sound a bit vague and technical, but it means that the CMS and Console will store the URLs of the documents you visited in your browser's history. That allows you to reach them again through your browser's history or by entering a direct URL in the browser's address bar. See the address bar in the picture below.

Multiple Console improvements

The more experienced Hippo users will probably notice some new options in the Console menu bar.
The Console UI was improved with some new features to benefit the user experience:
  • deletion of multiple nodes
  • keyboard shortcuts
  • open a node by path or UUID
  • use the arrow keys to navigate the tree  
The next image shows you all the keyboard shortcuts that are available in the Console.


Scripting support

With the upcoming 7.8 release we will also have scripting support straight from the CMS UI. This feature will be available to 'admin' users only. Scripting support focuses on running JCR runners/visitors from the CMS UI and helps you do bulk updates of documents or plain JCR nodes. The scripting support in CMS 7.8 was inspired by the Hippo CMS Groovy add-on, which started out as a prototype on a Hippo Friday.
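To make the runner/visitor idea a little more concrete: a bulk update walks a tree of nodes and applies the same change to every node that matches. The sketch below shows that pattern in plain Java; DemoNode, NodeUpdater and BulkUpdate are hypothetical stand-ins, not the JCR or Hippo CMS API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory node: a name, string properties and child nodes.
class DemoNode {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<DemoNode> children = new ArrayList<>();

    DemoNode(String name) { this.name = name; }

    /** Adds a child and returns this node, so calls can be chained. */
    DemoNode add(DemoNode child) { children.add(child); return this; }
}

// Hypothetical update callback, applied to each matching node.
@FunctionalInterface
interface NodeUpdater {
    /** Updates the node in place; returns true if something actually changed. */
    boolean update(DemoNode node);
}

class BulkUpdate {

    /** Depth-first walk that applies the updater to every node accepted by the filter. */
    static int run(DemoNode root, Predicate<DemoNode> filter, NodeUpdater updater) {
        int changed = 0;
        if (filter.test(root) && updater.update(root)) {
            changed++;
        }
        for (DemoNode child : root.children) {
            changed += run(child, filter, updater);
        }
        return changed;
    }

    public static void main(String[] args) {
        DemoNode news = new DemoNode("news");
        news.add(new DemoNode("article-1")).add(new DemoNode("article-2"));
        // Example update: mark every article as reviewed (a no-op if already marked).
        int changed = run(news,
                node -> node.name.startsWith("article"),
                node -> !"true".equals(node.properties.put("reviewed", "true")));
        System.out.println(changed + " nodes updated"); // 2 nodes updated
    }
}
```

Returning whether anything changed keeps the update safe to re-run: a second pass reports zero changes.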

 

SNEAK PREVIEW: Settings management

This next feature is one of my own pet projects. Those of you who are experienced with Hippo CMS know that it is very flexible and that you can configure almost everything. However, most of the configuration is done through the CMS Console.

With the settings management add-on there will be a new, user-friendly interface, and you might even discover some options you never knew existed. Since this is still under heavy development it will not end up in the CMS 7.8 release, but I will keep you posted when a first release is made, so you can try it out.



As you can see: What happens on Hippo Fridays does not stay on Hippo Fridays!