JCR Clustering

1 Introduction

Clustering allows to run an application on several parallel servers (also known as cluster nodes). In this topology, the load is distributed across different servers. A cluster brings a number of benefits:

  • Availability: Even if one server (node) crashes, the other nodes continue to serve and ensure that your system does not break down.
  • Scalability: The load is balanced across the different nodes and when you need more horse power, you simply add new nodes to the cluster
eXo JCR comes with support of built-in clustering. With JCR clustering it is possible to configure several instances of a Java Content Repository and use them as a single instance of JCR.

Load Balancing to a JCR cluster

On the picture above, a load balancer distributes requests to a bunch of cluster nodes. Changes made by a request are broadcasted to the other cluster participants. The key to JCR clustering is replication of changes among the different nodes. This article gives some explanations of the replication in eXo JCR and shows how to configure it.

1.1 Implementation

Replication in eXo JCR uses JGroups as the network communication layer between nodes. JGroups provides peer-to-peer network messaging between several instances of JCR. The WorkspaceDataReplicator component is dedicated to capturing, broadcasting a local log of changes. It is also used to receive the changes from other cluster nodes.

1.2 Replication modes

(TODO : detailed explanation of persistent vs proxy mode with diagrams)

There are two different replication modes, a persistent mode and a proxy mode:

  • persistent means that changes will be applied to the repository's persistent storage,
  • proxy means that only cache and indexes will be replicated (useful if you use single storage for all repository instances or an external replication mechanism)
This example shows a configuration in proxy mode that uses several tomcat application server instances and one database. See also How-to+Proxy+JCR+Cluster.

Proxy mode

2 Replication mechanisms

Beyond persistent or proxy replication eXo JCR uses other mechanisms to ensure the integrity of the data.

2.1 Recovery

Since version 1.9, eXo JCR features a recovery mechanism in order to automatically synchronize a node that has been disconnected from the cluster for some time. During the downtime, the cluster keeps serving and data changes may occur. When the node returns to the cluster, it needs to synchronize its last recorded state with the other participants. The process of synchronizing is called recovery of lost changes.

At startup the returning node asks other participants for the missed changes since last time it was up. It receives a log of the missed changes and attempts to restore the state. Once the restoration is over, the node is active and ready to serve.

jcrrecovery.png

The diagram shows a typical recovery sequence:

  1. The cluster is up and running.
  2. The red node gets out of the cluster (disconnected).
  3. The node attempts to return and requests the changelog. The node is orange because it is not serving.
  4. The node receives the changelog from other cluster nodes.
  5. The node is recovering while applying the changelog. The node is not ready to serve yet.
  6. The node has finished recovery. It is green because it is ready to serve. Back to step 1.
Warning: The change log request is based on system time. So it is critical that the servers are time-synchronized (with NTP for example).

2.2 Priorities

In a cluster, not all members may have the same importance. JCR replication service allows you to assign different a priority to each node. Based on priorities, JCR nodes are able to change their behavior when they detect a connection loss with other members. A priority is simply an integer value between 0 and 100. In a cluster, there is always one single node that has higher priority than the other nodes. That node is called the main node.

There are 2 modes for the priority mechanism which are called static and dynamic. The modes define different reactions in case the connection to the main node is lost. When a connection loss is detected, each node updates its status and switches JCR access to read-only or to read-write. In Read-write or RW mode a node does both updating the repository and giving read access. In read-only or RO mode the node does not update the repository data.

  • Connection to the main node is lost:
    • "static" : The main node remains read-write. The other nodes become read-only.
    • "dynamic" : The main node become read-only. The other nodes remain read-write.
  • Any node (that is not the main node) becomes suddenly isolated:
    • "static" : The main node remains read-write. The other nodes become read-only.
    • "dynamic" : The main node remains read-write. The other nodes become read-only.
  • Main node joins the cluster:
    • static and dynamic : All nodes become read-write.
Note that in a 2-nodes configuration, static and dynamic modes have the same behavior.

In the following table there is a cluster with three nodes, Node1 is the main node whose priority is 100.

Static modeNode1 (100)Node2 (50)Node3 (30)
Connection to the main node is lostRWRW > RORW > RO
The main node is backRWRO > RWRO > RW

This priority mechanism has an impact on how you startup and shutdown the cluster. Basically, the rule to follow is :

  • Startup cluster nodes in decreasing order of a priority - the main node at the beginning.
  • Shutdown cluster nodes in ascending order of a priority - the main node at the end.

3 Configuration

Warning: Always synchronize the system time of the cluster nodes using the same NTP-server.

This example of channel-config using point-to-point connections (TCP) for base transport and multicast PING (MPING) for initialize members. If you need only point-to-point connections, then you can use TCPPING + TCP You can get more details on the Replication configuration page

The configuration depends on the JCR version:

4 Initialization Procedure

The cluster requires different startup procedures to work properly.

4.1 Persistent mode

For a step-by-step tutorial, read the Configure a persistent JCR cluster.

4.1.1 First persistent cluster node startup

  • configure the ScratchWorkspaceInitializer in workspace configuration;
  • set the following parameters in replication config:
<property name="enabled" value="false"/>
<property name="mode" value="persistent"/>
  • run the application server;
  • stop the application server after the full launch;
  • set the following parameters in replication config:
<property name="enabled" value="true"/>
  • run the application server. The first cluster node operates normally;
  • start backups for all workspaces. The backup data is needed to initialize the other participants of the cluster.

4.1.2 Other cluster nodes startup

  • configure the BackupWorkspaceInitializer in workspace configuration. For each workspace, the relevant folder backup to take the first participant cluster.
<initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer"> 
  <properties> 
    <property name="restore-path" value="../temp/backup/repository_production-20080609_001009"/> 
  </properties> 
</initializer>
  • set the following parameters in replication config:
<property name="enabled" value="false"/> 
<property name="mode" value="persistent"/>
  • run the application server;
  • stop application server after the full launch;
  • set the following parameters in replication config:
<property name="enabled" value="true"/>
  • run the application server.

4.1.3 Adding a new cluster node

Notify the new node to the other nodes

  • add the new node in replication config for all cluster participants:
<property name="other-participants" value="cluster_node2;cluster_node3;NEW_NODE"/>
  • restart all cluster participants.
  • it is a precondition that the first member of the cluster is under backup, if not, you should first create a backup

Add a new node to the cluster

  • configure the BackupWorkspaceInitializer in the workspace configuration. For each workspace, the relevant folder backup to take the first participant cluster
<initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer"> 
            <properties> 
              <property name="restore-path" value="../temp/backup/repository_production-20080609_001009"/> 
            </properties> 
          </initializer>
  • set the following parameters in replication config:
<property name="enabled" value="false"/>
    <property name="mode" value="persistent"/>
  • run the application server;
  • stop application server after the full launch;
  • set the following parameters in replication config:
<property name="enabled" value="true"/>
  • run the application server.

4.2 Proxy mode

For a step-by-step tutorial, read the How to configure a proxy JCR cluster

4.2.1 First proxy cluster node startup

  • configure the ScratchWorkspaceInitializer in workspace configuration;
  • set the following parameters in replication config:
<property name="enabled" value="false"/>
    <property name="mode" value="proxy"/>
  • run the application server;
  • stop application server after the full launch;
  • set the following parameters in replication config:
<property name="enabled" value="true"/>
  • run the application server.

4.2.2 Others cluster members startup

  • same steps as for first member startup

4.2.3 Adding a new node to the cluster

Configure another node to know the new one

  • the add new node in replication config for all cluster participants:
<property name="other-participants" value="cluster_node2;cluster_node3;NEW_NODE"/>
  • restart the all cluster participants. The all cluster participants is operate normally.

4.2.4 Add new member to the cluster

  • configure the ScratchWorkspaceInitializer in workspace configuration;
  • set the following parameters in replication config:
<property name="enabled" value="false"/>
    <property name="mode" value="proxy"/>
  • run the application server;
  • stop application server after the full launch;
  • set the following parameters in replication config:
<property name="enabled" value="true"/>
  • run the application server.

5 Cluster configuration HOW-TOs

Recently Modified

Creator: Gennady Azarenkov on 05/23/2007
Copyright (c) 2000-2009. Allright reserved - eXo platform SAS
1.6.13286