Content Replication

Content Replication allows administrators to specify which subsets of content will be replicated automatically to repositories running elsewhere in a distributed environment. This allows you, for instance, to restrict the replication of confidential documents to domains within the firewall or to replicate localized language versions to localized servers.

New in version: 7.3

Enabling Content Replication

Content Replication is enabled by setting the Java system property replication.config. The value must point to the replication configuration file, which can be specified relative to the context path, as in /rel/path/replication.xml, or as an absolute path, as in file://abs/path/replication.xml.
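As a rough illustration of the two forms, the sketch below checks whether the property is set and which form its value takes. On a typical JVM the property would be passed on the command line as -Dreplication.config=... . The class ReplicationConfigSketch and its methods are hypothetical names used only for this example; they are not part of the repository API, and how the repository itself resolves the value is not shown here.

```java
/** Hypothetical helper for illustration only; not part of the repository API. */
final class ReplicationConfigSketch {

    /** Name of the system property described above. */
    static final String PROPERTY = "replication.config";

    /** Returns true when the property is set to a non-blank value. */
    static boolean isEnabled() {
        String value = System.getProperty(PROPERTY);
        return value != null && !value.trim().isEmpty();
    }

    /** Returns true for the file: form, false for a context-relative path. */
    static boolean isFileUrl(String value) {
        return value.startsWith("file:");
    }
}
```

With neither form set, isEnabled() returns false, which corresponds to Content Replication being switched off.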

Configuration

The configuration file looks as follows:

 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Replication SYSTEM "replication-1.0.dtd">
<Replication>
  <Journal class="org.hippoecm.repository.replication.ReplicationJournal">
    <param name="directory" value="${rep.home}/replication"/>
    <param name="basename" value="journal"/>
    <param name="maximumSize" value="1048576"/>
    <param name="localChangesOnly" value="false"/>
  </Journal>

  <ReplicatorNodes>
    <ReplicatorNode>
      <param name="id" value="replicator-1"/>
      <param name="syncDelay" value="5000"/>
      <param name="stopDelay" value="5000"/>
      <param name="retryDelay" value="5000"/>
      <param name="maxRetries" value="5"/>
      
      <Replicator class="org.hippoecm.repository.replication.replicators.JCRReplicator">
        <param name="repositoryAddress" value="rmi://localhost:1199/hipporepository"/>
        <param name="username" value="admin"/>
        <param name="password" value="supersecret"/>
      </Replicator>
      <Filter class="org.hippoecm.repository.replication.filters.PathFilter">
        <param name="replicate" value="/content"/>
        <param name="exclude" value="/content/intranet,/content/assets/secret"/>
      </Filter>
      <Filter class="org.hippoecm.repository.replication.filters.PropertyFilter">
        <param name="exclude" value="notme=,notthis,onlynot=this"/>
      </Filter>
      <Filter class="org.hippoecm.repository.replication.filters.PublishedOnlyFilter"/>
    </ReplicatorNode>
  </ReplicatorNodes>
</Replication>

 

The config file must contain one Journal section and one or more ReplicatorNode sections. Each ReplicatorNode section must contain one Replicator section and can contain one or more Filter sections.

 

The Journal parameters:

 

  • directory, the location of the journal files
  • basename, the base name of the journal files
  • maximumSize, the maximum size a journal file may reach before it is rotated
  • localChangesOnly, whether to replicate only the changes made on this cluster node (true) or all changes made in the cluster (false)

 

The ReplicatorNode parameters:

 

  • id, a unique id for this replicator node
  • syncDelay, how often the journal is checked for changes, in milliseconds
  • stopDelay, the grace period for shutdown, in milliseconds
  • retryDelay, the delay in milliseconds before the next retry after a replication action has failed
  • maxRetries, the maximum number of tries before giving up on replicating the specific changes in the revision
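The interplay of the retry delay and the retry limit can be sketched as a simple loop. This is a minimal illustration, not the repository's implementation: the class RetrySketch, the method replicateWithRetry and the parameter names are hypothetical, and the sketch assumes the limit counts the total number of tries, as the parameter description suggests.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical retry loop for illustration only. */
final class RetrySketch {

    /**
     * Runs the attempt until it succeeds or maxTries is reached.
     * Returns the number of tries used on success, or -1 when giving up
     * (or when the waiting thread is interrupted).
     */
    static int replicateWithRetry(BooleanSupplier attempt, int maxTries, long retryDelayMillis) {
        for (int tries = 1; tries <= maxTries; tries++) {
            if (attempt.getAsBoolean()) {
                return tries; // replication succeeded
            }
            if (tries < maxTries) {
                try {
                    Thread.sleep(retryDelayMillis); // wait before the next try
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    return -1;
                }
            }
        }
        return -1; // all tries failed; give up on this set of changes
    }
}
```

With the values from the sample configuration, a failing replication action would be tried at most 5 times, with 5000 milliseconds between tries.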

 

 

The JCRReplicator parameters:

  • repositoryAddress, the RMI address of the remote repository
  • username, the name of the user that is used for replication
  • password, the password

 

The PathFilter parameters: 

  • replicate, a comma-separated list of paths to replicate
  • exclude, a comma-separated list of paths that are excluded from replication
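How the two lists might combine can be sketched as follows. This is only an illustration under stated assumptions: the class PathFilterSketch is hypothetical, a listed path is assumed to cover the node itself and its whole subtree, and an exclude entry is assumed to win over a replicate entry. The actual PathFilter may behave differently in edge cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical path filter for illustration only. */
final class PathFilterSketch {

    private final List<String> replicate;
    private final List<String> exclude;

    PathFilterSketch(String replicate, String exclude) {
        this.replicate = parse(replicate);
        this.exclude = parse(exclude);
    }

    /** Splits a comma-separated value into trimmed, non-empty paths. */
    static List<String> parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** True when path is root itself or lies below it (assumed subtree semantics). */
    static boolean isAtOrBelow(String path, String root) {
        String prefix = root.endsWith("/") ? root : root + "/";
        return path.equals(root) || path.startsWith(prefix);
    }

    /** A path is accepted when it is covered by replicate and not by exclude. */
    boolean accepts(String path) {
        boolean included = replicate.stream().anyMatch(r -> isAtOrBelow(path, r));
        boolean excluded = exclude.stream().anyMatch(e -> isAtOrBelow(path, e));
        return included && !excluded;
    }
}
```

For the sample configuration, new PathFilterSketch("/content", "/content/intranet,/content/assets/secret") would accept /content/news and reject /content/intranet/hr under these assumptions.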

The PropertyFilter parameters: 

  • exclude, a comma-separated list of properties, or properties with a specific value, that exclude a node from replication. If no value is specified, all nodes that have the property set are excluded.
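The exclude rules can be read as a small matching table, sketched below. The class PropertyFilterSketch is hypothetical and simplifies node properties to a map of single string values. It also assumes that an entry with a trailing equals sign, such as notme= in the sample configuration, matches only the empty string; the source does not state how that form is interpreted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical property filter for illustration only. */
final class PropertyFilterSketch {

    /** Property name mapped to the excluding value, or null when any value excludes. */
    private final Map<String, String> rules = new LinkedHashMap<>();

    PropertyFilterSketch(String exclude) {
        for (String entry : exclude.split(",")) {
            String rule = entry.trim();
            if (rule.isEmpty()) {
                continue;
            }
            int eq = rule.indexOf('=');
            if (eq < 0) {
                rules.put(rule, null); // "name": exclude whenever the property is set
            } else {
                // "name=value": exclude only on this value ("name=" is assumed to mean empty)
                rules.put(rule.substring(0, eq), rule.substring(eq + 1));
            }
        }
    }

    /** True when at least one rule matches the given node properties. */
    boolean excludes(Map<String, String> nodeProperties) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (!nodeProperties.containsKey(rule.getKey())) {
                continue;
            }
            String required = rule.getValue();
            if (required == null || required.equals(nodeProperties.get(rule.getKey()))) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a node with onlynot set to this is excluded, while a node with onlynot set to any other value is still replicated.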

How Content Replication works

 

When Content Replication is enabled, the repository creates a file journal to which all changes are written. The journal records only which nodes and properties have been added, deleted, or modified; it does not contain the values themselves.

 

Each replicator node periodically and asynchronously checks the journal for updates. When new updates are found, they are handed over to the actual replicator, which performs the replication. The JCRReplicator connects over RMI to the remote repository and replicates the changes to it. The replicator checks the filters to decide whether a change needs to be replicated. Currently, filters are available for path, properties, and publication state; the publication state filter replicates only published (live) documents.
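The journal and polling mechanism can be pictured with a small in-memory model. This is a sketch only: the real journal is file based, and the names JournalSketch, Change, record, changesAfter and poll are hypothetical. It shows the two points made above: journal entries carry paths and change types but no values, and a replicator node hands over only the entries that are newer than the last revision it processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for the file journal; illustration only. */
final class JournalSketch {

    enum ChangeType { ADDED, MODIFIED, DELETED }

    /** One journal entry: what changed and where, but not the new value. */
    static final class Change {
        final long revision;
        final String path;
        final ChangeType type;

        Change(long revision, String path, ChangeType type) {
            this.revision = revision;
            this.path = path;
            this.type = type;
        }
    }

    private final List<Change> entries = new ArrayList<>();

    /** Appends a change and returns its revision number (1, 2, 3, ...). */
    synchronized long record(String path, ChangeType type) {
        long revision = entries.size() + 1;
        entries.add(new Change(revision, path, type));
        return revision;
    }

    /** Returns all changes with a revision greater than the given one. */
    synchronized List<Change> changesAfter(long revision) {
        List<Change> result = new ArrayList<>();
        for (Change change : entries) {
            if (change.revision > revision) {
                result.add(change);
            }
        }
        return result;
    }

    /** One polling pass: hands new changes to the replicator, returns the last revision seen. */
    static long poll(JournalSketch journal, long lastSeen, Consumer<Change> replicator) {
        long newest = lastSeen;
        for (Change change : journal.changesAfter(lastSeen)) {
            replicator.accept(change);
            newest = change.revision;
        }
        return newest;
    }
}
```

In this model a replicator node would call poll once per sync interval and keep the returned revision for the next pass, so a pass with no new journal entries hands nothing to the replicator.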

Initial Startup

For the initial startup, it is best to start the slaves first, because the master will try to push the changes to the slaves. On the slaves, the paths that are going to be replicated must be removed; the master will recreate the content with the correct internal node IDs. The master repository replicates all the "missing" content when it is triggered to do its first replication.
