Repository Maintenance - Enterprise Java Content management system - Hippo CMS

Repository Maintenance

Removing old revisions from the repository journal

Jackrabbit 2 uses a journal in the database for cluster messaging. This journal can grow quite large over time and it can be useful to clean the old journal entries periodically. How fast the table grows depends on the amount of writes that are done to the repository.

The table REPOSITORY_LOCAL_REVISIONS holds the information up to which journal entry the node in the cluster has caught up. Entries older than the lowest revision in that table can be safely deleted from the REPOSITORY_JOURNAL tables. 

Show revisions in the database:

mysql> SELECT * FROM REPOSITORY_LOCAL_REVISIONS;
+---------------+-------------+
| JOURNAL_ID    | REVISION_ID |
+---------------+-------------+
| cms-node1     |     2302839 |
| cms-node2     |     2302839 |
| site-node1    |     2302839 |
| site-node2    |     2302839 |
+---------------+-------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM REPOSITORY_GLOBAL_REVISION;
+-------------+
| REVISION_ID |
+-------------+
|     2302839 |
+-------------+
1 row in set (0.00 sec)

The entries in the JOURNAL_ID correspond to the jackrabbit cluster id of the repository nodes.

There are two ways to periodically clean the old entries: using sql or let jackrabbit handle the cleanup.

Remove old revisions using SQL

If you want to safely remove old entries from the journal table you can run the following query:

MySQL REPOSITORY_JOURNAL table cleanup:

mysql> DELETE FROM REPOSITORY_JOURNAL WHERE REVISION_ID < ANY (SELECT min(REVISION_ID)
       FROM REPOSITORY_LOCAL_REVISIONS);
Query OK, 0 rows affected (0.00 sec)

mysql> OPTIMIZE TABLE REPOSITORY_JOURNAL;
+-----------------------------+----------+----------+----------------------------+
| Table                       | Op       | Msg_type | Msg_text                   |
+-----------------------------+----------+----------+----------------------------+
| hippocms.REPOSITORY_JOURNAL | optimize | note     | Table does not support     |
|                             |          |          | optimize, doing recreate + |
|                             |          |          | analyze instead            |
| hippocms.REPOSITORY_JOURNAL | optimize | status   | OK                         |
+-----------------------------+----------+----------+----------------------------+
2 rows in set (0.03 sec)

Caveats

Remove a node from the cluster

When a cluster node is removed from the cluster, the entry in the REPOSITORY_LOCAL_REVISIONS is not removed for that node. This has to be done manually if you want to safely remove the old revisions.

Adding a node to the cluster

When a new node is added to the cluster it will try to read the entire REPOSITORY_JOURNAL table. If the table is very large this can lead to out of memory errors on startup. To safely add a new node to a cluster it is advisable to clean up the old revisions first.

Repository Maintenance

Removing old revisions from the repository journal

Jackrabbit 2 uses a journal in the database for cluster messaging. This journal can grow quite large over time and it can be useful to clean the old journal entries periodically. How fast the table grows depends on the amount of writes that are done to the repository.

The table REPOSITORY_LOCAL_REVISIONS holds the information up to which journal entry the node in the cluster has caught up. Entries older than the lowest revision in that table can be safely deleted from the REPOSITORY_JOURNAL tables. 

Show revisions in the database:

mysql> SELECT * FROM REPOSITORY_LOCAL_REVISIONS;
+---------------+-------------+
| JOURNAL_ID    | REVISION_ID |
+---------------+-------------+
| cms-node1     |     2302839 |
| cms-node2     |     2302839 |
| site-node1    |     2302839 |
| site-node2    |     2302839 |
+---------------+-------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM REPOSITORY_GLOBAL_REVISION;
+-------------+
| REVISION_ID |
+-------------+
|     2302839 |
+-------------+
1 row in set (0.00 sec)

The entries in the JOURNAL_ID correspond to the jackrabbit cluster id of the repository nodes.

There are two ways to periodically clean the old entries: using sql or let jackrabbit handle the cleanup.

Remove old revisions using SQL

If you want to safely remove old entries from the journal table you can run the following query:

MySQL REPOSITORY_JOURNAL table cleanup:

mysql> DELETE FROM REPOSITORY_JOURNAL WHERE REVISION_ID < ANY (SELECT min(REVISION_ID)
       FROM REPOSITORY_LOCAL_REVISIONS);
Query OK, 0 rows affected (0.00 sec)

mysql> OPTIMIZE TABLE REPOSITORY_JOURNAL;
+-----------------------------+----------+----------+----------------------------+
| Table                       | Op       | Msg_type | Msg_text                   |
+-----------------------------+----------+----------+----------------------------+
| hippocms.REPOSITORY_JOURNAL | optimize | note     | Table does not support     |
|                             |          |          | optimize, doing recreate + |
|                             |          |          | analyze instead            |
| hippocms.REPOSITORY_JOURNAL | optimize | status   | OK                         |
+-----------------------------+----------+----------+----------------------------+
2 rows in set (0.03 sec)

Caveats

Remove a node from the cluster

When a cluster node is removed from the cluster, the entry in the REPOSITORY_LOCAL_REVISIONS is not removed for that node. This has to be done manually if you want to safely remove the old revisions.

Adding a node to the cluster

When a new node is added to the cluster it will try to read the entire REPOSITORY_JOURNAL table. If the table is very large this can lead to out of memory errors on startup. To safely add a new node to a cluster it is advisable to clean up the old revisions first.