For example, if you store some (binary) content in a document, you can add a property to the document that indicates the size of the content stored.
In principle, this is useless and even somewhat dangerous. It is useless
because one can simply use a function that computes the size of the content
on the fly. When retrieving the document, there is no fundamental reason
why retrieving the size property is better than just calling the
function that computes the size of the content. The same holds for querying:
a query might as well be written as SELECT * FROM documents
WHERE length(content) > 0.
The danger is that when modifying the content, you can forget to update the size property to reflect the new size of the content. The content and the size property are then inconsistent, which is generally considered a bigger problem than not having the data at all!
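The inconsistency can be illustrated with a minimal sketch. The class SizedDocument below is a hypothetical in-memory stand-in for a document, not part of the Hippo or JCR API; it keeps a hand-maintained size next to the content and offers an update method that deliberately forgets to refresh it.

```java
// Hypothetical stand-in for a document; not a Hippo or JCR class.
final class SizedDocument {
    private byte[] content;
    private long storedSize; // redundant copy of content.length, maintained by hand

    SizedDocument(byte[] content) {
        this.content = content.clone();
        this.storedSize = content.length;
    }

    // Replaces the content but forgets to update the stored size.
    void setContentForgettingSize(byte[] newContent) {
        this.content = newContent.clone();
    }

    // The size as recorded in the redundant property.
    long storedSize() {
        return storedSize;
    }

    // The size computed on the fly from the actual content.
    long computedSize() {
        return content.length;
    }
}
```

After a call to setContentForgettingSize with content of a different length, storedSize() and computedSize() disagree, which is exactly the inconsistency described above.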
There are, however, technical reasons why such functionality is still needed.
For one, the compute function might be very expensive, and it could be
executed many times, especially in a query over all documents.
Another, even more pressing need for this functionality is that the available
query languages do not allow you to express all kinds of realistic queries. For
example, an XPath query does not allow you to query for a document which
contains two properties, let's call them a and b,
and where these two properties are equal. Naively this could be written down as
//*[@a=@b] but this yields no results even though logically there
should be. Certain other queries are possible but have a huge performance impact.
These are deliberate limitations of the query languages XPath and JCR-SQL,
not bugs.
To make it possible to express efficient queries and to access information
about the content without having to know or execute the procedure by which the
data is obtained, the Hippo CMS repository has the facility of "derived data".
As the term suggests, it computes properties of a document, derived from other
properties of the document. Derived properties may be put on and computed from
the JCR node that represents a document, or on any descendant node in a
document.
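For the a-equals-b query mentioned earlier, one way to picture the idea is a derived property that stores the outcome of the comparison, so that a query only has to test a single property. The sketch below is an illustration under assumptions: the class EqualityFlagSketch, the key "equal" and the use of String[] in place of JCR values are all hypothetical and chosen so the example runs without a repository.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; String[] stands in for the JCR value arrays.
final class EqualityFlagSketch {
    // Pure function: the result depends only on the "a" and "b" inputs.
    static Map<String, String[]> compute(Map<String, String[]> parameters) {
        Map<String, String[]> result = new HashMap<>(parameters);
        boolean equal = Arrays.equals(parameters.get("a"), parameters.get("b"));
        // "equal" is an assumed name for the derived flag.
        result.put("equal", new String[] { Boolean.toString(equal) });
        return result;
    }
}
```

With such a flag stored on the node, the comparison no longer has to be expressed in the query language itself.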
When entering a document which should contain such a derived property, you should not set the value of the derived property yourself. Instead, the repository will automatically compute the value of the property upon a save() call. Because the repository guarantees to recompute the property upon a save, the problem of inconsistent data has nearly vanished.
In order for the repository to do this, the repository must be informed when and how to compute the properties.
The logic behind this is that a JCR nodetype defines the valid properties a document is allowed to contain, i.e. the structure of the data. Whether or not a property is to be derived depends, at the very least, on whether a piece of data can contain the property in question. It is therefore proper to use the content model also to determine when to compute a property.
Because derived properties are recomputed upon every modification that could lead to a different value of the derived property, and because the compute method must be written as a pure function, this approach is functionally equivalent to never storing the computed properties at all and instead computing them on the fly when retrieving or querying the data.
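This equivalence can be sketched in a few lines. The class RecomputeOnSaveNode is a hypothetical in-memory model, not the repository implementation: its save() method recomputes a value stored under the assumed name "derived" using a pure function, so after every save the stored value matches what an on-the-fly computation would return.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical in-memory model of a node; not the Hippo repository API.
final class RecomputeOnSaveNode {
    private final Map<String, Double> properties = new HashMap<>();
    private final ToDoubleFunction<Map<String, Double>> derive; // must be pure

    RecomputeOnSaveNode(ToDoubleFunction<Map<String, Double>> derive) {
        this.derive = derive;
    }

    void set(String name, double value) {
        properties.put(name, value);
    }

    // Mimics save(): the derived value is recomputed from the current inputs.
    void save() {
        properties.put("derived", derive.applyAsDouble(properties));
    }

    // The value as stored at the last save.
    double stored() {
        return properties.get("derived");
    }

    // The value computed now, without touching the stored copy.
    double onTheFly() {
        return derive.applyAsDouble(properties);
    }
}
```

The guarantee only holds if the function is pure: a function that consulted a clock or another document could return a different value on the fly than the one captured at save time.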
We will outline how to define, configure and use derived data based on a simple example that applies the Pythagorean theorem.
We define a document type that is a core shape definition:
[sample:shape] > hippo:document
- sample:a (double)
- sample:b (double)
Subsequently, we define a type that can be added as a mixin to the shape definition to indicate that the shape is a triangle:
[sample:triangle] > hippo:derived mixin
- sample:c (double)
To indicate that certain properties of the type sample:triangle are to be computed using the derived data procedure, the type must extend the hippo:derived mixin node type.
Now we need to configure in the repository how to compute the derived property
of sample:triangle. These procedures are defined in the JCR repository
under /hippo:configuration/hippo:derivatives. To compute the c
property we can enter the following JCR definition:
[repository root]
  `-- hippo:configuration
      `-- hippo:derivatives [hippo:derivativesfolder]
          `-- pythagorean [hippo:deriveddefinition]
              + hippo:nodetype = sample:triangle
              + hippo:classname = sample.PythagoreanTheorem
              + hippo:serialver = 1
              |-- hippo:accessed [hippo:propertyreferences]
              |   |-- a [hippo:relativepropertyreference]
              |   |     + hippo:relPath = sample:a
              |   `-- b [hippo:relativepropertyreference]
              |         + hippo:relPath = sample:b
              `-- hippo:derived [hippo:propertyreferences]
                  `-- c [hippo:relativepropertyreference]
                        + hippo:relPath = sample:c
First, the hippo:nodetype property defines the nodetype which contains the properties that should be derived. For any change to nodes of this type, this derived data definition indicates the function to be executed.
The hippo:classname property contains the name of a class that should extend the
base class org.hippoecm.repository.ext.DerivedDataFunction and implement the
method compute, whose implementation should be a pure function. The
class PythagoreanTheorem must have a public no-argument constructor. The
number stated in the hippo:serialver property should match the
serialVersionUID field in the implementing class sample.PythagoreanTheorem.
The definitions in the hippo:accessed and hippo:derived node structures indicate the input and output parameters of the derived data function.
Here we indicate that, relative to the node of type sample:triangle, there are two input properties: sample:a and sample:b. The hippo:relPath properties indicate the path of each property relative to the subject node for which the computation takes place. The values of these two properties are entered under the keys "a" and "b" (the names of the hippo:relativepropertyreference nodes) in a Map that the compute method implemented by PythagoreanTheorem takes as input:
public Map<String,Value[]> compute(Map<String,Value[]> parameters);
As a result, the compute method should return a map in which the value for the derived property sample:c can be found under the key "c". The definition also lists the results computed by the function (possibly more than one) as nodes under hippo:derived. The hippo:relPath again indicates the relative path to the property.
The hippo:relPath may indicate any property below the document for which properties are computed. It may not contain references to other documents.
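The mapping between keys and relative paths can be pictured with a small stand-alone sketch. It is not how the Hippo repository is implemented: the class DerivedDefinitionSketch is hypothetical, a plain Map models the node, and double[] replaces javax.jcr.Value[] so that the example runs without a repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for applying a derived data definition to one node.
final class DerivedDefinitionSketch {
    final Map<String, String> accessed = new HashMap<>(); // key -> relPath of an input
    final Map<String, String> derived = new HashMap<>();  // key -> relPath of an output

    // Collects the accessed properties under their keys, calls the function,
    // and writes the values returned under the derived keys back to the node.
    void apply(Map<String, double[]> node,
               UnaryOperator<Map<String, double[]>> compute) {
        Map<String, double[]> parameters = new HashMap<>();
        for (Map.Entry<String, String> in : accessed.entrySet()) {
            double[] values = node.get(in.getValue());
            if (values != null) {
                parameters.put(in.getKey(), values);
            }
        }
        Map<String, double[]> result = compute.apply(parameters);
        for (Map.Entry<String, String> out : derived.entrySet()) {
            double[] values = result.get(out.getKey());
            if (values != null) {
                node.put(out.getValue(), values);
            }
        }
    }
}
```

In this model the function only ever sees the keys "a", "b" and "c"; the relative paths sample:a, sample:b and sample:c stay in the configuration, which is what allows the same function to be reused with different property names.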
The configuration indicates which class should be used to compute the data.
This class must extend the
org.hippoecm.repository.ext.DerivedDataFunction base class and
implement the compute method. It cannot be stressed enough that it is a
requirement that this is a pure function.
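package sample;

import java.util.Map;
import javax.jcr.RepositoryException;
import javax.jcr.Value;
import org.hippoecm.repository.ext.DerivedDataFunction;

public class PythagoreanTheorem extends DerivedDataFunction {
    static final long serialVersionUID = 1;

    // Pure function: the result depends only on the "a" and "b" inputs.
    public Map<String,Value[]> compute(Map<String,Value[]> parameters) {
        try {
            double a = parameters.get("a")[0].getDouble();
            double b = parameters.get("b")[0].getDouble();
            double c = Math.sqrt(a * a + b * b);
            parameters.put("c", new Value[] { getValueFactory().createValue(c) });
        } catch (RepositoryException ex) {
            // getDouble() declares checked exceptions; leaving out "c" on
            // failure is an assumption of this example.
            parameters.remove("c");
        }
        return parameters;
    }
}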
This class can be packaged in a normal plug-in. Upon any change the properties will be computed. Current limitations do, however, impose one exception: imported data is not recomputed and must already be correct.
© 1999-2010 Hippo B.V., All Rights Reserved