HST Rewriting Rich Text Field Content at Runtime

As a developer, you may want to modify the content of a document's rich text field at render time. You may even want these runtime modifications to depend on the request context.

For example:

  • When an internal link cannot be resolved, remove the entire <a> element

  • When a link is created, add a tooltip

  • When the channel is a mobile channel, take images of lower resolution

  • Create a lightbox for images (show some small variant that is clickable to show a large one)

  • Create a context-aware lightbox for images: depending on the context, show a different-sized image when clicking

  • Etc.

To use your own content rewriter, either:

  • configure it on a per template basis, or
  • override the global default (available from HST 3.0.2 and 3.1.0 onwards)

Configure on a Per Template Basis

Normally, when displaying rich text content, you use something like

JSP

<hst:html hippohtml="${requestScope.document.html}"/>

Freemarker

<@hst.html hippohtml=document.html />

This assumes content rewriting is done with the built-in HST SimpleContentRewriter. However, you can use your own custom content rewriter as well. Your script then becomes something like:

JSP

<hst:html hippohtml="${requestScope.document.html}" 
          contentRewriter="${requestScope.myContentRewriter}"/>

Freemarker

<@hst.html hippohtml=document.html contentRewriter=myContentRewriter />

You also need to set myContentRewriter as an attribute on the request. For example, your BaseComponent could contain something like:

public abstract class BaseComponent extends BaseHstComponent {
   // a single shared instance, reused for every request
   public static final MyContentRewriter myContentRewriter =
       new MyContentRewriter();
   @Override
   public void doBeforeRender(HstRequest request, HstResponse response) {
        // always have the custom content rewriter available
        request.setAttribute("myContentRewriter", myContentRewriter);
   }
}

Configure Global Default

Available from HST 3.0.2 and 3.1.0 onwards

By default, the HST uses the SimpleContentRewriter. You can override this default by configuring default.hst.contentrewriter.class in the file hst-config.properties, which is typically located in the site/src/main/webapp/WEB-INF directory in a project.
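
As an illustration, the entry could look like the fragment below. The class name com.example.BrokenLinksMarkerContentRewriter is a placeholder; the assumption here is that the property value is the fully qualified class name of your own ContentRewriter implementation.

```properties
# hst-config.properties
# placeholder class name, replace with your own implementation
default.hst.contentrewriter.class = com.example.BrokenLinksMarkerContentRewriter
```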

Writing a Custom Content Rewriter

Next, your custom content rewriter needs to be written. It needs to implement org.hippoecm.hst.content.rewriter.ContentRewriter. The easiest way is to extend from org.hippoecm.hst.content.rewriter.impl.AbstractContentRewriter, or even from SimpleContentRewriter which gives you many rewriting utilities already.

Assume you want to write a content rewriter that adds style="color:red" to internal links that are broken. The easiest way to achieve this is to extend SimpleContentRewriter. The SimpleContentRewriter does string-based rewriting of the rich text field, but for this use case it is easier to use org.htmlcleaner.HtmlCleaner. Hence, the content rewriter needs to override the rewrite method of SimpleContentRewriter. The BrokenLinksMarkerContentRewriter below should do pretty much what we want.

Note one important thing: the SimpleContentRewriter rewrites both HTML links and images. If you override rewrite only to change the way links are rewritten, you need to call super.rewrite(..) at the end, unless you make sure that you also rewrite images yourself. The example below also rewrites images and therefore does not need to call super.rewrite(..).
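
The rule about calling super.rewrite(..) can be sketched with plain-Java stand-ins. The classes below are hypothetical and deliberately simplified: they are not the HST classes, the base class only handles image sources, and the "/site/" and "/binaries/" prefixes are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a base rewriter; NOT the HST SimpleContentRewriter.
class BaseRewriterSketch {
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]*)\"");

    // In this simplified sketch the base class only rewrites image sources.
    public String rewrite(String html) {
        return SRC.matcher(html).replaceAll("src=\"/binaries/$1\"");
    }
}

// Overrides rewrite to change link handling only, then delegates for images.
class LinkOnlyRewriterSketch extends BaseRewriterSketch {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    @Override
    public String rewrite(String html) {
        // custom link handling ("/site/" is an assumed prefix)
        String linksRewritten =
                HREF.matcher(html).replaceAll("href=\"/site/$1\"");
        // without this call the <img> sources would be left untouched
        return super.rewrite(linksRewritten);
    }
}
```

If the subclass returned linksRewritten directly, the image source in the input would come back unchanged, which is the situation the note above warns about.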

If you also need access to the HstRequest / HstResponse in your ContentRewriter, you can use the following code:

HstRequest hstRequest = HstRequestUtils.getHstRequest(
                                requestContext.getServletRequest());
HstResponse hstResponse = HstRequestUtils.getHstResponse(
                                requestContext.getServletRequest(),
                                requestContext.getServletResponse()
                                );

The example below uses org.htmlcleaner.TagNode#setAttribute, which is part of htmlcleaner 2.2. Either upgrade htmlcleaner to this version or use #addAttribute instead.

BrokenLinksMarkerContentRewriter:

import javax.jcr.Node;
import org.apache.commons.lang.StringUtils;
import org.hippoecm.hst.configuration.hosting.Mount;
import org.hippoecm.hst.content.rewriter.impl.SimpleContentRewriter;
import org.hippoecm.hst.core.linking.HstLink;
import org.hippoecm.hst.core.request.HstRequestContext;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BrokenLinksMarkerContentRewriter
                    extends SimpleContentRewriter {

    // log under this class's own name, not the superclass's
    private static final Logger log =
            LoggerFactory.getLogger(BrokenLinksMarkerContentRewriter.class);

    private static boolean htmlCleanerInitialized;
    private static HtmlCleaner cleaner;

    private static synchronized void initCleaner() {
        if (!htmlCleanerInitialized) {
            cleaner = new HtmlCleaner();
            CleanerProperties properties = cleaner.getProperties();
            properties.setOmitHtmlEnvelope(true);
            properties.setTranslateSpecialEntities(false);
            properties.setOmitXmlDeclaration(true);
            properties.setRecognizeUnicodeChars(false);
            properties.setOmitComments(true);
            htmlCleanerInitialized = true;
        }
    }

    protected static HtmlCleaner getHtmlCleaner() {
        if (!htmlCleanerInitialized) {
            initCleaner();
        }

        return cleaner;
    }

    @Override
    public String rewrite(final String html, final Node node,
                          final HstRequestContext requestContext,
                          final Mount targetMount) {

        // nothing to rewrite
        if (html == null) {
            return null;
        }
        try {
            TagNode rootNode =  getHtmlCleaner().clean(html);
            TagNode [] links = rootNode.getElementsByName("a", true);
            // Rewrite the links.
            // This is what the example is about: when a link cannot be
            // resolved, we remove the href and set a style.
            for (TagNode link : links) {
                String documentPath =  link.getAttributeByName("href");
                // skip anchors without an href as well as external links
                if(documentPath == null || isExternal(documentPath)) {
                   continue;
                } else {
                    String queryString =
                            StringUtils.substringAfter(documentPath, "?");
                    boolean hasQueryString =
                            !StringUtils.isEmpty(queryString);
                    if (hasQueryString) {
                        documentPath =
                            StringUtils.substringBefore(documentPath, "?");
                    }
                    HstLink href = getDocumentLink(documentPath,node,
                                                   requestContext,
                                                   targetMount);
                    // if the link is null, marked as notFound or has an
                    // empty path, we mark the link element with a
                    // style=color:red
                    if (href == null || href.isNotFound() ||
                        href.getPath() == null) {

                        // mark the element and remove the href
                        link.removeAttribute("href");
                        link.setAttribute("style", "color:red");
                    } else {
                        String rewrittenHref = href.toUrlForm(
                                 requestContext, isFullyQualifiedLinks());
                        // re-append the original query string, if any
                        if (hasQueryString) {
                            rewrittenHref += "?"+ queryString;
                        }
                        // override the href attr
                        link.setAttribute("href", rewrittenHref);
                    }
                }
            }

            // BELOW IS FOR REWRITING IMAGE SRC ATTR WHICH RESULTS IN
            // VERY SAME BEHAVIOR AS SimpleContentRewriter
            // We could skip the code below altogether, and rewrite the
            // result below from getHtmlCleaner().getInnerHtml(bodyNode);
            // with super.rewrite() from SimpleContentRewriter
            TagNode [] images = rootNode.getElementsByName("img", true);
            for (TagNode image : images) {
                String srcPath =  image.getAttributeByName("src");
                if(isExternal(srcPath)) {
                    continue;
                } else {
                    HstLink binaryLink = getBinaryLink(srcPath, node,
                                                       requestContext,
                                                       targetMount);
                    if (binaryLink != null &&
                        binaryLink.getPath() != null) {

                        String rewrittenSrc = binaryLink.toUrlForm(
                                  requestContext, isFullyQualifiedLinks());
                        image.setAttribute("src", rewrittenSrc);
                    } else {
                        log.warn("Skip src because url is null");
                    }
                }
            }

            // everything is rewritten. Now write the "body" element
            // as result
            TagNode [] targetNodes =
                         rootNode.getElementsByName("body", true);
            if (targetNodes.length > 0 ) {
                TagNode bodyNode = targetNodes[0];
                return getHtmlCleaner().getInnerHtml(bodyNode);
            }  else {
                log.warn("Cannot rewrite content for '{}' because there " +
                         "is no 'body' element", node.getPath());
            }

        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return null;
    }

}
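
The query string handling in the link loop above can be isolated as a small helper. The following is a minimal sketch in plain Java without commons-lang; the class name HrefParts and its methods are hypothetical and only mirror the substringBefore / substringAfter logic of the example.

```java
// Hypothetical helper: splits an href into path and query string so the
// path can be resolved on its own and the query re-appended afterwards.
final class HrefParts {
    final String path;
    final String query; // empty when the href has no query string

    private HrefParts(String path, String query) {
        this.path = path;
        this.query = query;
    }

    static HrefParts split(String href) {
        int idx = href.indexOf('?');
        // no '?' or nothing after it: treat the whole href as the path
        if (idx < 0 || idx == href.length() - 1) {
            return new HrefParts(href, "");
        }
        return new HrefParts(href.substring(0, idx), href.substring(idx + 1));
    }

    // Re-attach the original query string to an already rewritten path.
    String join(String rewrittenPath) {
        return query.isEmpty() ? rewrittenPath : rewrittenPath + "?" + query;
    }
}
```

In a rewriter, the path part would be passed to getDocumentLink and the resulting URL handed to join, so that parameters such as paging survive the rewrite.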