Keep your HTML output secure and clean from XSS JavaScript injection

2020-09-15

Writing secure web services can be hard because several attack vectors exist. This article explains how XSS (cross-site scripting), or JavaScript injection, can be prevented.

HTML tags are a wonderful tool for structuring and formatting your content, but they also allow XSS JavaScript injection attacks in several ways. Below are a few of the simpler variations.

<script>alert('Injected via script tag');</script>
<p onmouseover="alert('Injected via mouseover callback');">Lorem ipsum dolor sit amet</p>
<p><a href="javascript:alert('Injected via a href')">Lorem ipsum dolor sit amet</a></p>

On many websites you already need, or soon will need, rich text formatting for your page content. That means you have to serve the content as HTML and cannot simply strip out all of the HTML tags. Sometimes you will store the content your users create as HTML directly, in a database or in files, and sometimes in other formats like Markdown, Wikitext, or AsciiDoc. The processors for these formats are good at formatting, but they generally cannot be trusted to produce secure HTML output.
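To illustrate why escaping alone is not an option for rich text, here is a minimal sketch. The `escapeHTML` helper and the sample comment are hypothetical and only serve the illustration: escaping neutralizes the script, but it also turns the legitimate formatting tags into literal text.

```javascript
// Hypothetical helper: escape the five characters that are significant in HTML.
function escapeHTML(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Made-up user comment that mixes wanted formatting with an injected script.
const richComment = '<strong>Great post!</strong><script>alert(1)</script>';
const escapedComment = escapeHTML(richComment);

// The script can no longer run, but the <strong> formatting is lost as well:
// both tags would now be displayed to the reader as literal text.
console.log(escapedComment);
```

Escaping like this is the right choice for plain-text fields such as names and titles. Rich content needs a sanitizer that keeps the safe tags and removes the dangerous ones.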

The examples above are fairly harmless as long as they only use JavaScript to produce alert popups, but this kind of injection has caused serious disruptions in large services over the years. One of the most famous is Twitter's "onMouseOver" XSS vulnerability (explained in detail here). I remember this one: one afternoon I saw an odd tweet with black square characters, and it retweeted itself when I hovered over it. Within a few minutes my entire feed consisted of this retweeted message as people around the world put their cursor over it, and soon Twitter could not be reached at all for quite some time, until they managed to deploy a fix.

Depending on how well you have secured your session cookies, JavaScript running in the browser might be able to read them, in which case they could be stolen from your users. Depending on how well you have set up your Content Security Policy (CSP), it might also be possible to inject code that then loads additional JavaScript into your web page from URLs of the attacker's choice.
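Two browser-level mitigations limit the damage if an injection does slip through: the `HttpOnly` cookie attribute hides a cookie from JavaScript, and a `Content-Security-Policy` response header restricts where scripts may be loaded from. The sketch below only builds the header values. The function names, the cookie name `session`, and the policy itself are assumptions made for illustration, not a recommendation for every site.

```javascript
// Hypothetical helper: build a Set-Cookie header value for a session cookie.
function buildSessionCookie(sessionId) {
  return [
    `session=${encodeURIComponent(sessionId)}`,
    'Path=/',
    'HttpOnly',     // not readable through document.cookie
    'Secure',       // only sent over HTTPS
    'SameSite=Lax'  // not sent with most cross-site subrequests
  ].join('; ');
}

// Example policy (an assumption): only allow resources, including scripts,
// from the site's own origin.
const contentSecurityPolicy = "default-src 'self'; script-src 'self'";

// Hypothetical helper: collect the response headers in one place.
function securityHeaders(sessionId) {
  return {
    'Set-Cookie': buildSessionCookie(sessionId),
    'Content-Security-Policy': contentSecurityPolicy
  };
}

console.log(securityHeaders('abc123'));
```

In a plain Node.js `http` server these values could be applied with `response.setHeader(name, value)` for each entry. They complement the sanitizing of HTML output and do not replace it.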

You might think it would then be trivial to prevent these security holes in your own JavaScript code by simply stripping out every <script>, onmouseover, onload, and javascript: variation, but XSS code can be written and injected in a huge number of ways.

These are a few of the ways the same single statement could be written:

<p><a href="jav&#x0D;ascript:alert('XSS');">XSS on click</a></p>
<p><a href="javascri&#x0D;pt:alert('XSS');">XSS on click</a></p>
<p><a href="jav&#x0D;ascript:alert&#40;'XSS'&#41;;">XSS on click</a></p>
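To make the problem concrete, here is a deliberately naive filter of the kind described above, together with a few inputs that get past it. The `naiveFilter` function and its regular expressions are hypothetical and exist only to show the failure mode. Do not use them as a starting point for a real sanitizer.

```javascript
// A deliberately naive filter (hypothetical, do not use in production).
function naiveFilter(html) {
  return html
    // remove <script>...</script> blocks
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')
    // remove double-quoted on* event handler attributes
    .replace(/\son\w+\s*=\s*"[^"]*"/gi, '')
    // remove the javascript: scheme (single pass)
    .replace(/javascript:/gi, '');
}

const attempts = [
  // Entity-encoded carriage return: the literal text "javascript:" never appears.
  '<a href="jav&#x0D;ascript:alert(1)">XSS on click</a>',
  // Nested keyword: removing the inner match reassembles the outer one.
  '<a href="javajavascript:script:alert(1)">XSS on click</a>',
  // Single-quoted handler: the attribute pattern only expects double quotes.
  "<p onmouseover='alert(1)'>Lorem ipsum</p>"
];

const results = attempts.map(naiveFilter);
results.forEach((result) => console.log(result));
```

The first and third inputs pass through unchanged, and the second one comes out as a working `javascript:` link. Each gap can be patched, but every patch invites another variation.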

It takes a lot of effort to prove that your code safely guards against every possible variation, so unless this is your area of expertise, you should probably use a well-known library that is actively developed by security experts.

The solution: how to prevent XSS JavaScript injection

Find a good, well-tested, and trusted library that can perform the sanitizing for you.
Create a function with a single HTML string argument that runs your library of choice with the same rules every time (no cheating with a second configuration argument that might loosen the rules).
Make sure to use that function every time you output HTML in your templates.

I have liked and used the npm package dompurify for quite some time now. You might find other JavaScript packages that fit your needs better, but here is an example of how I would typically use it in a project. The example leaves out some details about the build setup, as that is outside the scope of this article, but it should be easy to adapt to your favorite setup.

utils/purifyHTML.js

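// dompurify exposes its API on the default export
import DOMPurify from 'dompurify';

// The HTML string is the only argument, so the same rules apply everywhere.
export function purifyHTML(html) {
  return DOMPurify.sanitize(html);
}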

components/CommentList.js

import {LitElement, html} from 'lit-element';
import {unsafeHTML} from 'lit-html/directives/unsafe-html';

import {purifyHTML} from '../utils/purifyHTML';

class CommentList extends LitElement {
  static get properties() {
    return {
      postId: {
        type: String,
        reflect: true
      },
      // internal state: no attribute is needed for it
      _comments: {type: Array, attribute: false}
    };
  }

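  // Use updated() instead of overriding update(): replacing update() without
  // calling super.update() would stop the element from rendering at all.
  updated(changedProperties) {
    if (changedProperties.has('postId') && this.postId) {
      fetch(`/api/post/${encodeURIComponent(this.postId)}/comments`)
        .then((response) => response.json())
        .then((data) => { this._comments = (data && data.length > 0) ? data : undefined; })
        .catch(() => { this._comments = undefined; });
    }
  }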

  render() {
    // When rendering templates with lit-html, the default tagged template literal html``
    // always renders interpolated strings as escaped text. To render rich HTML content
    // you can use unsafeHTML, and that is where we need our purifyHTML function.

    // In the example below, we render the heading as text and the comment body as HTML.

    return html`
      <section>
        <h2>Comments</h2>
        ${this._comments ? this._comments.map((comment) => html`
          <section>
            <h3>${comment.author}:</h3>
            ${unsafeHTML(purifyHTML(comment.body))}
          </section>
        `) : html`<p><em>There are no comments for this post yet.</em></p>`}
      </section>
    `;
  }
}

customElements.define('myapp-comment-list', CommentList);

index.html

<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <title>My app</title>
  </head>
  <body>
    <!-- use the myapp-comment-list Web Component to safely render the user comments -->
    <myapp-comment-list postid="123"></myapp-comment-list>

    <!-- import your project build, including the files above -->
    <script type="module" src="main.js"></script>
  </body>
</html>

utils/purifyHTML.js (For Node.js)

If you want to use dompurify in Node.js, you have to provide it with a JavaScript DOM implementation such as jsdom, because Node.js has no built-in DOM. You can still keep the same signature: receive a single string and return a string.

import DOMPurify from 'dompurify';

// This section is only needed for Node.js, as it lacks a DOM
import {JSDOM} from 'jsdom';
const {window} = new JSDOM('<!DOCTYPE html>');
const domPurify = DOMPurify(window);

export function purifyHTML(html) {
  return domPurify.sanitize(html);
}