How To Fix XSS Vulnerabilities In Code

Learn how to bulletproof your code against dangerous inputs with proper escaping.

Hello, world! I'm Jesse from Chef Secure, and you're finally gonna learn how to properly fix XSS vulnerabilities in your code.

Here goes.

Don't trust user input.

That's all...

It's not that simple.

For instance, consider a select box on a signup form. All the values are defined by you on a server you trust, and you never ask the user to input anything, but the submitted value can still be modified to deliver an XSS payload.

Or how about referrer URLs? Or API calls that happen in the background of your application with AJAX?

Or how about this: instead of thinking about what is or isn't user input, just don't trust requests at all.

This can save you beyond XSS too, like if you receive a request to delete an account, for instance. This can potentially come from anyone, so you're gonna need some defenses.

So this recipe is all about setting up defenses to protect yourself from XSS attacks.

The key to defending against XSS is the same as attacking it: know where the values are inserted.

For XSS, there are seven contexts you need to know.

First, the most common two contexts, HTML content and HTML attributes, are usually bundled together under the same escaping.

You already know, as an attacker, you're often looking to use special characters to break out of something and inject your own code. And to stop this, you need to make sure these special characters aren't used in a way that can modify the webpage.

You do this with escaping.

In short, escaping replaces each special character with a safe alternative.

Here are the special characters for HTML and how to escape them. Starting off,

& becomes &amp;

Then to prevent tag injection you need to escape angle brackets.

< becomes &lt;
> becomes &gt;

And to protect attributes you need to escape

" becomes &quot;
' becomes &#x27;

By the way, 27 is just the hex value of the single quote in ASCII. You might see the decimal value used instead, like &#39;, but it works exactly the same.

And that's all you need –


We're not done yet.

Remember, attributes can be unquoted as well, so we need to take care of space characters too. Using hex values again,

space becomes &#x20;
newline becomes &#x0a;
carriage return becomes &#x0d;
tab becomes &#x09;
form feed becomes &#x0c;

Even if you, personally, never ever use unquoted attributes and never will in your life,

Which is good. I like you.

the fact is, someone else might. And any of those space characters can be used to break out and deliver an XSS payload.
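
Here's a minimal sketch of what that HTML escaping could look like in JavaScript. The escapeHTML name and the lookup table are just for illustration, not taken from any particular framework, and it covers only the characters listed above.

```javascript
// Hypothetical helper: maps each special character to the entity listed above.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  ' ': '&#x20;',
  '\n': '&#x0a;',
  '\r': '&#x0d;',
  '\t': '&#x09;',
  '\f': '&#x0c;',
};

function escapeHTML(value) {
  // A single pass over the input, so the "&" inside an entity we just
  // produced is never escaped a second time.
  return String(value).replace(/[&<>"' \n\r\t\f]/g, (ch) => HTML_ESCAPES[ch]);
}

// Tag injection and unquoted-attribute breakouts both come out as inert text.
console.log(escapeHTML('<script>alert()</script>'));
console.log(escapeHTML('x onmouseover=alert()'));
```

Doing it all in one replace call, instead of chaining one replace per character, means you don't have to worry about the order the characters are handled in.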

By the way, many frameworks with built-in escaping don't account for unquoted or single quoted attributes by default, so you may need to add that in if you want extra protection.

And if you're using a framework that doesn't have automatic escaping, you should really, REALLY put in the effort to either

  1. find a way to enable it, or
  2. find a better framework
because it'll be well worth it for the security of your app in the long run.

And always remember: automatic escaping doesn't guarantee you're 100% safe from XSS attacks, because the escaping it applies may not always be context-specific, like in URLs.

Also, there are built-in methods to turn off automatic escaping when developers need to, which can unintentionally introduce new XSS vulnerabilities.

The third context you need to know lives inside some HTML attributes and that's URLs.

If you're working with a full URL, then you need to ensure that valid schemes like HTTP and HTTPS are used.

You could use a regular expression to check this, but you need to make certain that the string actually starts with the valid scheme.

Remember, whitelisting, where you compare against what you do want, is always better than blacklisting, where you compare against what you don't, because this ensures that nothing unexpected passes through.
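
Here's a rough sketch of a scheme whitelist in JavaScript. It swaps the regular expression for the standard URL constructor, and the isAllowedUrl name and ALLOWED_SCHEMES set are made up for this example. The two regular expressions at the end just show why the start-of-string anchor matters.

```javascript
// Whitelist: only the schemes we explicitly want.
const ALLOWED_SCHEMES = new Set(['http:', 'https:']);

function isAllowedUrl(untrusted) {
  let parsed;
  try {
    parsed = new URL(untrusted); // throws on anything that isn't an absolute URL
  } catch {
    return false;
  }
  // The parser lowercases the scheme and includes the trailing colon.
  return ALLOWED_SCHEMES.has(parsed.protocol);
}

const payload = 'javascript:alert(1)//http://example.com';

console.log(isAllowedUrl('https://example.com/profile')); // true
console.log(isAllowedUrl(payload));                       // false

// If you do go with a regular expression, anchor it to the start.
console.log(/https?:\/\//i.test(payload));  // true: matches in the middle
console.log(/^https?:\/\//i.test(payload)); // false: must start with the scheme
```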

Sometimes you'll be using untrusted data in only part of the URL like in URL parameters. And this requires you to use URL encoding, or percent encoding, to make sure the data stays where it belongs and attackers can't change the URL unexpectedly like adding or overriding parameters.

As a reminder, URL encoding allows special characters in URLs like percents, ampersands and semicolons to be used without confusing browsers or servers.

It's made using the percent symbol followed by the character's hex value. To make it simple, all characters other than A-Z, a-z, 0-9, -, _, . or ~ should be percent encoded.
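
In JavaScript, the built-in encodeURIComponent gets you most of the way there, but it leaves ! ' ( ) and * alone. Here's a small sketch that encodes those too, so only the characters listed above pass through unchanged. The encodeParam name is just for illustration.

```javascript
// Hypothetical helper: percent-encode everything except A-Z a-z 0-9 - _ . ~
function encodeParam(value) {
  return encodeURIComponent(String(value)).replace(
    /[!'()*]/g,
    (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// The ampersand and equals sign can no longer add or override parameters.
const url = 'https://example.com/search?q=' + encodeParam('cats&admin=true');
console.log(url); // https://example.com/search?q=cats%26admin%3Dtrue
```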

After you've encoded your URL properly, remember, if you're still inside an HTML attribute, like a src or href, HTML escaping is still required.

The fourth context is JavaScript. Or, more specifically, JavaScript strings. Because you don't really want to put untrusted data anywhere else.

For instance, if you need a number assigned to a variable, and you do this

let number = {{ data }}

without casting the data as an integer on your server like this

let number = {{ data.to_i }}
(or similar)

You're wide open to XSS attacks, because an attacker can just put their JavaScript code directly in there.

let number = alert()

So, instead, when supplying data to a JavaScript variable, just put it in a string first.

let number = '{{ data }}'

And if you need a number and don't cast on your server, you can do it with JavaScript's parseInt function

let number = parseInt('{{ data }}')

or convert it to a number with the Number constructor.

let number = Number('{{ data }}')

However, the string is still vulnerable to XSS, because we don't have escaping yet.

let number = Number(''+alert()+'')

Quotes need to be escaped so attackers can't break out of strings.

" becomes \"
' becomes \'

And don't forget template literal escaping with

` becoming \`
$ becoming \$

Then forward slashes need to be escaped to prevent closing the script tag and opening a new one, because, remember, the HTML parser runs before JavaScript.

/ becomes \/

And we don't want attackers to escape our own escapes

\ becomes \\

Finally, newlines can break out of strings as well and cause errors in our code, so we also escape

carriage return with \r
line feed/newline with \n
line separator with \u2028
paragraph separator with \u2029
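
Putting those rules together, a JavaScript string escaper might look something like this. It's a sketch that covers only the characters listed above, and the escapeJS name and lookup table are assumptions for this example. On a real site this would run on your server, in whatever language renders the page.

```javascript
// Hypothetical helper: each special character and its escaped form.
const JS_ESCAPES = {
  '\\': '\\\\',
  '"': '\\"',
  "'": "\\'",
  '`': '\\`',
  '$': '\\$',
  '/': '\\/',
  '\r': '\\r',
  '\n': '\\n',
  '\u2028': '\\u2028',
  '\u2029': '\\u2029',
};

function escapeJS(value) {
  // One pass, so the backslashes we add are never escaped again.
  return String(value).replace(
    /[\\"'`$\/\r\n\u2028\u2029]/g,
    (ch) => JS_ESCAPES[ch]
  );
}

// The quote breakout from earlier stays inside the string.
console.log("let number = Number('" + escapeJS("'+alert()+'") + "')");
```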

What about inline event handlers? They're in both JavaScript and HTML contexts, so what escaping do you use?

Since HTML gets parsed by your browser before JavaScript, wrap the untrusted data with JavaScript escaping on the inside and HTML escaping on the outside.

Think of it like this:

Your browser gets served a webpage containing the event handler. From there, the HTML parser eats it up, digests it, poops it out –

It's the circle of life, folks!

then JavaScript will come along eventually and eat whatever's left over.

So we need to prepare the event handler so it can be consumed by HTML first and then JavaScript in the end.
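
As a rough sketch, the layering could look like this. Both helpers are compact stand-ins written for this example: this escapeHTML uses numeric character references for everything instead of the named ones, and greet is a made-up handler function.

```javascript
// Inner layer: JavaScript string escaping.
function escapeJS(value) {
  return String(value).replace(/[\\"'`$\/\r\n\u2028\u2029]/g, (ch) => {
    if (ch === '\r') return '\\r';
    if (ch === '\n') return '\\n';
    if (ch === '\u2028') return '\\u2028';
    if (ch === '\u2029') return '\\u2029';
    return '\\' + ch; // quotes, backtick, $, / and \ just gain a backslash
  });
}

// Outer layer: HTML escaping, written as numeric character references.
function escapeHTML(value) {
  return String(value).replace(
    /[&<>"' \n\r\t\f]/g,
    (ch) => '&#x' + ch.charCodeAt(0).toString(16) + ';'
  );
}

const untrusted = "');alert(1);//";

// JavaScript escaping on the inside, HTML escaping on the outside.
const safe = escapeHTML(escapeJS(untrusted));
const html = '<button onclick="greet(\'' + safe + '\')">Hi</button>';
console.log(html);
```

The HTML parser turns the character references back into quotes first, and what it hands to JavaScript is still a properly escaped string.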

And remember, this is important, don't put untrusted data in a place that gets evaluated directly as an expression, such as outside a string, or within strings that get evaluated by potentially unsafe functions, like

eval('{{ escapeJS(data) }}')
new Function('{{ escapeJS(data) }}')
setTimeout('{{ escapeJS(data) }}',0)
setInterval('{{ escapeJS(data) }}',0)
setImmediate('{{ escapeJS(data) }}',0)

Regular expressions make up the fifth escaping context.

If you recall, regular expressions are used to find patterns within strings. One use case would be to make a search feature within a webpage.

Escaping is simple again, just add a backslash before special characters.

Now, for XSS, all you really need to escape is the forward slash so attackers can't break out of the regular expression. But, for completeness, here's a list of all the characters that need to be escaped to prevent other kinds of attacks that can change the meaning of your regular expression.

. becomes \.
* becomes \*
+ becomes \+
? becomes \?
^ becomes \^
$ becomes \$
{ becomes \{
} becomes \}
( becomes \(
) becomes \)
| becomes \|
[ becomes \[
] becomes \]
\ becomes \\
/ becomes \/
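
Here's a short sketch of that list as a JavaScript function. The escapeRegExp name is just for illustration.

```javascript
// Hypothetical helper: put a backslash before every character in the list above.
function escapeRegExp(value) {
  // "$&" in the replacement stands for the matched character.
  return String(value).replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

// The search term is matched literally instead of being treated as a pattern.
const term = escapeRegExp('1+1=2');
console.log(new RegExp(term).test('1+1=2')); // true
console.log(new RegExp(term).test('11=2'));  // false
```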

The sixth context to go over is JSON, or JavaScript Object Notation.

As the name implies, JSON is the format for JavaScript objects, so you can, say, take an object from your server and work with it in your browser.

This can go inside event handlers, data attributes, JavaScript variable values within script tags, or directly within their own script tags with the type attribute set to application/json.

By the way, data attributes and application/json scripts are the best place to put your JSON data. You'll see why in a later recipe.

Also remember, data attributes are part of the HTML context, so you'll need HTML escaping on top of your JSON escaping.

Now, thankfully, due to the JSON format, escaping JSON for XSS is very simple. Really, the biggest threat is breaking out of inline script tags using the ol' trick of closing a script tag, opening a new one, and then adding the attack payload.

let user = { name: 'Jesse</script><script>alert();//' }

Therefore, you could either escape the forward slash or, like Rails and PHP do, escape the angle brackets with Unicode escapes where

/ becomes \/
< becomes \u003c
> becomes \u003e

Finally, like within JavaScript strings, line separators and paragraph separators can be used to break out and cause errors in your code, so you'll also escape

line separator with \u2028
paragraph separator with \u2029

Actual newlines and other control characters are already covered with proper JSON encoding so there's no need for further escaping.

I want to make it very clear. The first step is to create valid JSON using a proper library or module, then do escaping.

In other words, don't just use this escaping directly on untrusted input and think it's safe.

For instance, if you have a variable in JavaScript equal to an untrusted string that's escaped only for JSON

let data = {{ escapeJSON(untrusted) }}

Then, trivially, an attacker can just add code directly.

let data = alert('uh oh')

Tag injections, or line and paragraph separators, aren't even needed.

So, AGAIN, it's ESSENTIAL that you're working with valid JSON first.
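
To make the order concrete, here's a minimal sketch in JavaScript: serialize with JSON.stringify first, then escape the result. The toScriptSafeJSON name is made up for this example, and it assumes the input is something JSON.stringify can actually serialize, like a plain object.

```javascript
// Hypothetical helper: valid JSON first, escaping second.
function toScriptSafeJSON(data) {
  const json = JSON.stringify(data); // step 1: a real serializer builds the JSON
  return json                        // step 2: escape what could break out
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const user = { name: 'Jesse</script><script>alert();//' };
const safeJson = toScriptSafeJSON(user);
console.log(safeJson);

// Still valid JSON, and it parses back to the original value.
console.log(JSON.parse(safeJson).name === user.name); // true
```

Because the untrusted string always ends up inside a quoted JSON value, it can't show up as bare code like in the example above.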

A-a-and the seventh, and last context, is CSS.

In short, inline styles are also vulnerable to breaking out with a closing tag.

The backslash character is used for escaping in CSS. So escape the escape first with

\ becoming \\

Then don't allow attackers to close the style tag

/ becomes \/

This is technically all you need to prevent XSS, because CSS won't run JavaScript. So an attacker's only tactic is to break out of the style tag.

Now, historically, CSS has been used to run JavaScript and launch XSS attacks.

But good fortune has blessed the world with the death of I.–

NO! I won't speak ill of the dead!

But modern browsers keep a clear separation between scripts and styles. So, thankfully, XSS from CSS is no longer a problem with up-to-date browsers.

There are more escaping rules for CSS like these

form feed becomes \00000c
newline becomes \00000a
carriage return becomes \00000d

but CSS-based attacks, while interesting, are outside the scope of this course.
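
For reference, the CSS rules above could be sketched like this in JavaScript. The escapeCSS name is made up for this example, and it handles only the five characters mentioned in this recipe, so treat it as a starting point rather than a complete CSS escaper.

```javascript
// Hypothetical helper: only the CSS escapes listed above.
const CSS_ESCAPES = {
  '\\': '\\\\',
  '/': '\\/',
  '\f': '\\00000c',
  '\n': '\\00000a',
  '\r': '\\00000d',
};

function escapeCSS(value) {
  // One pass, so added backslashes are never escaped again.
  return String(value).replace(/[\\\/\f\n\r]/g, (ch) => CSS_ESCAPES[ch]);
}

// The closing style tag can no longer appear in the output.
console.log(escapeCSS('</style><script>alert()</script>'));
```

The hex escapes are padded to six digits, which is the most CSS will read, so a hex-looking character that follows isn't swallowed into the escape.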

Whatever framework you use, you should have built-in escaping functions which you should use instead of writing your own.

However, you can, and probably should add to them, such as escaping space characters and single quotes for HTML attributes.

It's important to know exactly how your built-in functions work, so you can also know their limitations.

For instance, PHP's htmlspecialchars function, which escapes data for HTML contexts, didn't escape single quotes by default before PHP 8.1. On older versions, you have to rely on passing the ENT_QUOTES flag every time.

And you can improve situations like these with your own wrapper functions or modules that do the work for you.

Because someone will forget. So just let your code do the work for you, then you can forget proactively.