As you may know, one of the best ways to protect your application from cross-site scripting is to use proper output encoding when displaying user editable data on the screen. Here, I attempt to show that although the above statement sounds simple enough, it can actually be a very daunting task.
When it comes to output encoding in a web application, there are two main contexts that the developer needs to be aware of. These are the HTML context and the JavaScript context. These contexts are special in that the encoding scheme used for one does not apply to the other. As such, care should be taken to ensure that the proper scheme is applied correctly on every page.
So, for the sake of argument, let us say that we are doing an assessment of a Java-based web application and discover that the developer has written code similar to the following:
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8">
<title>Test Encode</title>
<script>
var param = <%= request.getParameter("js") %>;
alert(param);
</script>
</head>
<body>
There is JavaScript on this page.
</body>
</html>
Obviously, this example code is vulnerable to cross-scripting. So, in our report we point this out, and afterwards the developer comes back with the following change to the code:
...
<script>
var param = <%= StringEscapeUtils.escapeJavaScript(
request.getParameter("js")) %>;
alert(param);
</script>
...
Upon inspection, it appears as though the developer has done the right thing and found a widely distributed library that will do the output encoding for him. This is usually always better than trying to write a library/function yourself unless you REALLY understand the subject matter. In this case, the develop chose the Apache Commons Lang - StringEscapeUtils class. This appears to be a good choice as it provides methods for both of the contexts mentioned above, namely escapeHtml and escapeJavaScript. There is only one problem, the escapeJavaScript method has a very poor implementation. Why? Well, the code comments say it all in this case:
/**
* ...
* <p>The only difference between Java strings and JavaScript
* strings is that in JavaScript, a single quote must be
* escaped.</p>
* ...
*/
What?! Maybe if you are parsing, but in no way is that statement true when it comes to encoding. According to the JavaScript reference, encoding in a JavaScript context should escape all characters with an ASCII value less than 256 in the format of \xHH where HH is the hex representation of the ASCII value of the character. For example, the null character, ASCII value 0, becomes \x00, the new line character, ASCII value 10, becomes \x10, etc. So where does StringEscapeUtils go wrong? Lets look at the code.
The definition of the escapeJavaScript function is defined as follows:
public static String JavaDoc escapeJava(String str) {
return escapeJavaStyleString(str, false);
}After more looking, we come to the function in question:
private static void escapeJavaStyleString(Writer out,
String str, boolean escapeSingleQuote)
throws IOException {
if (out == null) {
throw new IllegalArgumentException("The Writer must not be
null");
}
if (str == null) {
return;
}
int sz;
sz = str.length();
for (int i = 0; i < sz; i++) {
char ch = str.charAt(i);
// handle unicode
if (ch > 0xfff) { out.write("\\u" + hex(ch)); }
else if (ch > 0xff) { out.write("\\u0" + hex(ch)); }
else if (ch > 0x7f) { out.write("\\u00" + hex(ch));}
else if (ch < 32) {
switch (ch) {
case '\b': out.write('\\'); out.write('b');
break;
case '\n': out.write('\\'); out.write('n');
break;
case '\t': out.write('\\'); out.write('t');
break;
case '\f': out.write('\\'); out.write('f');
break;
case '\r': out.write('\\'); out.write('r');
break;
default :
if (ch > 0xf) { out.write("\\u00" + hex(ch)); }
else { out.write("\\u000" + hex(ch)); }
break;
}
}
else {
switch (ch) {
case '\'': if (escapeSingleQuote) {
out.write('\\');
}
out.write('\'');
break;
case '"': out.write('\\'); out.write('"');
break;
case '\\': out.write('\\'); out.write('\\');
break;
default : out.write(ch);
break;
}
}
}
}
If you look closely at this code, you will notice that it fails to follow the specification for JavaScript encoding. This code is only worried about the ' (single quote), " (double quote), and / (slash) as well as a few of the control characters. The problem is they fail to consider characters that have special meaning in JavaScript such as ; (semi-colon), () (parenthesis), etc.
So, what does this mean? A call to the application with the js parameter set to 1; alert(String.fromCharCode(88,83,83)) results in the following code being generated on the client:
...
<script>
var param = 1; alert(String.fromCharCode(88,83,83));
alert(param);
</script>
...
Now, there are two alerts happening instead of one. Change this attack from alert(something) to eval(something really bad) and you still face the same problem.
So what is the solution? Don't use StringEscapeUtils.escapeJavaScript() and hope to be safe from XSS in the JavaScript context. Use something more robust (and implemented correctly) like the OWASP Reform project or the OWASP ESAPI.