#685 ✓invalid
dmitry.romanov (at gmail)

String.unescapeHTML does not handle HTML Entities in IE 7

Reported by dmitry.romanov (at gmail) | May 25th, 2009 @ 01:32 PM

Download the attached file and place prototype.js (I'm not attaching it) in the same folder.
Open testUnescapeHtmlEntities.html in IE 7 and Firefox.

The actual test code is:

alert('&#39;'.unescapeHTML());

In IE 7 it shows &#39; in the alert box (i.e. the HTML entity is not unescaped), but in Firefox we see ' (apostrophe), which is the correct result.

I tried the two latest Prototype versions, 1.6.0.3 and 1.6.1_rc2, with the same result in IE.

It seems the issue is caused by the following redefinitions of unescapeHTML:

lines 536-543 in 1.6.0.3:

    if (Prototype.Browser.WebKit || Prototype.Browser.IE) Object.extend(String.prototype, {
      escapeHTML: function() {
        return this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;');
      },
      unescapeHTML: function() {
        return this.stripTags().replace(/&amp;/g,'&').replace(/&lt;/g,'<').replace(/&gt;/g,'>');
      }
    });

lines 669-678 in 1.6.1_rc2:

    if ('<\n>'.escapeHTML() !== '&lt;\n&gt;') {
      String.prototype.escapeHTML = function() {
        return this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;');
      };
    }

    if ('&lt;\n&gt;'.unescapeHTML() !== '<\n>') {
      String.prototype.unescapeHTML = function() {
        return this.stripTags().replace(/&lt;/g,'<').replace(/&gt;/g,'>').replace(/&amp;/g,'&');
      };
    }

If I comment out the lines above, IE starts handling HTML entities properly.
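
To illustrate why the numeric reference survives, here is a minimal standalone sketch of the replace chain from the 1.6.1_rc2 redefinition quoted above. The function name unescapeBasicEntities is hypothetical and the stripTags() step is left out, so this only approximates the fallback's behavior and is not Prototype's own code.

```javascript
// Hypothetical standalone copy of the fallback's replace chain
// (tag stripping omitted). Only &lt;, &gt; and &amp; are decoded.
function unescapeBasicEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // done last, so '&amp;lt;' becomes '&lt;', not '<'
}

console.log(unescapeBasicEntities('&lt;b&gt; &amp; &#39;')); // prints "<b> & &#39;"
```

Anything outside the three hard-coded patterns, such as &#39;, passes through unchanged, which matches what the alert box shows in IE 7.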

Comments and changes to this ticket

  • Juriy Zaytsev

    Juriy Zaytsev May 26th, 2009 @ 04:21 AM

    • State changed from “new” to “invalid”

    Entities other than &lt;, &amp; and &gt; are not guaranteed to be handled by unescapeHTML, so I'm closing this ticket as invalid.

    What you're seeing happens because Prototype relies on non-standard (albeit widely supported) innerHTML-based parsing in one group of browsers (those which, when the string is assigned to the innerHTML of an arbitrary element, parse "&lt;" and "&gt;" into "<" and ">" respectively) and on manual parsing in the others (i.e. those which do not unescape such entities).

    The rationale behind using innerHTML is performance, which is supposedly higher when escaping large chunks of text.

  • dmitry.romanov (at gmail)

    dmitry.romanov (at gmail) May 26th, 2009 @ 11:07 AM

    Ok, got it.
    This seems worth mentioning in the API docs.

    By the way, what is the reason for checking this way whether a browser supports innerHTML for HTML unescaping (this is a line from 1.6.1_rc2):

    '&lt;\n&gt;'.unescapeHTML() !== '<\n>'
    

    IE does not resolve this expression to true because &lt;\n&gt; goes to < > (i.e. a space in between instead of a line break). However, in general IE works fine with the innerHTML approach, including the handling of various HTML entities.
    Or am I missing something important?

  • dmitry.romanov (at gmail)

    dmitry.romanov (at gmail) May 26th, 2009 @ 11:08 AM

    Sorry, in my latest comment, the last paragraph should be read as:

    IE resolves this expression to true .....

  • Juriy Zaytsev

    Juriy Zaytsev May 26th, 2009 @ 06:30 PM

    There's a test for "\n" -> "\n" translation in unescapeHTML unit tests. There's no mention and no explanation of it anywhere in the docs or tests, so I have little idea what its purpose is.
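
The feature test discussed in this thread can be sketched outside the browser as follows. This is only an illustration under stated assumptions: manualUnescape, newlineCollapsingUnescape, newlinePreservingUnescape and pickUnescape are hypothetical names, and the "collapsing" function merely simulates the IE behavior described above (a line break turning into a space). It is not Prototype's implementation.

```javascript
// Manual fallback: decodes only &lt;, &gt; and &amp; (hypothetical helper).
function manualUnescape(text) {
  return text.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
}

// Simulates a native unescape that turns "\n" into a space,
// as reported for IE in this thread.
function newlineCollapsingUnescape(text) {
  return manualUnescape(text.replace(/\n/g, ' '));
}

// Simulates a native unescape that keeps line breaks intact.
function newlinePreservingUnescape(text) {
  return manualUnescape(text);
}

// Same comparison as the 1.6.1_rc2 check: if the candidate does not
// round-trip '&lt;\n&gt;' to '<\n>', use the manual fallback instead.
function pickUnescape(candidate) {
  return candidate('&lt;\n&gt;') !== '<\n>' ? manualUnescape : candidate;
}

console.log(pickUnescape(newlineCollapsingUnescape) === manualUnescape);            // prints true
console.log(pickUnescape(newlinePreservingUnescape) === newlinePreservingUnescape); // prints true
```

Under this check, a candidate is rejected for altering the line break alone, regardless of how well it decodes entities.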
