Talk:Numeric character reference

Numeric character converter

Perl

For everyday needs, there is a short "one-line" converter in Perl:

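# Decode input as UTF-8 so that ord() sees code points, not bytes.
# For ISO Latin-1 input, use ':encoding(ISO-8859-1)' instead.
binmode(STDIN, ':encoding(UTF-8)');
while (<STDIN>) {
    # Replace every non-ASCII character with a decimal character reference.
    s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    print;
}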

(Usage: perl code.pl < fileIn.txt > fileOut.txt)

It converts Unicode or ISO Latin text to XML-compatible ASCII by replacing each non-ASCII character with a decimal numeric character reference.

JavaScript

function unicode_to_ncr(text) {
  var ncr_text = "";
  // for...of walks the string by code point, so a character outside the
  // Basic Multilingual Plane is not split into two surrogate halves.
  for (var character of text) {
    var code_point = character.codePointAt(0);
    if (code_point < 128) {
      // ASCII passes through unchanged.
      ncr_text += character;
    } else {
      // Everything else becomes a decimal numeric character reference.
      ncr_text += "&#" + code_point + ";";
    }
  }
  return ncr_text;
}

This also converts Unicode or ISO Latin text to XML-compatible ASCII.
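For comparison with the Perl one-liner, the same conversion can probably be written in JavaScript as a single regular-expression replace. This is only a sketch: the name toNcr is made up for illustration, and it assumes an engine that supports the u flag (ES2015 or later), so that a character outside the Basic Multilingual Plane is matched as one code point rather than as two surrogates.

```javascript
// Hypothetical helper, not part of any library.
function toNcr(text) {
  // With the u flag, [^\x00-\x7F] matches one whole non-ASCII code point.
  return text.replace(/[^\x00-\x7F]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

console.log(toNcr("caf\u00E9"));   // caf&#233;
console.log(toNcr("\u{1F600}"));   // &#128512;
```

ASCII input is returned unchanged, so running the function on text that is already plain ASCII is harmless.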

Terminology?

The nomenclature used in this article is not the same as the basic SGML one. SGML has two proper names, "character reference", which is the numeric character reference described here, and "entity reference", which is a macro resolving to any sequence of characters.

The entity references used in HTML all resolve to exactly one character. But that doesn't make them special cases, as the phrase "character entity reference" implies; they just all happen to be one-character strings. Pim 2 (talk) 11:26, 11 December 2011 (UTC)
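The distinction drawn above can be illustrated with a rough JavaScript sketch. Everything in it is an assumption made for illustration: the function name resolveReferences, the tiny entity table, and in particular the entity "copyline", which is invented to show an entity reference expanding to more than one character. A character reference (decimal, or hexadecimal with the &#x form used by XML and HTML) is resolved arithmetically from its number, while an entity reference is looked up by name like a macro.

```javascript
// Hypothetical entity table. "copyline" is invented to show that an entity
// may expand to any string, not just a single character.
var entities = { eacute: "\u00E9", amp: "&", copyline: "(c) Example Corp" };

function resolveReferences(text) {
  var pattern = /&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return text.replace(pattern, function (whole, body) {
    if (body.charAt(0) === "#") {
      // Character reference: the number itself names the code point.
      var isHex = body.charAt(1) === "x";
      var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return String.fromCodePoint(codePoint);
    }
    // Entity reference: look the name up; leave unknown names untouched.
    return Object.prototype.hasOwnProperty.call(entities, body)
      ? entities[body]
      : whole;
  });
}

console.log(resolveReferences("caf&#233; caf&#xE9; caf&eacute; &copyline;"));
```

The sketch does no validation: String.fromCodePoint throws a RangeError for numbers above 0x10FFFF, and a real parser would also reject code points that the markup language forbids.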

[email protected]

hahaha 61.5.147.169 (talk) 18:32, 22 July 2022 (UTC)