mb_decode_numericentity

(PHP 4 >= 4.0.6, PHP 5, PHP 7)

mb_decode_numericentity — Decode HTML numeric string reference to character

Description

mb_decode_numericentity ( string $str , array $convmap [, string $encoding = mb_internal_encoding() [, bool $is_hex = FALSE ]] ) : string

Converts every numeric character reference in str (for example &#8364;) whose code falls within a block specified by convmap into the corresponding character.

Parameters

str: The string being decoded.
convmap: An array that specifies the code areas to convert. Each code area is described by four integers: start code, end code, offset and mask.
encoding: The encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.
is_hex: This parameter is not used.
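
As a minimal sketch of how the convmap range selects what gets decoded (the input string and variable names are illustrative, not taken from the manual), the following decodes only references to non-ASCII code points and is expected to leave the rest untouched:

```php
<?php
// Illustrative input mixing references to ASCII and non-ASCII characters.
$input = '&#60;b&#62; caf&#233; &#8364;5';

// One code area: U+0080..U+10FFFF, offset 0, mask 0x10FFFF.
$nonAscii = [0x80, 0x10FFFF, 0, 0x10FFFF];

$decoded = mb_decode_numericentity($input, $nonAscii, 'UTF-8');

// &#60; and &#62; fall outside the code area, so they should stay as
// references; &#233; and &#8364; should become real characters.
var_dump($decoded === "&#60;b&#62; caf\u{E9} \u{20AC}5");
```

Restricting the range this way keeps markup-significant characters such as < and > escaped while still restoring accented letters and symbols.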

Return Values

The converted string.

Changelog

Version	Description
5.4.0	Added `is_hex` parameter.

Examples

Example #1 convmap example


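<?php
// Layout of convmap: four integers per code area.
// $convmap = array(
//    start_code1, end_code1, offset1, mask1,
//    start_code2, end_code2, offset2, mask2,
//    ........
//    start_codeN, end_codeN, offsetN, maskN );
// Specify Unicode values for start_codeN and end_codeN.
// When encoding (mb_encode_numericentity()), offsetN is added to the value,
// the result is bitwise ANDed with maskN, and that value is written
// as a numeric string reference.

// A concrete map with a single code area: all non-ASCII code points, no shift.
$convmap = array(0x80, 0x10FFFF, 0, 0x10FFFF);
?>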
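
The role of the offset is easier to see in a round trip. The sketch below assumes that decoding reverses the shift applied by mb_encode_numericentity() when the same map is used; the map and sample string are made up for illustration:

```php
<?php
// One code area: A-Z (U+0041..U+005A), offset 0x20, mask 0xFFFF.
$map = [0x41, 0x5A, 0x20, 0xFFFF];

// Encoding adds the offset before writing the reference:
// 'A' (65) is written as &#97;, 'Z' (90) as &#122;.
// Characters outside the code area are left as they are.
$encoded = mb_encode_numericentity('AZ az', $map, 'UTF-8');

// Decoding with the same map is expected to undo the shift.
$decoded = mb_decode_numericentity($encoded, $map, 'UTF-8');

var_dump($encoded); // the shifted references followed by " az"
var_dump($decoded === 'AZ az');
```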

Example #2 Escaping a JavaScript string, with test data built by mb_decode_numericentity()


<?php
function escape_javascript_string($str) {
  $map = [
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,0,0, // 49
          0,0,0,0,0,0,0,0,1,1,
          1,1,1,1,1,0,0,0,0,0,
          0,0,0,0,0,0,0,0,0,0,
          0,0,0,0,0,0,0,0,0,0,
          0,1,1,1,1,1,1,0,0,0, // 99
          0,0,0,0,0,0,0,0,0,0,
          0,0,0,0,0,0,0,0,0,0,
          0,0,0,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1, // 149
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1, // 199
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1, // 249
          1,1,1,1,1,1,1, // 255
          ];
  // Char encoding is UTF-8
  $mblen = mb_strlen($str, 'UTF-8');
  // Eight hex digits per character; UTF-32BE makes the byte order explicit
  $utf32 = bin2hex(mb_convert_encoding($str, 'UTF-32BE', 'UTF-8'));
  for ($i=0, $encoded=''; $i < $mblen; $i++) {
      $u = substr($utf32, $i*8, 8);
      $v = hexdec($u);
      if ($v < 256 && $map[$v]) {
        $encoded .= '\\x'.substr($u, 6,2);
      } else if ($v == 0x2028) { // U+2028 LINE SEPARATOR
        $encoded .= '\\u2028';
      } else if ($v == 0x2029) { // U+2029 PARAGRAPH SEPARATOR
        $encoded .= '\\u2029';
      } else {
        $encoded .= mb_convert_encoding(hex2bin($u), 'UTF-8', 'UTF-32BE');
      }
   }
   return $encoded;
}
 
// Test data
$convmap = [ 0x0, 0xffff, 0, 0xffff ];
$msg = '';
for ($i=0; $i < 1000; $i++) {
  // chr() produces single bytes, so it cannot build valid UTF-8 for values of 128 and above; use mb_decode_numericentity() instead.
  $msg .= mb_decode_numericentity('&#'.$i.';', $convmap, 'UTF-8');
}
 
// var_dump($msg);
var_dump(escape_javascript_string($msg));

User Contributed Notes 6 notes

donovan at conduit it

13 years ago


Note that, at the time of writing, mb_decode_numericentity() seems to work only with decimal entities and not with hexadecimal entities. Knowing this would have saved me a good hour of debugging.

If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
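
The conversion described above could be sketched as follows. The helper name hex_entities_to_decimal() is hypothetical, and preg_replace_callback() is used here in place of plain preg_replace() so that hexdec() can be applied to each match:

```php
<?php
// Hypothetical helper: rewrite &#xHHHH; references as decimal &#NNNN; references.
function hex_entities_to_decimal(string $text): string
{
    return preg_replace_callback(
        '/&#x([0-9a-f]{1,6});/i',          // up to six hex digits covers U+10FFFF
        function (array $m): string {
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

// Decode every code point, with no offset.
$convmap = [0x0, 0x10FFFF, 0, 0x10FFFF];

$decimal = hex_entities_to_decimal('&#x20AC; and &#8364;');
$decoded = mb_decode_numericentity($decimal, $convmap, 'UTF-8');

var_dump($decimal); // both references are now decimal
var_dump($decoded === "\u{20AC} and \u{20AC}");
```

Limiting the match to six hex digits keeps hexdec() within integer range, so the generated reference is always a plain decimal number.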

dev at glossword info

15 years ago


Just two great functions for daily use:

/* Converts numeric HTML entities into characters */
function my_numeric2character($t)
{
    // Cover the whole Unicode range; a 0xFFFF mask would truncate code points above U+FFFF
    $convmap = array(0x0, 0x10FFFF, 0, 0x10FFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts all characters (including ASCII) into numeric HTML entities */
function my_character2numeric($t)
{
    $convmap = array(0x0, 0x10FFFF, 0, 0x10FFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ’ â ');

php at cNhOiSpPpAlMe dot org

15 years ago


Here are functions to convert hankaku to zenkaku characters (and vice-versa) in Japanese text.

<?php

// Supported characters:
//    (space)
//     !#$%&()*+,./0123456789:;<=>?@
//    ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
//    abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)

function f_han2zen ($string,$encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
     0x20,0x20,0x3000-0x20,0xffff,   // Space
     0x21,0x7e,0xff01-0x21,0xffff);
  $temp = mb_encode_numericentity($string,$convmap,$encoding);
  $convmap = array(0,0xffff,0,0xffff);
  return mb_decode_numericentity($temp,$convmap,$encoding);
}
function f_zen2han ($string,$encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
     0x3000,0x3000,-(0x3000-0x20),0xffff,   // Space
     0xff01,0xff5e,-(0xff01-0x21),0xffff);
  $temp = mb_encode_numericentity($string,$convmap,$encoding);
  $convmap = array(0,0xffff,0,0xffff);
  return mb_decode_numericentity($temp,$convmap,$encoding);
}

// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");

?>

Andrew Simpson

14 years ago


Many web browsers tend to upload high-order characters as numeric HTML entities.

Here is some simple code to convert those entities within a block of text into proper UTF-8 characters:

<?php
// preg_replace_callback() is used because the /e modifier was removed in PHP 7.0

// Decode decimal HTML entities added by the web browser
$body = preg_replace_callback('/&#\d{2,7};/u', 'utf8_entity_decode', $body);
// Decode hex HTML entities added by the web browser
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,6});/u', function ($m) {
    return utf8_entity_decode(array('&#' . hexdec($m[1]) . ';'));
}, $body);

// Callback for the regex: $matches[0] holds the whole decimal entity
function utf8_entity_decode($matches) {
    $convmap = array(0x0, 0x10FFFF, 0, 0x10FFFF);
    return mb_decode_numericentity($matches[0], $convmap, 'UTF-8');
}
?>

Navi

10 years ago


Manual entity => utf8 conversion:

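<?php
// parse entities
$raw = preg_replace_callback("/&#(\\d+);/u", "_pcreEntityToUtf", $raw);

function _pcreEntityToUtf($matches)
{
    $char = intval(is_array($matches) ? $matches[1] : $matches);

    if ($char < 0x80) {
        // to prevent insertion of control characters
        if ($char >= 0x20) return htmlspecialchars(chr($char));
        else return "&#$char;";
    } else if ($char < 0x800) {
        // two-byte sequence: U+0080..U+07FF
        return chr(0xc0 | (0x1f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    } else if ($char < 0x10000) {
        // three-byte sequence: U+0800..U+FFFF
        return chr(0xe0 | (0x0f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    } else {
        // four-byte sequence: U+10000 and above
        return chr(0xf0 | (0x07 & ($char >> 18))) . chr(0x80 | (0x3f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
}
?>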

dirk at camindo de

14 years ago


If you use utf8_decode(), you will run into problems with every character outside the ISO-8859-1 charset. You can work around this by calling mb_encode_numericentity() first:

  // convert $text from UTF-8 to ISO-8859-1
  $convmap = array(0x100, 0x10FFFF, 0, 0x10FFFF);
  $text = mb_encode_numericentity($text, $convmap, "UTF-8");
  $text = utf8_decode($text);

The second line turns every character above 0xFF into a numeric entity; the third line converts the rest (0x80 - 0xFF) to ISO-8859-1.
