#stackoverlog#php

PHP DOMDocument loadHTML not encoding UTF-8 correctly

While manipulating HTML with DOMDocument, I ran into an issue: the HTML string was not saved correctly as UTF-8. The following code reproduces it:

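$profile = "<div><p>Ký tự tiếng Việt</p></div>";
$dom = new DOMDocument();
// No encoding is declared here, so the Vietnamese characters get garbled
$dom->loadHTML($profile);

$divs = $dom->getElementsByTagName('div');

// Print each <div> as it is stored in the DOM
foreach ($divs as $div) {
    echo $dom->saveHTML($div);
}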
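
The garbled output is consistent with the UTF-8 bytes being read as a single-byte encoding such as ISO-8859-1. That is an assumption about the parser's default rather than something shown above. The sketch below reproduces the same kind of damage with mbstring alone, without DOMDocument:

```php
<?php
// Assumption: the parser reads the input as ISO-8859-1 when no charset is declared.
$original = 'Ký';

// Treat each UTF-8 byte as one ISO-8859-1 character, then re-encode as UTF-8.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo $garbled, PHP_EOL; // prints "KÃ½"
```

The two bytes of "ý" become two separate characters, which is why a single accented letter shows up as a pair of unrelated symbols.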

Option 1:

To resolve the problem, I prepended <?xml encoding="utf-8" ?> to the HTML string before loading it, so that the parser reads the input as UTF-8:

$profile = '<div><p>Ký tự tiếng Việt</p></div>';
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $profile);
echo $dom->saveHTML();
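
The prefix trick can be wrapped in a small helper that also removes the extra node the prefix may leave in the document. This is only a sketch: `loadUtf8Html()` is a hypothetical name, not part of DOMDocument, and whether the prefix is stored as a processing instruction depends on the libxml version.

```php
<?php
// Hypothetical helper: load a UTF-8 HTML fragment using the prefix from Option 1.
function loadUtf8Html(string $html): DOMDocument
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, then drop the prefix if it was kept
    // as a processing-instruction node.
    foreach (iterator_to_array($dom->childNodes) as $node) {
        if ($node->nodeType === XML_PI_NODE) {
            $dom->removeChild($node);
        }
    }

    // Declare the encoding explicitly now that the prefix is gone.
    $dom->encoding = 'UTF-8';

    return $dom;
}

$dom = loadUtf8Html('<div><p>Ký tự tiếng Việt</p></div>');
$paragraph = $dom->getElementsByTagName('p')->item(0);
echo $paragraph->textContent, PHP_EOL;

// Print only the <div>, as in the first example.
echo $dom->saveHTML($dom->getElementsByTagName('div')->item(0)), PHP_EOL;
```

Passing a node to saveHTML() limits the output to that element, so the document wrapper added by loadHTML() is not printed.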

Option 2:

Another option is to use the PHP library simple_html_dom together with portable-utf8, which takes care of the UTF-8 issue.

$profile = "<div><p>Ký tự tiếng Việt</p></div>";
$htmlDom = new HtmlDomParser($profile);

// use the library's other methods to manipulate the DOM, as described in its documentation
An Tran

A passionate web developer, self-learner and music lover.
