PHP DOMDocument loadHTML not encoding UTF-8 correctly

When manipulating an HTML DOM with DOMDocument, I ran into an issue: the HTML string was not saved correctly in UTF-8.

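$profile = "<div><p>Ký tự tiếng Việt</p></div>";
$dom = new DOMDocument();
// No encoding is declared here, so the UTF-8 bytes are misread
$dom->loadHTML($profile);

$divs = $dom->getElementsByTagName('div');

foreach ($divs as $div) {
    // The Vietnamese characters come out garbled
    echo $dom->saveHTML($div);
}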

Option 1:

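Without an encoding declaration, the underlying parser typically falls back to a single-byte encoding such as ISO-8859-1. To resolve the problem, I prepended <?xml encoding="utf-8" ?> to the HTML string so that it is read as UTF-8: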

$profile = '<div><p>Ký tự tiếng Việt</p></div>';
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $profile);
echo $dom->saveHTML();
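
One side effect of Option 1 is that the injected prefix stays in the document as a processing-instruction node and shows up in the output of saveHTML(). The following is a minimal sketch of one way to clean that up, assuming the prefix ends up as a top-level child of the document; loadUtf8Html() is a hypothetical helper name used only for illustration, not part of DOMDocument.

```php
<?php
// Sketch: load a UTF-8 fragment using the XML-declaration prefix,
// then remove the prefix node again.
// loadUtf8Html() is a hypothetical helper, not a DOMDocument method.
function loadUtf8Html(string $html): DOMDocument
{
    $dom = new DOMDocument();
    // The prefix makes the parser read the input as UTF-8.
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);

    // Copy the child list first so removing nodes does not disturb iteration.
    foreach (iterator_to_array($dom->childNodes) as $node) {
        if ($node->nodeType === XML_PI_NODE) {
            $dom->removeChild($node); // drop the injected processing instruction
        }
    }

    // Declare UTF-8 as the document encoding for later serialization.
    $dom->encoding = 'UTF-8';

    return $dom;
}

$dom  = loadUtf8Html('<div><p>Ký tự tiếng Việt</p></div>');
$div  = $dom->getElementsByTagName('div')->item(0);
$text = $div->textContent;    // decoded text of the <div>
$html = $dom->saveHTML($div); // serialize only the <div> fragment
echo $html, PHP_EOL;
```

Passing the node to saveHTML() serializes just that fragment instead of the whole document with the html and body wrappers that loadHTML() adds.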

Option 2:

Another option is to use the PHP library simple_html_dom together with portable-utf8, which takes care of the UTF-8 handling.

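// Assumes the voku/simple_html_dom package is installed via Composer
use voku\helper\HtmlDomParser;

$profile = "<div><p>Ký tự tiếng Việt</p></div>";
$htmlDom = new HtmlDomParser($profile);

// Use the other methods described in the library documentation to manipulate the DOM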
Truong An

Self-learner, passionate software engineer from Vietnam
