PHP DOMDocument loadHTML not encoding UTF-8 correctly
When manipulating an HTML DOM with DOMDocument, I ran into an issue where the HTML string was not saved correctly as UTF-8:
$profile = "<div><p>Ký tự tiếng Việt</p></div>";
$dom = new DOMDocument();
$dom->loadHTML($profile);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
    echo $dom->saveHTML($div);
}
Option 1:
To resolve the problem, I prefixed the HTML string with <?xml encoding="utf-8" ?> before loading it:
$profile = '<div><p>Ký tự tiếng Việt</p></div>';
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $profile);
echo $dom->saveHTML();
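Note that saveHTML() with no argument serializes the whole document, which may include the wrapper elements added by loadHTML() and the injected <?xml ... ?> prefix. As a minimal sketch, assuming you only want the <div> markup as in the original loop, the prefix from Option 1 can be combined with per-node output. The helper name renderDivs is hypothetical and not part of the original post:

```php
<?php
// Hypothetical helper (not from the original post): load a UTF-8 HTML
// fragment using the encoding prefix from Option 1, then return the
// serialized markup of every <div> element.
function renderDivs(string $html): string
{
    $dom = new DOMDocument();
    // The prefix tells libxml to treat the input bytes as UTF-8.
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);

    $out = '';
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Passing a node makes saveHTML() serialize only that subtree.
        $out .= $dom->saveHTML($div);
    }
    return $out;
}

echo renderDivs('<div><p>Ký tự tiếng Việt</p></div>');
```

Depending on the PHP and libxml versions, non-ASCII characters may come back either as raw UTF-8 or as HTML entities, so it is worth checking the output in your own environment.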
Option 2:
Another option is to use the PHP library simple_html_dom together with portable-utf8 to fix the UTF-8 issue:
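// Assumption: the voku/simple_html_dom package, installed via Composer and autoloaded
use voku\helper\HtmlDomParser;
$profile = "<div><p>Ký tự tiếng Việt</p></div>";
$htmlDom = new HtmlDomParser($profile);
// use the library's other methods to manipulate the DOM, as described in its documentation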