IID_IHTMLDocument2 get_innerText no line breaks
I'm downloading web pages via CInternetSession.
I then pass them into an IHTMLDocument2 instance, as described in this article:
http://www.codeproject.com/KB/IP/parse_html.aspx
Next I get the body element and read its text:
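MSHTML::IHTMLElementPtr body_element;
hr = pDoc->get_body(&body_element);   // hr should be checked before using body_element
BSTR bstr = NULL;
hr = body_element->get_outerText(&bstr);
// ... use bstr here ...
SysFreeString(bstr);                  // the caller owns the returned BSTR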
For many of the downloaded pages, the returned text includes line breaks where you would expect to see them if the page were displayed in a browser.
But for some pages, all the text is returned on a single line. Does anybody have an idea why this would happen?
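To pin down what "a single line" means for a given page, it may help to count which line-break characters actually appear in the returned text. The following is only a rough diagnostic sketch in portable C++; the names LineBreakCounts and CountLineBreaks are made up for illustration and are not part of MSHTML.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: how many of each line-break style the text contains.
struct LineBreakCounts {
    std::size_t crlf = 0;     // "\r\n" pairs
    std::size_t lone_lf = 0;  // '\n' not preceded by '\r'
    std::size_t lone_cr = 0;  // '\r' not followed by '\n'
};

// Hypothetical helper: scan the text once and classify every line break.
LineBreakCounts CountLineBreaks(const std::wstring& text) {
    LineBreakCounts counts;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == L'\r') {
            if (i + 1 < text.size() && text[i + 1] == L'\n') {
                ++counts.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++counts.lone_cr;
            }
        } else if (text[i] == L'\n') {
            ++counts.lone_lf;
        }
    }
    return counts;
}
```

The BSTR from get_outerText could be copied into a std::wstring with std::wstring(bstr, SysStringLen(bstr)) before calling this. If all three counts are zero for a problem page, the breaks are genuinely missing from the returned text rather than just using a different line-ending style.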
Some additional points:
If I take the source for the same pages from the browser control (GetSource()) and pass it to the same routine above, the text is broken into lines correctly.
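Since the same routine behaves differently for the two sources, one way to narrow this down might be to compare the downloaded source with the browser control's source and see where they first diverge. This is a minimal sketch, assuming both sources are already held as std::string in the same encoding; FirstDifference is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first character where the two strings
// differ, or std::string::npos if they are identical. If one string is a
// prefix of the other, the length of the shorter string is returned.
std::size_t FirstDifference(const std::string& a, const std::string& b) {
    const std::size_t common = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i]) {
            return i;
        }
    }
    return a.size() == b.size() ? std::string::npos : common;
}
```

Printing a few dozen characters around the returned index for a problem page should show whether the two sources already differ in the markup itself (for example in line endings or in the doctype) before the document is parsed.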