October 22nd, 2009, 07:46 AM
#1
IID_IHTMLDocument2 get_innerText no line breaks
I'm downloading web pages via CInternetSession and then passing them into an IHTMLDocument2 instance, as described in this article:
http://www.codeproject.com/KB/IP/parse_html.aspx
I then get the body element and read its text:
MSHTML::IHTMLElementPtr body_element;
hr = pDoc->get_body(&body_element);
BSTR bstr = NULL;
hr = body_element->get_outerText(&bstr);
// ... use bstr here, then release it with SysFreeString(bstr)
For many of the downloaded pages, the returned text includes line breaks where you would expect to see them if the page were displayed in a browser.
But for some pages, all the text is returned as a single line. Does anybody have an idea why this would happen?
One additional point:
If I take the source for the same pages from the browser control's GetSource() and pass it to the same routine above, the text is broken into lines correctly.
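To narrow this down, it may help to check whether the "single line" text really contains no line-break characters at all, or only lone '\r' or '\n' characters that the viewer does not render as breaks. The following is a minimal diagnostic sketch, not a fix; the names LineBreakCounts and CountLineBreaks are hypothetical helpers introduced here for illustration, and the code assumes the BSTR has already been copied into a std::wstring.

```cpp
#include <cstddef>
#include <string>

// Counts of each line-break style found in a block of text.
struct LineBreakCounts {
    std::size_t crlf = 0;     // "\r\n" pairs
    std::size_t lone_lf = 0;  // '\n' not preceded by '\r'
    std::size_t lone_cr = 0;  // '\r' not followed by '\n'
};

// Scans the text once and classifies every line break it finds.
LineBreakCounts CountLineBreaks(const std::wstring& text) {
    LineBreakCounts counts;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == L'\r') {
            if (i + 1 < text.size() && text[i + 1] == L'\n') {
                ++counts.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++counts.lone_cr;
            }
        } else if (text[i] == L'\n') {
            ++counts.lone_lf;
        }
    }
    return counts;
}
```

With a non-NULL bstr from the snippet above, the input could be built as std::wstring(bstr, SysStringLen(bstr)). Comparing the counts for a page fetched through CInternetSession with the counts for the same page taken from GetSource() should show whether the breaks are missing entirely or just encoded differently.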