I wrote a C# program that uses WebRequest and WebResponse to implement a simple web crawler, and I discovered something about web sites. Web browsers such as IE and Firefox let you view the HTML source code, but it seems that the HTML sent to the browser is one thing and what the browser interprets and displays is something else. For example, if you run a Google search in IE and the same search in Firefox, viewing the source in IE will NOT show the hyperlinks and content of the search results, while viewing the source in Firefox does show them. So my question is this: how do you make WebRequest and WebResponse return the content after it has been processed by the browser instead of before?
One possible solution might be to use HttpWebRequest instead of WebRequest and set its UserAgent property so that the request looks as if it came from Firefox. But this does not seem plausible to me.
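For reference, a minimal sketch of the request side with a custom user agent, assuming .NET Framework 4.5 or later (on .NET 6 and later these types still work but are marked obsolete in favour of HttpClient). `HttpWebRequest.UserAgent` is a real property; whether Google actually serves different markup for a Firefox user-agent string is an assumption, not something verified here. The `CrawlerSketch` class and the user-agent value are hypothetical.

```csharp
using System.IO;
using System.Net;

public static class CrawlerSketch
{
    // Hypothetical Firefox-style user-agent string; any value can be sent.
    public const string FirefoxUserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0";

    // Builds the request without sending it; nothing goes over the network yet.
    public static HttpWebRequest CreateRequest(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = userAgent;
        return request;
    }

    // Sends the request and returns the HTML exactly as the server sent it.
    // No script runs here, so anything a page builds later in the browser is absent.
    public static string Fetch(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be `CrawlerSketch.Fetch(CrawlerSketch.CreateRequest(url, CrawlerSketch.FirefoxUserAgent))`. Changing the user agent only changes what the server chooses to send; it does not make WebResponse run the page's scripts.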
I also tried the WebBrowser control without making it visible, but I don't know the syntax to use.
The examples I have found seem to suggest that before using the WebBrowser class I have to point it at a URL, and only then can I set its visibility to hidden. But wouldn't that be too late?
The Navigate method does not seem to offer a way to load a URL while hidden.
My first thought is that the Google server is detecting the browser type and serving markup tailored to the browser detected.
No, the issue is that the browser keeps processing the HTML for a while after the load-complete notification arrives. I wish I knew how to grab the HTML during an idle time. Any ideas?
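One way to act on the idle-time idea is to poll instead of relying on a single load event: check a condition at a fixed interval and stop when it holds or a timeout expires. The sketch below is a generic helper with hypothetical names (`PollSketch`, `WaitUntilAsync`). In the WPF program the condition would be something like "the document now contains the search results", which is an assumption about the page rather than a known marker. Because `Task.Delay` is awaited rather than blocking, calling the helper with `await` from the UI thread should leave the WebBrowser free to keep running its scripts between checks.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PollSketch
{
    // Re-evaluates `condition` every `interval` until it returns true
    // or `timeout` elapses. Returns whether the condition was met.
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan interval, TimeSpan timeout)
    {
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed < timeout)
        {
            if (condition())
                return true;
            // Yield instead of blocking so the calling thread stays responsive.
            await Task.Delay(interval);
        }
        // One last check once the deadline has passed.
        return condition();
    }
}
```

The timeout matters: if the expected content never appears, the caller gets `false` back instead of waiting forever.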
I have dropped WebRequest and WebResponse and replaced them with a single WebBrowser object.
I have issues with the C# WPF WebBrowser class
I have a C# WPF application that uses the WebBrowser class. I have added handlers for the LoadCompleted event and several other events to try to get the HTML content after the page is loaded:
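string URL = textBox1.Text;
// Only spaces are encoded here; other special characters pass through unchanged.
URL = URL.Replace(' ', '+');
webbrowser1 = new WebBrowser();
// LoadCompleted: the document being navigated to has finished downloading.
webbrowser1.LoadCompleted += new LoadCompletedEventHandler(webbrowser1_LoadCompleted);
// Loaded: the WebBrowser element itself has been loaded into the UI, not the web page.
webbrowser1.Loaded += new RoutedEventHandler(webbrowser1_Loaded);
// Navigated: the document has been located and has started downloading.
webbrowser1.Navigated += webbrowser1_Navigated;
int i = 0;
webbrowser1.Navigate(new Uri("https://www.google.com/#q=" + URL + ....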
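The `Replace(' ', '+')` step only handles spaces; a query containing `&`, `#` or `+` would still change the meaning of the URL. A sketch using `WebUtility.UrlEncode` (a real `System.Net` method that also turns spaces into `+`) follows. It targets Google's `/search?q=` URL rather than the `/#q=` form used above; whether the `/search` page includes the results in its initial HTML is an assumption worth checking, not a guarantee. The `SearchUrlSketch` name is hypothetical.

```csharp
using System.Net;

public static class SearchUrlSketch
{
    // Encodes the whole query so characters such as '&', '#' and '+'
    // cannot end or alter the query string; spaces become '+'.
    public static string BuildSearchUrl(string query)
    {
        return "https://www.google.com/search?q=" + WebUtility.UrlEncode(query);
    }
}
```

For example, `BuildSearchUrl("c# web crawler")` yields `https://www.google.com/search?q=c%23+web+crawler`, whereas the plain `Replace` would leave the `#` in place and cut the query short.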
I have put breakpoints in these handlers, written code to grab the inner HTML from the WebBrowser's HTMLDocument, and written the HTML text to a separate file for each handler.
Then I ran the program and watched it to see whether the document loads. The page being loaded is the results page of a Google search query.
Visually, the page is blank and white each time the LoadCompleted, Loaded and Navigated handlers are hit as I step through the code. Only after the handlers have been hit (some of them a couple of times) and the program is idle does the display show the results page.
The HTML written to the files does not represent the Google search results page. Instead, it represents the Google home page and contains none of the results. Any ideas? How can I programmatically get the results page?
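A possible explanation for the home-page HTML, offered as a hypothesis rather than a confirmed diagnosis: in `https://www.google.com/#q=...` everything after `#` is a URI fragment, and fragments are not sent to the server in the HTTP request. The server therefore sees a plain request for `/` and returns the home page; any results would have to be fetched afterwards by the page's own script. `System.Uri` makes the split visible (the `FragmentSketch` names are hypothetical):

```csharp
using System;

public static class FragmentSketch
{
    // The part of the URL that is actually sent in the HTTP request line.
    public static string SentToServer(string url)
    {
        return new Uri(url).PathAndQuery;
    }

    // The part that stays in the browser (includes the leading '#').
    public static string KeptByBrowser(string url)
    {
        return new Uri(url).Fragment;
    }
}
```

For `https://www.google.com/#q=test`, `SentToServer` gives `/` and `KeptByBrowser` gives `#q=test`. With the `/search?q=test` form the query is part of `PathAndQuery`, so the server receives it directly.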