I use .NET 3.5 on Windows Vista. I've been trying to download a certain web page in order to extract some data. I opened the page in my browser (Firefox), selected "View Page Source" and saved the result as somename.aspx. Then I wrote code that recognised certain strings in the source, such as "rgb (247, 10, 15)", and read the data that followed those strings. Next I wrote some code to download the web page from C#:
Code:
string file;
Console.WriteLine ("Getting data from web page");
Uri webFile=new Uri ("http://www.druglist.gr/drugs.aspx?title=A");
HttpWebRequest request=(HttpWebRequest) WebRequest.Create (webFile);
request.Method="GET";
WebResponse response=request.GetResponse ();
StreamReader stream=new StreamReader (response.GetResponseStream(), Encoding.GetEncoding ("Utf-8"));
file=stream.ReadToEnd ();
stream.Close ();
The problem is that what I get is unformatted text that contains characters like \n or \r, and those rgb values are converted to "color:#d42d24" or something like that.
So when I use Firefox to get the web page, the first lines read:
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
but when I use C# and store the page in the string "file" I get this:
Code:
\r\n\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\r\n
If I use Console.WriteLine(file) then I get the "correct" format that I want.
But how can I convert the web page from one format to the other?
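One way to check whether the two formats really differ is to test whether the string holds real line-break characters or literal backslash sequences. The sketch below assumes the \r\n text was seen in the debugger or a watch window, which displays strings in escaped form, rather than being stored in the string itself. The class name EscapeCheck, its two helper methods and the sample text are hypothetical and only stand in for the downloaded page.

```csharp
using System;

static class EscapeCheck
{
    // True if the text contains a literal backslash followed by 'r' or 'n',
    // i.e. the escape sequence is really stored as two separate characters.
    public static bool HasLiteralEscapes(string text)
    {
        return text.Contains("\\r") || text.Contains("\\n");
    }

    // Counts real line-feed characters (character code 10) in the text.
    public static int CountLineFeeds(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == '\n') count++;
        }
        return count;
    }

    static void Main()
    {
        // Hypothetical sample standing in for the downloaded page.
        string file = "\r\n\r\n<!DOCTYPE html>\r\n<html>\r\n";

        Console.WriteLine(HasLiteralEscapes(file)); // prints False
        Console.WriteLine(CountLineFeeds(file));    // prints 4
    }
}
```

If HasLiteralEscapes returns false for the downloaded string, the content is probably already in the same form that Console.WriteLine shows, and only the escaped display differs.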



