-
August 19th, 2009, 04:21 PM
#1
getting html source code with c++
I am writing a program that looks through the source code of websites and pulls information out of it, but I don't know how to download the source so my program can parse it. I don't really want to do socket programming from the ground up, so does anyone know of a well-documented library that works with Visual C++ and can do the job with a few simple functions?
Thanks
-
August 20th, 2009, 08:00 AM
#2
Re: getting html source code with c++
You always get the source code; C++ doesn't change the HTML in any way. Download the libcurl library. From there it is very easy to download anything you want.
Code:
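#include <string>
#include <curl/curl.h>

// libcurl calls this for each chunk it receives; data is NOT null-terminated.
static size_t somecallback(char * data, size_t size, size_t nmemb, void * userdata){
    std::string * buffer = static_cast<std::string *>(userdata);
    buffer->append(data, size * nmemb);  // append exactly the bytes in this chunk
    return size * nmemb;                 // tell libcurl every byte was handled
}

int main(){
    std::string buffer;
    CURL * curl = curl_easy_init();
    if (!curl) return 1;
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.google.com");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, somecallback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);
    CURLcode result = curl_easy_perform(curl);  // CURLE_OK on success
    curl_easy_cleanup(curl);
    return result == CURLE_OK ? 0 : 1;
}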
Your string will now have the HTML data from google.com. You may also need to register an error buffer (CURLOPT_ERRORBUFFER); I'm not sure if it's required.
-
August 20th, 2009, 09:21 AM
#3
Re: getting html source code with c++
OK, I am somewhat confused about how to use this. I have downloaded libcurl and opened the project in the VC++ IDE, so it has all the headers, source files, etc. But now that I have the whole thing in a project, how do I know which header files contain the functions you just showed me (i.e. which ones I need to #include in my program)? Do I have to open them up and look through them all? I assume there's a better way.
I am just not sure what to include above the code you gave me to make it work, since I don't know where those functions are among the millions of headers included.
-
August 20th, 2009, 09:44 AM
#4
Re: getting html source code with c++
#include <curl/curl.h>
That's the only one that you need. You should have downloaded a precompiled library with it; add libcurl.a to your linker parameters. You don't need the libcurl sources in your project at all; that's the point of a library.
-
August 20th, 2009, 03:36 PM
#5
Re: getting html source code with c++
So I found this site to help me compile the .dll and .lib files and set them up, but my version is slightly different: when I compile the .dll, the Debug folder only has the .dll file and not the .lib file that I apparently need. I'm not sure how to get this file, so could someone show me how to make the .lib file?
(Site I've been looking at for instructions: http://curl.haxx.se/libcurl/c/visual_studio.pdf)
-
August 21st, 2009, 10:30 AM
#6
Re: getting html source code with c++
There is a precompiled version of libcurl on their website. I don't know how to build the .lib file in VS; I only know the GNU compiler. Are you sure it's not there? It may have a .a extension instead.
I think this download has a precompiled library with it: http://www.gknw.net/mirror/curl/win3...el-mingw32.zip
Last edited by ninja9578; August 21st, 2009 at 10:37 AM.