How to get PDF file links programmatically?


nomiitaly
January 31st, 2009, 03:40 AM
Hi!
I want to get the download link of a PDF file programmatically.
The website has an internal PDF viewer tool
that shows the file, but the URL in the browser address bar stays the same.
I have checked the HTML source of the page; there are JavaScript functions that fetch the link values for the different PDF pages available on that site.
Any help will be appreciated.

toraj58
January 31st, 2009, 08:15 AM
I wrote an HTML parser program for you, with a tiny web browser beside it. I tried to keep it fairly generic: it takes a URL and a file type (e.g. php, pdf). When you press "get url", it navigates to the address given in the text box; when you then press the "get html tag" button, it shows you all the links with that extension or file type.

For parsing the HTML I used these classes:

1- HtmlDocument
2- HtmlElementCollection
3- HtmlElement

plus some string manipulation that finds the desired links and adds them to the list box.

Note: my code may contain bugs and may not work on some sites,
because there are many ways a page can navigate to a web address.

I have attached the source code for you.

Here is the code as well:

using System;
using System.Windows.Forms;

namespace HTML_Parse
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // "get url" button: load the address typed into textBox1.
        private void button1_Click(object sender, EventArgs e)
        {
            progressBar1.Value = progressBar1.Minimum;
            webBrowser1.Navigate(textBox1.Text);
        }

        // "get html tag" button: list every <a href> whose extension matches textBox2.
        private void button2_Click(object sender, EventArgs e)
        {
            listBox1.Items.Clear();

            // Document is null until a page has been loaded.
            HtmlDocument doc = webBrowser1.Document;
            if (doc == null)
            {
                listBox1.Items.Add("No page loaded yet.");
                return;
            }

            // Accept both "pdf" and ".pdf" in the file type box.
            string wanted = "." + textBox2.Text.Trim().TrimStart('.');
            int count = 0;

            foreach (HtmlElement el in doc.GetElementsByTagName("a"))
            {
                string href = el.GetAttribute("href");
                int lastDotIndex = href.LastIndexOf('.');
                if (lastDotIndex == -1)
                {
                    continue;
                }

                // Everything from the last dot on, e.g. ".pdf".
                // Links with a query string (file.pdf?x=1) will not match here.
                string extension = href.Substring(lastDotIndex);
                if (string.Equals(extension, wanted, StringComparison.OrdinalIgnoreCase))
                {
                    listBox1.Items.Add(href);
                    count++;
                }
            }

            if (count == 0)
            {
                listBox1.Items.Add("No Link with such file type found!");
            }
        }

        private void webBrowser1_ProgressChanged(object sender, WebBrowserProgressChangedEventArgs e)
        {
            progressBar1.Increment(1);
        }

        // Fill the bar once the page has finished loading.
        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            progressBar1.Value = progressBar1.Maximum;
        }
    }
}
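
The extension check above compares everything after the last dot, so a link such as report.pdf?page=2 would be missed. One possible way around that, as a rough sketch only: cut off the query string and fragment before comparing. LinkFilter and HasExtension are made-up names for illustration, not part of the attached project.

```csharp
using System;

public static class LinkFilter
{
    // Hypothetical helper: true when the part of href before any '?' or '#'
    // ends with the given extension ("pdf" or ".pdf"), ignoring case.
    public static bool HasExtension(string href, string extension)
    {
        if (string.IsNullOrEmpty(href) || string.IsNullOrEmpty(extension))
        {
            return false;
        }

        // Drop the query string and fragment, if there is one.
        int cut = href.IndexOfAny(new[] { '?', '#' });
        string path = cut >= 0 ? href.Substring(0, cut) : href;

        string wanted = "." + extension.TrimStart('.');
        return path.EndsWith(wanted, StringComparison.OrdinalIgnoreCase);
    }
}
```

In button2_Click this could stand in for the LastIndexOf/Substring comparison, e.g. if (LinkFilter.HasExtension(href, textBox2.Text)).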

nomiitaly
February 1st, 2009, 08:15 AM
Thanks a lot for sending me the nice source file and the code as well.
But my problem is somewhat different.
The URL from which I am trying to locate the download link
has no information about the PDF files; instead, there is JavaScript that fetches the download links for the PDF files. That is the situation, and I am tired of trying to solve this problem.
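
For the case described here, where the links only appear inside script code, one rough idea is to scan the page source (or a downloaded .js file) for quoted strings ending in .pdf. This is only a sketch under that assumption: ScriptLinkScanner and FindPdfLiterals are made-up names, and it will not find links that the script assembles from pieces at run time.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptLinkScanner
{
    // A single- or double-quoted string literal whose content ends in ".pdf".
    private static readonly Regex PdfLiteral = new Regex(
        "[\"']([^\"']+\\.pdf)[\"']", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns each distinct quoted ".pdf" string
    // found in the given HTML or JavaScript text, in order of appearance.
    public static List<string> FindPdfLiterals(string source)
    {
        var found = new List<string>();
        if (string.IsNullOrEmpty(source))
        {
            return found;
        }

        foreach (Match m in PdfLiteral.Matches(source))
        {
            string value = m.Groups[1].Value;
            if (!found.Contains(value))
            {
                found.Add(value);
            }
        }
        return found;
    }
}
```

The source text could come from webBrowser1.DocumentText once the page has loaded; any relative paths found this way would still need to be combined with the page address.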

toraj58
February 1st, 2009, 10:09 AM
Please give me the JavaScript and a link to the site so I can see what I can do with it.

nomiitaly
February 2nd, 2009, 11:26 AM
So nice of you, Toraj!
Here is the link. Please check it and let me know how I can capture the values
that JavaScript produces on a web page.

http://www.dnews.eu/3dissue/milano/pageflip.htm

Best wishes bro :)

toraj58
February 4th, 2009, 03:33 AM
Hi,

I looked at the page; it is neither JavaScript nor PDF.
It is Flash.
That makes the job very easy for you.
You can download one of the free SWF catcher programs, which are good for saving any Flash file.

Let me know the result.