  1. #1
    Join Date
    Jan 2009
    Posts
    20

    how to get the pdf files Links Programmatically ?

    Hi!
    I want to get the download link of a PDF file programmatically.
    The website has an internal PDF viewer tool
    that shows the file, but the URL in the browser's address bar stays the same.
    I have checked the page's HTML source: there are JavaScript functions that fetch the link values for the different PDF pages available on that site.
    Any help will be appreciated.

  2. #2
    Join Date
    Mar 2008
    Location
    IRAN
    Posts
    811

    Re: how to get the pdf files Links Programmatically ?

    I wrote an HTML parser program for you, with a tiny web browser beside it. I tried to keep it fairly generic: it takes a URL and a file type (e.g. php, pdf). When you press "get url", it navigates to the address in the text box; when you then press the "get html tag" button, it shows all the links on the page that have the given extension or file type.

    For parsing the HTML I used these classes (from System.Windows.Forms):

    1- HtmlDocument
    2- HtmlElementCollection
    3- HtmlElement

    plus some string manipulation that finds the desired links and adds them to the list box.

    Notice: my code may contain bugs and may not work on some sites,
    because there are many ways to navigate to a web address.

    I have attached the source code for you.

    Here is the code as well:
    Code:
    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Text;
    using System.Windows.Forms;
    
    namespace HTML_Parse
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }
    
            private void button1_Click(object sender, EventArgs e)
            {
                webBrowser1.Navigate(textBox1.Text);
            }
    
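            // Lists every <a href> on the loaded page whose extension matches textBox2 (e.g. "pdf").
            private void button2_Click(object sender, EventArgs e)
            {
                listBox1.Items.Clear();

                // Document is null until a page has been loaded.
                HtmlDocument doc = webBrowser1.Document;
                if (doc == null)
                {
                    listBox1.Items.Add("No page loaded yet.");
                    return;
                }

                // Accept either "pdf" or ".pdf" in the file type box.
                string wanted = "." + textBox2.Text.Trim().TrimStart('.');
                int cnt = 0;
                foreach (HtmlElement el in doc.GetElementsByTagName("a"))
                {
                    string href = el.GetAttribute("href");
                    if (string.IsNullOrEmpty(href))
                        continue;

                    // Everything from the last dot onward is treated as the extension.
                    int lastDotIndex = href.LastIndexOf('.');
                    if (lastDotIndex == -1)
                        continue;

                    string extension = href.Substring(lastDotIndex);
                    if (string.Equals(extension, wanted, StringComparison.OrdinalIgnoreCase))
                    {
                        listBox1.Items.Add(href);
                        cnt++;
                    }
                }

                if (cnt == 0)
                {
                    listBox1.Items.Add("No link with such file type found!");
                }
            }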
    
            private void webBrowser1_ProgressChanged(object sender, WebBrowserProgressChangedEventArgs e)
            {
                // Rough indicator only: one step per progress event, not a true percentage.
                progressBar1.Increment(1);
            }
    
            private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
            {
                // Fill the bar once the page has finished loading.
                progressBar1.Value = progressBar1.Maximum;
            }
        }
    }
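
    A side note on the extension check in the code above: it compares everything after the last dot, so a link such as "file.pdf?id=3" or "FILE.PDF" would be missed. Below is a minimal sketch of a more tolerant check. The LinkFilter class and HasExtension method are names I made up for illustration, and the example.com URLs are placeholders, not part of the attached project.

```csharp
using System;

public static class LinkFilter
{
    // Returns true when the path part of href ends with the wanted extension.
    // "extension" may be given as "pdf" or ".pdf"; the comparison ignores case.
    public static bool HasExtension(string href, string extension)
    {
        if (string.IsNullOrEmpty(href) || string.IsNullOrEmpty(extension))
            return false;

        string wanted = "." + extension.Trim().TrimStart('.');
        if (wanted.Length == 1)
            return false; // nothing left after trimming, so there is no extension to match

        // Cut off the query string and fragment by hand so relative links work too.
        int cut = href.IndexOfAny(new[] { '?', '#' });
        string path = cut >= 0 ? href.Substring(0, cut) : href;

        return path.EndsWith(wanted, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Placeholder URLs for illustration only.
        Console.WriteLine(HasExtension("http://example.com/a/Report.PDF?id=3", "pdf"));
        Console.WriteLine(HasExtension("http://example.com/index.php", "pdf"));
    }
}
```

    In the form code, the comparison inside the loop could then call LinkFilter.HasExtension with the href and the text of the file type box instead of comparing substrings directly.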
    Please rate my post if it was helpful for you.
    Java, C#, C++, PHP, ASP.NET
    SQL Server, MySQL
    DirectX
    MATH
    Touraj Ebrahimi
    [toraj_e] [at] [yahoo] [dot] [com]

  3. #3
    Join Date
    Jan 2009
    Posts
    20

    Re: how to get the pdf files Links Programmatically ?

    Thanks a lot for sending me the nice source file and the code as well.
    But my problem is somewhat different.
    The page I am trying to get the download link from
    contains no information about the PDF files; instead, JavaScript on the page fetches the download links for them. That is the situation, and I have gotten tired of trying to solve this problem.
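
    For pages like this, one thing that may be worth trying is to search the page source for quoted strings ending in ".pdf", since scripts often keep their file paths as string literals. The following is only a sketch under that assumption: ScriptLinkFinder and FindPdfUrls are hypothetical names, the sample source is made up, and links that a script builds at run time or loads from an external file would not be found this way. In a WinForms program the source could come from webBrowser1.DocumentText.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptLinkFinder
{
    // Matches a single- or double-quoted string literal that ends in ".pdf",
    // optionally followed by a query string. Case is ignored.
    private static readonly Regex PdfLiteral = new Regex(
        "[\"'](?<url>[^\"'\\s]+\\.pdf(\\?[^\"'\\s]*)?)[\"']",
        RegexOptions.IgnoreCase);

    // Returns each distinct PDF-looking literal found in the page source, in order.
    public static List<string> FindPdfUrls(string pageSource)
    {
        var urls = new List<string>();
        if (string.IsNullOrEmpty(pageSource))
            return urls;

        foreach (Match m in PdfLiteral.Matches(pageSource))
        {
            string url = m.Groups["url"].Value;
            if (!urls.Contains(url))
                urls.Add(url);
        }
        return urls;
    }

    public static void Main()
    {
        // Made-up page source for illustration only.
        string source =
            "<script>var pages = ['docs/page1.pdf', \"docs/page2.PDF\"]; var x = 'a.swf';</script>";
        foreach (string url in FindPdfUrls(source))
            Console.WriteLine(url);
    }
}
```

    The values found this way may be relative paths, so they would still need to be combined with the page address before downloading.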

  4. #4
    Join Date
    Mar 2008
    Location
    IRAN
    Posts
    811

    Re: how to get the pdf files Links Programmatically ?

    Please give me the JavaScript and a link to the site so I can see what I can do with it.

  5. #5
    Join Date
    Jan 2009
    Posts
    20

    Re: how to get the pdf files Links Programmatically ?

    So nice of you, Toraj!
    Here is the link. Please check it and let me know how I can capture the values
    that JavaScript produces on a web page.

    http://www.dnews.eu/3dissue/milano/pageflip.htm

    Best wishes, bro

  6. #6
    Join Date
    Mar 2008
    Location
    IRAN
    Posts
    811

    Re: how to get the pdf files Links Programmatically ?

    Hi,

    I looked at the page. It is neither JavaScript nor PDF;
    it is Flash.
    That makes the job very easy for you:
    you can download one of the free SWF catcher programs, which are good for saving any Flash file.

    Let me know about the result.