  1. #1
    Join Date
    May 1999
    Posts
    11

    how to detect URLs in a string?

    I'm having trouble detecting URLs in a given string; I need to know each URL's position in the string. I wrote a small program, but it doesn't do what I need.

    For example, given a string like "example: www.codeguru.com and www.microsoft.com/msn, two URLs",

    the program should detect "www.codeguru.com" and "www.microsoft.com/msn". Can anyone give me a hint? Thanks a lot.


  2. #2
    Join Date
    May 1999
    Location
    UK
    Posts
    65

    Re: how to detect URLs in a string?

    Try using a known delimiter in your string, such as quotes or commas around each URL.

    Jason
    http://www.netcomuk.co.uk/~jbrooks


  3. #3
    Join Date
    May 1999
    Posts
    11

    Re: how to detect URLs in a string?

    Actually, these strings come from other data sources; I don't compose them myself.
    There may be no delimiters in them, and the program still has to detect the URLs.


  4. #4
    Join Date
    Apr 1999
    Posts
    396

    Re: how to detect URLs in a string?

    What is this, for school? Just search for "www" and, if you're really brave, also check whether it ends in ".com", ".edu", etc.
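
The suggestion above could be sketched roughly as follows. This is only an illustration, assuming that a URL candidate is a run of non-whitespace characters starting with "www." and that a short, incomplete suffix list is acceptable. The names `HasKnownSuffix` and `FindWwwTokens` are made up for this example; they are not from any library. The function returns the offset and length of each hit, since the original question asks for the position in the string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true if `host` ends in one of a few well-known suffixes.
// The list is illustrative only and far from complete.
bool HasKnownSuffix(const std::string& host)
{
    static const char* const kSuffixes[] = { ".com", ".edu", ".org", ".net" };
    for (const char* suffix : kSuffixes) {
        const std::string s(suffix);
        if (host.size() > s.size() &&
            host.compare(host.size() - s.size(), s.size(), s) == 0)
            return true;
    }
    return false;
}

// Hypothetical helper: returns (offset, length) for every run of
// non-whitespace characters that begins with "www." at a word start
// and whose host part ends in a known suffix.
std::vector<std::pair<std::size_t, std::size_t>> FindWwwTokens(const std::string& text)
{
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    const std::size_t kPrefixLen = 4; // length of "www."
    std::size_t pos = 0;
    while ((pos = text.find("www.", pos)) != std::string::npos) {
        // Skip matches inside a longer word, such as "awww.".
        if (pos > 0 && std::isalnum(static_cast<unsigned char>(text[pos - 1]))) {
            pos += kPrefixLen;
            continue;
        }
        // The token runs up to the next whitespace character.
        std::size_t end = pos;
        while (end < text.size() && !std::isspace(static_cast<unsigned char>(text[end])))
            ++end;
        // Trailing punctuation usually belongs to the sentence, not the URL.
        while (end > pos + kPrefixLen &&
               std::ispunct(static_cast<unsigned char>(text[end - 1])))
            --end;
        assert(end >= pos + kPrefixLen);
        // The host is everything before the first '/' or ':'.
        const std::string token = text.substr(pos, end - pos);
        const std::string host = token.substr(0, token.find_first_of("/:"));
        if (HasKnownSuffix(host))
            hits.emplace_back(pos, end - pos);
        pos = end;
    }
    return hits;
}
```

Run on the example string from the first post, this should report "www.codeguru.com" and "www.microsoft.com/msn" with the trailing comma trimmed. It misses anything that does not start with "www.", and it also trims a trailing "/" that may really belong to the URL.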


  5. #5
    Join Date
    May 1999
    Posts
    11

    Re: how to detect URLs in a string?

    My god, there are thousands of combinations. If the string looks like "given URL is:enjoy.za.net, mail to me[email protected]",
    it's hard to tell that it contains "http://enjoy.za.net" and "mailto[email protected]".
    But that is exactly what I want.


  6. #6

    Re: how to detect URLs in a string?

    You are making this harder than it needs to be.
    www.codeguru.com is not a URL, and neither is www.microsoft.com/msn.

    The URLs are:
    http://www.codeguru.com
    and
    http://www.microsoft.com/msn

    There is an RFC that specifies what URLs look like.
    http://sunsite.auc.dk/RFC/rfc/rfc1738.html

    Here are some sites where you can search for other RFCs.
    http://www.ietf.org/1id-abstracts.html
    http://www.globecom.net/(nocl,sv)/ietf/index.shtml
    http://sunsite.auc.dk/RFC/

    I hope this helps.
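
As a rough illustration of the RFC 1738 point: a URL in standard form starts with a scheme followed by a colon, and the scheme is built from letters, digits, "+", "-" and ".". The sketch below checks only that syntax. `SchemeLength` is a hypothetical name invented for this example, and upper-case letters are accepted on the assumption that they should be treated like lower case.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length of the scheme (the text before
// the first ':') if `token` starts with a syntactically valid scheme,
// or 0 otherwise.
std::size_t SchemeLength(const std::string& token)
{
    const std::size_t colon = token.find(':');
    if (colon == std::string::npos || colon == 0)
        return 0;
    assert(colon < token.size());
    for (std::size_t i = 0; i < colon; ++i) {
        const unsigned char c = static_cast<unsigned char>(token[i]);
        // Scheme characters: letters, digits, '+', '-' and '.'.
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            return 0;
    }
    return colon;
}
```

Note that this check alone also accepts "www.codeguru.com:80/msn", because the host name before the port colon happens to look like a scheme. In practice the scheme would probably need to be compared against a list of known names such as http, ftp or mailto.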

  7. #7
    Join Date
    May 1999
    Posts
    11

    Re: how to detect URLs in a string?

    Yes, I know what you mean, and I do know the definition and the standard form of URLs.

    Right now I'm writing a telnet client. People sometimes post articles that include URLs, but they often leave off the "http://", "mailto:", or other prefix.

    I want my program to recognise URLs even when they are not in standard form, so that clicking on one launches the appropriate program to handle it.

    That's why I asked the question.
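
Once a candidate has been found, adding the missing prefix might look something like this. It is a minimal sketch, assuming that anything containing "@" is a mail address, that hosts starting with "ftp." are FTP servers, and that everything else is a web address. `AddMissingScheme` and its prefix list are invented for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns a detected token into standard form by
// guessing the scheme when the author left it out.
std::string AddMissingScheme(const std::string& token)
{
    assert(!token.empty());
    // Already in standard form: leave it alone. The comparison is
    // case-sensitive, so "HTTP://..." would not be recognised here.
    static const char* const kKnown[] = {
        "http://", "https://", "ftp://", "mailto:", "news:", "telnet://"
    };
    for (const char* prefix : kKnown) {
        const std::string p(prefix);
        if (token.compare(0, p.size(), p) == 0)
            return token;
    }
    // Assumption: user@host with no scheme is a mail address.
    if (token.find('@') != std::string::npos)
        return "mailto:" + token;
    // Assumption: hosts starting with "ftp." are FTP servers.
    if (token.compare(0, 4, "ftp.") == 0)
        return "ftp://" + token;
    // Assumption: everything else is a web address.
    return "http://" + token;
}
```

For example, "enjoy.za.net" becomes "http://enjoy.za.net". On Windows the result could then be passed to something like ShellExecute to launch the registered program; that step is not shown here.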


  8. #8
    Join Date
    May 1999
    Location
    UK
    Posts
    65

    Re: how to detect URLs in a string?

    Then I'm afraid you're going to have to resort to some clever programming on your part. If you're not pulling the text in standard notation, and I suspect it comes from something like bulk-mailer programs, then you're going to have to "go to it"!

    Good luck


  9. #9
    Join Date
    May 1999
    Posts
    2

    Re: how to detect URLs in a string?

    I don't know if it'll be any use, but you could have a look at the MFC helper function AfxParseURL, and also
    the SDK functions InternetCanonicalizeUrl and InternetCrackUrl.

    Matt Cawley



  10. #10
    Join Date
    May 1999
    Location
    nz
    Posts
    96

    Re: how to detect URLs in a string?

    I agree 100% with your reply, Todd.
    The only thing is that you have to remember the international suffixes as well,
    e.g. .ad .af .ag .ai .am .an .ao .aq .ar .as .at .au .aw and .az, and that's just the a's :-)

    At the Mount

  11. #11
    Join Date
    May 1999
    Location
    Sydney, Australia
    Posts
    420

    Re: how to detect URLs in a string?

    1) URL: Any 'word' that has a character followed by a dot followed by another character is part of a URL. So www.codeguru.com satisfies my condition, but www. codeguru. com does not, and the dot at the end of this sentence is followed by a space before the next sentence, so that doesn't make a URL. Easy?

    2) Email: Any 'word' that has a character followed by an at-symbol followed by another character is part of an email address. So [email protected] is an email address, but 12 pants @ $40 each does not fit the criteria.

    These two 'algorithms' should work.
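
The two rules above could be sketched as follows. This is only an illustration, assuming that a 'word' is a whitespace-separated token, that a 'character' means a letter or digit, and that punctuation stuck to the end of a word is dropped first. `Kind`, `Match`, `HasCharMarkChar` and `ClassifyWords` are names made up for this example. It returns the matching words rather than their offsets.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class Kind { Url, Email };

struct Match {
    Kind kind;
    std::string word;
};

// True if `word` contains `mark` with a letter or digit on each side.
static bool HasCharMarkChar(const std::string& word, char mark)
{
    for (std::size_t i = 1; i + 1 < word.size(); ++i) {
        if (word[i] == mark &&
            std::isalnum(static_cast<unsigned char>(word[i - 1])) &&
            std::isalnum(static_cast<unsigned char>(word[i + 1])))
            return true;
    }
    return false;
}

// Splits `text` on whitespace and applies the two rules to each word.
std::vector<Match> ClassifyWords(const std::string& text)
{
    std::vector<Match> out;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        assert(!word.empty()); // operator>> never yields an empty word
        // Drop sentence punctuation stuck to the end of the word.
        while (!word.empty() && std::ispunct(static_cast<unsigned char>(word.back())))
            word.pop_back();
        if (HasCharMarkChar(word, '@'))
            out.push_back({Kind::Email, word}); // rule 2, checked first
        else if (HasCharMarkChar(word, '.'))
            out.push_back({Kind::Url, word});   // rule 1
    }
    return out;
}
```

The rules are deliberately loose, so false positives are easy to produce: a number such as 3.14 or an abbreviation such as "i.e" is also reported as a URL. A URL containing spaces would be split into several words.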

    Sally


  12. #12
    Join Date
    May 1999
    Posts
    11

    Re: how to detect URLs in a string?

    Yes, we can use this method to detect URLs, but let's consider some special cases. For example,
    we know "http://www.codeguru.com:80/i.e c/visual c++" is a URL. If it appears in
    a string like "a example string: www.codeguru.com:80/i.e c/visual c++, some other part", it may be quite difficult to detect the URL.



  13. #13
    Join Date
    May 1999
    Location
    Sydney, Australia
    Posts
    420

    Re: how to detect URLs in a string?

    You said it:

    it may be quite difficult to detect the URL.

    And that's the answer to this thread, because you are trying to detect a pattern in text where there is no pattern.

    Force the users to use http:// etc., and once they realise that their URLs aren't detected, they'll start using the correct notation. mrhpf, maybe I have been using Windows and Microsoft programs for too long, hihi

    Sally

