  1. #1
    Join Date
    Sep 2009
    Posts
    3

    I need a little help with Regex

    Hello
    I want to do something that looks very easy, but because it needs Regex, I can't manage it.
    I have some HTML in a string that contains links in this format:
    Code:
    <a href="/dir1/file">
    Note: the path does not always contain just one directory; there can be more.
    I want my program to change all occurrences of links in the format above to the following format (it just adds ".html"):
    Code:
    <a href="/dir1/file.html">
    It would be very easy if only String.Replace() allowed wildcards. If it did, it would probably be as easy as this:
    Code:
    str = str.Replace("a href=\"*\">", "a href=\"*.html\">");
    I tried to use Regex, doing it this way:
    Code:
    str = Regex.Replace(str, "<a href=\"(?<link>[a-zA-Z0-9_/-])\"", "<a href=${link}" + ".html\"");
    but it doesn't work.


    Any help would be much appreciated.

  2. #2
    Join Date
    Jan 2002
    Location
    Scaro, UK
    Posts
    5,940

    Re: I need a little help with Regex

    Try this :

    Code:
    Regex regex = new Regex(@"<a\s+href=""(?<link>(/\w+)+)""\s*>");
    This matches a link made of one or more occurrences of "forward slash" followed by "one or more word characters".

    You should also replace any literal spaces in the pattern with a whitespace match (i.e. \s+), so that extra spaces or tabs in the HTML don't break the match.

    Darwen.
    www.pinvoker.com - PInvoker - the .NET PInvoke Interface Exporter for C++ Dlls.

  3. #3
    Join Date
    Oct 2008
    Location
    Singapore
    Posts
    195

    Re: I need a little help with Regex

    This works:

    Code:
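    string input = "<a href=\"/dir1/file\">";
    // Group 1 captures the path: any run of '/', word characters, or '+'.
    Regex pattern = new Regex("<a href=\"([/\\w+]*)\">");
    Match m = pattern.Match(input);
    if (m.Success)
    {
        // Replace the whole match with the captured path plus ".html", then put the tag prefix back in front.
        Console.WriteLine("New link is {0}", "<a href=\"" + pattern.Replace(input, m.Groups[1].Value + ".html\">"));
    }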

  4. #4
    Join Date
    Sep 2009
    Posts
    3

    Re: I need a little help with Regex

    Thank you very much for your help. I managed to adapt your regex (it's basically the same, apart from the * versus + difference) to work with Regex.Replace(). It now looks like this:
    Code:
    str2 = Regex.Replace(str2, "<a href=\"(?<link>[/\\w-+]+)\">", "<a href=\"${link}.html\">");
    If I ever have any problems with Regex, I'll be sure to write here. Again, thank you very much.
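
    For anyone finding this thread later, the replacement above can be wrapped into a small self-contained sketch. The names LinkRewriter and AddHtmlExtension are made up for illustration. The sketch assumes that href is the only attribute in the tag and that paths contain only word characters, '/', '+' and '-'. The hyphen is written last in the character class so it cannot be read as a range, and \s is used instead of literal spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class LinkRewriter
{
    // Matches <a href="..."> where the path consists of '/', word characters, '+' or '-'.
    // In the verbatim string (@"..."), backslashes are literal and "" stands for one double quote.
    private static readonly Regex LinkPattern =
        new Regex(@"<a\s+href=""(?<link>[/\w+-]+)""\s*>");

    // Appends ".html" to the path of every matching link in the given HTML.
    // Paths that already contain a '.' do not match and are left untouched.
    // Whitespace inside a matched tag is normalized to a single space.
    public static string AddHtmlExtension(string html)
    {
        return LinkPattern.Replace(html, @"<a href=""${link}.html"">");
    }
}
```

    With this sketch, LinkRewriter.AddHtmlExtension("<a href=\"/dir1/dir2/file\">") returns <a href="/dir1/dir2/file.html">, and every matching link in a longer string is rewritten in one call.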

    @Darwen
    No matter which space I replace with \s+, the IDE gives me a warning about an unrecognized escape sequence.

  5. #5
    Join Date
    Jan 2002
    Location
    Scaro, UK
    Posts
    5,940

    Re: I need a little help with Regex

    No matter which space I replace with \s+, the IDE gives me a warning about an unrecognized escape sequence.
    Not if you put an '@' at the front of the string, which turns it into a verbatim string literal, i.e. backslashes are no longer treated as escape characters (a double quote inside it is then written as "").
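
    To make the difference concrete, here is a minimal sketch that writes the same pattern both ways. The class name LiteralDemo and its members are made up for illustration, and the pattern itself is just an example.

```csharp
using System;

// Hypothetical demo class, not part of the original thread.
public static class LiteralDemo
{
    // Regular literal: every backslash and double quote has to be escaped for the C# compiler.
    // Writing "\s" here without doubling the backslash is what triggers the escape sequence error.
    public const string Regular = "<a\\s+href=\"(?<link>[/\\w+-]+)\"\\s*>";

    // Verbatim literal: backslashes are taken literally, and "" stands for one double quote.
    public const string Verbatim = @"<a\s+href=""(?<link>[/\w+-]+)""\s*>";

    // Both literals produce exactly the same string at run time.
    public static bool SamePattern()
    {
        return Regular == Verbatim;
    }
}
```

    Either form can be passed to new Regex(...); the verbatim form is usually easier to read because the regex keeps its single backslashes.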

    Darwen.
