I need a little help with Regex


danw8
December 20th, 2009, 01:39 PM
Hello
I want to do something that looks very easy, but because it needs Regex I can't manage it.
I have some HTML in a string that contains links in this format:
<a href="/dir1/file">
Note: the number of directories is not always just 1; there can be more.
I want my program to change all occurrences of links in the format above to a format like this (it just adds ".html"):
<a href="/dir1/file.html">
It would be very easy to do if only String.Replace() allowed me to use wildcards. If it did, it would probably be as easy as this:
str = str.Replace("a href=\"*\">", "a href=\"*.html\">");

I tried to use Regex, doing it this way:
str = Regex.Replace(str, "<a href=\"(?<link>[a-zA-Z0-9_/-])\"", "<a href=${link}" + ".html\"");
but it doesn't work.


Any help would be much appreciated.

darwen
December 21st, 2009, 02:47 AM
Try this:


Regex regex = new Regex(@"<a\s+href=""(?<link>(/\w+)+)""\s*>");


This matches the link as one or more occurrences of a forward slash followed by one or more word characters.

You should also always replace literal spaces in the pattern with a whitespace match (i.e. \s+).
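For illustration, here is a minimal sketch of how a whitespace-tolerant pattern along those lines might be plugged into Regex.Replace. The class name, method name, and sample HTML are made up for this example, and it assumes the tag is a plain `<a href="...">` with no other attributes. Note that `\w` does not match hyphens, so paths containing `-` would need a wider character class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class LinkRewriter
{
    // Verbatim string (@"..."): backslashes are literal and a double quote is written as "".
    // \s+ tolerates any amount of whitespace between "a" and "href".
    private static readonly Regex LinkPattern =
        new Regex(@"<a\s+href=""(?<link>(/\w+)+)""\s*>");

    public static string AddHtmlExtension(string html)
    {
        // ${link} inserts the text captured by the named group "link".
        return LinkPattern.Replace(html, @"<a href=""${link}.html"">");
    }

    public static void Main()
    {
        // Two spaces between "a" and "href" on purpose.
        Console.WriteLine(AddHtmlExtension("<a  href=\"/dir1/dir2/file\">"));
    }
}
```

As a side effect, the replacement string normalizes the whitespace inside the tag to a single space.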

Darwen.

rohshall
December 21st, 2009, 04:58 AM
This works:


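string input = "<a href=\"/dir1/file\">";
// Group 1 captures the path: any run of '/', word characters, or '+'.
Regex pattern = new Regex("<a href=\"([/\\w+]*)\">");
Match m = pattern.Match(input);
if (m.Success)
{
    // Replace() swaps the whole matched tag for the path plus ".html\">",
    // so the opening <a href=" has to be prepended again.
    Console.WriteLine("New link is {0}", "<a href=\"" + pattern.Replace(input, m.Groups[1].Value + ".html\">"));
}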

danw8
December 21st, 2009, 10:36 AM
Thank you very much for your help. I managed to adapt your regex to work with Regex.Replace() (it's basically the same, apart from the * and + difference). It now looks like this:

str2 = Regex.Replace(str2, "<a href=\"(?<link>[/\\w-+]+)\">", "<a href=\"${link}.html\">");

If I ever have any problems with Regex, I'll be sure to write here. Again, thank you very much.
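As a rough check of that approach on a longer string, the sketch below runs a Regex.Replace call of the same shape over HTML with several links. The class name, method name, and sample HTML are invented for the example, and the hyphen is written last in the character class (`[/\w+-]`), which should behave the same while being less ambiguous to read. Because `.` and `:` are not in the class, links that already end in ".html" and absolute URLs are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class HtmlLinkDemo
{
    public static string AppendHtml(string html)
    {
        // The named group "link" captures the path; the replacement re-emits the tag with ".html" added.
        return Regex.Replace(
            html,
            "<a href=\"(?<link>[/\\w+-]+)\">",
            "<a href=\"${link}.html\">");
    }

    public static void Main()
    {
        string html =
            "<p><a href=\"/dir1/file\">one</a> " +
            "<a href=\"/dir1/dir2/my-file\">two</a> " +
            "<a href=\"/done.html\">three</a></p>";

        // The first two links gain ".html"; the third does not match and stays as it is.
        Console.WriteLine(AppendHtml(html));
    }
}
```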

@Darwen
No matter which space I replace with \s+, the IDE gives me a warning about an unrecognized escape sequence.

darwen
December 21st, 2009, 04:14 PM
No matter which space I replace with \s+, the IDE gives me a warning about an unrecognized escape sequence.


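Not if you put an '@' at the front of the string, which turns it into a verbatim string literal, i.e. backslashes are not treated as escape characters (and a double quote inside the string is written as "").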
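To make the difference concrete, here is a small sketch that writes one pattern both ways and compares the results. The class name, constant names, and the pattern itself are chosen for illustration only. In a regular literal every regex backslash has to be doubled and quotes are escaped with `\"`; in a verbatim literal backslashes are left alone and quotes are doubled.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class VerbatimStringDemo
{
    // Regular literal: "\\s" produces the two characters \s that the regex engine sees.
    public const string Regular = "<a\\s+href=\"(?<link>[/\\w+-]+)\"\\s*>";

    // Verbatim literal: backslashes are taken literally, "" stands for one double quote.
    public const string Verbatim = @"<a\s+href=""(?<link>[/\w+-]+)""\s*>";

    public static void Main()
    {
        // Both literals produce exactly the same pattern text.
        Console.WriteLine(Regular == Verbatim);

        // The pattern accepts extra whitespace between "a" and "href".
        Console.WriteLine(Regex.IsMatch("<a   href=\"/dir1/file\">", Verbatim));
    }
}
```

Writing "\s" in a regular literal without doubling the backslash is what triggers the unrecognized escape sequence complaint.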

Darwen.