  1. #1
    Join Date
    Aug 1999
    Location
    Germany
    Posts
    2,338

    String too long for Regex?

    Hello!

    I have loaded a web page into a string that is 1,262,594 characters long. I want to run a regex search on it to find all links to a particular site, using a pattern like this:

    Code:
    Pattern = "<a(.[^<>]*)href([ \\s]*)=([ \\s'\"]*?)(http://|https://)([^<>'\"\\?]*?)(example.com)([^'\"> ]*)(['\" ]*)(.*?)>(.*?)</a>";
    Regex myRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Console.WriteLine("Done Regex");
    MatchCollection mc = myRegex.Matches(html);
    Console.WriteLine("Done Matches");
    if (mc.Count == 0) {
    Console.WriteLine("Done mc.Count");
    }
    Console.WriteLine("Done all");
    This works fine for shorter strings, but with the long string the program hangs: the main window just freezes and no exception is thrown. I waited about 30 minutes and then killed the process.

    The output is:

    Done Regex
    Done Matches

    ... so it seems that the if (mc.Count == 0) line hangs somehow.

    When I set a breakpoint at if (mc.Count == 0) and look at mc.Count in the Autos window, I get:
    Count Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation. int
    Stepping one line further crashes the application as well.

    Any ideas about that?
    Last edited by martho; December 21st, 2009 at 08:58 AM.
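
    The output above is consistent with how Regex.Matches behaves in .NET: it populates the returned MatchCollection lazily, so the call itself returns quickly and the matching work happens when Count is first read. A minimal sketch of guarding that step with a match timeout follows. The class and method names (SafeMatchCount, CountMatches) and the -1 return value are hypothetical, and the Regex constructor that takes a TimeSpan was only added in .NET Framework 4.5, which is newer than the .NET versions available when this thread was written in December 2009.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeMatchCount
{
    // Returns the number of matches, or -1 if the regex engine
    // exceeds the given time budget (hypothetical convention).
    public static int CountMatches(string input, string pattern, TimeSpan budget)
    {
        // The third argument sets a match timeout (.NET Framework 4.5 or later).
        Regex regex = new Regex(
            pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline,
            budget);

        // Matches() is lazily evaluated: no matching has been forced yet.
        MatchCollection mc = regex.Matches(input);

        try
        {
            // Reading Count forces the whole input to be scanned.
            return mc.Count;
        }
        catch (RegexMatchTimeoutException)
        {
            // The pattern took longer than the budget on this input.
            return -1;
        }
    }
}
```

    With a guard like this, a pattern that backtracks excessively on a large input surfaces as an exception after the chosen budget instead of freezing the window indefinitely.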

  2. #2
    Join Date
    Oct 2008
    Location
    Cologne, Germany
    Posts
    756

    Re: String too long for Regex?

    you're using "." dots in your expression without escaping them. do you really mean to match any character there?
    win7 x86, VS 2008 & 2010, C++/CLI, C#, .NET 3.5 & 4.0, VB.NET, VBA... WPF is coming
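
    A minimal sketch of the point about the unescaped dot: in a regex, "." matches any character, so "example.com" also matches text such as "exampleXcom", while "example\.com" matches only a literal dot. The class name DotDemo and the helper ContainsHost are hypothetical and exist only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Returns true when the pattern is found anywhere in the input.
    public static bool ContainsHost(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }
}
```

    ContainsHost("http://exampleXcom/", "example.com") returns true because the dot matches the X, whereas ContainsHost("http://exampleXcom/", @"example\.com") returns false.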

  3. #3
    Join Date
    Aug 1999
    Location
    Germany
    Posts
    2,338

    Re: String too long for Regex?

    Thanks for your reply.

    Changing the pattern as follows (only the second-to-last group changes, from (.*?) to (.[^>]*?)):
    Code:
    Before:
    Pattern = "<a(.[^<>]*)href([ \\s]*)=([ \\s'\"]*?)(http://|https://)([^<>'\"\\?]*?)(example.com)([^'\"> ]*)(['\" ]*)(.*?)>(.*?)</a>";
    After:
    Pattern = "<a(.[^<>]*)href([ \\s]*)=([ \\s'\"]*?)(http://|https://)([^<>'\"\\?]*?)(example.com)([^'\"> ]*)(['\" ]*)(.[^>]*?)>(.*?)</a>";
    did the trick and it runs without problems again.

    If you see any other optimizations, I would be glad to hear them. I don't know whether the last (.*?) before the </a> can be optimized, because there every character matches until a </a> follows.
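
    A minimal sketch of how the suggestions in this thread might be combined. It is a variation, not the poster's exact expression: the dot in example.com is escaped, whitespace is matched with \s alone, and the attribute tail uses [^>]* so that it cannot run past the end of the opening tag (with RegexOptions.Singleline, the leading "." in (.[^>]*?) can itself consume a ">"). The trailing (.*?)</a> is kept as it is, since a lazy quantifier followed by a literal stops at the first </a> it reaches. The names LinkFinder and FindLinks are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkFinder
{
    // Variation on the thread's pattern (an assumption, not the poster's exact regex).
    // Group 1: the URL containing example.com. Group 2: the link text.
    private const string Pattern =
        @"<a\s[^<>]*?href\s*=\s*['""]?" +
        @"(https?://[^<>'""?\s]*?example\.com[^'"">\s]*)" +
        @"['""]?[^>]*>(.*?)</a>";

    private static readonly Regex LinkRegex =
        new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns (URL, link text) pairs for every anchor that points at example.com.
    public static List<KeyValuePair<string, string>> FindLinks(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in LinkRegex.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return links;
    }
}
```

    Every quantifier between "<a" and the closing ">" of the opening tag is restricted to characters that cannot leave the tag, so a failed attempt gives up at the end of that tag instead of scanning the rest of the page.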

  4. #4
    Join Date
    Oct 2008
    Location
    Cologne, Germany
    Posts
    756

    Re: String too long for Regex?

    this "([ \\s]*)" is actually the same as "([\\s]*)" because \s already matches whitespace characters, including the space
    Last edited by memeloo; December 21st, 2009 at 09:37 AM.
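
    A minimal sketch that checks this equivalence for individual characters: the class [ \s] and plain \s should accept exactly the same input. The class name WhitespaceClassDemo and the helper ClassesAgree are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceClassDemo
{
    // True when "[ \s]" and "\s" give the same answer for the character c.
    public static bool ClassesAgree(char c)
    {
        string s = c.ToString();
        bool withExplicitSpace = Regex.IsMatch(s, @"\A[ \s]\z");
        bool plainWhitespace = Regex.IsMatch(s, @"\A\s\z");
        return withExplicitSpace == plainWhitespace;
    }
}
```

    Both classes accept a space, a tab, and a newline, and both reject a letter such as 'x', so the extra space inside the brackets adds nothing.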

  5. #5
    Join Date
    Aug 1999
    Location
    Germany
    Posts
    2,338

    Re: String too long for Regex?

    You are right, thank you.
