dcsimg
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 2 of 2
  1. #1
    Join Date
    May 2010
    Posts
    13

    generate regular expressions from a list of strings

    I am reading something from regex format, expands it and writing it out. This list can become huge while writing it out.

    While writing it out, I do not have the original regex data. So, I will have to create regexes out of the strings which I have.

    A couple of cases while reading and writing:

    Say, read regular expression is:

    abc/*
    Since 'abc' can have only 'A', 'B', 'C', 'D'(Have this list with me), Above would be translated to list of strings as

    "abc/A", "abc/B", "abc/C", "abc/D" -- 1
    Say, another read regular expression is:

    def/*/A
    Since 'def' can have only 'x', 'y', 'x'(Have this list with me), Above would be translated to list of strings as

    "def/x/A", "def/x/A", "def/x/A" -- 2
    I have already said that I do not have original regular expressions now. All I have is list of strings. I will have to create regexes out of statements number 1 and 2.

    From number 1, I should get

    abc/*
    From number 2, I should get

    def/*/A
    which are the original.

    Question: Which data structure would be efficient to solve this problem. I have thought of using tries & Aho–Corasick algorithm but could not get a clear solution at top of my head till now.

    I would be happy to expand the question more in case it is not clear. Please consider that * will not match /, //, or anything except characters.

  2. #2
    Join Date
    Feb 2017
    Posts
    273

    Re: generate regular expressions from a list of strings

    Quote Originally Posted by hemant.bhargava7 View Post
    Question: Which data structure would be efficient to solve this problem. I have thought of using tries & Aho–Corasick algorithm but could not get a clear solution at top of my head till now.

    I would be happy to expand the question more in case it is not clear. Please consider that * will not match /, //, or anything except characters.
    I suppose one solution strategy is to first turn the list of strings into a finite automation (able to recognize those strings in a text). The Aho–Corasick algorithm you mention will do this by building a Trie complemented with so called failure links,

    http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf

    The second step then will be to transform the finite automation into a regular expression. This seems to be quite a messy problem though,

    https://cs.stackexchange.com/questio...ar-expressions
    Last edited by wolle; April 25th, 2018 at 09:00 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width




On-Demand Webinars (sponsored)