  1. #1
    Join Date
    Jun 2008
    Posts
    154

    Cleaning up code with Regular Expressions

    I don't know regular expressions very well, so I mostly find examples and adapt them. My code usually ends up really ugly, even though it works.

    Code:
                    string classifications = LineNode.InnerText;
                    //Makes This_TypeOfStringIsLong > This_Type Of String Is Long
                    string spaced = Regex.Replace(classifications, @"([a-z])([A-Z])", @"$1 $2", RegexOptions.None);
                    //Makes This_Type Of String Is Long > This_Type,Of,String,Is,Long
                    string modded = spaced.Replace(" ", ",");
                    //Makes This_Type,Of,String,Is,Long > This Type,Of,String,Is,Long
                    string again = modded.Replace("_", " ");
                    string[] LineArray = again.Split(',');
    Is there a simpler, shorter way of doing this?

    We begin with a string like "This_TypeOfStringIsLong", which has to be split at each capitalized word. Any _ then becomes a space, but the words it joined must stay together as a single entry.

    Example: Blue_SkyDarkNightOwl > Blue Sky, Dark, Night, Owl

  2. #2
    Join Date
    Oct 2005
    Location
    Seattle, WA U.S.A.
    Posts
    353

    Re: Cleaning up code with Regular Expressions

    Hi Bix.

    Certainly I'm not any better than you with regular expressions, and almost certainly not AS good as you, but it did occur to me that one step in the process might be eliminated by combining two steps into a single statement.

    In the original code:

    Step 1 inserts a space wherever a lower-case character is immediately followed by an upper-case character, thus tokenizing the string.

    Step 2 converts those spaces to commas for the impending split.

    Step 3 converts each underscore to a space.

    Step 4 splits the line on the aforementioned commas.
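
    To make the four steps concrete, here is a small trace of the intermediate values, using the Blue_SkyDarkNightOwl example from the first post. It is only a sketch: the class name StepTrace and the method name Tokenize are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class StepTrace
{
    // Runs the four steps in order, printing each intermediate value.
    public static string[] Tokenize(string input)
    {
        // Step 1: insert a space at each lower-case/upper-case boundary.
        string spaced = Regex.Replace(input, @"([a-z])([A-Z])", "$1 $2");
        Console.WriteLine(spaced);   // Blue_Sky Dark Night Owl

        // Step 2: turn those spaces into commas.
        string modded = spaced.Replace(" ", ",");
        Console.WriteLine(modded);   // Blue_Sky,Dark,Night,Owl

        // Step 3: turn underscores into spaces.
        string again = modded.Replace("_", " ");
        Console.WriteLine(again);    // Blue Sky,Dark,Night,Owl

        // Step 4: split on the commas.
        return again.Split(',');
    }

    public static void Main()
    {
        string[] tokens = Tokenize("Blue_SkyDarkNightOwl");
        Console.WriteLine(tokens.Length);   // 4
    }
}
```

    Note that step 2 turns every space into a comma, so this only behaves as intended when the input contains no spaces or commas of its own.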



    It seems likely that in this particular case, one might successfully combine steps 1 & 2 as follows:

    Code:
                string classifications = "This_TypeOfStringIsLong";
    
    //          Tokenize the string
                string spaced = Regex.Replace(classifications, @"([a-z])([A-Z])", @"$1,$2");
    
    //          Replace the underscore with a space
                string again = spaced.Replace("_", " ");
    
                string[] LineArray = again.Split(',');
    Yet another option, though much less desirable, would be to combine all the operations into a single statement without creating any intermediate strings, similar to the following:
    Code:
    string[] LineArray2 = Regex.Replace(classifications, @"([a-z])([A-Z])", @"$1,$2").Replace("_", " ").Split(',');
    Personally, I'd avoid that option like the plague because it renders the operation illegible, but I guess it is an option however foul.
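
    For what it's worth, the temporary comma delimiter can probably be dropped altogether by splitting directly at the lower-case/upper-case boundary with Regex.Split and zero-width lookarounds. This is only a sketch of that alternative, not something anyone in this thread tested; the names CapitalSplitter and SplitOnCapitals are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CapitalSplitter
{
    // Zero-width position between a lower-case letter and an upper-case letter.
    // Nothing is consumed, so no characters are lost in the split.
    private static readonly Regex Boundary = new Regex(@"(?<=[a-z])(?=[A-Z])");

    public static string[] SplitOnCapitals(string input)
    {
        return Boundary.Split(input)
                       .Select(token => token.Replace("_", " ")) // underscore becomes a space
                       .ToArray();
    }

    public static void Main()
    {
        string[] tokens = SplitOnCapitals("Blue_SkyDarkNightOwl");
        Console.WriteLine(string.Join(", ", tokens)); // Blue Sky, Dark, Night, Owl
    }
}
```

    Because no delimiter is ever inserted into the string, an input that already contains a comma or a space is not split in the wrong place.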
    Last edited by ThermoSight; January 18th, 2011 at 10:07 PM.

  3. #3
    Join Date
    Jun 2008
    Posts
    154

    Re: Cleaning up code with Regular Expressions

    Yeah, I fixed it and made it way smaller just by changing the way I save out the XML file. Before, I had it like this: "This_Is_OneHere_Is_AnotherYetMoreAgainSplitEachByCapital_Word"
    Now I save it like this: "This_Is_One,Here_Is_Another,Yet,More,Again,Split,Each,By,Capital_Word"

    The code is SUPER simplified with this:
    Code:
                    string classifications = LineNode.InnerText;
                    string modded = classifications.Replace("_", " ");
                    string[] LineArray = modded.Split(',');
