Diacritics get stripped out of my capitalization function

  1. #1
    Join Date
    Oct 2005
    Location
    Cleveland, Ohio
    Posts
    4

    Diacritics get stripped out of my capitalization function

    I have a very simple capitalization program that helps me auto-capitalize text files. It operates from a capitalization dictionary which is also a text file. The capitalization dictionary consists of nothing more than words that are always capitalized a certain way, separated by line breaks. So, for example, if the capitalization dictionary consists of the following:


    "Napoleon Dynamite"
    French
    iPad
    I


    and the string to be processed is:

    i watched "napoleon dynamite" in french on my ipad

    the output after processing will be:

    I watched "Napoleon Dynamite" in French on my iPad

    However, for some reason I cannot figure out, the program strips out (i.e., deletes) any character that has a diacritic. So if I add to my capitalization dictionary the word:

    Napoléon

    and the string is:

    i watched "napoleon dynamite" in french on my ipad with my friend napoléon

    what I end up with is:

    I watched "Napoleon Dynamite" in French on my iPad with my friend napolon

    Obviously, this is not desired. Can anyone help me figure out what the fix might be, to make sure that letters with diacritics are treated properly rather than being deleted? I think it might have something to do with the ToLower function...

    Here is the code of the function:

    Code:
            public static string CapitalizeString(List<string> wordList, string str)
            {
    
                if (str == null || str.Length == 0)
                {
                    return "";
                }
    
                string capitalizedString = str.ToLower();
    
                capitalizedString = ReplaceCapitalizedWord(wordList, capitalizedString);
    
                // restore the capital on the first letter or digit, but only if it was upper case in the input
                for (int i = 0; i < capitalizedString.Length; i++)
                {
                    char ch = capitalizedString[i];
                    if (char.IsLetter(ch) || char.IsNumber(ch))
                    {
                        if (char.IsUpper(str[i]))
                        {
                            capitalizedString = capitalizedString.Substring(0, i) + capitalizedString.Substring(i, 1).ToUpper() + capitalizedString.Substring(i + 1);
                        }
    
                        break;
                    }
                }
    
                return capitalizedString;
    
            }
    
    
            private static string ReplaceCapitalizedWord(List<string> capitalsWordList, string stringToCapitalize)
            {
                string lowerCaseString = stringToCapitalize.ToLower();
                string capitalizedString = stringToCapitalize.ToString();
    
                foreach (string capStr in capitalsWordList)
                {
                    string capStrLower = capStr.ToLower();
                    int startIndex = 0;
                    int foundIndex = -1;
                    while (startIndex < capitalizedString.Length && (foundIndex = lowerCaseString.IndexOf(capStrLower, startIndex)) >= 0)
                    {
                        bool isSeparatorPrevChar = true;
                        bool isSeparatorNextChar = true;
    
                        if (foundIndex > 0)
                        {
                            char prevChar = capitalizedString[foundIndex - 1];
                            isSeparatorPrevChar = !char.IsLetterOrDigit(prevChar) && prevChar != '-' && prevChar != '\'';
                        }
    
                        if (foundIndex + capStr.Length < capitalizedString.Length)
                        {
                            char nextChar = capitalizedString[foundIndex + capStr.Length];
                            isSeparatorNextChar = !char.IsLetterOrDigit(nextChar) && nextChar != '-';
                        }
    
    
                        if (isSeparatorPrevChar && isSeparatorNextChar)
                        {
                            capitalizedString = capitalizedString.Substring(0, foundIndex) + capStr + capitalizedString.Substring(foundIndex + capStr.Length);
                        }
    
                        startIndex = foundIndex + 1;
                    }
    
                }
    
                return capitalizedString;
            }
    I think I'm using .NET 2.0; I'm compiling in Visual C# 2005 Express, anyway.
    All the best,
    Robert K S

  2. #2
    Arjay (Moderator / MS MVP)
    Join Date
    Aug 2004
    Posts
    11,308

    Re: Diacritics get stripped out of my capitalization function

    Originally posted by Robert K S:
    I think it might have something to do with the ToLower function...
    You can find out for sure by setting a breakpoint and stepping through the code in the debugger. Look at str before the ToLower() method is called and again afterward.
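    If stepping through in the debugger is awkward, the same check can be done by printing the code points of the string before and after the call. This is only a minimal sketch: the class name CodePointDump and the helper Describe are made up for illustration and are not part of the original program.

```csharp
using System;
using System.Text;

public static class CodePointDump
{
    // Hypothetical helper: lists every UTF-16 code unit of s as "U+XXXX",
    // separated by spaces, so a missing character is easy to spot.
    public static string Describe(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string str = "napol\u00E9on"; // "napoléon", written with an escape
        Console.WriteLine(Describe(str));           // before ToLower
        Console.WriteLine(Describe(str.ToLower())); // after ToLower
    }
}
```

    If U+00E9 shows up on both lines, ToLower is not the step that drops the letter. If it is already missing from a string before ToLower runs, the next place to look would be wherever the text is loaded (that code is not shown in this thread), for instance the encoding used to read the file.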

    If ToLower does turn out to be removing the letters with diacritics, you may want to avoid that function.
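    One way to do without ToLower is to let IndexOf ignore case by itself, so neither string is ever lower-cased. The sketch below is an assumption about how that could look, not a drop-in replacement: ApplyCasing is a hypothetical name, it handles only the first match, and it leaves out the separator checks from the original ReplaceCapitalizedWord.

```csharp
using System;

public static class CaseInsensitiveFind
{
    // Hypothetical helper: finds the first case-insensitive occurrence of
    // word in text and overwrites it with the casing stored in word.
    public static string ApplyCasing(string text, string word)
    {
        int i = text.IndexOf(word, StringComparison.OrdinalIgnoreCase);
        if (i < 0)
        {
            return text; // no match, nothing to change
        }
        return text.Substring(0, i) + word + text.Substring(i + word.Length);
    }

    public static void Main()
    {
        // "\u00E9" is the letter e with an acute accent.
        Console.WriteLine(ApplyCasing("my friend napol\u00E9on", "Napol\u00E9on"));
    }
}
```

    Because the input is never rewritten as a whole, any character outside the matched word passes through untouched.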

    Why not use the Split method with a space as the delimiter, then check whether the leading character of each word is capitalized and, if it isn't, capitalize it?
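    A rough sketch of that idea follows. The names SplitCapitalizer and CapitalizeEachWord are invented for illustration. Note that, taken literally, this capitalizes every word, which is not the same as the dictionary-driven behavior described in the first post, so it would need a lookup step to decide which words to touch.

```csharp
using System;

public static class SplitCapitalizer
{
    // Hypothetical helper: splits on single spaces and upper-cases the
    // first character of each word that starts with a lower-case letter.
    public static string CapitalizeEachWord(string text)
    {
        string[] words = text.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            string w = words[i];
            if (w.Length > 0 && char.IsLower(w[0]))
            {
                words[i] = char.ToUpper(w[0]) + w.Substring(1);
            }
        }
        return string.Join(" ", words);
    }

    public static void Main()
    {
        Console.WriteLine(CapitalizeEachWord("my friend napol\u00E9on"));
    }
}
```

    Only the first character of each word is changed, so letters with diacritics later in the word are copied through as they are.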

    Also, check out the StringBuilder class: it lets you access individual characters and make in-place edits, rather than creating a new string object each time you call a string member function.
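    As an illustration of the in-place editing, here is a small sketch that uses the StringBuilder indexer to overwrite a matched word with its dictionary casing. The class InPlaceCasing and the method Overwrite are hypothetical, and the caller is assumed to have already found a valid index.

```csharp
using System;
using System.Text;

public static class InPlaceCasing
{
    // Hypothetical helper: overwrites the characters of sb starting at
    // index with the characters of word, without allocating new strings.
    public static void Overwrite(StringBuilder sb, int index, string word)
    {
        for (int i = 0; i < word.Length; i++)
        {
            sb[index + i] = word[i];
        }
    }

    public static void Main()
    {
        StringBuilder sb = new StringBuilder("on my ipad");
        Overwrite(sb, 6, "iPad");          // "ipad" starts at index 6
        Console.WriteLine(sb.ToString());  // prints "on my iPad"
    }
}
```

    This could stand in for the Substring concatenation in the original loop, which builds a new string for every replacement.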
