Counting words in each sentence using C#

Thread: Counting words in each sentence using C#

  1. #1
    Join Date
    May 2012
    Posts
    1

    Counting words in each sentence using C#

    Hello, just wondering how to count words up to a point (for example, a period)?

    For example, given the string:

    That was a fun movie. I want to watch it again. We should sometime.

    What code would you need to count the words in each individual sentence (that is, 5, 6 then 3 using the above example string)?

    Thank you. I've been looking for a solution to this problem, but it's got the better of me.

  2. #2
    Join Date
    Feb 2012
    Location
    Strasbourg, France
    Posts
    116

    Re: Counting words in each sentence using C#

    Answer this question: how do you see with your eyes that it's the end of a word? What did your eyes do?
    Hint for programming: loop, if

  3. #3
    Join Date
    Jul 2001
    Location
    Sunny South Africa
    Posts
    11,099

    Re: Counting words in each sentence using C#

    You'll need to do some string manipulation. Look up String.Split, String.Substring and String.LastIndexOf, to name a few that immediately pop into my head.
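As a rough illustration of the methods named above, here is a minimal sketch that isolates each sentence with String.IndexOf and String.Substring (IndexOf is used here in place of LastIndexOf because the scan runs left to right). The class and method names are hypothetical, and it assumes that a period is the only sentence terminator.

```csharp
using System;
using System.Collections.Generic;

public static class SubstringSketch
{
    // Hypothetical helper: returns the trimmed text of each sentence,
    // assuming sentences end with '.' and nothing else.
    public static List<string> SplitSentences(string text)
    {
        var sentences = new List<string>();
        int start = 0;

        while (start < text.Length)
        {
            // Find the next period at or after the current position.
            int dot = text.IndexOf('.', start);
            if (dot < 0)
                dot = text.Length; // no closing period: take the rest of the text

            string piece = text.Substring(start, dot - start).Trim();
            if (piece.Length > 0)
                sentences.Add(piece);

            start = dot + 1; // continue just past the period
        }

        return sentences;
    }
}
```

Each returned sentence can then be passed to String.Split to count its words.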

  4. #4
    Join Date
    Jul 2001
    Location
    Sunny South Africa
    Posts
    11,099

    Re: Counting words in each sentence using C#

    OK, this is a bit sloppy, but it should give you a good starting point:

    Code:
            private void button1_Click(object sender, EventArgs e)
            {
                string strInput = textBox1.Text;
                int intTotalWords = 0;

                // Split into sentences on periods; RemoveEmptyEntries drops the
                // empty piece that follows the final period.
                string[] arrInputDots = strInput.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);

                foreach (string strElement in arrInputDots)
                {
                    // Split on spaces; RemoveEmptyEntries ignores the leading space
                    // after each period as well as any doubled spaces.
                    string[] arrInputSpaces = strElement.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                    int intWordCounter = arrInputSpaces.Length;

                    if (intWordCounter == 0)
                        continue; // nothing but whitespace between two periods

                    intTotalWords += intWordCounter;

                    MessageBox.Show("Sentence Contains " + intWordCounter.ToString() + " Words");
                }

                MessageBox.Show("Total Words in all sentences: " + intTotalWords.ToString());
            }
    Hope this helps!

  5. #5
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,006

    Re: Counting words in each sentence using C#

    You probably also want to use string.Split(char[], StringSplitOptions), passing StringSplitOptions.RemoveEmptyEntries as the second argument.

    You might also think about canonicalizing your string to contain only a restricted character set (namely "A-Za-z0-9. "). That should avoid any problems if non-space whitespace characters are present.
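A minimal sketch of both suggestions together, under stated assumptions: the class and method names are hypothetical, whitespace other than a space is mapped to a space, and any other character outside "A-Za-z0-9. " is simply dropped (so "I've" becomes "Ive" and still counts as one word).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SentenceWordCounter
{
    // Keeps only A-Z, a-z, 0-9, '.' and ' '. Other whitespace (tabs,
    // newlines) becomes a space; everything else is dropped.
    public static string Canonicalize(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool allowed = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                        || (c >= '0' && c <= '9') || c == '.' || c == ' ';
            if (allowed)
                sb.Append(c);
            else if (char.IsWhiteSpace(c))
                sb.Append(' ');
        }
        return sb.ToString();
    }

    // Returns one word count per non-empty sentence.
    public static List<int> CountWordsPerSentence(string text)
    {
        var counts = new List<int>();
        string[] sentences = Canonicalize(text)
            .Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string sentence in sentences)
        {
            // RemoveEmptyEntries discards the empty strings produced by
            // leading, trailing, or repeated spaces.
            int words = sentence
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
            if (words > 0)
                counts.Add(words);
        }
        return counts;
    }
}
```

For the string in the original question, CountWordsPerSentence returns 5, 6 and 3.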
    Best Regards,

    BioPhysEngr
    http://blog.biophysengr.net
    --
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  6. #6
    Join Date
    Jun 2012
    Posts
    10

    Re: Counting words in each sentence using C#

    Hey,
    I'm just learning to program, so I've tried to add some of the cool stuff that I've learned into this (LINQ, Yield, etc)



  7. #7
    Join Date
    Jun 2012
    Posts
    10

    Re: Counting words in each sentence using C#

    Apologies, with the correct 'code' tag, for readability.
    I'll delete the last reply when an admin gives me those rights!

    Code:
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            const string paragraph = "Sentence one. Sentence two contains some more words. Sentence three; thinking.";

            foreach (KeyValuePair<string, uint> sentence in GetWords(paragraph))
            {
                Console.WriteLine(string.Format("sentence: [{0}] \n words: {1}", sentence.Key, sentence.Value));
            }

            Console.ReadLine();
        }

        public static IEnumerable<KeyValuePair<string, uint>> GetWords(string paragraph)
        {
            // Tokens that should not be counted as words
            IList<string> exceptions = new List<string>();
            exceptions.Add(",");
            exceptions.Add(";");
            exceptions.Add(" ");
            exceptions.Add(".");
            exceptions.Add(string.Empty);

            foreach (string sentence in paragraph.Split('.'))
            {
                // Skip the empty piece that Split produces after the final period
                if (sentence.Trim().Length == 0)
                    continue;

                yield return new KeyValuePair<string, uint>(sentence, (uint)sentence.Split(' ').Where
                    (i => !exceptions.Contains(i)).Count());
            }
        }
    }

  8. #8
    Join Date
    Nov 2011
    Posts
    36

    Re: Counting words in each sentence using C#

    Here is my method of doing it: a console application, so that you can enter whatever sentence you want. (I do not like String.Split.)

    That was a fun movie. I want to watch it again. We should sometime.

    OUTPUTS:
    Sentence 1 = 5 words.
    Sentence 2 = 6 words.
    Sentence 3 = 3 words.
    Press any key to continue...


    Code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    namespace TestApplications
    {
        class Program
        {
            static void Main(string[] args)
            {
                if (args.Length == 1 && args[0] == "/?")
                {
                    Console.WriteLine("SYNTAX: TestApplications <Sentence>");
    #if DEBUG
                    Pause();
    #endif
                    return;
                }
                if (args.Length < 1)
                {
                    Console.WriteLine("Invalid Syntax! /? for help.");
    #if DEBUG
                    Pause();
    #endif
                    return;
                }
    
                string sentence = args[0];
    
                if (String.IsNullOrEmpty(sentence))
                {
                    Console.WriteLine("String is empty!");
                    return;
                }
    
                foreach (List<int> count in ParseSentence(sentence))
                {
                    Console.WriteLine(String.Format("Sentence {0} = {1} words.", count[0], count[1]));
                }
    
    #if DEBUG
                Pause();
    #endif
    
            }
            private static IEnumerable<List<int>> ParseSentence(string input)
            {
                int wordCount = 0;
                int sentenceCount = 0;
                bool inWord = false; //True while the cursor is inside a word
    
                for (int i = 0; i < input.Length; i++)
                {
                    if (input[i] == ' ') //End of word, but only if we were inside one
                    {
                        if (inWord) ++wordCount;
                        inWord = false;
                        continue;
                    }
                    if (input[i] == '.') //End of sentence
                    {
                        if (inWord) ++wordCount; //Period closes the last word
                        inWord = false;
                        if (wordCount > 0)
                        {
                            ++sentenceCount;
                            yield return new List<int> { sentenceCount, wordCount };
                        }
                        wordCount = 0;
                        continue;
                    }
                    inWord = true; //Any other character is part of a word
                }
            }
            private static void Pause()
            {
                Console.WriteLine("Press any key to continue...");
                Console.ReadKey(true);
            }
        }
    }
    Last edited by Deranged; June 7th, 2012 at 07:44 PM.

  9. #9
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,006

    Re: Counting words in each sentence using C#

    "I do not like String.Split"
    Why not?

  10. #10
    Join Date
    Nov 2011
    Posts
    36

    Re: Counting words in each sentence using C#

    I find it too inconsistent, or rather too generic. I parse files all day, and if the file has a lot of mistakes, you're going to grab those mistakes with .Split.

    For example if you had a CSV file that was quote qualified:

    "Field1","Field2",_"Field3","Field,4"

    I would probably end up using a while loop with three ints (a start, an end, and a cursor) and go down character by character to remove the _ (which I'm using here to signify a white space) and use the qualifier to grab Field,4 with its comma. String.Split wouldn't be able to take the comma inside Field,4 into account, unless there's something I don't know about it.

    Anyways, I've used it before, but only for something extremely simple, like a name delimited by a dash. |00912323048|John-Doe|yyyy/MM/dd|etc
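The character-by-character approach described above could be sketched roughly as follows. This is only an illustration, not the poster's actual parser: the names are hypothetical, it tracks a single cursor plus an in-quotes flag rather than three ints, and it assumes quotes are never escaped inside a field.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedCsvSketch
{
    // Hypothetical helper: splits one line of quote-qualified CSV.
    // Commas inside quotes stay in the field; whitespace outside quotes
    // is dropped. Escaped quotes ("") are NOT handled.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        int cursor = 0;

        while (cursor < line.Length)
        {
            char c = line[cursor];

            if (c == '"')
            {
                inQuotes = !inQuotes;      // entering or leaving a qualified field
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString()); // delimiter outside quotes ends the field
                current.Clear();
            }
            else if (inQuotes || !char.IsWhiteSpace(c))
            {
                current.Append(c);         // keep everything inside quotes
            }

            cursor++;
        }

        fields.Add(current.ToString());    // last field has no trailing comma
        return fields;
    }
}
```

Given the example line with a real space in place of the _, ParseLine returns four fields, the last of which is Field,4.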

  11. #11
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,006

    Re: Counting words in each sentence using C#

    Well, fair enough. As for me, I do a bunch of bioinformatics parsing and my preferred format is tab-separated value format. It's portable, easily human-readable and editable, and works with lots of existing tools (e.g. GNU tools and String.Split)
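For comparison, a tab-separated line needs very little code with String.Split. The sketch below uses hypothetical names and assumes the expected number of columns is known in advance; empty fields are kept so that columns stay aligned.

```csharp
using System;

public static class TsvSketch
{
    // Hypothetical helper: splits a tab-separated line and reports whether
    // it has the expected number of columns.
    public static bool TryParseLine(string line, int expectedColumns, out string[] fields)
    {
        // No RemoveEmptyEntries here: an empty field is still a column.
        fields = line.Split('\t');
        return fields.Length == expectedColumns;
    }
}
```

A line with a stray extra tab then fails the column check instead of silently shifting the data.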

    Perhaps the greater lesson is to encourage your data sources to provide you machine-friendly data in the first place!

  12. #12
    Join Date
    Nov 2011
    Posts
    36

    Re: Counting words in each sentence using C#

    I work with hundreds, if not a thousand, insurance companies, and they fail, hardcore. I have seen hundreds of delimited files, and they are all different in some way. Ugh. You're right that tab-delimited is nice, but it is also the format most likely to have extra or missing columns, because someone entering data (if it is entered by hand) used an extra tab. Not always, though: sometimes these insurance companies hire people straight out of trade school, and nothing against trade schools, but they don't teach a lot and these students are still very new... Anyways..
