CodeGuru Forums
  1. #1
    Join Date: Jan 2018
    Posts: 2

    Word stemming - help please

    Hi guys, I am creating a text analyser that does the following:

    1. Tokenize: split the long text string into words
    2. Remove the stop words
    3. Perform stemming

    My code so far:


    Code:
    package searc_engine;

    import javax.swing.JOptionPane;

    public class TextAnalyser {

        public static void main(String[] args) {
            // To read the text interactively instead:
            // String myString = JOptionPane.showInputDialog(null, "Type your input");
            String myString = "I was so happy  but innocent they said ok when i asked";

            // Stop words as a regex alternation. \\b stops them matching inside other
            // words (e.g. the "a" in "was"), and (?i) makes the match case-insensitive.
            String stopWords = "i|its|with|but|a|and|be|if|in|it|of|on|or|so|the|they|there|this|which|why";
            String afterStopWords = myString.replaceAll("(?i)\\b(" + stopWords + ")\\b", " ");

            // Lowercase once, before splitting, so that every token is lowercase.
            afterStopWords = afterStopWords.toLowerCase().trim();

            // Split on runs of whitespace so repeated spaces don't produce empty tokens.
            String[] words = afterStopWords.split("\\s+");

            for (String word : words) {
                System.out.println(word);
            }
        }
    }
    But my problem is that I can't do the third part, the stemmer, which should:

    Perform stemming on the terms: Words having the same stem are usually assumed
    to have similar meaning. A typical example of a stem is the word “connect” which is
    the stem for the variants “connected”, “connecting”, “connection” and “connections”.
    In order to improve the recall of the search (i.e., to get relevant documents which don't
    contain the exact words as specified in the query), stemming is performed to remove
    the affixes. For example, the words 'rides' and 'riding' would both be stemmed to 'ride'.
    In the first case this involves the removal of the end character 's'. In the second case
    this involves the removal of the characters 'ing' and the addition of the character 'e'.
    Porter's algorithm is a well-known stemming algorithm. You may refer to Porter's
    algorithm for stemming. You need to implement your own version of the algorithm.
    Here you are required to remove the end character 's'. You can certainly implement
    more rules for stemming.
    For example, a document CatDog.txt such as:
    Cat Dog
    The cats and dogs sat in the dog-basket.
    will generate the following output:
    [cat,dog,cat,dog,sat,dog,basket]
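    A minimal sketch of all three steps in Java, assuming the stop-word list from the code above and only the required trailing-'s' rule, reproduces the expected output for the CatDog.txt example. The class and method names (TextAnalyserSketch, analyse, stem) are hypothetical, and the guards that leave words of three letters or fewer and words ending in "ss" unchanged are assumptions, not part of the assignment. Tokens are split on anything that is not an ASCII letter, which is what turns "dog-basket." into "dog" and "basket".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the pipeline: tokenize, remove stop words, stem.
class TextAnalyserSketch {

    // Same stop words as the regex in the question, stored in a set so that
    // only whole tokens are compared (no matches inside other words).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "i", "its", "with", "but", "a", "and", "be", "if", "in", "it", "of",
            "on", "or", "so", "the", "they", "there", "this", "which", "why"));

    // Required rule only: drop a trailing 's'. The length > 3 and "not 'ss'"
    // guards are assumptions added to avoid mangling words like "is" or "glass".
    static String stem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    static List<String> analyse(String text) {
        List<String> result = new ArrayList<>();
        // Step 1: lowercase and split on anything that is not an ASCII letter.
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            // Step 2: skip empty tokens (e.g. from leading punctuation) and stop words.
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            // Step 3: stem whatever is left.
            result.add(stem(token));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line (e.g. CatDog.txt) if one is given,
        // otherwise fall back to the example text from the assignment.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "Cat Dog\nThe cats and dogs sat in the dog-basket.";
        System.out.println("[" + String.join(",", analyse(text)) + "]");
    }
}
```

    Run with no arguments, it prints [cat,dog,cat,dog,sat,dog,basket]; pass a file name such as CatDog.txt as the first argument to analyse that file instead. Comparing whole tokens against a HashSet sidesteps the substring problem that a plain regex alternation has.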

    Can anyone help, please?
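    For the optional extra rules, the sketch below handles the assignment's 'rides'/'riding' -> 'ride' and 'connect' examples. It adapts a few rules from Porter's algorithm: step 1a (plurals), step 1b ('eed', 'ed', 'ing', with the add-'e' and undouble-consonant fix-ups) and the 'ion' rule from step 4. The class name SuffixStemmer is hypothetical, and steps 2, 3 and 5 as well as the 'at'/'bl'/'iz' sub-rules of step 1b are left out, so it will not agree with a real Porter stemmer on every word.

```java
// Hypothetical stemmer using a handful of rules adapted from Porter's algorithm
// (steps 1a, 1b and the "ion" rule of step 4). Not a full Porter stemmer.
class SuffixStemmer {

    // Porter's definition: a, e, i, o, u are vowels, 'y' is a vowel when it
    // follows a consonant, and every other letter is a consonant.
    static boolean isConsonant(String w, int i) {
        char c = w.charAt(i);
        if ("aeiou".indexOf(c) >= 0) return false;
        if (c == 'y') return i == 0 || !isConsonant(w, i - 1);
        return true;
    }

    // The "measure" m of a stem: the number of vowel-then-consonant transitions
    // ("rid" -> 1, "connect" -> 2, "tree" -> 0).
    static int measure(String s) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < s.length(); i++) {
            boolean vowel = !isConsonant(s, i);
            if (!vowel && prevVowel) m++;
            prevVowel = vowel;
        }
        return m;
    }

    static boolean containsVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isConsonant(s, i)) return true;
        }
        return false;
    }

    // True if s ends consonant-vowel-consonant and the last letter is not w, x or y.
    static boolean endsCvc(String s) {
        int n = s.length();
        if (n < 3) return false;
        char last = s.charAt(n - 1);
        return isConsonant(s, n - 3) && !isConsonant(s, n - 2) && isConsonant(s, n - 1)
                && last != 'w' && last != 'x' && last != 'y';
    }

    static String stem(String word) {
        String w = word.toLowerCase();

        // Step 1a (plurals): "sses" -> "ss", "ies" -> "i", "s" -> "" (but keep "ss").
        if (w.endsWith("sses") || w.endsWith("ies")) w = w.substring(0, w.length() - 2);
        else if (w.endsWith("s") && !w.endsWith("ss")) w = w.substring(0, w.length() - 1);

        // Step 1b: "eed" -> "ee" when m > 0; otherwise strip "ing"/"ed"
        // if the remaining stem still contains a vowel.
        String base = null;
        if (w.endsWith("eed")) {
            if (measure(w.substring(0, w.length() - 3)) > 0) w = w.substring(0, w.length() - 1);
        } else if (w.endsWith("ing") && containsVowel(w.substring(0, w.length() - 3))) {
            base = w.substring(0, w.length() - 3);
        } else if (w.endsWith("ed") && containsVowel(w.substring(0, w.length() - 2))) {
            base = w.substring(0, w.length() - 2);
        }
        if (base != null) {
            int n = base.length();
            char last = base.charAt(n - 1);
            if (n >= 2 && last == base.charAt(n - 2) && isConsonant(base, n - 1)
                    && last != 'l' && last != 's' && last != 'z') {
                base = base.substring(0, n - 1);   // "hopping" -> "hopp" -> "hop"
            } else if (measure(base) == 1 && endsCvc(base)) {
                base = base + "e";                 // "riding" -> "rid" -> "ride"
            }
            w = base;
        }

        // Step 4 ("ion" rule): drop "ion" after 's' or 't' when the stem has m > 1.
        if (w.endsWith("ion")) {
            String s = w.substring(0, w.length() - 3);
            if (measure(s) > 1 && (s.endsWith("s") || s.endsWith("t"))) w = s;
        }
        return w;
    }

    public static void main(String[] args) {
        String[] words = {"rides", "riding", "connected", "connecting",
                          "connection", "connections", "cats", "hopping"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

    Running main prints rides -> ride, riding -> ride, all four 'connect' variants -> connect, cats -> cat and hopping -> hop. The measure check is what stops the add-'e' rule from turning a longer stem such as "visit" (from "visiting") into "visite"; to use the stemmer, call SuffixStemmer.stem on each token after the stop words have been removed.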

