CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 2 of 2 FirstFirst 12
Results 16 to 27 of 27
  1. #16
    Join Date
    May 2004
    Location
    45,000FT Above Nevada
    Posts
    1,539

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    Quote Originally Posted by revg75
    Ya my search requirements are a bit weird, but not that weird.

    If an item in the list was

    iamadonkey

    The user could type 'don', or 'donk', or 'key' and it would return a match. However if the user typed 'yeknod' it would NOT return a match.
    That's an easy strcmp() or cstring.Find(), but you do have to look at all the words to compare...
    Jim
    ATP BE400 CE500 (C550B-SPW) CE560XL MU300 CFI CFII

    "The speed of non working code is irrelevant"... Of course that is just my opinion, I could be wrong.

    "Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination are omnipotent. The slogan 'press on' has solved and always will solve the problems of the human race."...Calvin Coolidge 30th President of the USA.

  2. #17
    GCDEF is offline Elite Member Power Poster
    Join Date
    Nov 2003
    Location
    Florida
    Posts
    12,637

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    I don't see a sorted array helping then. I don't know an elegent solution that doesn't involve brute force or an huge indexing structure. As I said, the best think I can think of is don't start the search with every keystroke. Just wait till the user's given enough information to narrow the first set of results considerably.

  3. #18
    Join Date
    May 2004
    Location
    45,000FT Above Nevada
    Posts
    1,539

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    I think GCDEF is correct...you would have to index on each word and each word starting with next letter in that word to get a good index. OR just do the search when the word is completed by the user...
    Jim
    ATP BE400 CE500 (C550B-SPW) CE560XL MU300 CFI CFII

    "The speed of non working code is irrelevant"... Of course that is just my opinion, I could be wrong.

    "Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination are omnipotent. The slogan 'press on' has solved and always will solve the problems of the human race."...Calvin Coolidge 30th President of the USA.

  4. #19
    Join Date
    Jun 2003
    Posts
    91

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    Alrighty! Perfect thanks alot guys.

    I will just have them click a search button when they are done.

    Just incase you are interested I was trying to mimic the Winamp jump search box. If you start winamp and press the J key a little dialog comes up where you can type, and as you type the media library narrows the search.

    I haven't tried winamps search with 10's of thousands of records, but I am assuming that their will handle that.

    Wonder how they did it??

    Cheers,
    Greg

  5. #20
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    I don't know the particulars of CSortedArray, but in general this
    is all you have to do (you could do this if using a normal CArray
    that is sorted).

    1) Call std::lower_bound with the current "search" string. It will
    return the first position that starts with that string. (O(logN))

    2) starting at that position, do a linear loop until, adding to your
    ist until:

    a) you reach the end of the CSortedArray
    b) the start of the current position in the CSortedArray does
    not equal the search string.

  6. #21
    Join Date
    May 2004
    Location
    45,000FT Above Nevada
    Posts
    1,539

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    Quote Originally Posted by Philip Nicoletti
    I don't know the particulars of CSortedArray, but in general this
    is all you have to do (you could do this if using a normal CArray
    that is sorted).

    1) Call std::lower_bound with the current "search" string. It will
    return the first position that starts with that string. (O(logN))

    2) starting at that position, do a linear loop until, adding to your
    ist until:

    a) you reach the end of the CSortedArray
    b) the start of the current position in the CSortedArray does
    not equal the search string.

    This would work but the OP was to search for the "string" anywhere in the array..NOT just starting with the search string.

    if the search was for "er" he wants the search to return the following

    biker
    error
    ferrot

    and so on....it is more of a cstring.Find() than a cstring.Compare() search.
    Jim
    ATP BE400 CE500 (C550B-SPW) CE560XL MU300 CFI CFII

    "The speed of non working code is irrelevant"... Of course that is just my opinion, I could be wrong.

    "Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination are omnipotent. The slogan 'press on' has solved and always will solve the problems of the human race."...Calvin Coolidge 30th President of the USA.

  7. #22
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,725

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    Sorry, you are right ... I did not read all of the posts.

  8. #23
    Join Date
    May 2004
    Location
    45,000FT Above Nevada
    Posts
    1,539

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    Quote Originally Posted by Philip Nicoletti
    Sorry, you are right ... I did not read all of the posts.
    Not a problem..I thought the same thing at first..then he turned on the light....
    Jim
    ATP BE400 CE500 (C550B-SPW) CE560XL MU300 CFI CFII

    "The speed of non working code is irrelevant"... Of course that is just my opinion, I could be wrong.

    "Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination are omnipotent. The slogan 'press on' has solved and always will solve the problems of the human race."...Calvin Coolidge 30th President of the USA.

  9. #24
    Join Date
    Apr 2005
    Location
    Norway
    Posts
    3,934

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    Quote Originally Posted by revg75
    Incidentally, would putting the search in a seperate thread help with the search freezing up the rest of my app while typing?
    What I would do is to put it in another thread, but delay the actual search for a couple of milliseconds (lets say 300ms). Then, if the user types in anything more before the search has started then delay the search some more. Then, only when the user stops typing (and expects the list to be updated... he can probably wait 300ms) then the search thread starts filtering the list.

    If possible maybe you can stop an ongoing search if the user begins typing again. This way you don't trigger away 5 unneccessary seaches when the user types 'monkey' in 0.4 seconds.

    The keyboard should stop hogging aswell.

    - petter

  10. #25
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    I've tried using an index with an inverted file (i.e. indicies pointing to (string, letter position) pairs), but I only get about a 10 fold performance increase over plain search. It's still optimisable though, but I don't know how much it will bring. And it comes at the cost of multiplying storage by 5.
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  11. #26
    Join Date
    May 2004
    Location
    45,000FT Above Nevada
    Posts
    1,539

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    Quote Originally Posted by Yves M
    I've tried using an index with an inverted file (i.e. indicies pointing to (string, letter position) pairs), but I only get about a 10 fold performance increase over plain search. It's still optimisable though, but I don't know how much it will bring. And it comes at the cost of multiplying storage by 5.
    Good info...not too many people would do something like this to help...nice work Yves..
    Jim
    ATP BE400 CE500 (C550B-SPW) CE560XL MU300 CFI CFII

    "The speed of non working code is irrelevant"... Of course that is just my opinion, I could be wrong.

    "Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination are omnipotent. The slogan 'press on' has solved and always will solve the problems of the human race."...Calvin Coolidge 30th President of the USA.

  12. #27
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Fast keyword search through CSortedArray??? Anyone know how?

    By the way, here is the code so far. There are a couple of things that could be optimized (mainly the set intersection algorithm), so if you feel like trying to pursue this route further...
    Code:
    struct inverted_file
    {
      typedef std::vector<std::vector<std::pair<size_t, unsigned char> > > index_type;
      index_type index;
      
    
      size_t char_to_index(char c)
      {
        if (c <= 'Z') {
          return c - 'A';
        } else {
          return c - 'a' + 26;
        }
      }
      void construct_index(const std::vector<std::string> &data);
      void construct_subset(const std::string &s, const std::vector<std::string> &data, std::vector<std::string> &output);
    };
    
    void inverted_file::construct_index(const std::vector<std::string> &data)
    {
      // The size of the index is the number of different letters that may appear
      const size_t index_size = 26 * 2;
      
      index.resize(index_size);
      index_type::iterator it;
      // Reserve in each component vector in the index roughly total/index_size elements
      for (it = index.begin(); it != index.end(); ++it) {
        it->reserve(data.size() / index_size);
      }
      // Now start constructing the index
      size_t dataoffset = 0;
      for (std::vector<std::string>::const_iterator datait = data.begin(); datait != data.end(); ++datait, ++dataoffset) {
        unsigned char i = 0;
        // For each character, save the position of the string it appears in and its position inside the string
        for (i = 0; i < datait->length(); ++i) {
          index&#091;char_to_index((*datait)&#091;i&#093;)&#093;.push_back(std::make_pair(dataoffset, i));
        }
      }
    }
    
    void inverted_file::construct_subset(const std::string &s, const std::vector<std::string> &data, std::vector<std::string> &output)
    {
      // Allocate the current offset inside the index
      std::vector<std::vector<std::pair<size_t, unsigned char> >::const_iterator> offset, end;
      for (std::string::const_iterator strit = s.begin(); strit != s.end(); ++strit) {
        offset.push_back(index&#091;char_to_index(*strit)&#093;.begin());
        end.push_back(index&#091;char_to_index(*strit)&#093;.end());
      }
    
      // Prepare the output buffer
      output.clear();
      
      // this is essentially a set intersection algorithm
      bool bend = false;
      size_t n = s.length(), max_string_index = 0;
      unsigned char max_char_index = 0;
    
      while (offset&#091;0&#093; != end&#091;0&#093;) {
        // advance the first offset to the maximum found before
        while (offset&#091;0&#093; != end&#091;0&#093; && (offset&#091;0&#093;->first < max_string_index || (offset&#091;0&#093;->first == max_string_index && offset&#091;0&#093;->second < max_char_index))) {
          ++offset&#091;0&#093;;
        }
        // Now look for a sequence where the strings are equal and string positions are consecutive
        size_t i = 1, current_string;
        unsigned char current_pos;
        current_string = offset&#091;0&#093;->first;
        current_pos = offset&#091;0&#093;->second;
        while ((i < n) && 
            (offset&#091;i&#093; != end&#091;i&#093;) && 
            (offset&#091;i&#093;->first < current_string || ((offset&#091;i&#093;->first == current_string) && (offset&#091;i&#093;->second <= (current_pos + i))))) {
          if ((offset&#091;i&#093;->first == current_string) && (offset&#091;i&#093;->second == (current_pos + i))) {
            ++i;
          } else {
            ++offset&#091;i&#093;;
          }
        }
        if (i == n) { 
          // We found a match
          output.push_back(data&#091;offset&#091;0&#093;->first&#093;);
          // Use this code for multiple copies of the string if the substring is found multiple times
          // ++offset&#091;0&#093;;
          // Use this code for a single copy
          while (offset&#091;0&#093; != end&#091;0&#093; && offset&#091;0&#093;->first == current_string) {
            ++offset&#091;0&#093;;
          }
        } else if (offset&#091;i&#093; == end&#091;i&#093;) {
          // We have reached the end of one of the character lists
          offset&#091;0&#093; = end&#091;0&#093;;
        } else {
          max_string_index = offset&#091;i&#093;->first;
          max_char_index = offset&#091;i&#093;->second - i;
        }
      }
    }
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

Page 2 of 2 FirstFirst 12

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured