-
Fast keyword search through CSortedArray??? Anyone know how?
Hi There,
I have a virtual listview that I load up anywhere from 20,000 to 200,000 records into. The virtual listview handles this no problem at all. I am using the CSortedArray by PJ Naughter class to store a array of custom objects. I can sort the array very fast no problems.
My issue occurs when I have to do a keyword search. I have to be able to allow the user to search as they type into a text box. So the user starts off with a FULL list of ALL the items appearing in the virtual listview, as the type keys into the textbox the virtual listview starts drilling down and only showing items that have the letters being typed in the name. so simple example:
List
====
Greg
John
Steve
George
If the user types G in the text box the list will now look like below
List
====
Greg
George
I have this working using a simple for loop to loop through all the items at each keystroke and remove any items from the list that do not match the keys typed. However, when my list gets to 30,000 or higher the search begins to take up WAY too much CPU. Is there some faster way I can do search like this using the CSortedArry?
Thanks
Greg
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
I think the bottle neck here would be in the listing of the items in the listbox...the search should be very very fast...
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Well it's pretty fast... doesn't take it long, but when it searches as you type keys it pauses in between keystrokes because it is taking so much cpu I think...
Is this right?
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
might try changing the search logic where it keeps track of the last find and then when the next key is pressed it starts searching from the last find point and not from the start.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Certainly if your array were sorted a binary search would be a lot quicker than a sequential one.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by GCDEF
Certainly if your array were sorted a binary search would be a lot quicker than a sequential one.
I was headed there next with that suggestion...great minds.....
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
The thing is though.... That I have to check each single item no matter what anyway to see if that items name contains any of the letters typed into the text box, so would it really matter in this instance if I used a binary search or not?
Do I not have to check each and every item anyway?
Also I have not seen any binary search examples that can return multiple items.. usually they only search for one single item and bring back a single found or not found.. do you know of any exmaples where they do a binary search that brings back multiples?
Thanks,
Greg
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Let me make sure I understand what you're asking.
If the list contained
Frog
go
golf
and the user typed in G, you wouldn't want frog to show up would you?
As to the binary search, it's pretty easily modified. Search until you find an item that matches your criteria, then read sequentially forward and backward from that point until your criteria is no longer met. Return everything in between.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Yes I would want frog to show up :)
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Actually you would only need to start searching from your last find forward if the list is sorted, and you are searching for a word that starts with the first letter of the list and the user only adds chars to the end of the search word. If any other char is changed in the search string you would have to start from the beginning.
If you are searching for a word within a word then you only option is to start from the beginning.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Wow, then you're kind of screwed. You could use some king of hash table or multimap I suppose. It would be a little faster.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
HAHAHA ok! Maybe I have to change my seach engine then... I might have to give the option to search within a word or from the begingin and just let my users know that when they want to search within a word, they are gonig to experience speed issues..
Incidentally, would putting the search in a seperate thread help with the search freezing up the rest of my app while typing?
Thanks,
Greg
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by revg75
HAHAHA ok! Maybe I have to change my seach engine then... I might have to give the option to search within a word or from the begingin and just let my users know that when they want to search within a word, they are gonig to experience speed issues..
Incidentally, would putting the search in a seperate thread help with the search freezing up the rest of my app while typing?
Thanks,
Greg
A little but not much as the thread and your main GUI will have same priority...
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by revg75
HAHAHA ok! Maybe I have to change my seach engine then... I might have to give the option to search within a word or from the begingin and just let my users know that when they want to search within a word, they are gonig to experience speed issues..
Incidentally, would putting the search in a seperate thread help with the search freezing up the rest of my app while typing?
Thanks,
Greg
Yeah, but what's the point. You can't start the second search until you have the results of the first I assume. You could wait until they've entered everything they're looking for before beginning your search too.
Your requirements seem a little weird though. Sounds like you're saying a user could type grof and frog would be returned. Without knowing what you're up to, that seems unusual.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Ya my search requirements are a bit weird, but not that weird.
If an item in the list was
iamadonkey
The user could type 'don', or 'donk', or 'key' and it would return a match. However if the user typed 'yeknod' it would NOT return a match.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by revg75
Ya my search requirements are a bit weird, but not that weird.
If an item in the list was
iamadonkey
The user could type 'don', or 'donk', or 'key' and it would return a match. However if the user typed 'yeknod' it would NOT return a match.
That's an easy strcmp() or cstring.Find(), but you do have to look at all the words to compare...
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
I don't see a sorted array helping then. I don't know an elegent solution that doesn't involve brute force or an huge indexing structure. As I said, the best think I can think of is don't start the search with every keystroke. Just wait till the user's given enough information to narrow the first set of results considerably.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
I think GCDEF is correct...you would have to index on each word and each word starting with next letter in that word to get a good index. OR just do the search when the word is completed by the user...
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Alrighty! Perfect thanks alot guys.
I will just have them click a search button when they are done.
Just incase you are interested I was trying to mimic the Winamp jump search box. If you start winamp and press the J key a little dialog comes up where you can type, and as you type the media library narrows the search.
I haven't tried winamps search with 10's of thousands of records, but I am assuming that their will handle that.
Wonder how they did it??
Cheers,
Greg
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
I don't know the particulars of CSortedArray, but in general this
is all you have to do (you could do this if using a normal CArray
that is sorted).
1) Call std::lower_bound with the current "search" string. It will
return the first position that starts with that string. (O(logN))
2) starting at that position, do a linear loop until, adding to your
ist until:
a) you reach the end of the CSortedArray
b) the start of the current position in the CSortedArray does
not equal the search string.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by Philip Nicoletti
I don't know the particulars of CSortedArray, but in general this
is all you have to do (you could do this if using a normal CArray
that is sorted).
1) Call std::lower_bound with the current "search" string. It will
return the first position that starts with that string. (O(logN))
2) starting at that position, do a linear loop until, adding to your
ist until:
a) you reach the end of the CSortedArray
b) the start of the current position in the CSortedArray does
not equal the search string.
This would work but the OP was to search for the "string" anywhere in the array..NOT just starting with the search string.
if the search was for "er" he wants the search to return the following
biker
error
ferrot
and so on....it is more of a cstring.Find() than a cstring.Compare() search.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Sorry, you are right ... I did not read all of the posts.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by Philip Nicoletti
Sorry, you are right ... I did not read all of the posts.
Not a problem..I thought the same thing at first..then he turned on the light....
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by revg75
Incidentally, would putting the search in a seperate thread help with the search freezing up the rest of my app while typing?
What I would do is to put it in another thread, but delay the actual search for a couple of milliseconds (lets say 300ms). Then, if the user types in anything more before the search has started then delay the search some more. Then, only when the user stops typing (and expects the list to be updated... he can probably wait 300ms) then the search thread starts filtering the list.
If possible maybe you can stop an ongoing search if the user begins typing again. This way you don't trigger away 5 unneccessary seaches when the user types 'monkey' in 0.4 seconds.
The keyboard should stop hogging aswell.
- petter
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
I've tried using an index with an inverted file (i.e. indicies pointing to (string, letter position) pairs), but I only get about a 10 fold performance increase over plain search. It's still optimisable though, but I don't know how much it will bring. And it comes at the cost of multiplying storage by 5.
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
Quote:
Originally Posted by Yves M
I've tried using an index with an inverted file (i.e. indicies pointing to (string, letter position) pairs), but I only get about a 10 fold performance increase over plain search. It's still optimisable though, but I don't know how much it will bring. And it comes at the cost of multiplying storage by 5.
Good info...not too many people would do something like this to help...nice work Yves..
-
Re: Fast keyword search through CSortedArray??? Anyone know how?
By the way, here is the code so far. There are a couple of things that could be optimized (mainly the set intersection algorithm), so if you feel like trying to pursue this route further...
Code:
struct inverted_file
{
typedef std::vector<std::vector<std::pair<size_t, unsigned char> > > index_type;
index_type index;
size_t char_to_index(char c)
{
if (c <= 'Z') {
return c - 'A';
} else {
return c - 'a' + 26;
}
}
void construct_index(const std::vector<std::string> &data);
void construct_subset(const std::string &s, const std::vector<std::string> &data, std::vector<std::string> &output);
};
void inverted_file::construct_index(const std::vector<std::string> &data)
{
// The size of the index is the number of different letters that may appear
const size_t index_size = 26 * 2;
index.resize(index_size);
index_type::iterator it;
// Reserve in each component vector in the index roughly total/index_size elements
for (it = index.begin(); it != index.end(); ++it) {
it->reserve(data.size() / index_size);
}
// Now start constructing the index
size_t dataoffset = 0;
for (std::vector<std::string>::const_iterator datait = data.begin(); datait != data.end(); ++datait, ++dataoffset) {
unsigned char i = 0;
// For each character, save the position of the string it appears in and its position inside the string
for (i = 0; i < datait->length(); ++i) {
index[char_to_index((*datait)[i])].push_back(std::make_pair(dataoffset, i));
}
}
}
void inverted_file::construct_subset(const std::string &s, const std::vector<std::string> &data, std::vector<std::string> &output)
{
// Allocate the current offset inside the index
std::vector<std::vector<std::pair<size_t, unsigned char> >::const_iterator> offset, end;
for (std::string::const_iterator strit = s.begin(); strit != s.end(); ++strit) {
offset.push_back(index[char_to_index(*strit)].begin());
end.push_back(index[char_to_index(*strit)].end());
}
// Prepare the output buffer
output.clear();
// this is essentially a set intersection algorithm
bool bend = false;
size_t n = s.length(), max_string_index = 0;
unsigned char max_char_index = 0;
while (offset[0] != end[0]) {
// advance the first offset to the maximum found before
while (offset[0] != end[0] && (offset[0]->first < max_string_index || (offset[0]->first == max_string_index && offset[0]->second < max_char_index))) {
++offset[0];
}
// Now look for a sequence where the strings are equal and string positions are consecutive
size_t i = 1, current_string;
unsigned char current_pos;
current_string = offset[0]->first;
current_pos = offset[0]->second;
while ((i < n) &&
(offset[i] != end[i]) &&
(offset[i]->first < current_string || ((offset[i]->first == current_string) && (offset[i]->second <= (current_pos + i))))) {
if ((offset[i]->first == current_string) && (offset[i]->second == (current_pos + i))) {
++i;
} else {
++offset[i];
}
}
if (i == n) {
// We found a match
output.push_back(data[offset[0]->first]);
// Use this code for multiple copies of the string if the substring is found multiple times
// ++offset[0];
// Use this code for a single copy
while (offset[0] != end[0] && offset[0]->first == current_string) {
++offset[0];
}
} else if (offset[i] == end[i]) {
// We have reached the end of one of the character lists
offset[0] = end[0];
} else {
max_string_index = offset[i]->first;
max_char_index = offset[i]->second - i;
}
}
}