That's an easy strcmp() or cstring.Find(), but you do have to look at all the words to compare...Quote:
Originally Posted by revg75
Printable View
That's an easy strcmp() or cstring.Find(), but you do have to look at all the words to compare...Quote:
Originally Posted by revg75
I don't see a sorted array helping then. I don't know an elegent solution that doesn't involve brute force or an huge indexing structure. As I said, the best think I can think of is don't start the search with every keystroke. Just wait till the user's given enough information to narrow the first set of results considerably.
I think GCDEF is correct...you would have to index on each word and each word starting with next letter in that word to get a good index. OR just do the search when the word is completed by the user...
Alrighty! Perfect thanks alot guys.
I will just have them click a search button when they are done.
Just incase you are interested I was trying to mimic the Winamp jump search box. If you start winamp and press the J key a little dialog comes up where you can type, and as you type the media library narrows the search.
I haven't tried winamps search with 10's of thousands of records, but I am assuming that their will handle that.
Wonder how they did it??
Cheers,
Greg
I don't know the particulars of CSortedArray, but in general this
is all you have to do (you could do this if using a normal CArray
that is sorted).
1) Call std::lower_bound with the current "search" string. It will
return the first position that starts with that string. (O(logN))
2) starting at that position, do a linear loop until, adding to your
ist until:
a) you reach the end of the CSortedArray
b) the start of the current position in the CSortedArray does
not equal the search string.
Quote:
Originally Posted by Philip Nicoletti
This would work but the OP was to search for the "string" anywhere in the array..NOT just starting with the search string.
if the search was for "er" he wants the search to return the following
biker
error
ferrot
and so on....it is more of a cstring.Find() than a cstring.Compare() search.
Sorry, you are right ... I did not read all of the posts.
Not a problem..I thought the same thing at first..then he turned on the light....Quote:
Originally Posted by Philip Nicoletti
What I would do is to put it in another thread, but delay the actual search for a couple of milliseconds (lets say 300ms). Then, if the user types in anything more before the search has started then delay the search some more. Then, only when the user stops typing (and expects the list to be updated... he can probably wait 300ms) then the search thread starts filtering the list.Quote:
Originally Posted by revg75
If possible maybe you can stop an ongoing search if the user begins typing again. This way you don't trigger away 5 unneccessary seaches when the user types 'monkey' in 0.4 seconds.
The keyboard should stop hogging aswell.
- petter
I've tried using an index with an inverted file (i.e. indicies pointing to (string, letter position) pairs), but I only get about a 10 fold performance increase over plain search. It's still optimisable though, but I don't know how much it will bring. And it comes at the cost of multiplying storage by 5.
Good info...not too many people would do something like this to help...nice work Yves..Quote:
Originally Posted by Yves M
By the way, here is the code so far. There are a couple of things that could be optimized (mainly the set intersection algorithm), so if you feel like trying to pursue this route further...
Code:struct inverted_file
{
typedef std::vector<std::vector<std::pair<size_t, unsigned char> > > index_type;
index_type index;
size_t char_to_index(char c)
{
if (c <= 'Z') {
return c - 'A';
} else {
return c - 'a' + 26;
}
}
void construct_index(const std::vector<std::string> &data);
void construct_subset(const std::string &s, const std::vector<std::string> &data, std::vector<std::string> &output);
};
void inverted_file::construct_index(const std::vector<std::string> &data)
{
// The size of the index is the number of different letters that may appear
const size_t index_size = 26 * 2;
index.resize(index_size);
index_type::iterator it;
// Reserve in each component vector in the index roughly total/index_size elements
for (it = index.begin(); it != index.end(); ++it) {
it->reserve(data.size() / index_size);
}
// Now start constructing the index
size_t dataoffset = 0;
for (std::vector<std::string>::const_iterator datait = data.begin(); datait != data.end(); ++datait, ++dataoffset) {
unsigned char i = 0;
// For each character, save the position of the string it appears in and its position inside the string
for (i = 0; i < datait->length(); ++i) {
index[char_to_index((*datait)[i])].push_back(std::make_pair(dataoffset, i));
}
}
}
void inverted_file::construct_subset(const std::string &s, const std::vector<std::string> &data, std::vector<std::string> &output)
{
// Allocate the current offset inside the index
std::vector<std::vector<std::pair<size_t, unsigned char> >::const_iterator> offset, end;
for (std::string::const_iterator strit = s.begin(); strit != s.end(); ++strit) {
offset.push_back(index[char_to_index(*strit)].begin());
end.push_back(index[char_to_index(*strit)].end());
}
// Prepare the output buffer
output.clear();
// this is essentially a set intersection algorithm
bool bend = false;
size_t n = s.length(), max_string_index = 0;
unsigned char max_char_index = 0;
while (offset[0] != end[0]) {
// advance the first offset to the maximum found before
while (offset[0] != end[0] && (offset[0]->first < max_string_index || (offset[0]->first == max_string_index && offset[0]->second < max_char_index))) {
++offset[0];
}
// Now look for a sequence where the strings are equal and string positions are consecutive
size_t i = 1, current_string;
unsigned char current_pos;
current_string = offset[0]->first;
current_pos = offset[0]->second;
while ((i < n) &&
(offset[i] != end[i]) &&
(offset[i]->first < current_string || ((offset[i]->first == current_string) && (offset[i]->second <= (current_pos + i))))) {
if ((offset[i]->first == current_string) && (offset[i]->second == (current_pos + i))) {
++i;
} else {
++offset[i];
}
}
if (i == n) {
// We found a match
output.push_back(data[offset[0]->first]);
// Use this code for multiple copies of the string if the substring is found multiple times
// ++offset[0];
// Use this code for a single copy
while (offset[0] != end[0] && offset[0]->first == current_string) {
++offset[0];
}
} else if (offset[i] == end[i]) {
// We have reached the end of one of the character lists
offset[0] = end[0];
} else {
max_string_index = offset[i]->first;
max_char_index = offset[i]->second - i;
}
}
}