CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 5 of 5
  1. #1
    Join Date
    Jan 2012
    Posts
    2

    HashSet of strings - subsets

    Hey all,

    I've got a massive collection of stand-alone strings. I'm currently storing them in a hashset (maybe this is not the proper container for what I'm trying to do). I'm currently using the hashset for an efficient lookup to see if a particular string exists.

    However, for optimization purposes, I need to test if a given string is the starting subset of any particular word in the hashset. (For example, if I feed it the word "ca" and my hashset contains the word "cat", I would want it to return true.)

    If the string I feed it does NOT start any particular word in the dictionary, it should return false. (For example, if my hashset only contained one word, "cat", and I test the word "at", it should return false.)

    Hopefully that makes sense. I've looked through the hashset documentation and wasn't sure if anything would fill this niche. If hashsets are not the proper container to use here, I'm definitely open to suggestions.

    Thanks!

  2. #2
    Join Date
    May 2007
    Posts
    1,546

    Re: HashSet of strings - subsets

    The simplest way of implementing this would be to use a List<string> which is in sorted order and use a BinarySearch to figure out if it contains the item you require.
    www.monotorrent.com For all your .NET bittorrent needs

    NOTE: My code snippets are just snippets. They demonstrate an idea which can be adapted by you to solve your problem. They are not 100% complete and fully functional solutions equipped with error handling.

  3. #3
    Join Date
    Jan 2012
    Posts
    2

    Re: HashSet of strings - subsets

    Ah, excellent. I wasn't aware Lists contained a logarithmic lookup like that. Thanks!

  4. #4
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,016

    Re: HashSet of strings - subsets

    If you want to use a hashtable, use Dictionary<,>... see http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

    You could add all prefixes (so for cat add "c", "ca" and "cat") and then check for contains with yourDictionary.Contains("ca"); This would be O(1), but is space inefficient (it requires all words AND all prefixes).

    You could use the list approach as Mutant Fruit suggested and get O(log n) where n is the number of strings.

    I would suggest a tree data-structure where the children of any given node are the "next letter" in the word. Consider a tree contains the words { cat, can, cold, hat } then obtain:

    Code:
                           (Root)
                           /    \
    First letter          c      h
                         / \       \
    Second letter       a   o       a
                       / \   \       \
    Third letter      t   n   l       t
                               \
    Fourth letter               d
    Then to check to see if a prefix is present, descend from root. If all nodes exist, return true;

    Example: Input "cat" and take the path: Root -> c -> a -> t; return true;

    Example: Input "ca" and take the path: Root -> c -> a; return tur;e

    Example: Input "hard"; and take the path Root -> h -> a. r does not exist! return false;

    Example: Input "at"; take the path Root. a does not exist! return false;

    This gives you a result n constant time (O(1)) in the number of words in your gigantic wordlist and a worst-case time complexity of O(k) where there are k letters in your query (which I think is as bad as hashtable... presumaby hashtable has dependence on the number of bits in the hashvalue).
    Best Regards,

    BioPhysEngr
    http://blog.biophysengr.net
    --
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  5. #5
    Join Date
    Dec 2011
    Posts
    61

    Re: HashSet of strings - subsets

    another choice is sql server. store your strings in a table and use sql LIKE to search the matches.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured