  1. #1
    Join Date
    Jul 2005
    Posts
    1,030

    STL copy with a delimiter???

    As we know, we can use copy to split a string and then insert the tokens into a vector like this,

    Code:
    vector<string> vec;
    istringstream iss("abc def ghi");   // needs <sstream>; istream_iterator reads from a stream, not from a string
    
    copy(istream_iterator<string>(iss), istream_iterator<string>(), back_inserter< vector<string> >(vec) );
    However, the method above works only when the string is separated by white space. What if the string is separated by some delimiter other than white space? Thanks.

  2. #2
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,765

    Re: STL copy with a delimiter???

    Quote Originally Posted by LarryChen
    What if the string is separated by some delimiter other than white space?
    I would probably opt for a loop with getline or use Boost.Tokenizer (which would then allow for the use of std::copy).
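
    A minimal sketch of the getline loop mentioned above, assuming a single-character delimiter. The helper name split_on is made up for illustration; it is not part of the standard library or of Boost:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): split `src` on one
// delimiter character by reading it through a string stream.
std::vector<std::string> split_on(const std::string& src, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream iss(src);
    std::string item;

    // The three-argument getline reads up to the next `delim`
    // and discards the delimiter itself.
    while (std::getline(iss, item, delim))
    {
        tokens.push_back(item);
    }
    return tokens;
}
```

    Called as split_on("abc|def gh|ijk|lmn", '|') this should give "abc", "def gh", "ijk", "lmn"; whitespace inside a token is left alone because only `delim` ends a token.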

  3. #3
    Join Date
    May 2009
    Posts
    2,413

    Re: STL copy with a delimiter???

    Quote Originally Posted by LarryChen View Post
    What if the string is separated by some delimiter other than white space? Thanks.
    I would have a look at regular expressions, which are now part of the C++ standard (std::regex in <regex>, as of C++11).

  4. #4
    Join Date
    May 2007
    Location
    Scotland
    Posts
    1,164

    Re: STL copy with a delimiter???

    I would probably use boost split. But if you don't want to use boost, with not much effort, you could write your own version of the split functionality found in boost. Something like the following should do it:

    Code:
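    #include <string>
    
    // Predicate: true if the character is one of the delimiter characters.
    // (No std::unary_function base: it is deprecated in C++11 and removed in C++17.)
    struct is_any_of
    {
      explicit is_any_of(const std::string& values)
        :values_(values)
      {}
    
      // const, so the predicate can also be called through a const object
      bool operator()(char v) const
      {
        return values_.find(v) != std::string::npos;
      }
    
    private:
      std::string values_;
    };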
    
    template <typename OutputContainer, typename Predicate>
    void split(OutputContainer& dst, const std::string& src, Predicate predicate)
    {
      std::string::const_iterator first = src.begin();
      std::string::const_iterator last  = src.end();
    
      std::string item;
    
      while(first != last)
      {
        if(predicate(*first))
        {
          dst.push_back(item);
          item = "";
        }
        else
        {
          item.push_back(*first);
        }
    
        ++first;
      }
      dst.push_back(item);
    }
    Now assuming that you put the above in a header file called split.hpp then you could write something like:
    Code:
    #include <iostream>
    #include <fstream>
    #include <vector>
    #include <string>
    
    #include "split.hpp"
    
    int main()
    {
      std::string linebuffer;
      std::ifstream ifile;
      std::vector<std::string> vec;
    
      //Load a file here
      //....
      std::getline(ifile, linebuffer);
      
      split(vec, linebuffer, is_any_of(";:, |\t"));
    }
    Anyway, that's what I would do.

  5. #5
    Join Date
    Jul 2005
    Posts
    1,030

    Re: STL copy with a delimiter???

    Thanks so much for your help, guys. I decided to use a regular expression to solve my problem, as nuzzle suggested. Here is my sample code:
    Code:
    int main()
    {
    	string s = "abc|def gh|ijk|lmn";
    	regex pattern("\\w+|");
    	sregex_token_iterator end;
    
    	for(sregex_token_iterator i(s.begin(), s.end(), pattern); i!=end;++i)
    	{
    		cout<<*i<<endl;
    	}
    	
    	return 0;
    }
    The problem I still have is that the retrieved tokens are "abc" "def" "gh" "ijk" "lmn", but what I expect is "abc" "def gh" "ijk" "lmn". How do I get around the whitespace issue here? Thanks.

  6. #6
    Join Date
    Oct 2008
    Posts
    1,456

    Re: STL copy with a delimiter???

    just write

    Code:
    int main()
    {
    	string s = "abc|def gh|ijk|lmn";
    	regex pattern( "[|]");
    	sregex_token_iterator end;
    
    	for(sregex_token_iterator i(s.begin(), s.end(), pattern, -1 ); i!=end;++i)
    	{
    		cout<<*i<<endl;
    	}
    	
    	return 0;
    }
    The "-1" basically tells the iterator to return the text between matches instead of the matches themselves, i.e. to split the string wherever the pattern matches ...
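
    To tie this back to the original question, here is a rough sketch that feeds the same token iterator to std::copy and collects the pieces in a vector. The function name split_regex is an assumption made for this example:

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `s` wherever `delim` matches and
// copy the pieces between matches into a vector.
std::vector<std::string> split_regex(const std::string& s, const std::regex& delim)
{
    std::vector<std::string> vec;

    // Index -1 selects the text between matches; a default-constructed
    // token iterator marks the end of the sequence.
    std::copy(std::sregex_token_iterator(s.begin(), s.end(), delim, -1),
              std::sregex_token_iterator(),
              std::back_inserter(vec));
    return vec;
}
```

    With the string from the post, split_regex(s, regex("[|]")) should give "abc", "def gh", "ijk", "lmn". The regex is taken by const reference on purpose: the token iterator keeps a pointer to it, so it has to outlive the iteration.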

  7. #7
    Join Date
    Jul 2005
    Posts
    1,030

    Re: STL copy with a delimiter???

    Thanks for your code. It works perfectly! Would you explain the meaning of "-1" used in sregex_token_iterator? I am not able to understand the explanation from MSDN.

  8. #8
    Join Date
    Oct 2008
    Posts
    1,456

    Re: STL copy with a delimiter???

    Quote Originally Posted by LarryChen View Post
    Thanks for your code. It works perfectly! Would you explain the meaning of "-1" used in sregex_token_iterator? I am not able to understand the explanation from MSDN.
    Well, a regex_token_iterator is built on regex_iterator, so let's look at that first.

    Now, a regex_iterator R basically wraps consecutive regex_search calls on a sequence S of characters. Each search starts from the end of the previous match, or from the beginning of the sequence if R has just been constructed.

    Hence, the result of *R is a (const reference to a) match_results object storing the following iterator ranges of S:
    - a prefix range R->prefix(), going from the end of the previous match to the current match
    - a suffix range R->suffix(), going from the end of the current match to the end of S
    - a match range (*R)[0], the current match
    - a set of match ranges (*R)[j], representing marked submatches

    For example, "\\d+" on "a 1 b 10 c 100 d 1000 e" will give the sequence of [prefix, match, suffix] ( there are no submatches in this case ):

    ["a ","1"," b 10 c 100 d 1000 e"]
    [" b ","10"," c 100 d 1000 e"]
    [" c ","100"," d 1000 e"]
    [" d ","1000"," e"]
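
    The table above can be reproduced with a short sketch like the following. MatchParts and collect_parts are names invented for this example only:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical record holding the three ranges of one match as strings.
struct MatchParts
{
    std::string prefix;
    std::string match;
    std::string suffix;
};

// Walk all matches of `re` in `s` with a regex_iterator and
// record prefix, match and suffix for each of them.
std::vector<MatchParts> collect_parts(const std::string& s, const std::regex& re)
{
    std::vector<MatchParts> parts;
    for (std::sregex_iterator r(s.begin(), s.end(), re), end; r != end; ++r)
    {
        // r->prefix() starts at the end of the previous match,
        // r->suffix() runs to the end of the whole sequence.
        parts.push_back({ r->prefix().str(), (*r)[0].str(), r->suffix().str() });
    }
    return parts;
}
```

    For "\\d+" on "a 1 b 10 c 100 d 1000 e" this should yield four entries, the second one being [" b ", "10", " c 100 d 1000 e"].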

    Then, a regex_token_iterator T wraps a regex_iterator R and a vector of indices V:={i1,...,iN}:

    T represents the sequence of subranges (*R)[i1],(*R)[i2],...,(*R)[iN], ++R, (*R)[i1], ..., (*R)[iN], ++R, ... and so on until R becomes an end iterator. So it's the same as a regex_iterator, but instead of returning a sequence of match_results objects it returns a sequence of iterator ranges of S (sub_match objects), where the supplied vector of indices selects the marked submatches ( index > 0 ) or the match itself ( index == 0 ).
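
    As an illustration of the index vector, here is a small sketch that selects two marked submatches per match. The helper submatch_tokens and the key=value input are assumptions made up for the example:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: run a token iterator over `s` and return,
// for every match of `re`, the submatches listed in `indices`.
std::vector<std::string> submatch_tokens(const std::string& s,
                                         const std::regex& re,
                                         const std::vector<int>& indices)
{
    std::vector<std::string> out;
    std::sregex_token_iterator end;
    for (std::sregex_token_iterator t(s.begin(), s.end(), re, indices); t != end; ++t)
    {
        out.push_back(t->str());   // *t is a sub_match, i.e. a range of S
    }
    return out;
}
```

    With the pattern "(\\w+)=(\\d+)", the input "a=1 b=22" and V = {1, 2}, the sequence should be "a", "1", "b", "22": both submatches of the first match, then both submatches of the second.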

    Now, in theory, only non-negative indices make sense here; in practice, the token iterator supports an extended semantics where, intuitively, an index of "-1" represents the prefix of the current match result.
    So, if the Jth index is -1 the resulting sequence will be

    (*R)[i1],(*R)[i2], ..., (*R)[iJ-1], R->prefix(), (*R)[iJ+1], ... ,(*R)[iN], ++R, ...

    Moreover, whenever a -1 index appears in V, the semantics are extended further by adding a last element to the sequence represented by T, this time consisting of the suffix of the current ( and thus the last ) match result, provided that suffix is not empty.

    So again, if the Jth index is -1 the resulting sequence will end with

    ..., (*R)[i1],(*R)[i2], ..., (*R)[iJ-1], R->prefix(), (*R)[iJ+1], ... ,(*R)[iN], R->suffix()

    The rationale is that the remaining unmatched part of S can be considered the prefix of the "end" of S.

    In this way, initializing T with a single -1 index will split S exactly into the substrings delimited by the specified pattern. In the example above, a sregex_token_iterator over "a 1 b 10 c 100 d 1000 e" with the pattern "\\d+" and index -1 will give the sequence "a "," b "," c "," d "," e".
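
    To see the prefix and the trailing suffix side by side, here is a sketch that passes the index vector {-1, 0}, so each prefix is followed by its match. The name split_keep_delims is hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the pieces between matches of `re`
// interleaved with the matches themselves.
std::vector<std::string> split_keep_delims(const std::string& s, const std::regex& re)
{
    // -1 selects the prefix of each match, 0 the match itself.
    const std::vector<int> indices{-1, 0};

    std::vector<std::string> out;
    std::sregex_token_iterator end;
    for (std::sregex_token_iterator t(s.begin(), s.end(), re, indices); t != end; ++t)
    {
        out.push_back(t->str());
    }
    // Because -1 is among the indices, a non-empty suffix after the
    // last match is delivered as one extra, final element.
    return out;
}
```

    On the example string with "\\d+" this should give "a ", "1", " b ", "10", " c ", "100", " d ", "1000", " e". Dropping the 0 from the index vector leaves only the pieces between the numbers, which is the plain split described above.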

    and that's it
    Last edited by superbonzo; December 14th, 2011 at 05:27 AM.
