Re: UTF8 length calculation
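I've never tried to use mblen(), but it strikes me that it's probably just as easy to write your own UTF8 version of strlen(). The rules of UTF8 are pretty simple: the high bits of the first byte of each character tell you how many bytes that character occupies. It would look something like: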
Code:
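// ptr points at a NUL-terminated UTF8 string; assumes the input is valid
size_t len = 0;
while (*ptr)
{
    if (!(*ptr & 0x80))             // 0xxxxxxx: 1 byte (ASCII)
        ptr++;
    else if ((*ptr & 0xE0) == 0xC0) // 110xxxxx: 2 bytes
        ptr += 2;
    else if ((*ptr & 0xF0) == 0xE0) // 1110xxxx: 3 bytes
        ptr += 3;
    else                            // 11110xxx: 4 bytes
        ptr += 4;
    len++;
}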
Re: UTF8 length calculation
mblen() uses the current locale to return the length, in bytes, of a single multibyte character. The default locale is 'C', so a program has to switch to the user's locale before counting.
Code:
#include <iostream>
#include <locale>
#include <string>
#include <cstdlib>
#include <cstring>
#include <climits>
#include <cwchar> // mbstate_t, mbrlen
using namespace std;
// taken from GNU LibC manual
// http://www.gnu.org/software/libc/manual/
size_t mbslen(const char *s)
{
    mbstate_t state;
    size_t result = 0;
    size_t nbytes;
    memset(&state, '\0', sizeof(state));
    // mbrlen returns 0 at the terminating NUL, which ends the loop
    while ((nbytes = mbrlen(s, MB_LEN_MAX, &state)) > 0)
    {
        // (size_t) -1: invalid sequence, (size_t) -2: incomplete sequence
        if (nbytes >= (size_t) -2)
            return (size_t) -1; /* Something is wrong. */
        s += nbytes;
        ++result;
    }//while
    return result;
}//mbslen
int main(int argc, char **argv)
{
    if (argc < 2)
    {
        cerr << "missing parameter" << endl;
        return 1;
    }//if
    std::locale loc(""); // construct user-default locale
    std::locale::global(loc); // make it global, for std::mbrlen
    std::cout.imbue(loc); // have cout use as well
    const char *ptr = argv[1];
    string str = ptr;
    cout << "String is [" << ptr << "]" << endl;
    cout << "string::length = " << str.length() << endl;
    cout << "mbslen = " << mbslen(ptr) << endl;
    return 0;
}//main
This assumes that the command line has the same encoding as specified by the user's default locale - which may not be the case (but is a reasonable assumption).
If you need the "UTF8 length" regardless of the user's locale, you'll have to roll your own or use something like libiconv.
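For the "roll your own" route, here is a minimal sketch that does not depend on the locale at all. It relies on the fact that every UTF8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that are not continuation bytes gives the number of characters. The name utf8_length is made up for this example, and the function assumes the input is already valid, NUL-terminated UTF8; it does no validation.

```cpp
#include <cstddef>

// Hypothetical helper (not from the GNU manual): counts UTF8 characters
// by counting every byte that is NOT a continuation byte (10xxxxxx).
// Assumes s is valid, NUL-terminated UTF8; malformed input is not detected.
std::size_t utf8_length(const char *s)
{
    std::size_t count = 0;
    for (; *s; ++s)
    {
        // Cast to unsigned char so the mask behaves the same whether
        // plain char is signed or unsigned on this platform.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count; // ASCII byte or lead byte: a new character starts here
    }
    return count;
}
```

Calling utf8_length("h\xC3\xA9llo") on the six bytes of "héllo" should give 5, whatever locale is active. Because it never jumps ahead by more than one byte, it also cannot run past the terminating NUL on truncated input.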
gg
Re: UTF8 length calculation
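Use the mbrtowc() function. It returns the number of bytes consumed by each character, so you can step through the string one character at a time. Like mbrlen(), it decodes according to the current locale.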
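A rough sketch of what that loop could look like follows. The name count_chars_mbrtowc is invented for this example. The result depends on the current locale, so a UTF8 string is only counted correctly if a UTF8 locale is active.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: counts multibyte characters by decoding them one
// at a time with mbrtowc(). Returns (size_t)-1 if a sequence is invalid
// or incomplete in the current locale's encoding.
std::size_t count_chars_mbrtowc(const char *s)
{
    std::mbstate_t state;
    std::memset(&state, 0, sizeof state); // start in the initial shift state
    std::size_t remaining = std::strlen(s);
    std::size_t count = 0;
    while (remaining > 0)
    {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, s, remaining, &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2))
            return static_cast<std::size_t>(-1); // invalid or truncated sequence
        if (n == 0)
            break; // decoded a null wide character
        s += n;         // n is the byte length of the character just decoded
        remaining -= n;
        ++count;
    }
    return count;
}
```

As with the mbslen example, the caller would need to select the user's locale first, for instance with std::setlocale(LC_ALL, "") from <clocale>, before passing in non-ASCII text.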
Re: UTF8 length calculation
Quote:
Originally Posted by treuss
What's ptr?
Indeed, that code will not even compile. Please copy and paste your actual program.