Genvid Forum

UTF8 encoding bug in GenvidStreams


#1

UPDATE: I have just noticed the other post about the Japanese multibyte issue; there is already a patch that fixes this problem.

The CompressString function in GenvidStreams.cpp (AND GenvidStreamer.cpp) has a serious bug with handling UTF-8 strings.

In the case where compression is not used (the default, I believe, as there doesn’t seem to be any decompression implemented on the web side yet?), see this snippet:

	compressed.SetNum(stringLen + HeaderSize);

	uint8* DestBuffer = compressed.GetData();
	FMemory::Memcpy(DestBuffer, &IsCompressedHeader, HeaderSize);

	DestBuffer += HeaderSize;
	FMemory::Memcpy(DestBuffer, TCHAR_TO_UTF8(str), stringLen);

… stringLen is the wrong length to use: it differs from the length of the byte array produced by TCHAR_TO_UTF8 whenever a Unicode character encodes to more than one byte. As a result the string gets truncated, which produces invalid JSON when the web client decodes it.

This should be replaced with something like:

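	// Convert once and keep the converter alive so the buffer stays valid.
	FTCHARToUTF8 utf8Str(str);
	// Length() is the encoded size in bytes, excluding the null terminator.
	int32 utf8Len = utf8Str.Length();

	compressed.SetNum(utf8Len + HeaderSize);

	uint8* DestBuffer = compressed.GetData();
	FMemory::Memcpy(DestBuffer, &IsCompressedHeader, HeaderSize);

	DestBuffer += HeaderSize;

	// Copy the encoded bytes, not the TCHAR count.
	FMemory::Memcpy(DestBuffer, utf8Str.Get(), utf8Len);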

(again, in both GenvidStreams.cpp AND GenvidStreamer.cpp). In other words, use the encoded string length as the actual number of bytes to copy into the buffer.
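
To see the mismatch outside the engine, here is a minimal standalone sketch in plain C++ (not Unreal code). ToUtf8, Pack and CheckExample are hypothetical helpers written for this post, the one-byte header is an assumption standing in for IsCompressedHeader, and std::u16string stands in for a TCHAR string. It only illustrates the length problem, not the real plugin code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for TCHAR_TO_UTF8: encodes UTF-16 code units as
// UTF-8 bytes. Unpaired surrogates are not validated in this sketch.
std::string ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Writes an assumed 1-byte header followed by payloadLen bytes of utf8.
// payloadLen is clamped so the copy never reads past the encoded string.
std::vector<std::uint8_t> Pack(const std::string& utf8, std::size_t payloadLen) {
    const std::uint8_t header = 0;  // assumed "not compressed" flag
    payloadLen = std::min(payloadLen, utf8.size());
    std::vector<std::uint8_t> buf(payloadLen + sizeof(header));
    std::memcpy(buf.data(), &header, sizeof(header));
    std::memcpy(buf.data() + sizeof(header), utf8.data(), payloadLen);
    return buf;
}

// Compares the buggy length (code units) with the correct one (encoded bytes).
void CheckExample() {
    // {"name":"<three Japanese characters>"} written with \u escapes.
    const std::u16string text = u"{\"name\":\"\u65E5\u672C\u8A9E\"}";
    const std::string utf8 = ToUtf8(text);
    assert(text.size() == 14);  // 11 ASCII + 3 Japanese code units
    assert(utf8.size() == 20);  // 11 ASCII + 3 * 3 encoded bytes
    assert(Pack(utf8, text.size()).size() == 15);  // buggy: payload cut short
    assert(Pack(utf8, utf8.size()).size() == 21);  // fixed: whole payload
}
```

With the code-unit count as the copy length, the payload stops partway through the second Japanese character, so the closing quote and brace never reach the client.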

Thanks,

Adrian


#2

Hi Adrian,

I will let the team know about this. We will make sure this is also included in the fix for the 1.14.0 release, which is coming soon.

Thanks!
Sophie