Discussion:
Composite Font display method
Tony Duff
2009-09-15 08:07:02 UTC
I've got an RTF import method in my word processing program.

I'm trying to add the ability to read files created on Chinese versions
of Windows. I have taken care of the Unicode aspect, but I still need to
handle the older scheme that uses double-byte characters and high codes with
composite fonts.

For instance, the RTF code contains

\dbch\'co\'ao

I understand this is a double-byte character, but it is not Unicode. So,
once I have obtained these two bytes and incorporated them into my program,
what is the right way to display them so that the correct Chinese character
displays? Do I send them to TextOut as a string of two characters, one after
the other, just as I received them, with the appropriate font and character
set selected? Or what?

Thanks for your help,
Tony Duff
David Lowndes
2009-09-21 08:14:09 UTC
Post by Tony Duff
I'm trying to add the ability to read files created on Chinese versions
of Windows. I have taken care of the Unicode aspect, but I still need to
handle the older scheme that uses double-byte characters and high codes with
composite fonts.
For instance, the RTF code contains
\dbch\'co\'ao
I understand this is a double-byte character, but it is not Unicode. So,
once I have obtained these two bytes and incorporated them into my program,
what is the right way to display them so that the correct Chinese character
displays?
Tony,

Presumably your application is now Unicode?

Assuming it is, you need to use MultiByteToWideChar (or one of the
macro wrappers such as CA2WEX) with the correct code page ID.
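
A minimal sketch of that conversion, assuming the two bytes have already been pulled out of the RTF and that the code page is known (936 is only an example). DbcsToWide is a hypothetical helper name, not something from this thread; the block is Win32-only, hence the guard.

```c
#include <assert.h>

#ifdef _WIN32   /* Win32-only sketch; compiles to nothing elsewhere */
#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper: converts `len` bytes encoded in `codePage` into a
   newly allocated, NUL-terminated UTF-16 string. Returns NULL on failure;
   the caller releases the result with free(). */
static wchar_t *DbcsToWide(UINT codePage, const char *bytes, int len)
{
    wchar_t *wide;
    int cch;

    assert(bytes != NULL);
    if (len <= 0)
        return NULL;

    /* First call: ask how many UTF-16 code units are needed. */
    cch = MultiByteToWideChar(codePage, MB_ERR_INVALID_CHARS,
                              bytes, len, NULL, 0);
    if (cch == 0)
        return NULL;              /* invalid bytes or unusable code page */

    wide = (wchar_t *)malloc(((size_t)cch + 1) * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    /* Second call: perform the conversion into the buffer. */
    if (MultiByteToWideChar(codePage, MB_ERR_INVALID_CHARS,
                            bytes, len, wide, cch) == 0) {
        free(wide);
        return NULL;
    }
    wide[cch] = L'\0';            /* no terminator is written for an explicit length */
    return wide;
}
#endif /* _WIN32 */
```

For instance, DbcsToWide(936, "\xC4\xE3", 2) should give the single character U+4F60, which can then go to TextOutW with a length of 1. The bytes C4 E3 are an illustrative pair, not the ones from the question.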

Dave
Volodymyr Frytskyy
2011-07-22 06:59:06 UTC
This is a multi-byte string. You can print it out as-is if your app is non-Unicode, or use MultiByteToWideChar to convert it to a Unicode string.
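
Either way, the \'hh escapes first have to be turned back into raw bytes. A rough sketch of that step in portable C follows; decode_hex_escapes and hex_value are hypothetical names, and the sketch assumes the escapes sit back to back with nothing in between.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: reads consecutive \'hh escapes from `rtf` and stores
   the decoded bytes in `out` (at most `cap` of them). Stops at the first
   thing that is not a well-formed \'hh escape. Returns the byte count. */
static size_t decode_hex_escapes(const char *rtf, unsigned char *out, size_t cap)
{
    size_t n = 0;

    assert(rtf != NULL && out != NULL);
    while (n < cap && rtf[0] == '\\' && rtf[1] == '\'') {
        int hi = hex_value((unsigned char)rtf[2]);
        int lo = (hi < 0) ? -1 : hex_value((unsigned char)rtf[3]);
        if (hi < 0 || lo < 0)
            break;                /* malformed escape: stop here */
        out[n++] = (unsigned char)(hi * 16 + lo);
        rtf += 4;                 /* skip the four characters \'hh */
    }
    return n;
}
```

Given the input \'c4\'e3 (an illustrative pair), this yields the two bytes C4 E3. In a non-Unicode app those bytes could be handed to TextOutA together, with a font of the matching charset selected; in a Unicode app they would go through MultiByteToWideChar first.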

You may need to know the code page. In most cases it is the system's default locale, but not always: sometimes it comes from a preceding \fcharset RTF tag, sometimes it is given by the font itself (in the font table), and sometimes by tags such as \ansi, \mac, \pc, \pca, \ansicpg, \lang, \langfe, \langnp, \noproof, and \cpg.
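
As one example of picking the code page up from the file, here is a deliberately naive sketch that looks for \ansicpg in the RTF header. find_ansicpg is a hypothetical name, and a real reader would tokenize control words properly instead of searching the raw text, so treat this only as an illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the number following the first \ansicpg
   control word in `rtf`, or `fallback` if there is none. */
static unsigned find_ansicpg(const char *rtf, unsigned fallback)
{
    static const char kw[] = "\\ansicpg";
    const char *p;

    assert(rtf != NULL);
    p = strstr(rtf, kw);
    if (p == NULL)
        return fallback;
    p += sizeof kw - 1;           /* step past the control word */
    if (*p < '0' || *p > '9')
        return fallback;          /* control word without a numeric parameter */
    return (unsigned)strtoul(p, NULL, 10);
}
```

With a header such as {\rtf1\ansi\ansicpg936\deff0 this returns 936; with no \ansicpg present it returns whatever fallback the caller passes, for example the system default.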

Another important thing to remember is that the locale is sometimes specified as a charset (for example, when it comes from the font table). In that case you can use the following mapping:

/* CPGCHAR is not defined in this post; it is assumed here to be a
   {code page, charset} pair along these lines. */
typedef struct { UINT nCodePage; BYTE bCharSet; } CPGCHAR;

static const CPGCHAR rgCpgChar[] =
{
    {0,     ANSI_CHARSET},
    {0,     DEFAULT_CHARSET},
    {0,     SYMBOL_CHARSET},
    {437,   254 /* PC437_CHARSET, obsolete? */},
    {850,   OEM_CHARSET},
    {1250,  EASTEUROPE_CHARSET},
    {1255,  HEBREW_CHARSET},
    {932,   SHIFTJIS_CHARSET},
    {1251,  RUSSIAN_CHARSET},
    {936,   GB2312_CHARSET},
    {949,   HANGEUL_CHARSET},
    {1361,  JOHAB_CHARSET},
    {950,   CHINESEBIG5_CHARSET},
    {1253,  GREEK_CHARSET},
    {1254,  TURKISH_CHARSET},
    {1257,  BALTIC_CHARSET},
    {874,   THAI_CHARSET},
    {1256,  ARABIC_CHARSET},
    {10000, MAC_CHARSET}
};
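
A table like this is typically consulted with a small linear search. The sketch below shows the idea on the East Asian subset only, so that it stays short and builds without <windows.h>; CharsetMap, kMap and CodePageFromCharset are hypothetical names, and the charset numbers are the wingdi.h values.

```c
#include <assert.h>
#include <stddef.h>

/* Charset values as defined in the Windows SDK header wingdi.h, repeated
   here only so the sketch compiles without <windows.h>. */
#ifndef GB2312_CHARSET
#define SHIFTJIS_CHARSET     128
#define HANGEUL_CHARSET      129
#define GB2312_CHARSET       134
#define CHINESEBIG5_CHARSET  136
#endif

/* Hypothetical pair type: code page first, charset second. */
typedef struct {
    unsigned      codePage;
    unsigned char charSet;
} CharsetMap;

/* East Asian subset of the mapping, for illustration only. */
static const CharsetMap kMap[] = {
    { 932, SHIFTJIS_CHARSET    },
    { 936, GB2312_CHARSET      },
    { 949, HANGEUL_CHARSET     },
    { 950, CHINESEBIG5_CHARSET }
};

/* Returns the code page for `charSet`, or `fallback` if it is not listed. */
static unsigned CodePageFromCharset(unsigned char charSet, unsigned fallback)
{
    size_t i;

    for (i = 0; i < sizeof kMap / sizeof kMap[0]; ++i) {
        if (kMap[i].charSet == charSet)
            return kMap[i].codePage;
    }
    return fallback;
}
```

So a font table entry carrying \fcharset134 would map to code page 936, which is then the value to pass to MultiByteToWideChar. On Windows, TranslateCharsetInfo with TCI_SRCCHARSET may be an alternative to keeping a table by hand.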