Post by Dhiren
Hi,
Thank you for the reply.
Actually I am working on the DirectX 9.0 API's D3DXCreateText
function to create 3DText. Now D3DXCreateText requires a string of
character codes to create the 3DText. However, D3DXCreateText does not
do complex script processing for languages such as Arabic and Hindi, as
is done by the Windows API ExtTextOut. D3DXCreateText just takes the
character code for one character at a time, obtains its corresponding
glyph using the CMAP table of the font and creates 3DText using that
glyph's shape.
However, for languages such as Arabic, the shape of a character
(i.e. its Unicode character code) changes depending on its position in
the string. Hence I wanted to emulate the functionality of the
ExtTextOut API to obtain the actual character codes that get painted in
a window by ExtTextOut. Therefore, to obtain these new character codes,
I was using the Uniscribe API ScriptShape, which returns me the glyph
indices.
You are referring to the process of contextual-shaping. However, even though
the string looks different, there are no 'new character codes' to obtain.
The character-codes do _not_ change. Only their appearance changes due to
surrounding characters - which is why you sometimes get different glyphs for
the same character.
There is a "many : many" mapping between characters and glyphs. However it
is impossible to perform the reverse mapping because:
- A single character can result in multiple glyphs being generated.
- A single character can result in one glyph being generated.
- A single character can result in a *different* glyph being generated,
  depending on that character's context.
- Multiple characters (combining sequences) can result in a single, or
  multiple glyphs.
Any and all of the above combinations can occur when rendering Unicode
text. The font you are using also has a strong impact on the
glyph-generation process. Without further knowledge of how Unicode works,
you need to accept that you cannot map a glyph back to a character-code.
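To make the point above concrete, here is a minimal sketch in portable C++ (no Windows calls) of why scanning a font's character-to-glyph table can never find a glyph that only exists as the result of shaping. ToyFont, make_toy_font, shape and code_points_for_glyph are hypothetical names, the ligature rule is a crude stand-in for what the font's OpenType tables and Uniscribe really do, and every glyph number is invented apart from 7085, which is the value reported in this thread. The three code points are SHA + VIRAMA + RA, which is only an assumption about how the three-character string is encoded.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy stand-in for a font. Every glyph number here is invented for the
// illustration, except 7085, which is the index quoted in this thread.
struct ToyFont {
    std::map<char32_t, std::uint16_t> cmap;                    // code point -> nominal glyph
    std::map<std::vector<char32_t>, std::uint16_t> ligatures;  // sequence -> one glyph
};

ToyFont make_toy_font() {
    ToyFont font;
    font.cmap[0x0936] = 1200;  // DEVANAGARI LETTER SHA (hypothetical glyph number)
    font.cmap[0x094D] = 1215;  // DEVANAGARI SIGN VIRAMA (hypothetical glyph number)
    font.cmap[0x0930] = 1195;  // DEVANAGARI LETTER RA (hypothetical glyph number)
    const std::vector<char32_t> shra = {0x0936, 0x094D, 0x0930};
    font.ligatures[shra] = 7085;  // the conjunct has no code point of its own
    return font;
}

// Very rough imitation of shaping: if the whole input matches a ligature
// rule, emit that single glyph; otherwise emit one nominal glyph per character.
std::vector<std::uint16_t> shape(const ToyFont& font, const std::vector<char32_t>& text) {
    const auto lig = font.ligatures.find(text);
    if (lig != font.ligatures.end()) {
        return {lig->second};
    }
    std::vector<std::uint16_t> glyphs;
    for (char32_t cp : text) {
        const auto it = font.cmap.find(cp);
        glyphs.push_back(it != font.cmap.end() ? it->second : 0);  // 0 = missing glyph
    }
    return glyphs;
}

// The "reverse lookup" attempted in the thread: walk the whole cmap and
// collect every code point whose nominal glyph is the one asked for.
std::vector<char32_t> code_points_for_glyph(const ToyFont& font, std::uint16_t glyph) {
    std::vector<char32_t> result;
    for (const auto& entry : font.cmap) {
        if (entry.second == glyph) {
            result.push_back(entry.first);
        }
    }
    return result;
}

// Small self-check that can be called from main().
void check_toy_font() {
    const ToyFont font = make_toy_font();
    const std::vector<char32_t> text = {0x0936, 0x094D, 0x0930};
    assert(shape(font, text).size() == 1);              // three characters, one glyph
    assert(code_points_for_glyph(font, 7085).empty());  // not reachable from any code point
}
```

In this model the reverse lookup succeeds only for glyphs that happen to be the nominal glyph of some code point. A glyph produced by a substitution has no entry to find, however many code points are tried.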
Post by Dhiren
For example, after installing fonts to support Hindi, if Hindi is
selected as the language in the Language bar, and if the Devanagari
character for Shra (used in the name SHRAvan) is generated by pressing
Shift + 8, the character appears correctly in the Window Edit box
control. Now, when I do GetWindowText() I get a string of 3 characters.
After that, I create a Memory DC into which I select the Arial Unicode
MS font (since it supports character sets for almost all languages). I
then pass the string to ScriptItemize and then to ScriptShape.
Now ScriptShape returns me a single resultant glyph index for the
input string of 3 characters. The value of this glyph index is 7085. To
check whether 7085 is the glyph index for any character code from 0 to
65535 in the Arial Unicode MS font, I also used the GetGlyphIndices
function 65535 times, passing it a string of length 1 corresponding to
each character code and wrote the char code - glyph index mapping to a
file. However, I found out that the glyph index did not match for any
character code from 0 to 65535. In fact, I found out that there were 4
sets of ranges in ascending order for the glyph indices. They were
3-5428, 8355-..., ...-... and 5429...64.. (don't remember all the
values). No index even close to 7000 figured in the list of
glyph indices.
Please don't take this the wrong way, but I would suggest not wasting any
more time on this approach.
Post by Dhiren
Earlier I had hooked the ExtTextOut API to my own function using
DLL hooking code downloaded from the Net. I found out that the
ETO_GLYPH_INDEX flag was set for the fuOptions parameter in the call to
ExtTextOut. This means, that the Edit control either uses the Uniscribe
API ScriptShape or the obsolete Win32 API GetCharacterPlacement,
obtains a list of glyph indices from them, and then passes the glyph
indices to ExtTextOut. Now, using these glyph indices, how does
ExtTextOut look up the font data to draw the correct character in a
window? If I am also able to somehow use glyph indices to look up the
font data for the corresponding character codes, then my problem will
be solved.
Regards,
Dhiren.
On Windows XP at least, the EDIT control uses the Uniscribe ScriptString
API, so it is Uniscribe calling *back into* ExtTextOut with the
ETO_GLYPH_INDEX flag.
Once ExtTextOut has 'glyph data' it does not really look up any font data to
'draw the correct character' as you say. The font-data has already been
accessed. Uniscribe is a wrapper library over the lower-level OpenType
services that Windows provides. The process of generating glyph-indices from
character-codes is extremely complex and requires access to a font's
internal OpenType tables. Uniscribe makes this process a little more
bearable. But once you have a glyph-index there is nothing more to do.
ExtTextOut does not 'draw a character'. It draws a vector graphic (with a
particular glyph-index) contained in the font you are using.
So in summary:
It is impossible to map glyph-indices back to character-codes. You must
maintain your original Unicode string, and use the Uniscribe 'logical
character attribute' array to perform the mapping from Unicode characters ->
glyph indices.
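As a rough illustration of that forward mapping, the sketch below walks an array laid out the way ScriptShape fills its pwLogClust output: one entry per character, each holding the offset of the first glyph of the cluster that the character belongs to. It assumes this logical cluster array is the array meant above, and it only handles left-to-right runs, where the cluster values never decrease. GlyphRange and glyphs_for_char are hypothetical names and no Uniscribe call is made, so the arrays have to come from your own ScriptShape call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open range [first, end) of glyph positions within a shaped run.
struct GlyphRange {
    int first;
    int end;
};

// Given a logical cluster array (one entry per character, as described
// above) and the number of glyphs in the run, return the glyphs that belong
// to the cluster containing the character at charIndex.
// Assumption: left-to-right run, so cluster values are non-decreasing.
GlyphRange glyphs_for_char(const std::vector<std::uint16_t>& logClust,
                           int glyphCount,
                           std::size_t charIndex) {
    assert(charIndex < logClust.size());
    const int first = logClust[charIndex];
    int end = glyphCount;  // the last cluster runs to the end of the glyph array
    for (std::size_t i = charIndex + 1; i < logClust.size(); ++i) {
        if (logClust[i] != first) {
            end = logClust[i];  // the next cluster starts here
            break;
        }
    }
    return {first, end};
}
```

For the three-character, one-glyph case discussed in this thread, the cluster array would be {0, 0, 0} with a glyph count of 1, and every character maps to the range [0, 1). The original string stays the source of truth, and glyphs are only ever looked up from it, never the other way round.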
Lots of info about Uniscribe in the tutorials section of my website:
--
James Brown
Microsoft MVP - Windows SDK
www.catch22.net
Free Win32 Tutorials and Sourcecode