
cmp_d_to_utype

Vaughan Kitchen edited this page Dec 13, 2016 · 1 revision

About

A thorough test of the Unicode ctype-like methods needs a reference implementation to compare against. The D language's Phobos runtime library already has one, which makes it the obvious choice (especially since D is a nice language to program in). The cmp_d_to_utype.d program performs that comparison and reports any discrepancies between the JASS Unicode implementation and the Phobos implementation.

There are differences. Although the Phobos implementation claims to be derived from Unicode version 6.2, it is actually derived from 6.3, with some additions from 7.0. When the JASS Unicode methods are generated from Unicode 6.3, the following differences are observed (every one that was checked turned out to be a Unicode 7.0 change):

GRAPH:Codepoint:0x180E D says 0
SPACE:Codepoint:0x180E D says 0
WHITESPACE:Codepoint:0x180E D says 0
PUNCT:Codepoint:0x207B D says 0
PUNCT:Codepoint:0x208B D says 0
PUNCT:Codepoint:0x2212 D says 0
PUNCT:Codepoint:0x2308 D says 1
SYMBOL:Codepoint:0x2308 D says 0
PUNCT:Codepoint:0x2309 D says 1
SYMBOL:Codepoint:0x2309 D says 0
PUNCT:Codepoint:0x230A D says 1
SYMBOL:Codepoint:0x230A D says 0
PUNCT:Codepoint:0x230B D says 1
SYMBOL:Codepoint:0x230B D says 0

Usage

On Mac OS X, the shell script test_unicode_database_to_c.sh performs the following steps, so it is highly unlikely that they will need to be performed manually.

First, generate the Unicode methods from version 6.3 of the Unicode standard. The necessary data files are in the external/Unicode directory.

unicode_database_to_c UnicodeData_v6_3.txt PropList_v6_3.txt CaseFolding.txt > unicode_v6_3.cpp

Then compile that file, but do not link it:

g++ -c unicode_v6_3.cpp -o unicode_v6_3.o

Next, build this tool with your favourite D compiler (dmd was used during development):

dmd cmp_d_to_utype.d unicode_v6_3.o

Finally, run the comparison (and discover that the two implementations do not match; see the differences listed above):

./cmp_d_to_utype