Unicode display library (unidisp)
This library provides fairly clever display of Unicode characters
on non-Unicode terminals. It tries to display even characters that
are not in the current font: in many cases, it is possible to display
"a" with a tilde over it as "~a", or even as plain "a".
The following functions are callable from user-level programs. Nothing
beyond the presence of these functions should be assumed about unidisp.
- void inituni( void );
  Should be called once, at the beginning of the program.
- char *getlocalchar( int );
  Converts the given Unicode character into a null-terminated string
  of characters that can be displayed on the local terminal. If NULL
  is returned, the library does not know how to display this character.
- int getunichar( unsigned char );
  Converts the given local character into its Unicode value.
Any program that calls functions from the unidisp library should
include the file unidisp.h.
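A minimal usage sketch of the three documented functions. Because unidisp.c is not bundled here, the functions are replaced by hypothetical stubs (printable ASCII plus a single "~a" fallback for getlocalchar, and an ISO-8859-1 identity mapping for getunichar) so the example compiles on its own. render_unicode is likewise a hypothetical helper, not part of unidisp. The point is the calling pattern: call inituni() once, then check every getlocalchar() result for NULL.

```c
#include <stddef.h>
#include <string.h>

/* ---- Hypothetical stand-ins for unidisp.c. With the real library you
   would #include "unidisp.h" and link against it instead. ---- */
void inituni(void)
{
    /* The real library would load its unicode map here. */
}

/* Stub: knows printable ASCII, plus U+00E3 (a with tilde) -> "~a". */
char *getlocalchar(int uc)
{
    static char one[2];
    if (uc >= 0x20 && uc < 0x7F) {
        one[0] = (char)uc;
        one[1] = '\0';
        return one;
    }
    if (uc == 0x00E3)
        return "~a";
    return NULL; /* library does not know how to display uc */
}

/* Stub: pretends the local charset is ISO-8859-1, whose byte values
   equal their Unicode code points. */
int getunichar(unsigned char c)
{
    return c;
}

/* Hypothetical helper: converts an array of Unicode code points into a
   displayable local string, substituting '?' whenever getlocalchar()
   returns NULL. Each result is copied at once, because the returned
   string may live in a static buffer (true of this stub; unspecified
   for the real library). Output is truncated but always
   null-terminated. Returns the number of bytes written. */
size_t render_unicode(const int *text, size_t n, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        const char *s = getlocalchar(text[i]);
        size_t sl;
        if (s == NULL)
            s = "?";
        sl = strlen(s);
        if (len + sl >= outsz)
            break;
        memcpy(out + len, s, sl);
        len += sl;
    }
    out[len] = '\0';
    return len;
}
```

With these stubs, rendering the code points 'v', U+00E3, 'o', U+4E00 yields "v~ao?". Against the real library, drop the stubs, include unidisp.h and link with unidisp.c.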
Format of unicode map
The unidisp library uses Unicode maps to convert characters between
the local encoding and Unicode. Each map should be stored in a file
named map.[charset] (e.g. /usr/lib/unimap/map.ISO-8859-2); charset
names are explained below. These maps have the following format:
- 256 lines with hexadecimal numbers
  The n-th line corresponds to the n-th character of the local
  charset. Each line contains a hexadecimal number (the Unicode value
  of the character) or the string "uprt" for characters that are not
  printable.
- decimal number
  Recommended size of the hash table. This should be a prime number
  2-3 times larger than the number of entries in the translation
  table (see below).
- decimal number
  Total length of the strings in the translation table plus the total
  number of strings (this gives the amount of space needed to store
  them as C strings, each with its terminating null byte).
- translation table
  Each line contains the hexadecimal value of a Unicode character
  (4 digits), a space, and the string into which that Unicode
  character should be expanded.
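The map format above can be read with a few small helpers. This is a sketch under stated assumptions, not the parser in unidisp.c: all function names are hypothetical, malformed lines are simply rejected, and recommended_hash_size picks the smallest prime not less than twice the number of translation entries, which is one reasonable reading of the 2-3 times guideline. string_space computes the second decimal number, counting one terminating null byte per string.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One of the first 256 lines: returns the Unicode value, -1 for "uprt"
   (not printable), or -2 for a malformed line. */
long parse_charset_line(const char *line)
{
    char *end;
    long v;

    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, "uprt", 4) == 0)
        return -1;
    v = strtol(line, &end, 16);
    return (end == line || v < 0) ? -2 : v;
}

/* Translation table line: 4 hex digits, one space, then the expansion
   string up to the end of the line. Returns 0 on success, -1 if the
   line is malformed or the string does not fit in out. */
int parse_translation_line(const char *line, int *uc, char *out, size_t outsz)
{
    char hex[5];
    size_t n;

    for (int i = 0; i < 4; i++) {
        if (!isxdigit((unsigned char)line[i]))
            return -1;           /* also stops at a short line's '\0' */
        hex[i] = line[i];
    }
    hex[4] = '\0';
    if (line[4] != ' ')
        return -1;
    *uc = (int)strtol(hex, NULL, 16);
    n = strcspn(line + 5, "\r\n");   /* drop the line terminator */
    if (n >= outsz)
        return -1;
    memcpy(out, line + 5, n);
    out[n] = '\0';
    return 0;
}

static int is_prime(unsigned long n)
{
    if (n < 2)
        return 0;
    for (unsigned long d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* First decimal number: smallest prime not less than twice the
   number of translation entries. */
unsigned long recommended_hash_size(unsigned long entries)
{
    unsigned long n = 2 * entries;
    while (!is_prime(n))
        n++;
    return n;
}

/* Second decimal number: total string length plus one null byte per
   string. */
size_t string_space(const char *const *strs, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(strs[i]) + 1;
    return total;
}
```

For example, a translation table with 10 entries gets a recommended hash size of 23 under this rule.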
Format of charset description
Since it would be quite hard to create Unicode maps by hand, they are
usually compiled by the mkunimap script from charset description
files (the recommended name is src.[charset]; place them in the same
directory as the Unicode maps). Each line has the form
AA BBBB [description]
where AA is the local character code in hex, followed by a tab and
the Unicode value BBBB of the character (optionally followed by a
description). The first 256 lines must describe characters 00..FF,
in sorted order. If a character is unprintable, replace BBBB with
"uprt". See map.ISO-8859-2 for an example of such a file. Note that
there may be more than one line with the same code at the beginning.
This is useful for accents: character 0x27 (apostrophe, ') is both an
apostrophe and an accent.
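A sketch of parsing one charset-description line, assuming the tab separator described above and treating everything after the Unicode value (minus leading blanks) as the optional description. struct charset_desc and parse_desc_line are hypothetical names; mkunimap itself is a perl script and may handle edge cases differently.

```c
#include <stdlib.h>
#include <string.h>

struct charset_desc {
    int local;               /* AA: local code, 0x00..0xFF */
    long unicode;            /* BBBB: Unicode value, -1 for "uprt" */
    const char *description; /* rest of the line ("" if absent); points
                                into the input and keeps any newline */
};

/* Parses "AA<tab>BBBB [description]". Returns 0 on success, -1 if
   the line is malformed. */
int parse_desc_line(const char *line, struct charset_desc *d)
{
    char *end;
    const char *rest;
    long local = strtol(line, &end, 16);

    if (end == line || local < 0 || local > 0xFF || *end != '\t')
        return -1;
    rest = end + 1;
    if (strncmp(rest, "uprt", 4) == 0) {
        d->unicode = -1;         /* unprintable character */
        rest += 4;
    } else {
        long uc = strtol(rest, &end, 16);
        if (end == rest || uc < 0)
            return -1;
        d->unicode = uc;
        rest = end;
    }
    d->local = (int)local;
    while (*rest == ' ' || *rest == '\t')
        rest++;
    d->description = rest;
    return 0;
}
```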
The environment variable CHARSET is expected to name the charset used
on the current terminal. Its value should be the same as a charset
value defined in the MIME standard (see RFC 2046, pages 8-9).
The defined charset values are:
(1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].
(2) ISO-8859-X -- where "X" is to be replaced, as
necessary, for the parts of ISO-8859 [ISO-8859]. Note
that the ISO 646 character sets have deliberately been
omitted in favor of their 8859 replacements, which are
the designated character sets for Internet mail. As of
the publication of this document, the legitimate values
for "X" are the digits 1 through 10.
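A sketch of validating a CHARSET value against the values listed above and building the path of the matching map file. The function names are hypothetical, the map directory is a caller-supplied parameter (/usr/lib/unimap is only the example given earlier), and names are compared exactly, so case variants such as "iso-8859-2" are rejected by this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if name is "US-ASCII" or "ISO-8859-X" with X from 1 to 10
   (the values listed above), 0 otherwise. Comparison is exact. */
int charset_is_listed(const char *name)
{
    static const char prefix[] = "ISO-8859-";
    const char *x;

    if (name == NULL)
        return 0;
    if (strcmp(name, "US-ASCII") == 0)
        return 1;
    if (strncmp(name, prefix, sizeof prefix - 1) != 0)
        return 0;
    x = name + sizeof prefix - 1;   /* the "X" part */
    if (strcmp(x, "10") == 0)
        return 1;
    return x[0] >= '1' && x[0] <= '9' && x[1] == '\0';
}

/* Writes "<dir>/map.<charset>" into buf. charset would normally come
   from getenv("CHARSET"). Returns 0 on success, -1 if charset is
   missing or not listed, or if the path does not fit in buf. */
int unimap_path(const char *dir, const char *charset, char *buf, size_t bufsz)
{
    int n;

    if (!charset_is_listed(charset))
        return -1;
    n = snprintf(buf, bufsz, "%s/map.%s", dir, charset);
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

A typical call would be unimap_path("/usr/lib/unimap", getenv("CHARSET"), buf, sizeof buf), which needs <stdlib.h> for getenv.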
Installation
- Decide which language encoding(s) you use on your system
- Make sure you have perl installed
- Create charset description files if necessary
- Compile the charset description files into Unicode maps (with mkunimap)
- Make sure the environment variable CHARSET contains the right value
Where to get it
Currently nowhere, if you are asking for a complete distribution. A
copy of the directory where I am working on forum is world-readable,
and its subdirectory 'mforum' will probably be of interest to you
(look for 'unidisp.c').