Unicode display library (unidisp)

This library allows (quite clever) display of unicode characters on non-unicode terminals. It tries to display even characters that are not in the current font: in many cases, it is possible to display "a" with ~ over it as "~a" or even as plain "a".
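To make the fallback idea concrete, here is a minimal sketch of such a lookup. It is only an illustration, not unidisp's actual code: the names demo_expansion, demo_table and demo_expand, and the keep_accent switch, are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical fallback entry: one unicode character and two ASCII
 * approximations of it. Not taken from unidisp.h. */
struct demo_expansion {
    unsigned long code;    /* unicode value of the character */
    const char *accented;  /* approximation that keeps the accent, e.g. "~a" */
    const char *plain;     /* approximation without the accent, e.g. "a" */
};

static const struct demo_expansion demo_table[] = {
    { 0x00E3, "~a", "a" },  /* LATIN SMALL LETTER A WITH TILDE */
    { 0x00F1, "~n", "n" },  /* LATIN SMALL LETTER N WITH TILDE */
};

/* Return an ASCII approximation for `code`, or NULL if the demo table
 * has no entry for it. */
const char *demo_expand(unsigned long code, int keep_accent)
{
    size_t i;

    for (i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (demo_table[i].code == code)
            return keep_accent ? demo_table[i].accented
                               : demo_table[i].plain;
    return NULL;
}
```

Here demo_expand(0x00E3, 1) yields "~a" and demo_expand(0x00E3, 0) yields "a".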

Interface

The following functions are callable from user-level programs. Nothing beyond the functions presented here should be assumed about unidisp.

Anyone who wants to call functions from the unidisp library should include the file unidisp.h.

Format of unicode map

The unidisp library uses unicode maps to convert characters between the local encoding and unicode. These tables should be stored in /usr/share/unimap/map.[charset] files (e.g. /usr/lib/unimap/map.ISO-8859-2). (See below for an explanation of charsets.) These maps have the following format:

ABRAKADABRA
256 lines with hexadecimal numbers
    The n-th line corresponds to the n-th character in the local charset. Each line contains a hexadecimal number (the unicode value of the character) or the string uprt for characters that are not printable.
DABRAKAABRA
decimal number
    Recommended size of the hash table. This should be a prime number 2-3 times bigger than the number of entries in the translation table (see below).
decimal number
    Total length of the strings in the translation table plus the total number of strings (this gives the amount of space needed to store them as C-style, NUL-terminated strings).
translation table
    Each line contains the hexadecimal number of a unicode character (4 characters), a space, and the string to which this unicode character should be expanded.
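The layout above could be read with something like the following sketch. It is not the reader from unidisp.c: struct unimap, UNIMAP_UPRT, unimap_read_header and unimap_parse_entry are hypothetical names. It also assumes that ABRAKADABRA and DABRAKAABRA appear as literal lines of their own, and that every entry sits on its own line with no trailing whitespace.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define UNIMAP_UPRT (-1L)  /* hypothetical marker for "uprt" entries */

/* Hypothetical in-memory form of the map header; not from unidisp.h. */
struct unimap {
    long to_unicode[256];  /* unicode value per local code, or UNIMAP_UPRT */
    long hash_size;        /* recommended hash table size */
    long pool_size;        /* bytes needed for all expansion strings + NULs */
};

/* Read one line and strip the line terminator; returns 0 at end of file. */
static int read_line(FILE *f, char *buf, size_t n)
{
    if (!fgets(buf, (int)n, f))
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}

/* Parse everything before the translation table.
 * Returns 0 on success, -1 on malformed input. */
int unimap_read_header(FILE *f, struct unimap *m)
{
    char line[256], *end;
    int i;

    if (!read_line(f, line, sizeof line) || strcmp(line, "ABRAKADABRA") != 0)
        return -1;
    for (i = 0; i < 256; i++) {
        if (!read_line(f, line, sizeof line))
            return -1;
        if (strcmp(line, "uprt") == 0) {
            m->to_unicode[i] = UNIMAP_UPRT;
            continue;
        }
        m->to_unicode[i] = strtol(line, &end, 16);
        if (end == line)
            return -1;
    }
    if (!read_line(f, line, sizeof line) || strcmp(line, "DABRAKAABRA") != 0)
        return -1;
    if (!read_line(f, line, sizeof line))
        return -1;
    m->hash_size = strtol(line, &end, 10);
    if (end == line || !read_line(f, line, sizeof line))
        return -1;
    m->pool_size = strtol(line, &end, 10);
    return end == line ? -1 : 0;
}

/* Split a translation-table line "XXXX string" into code and expansion.
 * Returns a pointer to the expansion inside `line`, or NULL if malformed. */
const char *unimap_parse_entry(const char *line, unsigned long *code)
{
    int i;

    for (i = 0; i < 4; i++)
        if (!isxdigit((unsigned char)line[i]))
            return NULL;
    if (line[4] != ' ')
        return NULL;
    *code = strtoul(line, NULL, 16);
    return line + 5;
}
```

A caller would run unimap_read_header() once and then feed each remaining line to unimap_parse_entry(). The sum of strlen(expansion) + 1 over all entries should then equal pool_size.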

Format of charset description

Since it would be quite hard to create unicode maps by hand, they are usually compiled by the mkunimap script from charset description files (the recommended name is src.[charset]; place them in the same directory as the unicode maps). Each line consists of:

AA  BBBB  [description]
where AA is the local character code in hex (the first 256 lines must contain descriptions of characters 00..FF, sorted), followed by a tab and the unicode code of the character, optionally followed by a description. If the character is unprintable, replace BBBB with "uprt". See map.ISO-8859-2 for an example of such a file. Note that there may be more than one line with the same code at the beginning. This is useful for accents: character 0x27 (apostrophe - ') is both an apostrophe and an accent.
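A line in this format could be parsed roughly as follows. This is a sketch rather than what mkunimap actually does: src_parse_line and SRC_UPRT are hypothetical names, and the parser accepts any mix of tabs and spaces between the fields.

```c
#include <stdlib.h>
#include <string.h>

#define SRC_UPRT (-1L)  /* hypothetical marker for unprintable characters */

/* Parse "AA<whitespace>BBBB[<whitespace>description]".
 * Stores the local code in *local and the unicode value (or SRC_UPRT)
 * in *unicode. Returns 0 on success, -1 on a malformed line. */
int src_parse_line(const char *line, int *local, long *unicode)
{
    char *end;
    long code = strtol(line, &end, 16);

    if (end == line || code < 0 || code > 0xFF)
        return -1;

    line = end + strspn(end, " \t");  /* skip the field separator */
    if (strncmp(line, "uprt", 4) == 0) {
        *unicode = SRC_UPRT;
    } else {
        long u = strtol(line, &end, 16);
        if (end == line)
            return -1;
        *unicode = u;
    }
    *local = (int)code;
    return 0;
}
```

For example, the line "27<tab>0027<tab>apostrophe" gives local code 0x27 and unicode value 0x0027. A second line for the same local code, such as "27<tab>00B4<tab>acute accent" (chosen here only for illustration), parses the same way and gives 0x00B4.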

Charset

The environment variable CHARSET is expected to contain the name of the charset used on the current terminal. The charset value should be the same as the charset value defined in the MIME standard (see RFC 2046, pages 8-9).

   The defined charset values are:

    (1)   US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].
    (2)   ISO-8859-X -- where "X" is to be replaced, as
          necessary, for the parts of ISO-8859 [ISO-8859].  Note
          that the ISO 646 character sets have deliberately been
          omitted in favor of their 8859 replacements, which are
          the designated character sets for Internet mail.  As of
          the publication of this document, the legitimate values
          for "X" are the digits 1 through 10.
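As a sketch of how a program might go from CHARSET to a map file name: the helper unimap_path below is hypothetical, and both the fallback to US-ASCII when the variable is unset and the rejection of names containing '/' are assumptions of this example, not documented unidisp behaviour.

```c
#include <stdio.h>
#include <string.h>

/* Build "<dir>/map.<charset>" in buf.
 * Returns 0 on success, -1 if the name is unsafe or does not fit. */
int unimap_path(const char *dir, const char *charset, char *buf, size_t n)
{
    int len;

    if (!charset || !*charset)
        charset = "US-ASCII";  /* assumed default, not from the source */
    if (strchr(charset, '/'))
        return -1;             /* refuse names that would leave dir */

    len = snprintf(buf, n, "%s/map.%s", dir, charset);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

A typical call would be unimap_path("/usr/share/unimap", getenv("CHARSET"), buf, sizeof buf), with getenv coming from <stdlib.h>.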

Installation

Where to get it

Currently nowhere, if you are asking for a complete distribution. A copy of the directory where I'm working on forum is world-readable, and it has a subdirectory 'mforum' which will probably be of interest to you (look for 'unidisp.c').


Pavel Machek
pavel@atrey.karlin.mff.cuni.cz