Input methods
-
How to get text into an application from a user with
a keyboard and mouse is a major issue
-
The keyboard is typically a US English keyboard,
although there are keyboards for some other languages
-
There needs to be a mechanism to translate keystrokes from a
normal keyboard into Unicode characters
-
The user is not expected to know Unicode - only their own language
-
An input method is a mechanism for entering characters
typically from one language using a normal keyboard
-
There can be many input methods for a language,
e.g. for Chinese there is pinyin (Latin alphabetic form), among others
-
References
Using Input Methods on the Java™ Platform,
"Java Internationalisation" book
Input Method Framework Specification
International text in JDK 1.2
Background
-
The IMF relies heavily on how Java handles events
-
A background in this is necessary to understand IMF
Event queue
-
Java maintains an event queue consisting of input events
-
Input events are key events and mouse events
-
The queue is managed by two threads: one to put events onto the queue,
the other to dispatch events from the queue
-
Placing events into the queue and handling them on removal is different for
AWT objects and Swing objects
AWT objects and events
-
Objects such as Button rely on native objects
-
The native object is called a peer object, and handles all
events and drawing
-
Under Linux, the Button peer is a Motif object XmPushButton,
which is written in C
-
A button press event is generated and handled by
-
A button release event is generated and handled by
-
The native peer object (the Motif XmPushButton) recognises
that it has been "clicked" and generates an action event
Swing objects and events
-
Objects such as JButton rely only on simple mechanisms such
as drawing lines and strings - a Swing object does not rely on an
associated peer object
-
A button press event is handled by
-
A button release event is handled by
-
This time the Swing object recognises that it has been "clicked" and sends itself
an action event
AWT versus Swing
-
AWT relies on native peer objects to handle events
-
If the peer objects support i18n, then the AWT objects can as well
-
If the peer objects don't, then it is hard for the AWT objects to do so
-
Swing objects rely less on the native window system, and handle more in
Java code
-
i18n support is better in Swing because it is cross-platform code
-
Use Swing objects for more portable code
Listeners
-
Key and mouse listeners are invoked by the event queue handler before
the event is given to the target object
-
The listeners can modify or consume the event
Changing key values
This program uses a listener to change all keys to upper case
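A minimal sketch of such a program, assuming a Swing JTextField as the target object. The class names UpperCaseKeys and UpperCaseListener are made up for this illustration. The listener rewrites the character of each key-typed event before the text field handles it.

```java
import java.awt.event.KeyAdapter;
import java.awt.event.KeyEvent;
import javax.swing.JFrame;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

// Hypothetical class names; a sketch, not the original listing.
class UpperCaseKeys {

    /** Rewrites the character of every key-typed event to upper case. */
    static class UpperCaseListener extends KeyAdapter {
        @Override
        public void keyTyped(KeyEvent e) {
            // The listener runs before the text component handles the event,
            // so changing the character here changes what gets inserted.
            e.setKeyChar(Character.toUpperCase(e.getKeyChar()));
        }
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JTextField field = new JTextField(30);
            field.addKeyListener(new UpperCaseListener());
            JFrame frame = new JFrame("Upper case");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(field);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```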
Discarding characters
The following program only accepts alphabetic characters
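A minimal sketch, again assuming a Swing JTextField. The names AlphabeticOnly and AlphabeticFilter are made up for this illustration. Non-alphabetic characters are consumed so that they never reach the text field. Letting control characters such as backspace through is an assumption made here so that editing still works.

```java
import java.awt.event.KeyAdapter;
import java.awt.event.KeyEvent;
import javax.swing.JFrame;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

// Hypothetical class names; a sketch, not the original listing.
class AlphabeticOnly {

    /** Consumes key-typed events whose character is not a letter. */
    static class AlphabeticFilter extends KeyAdapter {
        @Override
        public void keyTyped(KeyEvent e) {
            char c = e.getKeyChar();
            // Assumption: control characters (backspace, enter, ...) pass through.
            if (!Character.isLetter(c) && !Character.isISOControl(c)) {
                e.consume(); // a consumed event is not acted on by the target
            }
        }
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JTextField field = new JTextField(30);
            field.addKeyListener(new AlphabeticFilter());
            JFrame frame = new JFrame("Letters only");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(field);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```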
Attributed text
-
Drawing within any UI object is done in the paint(Graphics g) method
-
A Graphics object has methods such as drawLine(), drawChars() etc
-
The method Graphics.drawString(AttributedCharacterIterator, int, int)
uses a string which has additional "attribute markers" added to it
-
Text attributes include
-
background colour
-
font
-
foreground colour
-
direction of text
-
strikethrough, underline, weight
-
A program to draw attributed text is
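A minimal sketch of such a program. The class name AttributedTextPanel, the sample string and the chosen colours are made up for this illustration. The attributes are attached to ranges of an AttributedString, and its iterator is handed to Graphics.drawString().

```java
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.font.TextAttribute;
import java.text.AttributedString;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;

// Hypothetical class name and sample text; a sketch, not the original listing.
class AttributedTextPanel extends JPanel {

    /** Builds a string whose words carry different text attributes. */
    static AttributedString build() {
        AttributedString s = new AttributedString("Hello attributed text");
        s.addAttribute(TextAttribute.FAMILY, "Serif");          // font, whole string
        s.addAttribute(TextAttribute.SIZE, 24f);
        s.addAttribute(TextAttribute.FOREGROUND, Color.RED, 0, 5);      // "Hello"
        s.addAttribute(TextAttribute.BACKGROUND, Color.YELLOW, 6, 16);  // "attributed"
        s.addAttribute(TextAttribute.UNDERLINE, TextAttribute.UNDERLINE_ON, 6, 16);
        s.addAttribute(TextAttribute.STRIKETHROUGH,
                       TextAttribute.STRIKETHROUGH_ON, 17, 21);         // "text"
        return s;
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        // drawString takes an AttributedCharacterIterator plus a baseline position
        g.drawString(build().getIterator(), 20, 40);
    }

    @Override
    public Dimension getPreferredSize() {
        return new Dimension(320, 70);
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Attributed text");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new AttributedTextPanel());
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```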
Terms
From Concepts in C/UNIX Internationalisation
-
pre-edit area - window for display of intermediate text entered during pre-editing.
-
auxiliary area - window used for popup menus and dialogs that may be necessary for the input method.
-
on-the-spot pre-editing - pre-editing in which the pre-edited text appears at the text insertion location, rendered identically to the surrounding text.
-
over-the-spot pre-editing - pre-editing in which the pre-edited text appears in an input method window placed over the point of insertion.
-
off-the-spot pre-editing - pre-editing in which the pre-edit window is inside the client window, but not over the point of insertion; usually at the bottom of the client window.
-
root-window pre-editing - pre-editing in which the pre-edit window is the child of the root window.
-
input server - a process separate from the client which implements an input method.
-
An input method may be implemented as a stub communicating with an input server or as a local library.
-
input architecture - either front end architecture or back end architecture.
-
front end architecture - an input architecture in which there are two connections to the server. Keystrokes go from the X server to the input method on one connection and other events go to the server on the other connection. The input method sends composed strings to the client.
-
back end architecture - an input architecture in which a dispatching mechanism in the client delegates appropriate keystrokes to the input method which returns them to the client for display.
-
input context - a combination of an input method, locale specifying the encoding of strings returned by the input method, a client window, internal state information, and layout and appearance characteristics; associated one-to-one with an input method.
-
Compound text encoding is the standard format for exchanging textual data between X window system applications.
-
compound text - a string which is stored in the compound text encoding.
-
compound text encoding - an encoding in which a string is encoded and stored along with the name of its encoding. Useful for sharing text between clients running in different locales.
-
tagged text - compound text in which each segment is preceded by a tag that identifies the locale.
-
structured text - text which is a list of objects, each object containing a text segment with attributes (font, style, point size, locale).
-
Context dependencies do not extend beyond white space in a string.
-
context dependency - the choice of a glyph depends on the position of the corresponding character in the text string.
Naive Unicode editor
The following text editor uses a very naive way of entering Unicode text.
It uses on-the-spot pre-editing
Problems with naive editor
-
The "language" is hard-coded into the editor - it should be "pluggable"
at runtime by the user
-
There is no choice of languages
-
The pre-edit style is hard-coded into the editor
Java Input Method Framework
-
The Java Input Method Framework allows applications to use input
methods in a cross-platform manner
-
An input method filters keystrokes before they get
to a text component
-
Input methods are contained in jar files that
are placed in standard locations
-
Input methods may be chosen from an application's system menu,
or set by a text component. For Linux, which doesn't have a
single window manager, the tool InputMethodHotKey
can be used to define a hot key:
java -jar InputMethodHotKey.jar
This shows a GUI that can be used to set a hot key for all Java
applications.
This jar file can be downloaded from
http://java.sun.com/products/jfc/tsc/articles/InputMethod/inputmethod.html
-
Input methods are associated with locales: the application
asks to install an input method that handles a particular locale
text = ...
locale = ...
InputContext context = text.getInputContext();
context.selectInputMethod(locale);
Using the IMF
Method 1: don't change the application, let the user choose by hotkeys or system menu
While this is running, use the hot key or the system menu to get a selection
of input methods
Using the IMF
Method 2: code the input method into the application
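A minimal sketch of this approach. The class name SelectInputMethod, the helper select() and the choice of Locale.JAPANESE are assumptions made for illustration. Whether the call succeeds depends on which input methods are installed. A component that is not yet inside a window has no input context, so the helper checks for null.

```java
import java.awt.Component;
import java.awt.im.InputContext;
import java.util.Locale;
import javax.swing.JFrame;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

// Hypothetical class and helper names; a sketch only.
class SelectInputMethod {

    /** Tries to select an input method for the locale; false if none was selected. */
    static boolean select(Component component, Locale locale) {
        InputContext context = component.getInputContext();
        // getInputContext() returns null until the component is in a window
        return context != null && context.selectInputMethod(locale);
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JTextArea text = new JTextArea(10, 40);
            JFrame frame = new JFrame("Coded input method");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(text);
            frame.pack();
            frame.setVisible(true);
            // Locale.JAPANESE is only an example locale
            boolean ok = select(text, Locale.JAPANESE);
            System.out.println("Input method selected: " + ok);
        });
    }
}
```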
Event handling with input methods
-
The input method may pass an event along unchanged
-
The input method may consume events and generate new input method events
-
Text sent to a text object may be committed or converted. That is, it is the final
text after it has been managed by the input system
-
Text sent to a text object may instead be raw text or composed text.
That is, it is still in some intermediate stage of input before becoming
finalised
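The distinction can be sketched with the standard InputMethodEvent class. Its text starts with getCommittedCharacterCount() committed characters, and the remainder is still being composed. The class name InputMethodText and the helper split() are made up for this illustration.

```java
import java.awt.event.InputMethodEvent;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;
import java.text.CharacterIterator;
import javax.swing.JPanel;

// Hypothetical class and helper names; a sketch only.
class InputMethodText {

    /** Returns { committed text, composed (uncommitted) text } of an event. */
    static String[] split(InputMethodEvent e) {
        StringBuilder committed = new StringBuilder();
        StringBuilder composed = new StringBuilder();
        AttributedCharacterIterator text = e.getText();
        if (text != null) {
            int committedCount = e.getCommittedCharacterCount();
            int i = 0;
            for (char c = text.first(); c != CharacterIterator.DONE; c = text.next()) {
                // the first committedCount characters are final, the rest are not
                if (i++ < committedCount) {
                    committed.append(c);
                } else {
                    composed.append(c);
                }
            }
        }
        return new String[] { committed.toString(), composed.toString() };
    }

    public static void main(String[] args) {
        // Build an event by hand: 5 committed characters followed by 2 composed ones
        InputMethodEvent e = new InputMethodEvent(
                new JPanel(), InputMethodEvent.INPUT_METHOD_TEXT_CHANGED,
                System.currentTimeMillis(),
                new AttributedString("atcaydo").getIterator(), 5, null, null);
        String[] parts = split(e);
        System.out.println("committed=" + parts[0] + " composed=" + parts[1]);
    }
}
```

In an application, the same helper could be called from an InputMethodListener's inputMethodTextChanged() method.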
Input styles
The IMF supports three styles:
Pig Latin
-
Children often play language games: a "secret" language is formed
by "encoding" each word into Pig Latin
-
In this "language", each word is transformed according to these rules
-
For words which begin with consonants, take the
consonants off the front of the word, add them to the end
and then further add "ay" after the consonants.
So
-
cat becomes atcay
-
gorilla becomes orillagay
-
still becomes illstay
-
scratch becomes atchscray
-
For words that begin with a vowel, just add "yay" to the end
-
add becomes addyay
-
ouch becomes ouchyay
-
adapt becomes adaptyay
-
A sentence such as the cat sat on the mat becomes
ethay atcay atsay onyay ethay atmay
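The two rules can be sketched as a small Java class. The class and method names are made up for this illustration. Treating only a, e, i, o and u as vowels is an assumption, as is the handling of words with no vowel at all.

```java
// Hypothetical class name; a sketch of the two rules above.
class PigLatin {

    private static boolean isVowel(char c) {
        // Assumption: only a, e, i, o, u count as vowels
        return "aeiouAEIOU".indexOf(c) >= 0;
    }

    /** Converts a single word. */
    static String word(String w) {
        if (w.isEmpty()) {
            return w;
        }
        int i = 0;
        while (i < w.length() && !isVowel(w.charAt(i))) {
            i++; // count the leading consonants
        }
        if (i == 0) {
            return w + "yay"; // begins with a vowel
        }
        // move the leading consonants to the end, then add "ay"
        return w.substring(i) + w.substring(0, i) + "ay";
    }

    /** Converts each whitespace-separated word of a sentence. */
    static String sentence(String s) {
        StringBuilder out = new StringBuilder();
        for (String w : s.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(word(w));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sentence("the cat sat on the mat"));
    }
}
```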
-
Some people become quite adept at talking and listening in this
"language", and there are several Web sites for Pig Latin
Locale for Pig Latin
-
There is no ISO recognition of Pig Latin, and none by other
authorities, although x-pig-latin is unofficially
recognised as a possible value for the LANG
-
The language in this case is English
-
The dialect could be any variety of English, for
example British English or US English
-
The language and country arguments are restricted to values
defined by ISO, so we can't define our own anyway;
the variant argument is described by Sun as "vendor or browser-specific",
but others say it can be used for any other refinement
-
So ... we regard Pig Latin as a variant of English
-
We define a constant locale for this, for convenience
Locale PIG_LATIN_LOCALE = new Locale("en", "", "x-pig-latin");
Pig Latin input method
-
There needs to be an input method for this locale, that
will take, say, a word in ordinary English and transform
it into Pig Latin.
-
The input method should be able to examine characters and
decide when a word has finished: say by a punctuation character
-
The input method should take the "raw" text and convert it
to "committed" text when the conversion occurs
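These requirements can be sketched without any GUI code as a small state holder. The class name PigLatinComposer and its methods are made up for this illustration. Treating every non-letter character as the end of a word is an assumption. A real input method would wrap this logic in the framework's interfaces.

```java
// Hypothetical class; a sketch of the raw-to-committed step only.
class PigLatinComposer {

    private final StringBuilder raw = new StringBuilder();

    /**
     * Feeds one typed character. Returns the text to commit when the word
     * has finished, or null while the word is still raw text.
     */
    String type(char c) {
        if (Character.isLetter(c)) {
            raw.append(c); // still composing the current word
            return null;
        }
        // Assumption: any non-letter (space, punctuation) ends the word
        String committed = convert(raw.toString()) + c;
        raw.setLength(0);
        return committed;
    }

    /** The raw text entered so far for the current word. */
    String rawText() {
        return raw.toString();
    }

    /** Pig Latin conversion of one word. */
    static String convert(String word) {
        int i = 0;
        while (i < word.length() && "aeiouAEIOU".indexOf(word.charAt(i)) < 0) {
            i++;
        }
        if (word.isEmpty()) {
            return word;
        }
        return i == 0 ? word + "yay" : word.substring(i) + word.substring(0, i) + "ay";
    }

    public static void main(String[] args) {
        PigLatinComposer composer = new PigLatinComposer();
        for (char c : "the cat.".toCharArray()) {
            String committed = composer.type(c);
            if (committed != null) {
                System.out.print(committed);
            }
        }
        System.out.println();
    }
}
```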
Input method descriptor
Pig Latin input method descriptor
Finding the input method
-
A client using input methods should be able to be written
without a knowledge of what input methods are available
-
For example, a program running in China should be able to use one of the
Chinese input methods; a program running in Thailand should be able
to use a Thai input method; a program running in India should be able
to use one of the Indian input methods
-
An input method should be regarded (usually) as a runtime
configuration parameter, and
a program should be able to be written with no knowledge of its runtime
environment
-
An InputMethodDescriptor provides one level of indirection:
it describes what locales can be handled before instantiating the code
to do this
-
InputMethodDescriptors have to be made available to the
Java runtime
-
Choices could be
-
put the descriptors in the classpath
-
put the descriptors in some sort of configuration file
-
put the descriptors in a standard location
-
The Input Method Framework puts them in a particular configuration
file in a standard location - maybe not ideal, but reasonably
simple
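A sketch of a descriptor for the Pig Latin locale, using the standard java.awt.im.spi.InputMethodDescriptor interface. The class name and display name are made up for this illustration. The input method class itself is outside the scope of this sketch, so createInputMethod() is only a placeholder that throws. A descriptor loaded by the framework would also need to be declared public.

```java
import java.awt.Image;
import java.awt.im.spi.InputMethod;
import java.awt.im.spi.InputMethodDescriptor;
import java.util.Locale;

// Hypothetical class name; must be public (in its own file) for real use.
class PigLatinInputMethodDescriptor implements InputMethodDescriptor {

    static final Locale PIG_LATIN_LOCALE = new Locale("en", "", "x-pig-latin");

    /** The locales handled, reported without creating the input method. */
    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { PIG_LATIN_LOCALE };
    }

    /** The locale list is fixed, so it never changes at runtime. */
    @Override
    public boolean hasDynamicLocaleList() {
        return false;
    }

    /** Name shown in an input method selection menu (assumed name). */
    @Override
    public String getInputMethodDisplayName(Locale inputLocale, Locale displayLanguage) {
        return "Pig Latin";
    }

    /** No icon is supplied in this sketch. */
    @Override
    public Image getInputMethodIcon(Locale inputLocale) {
        return null;
    }

    /** Placeholder: a real descriptor returns a new InputMethod instance here. */
    @Override
    public InputMethod createInputMethod() throws Exception {
        throw new UnsupportedOperationException(
                "input method implementation not included in this sketch");
    }
}
```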
Input descriptor configuration file
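The framework looks for descriptors through a provider-configuration file inside the input method's jar file, named META-INF/services/java.awt.im.spi.InputMethodDescriptor. Each line of the file names one descriptor class. A sketch of its contents, assuming a hypothetical descriptor class called PigLatinInputMethodDescriptor in the default package:

```text
# File: META-INF/services/java.awt.im.spi.InputMethodDescriptor
# One fully qualified descriptor class name per line (hypothetical name below)
PigLatinInputMethodDescriptor
```

On the JDK versions these notes were written for, the jar file was then placed in the runtime's extension directory. That mechanism was removed in later Java releases, so the location should be checked against the JDK in use.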
Pig Latin Input Method
Presenting alternative choices
-
Short English word list at http://www.langmaker.com/wordlist/basiclex.htm
-
Many lookup methods give a set of possible choices
-
If you enter Chinese text using Pinyin (representing Chinese characters using the
Latin alphabet), then one Pinyin syllable can correspond to many characters. So after entry
of the Pinyin, you still need to select the character
-
In Arabic, a character could have many presentation forms. Once a character has been
chosen, the user may still need to select a presentation form
-
In predictive text, the currently entered text is used to predict
the possible words
Presenting choices
-
Presenting choices means that there must now be three states:
-
entering raw text
-
selecting from choices
-
committing selected choice
-
In addition, a window must be brought up to show the choices
-
It is common to use the space character in this:
one space ends a word; two spaces bring up a lookup window;
each successive space moves you through the choices;
return commits the current selection
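The three states and the space-key convention can be sketched as a small state machine. The class name ChoiceSelector and the lookup function passed to it are made up for this illustration, and the lookup window itself is left out. Ignoring letters typed after the first space, and committing the raw text when the lookup finds nothing, are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical class; a sketch of the state handling only (no window).
class ChoiceSelector {

    enum State { ENTERING, SELECTING, COMMITTED }

    private final Function<String, List<String>> lookup;
    private final StringBuilder raw = new StringBuilder();
    private State state = State.ENTERING;
    private boolean wordEnded;
    private List<String> choices = Collections.emptyList();
    private int index;
    private String committed;

    ChoiceSelector(Function<String, List<String>> lookup) {
        this.lookup = lookup;
    }

    void type(char c) {
        if (state == State.ENTERING) {
            if (c == '\n') {
                commit(raw.toString());        // return commits the raw text
            } else if (c != ' ') {
                if (!wordEnded) {
                    raw.append(c);             // still entering raw text
                }
            } else if (!wordEnded) {
                wordEnded = true;              // first space: the word has ended
            } else {
                choices = lookup.apply(raw.toString()); // second space: look up
                index = 0;
                if (choices.isEmpty()) {
                    commit(raw.toString());
                } else {
                    state = State.SELECTING;   // a lookup window would open here
                }
            }
        } else if (state == State.SELECTING) {
            if (c == ' ') {
                index = (index + 1) % choices.size(); // next choice, wrapping
            } else if (c == '\n') {
                commit(choices.get(index));    // return commits the selection
            }
        }
    }

    private void commit(String text) {
        committed = text;
        state = State.COMMITTED;
    }

    State state() { return state; }

    /** The currently highlighted choice, or null when not selecting. */
    String current() { return state == State.SELECTING ? choices.get(index) : null; }

    /** The committed text, or null before anything is committed. */
    String committed() { return committed; }

    public static void main(String[] args) {
        ChoiceSelector s = new ChoiceSelector(w -> w.equals("th")
                ? Arrays.asList("the", "this", "that")
                : Collections.<String>emptyList());
        for (char c : "th   \n".toCharArray()) {
            s.type(c);
        }
        System.out.println(s.committed());
    }
}
```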
-
Jan Newmarch (http://jan.newmarch.name)
jan@newmarch.name
Last modified: Mon Aug 14 11:10:05 EST 2006
Copyright ©Jan Newmarch