org.thdl.tib.text.tshegbar
Interface UnicodeReadyThunk

All Known Implementing Classes:
TshegBar, UnicodeGraphemeCluster

public interface UnicodeReadyThunk

A UnicodeReadyThunk represents a string of codepoints. A string of Unicode codepoints can be turned into a list of UnicodeReadyThunks (DLC reference it), but the exact sequence of Unicode codepoints cannot necessarily be recovered from a UnicodeReadyThunk. For codepoints that are neither Tibetan Unicode nor one of a handful of other known codepoints, only the most primitive operations are available. In that case you can generally recover the exact string of Unicode codepoints, but don't bank on it.

Author:
David Chandler

Method Summary
 String getUnicodeRepresentation()
          Returns a sequence of Unicode codepoints that is equivalent to this thunk if possible.
 boolean hasUnicodeRepresentation()
          Returns true iff there exists a sequence of Unicode codepoints that correctly represents this thunk.
 boolean isTibetan()
          Returns true iff this thunk is entirely Tibetan (regardless of whether or not all codepoints come from the Tibetan range of Unicode 3, i.e. U+0F00-U+0FFF, and regardless of whether or not this thunk is syntactically legal Tibetan).
 

Method Detail

isTibetan

public boolean isTibetan()
Returns true iff this thunk is entirely Tibetan (regardless of whether or not all codepoints come from the Tibetan range of Unicode 3, i.e. U+0F00-U+0FFF, and regardless of whether or not this thunk is syntactically legal Tibetan).
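
The description above says that isTibetan() is not simply a test of whether every codepoint falls in U+0F00-U+0FFF. For contrast, here is a minimal sketch of that narrower range test, which a caller might be tempted to use instead. The class and method names (TibetanRangeCheck, allInTibetanBlock) are hypothetical and are not part of this package.

```java
public class TibetanRangeCheck {
    /**
     * Hypothetical helper: returns true iff s is non-empty and every
     * codepoint lies in the Tibetan block U+0F00-U+0FFF. This is a
     * narrower test than isTibetan(), which per its documentation does
     * not depend on the codepoints coming from that range.
     */
    static boolean allInTibetanBlock(String s) {
        if (s.isEmpty()) {
            return false;
        }
        return s.codePoints().allMatch(cp -> cp >= 0x0F00 && cp <= 0x0FFF);
    }

    public static void main(String[] args) {
        // TIBETAN LETTER KA followed by a tsheg: both are in the block.
        System.out.println(allInTibetanBlock("\u0F40\u0F0B"));
        // Plain ASCII text is outside the block.
        System.out.println(allInTibetanBlock("ka"));
    }
}
```

A thunk for which isTibetan() returns true may therefore still fail a range test like this one, so the two checks should not be treated as interchangeable.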


getUnicodeRepresentation

public String getUnicodeRepresentation()
                                throws UnsupportedOperationException
Returns a sequence of Unicode codepoints that is equivalent to this thunk if possible. This is possible only if hasUnicodeRepresentation() returns true. Unicode often has more than one way to refer to the same language element, so the returned sequence is just one of the possible encodings. When more than one Unicode sequence exists and the thunk is Tibetan, this method returns a sequence that the Unicode 3.2 standard does not discourage.

Returns:
a String of Unicode codepoints
Throws:
UnsupportedOperationException - if hasUnicodeRepresentation() is false
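
Because this method throws when hasUnicodeRepresentation() is false, callers will usually want to check first. The sketch below shows one way to do that. To stay self-contained it declares a local stand-in interface that mirrors the three documented methods; SimpleThunk and unicodeOf are hypothetical names invented for illustration, not classes or methods from this package.

```java
import java.util.Optional;

public class ThunkGuardExample {
    /** Local stand-in mirroring the documented methods; not the real interface. */
    interface UnicodeReadyThunk {
        boolean isTibetan();
        boolean hasUnicodeRepresentation();
        String getUnicodeRepresentation() throws UnsupportedOperationException;
    }

    /** Hypothetical toy implementation: a null string means "no representation". */
    static final class SimpleThunk implements UnicodeReadyThunk {
        private final String unicode;
        private final boolean tibetan;

        SimpleThunk(String unicode, boolean tibetan) {
            this.unicode = unicode;
            this.tibetan = tibetan;
        }

        public boolean isTibetan() {
            return tibetan;
        }

        public boolean hasUnicodeRepresentation() {
            return unicode != null;
        }

        public String getUnicodeRepresentation() {
            if (unicode == null) {
                throw new UnsupportedOperationException("no Unicode representation");
            }
            return unicode;
        }
    }

    /** Checks hasUnicodeRepresentation() first, so the exception is never triggered. */
    static Optional<String> unicodeOf(UnicodeReadyThunk thunk) {
        return thunk.hasUnicodeRepresentation()
                ? Optional.of(thunk.getUnicodeRepresentation())
                : Optional.empty();
    }

    public static void main(String[] args) {
        UnicodeReadyThunk ka = new SimpleThunk("\u0F40", true);
        UnicodeReadyThunk unrepresentable = new SimpleThunk(null, true);
        System.out.println(unicodeOf(ka).isPresent());
        System.out.println(unicodeOf(unrepresentable).isPresent());
    }
}
```

Returning an Optional is only one design choice; a caller could equally catch UnsupportedOperationException, but the explicit check keeps the unrepresentable case out of the exception path.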

hasUnicodeRepresentation

public boolean hasUnicodeRepresentation()
Returns true iff there exists a sequence of Unicode codepoints that correctly represents this thunk. This will not be the case if the thunk contains Tibetan characters that the Unicode standard does not encode. See the Extended Wylie Transliteration System (EWTS) document (DLC ref, DLC mention Dza,fa,va doc bug) for more info, and see section 9.13 of the Unicode 3 standard. For example, the presence of head marks or multiple vowels in the thunk would cause this to return false.
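
When converting a whole list of thunks to text, this method lets the caller decide what to do with the ones that cannot be represented. The following sketch substitutes U+FFFD (the Unicode replacement character) for such thunks. The substitution policy, the local stand-in interface, and the names ThunkRenderExample, thunk, and render are all assumptions made for illustration; nothing here is taken from this package.

```java
import java.util.Arrays;
import java.util.List;

public class ThunkRenderExample {
    /** Local stand-in mirroring the documented methods; not the real interface. */
    interface UnicodeReadyThunk {
        boolean isTibetan();
        boolean hasUnicodeRepresentation();
        String getUnicodeRepresentation() throws UnsupportedOperationException;
    }

    /** Hypothetical factory: a null argument yields a thunk with no representation. */
    static UnicodeReadyThunk thunk(final String unicode) {
        return new UnicodeReadyThunk() {
            public boolean isTibetan() {
                return true; // toy value; the real answer depends on the thunk
            }

            public boolean hasUnicodeRepresentation() {
                return unicode != null;
            }

            public String getUnicodeRepresentation() {
                if (unicode == null) {
                    throw new UnsupportedOperationException("no Unicode representation");
                }
                return unicode;
            }
        };
    }

    /** Concatenates the thunks, writing U+FFFD where no representation exists. */
    static String render(List<UnicodeReadyThunk> thunks) {
        StringBuilder sb = new StringBuilder();
        for (UnicodeReadyThunk t : thunks) {
            if (t.hasUnicodeRepresentation()) {
                sb.append(t.getUnicodeRepresentation());
            } else {
                sb.append('\uFFFD');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<UnicodeReadyThunk> thunks =
                Arrays.asList(thunk("\u0F40"), thunk(null), thunk("\u0F0B"));
        // Three thunks in, three UTF-16 code units out, the middle one a placeholder.
        System.out.println(render(thunks).length());
    }
}
```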



These API docs were created 02/02/2003 08:19 PM.
Copyright © 2001-2002 Tibetan and Himalayan Digital Library. All Rights Reserved.