org.thdl.tib.text.tshegbar
Class TshegBar
java.lang.Object
|
+--org.thdl.tib.text.tshegbar.TshegBar
- All Implemented Interfaces:
- UnicodeReadyThunk
- Direct Known Subclasses:
- LegalTshegBar
- public abstract class TshegBar
- extends Object
- implements UnicodeReadyThunk
A TshegBar (pronounced tsek bar) is roughly a Tibetan
syllable. In truth, it is the stuff between two tseks.
First, some terminology.
- When we talk about a grapheme cluster (or
grcl), we mean what the Unicode standard calls a "grapheme
cluster". Most glyphs (i.e., pictures) found in a font are
grapheme clusters, but the picture corresponding to the Unicode
codepoint \u0F74 is not a grapheme cluster. In
addition, in English, many fonts have a single glyph (a
"ligature") for the combination of two grapheme clusters,
e.g. "fi". A single grapheme cluster may have one or more
representations by sequences of Unicode codepoints, or it may not
be representable because it is only part of one Unicode codepoint
or pictures a nonstandard character.
- We will attempt to
avoid using the word "character", as it sometimes refers to a
codepoint and sometimes refers to a glyph in a font and yet other
times refers to a grapheme cluster.
- We'll try to avoid
using the word "stack" because it sometimes refers to a sequence
of stacked Tibetan consonants and sometimes refers to an entire
grapheme cluster.
- A Tibetan stack is one or
more consonants stacked vertically, plus an optional vocalic
modification such as an anusvara (DLC what do we call a bindu?) or
visarga, plus zero or more signs like \u0F35, plus an optional
a-chung (\u0F71), plus an optional simple vowel.
- By simple vowel, we mean
any of \u0F72, \u0F74, \u0F7A, \u0F7B, \u0F7C, \u0F7D, or \u0F80.
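The simple-vowel list above can be written down as a small predicate. The following is only a minimal sketch: the class SimpleVowelSketch and both of its methods are hypothetical names chosen for illustration and are not part of org.thdl.tib.text.tshegbar. The second method uses the JDK's own character database to show why a codepoint such as \u0F74 is not a grapheme cluster on its own: it is a combining mark that attaches to a base.

```java
/** Hypothetical helper for illustration; not part of the THDL API. */
final class SimpleVowelSketch {

    /** True for the seven codepoints the text lists as simple vowels. */
    static boolean isSimpleVowel(char c) {
        switch (c) {
            case '\u0F72':
            case '\u0F74':
            case '\u0F7A':
            case '\u0F7B':
            case '\u0F7C':
            case '\u0F7D':
            case '\u0F80':
                return true;
            default:
                return false;
        }
    }

    /** True if the JDK classifies c as a non-spacing combining mark. */
    static boolean isNonSpacingMark(char c) {
        return Character.getType(c) == Character.NON_SPACING_MARK;
    }

    public static void main(String[] args) {
        System.out.println(isSimpleVowel('\u0F74'));    // vowel sign U+0F74
        System.out.println(isSimpleVowel('\u0F71'));    // a-chung, not a simple vowel
        System.out.println(isNonSpacingMark('\u0F74')); // combining mark check
    }
}
```

A switch is used rather than a set so that the sketch compiles on older Java versions as well.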
(Note: The string "\u0F68\u0F7E\u0F7C" seems to equal "\u0F00",
though the Unicode standard does not indicate that it is so.
This code treats it that way.)
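One way to treat the two spellings as equal is to fold the three-codepoint sequence into \u0F00 before comparing. This is a sketch under stated assumptions, not the way this package necessarily does it: the class OmEquivalenceSketch and its methods are hypothetical, and only the exact codepoint order quoted in the note is folded.

```java
import java.text.Normalizer;

/** Hypothetical helper for illustration; not part of the THDL API. */
final class OmEquivalenceSketch {
    // The sequence quoted in the note above, in the same order.
    static final String OM_SEQUENCE = "\u0F68\u0F7E\u0F7C";
    // The single codepoint the note says it seems to equal.
    static final String OM_SINGLE = "\u0F00";

    /** Replaces every occurrence of the sequence with the single codepoint. */
    static String foldOm(String s) {
        return s.replace(OM_SEQUENCE, OM_SINGLE);
    }

    /** Compares two strings after folding, so both spellings match. */
    static boolean equalsTreatingOmAsSame(String a, String b) {
        return foldOm(a).equals(foldOm(b));
    }

    public static void main(String[] args) {
        System.out.println(equalsTreatingOmAsSame(OM_SEQUENCE, OM_SINGLE));
        // For comparison: report whether standard NFC normalization alone
        // makes the two spellings identical.
        String nfc = Normalizer.normalize(OM_SEQUENCE, Normalizer.Form.NFC);
        System.out.println(nfc.equals(OM_SINGLE));
    }
}
```

Folding toward the single codepoint keeps the comparison a plain string equality; folding the other way would work equally well as long as both sides use the same direction.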
This class allows for invalid tsheg bars, like those
containing more than one prefix, more than two suffixes, an
invalid postsuffix (secondary suffix), or more than one consonant
stack (excluding the special case of what we call in THDL Extended
Wylie "'i", which is technically a consonant stack but is used in
Tibetan like a suffix).
Subclasses exist for valid, grammatically correct tsheg bars,
and for invalid tsheg bars. Note that correctness is at the tsheg
bar level only; it may be grammatically incorrect to concatenate
two valid tsheg bars. Some subclasses can be represented in
Unicode, but others contain nonstandard glyphs/characters and
cannot be.
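The count-based conditions named above (more than one prefix, more than two suffixes, more than one consonant stack) can be illustrated with a small check. This is a rough sketch only: TshegBarShapeSketch and its fields are hypothetical and do not reflect how TshegBar or LegalTshegBar actually store their parts. It also ignores the invalid-postsuffix rule and the "'i" special case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of a tsheg bar's parts; not the THDL data model. */
final class TshegBarShapeSketch {
    final int prefixCount;
    final int suffixCount;          // suffix plus postsuffix, counted together
    final int consonantStackCount;  // assumed to exclude the "'i" special case

    TshegBarShapeSketch(int prefixCount, int suffixCount, int consonantStackCount) {
        this.prefixCount = prefixCount;
        this.suffixCount = suffixCount;
        this.consonantStackCount = consonantStackCount;
    }

    /** Lists which of the count-based conditions this shape violates. */
    List<String> problems() {
        List<String> problems = new ArrayList<>();
        if (prefixCount > 1) {
            problems.add("more than one prefix");
        }
        if (suffixCount > 2) {
            problems.add("more than two suffixes");
        }
        if (consonantStackCount > 1) {
            problems.add("more than one consonant stack");
        }
        return problems;
    }

    /** True if none of the count-based conditions is violated. */
    boolean passesCountChecks() {
        return problems().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(new TshegBarShapeSketch(1, 2, 1).passesCountChecks());
        System.out.println(new TshegBarShapeSketch(2, 3, 1).problems());
    }
}
```

Passing these checks says nothing about whether two such tsheg bars may be concatenated; as noted above, correctness here is at the tsheg bar level only.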
- Author:
- David Chandler
Method Summary
boolean isTibetan()
Returns true, as we consider a transliteration in the Tibetan
alphabet of a non-Tibetan language, say Chinese, as being
Tibetan.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TshegBar
public TshegBar()
isTibetan
public boolean isTibetan()
- Returns true, as we consider a transliteration in the Tibetan
alphabet of a non-Tibetan language, say Chinese, as being
Tibetan.
- Specified by:
- isTibetan in interface UnicodeReadyThunk
- Returns:
- true
These API docs were created 02/02/2003 08:19 PM.
Copyright © 2001-2002 Tibetan and Himalayan Digital Library. All Rights Reserved.