public class UTF8CharacterHandler extends Object implements EncodedCharacterHandler
Constructor and Description |
---|
UTF8CharacterHandler() |
Modifier and Type | Method and Description |
---|---|
boolean | canEncode(char c) Returns whether or not the character encoding can encode the given character. |
int | convertCharacter(int c) Modify a character returned by readCharacter to make different character classes compare equal. |
protected int | decode(int[] charData, int offset, int length) |
CharacterClass | getCharacterClass(int c, boolean inWord) Test the character class of a character returned by readCharacter. |
String | getEncodingName() Return the name of the encoding supported by this handler. |
int | readCharacter(ByteBuffer buffer) Decode the character at the current buffer position. |
int | readPreviousCharacter(ByteBuffer buffer) Decode the character before the character at the current buffer position. |
public int readCharacter(ByteBuffer buffer) throws BufferUnderflowException, IndexOutOfBoundsException, CharacterCodingException
Description copied from interface: EncodedCharacterHandler
Decode the character at the current buffer position. Afterwards, the buffer's position() will be at the start of the next character.
Specified by: readCharacter in interface EncodedCharacterHandler
Parameters:
buffer - The buffer which contains the encoded character.
Throws:
BufferUnderflowException - if the end of the buffer is reached before a character is completely decoded.
CharacterCodingException - if the bytes at the current buffer position are not a legal encoded character.
IndexOutOfBoundsException
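The contract above (decode one character, leave position() at the start of the next one) can be illustrated with a minimal, self-contained sketch. This is not the JGloss implementation: the class name Utf8ForwardSketch is hypothetical, and the sketch skips checks a production decoder would need, such as rejecting overlong encodings and surrogate code points.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class name is hypothetical and not part of JGloss.
final class Utf8ForwardSketch {

    // Decodes the code point at buffer.position() and advances the position past it.
    static int readCharacter(ByteBuffer buffer) throws CharacterCodingException {
        // Relative get() throws BufferUnderflowException when no bytes remain.
        int first = buffer.get() & 0xFF;
        if (first < 0x80) {
            return first; // single-byte (ASCII) character
        }
        int continuationBytes;
        int codePoint;
        if ((first & 0xE0) == 0xC0) {        // 110xxxxx: two-byte sequence
            continuationBytes = 1;
            codePoint = first & 0x1F;
        } else if ((first & 0xF0) == 0xE0) { // 1110xxxx: three-byte sequence
            continuationBytes = 2;
            codePoint = first & 0x0F;
        } else if ((first & 0xF8) == 0xF0) { // 11110xxx: four-byte sequence
            continuationBytes = 3;
            codePoint = first & 0x07;
        } else {
            // Stray continuation byte or invalid lead byte.
            throw new MalformedInputException(1);
        }
        for (int i = 0; i < continuationBytes; i++) {
            int next = buffer.get() & 0xFF;
            if ((next & 0xC0) != 0x80) {     // continuation bytes look like 10xxxxxx
                throw new MalformedInputException(i + 1);
            }
            codePoint = (codePoint << 6) | (next & 0x3F);
        }
        return codePoint;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "a", U+00E9 and U+65E5 occupy one, two and three bytes in UTF-8.
        ByteBuffer buffer = ByteBuffer.wrap("a\u00E9\u65E5".getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
            int c = readCharacter(buffer);
            System.out.printf("U+%04X, next character starts at %d%n", c, buffer.position());
        }
    }
}
```

Because each call leaves the position at the next character, a caller can loop on hasRemaining() without tracking byte lengths itself.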
public int readPreviousCharacter(ByteBuffer buffer) throws BufferUnderflowException, CharacterCodingException
Description copied from interface: EncodedCharacterHandler
Decode the character before the character at the current buffer position. Afterwards, the buffer's position() will be at the start of the character returned. Calling this method multiple times will effectively read the encoded string backwards.
Specified by: readPreviousCharacter in interface EncodedCharacterHandler
Parameters:
buffer - The buffer which contains the encoded character.
Throws:
BufferUnderflowException - if the end of the buffer is reached before a character is completely decoded.
CharacterCodingException - if the bytes at the current buffer position are not a legal encoded character.
protected int decode(int[] charData, int offset, int length) throws CharacterCodingException
Throws:
CharacterCodingException
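The backward traversal described for readPreviousCharacter can be sketched as follows. This is an illustration, not the JGloss code: the class name Utf8BackwardSketch is hypothetical, and throwing BufferUnderflowException when the start of the buffer is reached is an assumption about how the documented exception applies to reading backwards. The sketch does not use or model the protected decode method.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class name is hypothetical and not part of JGloss.
final class Utf8BackwardSketch {

    // Decodes the code point that ends just before buffer.position() and
    // moves the position to the first byte of that code point.
    static int readPreviousCharacter(ByteBuffer buffer) throws CharacterCodingException {
        int end = buffer.position();
        if (end == 0) {
            throw new BufferUnderflowException(); // assumption: nothing precedes the position
        }
        int start = end - 1;
        // Step back over continuation bytes (10xxxxxx); a character has at most three.
        while ((buffer.get(start) & 0xC0) == 0x80) {
            if (end - start > 3) {
                throw new MalformedInputException(end - start);
            }
            if (start == 0) {
                throw new BufferUnderflowException(); // ran out of bytes before a lead byte
            }
            start--;
        }
        int lead = buffer.get(start) & 0xFF;
        int length = end - start;
        int expected;
        int codePoint;
        if (lead < 0x80) {
            expected = 1;
            codePoint = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            expected = 2;
            codePoint = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            expected = 3;
            codePoint = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            expected = 4;
            codePoint = lead & 0x07;
        } else {
            throw new MalformedInputException(length);
        }
        if (expected != length) {
            // The lead byte announces a different length than the bytes found.
            throw new MalformedInputException(length);
        }
        for (int i = start + 1; i < end; i++) {
            codePoint = (codePoint << 6) | (buffer.get(i) & 0x3F);
        }
        buffer.position(start);
        return codePoint;
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer buffer = ByteBuffer.wrap("a\u00E9\u65E5".getBytes(StandardCharsets.UTF_8));
        buffer.position(buffer.limit()); // start behind the last character
        while (buffer.position() > 0) {
            System.out.printf("U+%04X%n", readPreviousCharacter(buffer));
        }
    }
}
```

Scanning backwards works because UTF-8 continuation bytes are distinguishable from lead bytes, so the start of a character can be found without decoding from the beginning of the buffer.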
public int convertCharacter(int c)
Description copied from interface: EncodedCharacterHandler
Modify a character returned by readCharacter to make different character classes compare equal. This is used for searching and indexing to treat certain character classes as identical with respect to comparison. Examples are uppercase and lowercase western characters, or katakana and hiragana. Which characters are converted depends on the class implementing this interface and may be configured by modifying the object's state.
Specified by: convertCharacter in interface EncodedCharacterHandler
Parameters:
c - The character to convert, as returned by readCharacter
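One possible folding of the kind described for convertCharacter is sketched below. The direction of the mapping (uppercase to lowercase, katakana to hiragana) and the class name CharacterFoldingSketch are assumptions made for illustration; the documentation does not state which direction JGloss uses or which characters it covers.

```java
// Illustrative sketch only; the class name and the folding direction are assumptions.
final class CharacterFoldingSketch {

    // Offset between the katakana letters U+30A1..U+30F6 and the
    // corresponding hiragana letters U+3041..U+3096.
    private static final int KATAKANA_TO_HIRAGANA = 0x60;

    // Maps characters that should compare equal onto one representative.
    static int convertCharacter(int c) {
        if (c >= 0x30A1 && c <= 0x30F6) {
            return c - KATAKANA_TO_HIRAGANA; // katakana letter to hiragana letter
        }
        return Character.toLowerCase(c);      // case folding; other characters are returned unchanged
    }

    public static void main(String[] args) {
        // Katakana KA (U+30AB) and hiragana KA (U+304B) fold to the same value.
        System.out.println(convertCharacter(0x30AB) == convertCharacter(0x304B));
        System.out.println(convertCharacter('A') == convertCharacter('a'));
    }
}
```

With such a mapping, a search or index only needs to compare converted values to treat the folded classes as identical.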
public CharacterClass getCharacterClass(int c, boolean inWord)
Description copied from interface: EncodedCharacterHandler
Test the character class of a character returned by readCharacter. These are specialized character classes which do not directly map to any Unicode character classes. They are used during index creation to decide if the current character is part of an indexable word.
Specified by: getCharacterClass in interface EncodedCharacterHandler
Parameters:
c - The character to test.
inWord - true, if the character before the current character was in character class ROMAN_WORD. This may influence the character class of the tested character.
public boolean canEncode(char c)
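Description copied from interface: EncodedCharacterHandler
Returns whether or not the character encoding can encode the given character.
Specified by: canEncode in interface EncodedCharacterHandler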
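For UTF-8, a check like canEncode can be expressed with the standard java.nio.charset API. The sketch below is an assumption about one way to answer the question, not a description of how UTF8CharacterHandler does it; the class name CanEncodeSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class name is hypothetical and not part of JGloss.
final class CanEncodeSketch {

    // Asks a fresh UTF-8 encoder whether the single char can be encoded on its own.
    // A new encoder per call keeps the sketch free of shared mutable state.
    static boolean canEncode(char c) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        System.out.println(canEncode('a'));      // plain ASCII letter
        System.out.println(canEncode('\u65E5')); // a kanji
        System.out.println(canEncode('\uD800')); // an unpaired high surrogate
    }
}
```

Since the parameter is a single char, a surrogate cannot be encoded by itself; CharsetEncoder.canEncode(char) reports false for such values.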
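public String getEncodingName()
Description copied from interface: EncodedCharacterHandler
Return the name of the encoding supported by this handler (see java.nio.charset.Charset).
Specified by: getEncodingName in interface EncodedCharacterHandler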
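A name returned by getEncodingName can be handed to java.nio.charset.Charset when the same bytes need to be decoded with the standard library. The sketch below assumes the handler returns "UTF-8"; the documentation does not state the exact string, and the class EncodingNameSketch with its static getEncodingName is a hypothetical stand-in for the handler.

```java
import java.nio.charset.Charset;

// Illustrative sketch only; the class is a hypothetical stand-in for the handler.
final class EncodingNameSketch {

    // Assumption: the UTF-8 handler reports this name.
    static String getEncodingName() {
        return "UTF-8";
    }

    // Decodes a whole byte array using the charset looked up by the reported name.
    static String decodeAll(byte[] bytes) {
        Charset charset = Charset.forName(getEncodingName());
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 bytes of U+00E9
        System.out.println(decodeAll(bytes).codePointAt(0) == 0xE9);
    }
}
```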
Copyright © 2001-2013 the JGloss developers. All Rights Reserved.