public class KanjiParser extends AbstractParser
The parser scans the text for the first character which is either a kanji or a katakana,
then searches for the first following character which is neither; the characters in
between form a word. If the word consists of katakana characters, it is looked up
immediately and the result is added to the annotation list. If it is a kanji word and
the following character is the reading annotation start delimiter, all characters up to
the reading annotation end delimiter are treated as the reading for the word.
Hiragana characters directly following the word or the reading annotation are used as
possible verb/adjective inflections. The createAnnotations method is then called
with this word/reading/hiragana tuple.
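The scanning steps described above can be sketched in a few lines of standalone Java. This is not the JGloss implementation: the class `KanjiScanSketch`, the `Candidate` holder, and the `scan` method are hypothetical names, the script tests rely only on `Character.UnicodeBlock`, and the sketch assumes that a word never mixes kanji and katakana. No dictionary lookup is performed; the sketch only collects the word/reading/hiragana tuples.

```java
import java.lang.Character.UnicodeBlock;
import java.util.ArrayList;
import java.util.List;

public class KanjiScanSketch {
    /** Hypothetical holder for one word/reading/hiragana tuple. */
    public static final class Candidate {
        public final String word;
        public final String reading;     // empty if no reading annotation followed
        public final String inflection;  // empty if no hiragana followed

        Candidate(String word, String reading, String inflection) {
            this.word = word;
            this.reading = reading;
            this.inflection = inflection;
        }
    }

    static boolean isKanji(char c) {
        return UnicodeBlock.of(c) == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    static boolean isKatakana(char c) {
        return UnicodeBlock.of(c) == UnicodeBlock.KATAKANA;
    }

    static boolean isHiragana(char c) {
        return UnicodeBlock.of(c) == UnicodeBlock.HIRAGANA;
    }

    /** Collects tuples from text[start, start + length). */
    public static List<Candidate> scan(char[] text, int start, int length,
                                       char readingStart, char readingEnd) {
        List<Candidate> out = new ArrayList<>();
        int end = start + length;
        int i = start;
        while (i < end) {
            char c = text[i];
            if (!isKanji(c) && !isKatakana(c)) {
                i++; // not the start of a word
                continue;
            }
            boolean katakana = isKatakana(c);
            int wordStart = i;
            // Assumption: the word ends at the first character of a different script.
            while (i < end && (katakana ? isKatakana(text[i]) : isKanji(text[i]))) {
                i++;
            }
            String word = new String(text, wordStart, i - wordStart);

            String reading = "";
            if (!katakana && i < end && text[i] == readingStart) {
                int close = i + 1;
                while (close < end && text[close] != readingEnd) {
                    close++;
                }
                if (close < end) { // only accept a properly closed reading
                    reading = new String(text, i + 1, close - i - 1);
                    i = close + 1;
                }
            }

            int hiraganaStart = i;
            if (!katakana) {
                while (i < end && isHiragana(text[i])) {
                    i++;
                }
            }
            String inflection = new String(text, hiraganaStart, i - hiraganaStart);
            out.add(new Candidate(word, reading, inflection));
        }
        return out;
    }
}
```

With the delimiters 《 and 》, the input 食《た》べるテスト yields two tuples in this sketch: (食, た, べる) for the kanji word and (テスト, empty, empty) for the katakana word.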
Fields inherited from class AbstractParser: annotatedWords, exclusions, firstOccurrenceOnly, ignoreNewlines, parsePosition
Constructor and Description |
---|
KanjiParser(Dictionary[] dictionaries, Set<String> exclusions) - Creates a new parser which uses the given dictionaries and no reading annotation delimiters, caches dictionary lookups and does not ignore newlines. |
KanjiParser(Dictionary[] dictionaries, Set<String> exclusions, boolean firstOccurrenceOnly) - Creates a new parser which uses the given dictionaries and no reading annotation delimiters, caches dictionary lookups and does not ignore newlines. |
KanjiParser(Dictionary[] dictionaries, Set<String> exclusions, boolean cacheLookups, boolean ignoreNewlines, boolean firstOccurrenceOnly) - Creates a new parser which uses the given dictionaries. |
Modifier and Type | Method and Description |
---|---|
int | getCacheHits() - Returns the number of dictionary lookups where the result was found in the lookup cache. |
Locale | getLanguage() - Returns the language which the parser can parse. |
int | getLookups() - Returns the number of dictionary lookups. |
String | getName() - Returns the name of the parser in a user-presentable form. |
List<TextAnnotation> | parse(char[] text, int start, int length) - Parses the text, returning a list with annotations for words in the text. |
void | reset() - Clears the lookup cache. |
Methods inherited from class AbstractParser: getParsePosition, ignoreWord, isAnnotateFirstOccurrenceOnly, isIgnoreNewlines, setAnnotateFirstOccurrenceOnly, setIgnoreNewlines
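The relationship between getLookups(), getCacheHits() and reset() can be illustrated with a small standalone sketch. It is not taken from JGloss: `LookupCacheSketch` and its `Function`-based dictionary stand-in are hypothetical, and the sketch assumes that the lookup counter includes cache hits and that reset() clears the cache without zeroing the counters. Neither assumption is confirmed by the documentation above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class LookupCacheSketch {
    // Hypothetical stand-in for a dictionary search: maps a word to its entries.
    private final Function<String, List<String>> dictionary;
    private final Map<String, List<String>> cache = new HashMap<>();
    private int lookups;
    private int cacheHits;

    public LookupCacheSketch(Function<String, List<String>> dictionary) {
        this.dictionary = dictionary;
    }

    /** Looks up a word, answering from the cache when possible. */
    public List<String> lookup(String word) {
        lookups++; // assumption: every request counts as a lookup
        if (cache.containsKey(word)) {
            cacheHits++;
            return cache.get(word);
        }
        List<String> result = dictionary.apply(word);
        cache.put(word, result);
        return result;
    }

    /** Clears the lookup cache; the counters are left untouched in this sketch. */
    public void reset() {
        cache.clear();
    }

    public int getLookups() {
        return lookups;
    }

    public int getCacheHits() {
        return cacheHits;
    }
}
```

Under these assumptions, the ratio of getCacheHits() to getLookups() gives a rough measure of how much repeated vocabulary a text contains.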
public KanjiParser(Dictionary[] dictionaries, Set<String> exclusions)
Parameters:
dictionaries - The dictionaries used for word lookups.
exclusions - Set of words which should not be annotated. May be null.

public KanjiParser(Dictionary[] dictionaries, Set<String> exclusions, boolean firstOccurrenceOnly)
Parameters:
dictionaries - The dictionaries used for word lookups.
exclusions - Set of words which should not be annotated. May be null.

public KanjiParser(Dictionary[] dictionaries, Set<String> exclusions, boolean cacheLookups, boolean ignoreNewlines, boolean firstOccurrenceOnly)
Parameters:
dictionaries - The dictionaries used for word lookups.
exclusions - Set of words which should not be annotated. May be null.
cacheLookups - true if dictionary lookups should be cached.
ignoreNewlines - If this is true, 0x0a and 0x0d characters in the parsed text will be ignored and the characters immediately before and after the newline will be treated as if forming a single word.

public List<TextAnnotation> parse(char[] text, int start, int length) throws SearchException
Parameters:
text - The text to parse.
Throws:
SearchException - If an error occurs during a dictionary lookup.

public void reset()
Specified by:
reset in interface Parser
Overrides:
reset in class AbstractParser
public int getLookups()
public int getCacheHits()
public String getName()
Parser
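The effect of the ignoreNewlines option of the five-argument constructor (0x0a and 0x0d characters are ignored, so a word broken across lines is treated as a single word) can be shown with a short standalone sketch. `NewlineSkipSketch` and `withoutNewlines` are hypothetical names. The sketch copies the text for clarity; how the actual parser skips these characters is not stated in this documentation.

```java
public class NewlineSkipSketch {
    /** Returns text[start, start + length) with all 0x0a and 0x0d characters dropped. */
    public static String withoutNewlines(char[] text, int start, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            char c = text[i];
            if (c != 0x0a && c != 0x0d) {
                sb.append(c); // keep everything except line break characters
            }
        }
        return sb.toString();
    }
}
```

For a word such as 漢字 that is split by a line break after the first character, the sketch returns the two kanji as one contiguous word, which is what a dictionary lookup needs.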
Copyright © 2001-2013 the JGloss developers. All Rights Reserved.