public class StringTools extends Object
Modifier and Type | Method and Description |
---|---|
static StringBuilder | addToRegex(CharSequence text, StringBuilder regex): Appends a character sequence to a regular expression pattern, escaping any special characters. |
static StringBuilder | addToRegex(char c, StringBuilder regex): Appends a character to a regular expression pattern, escaping any special characters. |
static boolean | containsKanji(String word): Test if a string contains any kanji characters. |
static boolean | isCJKSymbolsAndPunctuation(char c) |
static boolean | isCJKUnifiedIdeographs(char c) |
static boolean | isHiragana(char c) |
static boolean | isKana(char c) |
static boolean | isKanji(char c): Test if c is either in the character class of CJK unified ideographs or is the kanji repeat mark. |
static boolean | isKatakana(char c) |
static String[][] | splitWordReading(String word, String reading): Split a kanji/kana compound word into kanji and kana parts. |
static String[][] | splitWordReading(String inflectedWord, String baseWord, String baseReading): Split a kanji/kana compound word into kanji and kana parts. |
static String | toHiragana(String s) |
static String | toHiragana(String s, boolean ignoreSpecialChars): Returns a new string with all katakana characters in the original string converted to hiragana. |
static String | toKatakana(String s): Returns a new string with all hiragana characters in the original string converted to katakana. |
static String | toKatakana(String s, boolean ignoreSpecialChars): Returns a new string with all hiragana characters in the original string converted to katakana. |
static Iterable<String> | tokenize(String string, String delimiter) |
static Character.UnicodeBlock | unicodeBlockOf(char c): Returns the Unicode block of a character. |
static String | unicodeEscape(char c): Returns the Unicode escape string for a character. |
static String | unicodeUnescape(String str): Returns a new string with all Unicode escape sequences replaced with the character represented by the sequence. |
public static Character.UnicodeBlock unicodeBlockOf(char c)
Faster than Character.UnicodeBlock.of for Japanese characters, but will work slower
for other scripts.
public static boolean isKatakana(char c)
public static boolean isHiragana(char c)
public static boolean isKana(char c)
public static boolean isCJKUnifiedIdeographs(char c)
public static boolean isCJKSymbolsAndPunctuation(char c)
public static boolean isKanji(char c)
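The character tests above can be illustrated with the standard `Character.UnicodeBlock` API. This is a minimal sketch, not the JGloss implementation: the class name `CharClassSketch` is hypothetical, it assumes each test corresponds to one Unicode block, and it assumes the "kanji repeat mark" is the ideographic iteration mark U+3005. It does not reproduce whatever optimization `unicodeBlockOf` uses.

```java
// Illustrative sketch only; NOT the JGloss StringTools source.
final class CharClassSketch {
    // Assumption: the "kanji repeat mark" is the ideographic iteration mark, U+3005.
    static final char KANJI_REPEAT_MARK = (char) 0x3005;

    static boolean isHiragana(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HIRAGANA;
    }

    static boolean isKatakana(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.KATAKANA;
    }

    static boolean isKana(char c) {
        return isHiragana(c) || isKatakana(c);
    }

    static boolean isCJKUnifiedIdeographs(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    static boolean isCJKSymbolsAndPunctuation(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION;
    }

    // Kanji = CJK unified ideograph, or the repeat mark (which lives in another block).
    static boolean isKanji(char c) {
        return isCJKUnifiedIdeographs(c) || c == KANJI_REPEAT_MARK;
    }

    public static void main(String[] args) {
        // One kanji, the repeat mark, one hiragana, two katakana, one Latin letter.
        char[] sample = {(char) 0x65e5, (char) 0x3005, (char) 0x306e, (char) 0x30ab, (char) 0x30ca, 'a'};
        for (char c : sample) {
            System.out.println(c + " kanji=" + isKanji(c) + " kana=" + isKana(c));
        }
    }
}
```

The repeat mark needs its own check because it belongs to the CJK Symbols and Punctuation block rather than to CJK Unified Ideographs.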
public static String toHiragana(String s, boolean ignoreSpecialChars)
public static String toKatakana(String s)
public static String toKatakana(String s, boolean ignoreSpecialChars)
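One common way to convert between the two kana scripts relies on the Unicode layout: katakana U+30A1 to U+30F6 sit exactly 0x60 above hiragana U+3041 to U+3096. The sketch below shows that technique under stated assumptions. The class name `KanaConversionSketch` is hypothetical, this is not the JGloss source, and the `ignoreSpecialChars` flag is not modelled because its exact semantics are not documented here.

```java
// Illustrative sketch only; NOT the JGloss StringTools source.
final class KanaConversionSketch {
    // Distance between a katakana code point and its hiragana counterpart.
    private static final int KANA_OFFSET = 0x60;

    // Convert katakana in the range U+30A1..U+30F6 to hiragana; leave everything else alone.
    static String toHiragana(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x30a1 && c <= 0x30f6) {
                c = (char) (c - KANA_OFFSET);
            }
            out.append(c);
        }
        return out.toString();
    }

    // Convert hiragana in the range U+3041..U+3096 to katakana; leave everything else alone.
    static String toKatakana(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x3041 && c <= 0x3096) {
                c = (char) (c + KANA_OFFSET);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Katakana "ka ta ka na" written as code points to stay encoding-independent.
        String katakana = new String(new char[] {0x30ab, 0x30bf, 0x30ab, 0x30ca});
        System.out.println(toHiragana(katakana));
        System.out.println(toKatakana(toHiragana(katakana)));
    }
}
```

Characters outside the two ranges, such as kanji, Latin letters and the long vowel mark U+30FC, pass through unchanged in this sketch.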
public static boolean containsKanji(String word)
Tests each character with the isKanji method.
public static String[][] splitWordReading(String word, String reading)
Calls splitWordReading(word, word, reading).
public static String[][] splitWordReading(String inflectedWord, String baseWord, String baseReading)
Parameters:
inflectedWord - Inflected form of the kanji/kana word. Everything after the last kanji
character is treated as inflected form and added to the output array as the last element.
baseWord - Dictionary form of the kanji/kana word.
baseReading - Reading (in hiragana) of the word in base form.
Throws:
StringIndexOutOfBoundsException - if the word/base/reading tuple is not parseable.
public static String unicodeEscape(char c)
See Also: unicodeUnescape(String)
public static String unicodeUnescape(String str)
See Also: unicodeEscape(char)
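The escape and unescape pair can be sketched as a round trip. The exact escape format used by JGloss is not stated on this page, so the example assumes the Java-style form of a backslash, the letter `u`, and four lowercase hex digits. The class name `UnicodeEscapeSketch` and the helper `isHex` are hypothetical, and this is not the JGloss source.

```java
// Illustrative sketch only; NOT the JGloss StringTools source.
final class UnicodeEscapeSketch {
    // Assumed format: backslash, 'u', then four lowercase hex digits.
    static String unicodeEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    // Replace every well-formed escape with its character; copy other text unchanged.
    static String unicodeUnescape(String str) {
        StringBuilder out = new StringBuilder(str.length());
        int i = 0;
        while (i < str.length()) {
            if (str.startsWith("\\u", i) && i + 6 <= str.length() && isHex(str, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(str.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(str.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if every character in [from, to) is a hexadecimal digit.
    private static boolean isHex(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            if (Character.digit(s.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String escaped = unicodeEscape((char) 0x3042);
        System.out.println(escaped);
        System.out.println(unicodeUnescape("a" + escaped + "b"));
    }
}
```

In this sketch an incomplete or malformed sequence is copied through as-is rather than rejected; the real method may behave differently.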
public static StringBuilder addToRegex(char c, StringBuilder regex)
public static StringBuilder addToRegex(CharSequence text, StringBuilder regex)
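A minimal sketch of how the two `addToRegex` overloads might fit together, assuming that "special characters" means the usual `java.util.regex` metacharacters and that the method returns the same `StringBuilder` it was given so calls can be chained. The class name `RegexEscapeSketch` and the `SPECIAL` set are assumptions, not taken from the JGloss source.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; NOT the JGloss StringTools source.
final class RegexEscapeSketch {
    // Assumed set of regex metacharacters that need a leading backslash.
    private static final String SPECIAL = "\\[]{}()^$.|?*+";

    // Append one character, escaped if it is a metacharacter.
    static StringBuilder addToRegex(char c, StringBuilder regex) {
        if (SPECIAL.indexOf(c) >= 0) {
            regex.append('\\');
        }
        return regex.append(c);
    }

    // Append a whole sequence by delegating to the single-character overload.
    static StringBuilder addToRegex(CharSequence text, StringBuilder regex) {
        for (int i = 0; i < text.length(); i++) {
            addToRegex(text.charAt(i), regex);
        }
        return regex;
    }

    public static void main(String[] args) {
        StringBuilder regex = new StringBuilder("^");
        addToRegex("a.b(c)", regex).append('$');
        System.out.println(regex);
        System.out.println(Pattern.matches(regex.toString(), "a.b(c)"));
    }
}
```

Building the pattern incrementally this way lets literal text be mixed with hand-written regex fragments in one `StringBuilder`. When the whole string is literal, the standard `Pattern.quote` is an alternative.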
Copyright © 2001-2013 the JGloss developers. All Rights Reserved.