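The snippet below checks whether a word contains only English letters, digits, Chinese, Japanese, and Korean characters, and it accepts Bopomofo (Zhuyin) symbols too!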
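boolean isValidWord(String str) {
    return str.matches("[" +
            "\\p{Alnum}" +                          // ASCII letters and digits
            "\\p{InCJK_UNIFIED_IDEOGRAPHS}" +       // Chinese characters (shared with kanji and hanja)
            "\\p{InBOPOMOFO}" +                     // Bopomofo (Zhuyin) symbols
            "\\p{InCJK_SYMBOLS_AND_PUNCTUATION}" +  // CJK symbols and punctuation
            "\\p{InHIRAGANA}" + "\\p{InKATAKANA}" + // Japanese kana
            "\\p{InHANGUL_SYLLABLES}" +             // precomposed Korean syllables
            "]+");
}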
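To see how the method behaves, here is a minimal sketch that wraps the same character class in a small test harness. The class name `WordCheckDemo`, the precompiled `VALID_WORD` constant, and the sample strings are my own assumptions for illustration; they are not part of the original snippet. The samples are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the character class itself comes from the post.
public class WordCheckDemo {
    // Same character class as isValidWord above, compiled once and reused.
    private static final Pattern VALID_WORD = Pattern.compile("[" +
            "\\p{Alnum}" +
            "\\p{InCJK_UNIFIED_IDEOGRAPHS}" +
            "\\p{InBOPOMOFO}" +
            "\\p{InCJK_SYMBOLS_AND_PUNCTUATION}" +
            "\\p{InHIRAGANA}" + "\\p{InKATAKANA}" +
            "\\p{InHANGUL_SYLLABLES}" +
            "]+");

    public static boolean isValidWord(String str) {
        return VALID_WORD.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println("ascii:    " + isValidWord("hello123"));                 // true
        System.out.println("chinese:  " + isValidWord("\u4E2D\u6587"));             // true
        System.out.println("hiragana: " + isValidWord("\u3072\u3089\u304C\u306A")); // true
        System.out.println("katakana: " + isValidWord("\u30AB\u30BF\u30AB\u30CA")); // true
        System.out.println("korean:   " + isValidWord("\uD55C\uAD6D\uC5B4"));       // true
        System.out.println("bopomofo: " + isValidWord("\u3105\u3106\u3107"));       // true
        System.out.println("space:    " + isValidWord("hello world"));              // false
        System.out.println("accent:   " + isValidWord("caf\u00E9"));                // false
        System.out.println("empty:    " + isValidWord(""));                         // false
    }
}
```

Two things worth noticing: without the `UNICODE_CHARACTER_CLASS` flag, `\p{Alnum}` covers ASCII only, so an accented letter such as é is rejected, and the trailing `+` means the empty string never matches.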
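For detailed API information, see Java's Character.UnicodeBlock, as well as the powerful Unicode scripts support in the Pattern class.
The FileFormat.Info website lists Unicode ranges and also offers a Unicode Character Search,
so it is a handy reference when you need to look up the Unicode range for a particular language.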
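As a rough illustration of how blocks and scripts differ, the sketch below matches the same strings against a block-based class (`\p{InCJK_UNIFIED_IDEOGRAPHS}`) and a script-based class (`\p{IsHan}`). The class name `ScriptVsBlockDemo` and its helper methods are hypothetical names chosen for this example. U+3400 is used as a sample because it lies in CJK Unified Ideographs Extension A, outside the main block but still part of the Han script.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing Unicode blocks with Unicode scripts.
public class ScriptVsBlockDemo {
    // Block-based: only code points in the CJK Unified Ideographs block.
    private static final Pattern HAN_BLOCK =
            Pattern.compile("\\p{InCJK_UNIFIED_IDEOGRAPHS}+");
    // Script-based: any code point whose Unicode script is Han.
    private static final Pattern HAN_SCRIPT =
            Pattern.compile("\\p{IsHan}+");

    public static boolean inBlock(String s) {
        return HAN_BLOCK.matcher(s).matches();
    }

    public static boolean inScript(String s) {
        return HAN_SCRIPT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Look up the block and script of a single character (U+4E2D).
        System.out.println(Character.UnicodeBlock.of('\u4E2D'));  // CJK_UNIFIED_IDEOGRAPHS
        System.out.println(Character.UnicodeScript.of('\u4E2D')); // HAN

        String common = "\u4E2D\u6587"; // two characters from the main block
        String extA = "\u3400";         // first code point of Extension A

        System.out.println(inBlock(common));  // true
        System.out.println(inScript(common)); // true
        System.out.println(inBlock(extA));    // false
        System.out.println(inScript(extA));   // true
    }
}
```

Which one fits depends on intent: a block is a fixed code point range, while a script groups characters by writing system across several blocks.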
That said, compared with a hard-coded pattern expression such as [\u4E00-\u9FA5]+,
a UnicodeBlock-based one such as [\p{InCJK_UNIFIED_IDEOGRAPHS}]+ is much more readable.
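Readability is not the only difference. The small sketch below puts the two patterns side by side; the class name `RangeVsBlockDemo` and the sample strings are assumptions made for this example. The CJK Unified Ideographs block spans U+4E00 to U+9FFF, so the hand-written range ending at U+9FA5 stops short of it. U+9FA6 is used here as a code point that sits inside the block but past the end of that range.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing a hard-coded range with a named block.
public class RangeVsBlockDemo {
    // Hand-written range: the reader has to know what these code points mean.
    private static final Pattern BY_RANGE =
            Pattern.compile("[\\u4E00-\\u9FA5]+");
    // Named block: the intent is visible in the pattern itself.
    private static final Pattern BY_BLOCK =
            Pattern.compile("[\\p{InCJK_UNIFIED_IDEOGRAPHS}]+");

    public static boolean matchesRange(String s) {
        return BY_RANGE.matcher(s).matches();
    }

    public static boolean matchesBlock(String s) {
        return BY_BLOCK.matcher(s).matches();
    }

    public static void main(String[] args) {
        String common = "\u6F22\u5B57"; // U+6F22 U+5B57, inside both patterns
        String late = "\u9FA6";         // inside the block, beyond U+9FA5

        System.out.println(matchesRange(common)); // true
        System.out.println(matchesBlock(common)); // true
        System.out.println(matchesRange(late));   // false
        System.out.println(matchesBlock(late));   // true
    }
}
```

For everyday text the two patterns agree, so the choice is mostly about making the intent obvious to the next reader.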