Static Public Member Functions
| static | detect ($string, $is_clean=false) |
| static | docNgrams ($string, $n=3) |
| static | makeNgrams ($word, $n=3) |
| static | ngramDistance ($n1, $n2) |
list($lang, $confid) = Pluf_Text_Lang::detect($string);
static Pluf_Text_Lang::detect ($string, $is_clean = false) [static]
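Given a string, returns the detected language together with a confidence value, as in the usage shown above.
Uses the n-gram text categorization algorithm of Cavnar and Trenkle (1994).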
Parameters:
| string | The string whose language is to be detected. |
| bool | Whether the string is already clean (default: false). |
static Pluf_Text_Lang::docNgrams ($string, $n = 3) [static]
Returns the sorted n-grams of a document.
FIXME: We should detect the proportion of Thai/Chinese/Japanese characters and switch to unigrams instead of n-grams if that proportion is greater than 50%.
Parameters:
| string | The clean document. |
| int | Maximum size of the n-grams (default: 3). |
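The following is a minimal sketch of how a sorted n-gram list could be built, not the library's actual implementation. The function name `sketch_doc_ngrams`, the underscore padding, and the tie-breaking rule are all assumptions made for illustration. It counts every n-gram of size 1 up to `$n` for each whitespace-separated word and returns the n-grams ordered from most to least frequent.

```php
<?php
// Hypothetical sketch: not the Pluf_Text_Lang::docNgrams implementation.
function sketch_doc_ngrams($string, $n = 3)
{
    $counts = array();
    // Split the (already clean) document into words on whitespace.
    $words = preg_split('/\s+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        for ($size = 1; $size <= $n; $size++) {
            // Assumption: pad with one leading and ($size - 1) trailing underscores.
            $padded = '_' . $word . str_repeat('_', $size - 1);
            // Split into UTF-8 characters without relying on mbstring.
            $chars = preg_split('//u', $padded, -1, PREG_SPLIT_NO_EMPTY);
            for ($i = 0; $i + $size <= count($chars); $i++) {
                $gram = implode('', array_slice($chars, $i, $size));
                $counts[$gram] = isset($counts[$gram]) ? $counts[$gram] + 1 : 1;
            }
        }
    }
    // Numeric n-grams become integer keys, so cast them back to strings.
    $grams = array_map('strval', array_keys($counts));
    // Most frequent first; ties are broken alphabetically for a stable order.
    usort($grams, function ($a, $b) use ($counts) {
        if ($counts[$a] !== $counts[$b]) {
            return $counts[$b] - $counts[$a];
        }
        return strcmp($a, $b);
    });
    return $grams;
}
```

The position of an n-gram in the returned array is its rank, which is the only information a rank-based distance needs.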
static Pluf_Text_Lang::makeNgrams ($word, $n = 3) [static]
Returns the n-grams of size n for the given word.
Parameters:
| string | Word. |
| int | Size of the n-grams (default: 3). |
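As an illustration, word-level n-gram extraction can be sketched as below. This is an assumption-laden example rather than the library's code: the name `sketch_make_ngrams` is hypothetical, and the padding with one leading and n-1 trailing underscores follows the convention described by Cavnar and Trenkle, which the actual method may or may not use.

```php
<?php
// Hypothetical sketch: not the Pluf_Text_Lang::makeNgrams implementation.
function sketch_make_ngrams($word, $n = 3)
{
    // Assumption: mark word boundaries with underscores.
    $padded = '_' . $word . str_repeat('_', $n - 1);
    // Split into UTF-8 characters so multibyte letters stay intact.
    $chars = preg_split('//u', $padded, -1, PREG_SPLIT_NO_EMPTY);
    $ngrams = array();
    // Slide a window of $n characters over the padded word.
    for ($i = 0; $i + $n <= count($chars); $i++) {
        $ngrams[] = implode('', array_slice($chars, $i, $n));
    }
    return $ngrams;
}
```

Under these assumptions, the word `text` with `$n = 3` yields `_te`, `tex`, `ext`, `xt_` and `t__`.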
static Pluf_Text_Lang::ngramDistance ($n1, $n2) [static]
Returns the distance between the n-gram lists of two documents.
Parameters:
| array | N-grams of the first document. |
| array | N-grams of the second document. |
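One common way to compare two ranked n-gram lists is the "out-of-place" measure from Cavnar and Trenkle. The sketch below shows that measure. Whether `ngramDistance` computes exactly this is an assumption, and the function name `sketch_ngram_distance` and the penalty for missing n-grams are chosen here for illustration only.

```php
<?php
// Hypothetical sketch of an out-of-place distance;
// not necessarily what Pluf_Text_Lang::ngramDistance computes.
function sketch_ngram_distance(array $n1, array $n2)
{
    $n1 = array_values($n1);
    $n2 = array_values($n2);
    // Map each n-gram of the second list to its rank (0-based position).
    $ranks = array_flip($n2);
    // Assumption: an n-gram absent from $n2 costs the length of $n2.
    $penalty = count($n2);
    $distance = 0;
    foreach ($n1 as $rank => $gram) {
        if (isset($ranks[$gram])) {
            // Add how far the n-gram is "out of place" between the two lists.
            $distance += abs($rank - $ranks[$gram]);
        } else {
            $distance += $penalty;
        }
    }
    return $distance;
}
```

With this measure, identical lists have distance 0. A detector would compare a document's list against one reference list per language and keep the language with the smallest distance.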