Pluf_Text_Lang Class Reference


Static Public Member Functions

static detect ($string, $is_clean=false)
static docNgrams ($string, $n=3)
static makeNgrams ($word, $n=3)
static ngramDistance ($n1, $n2)


Detailed Description

Detect the language of a text.

list($lang, $confid) = Pluf_Text_Lang::detect($string);


Member Function Documentation

static Pluf_Text_Lang::detect ( $string, $is_clean = false ) [static]

Given a string, returns the detected language and a confidence value.

Algorithm by Cavnar and Trenkle (1994).

Parameters:
string The text to analyze.
bool Whether the string is already clean (default: false)
Returns:
array Language, Confidence
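
The selection step of this kind of classifier can be sketched as follows. This is a minimal illustration, not Pluf's implementation: the function name sketch_nearest_language, the toy language profiles, and the penalty for missing n-grams are all assumptions, and the sketch returns a raw distance rather than the confidence value that detect() reports.

```php
<?php
// Hypothetical sketch, not Pluf's code: pick the language whose ranked
// n-gram profile is closest to the document's profile.
function sketch_nearest_language(array $docProfile, array $langProfiles)
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($langProfiles as $lang => $profile) {
        // Map each n-gram of the language profile to its rank.
        $ranks = array_flip($profile);
        $distance = 0;
        foreach (array_values($docProfile) as $rank => $gram) {
            // Assumption: a missing n-gram costs the length of the profile.
            $distance += isset($ranks[$gram])
                ? abs($rank - $ranks[$gram])
                : count($profile);
        }
        if ($distance < $bestDistance) {
            $best = $lang;
            $bestDistance = $distance;
        }
    }
    return array($best, $bestDistance);
}

// Toy profiles, invented for illustration only.
$profiles = array(
    'en' => array('th', 'he', 'in'),
    'fr' => array('es', 'le', 'de'),
);
list($lang, $distance) = sketch_nearest_language(array('he', 'th', 'an'), $profiles);
```

With these toy profiles the document is closest to 'en'. Real profiles would hold several hundred n-grams per language.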

static Pluf_Text_Lang::docNgrams ( $string, $n = 3 ) [static]

Returns the sorted n-grams of a document.

FIXME: We should detect the proportion of Thai/Chinese/Japanese characters and switch to unigrams instead of n-grams if the proportion is greater than 50%.

Parameters:
string The clean document.
int Maximum size of the n-grams (default: 3)
Returns:
array N-Grams
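
A frequency-ranked profile of this kind could be built roughly as shown below. This is a hedged sketch, not the library's code: the name sketch_doc_ngrams, the whitespace word splitting, the underscore padding, and the alphabetical tie-break are assumptions made for the example.

```php
<?php
// Hypothetical sketch, not Pluf's code: n-grams of a document, most frequent first.
function sketch_doc_ngrams($string, $n = 3)
{
    $counts = array();
    // Assumption: the clean document separates words with whitespace.
    $words = preg_split('/\s+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Split the padded word into UTF-8 characters.
        $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
        $len = count($chars);
        // Collect every n-gram of size 1 up to $n.
        for ($size = 1; $size <= $n; $size++) {
            for ($i = 0; $i + $size <= $len; $i++) {
                $gram = implode('', array_slice($chars, $i, $size));
                $counts[$gram] = isset($counts[$gram]) ? $counts[$gram] + 1 : 1;
            }
        }
    }
    // PHP turns numeric string keys into integers, so cast them back.
    $grams = array_map('strval', array_keys($counts));
    // Most frequent first; ties are broken alphabetically to keep the order stable.
    usort($grams, function ($a, $b) use ($counts) {
        if ($counts[$a] !== $counts[$b]) {
            return $counts[$b] - $counts[$a];
        }
        return strcmp($a, $b);
    });
    return $grams;
}
```

Only the rank order is kept in the result; the counts themselves are discarded once the list is sorted.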

static Pluf_Text_Lang::makeNgrams ( $word, $n = 3 ) [static]

Returns the n-grams of rank n of the word.

Parameters:
string Word.
int Size of the n-grams (default: 3)
Returns:
array N-grams
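
As an illustration of what character n-grams of a single word look like, here is a minimal sketch. The function name sketch_make_ngrams and the single underscore of padding on each side are assumptions; Pluf's makeNgrams may pad or slice differently.

```php
<?php
// Hypothetical helper, not Pluf's implementation: character n-grams of one word.
function sketch_make_ngrams($word, $n = 3)
{
    // Assumption: pad the word with "_" on both sides to mark its boundaries.
    $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
    $len = count($chars);
    $ngrams = array();
    // Slide a window of width $n over the padded word.
    for ($i = 0; $i + $n <= $len; $i++) {
        $ngrams[] = implode('', array_slice($chars, $i, $n));
    }
    return $ngrams;
}
```

Under these assumptions, the word "text" with n = 3 yields the four trigrams "_te", "tex", "ext" and "xt_".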

static Pluf_Text_Lang::ngramDistance ( $n1, $n2 ) [static]

Returns the distance between the n-gram lists of two documents.

Parameters:
array N-grams of the first document
array N-grams of the second document
Returns:
integer distance
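
One common way to compare two ranked n-gram lists is the "out-of-place" measure described by Cavnar and Trenkle. The sketch below illustrates that measure under stated assumptions; it is not taken from Pluf, the name sketch_ngram_distance is hypothetical, and the penalty for n-grams missing from the second list is a choice made for this example.

```php
<?php
// Hypothetical sketch of an "out-of-place" distance between two ranked n-gram lists.
function sketch_ngram_distance(array $n1, array $n2)
{
    // Map each n-gram of the second list to its rank (0 = most frequent).
    $ranks = array_flip(array_values($n2));
    // Assumption: an n-gram absent from $n2 costs the maximum penalty, count($n2).
    $max = count($n2);
    $distance = 0;
    foreach (array_values($n1) as $rank => $gram) {
        $distance += isset($ranks[$gram]) ? abs($rank - $ranks[$gram]) : $max;
    }
    return $distance;
}
```

Identical lists give a distance of 0, and the value grows as shared n-grams drift apart in rank or disappear from the second list. Note that this sketch is not symmetric: swapping the arguments can change the result when the lists differ in content.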


The documentation for this class was generated from the following file:

Generated on Wed Feb 3 15:44:52 2010 for Pluf by doxygen