cld2 tokenizer
A bleve tokenizer that passes the input text to the cld2 (Compact Language Detector 2) library, which determines the most likely language of the text. The tokenizer then emits a single token containing that language's ISO-639 code.
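The behaviour described above can be sketched in Go, the language bleve is written in. This is a minimal, self-contained illustration under stated assumptions: Token, TokenStream, and LanguageTokenizer are local stand-ins shaped loosely after bleve's analysis types, not bleve's real API, and fakeDetect is a hypothetical toy detector that replaces the cgo call into cld2 so the sketch runs without the library installed.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a local stand-in loosely modelled on a bleve analysis token
// (byte offsets, term bytes, position). It is NOT bleve's actual type.
type Token struct {
	Start    int
	End      int
	Term     []byte
	Position int
}

// TokenStream is the ordered list of tokens a tokenizer returns.
type TokenStream []*Token

// detectFunc is a hypothetical hook standing in for the call into cld2;
// it returns an ISO-639 language code for the given text.
type detectFunc func(input []byte) string

// LanguageTokenizer emits exactly one token: the detected language code.
type LanguageTokenizer struct {
	detect detectFunc
}

// Tokenize does not split the input into words. The whole input is
// analysed as one unit, and the result is a single token spanning it.
func (t *LanguageTokenizer) Tokenize(input []byte) TokenStream {
	code := t.detect(input)
	return TokenStream{
		&Token{
			Start:    0,
			End:      len(input),
			Term:     []byte(code),
			Position: 1, // first and only position in the stream
		},
	}
}

// fakeDetect is a toy detector used only so this sketch runs without cld2.
// It recognises one French greeting and otherwise guesses English.
func fakeDetect(input []byte) string {
	if strings.Contains(strings.ToLower(string(input)), "bonjour") {
		return "fr"
	}
	return "en"
}

func main() {
	tok := &LanguageTokenizer{detect: fakeDetect}
	for _, text := range []string{"Hello world", "Bonjour tout le monde"} {
		ts := tok.Tokenize([]byte(text))
		fmt.Printf("%q -> %d token(s), term %q\n", text, len(ts), ts[0].Term)
	}
}
```

Whatever the input length, the stream holds one token whose term is the language code, so a field analysed this way would effectively record the document's language rather than its words.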
Building
- Check out the cld2 source into this directory (the one containing this README):

    $ svn checkout http://cld2.googlecode.com/svn/trunk/ cld2-read-only
- Build cld2:

    $ cd cld2-read-only/internal/
    $ ./compile_libs.sh
- Copy the resulting shared libraries somewhere your dynamic linker can find them, for example /usr/local/lib:

    $ cp *.so /usr/local/lib
- Return to this directory and run the unit tests:

    $ cd ../..
    $ go test -v
    === RUN TestCld2Tokenizer
    --- PASS: TestCld2Tokenizer (0.03 seconds)
    PASS
    ok      github.com/couchbaselabs/bleve/analysis/tokenizers/cld2    0.067s