28 lines
810 B
Markdown
28 lines
810 B
Markdown
# cld2 tokenizer
|
|
|
|
A bleve tokenizer which passes the input text to the cld2 library. The library determines what it thinks the language most likely is. The ISO-639 language code is returned as the single token resulting from the analysis.
|
|
|
|
# Building
|
|
|
|
1. Acquire the source to cld2 in this directory.
|
|
|
|
$ svn checkout http://cld2.googlecode.com/svn/trunk/ cld2-read-only
|
|
|
|
2. Build cld2
|
|
|
|
$ cd cld2-read-only/internal/
|
|
$ ./compile_libs.sh
|
|
|
|
|
|
3. Put the resulting libraries somewhere your dynamic linker can find.
|
|
|
|
$ cp *.so /usr/local/lib
|
|
|
|
4. Run the unit tests
|
|
|
|
$ cd ../..
|
|
$ go test -v
|
|
=== RUN TestCld2Tokenizer
|
|
--- PASS: TestCld2Tokenizer (0.03 seconds)
|
|
PASS
|
|
ok github.com/couchbaselabs/bleve/analysis/tokenizers/cld2 0.067s |