
cld2 tokenizer

A bleve tokenizer that passes the input text to the cld2 library, which guesses the most likely language of that text. The tokenizer emits a single token: the ISO-639 code of the detected language.
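The behavior described above can be sketched in plain Go. This is only an illustration of the "one token, holding the language code" idea, not the actual code in cld2_tokenizer.go. The `Token` type, the `Detector` function type, `stubDetect`, and `Tokenize` are all hypothetical names made up for this sketch; the real tokenizer calls cld2 through cgo, and its token offsets may differ from the ones assumed here.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a simplified, hypothetical stand-in for a bleve token.
type Token struct {
	Term     []byte // token text; here, the guessed language code
	Start    int    // assumed: byte offset where the covered input starts
	End      int    // assumed: byte offset just past the covered input
	Position int    // 1-based position of the token in the stream
}

// Detector guesses the language of text and returns a language code.
// In the real tokenizer this role is played by the cld2 library.
type Detector func(text []byte) string

// stubDetect is a toy detector so the sketch runs without cld2.
// It only looks for a few common English and French words.
func stubDetect(text []byte) string {
	for _, w := range strings.Fields(strings.ToLower(string(text))) {
		switch w {
		case "the", "and", "is":
			return "en"
		case "le", "la", "les", "et", "est":
			return "fr"
		}
	}
	return "unknown" // placeholder; not necessarily what cld2 reports
}

// Tokenize returns exactly one token whose term is the detected language code.
func Tokenize(input []byte, detect Detector) []Token {
	code := detect(input)
	return []Token{{
		Term:     []byte(code),
		Start:    0,
		End:      len(input),
		Position: 1,
	}}
}

func main() {
	for _, text := range []string{"the quick brown fox", "le chat est noir"} {
		tokens := Tokenize([]byte(text), stubDetect)
		fmt.Printf("%q -> %d token(s), term %q\n", text, len(tokens), tokens[0].Term)
	}
}
```

Passing the detector in as a function keeps the sketch runnable without the C++ library; swapping `stubDetect` for a cgo-backed call would be the step that actually needs the libraries built below.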

Building

  1. Check out the cld2 source into this directory.

    $ svn checkout http://cld2.googlecode.com/svn/trunk/ cld2-read-only
    
  2. Build cld2

    $ cd cld2-read-only/internal/
    $ ./compile_libs.sh
    
  3. Put the resulting libraries somewhere your dynamic linker can find them.

    $ cp *.so /usr/local/lib
    
  4. Run the unit tests

    $ cd ../..
    $ go test -v
    === RUN TestCld2Tokenizer
    --- PASS: TestCld2Tokenizer (0.03 seconds)
    PASS
    ok  	github.com/couchbaselabs/bleve/analysis/tokenizers/cld2	0.067s