bleve

Author	SHA1	Message	Date
Marty Schoch	3a0263bb72	finished initial impl of fuzzy search. You can do a manual fuzzy term search using the FuzzyQuery struct or, more suitable for most users, the MatchQuery now supports some fuzzy options. Here you can specify fuzziness and prefix_length to turn the underlying term search into a fuzzy term search. This has the benefit that analysis is performed on your input, just like the analyzed field, prior to computing the fuzzy variants. closes #82	2014-10-24 13:39:48 -04:00
Marty Schoch	78467c0836	refactored yacc parsing code to be threadsafe, removed mutex	2014-10-23 15:56:59 -04:00
Marty Schoch	d485b0ef26	initial impl of fuzzy search	2014-10-23 13:02:29 -04:00
Steve Yen	91501262b6	typo with NewListIndexesHandler/NewCreateIndexHandler	2014-10-22 16:40:21 -07:00
Marty Schoch	0500a572af	exposed Get/Set/Delete Internal methods. these are to be used to store side-channel information along with the index	2014-10-22 16:03:55 -04:00
Marty Schoch	40a8154bab	changed en analyzer to use pure go components. behavior should be similar with unicode segmentation and a porter stemmer	2014-10-21 16:38:58 -04:00
Marty Schoch	c4d1782689	new pure go porter stemmer integrated. renamed original libstemmer porter to "stemmer_porter_classic"; new pure go stemmer is "stemmer_porter"	2014-10-20 16:55:24 -04:00
Marty Schoch	cf3643f292	added pure go tokenizer to do unicode word boundary segmentation	2014-10-17 18:07:48 -04:00
Marty Schoch	dcb90ad176	added benchmark for tokenizing English text	2014-10-17 18:07:01 -04:00
Marty Schoch	febb8d2df1	renamed unicode_word_boundary package to icu. this is in preparation of alternative unicode word boundary impls	2014-10-17 15:15:13 -04:00
Marty Schoch	7bf44e1ba7	added ability to return all document fields by requesting field *	2014-10-15 19:16:16 -04:00
Marty Schoch	8222fbea57	improve lexer handling of special characters. characters like + and - are special, but they should only be special at the beginning of strings; inside something that would otherwise be a string we should just let them be characters. closes #103	2014-10-10 20:45:57 -07:00
Marty Schoch	af6a5d27eb	allow term searches for numbers closes #108	2014-10-10 20:36:31 -07:00
Marty Schoch	19d45dfdb6	fix compilation with the latest changes to kagome	2014-10-10 19:59:24 -07:00
Marty Schoch	8be0652dc8	better String() impl when request.Size=0 closes #107	2014-10-10 18:08:20 -07:00
Marty Schoch	64b0066121	added support for tracking index stats and exposing via expvar closes #83	2014-10-02 11:12:49 -07:00
Marty Schoch	97902e2619	text analysis now moved out of index write lock onto goroutine. 1. text analysis is now done before the write lock is acquired 2. there is now a pool of analysis workers 3. the size of this pool is configurable 4. this allows for documents in a batch to be analyzed concurrently. as a part of benchmarking these changes i've also introduced a new null storage implementation. this should never be used, as it does not actually build an index. it does however let us go through all the normal indexing machinery, without incurring any indexing I/O. this is very helpful in measuring improvements made to the text analysis pipeline, which are often overshadowed by indexing times in benchmarks actually building an index.	2014-09-24 08:13:14 -04:00
Marty Schoch	1dc466a800	modified token filters to avoid creating new token stream. often the result stream was the same length, so can reuse the existing token stream. also, in cases where a new stream was required, set capacity to the length of the input stream. most output streams are at least as long as the input, so this may avoid some subsequent resizing	2014-09-23 18:41:32 -04:00
Marty Schoch	95e6e37e67	added build tag to fix running tests without tag	2014-09-16 11:28:44 -04:00
Marty Schoch	608b9163a3	Merge branch 'master' of github.com:blevesearch/bleve	2014-09-16 11:22:01 -04:00
Marty Schoch	55c0e84665	relocated kagome tokenizer and introduced ja analyzer	2014-09-16 11:21:29 -04:00
Silvan Jegen	29bdc094a9	Use byte positions instead of character positions	2014-09-14 13:19:30 +02:00
Marty Schoch	3dc66b5338	Merge pull request #99 from jingweno/patch-1 Update README.md	2014-09-13 22:45:21 -04:00
Jingwen Owen Ou	79691770c4	Update README.md Fix broken example.	2014-09-13 19:38:24 -07:00
Silvan Jegen	a8ec7f7af2	Add tests for the Kagome tokenizer	2014-09-13 17:45:22 +02:00
Silvan Jegen	ebf100c097	Add the Kagome tokenizer for Japanese	2014-09-13 17:45:19 +02:00
Marty Schoch	198ca1ad4d	major refactor of kvstore/index internals, see below. In the index/store package: introduce KVReader (creates snapshot; all read operations consistent from this snapshot; must close to release); introduce KVWriter (only one writer active; access to all operations; allows for consistent read-modify-write; must close to release); introduce AssociativeMerge operation on batch (allows efficient read-modify-write for associative operations; used to consolidate updates to the term summary rows; saves 1 set and 1 get op per shared instance of term in field). In the index package: introduced an IndexReader, which exposes a consistent snapshot of the index for searching. At top level: all searches now operate on a consistent snapshot of the index	2014-09-12 17:21:35 -04:00
Marty Schoch	7819deb447	added boltdb benchmark, same as others	2014-09-12 16:55:50 -04:00
Marty Schoch	2294b24b9d	remove forestdb for now. not any benefit in maintaining this for the time being	2014-09-12 16:55:11 -04:00
Marty Schoch	8c16d68c00	include cjk analyzer in default config	2014-09-11 10:44:14 -04:00
Marty Schoch	1a1cf32a86	introducing cjk_bigram filter and cjk analyzer closes #34	2014-09-11 10:39:05 -04:00
Marty Schoch	cb5ccd2b1d	fix whitespace tokenizer. previously would fail to split ascii running into ideographic	2014-09-11 10:38:02 -04:00
Marty Schoch	8debf26cb7	changed many components to not have defaults. many of these defaults were arbitrary, and not having defaults lets us more easily flag them for configuration. added a shingle filter. introduce new token type for shingles	2014-09-09 18:15:14 -04:00
Marty Schoch	8dd8fb8910	fix compilation	2014-09-07 14:13:32 -04:00
Marty Schoch	6b4c86b35a	changed whitespace tokenizer to work better on cjk input. now it will return each cjk character as a separate token. this will pair well with a cjk bigram filter for indexing	2014-09-07 14:11:01 -04:00
Marty Schoch	933d99c576	rename the configurable token map from standard to custom. this makes it consistent with the "custom" analyzer, which operates similarly. also, added it to the config.go so it's registered and available for use	2014-09-07 14:09:38 -04:00
Marty Schoch	22911888c4	refactor registry package and bleve_registry utility	2014-09-07 14:07:42 -04:00
Marty Schoch	9e78643bad	icu tokenizer uses brk status to set token type. part of #34	2014-09-07 10:24:02 -04:00
Marty Schoch	44df73d317	apply doc fix patch from rakoo closes #95	2014-09-07 09:09:47 -04:00
Marty Schoch	f87a22e24c	added json struct tag to http doc count response	2014-09-05 12:16:26 -04:00
Marty Schoch	b1dd4215fc	added features to readme	2014-09-04 15:09:19 -04:00
Marty Schoch	f384f9dead	added link to wiki search to readme	2014-09-04 14:43:25 -04:00
Marty Schoch	d90697f725	added features to readme	2014-09-04 14:31:26 -04:00
Marty Schoch	afdb5f057f	added convenience method to add field to highlight request	2014-09-04 10:13:13 -04:00
Marty Schoch	9d2187706e	another round of golint	2014-09-03 19:53:59 -04:00
Marty Schoch	8b9255f52f	even more golint cleanups	2014-09-03 19:32:27 -04:00
Marty Schoch	e21935f850	another round of golint cleanup	2014-09-03 19:16:46 -04:00
Marty Schoch	e1b77956d4	more golint cleanups	2014-09-03 18:47:02 -04:00
Marty Schoch	377ae090d0	additional golint issues resolved	2014-09-03 18:17:26 -04:00
Marty Schoch	d534b0836b	converted ALL_CAPS constants to CamelCase	2014-09-03 17:48:40 -04:00

1126 Commits