bleve

Author	SHA1	Message	Date
Marty Schoch	e5c1af4164	add travis config to run integration tests against firestorm	2016-01-05 13:00:36 -05:00
Marty Schoch	ab67b2f642	Merge pull request #267 from pmezard/doc-dump-methods index: document DumpAll, DumpDoc and DumpFields methods	2016-01-05 09:55:35 -05:00
Marty Schoch	db7363fba1	Merge pull request #305 from steveyen/WIP-perf-20160102 perf 20160102	2016-01-05 08:54:47 -05:00
Steve Yen	70b7e73c82	firestorm compensator inFlight.Get() might return nil	2016-01-03 10:21:54 -08:00
Steve Yen	fb8c9a7475	firestorm.Batch() collects [][]IndexRows instead of []IndexRow Rather than append() all received rows into a flat []IndexRow during the result gathering loop, this change instead collects the analysis result rows into a [][]IndexRow, which avoids extra copying. As part of this, firestorm batchRows() now takes the [][]IndexRow as its input.	2016-01-02 12:30:47 -08:00
Steve Yen	1c5b84911d	firestorm DictUpdater NotifyBatch is more async	2016-01-02 12:21:25 -08:00
Steve Yen	b241242465	firestorm.Analyze() preallocs rows, with analyzeField() func The new analyzeField() helper func is used for both regular fields and for composite fields. With this change, all analysis is done up front, for both regular fields and composite fields. After analysis, this change counts up all the row capacity needed and extends the AnalysisResult.Rows in one shot, as opposed to the previous approach of dynamically growing the array as needed during append()'s. Also, in this change, the TermFreqRow for _id is added first, which seems more correct.	2016-01-02 12:21:25 -08:00
Steve Yen	325a616993	unicode.Tokenize() avoids array growth via array of arrays	2016-01-02 12:21:25 -08:00
Steve Yen	918732f3d8	unicode.Tokenize() allocs backing array of Tokens Previously, unicode.Tokenize() would allocate a Token one-by-one, on an as-needed basis. This change allocates a "backing array" of Tokens, so that it goes to the runtime object allocator much less often. It takes a heuristic guess as to the backing array size by using the average token (segment) length seen so far. Results from micro-benchmark (null-firestorm, bleve-blast) seem to give perhaps less than ~0.5 MB/second throughput improvement.	2016-01-02 12:21:25 -08:00
Steve Yen	5b2bc1c20f	firestorm.indexField() check for includeTermVectors moved out of loop	2016-01-02 12:21:25 -08:00
Steve Yen	45e9eaaacb	firestorm.indexField() allocs up-front array of TermFreqRow's This uses the "backing array" technique to allocate many TermFreqRow's at the front of firestorm.indexField(), instead of the previous one-by-one, as-needed TermFreqRow allocation approach. Results from micro-benchmark, null-firestorm, bleve-blast has this change producing a ~half MB/sec improvement.	2016-01-02 12:21:24 -08:00
Steve Yen	7ae696d661	firestorm lookuper notified via batch Previously, the firestorm.Batch() would notify the lookuper goroutine on a document by document basis. If the lookuper input channel became full, then that would block the firestorm.Batch() operation. With this change, lookuper is notified once, with a "batch" that is an []InFlightItem. This change also reuses that same []InFlightItem to invoke the compensator.MutateBatch(). This also has the advantage of only converting the docID's from string to []byte just once, outside of the lock that's used by the compensator. Micro-benchmark of this change with null-firestorm bleve-blast does not show large impact, neither degradation nor improvement.	2016-01-02 12:21:24 -08:00
Steve Yen	38d50ed8b5	renamed var to docsUpdated to match docsDeleted naming	2016-01-02 12:21:24 -08:00
Steve Yen	3feeb14b7d	firestorm.batchRows reuses buf for all IndexRows	2016-01-02 12:21:24 -08:00
Steve Yen	0a7f7e3df8	firestorm.Analyze() converts docID to bytes only once	2016-01-02 12:21:24 -08:00
Steve Yen	fd81d0364c	firestorm.indexField() uses capacity of len(tokenFreqs)	2016-01-02 12:21:24 -08:00
Steve Yen	a345e7951e	TokenFrequency() alloc's all TokenLocations up front	2016-01-02 12:21:17 -08:00
Steve Yen	ee5ccda112	use KeyTo/ValueTo in firestorm.batchRows After this change, with null kvstore micro-benchmark... GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \ -count=100000 -numAnalyzers=8 -numIndexers=8 \ -config=../../configs/null-firestorm.json -batch=100 Then TermFreqRow key and value methods disappear as large boxes from the cpu profile graphs.	2016-01-01 09:57:59 -08:00
Steve Yen	fd287bdfa4	firestorm.md markdown fixes	2016-01-01 09:57:59 -08:00
Steve Yen	b605224106	use shorter go idiom	2015-12-29 22:14:45 -08:00
Marty Schoch	6ddcde4c04	Merge pull request #294 from Shugyousha/fuzzytest Add tests for fuzzy search	2015-12-25 11:38:35 -08:00
Marty Schoch	8ae2aee0bc	Merge pull request #297 from aybabtme/firestorm-dont-gc-if-no-documents Firestorm: dont gc if no documents	2015-12-25 11:23:49 -08:00
Antoine Grondin	6806343677	firestore: fix #296 for division by zero on GC	2015-12-25 11:34:19 +07:00
Antoine Grondin	a6f7abdfa3	firestore: reproducer for division by zero on GC	2015-12-25 11:33:46 +07:00
Marty Schoch	8efbd556a3	fix indexing bug with data coming from arrays fixes #295	2015-12-21 14:59:32 -05:00
Marty Schoch	7bb58e1be4	add ability for integration test to check hit locations	2015-12-21 14:42:43 -05:00
Silvan Jegen	84c755cdb0	Add tests for fuzzy search	2015-12-20 17:00:46 +01:00
Marty Schoch	f7698f1f15	support match_all, match_none and docid queries via JSON also fixed bug in docIDQuery execution which would cause not matching the highest docID passed in if it was in fact a valid ID	2015-12-16 14:53:14 -05:00
Marty Schoch	849b69c318	more enhancements to bleve_query	2015-12-16 14:52:33 -05:00
Marty Schoch	cf67fe2cbc	fix major synchronization issue in the field_cache The field cache is expected to be the authority on which field names are identified by which identifier. This code was optimized for the most common case in which fields already exist. However, if we determine the field is missing with the read lock (shared), we incorrectly immediately proceed to create a new row with the write lock (exclusive). The problem is that multiple goroutines might have come to the same conclusion, and they all proceed to add rows. The two choices were to do the whole operation with the write lock, or recheck the value again with the write lock. We have chosen to repeat the check inside the write-lock, as this optimizes for what we believe to be the most common case, in which most fields will already exist.	2015-12-15 16:39:38 -05:00
Marty Schoch	84ec206fec	add some tests for index names in results	2015-12-08 14:38:46 -05:00
Marty Schoch	d73beac3b9	search result hits now have a field with the name of the index this allows you to figure out where a result actually came from when using aliases	2015-12-08 13:55:04 -05:00
Marty Schoch	9d30e1c96b	Merge branch 'master' into give_indexes_names	2015-12-08 11:56:53 -05:00
Marty Schoch	b4d4ee2fff	fix incorrect results returned by phrase search previously phrase searcher would not validate that consecutive terms were actually occurring in the same array position fixes #292	2015-12-06 15:55:00 -05:00
Marty Schoch	6e9da3bab7	allow running prefix queries through bleve_query command	2015-12-06 14:01:53 -05:00
Marty Schoch	aa7658bbb0	give indexes names, make stats available via expvar by default	2015-12-06 14:01:03 -05:00
Marty Schoch	a73a178923	fix incorrect prefix search behavior avoids double incrementing of end term when reading term dict fixes #293	2015-12-04 14:07:16 -05:00
Marty Schoch	699c86073a	make existing integration tests work with firestorm	2015-12-01 12:29:56 -05:00
Marty Schoch	9777846206	Merge branch 'master' into firestorm	2015-11-30 15:02:46 -05:00
Marty Schoch	e472b3e807	add support for a "web" tokenizer/analyzer The goal of the "web" tokenizer is to recognize web things like - email addresses - URLs - twitter @handles and #hashtags This implementation uses regexp exceptions. There will most likely be endless debate about the regular expressions. These were chosen as "good enough for now". There is also a "web" analyzer. This is just the "standard" analyzer, but using the "web" tokenizer instead of the "unicode" one. NOTE: after processing the exceptions, it still falls back to the standard "unicode" one. For many users, you can simply set your mapping's default analyzer to be "web". closes #269	2015-11-30 14:27:18 -05:00
Marty Schoch	6d851cfcc2	fix bug in warmup which led to docs being deleted	2015-11-30 10:18:14 -05:00
Marty Schoch	aa8d98f5fa	include space after prefix in log output	2015-11-30 10:17:48 -05:00
Marty Schoch	68d8742826	correctly prefix internal rows with 'i' and print them in debug	2015-11-30 10:17:15 -05:00
Marty Schoch	17cfe8cff0	Merge branch 'master' into firestorm	2015-11-30 07:25:33 -05:00
Marty Schoch	b2ac05c6d0	support metrics through bleve query	2015-11-30 07:24:31 -05:00
Marty Schoch	c93de9734e	fix issues identified by errcheck	2015-11-24 14:32:33 -05:00
Marty Schoch	bbef1980d8	Merge branch 'master' into firestorm	2015-11-24 13:04:36 -05:00
Marty Schoch	808f2c1e43	remove exceptions from errcheck	2015-11-24 12:52:46 -05:00
Marty Schoch	ff11f83842	properly handle errors inside metrics kvstore reporting	2015-11-24 12:52:03 -05:00
Marty Schoch	a707d44e0b	Merge branch 'master' into firestorm	2015-11-24 09:44:47 -05:00


753 Commits