0
0
Fork 0
polecat/TODO

24 lines
1004 B
Plaintext

- implement stemmer for strings
- implement a simple stemmer
- build an analyzer to split the strings
? the analyzer uses tokens to split the stream
? it can throw out string parts, which have no meaning
- builds an array of strings with number of occurences (needed for scoring)
* the index writer will do this, as i think the analyzer just has to
deliver the array itself
* IndexSearcher
* build indexes for attributes
* implement tree structures for index
* binary tree
* b-tree
* word tree for strings
* index has only the position of document in storage
* add scoring for findings
* could be implemented as per document score stored in the index
* findings of string in the document in relation to findings in all documents
* info should be accessible after building the index
* implement resultset
* should be streamlined to gather resultset from multiple queries
* sorts the result if needed
* returns only the top x documents