- implement a stemmer for strings
- implement a simple stemmer
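A minimal sketch of what a "simple stemmer" could look like, assuming English text and plain suffix stripping. The suffix list, the `min_stem` guard and the name `simple_stem` are hypothetical. This is deliberately much cruder than the Porter algorithm.

```python
# Hypothetical suffix rules, ordered longest first so "ingly" wins over "ly".
SUFFIXES = ("ingly", "edly", "ness", "ing", "ies", "ed", "ly", "es", "s")


def simple_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            if suffix == "ies":
                # "queries" -> "query": restore the "y" that became "ie".
                return word[: -len(suffix)] + "y"
            return word[: -len(suffix)]
    return word
```

With these rules `searching` and `searched` both reduce to `search`, and `queries` becomes `query`. Words like `goes` still come out wrong (`goe`), which is the usual price of a rule list this short.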
- build an analyzer to split the strings
? the analyzer uses a tokenizer to split the stream into tokens
? it can throw out string parts that carry no meaning (e.g. stop words)
- builds an array of strings with their number of occurrences (needed for scoring)
* the index writer will do this, since I think the analyzer only has to deliver the array itself
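The analyzer step described above might be sketched as follows, assuming a regex tokenizer, a hypothetical stop-word list standing in for "string parts which have no meaning", and a pluggable stemming function (identity by default). Following the note that the analyzer only delivers the array, counting occurrences is left to a separate helper intended for the index writer. `analyze`, `term_frequencies`, `STOP_WORDS` and `TOKEN_RE` are illustrative names only.

```python
import re
from collections import Counter
from typing import Callable, List

# Hypothetical stop words: tokens that carry no meaning for search.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})
# Treat any run of lowercase letters or digits as one token.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def analyze(text: str, stem: Callable[[str], str] = lambda t: t) -> List[str]:
    """Tokenize lowercased text, drop stop words, and stem the remaining tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


def term_frequencies(terms: List[str]) -> Counter:
    """Count how often each term occurs (the index writer's side of the work)."""
    return Counter(terms)
```

For example, `analyze("The index of the Index is searchable.")` yields `["index", "index", "searchable"]`, and `term_frequencies` turns that into a count of 2 for `index`.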
* IndexSearcher
* build indexes for attributes
* implement tree structures for the index
* binary tree
* b-tree
* word tree for strings
* the index stores only the position of each document in storage
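One way to combine the "word tree for strings" with the rule that the index holds only storage positions is a trie whose nodes record the positions of documents containing the word that ends there. Reading "word tree" as a trie, and positions as integer offsets into document storage, are both assumptions. `TrieNode`, `WordTree` and their methods are hypothetical names.

```python
from typing import Dict, List, Optional, Set


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Storage positions of documents containing the word ending at this node;
        # the documents themselves are never copied into the index.
        self.positions: Set[int] = set()


class WordTree:
    """A trie ("word tree") mapping terms to document storage positions."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, term: str, position: int) -> None:
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.positions.add(position)

    def _find(self, chars: str) -> Optional[TrieNode]:
        """Walk the trie along `chars`; return None if the path does not exist."""
        node = self.root
        for ch in chars:
            child = node.children.get(ch)
            if child is None:
                return None
            node = child
        return node

    def lookup(self, term: str) -> List[int]:
        """Sorted positions of documents containing exactly `term`."""
        node = self._find(term)
        return sorted(node.positions) if node is not None else []

    def prefix(self, prefix: str) -> List[int]:
        """Sorted positions of documents containing any term starting with `prefix`."""
        node = self._find(prefix)
        if node is None:
            return []
        found: Set[int] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            found |= current.positions
            stack.extend(current.children.values())
        return sorted(found)
```

A side benefit of the trie over a binary tree or b-tree keyed on whole words is that prefix queries (`prefix("search")` also finding `searcher`) fall out of the same structure.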
* add scoring for search hits
* could be implemented as a per-document score stored in the index
* occurrences of the string in the document relative to its occurrences across all documents
* this information should be accessible once the index has been built
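Weighing a string's occurrences in one document against its occurrences across all documents closely resembles TF-IDF, so here is a sketch under that assumption. The source does not name a formula; TF-IDF is swapped in as one common interpretation. Scores are computed once after indexing and stored per (term, document) pair, matching the "per-document score stored in the index" idea. `build_scores` and its input shape (storage position to analyzed terms) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def build_scores(docs: Dict[int, List[str]]) -> Dict[str, Dict[int, float]]:
    """Precompute a TF-IDF score for every (term, document) pair.

    `docs` maps each document's storage position to its analyzed terms.
    The result maps term -> {position: score}, ready to store in the index.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq: Counter = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))

    scores: Dict[str, Dict[int, float]] = {}
    for position, terms in docs.items():
        if not terms:
            continue  # an empty document contributes no scores
        for term, count in Counter(terms).items():
            tf = count / len(terms)  # share of this document's terms
            idf = math.log(n_docs / doc_freq[term])  # rarer terms weigh more
            scores.setdefault(term, {})[position] = tf * idf
    return scores
```

A term that appears in every document gets an IDF of zero, so with `{0: ["search", "index", "search"], 120: ["index", "tree"]}` the term `index` scores 0.0 everywhere. Many implementations smooth the IDF so such terms keep a small positive weight; whether that is wanted here is a design choice.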
* implement a resultset
* should be able to combine resultsets from multiple queries
* sorts the results if needed
* returns only the top x documents
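The resultset items (combine results of several queries, sort if needed, return only the top x documents) could be sketched like this. It assumes each query yields a dict from storage position to score and that combining means summing scores, an OR-style merge; an AND query would intersect the hit sets instead. `merge_results` and `top_documents` are hypothetical names.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def merge_results(results: Iterable[Dict[int, float]]) -> Dict[int, float]:
    """Combine per-query hits, summing the scores of documents found more than once."""
    merged: Dict[int, float] = {}
    for hits in results:
        for position, score in hits.items():
            merged[position] = merged.get(position, 0.0) + score
    return merged


def top_documents(hits: Dict[int, float], x: int) -> List[Tuple[int, float]]:
    """Return the x highest-scoring (position, score) pairs, best first."""
    # nlargest can avoid a full sort when x is much smaller than the number of hits.
    return heapq.nlargest(x, hits.items(), key=lambda item: item[1])
```

Merging `{0: 0.5, 120: 0.2}` with `{120: 0.4, 300: 0.1}` gives document 120 a combined score of about 0.6, so `top_documents(merged, 2)` returns positions 120 and 0 in that order.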