- implement stemmer for strings - implement a simple stemmer - build an analyzer to split the strings ? the analyzer uses tokens to split the stream ? it can throw out string parts, which have no meaning - builds an array of strings with number of occurences (needed for scoring) * the index writer will do this, as i think the analyzer just has to deliver the array itself * IndexSearcher * build indexes for attributes * implement tree structures for index * binary tree * b-tree * word tree for strings * index has only the position of document in storage * add scoring for findings * could be implemented as per document score stored in the index * findings of string in the document in relation to findings in all documents * info should be accessible after building the index * implement resultset * should be streamlined to gather resultset from multiple queries * sorts the result if needed * returns only the top x documents