Fixing retrieval bugs #94
Merged
I identified multiple issues in the retrieval system that together cause the inconsistent behaviours noted in #91.
- [...] `nltk`'s word tokenizer. This seems to help in cases of failed queries involving gene symbols (NFAT1).
- [...] `langchain-chroma` we were using was not properly returning Documents from `SelfQueryRetriever`. This appears to have been a bug that is now fixed in the version I've upgraded to here. This caused `EnsembleRetriever` and `MergerRetriever` to miss vector search results (glycolysis).
- [...] `chat_history` only for the 1st message, which strangely seems to work as a workaround.

@heliamoh you should evaluate this branch and see if it is an acceptable minimal fix for the study-system.
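To illustrate the tokenizer point, here is a minimal stdlib-only sketch of why tokenization can matter for gene symbols in keyword retrieval. It assumes the earlier behaviour was a plain whitespace split, which this PR does not state. `word_tokenize_sketch` is a hypothetical regex stand-in for `nltk`'s word tokenizer, not the code in this branch, and the example sentence is made up.

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    # Assumed earlier behaviour: split on whitespace only, so punctuation
    # stays glued to the neighbouring word.
    return text.split()


def word_tokenize_sketch(text: str) -> list[str]:
    # Hypothetical stand-in for nltk's word tokenizer: emit runs of word
    # characters and individual punctuation marks as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def has_overlap(tokenize, query: str, doc: str) -> bool:
    # A keyword retriever can only score a document if the query and the
    # document share at least one token.
    return bool(set(tokenize(query)) & set(tokenize(doc)))


# Made-up document text; the gene symbol is wrapped in punctuation.
doc = "The study mentions a transcription factor (NFAT1)."
query = "NFAT1"

print(whitespace_tokenize(doc)[-1])                    # (NFAT1).
print(has_overlap(whitespace_tokenize, query, doc))    # False
print(has_overlap(word_tokenize_sketch, query, doc))   # True
```

With the whitespace split, the symbol only exists as the token `(NFAT1).`, so an exact-match query for `NFAT1` finds nothing. Once punctuation is separated, the same query matches.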