Fix incorrect test set values in leave_k_out splits with sparse user rows #640
chrisjkuch wants to merge 3 commits into benfred:main from
It looks like a single test failed in one of the builds. Given that the tests fail in only one build and pass in all others, my guess is that the randomly generated matrix contained a completely empty row, which caused that user to be included in the training set in addition to all the other randomly chosen users. I think the best solution is to add a check to `_get_matrix()` that ensures there are no completely zero rows.
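A minimal sketch of the kind of check described above, assuming a dense NumPy matrix for simplicity. The function name `random_matrix_without_empty_rows` and its parameters are hypothetical and are not the test suite's actual `_get_matrix()` helper:

```python
import numpy as np


def random_matrix_without_empty_rows(n_rows, n_cols, density=0.05, seed=0):
    """Hypothetical test helper: random 0/1 matrix in which every row has a nonzero."""
    rng = np.random.default_rng(seed)
    mat = (rng.random((n_rows, n_cols)) < density).astype(np.float32)

    # Find rows that ended up with no interactions at all.
    empty = np.flatnonzero(mat.sum(axis=1) == 0)

    # Give each empty row a single interaction in a randomly chosen column.
    mat[empty, rng.integers(0, n_cols, size=empty.size)] = 1.0
    return mat


mat = random_matrix_without_empty_rows(100, 100)
```

Patching empty rows after generation keeps the overall density close to the requested value while guaranteeing that every user has at least one interaction.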
Hmmm, even after making the fix to ensure always-populated rows, the test is still failing intermittently, both for the random 100x100 sparse matrix and for the newly added fixed matrix. My guess is that this is some function of the combination of the Usually, the value for the number of users is only slightly off from the chosen value. @benfred would an
Closes #639
This PR fixes a bug in the evaluation of the `leave_k_out_split` in which the produced test matrix would contain values that were many multiples of their original value. Tests are also added on static (non-random) matrices that otherwise fail in the uncorrected implementation.

This bug resulted from a calculation that required an input array with sequential values; the fact that non-sequential values were provided led to an error in processing.

Specifically, the `arr` argument in `_take_tails` was being provided as `candidate_users`, from which user indices falling below the threshold were removed, resulting in a list in which the ordered set of unique integers was not consecutive and therefore the provided array was invalid.
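The failure mode can be illustrated with a small sketch. It assumes a group-boundary calculation built on `np.bincount` and a cumulative sum. The helpers `last_index_per_group` and `relabel_consecutive` are hypothetical names for illustration and are not the library's actual `_take_tails` code or the exact fix in this PR:

```python
import numpy as np


def last_index_per_group(sorted_ids):
    """Position of the last entry of each group in a sorted id array.

    Hypothetical helper: only valid when the ids are exactly 0..n-1 with
    no gaps, because np.bincount emits a zero count for every missing id.
    """
    return np.cumsum(np.bincount(sorted_ids)) - 1


def relabel_consecutive(ids):
    """Map arbitrary integer ids onto 0..n-1 while preserving their order."""
    _, inverse = np.unique(ids, return_inverse=True)
    return inverse


consecutive = np.array([0, 0, 1, 1, 1, 2])
gapped = np.array([0, 0, 2, 2, 2, 5])  # ids 1, 3 and 4 were filtered out

ok = last_index_per_group(consecutive)   # [1 4 5]
bad = last_index_per_group(gapped)       # [1 1 4 4 4 5]: positions repeat
fixed = last_index_per_group(relabel_consecutive(gapped))  # [1 4 5]
```

With gaps in the ids, the zero counts make the cumulative sum repeat positions, so the same entry is selected more than once. If such repeated positions were then used to build a sparse matrix in which duplicate entries are summed, the result would be consistent with test values that are multiples of the originals.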