@ismailarilik observed Pierre Schmitz being duplicated in the Subscribe list on archlinux.org/planet. His assumption of a duplicate entry in the DB seems to be correct. I did some digging and would like to refactor the feed handling to get a sustainable fix.
Status Quo
Where the list is built
Template — templates/planet/index.html:
<h4>Subscribe</h4>
<ul class="planet-list">
{% for feed in official_feeds %}
<li><a href="{{ feed.website }}" title="{{ feed.title }}">{{ feed.title }}</a></li>
{% endfor %}
</ul>
I talked to Pierre. He cleared the feed URL (website rss) in his profile and saved, after which he no longer appeared in the list. He then re-added the feed URL and saved; shortly afterwards he appeared in the list again, this time only once.
How can the duplicate come about?
All creation goes through create_feed_model on UserProfile pre_save in devel/models.py. It only runs when website_rss changes on save. Then it:
Deletes rows where website_rss == old profile value (dbmodel.website_rss)
Inserts a new row for the new URL
Caution
There is no “one feed per user” rule and no uniqueness on website_rss.
One can think of scenarios like these that could cause duplicates:
Profile RSS changes A → B, but a row for B already exists
Delete targets the wrong URL (feed row ≠ old profile value)
First time RSS is set while that URL already has a row
The (seemingly) intended happy path:
Profile RSS changes A → B, only one row for B exists (or none), and every existing row for this person used A. Delete removes A, create adds B → one Subscribe entry.
The issues:
delete by old URL only
create always adds a row
no per-user dedupe
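The behaviour described above can be simulated in plain Python without Django. This is only a sketch: `FeedTable` and `on_profile_save` are hypothetical stand-ins, not the real `Feed` model or `create_feed_model`, and skipping the insert for an empty URL is an assumption based on Pierre's observation. It shows the happy path next to the "A → B, but a row for B already exists" scenario.

```python
from dataclasses import dataclass, field


@dataclass
class FeedTable:
    """In-memory stand-in for the Feed table (hypothetical, not the real model)."""
    rows: list = field(default_factory=list)  # each row is just a website_rss value

    def delete_by_url(self, url):
        # Remove every row whose URL equals the given value.
        self.rows = [r for r in self.rows if r != url]

    def insert(self, url):
        # No uniqueness check, mirroring the missing constraint on website_rss.
        self.rows.append(url)


def on_profile_save(table, old_rss, new_rss):
    """Mimic the described handler: act only when website_rss changes."""
    if old_rss == new_rss:
        return
    table.delete_by_url(old_rss)  # delete by old URL only
    if new_rss:                   # assumption: an empty URL creates no row
        table.insert(new_rss)     # create always adds a row


# Happy path: the only existing row uses A, none uses B.
happy = FeedTable(rows=["A"])
on_profile_save(happy, "A", "B")

# Scenario: profile changes A -> B while a row for B already exists.
dup = FeedTable(rows=["A", "B"])
on_profile_save(dup, "A", "B")
```

In the second case the table ends up with two rows for B, which is what would show up as a duplicated Subscribe entry.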
What I would like to do
My assumption is that a user can have only one feed. I would therefore like to add a reference from Feed in planet/models.py to the user, as a OneToOne relation, which gives us uniqueness out of the box. The column would be nullable at first; then we run a command to fill in the missing data and adjust the business logic to respect the new column. Once production is migrated and cleaned up, a second migration can make the column non-nullable, ensuring that every feed is always attached to exactly one user.
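To illustrate why keying feeds by user removes this class of duplicates, here is a minimal sketch in plain Python. A dict keyed by a user id plays the role of the proposed one-to-one relation; `sync_feed_for_user` is a hypothetical helper, not existing code, and removing the entry on an empty URL is an assumption.

```python
def sync_feed_for_user(feeds_by_user, user_id, website_rss):
    """Hypothetical upsert keyed by user instead of delete-by-URL plus insert."""
    if website_rss:
        # Create or overwrite: a user can never hold more than one entry.
        feeds_by_user[user_id] = website_rss
    else:
        # Assumption: clearing the profile URL removes the user's feed.
        feeds_by_user.pop(user_id, None)


feeds = {}
sync_feed_for_user(feeds, 1, "https://example.org/a.xml")  # first time set
sync_feed_for_user(feeds, 1, "https://example.org/b.xml")  # A -> B
sync_feed_for_user(feeds, 1, "https://example.org/b.xml")  # repeated save
after_changes = dict(feeds)
sync_feed_for_user(feeds, 1, "")                           # URL cleared
```

Because the key is the user rather than the URL, repeated or out-of-order saves cannot leave a second row behind.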
Follow up on #680
View (planet/views.py): Feed.objects.all() returns every feed row; there is no distinct() or merge by user.
Database and Model information
The only code that creates Feed rows is the UserProfile pre_save handler in devel/models.py (create_feed_model).
Relevant model facts (planet/models.py):
Feed has no foreign key to User.
website_rss is not unique.
The rough plan:
user on Feed
dedupe_planet_feeds on production
sync_planet_feed, update_planet 301 fix, tests
user, unique website_rss
/planet Subscribe list; spot-check RSS import via update_planet
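The plan names a dedupe_planet_feeds step but does not describe how it works. As a sketch of one possible rule, the following plain-Python function keeps the row with the lowest id per website_rss and reports the rest for deletion. The function name, the `(row_id, website_rss)` tuple shape, and the keep-lowest-id rule are all assumptions for illustration.

```python
def find_duplicate_feed_ids(rows):
    """Return the ids of rows to delete, keeping the lowest id per URL.

    rows: list of (row_id, website_rss) tuples (hypothetical shape).
    """
    keep = {}
    for row_id, url in rows:
        # Remember the smallest id seen for each URL.
        if url not in keep or row_id < keep[url]:
            keep[url] = row_id
    kept_ids = set(keep.values())
    return sorted(row_id for row_id, _ in rows if row_id not in kept_ids)


rows = [
    (1, "https://example.org/a.xml"),
    (2, "https://example.org/b.xml"),
    (3, "https://example.org/b.xml"),  # duplicate of row 2
]
to_delete = find_duplicate_feed_ids(rows)
```

Running such a check before adding a unique constraint on website_rss would help avoid the migration failing on existing duplicates.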