fix(pgvector): make doc deletion query faster and use chunking#289

Draft
kyteinsky wants to merge 2 commits into master from fix/long-deletes

kyteinsky commented Mar 20, 2026

Slow-query logging has also been enabled in CI, though I'm not sure the output will actually show up in the CI logs.

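Sample output for the slow deletion query, where a missing index on the source_id foreign key in the access_list table was the culprit. The plan nodes account for only a few milliseconds, while the statement as a whole took over 200 seconds. The remainder is spent outside the plan nodes, which is consistent with the ON DELETE CASCADE having to scan access_list for every deleted docs row.
Calculated time: 3.495 + 0.310 + 0.129 = 3.934 ms
Actual time: 201177.123 ms (about 201 s)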

        Query Text: DELETE FROM docs WHERE docs.source_id IN ($1::VARCHAR, $2::VARCHAR, ..., $275::VARCHAR) RETURNING docs.chunks
        Query Parameters: ...
        Delete on docs  (cost=1126.32..2018.25 rows=275 width=6) (actual time=0.192..3.495 rows=218 loops=1)
          ->  Bitmap Heap Scan on docs  (cost=1126.32..2018.25 rows=275 width=6) (actual time=0.144..0.310 rows=218 loops=1)
                Recheck Cond: ((source_id)::text = ANY ('{"files__default: 20392","files__default: 23092", ... }'::text[]))
                Heap Blocks: exact=25
                ->  Bitmap Index Scan on docs_pkey  (cost=0.00..1125.56 rows=275 width=0) (actual time=0.129..0.129 rows=218 loops=1)
                      Index Cond: ((source_id)::text = ANY ('{"files__default: 20392", ...
2026-03-19 11:28:59.760 UTC [6703] LOG:  duration: 201177.123 ms  execute <unnamed>: DELETE FROM docs WHERE docs.source_id IN ($1::VARCHAR, $2::VARCHAR, ..., $275::VARCHAR) RETURNING docs.chunks
2026-03-19 11:28:59.760 UTC [6703] DETAIL:  Parameters: $1 = 'files__default: 20392', $2 = ...
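
To illustrate why the index matters, here is a minimal sketch of the lookup a cascade has to perform on the child table for each deleted parent row. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL purely to stay self-contained; the planner output format differs, but the scan-versus-index-search distinction is the same idea. The schema is hypothetical (the `uid` column and the index name `ix_access_list_source_id` are assumptions, not taken from this PR).

```python
import sqlite3


def lookup_plan(conn: sqlite3.Connection) -> str:
    """Return the planner's description of a child-table lookup by source_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM access_list WHERE source_id = ?",
        ("files__default: 1",),
    ).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text is of interest.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified versions of the two tables.
    CREATE TABLE docs (source_id TEXT PRIMARY KEY, chunks TEXT);
    CREATE TABLE access_list (
        id INTEGER PRIMARY KEY,
        uid TEXT,
        source_id TEXT REFERENCES docs(source_id) ON DELETE CASCADE
    );
    """
)

plan_before = lookup_plan(conn)  # no index on the FK column: full table scan
conn.execute("CREATE INDEX ix_access_list_source_id ON access_list (source_id)")
plan_after = lookup_plan(conn)   # index present: index search

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index every deleted parent row pays for a full pass over the child table, which is the kind of cost that would not show up in the plan nodes above.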

(put the chunking part in a different PR)
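
For reference, the chunking idea could look roughly like the sketch below: split the list of source IDs into fixed-size batches and issue one DELETE per batch. The helper names, the batch size of 100, and the `delete_batch` callable are all hypothetical; the actual implementation in the follow-up PR may differ.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def delete_in_chunks(
    source_ids: Iterable[str],
    delete_batch: Callable[[List[str]], int],
    size: int = 100,
) -> int:
    """Call `delete_batch` once per batch and return the total rows deleted.

    `delete_batch` is a hypothetical callable that runs one
    DELETE ... WHERE source_id IN (...) and returns the affected row count.
    """
    total = 0
    for batch in chunked(source_ids, size):
        total += delete_batch(batch)
    return total
```

With the 275 IDs from the log above and a batch size of 100, this would issue three statements of 100, 100 and 75 IDs.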

index the source_id column in the access_list table

Signed-off-by: Anupam Kumar <kyteinsky@gmail.com>
Signed-off-by: Anupam Kumar <kyteinsky@gmail.com>
f'{DOCUMENTS_TABLE_NAME}.source_id',
ondelete='CASCADE',
),
index=True,

DB migration needs to be done for this to happen on existing installations.
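
A rough sketch of what such a migration might execute on an existing PostgreSQL installation is below. The function only builds the DDL string; how it gets run (migration tool, startup hook) is left open. The index name `ix_access_list_source_id` is an assumption chosen to match the name I would expect `index=True` to generate by default, so it should be checked against a fresh install before relying on it.

```python
import re

# Conservative identifier check so the helper cannot be used to build arbitrary SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_fk_index_ddl(
    table: str = "access_list",
    column: str = "source_id",
    concurrently: bool = True,
) -> str:
    """Build a CREATE INDEX statement for the foreign key column.

    The index name is an assumption, not something confirmed by this PR.
    """
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    index_name = f"ix_{table}_{column}"
    mode = "CONCURRENTLY " if concurrently else ""
    return f"CREATE INDEX {mode}IF NOT EXISTS {index_name} ON {table} ({column})"


print(build_fk_index_ddl())
```

CONCURRENTLY avoids blocking writes on access_list while the index is built, but PostgreSQL does not allow it inside a transaction block, so a migration that wraps everything in a transaction would need `concurrently=False` or an autocommit connection.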

kyteinsky marked this pull request as draft March 20, 2026 13:31