
[fix](inverted index) Make select_best_reader deterministic for multi-index columns#61596

Open
airborne12 wants to merge 3 commits intoapache:masterfrom
airborne12:fix-build-index-multi-analyzer-order


@airborne12
Member

@airborne12 airborne12 commented Mar 21, 2026

What problem does this PR solve?

Issue Number: close #DORIS-24685

Related PR: N/A

Problem Summary:

When multiple inverted indexes with different analyzers exist on the same column, select_best_reader() returns the first matching candidate based on the iteration order of _reader_entries. Since _reader_entries ordering depends on the rowset schema's index ordering, and different segments can have different orderings (e.g. after sequential BUILD INDEX operations), the same query can select different indexes in different segments, producing inconsistent results.
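To make the order dependence concrete, here is a minimal C++ sketch. `ReaderEntry`, its fields, and `first_match()` are hypothetical simplifications, not the actual Doris types; the index names `idx_ch2`/`idx_ch3` are borrowed from the example in the commit message below, and the index_id values are illustrative. Two segments holding the same two indexes in opposite order pick different indexes for the same query.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one entry in a segment's
// _reader_entries; the real Doris type carries much more state.
struct ReaderEntry {
    int64_t index_id;
    std::string name;     // e.g. "idx_ch2"
    bool supports_query;  // can this index serve the current query?
};

// Order-dependent selection: return the first entry that can serve the
// query. The answer therefore depends on how this segment's schema happened
// to order its indexes.
std::optional<int64_t> first_match(const std::vector<ReaderEntry>& entries) {
    for (const auto& e : entries) {
        if (e.supports_query) {
            return e.index_id;
        }
    }
    return std::nullopt;
}
```

With `{idx_ch2, idx_ch3}` in one segment and `{idx_ch3, idx_ch2}` in another, `first_match()` returns a different index_id for each segment.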

Fix: Replace all order-dependent candidate selection in select_for_text(), select_for_numeric(), and select_best_reader() with deterministic selection of the candidate with the smallest index_id, via the pick_preferred() and pick_smallest_index_id() helpers. This ensures consistent index selection regardless of how schemas order indexes across segments.
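A minimal sketch of the smallest-index_id rule, assuming a simplified, hypothetical `ReaderEntry` type. The PR names its helpers pick_preferred() and pick_smallest_index_id(), but their signatures are not shown here, so `select_smallest_index_id()` below is an illustrative stand-in rather than the PR's code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one entry in a segment's
// _reader_entries.
struct ReaderEntry {
    int64_t index_id;
    std::string name;     // e.g. "idx_ch2"
    bool supports_query;  // can this index serve the current query?
};

// Order-independent selection: among all entries able to serve the query,
// return the one with the smallest index_id. index_id is a property of the
// index itself, so every segment resolves to the same index no matter how
// its schema ordered the entries.
std::optional<int64_t> select_smallest_index_id(const std::vector<ReaderEntry>& entries) {
    std::optional<int64_t> best;
    for (const auto& e : entries) {
        if (!e.supports_query) {
            continue;
        }
        if (!best || e.index_id < *best) {
            best = e.index_id;
        }
    }
    return best;
}
```

The design choice is that the tie-breaker depends only on data that is identical in every segment (the index_id), rather than on positional data that can differ between segments (the schema's index order).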

Release note

Fixed a bug where queries on columns with multiple inverted indexes could return inconsistent results after BUILD INDEX operations, due to non-deterministic index selection across segments with different schema orderings.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…sults with multiple analyzers

### What problem does this PR solve?

Issue Number: close #DORIS-24685

Problem Summary:
When a table has multiple inverted indexes with different analyzers on the same
column, and these indexes are built via sequential BUILD INDEX operations with
data inserts interleaved between builds, the query results become inconsistent
with a baseline table that has the same indexes defined at creation time.

Root cause: BUILD INDEX appends new indexes to the output rowset schema via
`append_index()`, so successive builds produce index ordering that depends on
execution order (e.g., [idx_ch3, idx_ch2]) rather than matching the tablet
schema order (e.g., [idx_ch2, idx_ch3]). The search function's
`select_best_reader()` returns the first matching candidate for REGEXP and
other query types, so different index ordering across segments causes different
indexes to be selected for the same query, producing inconsistent results.

Fix: After appending new indexes in `IndexBuilder::update_inverted_index_info()`,
reorder the output rowset schema's indexes to match the tablet schema's order
via a new `TabletSchema::reorder_indexes_by()` method.
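The reordering this commit describes can be sketched as a free function. `IndexMeta` and `reorder_by_reference()` are hypothetical; the real `TabletSchema::reorder_indexes_by()` works on the schema's own index vector and lookup caches, which are not shown here. The sketch follows the behavior described in the review: a position map built from the reference schema, a `std::numeric_limits<size_t>::max()` sentinel for unknown indexes, `std::stable_sort`, and a cache rebuild afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Hypothetical, minimal index metadata: only the id matters for ordering.
struct IndexMeta {
    int64_t index_id;
};

// Reorder `indexes` so ids found in `reference_order` follow that order.
// Ids absent from the reference rank as SIZE_MAX and, because the sort is
// stable, keep their relative order at the end of the vector.
void reorder_by_reference(std::vector<IndexMeta>& indexes,
                          const std::vector<int64_t>& reference_order,
                          std::unordered_map<int64_t, size_t>& position_cache) {
    std::unordered_map<int64_t, size_t> ref_order;
    for (size_t i = 0; i < reference_order.size(); ++i) {
        ref_order[reference_order[i]] = i;
    }
    auto rank = [&](const IndexMeta& m) {
        auto it = ref_order.find(m.index_id);
        return it == ref_order.end() ? std::numeric_limits<size_t>::max() : it->second;
    };
    std::stable_sort(indexes.begin(), indexes.end(),
                     [&](const IndexMeta& a, const IndexMeta& b) { return rank(a) < rank(b); });
    // Positions changed, so any id -> position lookup must be rebuilt.
    position_cache.clear();
    for (size_t i = 0; i < indexes.size(); ++i) {
        position_cache[indexes[i].index_id] = i;
    }
}
```

For input ids [3, 99, 2] and reference order [2, 3], the result is [2, 3, 99]: the id 99, which is absent from the reference, sorts last.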

### Release note

Fixed a bug where BUILD INDEX with multiple analyzers on the same column could
produce query results inconsistent with tables where indexes were defined at
creation time. The issue was caused by index ordering in rowset schemas diverging
from the tablet schema after sequential build index operations.

### Check List (For Author)

- Test: Regression test / Unit Test
    - Added regression test: test_build_index_multi_analyzer_order
    - Added unit tests: TestReorderIndexesBy, TestReorderIndexesByWithExtraIndexes
- Behavior changed: No
- Does this need documentation: No
@Thearas
Contributor

Thearas commented Mar 21, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

/review

Contributor

@github-actions github-actions bot left a comment


Code Review Summary

PR Goal: Fix BUILD INDEX producing inconsistent query results when multiple inverted indexes with different analyzers exist on the same column. The root cause is that select_best_reader() returns the first matching candidate, and successive BUILD INDEX operations append indexes in execution order rather than tablet schema order, causing different segments to select different indexes for the same query.

Fix approach: Add TabletSchema::reorder_indexes_by() to sort indexes in the output rowset schema to match the tablet schema's ordering after appending new indexes.

The fix is well-targeted and the approach is correct. The implementation correctly uses stable_sort to preserve relative order of indexes not in the reference schema, and properly rebuilds lookup caches after reordering. Both unit tests and regression tests are included.

Critical Checkpoint Conclusions

  1. Goal achievement: Yes, the fix correctly addresses the root cause. The regression test proves correctness by comparing BUILD INDEX results against a baseline table.

  2. Modification scope: Small, focused, and clear. Only adds one new method and one call site.

  3. Concurrency: Not applicable - reorder_indexes_by() operates on a local output_rs_tablet_schema that is not shared across threads at the point of the call.

  4. Lifecycle management: No special lifecycle concerns.

  5. Configuration items: None added.

  6. Incompatible changes: None - existing rowset schemas with inconsistent ordering will continue to work; only newly built rowsets will have corrected ordering.

  7. Parallel code paths: CloudIndexChangeCompaction::rebuild_tablet_schema() in cloud_index_change_compaction.cpp has a similar pattern of clearing and appending indexes from a list. It may benefit from the same reordering fix, but is lower risk since the ordering comes from FE's index list rather than multiple sequential BUILD INDEX operations. Worth noting but not blocking.

  8. Special conditional checks: The std::numeric_limits<size_t>::max() sentinel for unknown indexes is appropriate - ensures unknown indexes sort to the end.

  9. Test coverage: Good - two unit tests (basic reorder + stale index) and a comprehensive regression test. See inline suggestion for additional mixed-type test coverage.

  10. Observability: Not needed for this fix.

  11. Transaction/persistence: Not applicable.

  12. Data writes: Not directly applicable - this only reorders in-memory schema metadata before rowset creation.

  13. FE-BE variable passing: Not applicable.

  14. Performance: reorder_indexes_by() uses stable_sort (O(n log n)) and a full cache rebuild (O(n * m), where m is the number of col_unique_ids per index). For typical index counts (<100), this cost is negligible, and it is incurred once per rowset during BUILD INDEX.

Issues Found

See inline comments for two suggestions (non-blocking).

// their relative order.
std::unordered_map<int64_t, size_t> ref_order;
const auto& ref_indexes = reference_schema->inverted_indexes();
for (size_t i = 0; i < ref_indexes.size(); ++i) {
Contributor

Suggestion (non-blocking): inverted_indexes() only returns indexes with IndexType::INVERTED, but _indexes stores ALL index types (BLOOMFILTER, NGRAM_BF, ANN, BITMAP, INVERTED). The stable_sort on line 16 sorts the entire _indexes vector, so any non-INVERTED indexes (e.g., NGRAM_BF, BLOOMFILTER) will receive std::numeric_limits<size_t>::max() and be pushed to the end of _indexes.

In the current call site (index_builder.cpp), output_rs_tablet_schema is copy_from() of the input rowset schema, which may contain NGRAM_BF or BLOOMFILTER indexes. After reorder_indexes_by(), these would move from their original positions (potentially interleaved with INVERTED indexes) to the end of the vector.

Since the lookup caches are correctly rebuilt, this doesn't cause functional incorrectness. However, it's worth considering using reference_schema->indexes() (all indexes) instead of inverted_indexes() for the reference ordering, to maintain the relative positions of ALL index types, not just inverted ones. Alternatively, the sort could be scoped to only inverted indexes in the vector.
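The second alternative, scoping the sort to inverted indexes only, can be sketched as follows. `IndexType`, `IndexMeta`, and `reorder_inverted_only()` are hypothetical simplifications, not Doris code. The INVERTED entries are sorted among themselves and written back into the slots they already occupied, so BLOOMFILTER and NGRAM_BF entries stay where they were.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Hypothetical subset of index types relevant to the comment above.
enum class IndexType { INVERTED, BLOOMFILTER, NGRAM_BF };

struct IndexMeta {
    int64_t index_id;
    IndexType type;
};

// Reorder only INVERTED entries by their position in `inverted_reference`;
// non-INVERTED entries keep their original slots.
void reorder_inverted_only(std::vector<IndexMeta>& indexes,
                           const std::vector<int64_t>& inverted_reference) {
    std::unordered_map<int64_t, size_t> ref_order;
    for (size_t i = 0; i < inverted_reference.size(); ++i) {
        ref_order[inverted_reference[i]] = i;
    }
    // Collect the INVERTED entries and remember which slots they occupy.
    std::vector<size_t> slots;
    std::vector<IndexMeta> inverted;
    for (size_t i = 0; i < indexes.size(); ++i) {
        if (indexes[i].type == IndexType::INVERTED) {
            slots.push_back(i);
            inverted.push_back(indexes[i]);
        }
    }
    // Unknown ids rank last; stable_sort keeps their relative order.
    auto rank = [&](const IndexMeta& m) {
        auto it = ref_order.find(m.index_id);
        return it == ref_order.end() ? std::numeric_limits<size_t>::max() : it->second;
    };
    std::stable_sort(inverted.begin(), inverted.end(),
                     [&](const IndexMeta& a, const IndexMeta& b) { return rank(a) < rank(b); });
    // Write the sorted INVERTED entries back into the same slots.
    for (size_t k = 0; k < slots.size(); ++k) {
        indexes[slots[k]] = inverted[k];
    }
}
```

For input [3 (INVERTED), 7 (NGRAM_BF), 2 (INVERTED)] with reference order [2, 3], the result is [2, 7, 3]: the NGRAM_BF entry stays in the middle instead of moving to the end.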

}

TEST_F(TabletSchemaIndexTest, TestReorderIndexesBy) {
// Simulate the build index scenario: indexes appended in different order
Contributor

Suggestion (non-blocking): Consider adding a test case with mixed index types (e.g., INVERTED + NGRAM_BF or BLOOMFILTER) to verify that reorder_indexes_by() correctly handles schemas containing non-INVERTED indexes. Currently both test cases use only IndexType::INVERTED, but in production, output_rs_tablet_schema from copy_from() can contain BLOOMFILTER and NGRAM_BF indexes as well.

…-index columns

### What problem does this PR solve?

Issue Number: close #DORIS-24685

Problem Summary:
When multiple inverted indexes with different analyzers exist on the same
column, select_best_reader() returns the first matching candidate based on
iteration order of _reader_entries. Since _reader_entries ordering depends
on the rowset schema's index ordering, and different segments can have
different orderings (e.g. after sequential BUILD INDEX operations), the
same query selects different indexes for different segments, producing
inconsistent results.

Fix: Replace all order-dependent candidate selection in select_for_text(),
select_for_numeric(), and select_best_reader() with deterministic selection
by smallest index_id via pick_preferred() and pick_smallest_index_id()
helpers. This ensures consistent index selection regardless of schema
ordering across segments.

### Release note

Fixed a bug where queries on columns with multiple inverted indexes could
return inconsistent results after BUILD INDEX operations, due to
non-deterministic index selection across segments with different schema
orderings.

### Check List (For Author)

- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [x] No.
    - [ ] Yes.

- Does this need documentation?
    - [x] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
@airborne12 airborne12 changed the title [fix](inverted index) Fix BUILD INDEX producing inconsistent query results with multiple analyzers [fix](inverted index) Make select_best_reader deterministic for multi-index columns Mar 21, 2026
@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26980 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ff6281f676ab2deb199d07241a8e1a8c6b9df9cc, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17607	4481	4297	4297
q2
q3	10643	771	524	524
q4	4675	353	252	252
q5	7564	1201	1017	1017
q6	174	173	147	147
q7	775	868	680	680
q8	9558	1463	1384	1384
q9	5121	4672	4717	4672
q10	6338	1913	1648	1648
q11	495	264	248	248
q12	737	585	489	489
q13	18045	2885	2189	2189
q14	237	242	213	213
q15
q16	734	733	676	676
q17	739	832	471	471
q18	5907	5417	5219	5219
q19	1414	986	624	624
q20	529	497	377	377
q21	4526	1890	1594	1594
q22	482	362	259	259
Total cold run time: 96300 ms
Total hot run time: 26980 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4753	4697	4691	4691
q2	q3	3874	4382	3874	3874
q4	908	1229	821	821
q5	4183	4510	4341	4341
q6	183	176	144	144
q7	1755	1612	1504	1504
q8	2551	2713	2578	2578
q9	7537	7344	7407	7344
q10	3906	4112	3633	3633
q11	498	434	425	425
q12	539	577	463	463
q13	2922	3178	2309	2309
q14	298	315	284	284
q15
q16	818	791	718	718
q17	1171	1334	1389	1334
q18	7072	6678	6661	6661
q19	911	961	900	900
q20	2099	2142	2008	2008
q21	3935	3561	3376	3376
q22	458	426	382	382
Total cold run time: 50371 ms
Total hot run time: 47790 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 169227 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ff6281f676ab2deb199d07241a8e1a8c6b9df9cc, data reload: false

query5	4367	614	494	494
query6	340	238	217	217
query7	4219	467	275	275
query8	349	246	236	236
query9	8741	2738	2714	2714
query10	536	396	357	357
query11	7026	5060	4919	4919
query12	183	134	127	127
query13	1278	498	347	347
query14	5748	3755	3539	3539
query14_1	2852	2866	2813	2813
query15	206	193	180	180
query16	972	483	447	447
query17	1127	732	629	629
query18	2467	457	354	354
query19	219	212	190	190
query20	137	126	128	126
query21	214	139	114	114
query22	13526	14434	14772	14434
query23	16267	15887	15685	15685
query23_1	15895	15816	16253	15816
query24	7206	1654	1225	1225
query24_1	1254	1252	1249	1249
query25	619	475	410	410
query26	1242	269	150	150
query27	2751	484	294	294
query28	4439	1799	1837	1799
query29	823	567	469	469
query30	301	223	192	192
query31	995	946	870	870
query32	82	72	72	72
query33	516	330	274	274
query34	878	884	522	522
query35	660	688	599	599
query36	1084	1117	960	960
query37	133	97	80	80
query38	2937	2971	2840	2840
query39	840	835	814	814
query39_1	793	797	794	794
query40	235	153	136	136
query41	64	61	59	59
query42	262	255	252	252
query43	249	251	220	220
query44	
query45	199	195	181	181
query46	885	1003	620	620
query47	2155	2108	2426	2108
query48	324	313	224	224
query49	616	485	386	386
query50	678	272	211	211
query51	4032	4029	4048	4029
query52	265	260	259	259
query53	293	335	290	290
query54	299	268	267	267
query55	92	87	87	87
query56	318	321	327	321
query57	1933	1869	1787	1787
query58	286	295	269	269
query59	2789	2972	2747	2747
query60	346	342	325	325
query61	148	155	159	155
query62	629	587	500	500
query63	309	283	275	275
query64	4988	1290	1024	1024
query65	
query66	1455	453	367	367
query67	24198	24272	24166	24166
query68	
query69	411	323	289	289
query70	930	940	876	876
query71	331	312	301	301
query72	2796	2712	2436	2436
query73	555	545	331	331
query74	9554	9602	9395	9395
query75	2858	2783	2471	2471
query76	2304	1027	669	669
query77	384	387	314	314
query78	10976	11093	10529	10529
query79	3079	807	565	565
query80	1792	627	557	557
query81	578	255	225	225
query82	1003	155	122	122
query83	333	265	246	246
query84	263	125	96	96
query85	926	496	454	454
query86	486	304	331	304
query87	3145	3117	3002	3002
query88	3573	2674	2671	2671
query89	428	366	346	346
query90	2089	177	166	166
query91	173	170	140	140
query92	84	76	74	74
query93	1455	809	501	501
query94	649	315	289	289
query95	595	340	326	326
query96	632	518	224	224
query97	2502	2476	2389	2389
query98	238	219	227	219
query99	1037	1006	928	928
Total cold run time: 253009 ms
Total hot run time: 169227 ms

…patibility

Master merged apache#61142, which removed the vectorized namespace. Update
references in inverted_index_iterator to use DataTypePtr/DataTypeString
directly to fix a BE UT compilation failure on CI.
@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26707 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4bfa7c5cffc38c4f23f5b8ed3dce047a54e14fda, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17577	4435	4292	4292
q2
q3	10641	765	527	527
q4	4683	367	251	251
q5	7573	1230	1043	1043
q6	182	174	148	148
q7	769	846	673	673
q8	9302	1495	1342	1342
q9	5012	4715	4666	4666
q10	6304	1929	1656	1656
q11	500	248	246	246
q12	766	576	459	459
q13	18040	2909	2177	2177
q14	225	243	214	214
q15
q16	766	741	657	657
q17	748	821	478	478
q18	6091	5397	5141	5141
q19	1152	990	626	626
q20	554	483	376	376
q21	4521	1851	1434	1434
q22	562	406	301	301
Total cold run time: 95968 ms
Total hot run time: 26707 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4823	4644	4756	4644
q2
q3	3845	4305	3879	3879
q4	918	1226	834	834
q5	4041	4407	4355	4355
q6	177	180	143	143
q7	1752	1663	1537	1537
q8	2472	2717	2623	2623
q9	7645	7536	7410	7410
q10	3789	3979	3628	3628
q11	506	429	423	423
q12	504	596	465	465
q13	2777	3273	2686	2686
q14	289	309	296	296
q15
q16	732	785	734	734
q17	1223	1388	1370	1370
q18	7279	6981	6711	6711
q19	965	975	973	973
q20	2067	2149	1971	1971
q21	3952	3552	3341	3341
q22	443	445	384	384
Total cold run time: 50199 ms
Total hot run time: 48407 ms

@doris-robot

TPC-DS: Total hot run time: 169627 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4bfa7c5cffc38c4f23f5b8ed3dce047a54e14fda, data reload: false

query5	4324	656	502	502
query6	339	248	226	226
query7	4220	481	282	282
query8	347	271	235	235
query9	8699	2734	2708	2708
query10	529	392	338	338
query11	6955	5110	4881	4881
query12	185	127	126	126
query13	1278	472	364	364
query14	5769	3715	3544	3544
query14_1	2868	2825	2794	2794
query15	220	195	180	180
query16	974	483	457	457
query17	934	738	641	641
query18	2454	458	363	363
query19	217	224	196	196
query20	134	129	127	127
query21	214	138	111	111
query22	13286	14071	14382	14071
query23	16114	15951	15785	15785
query23_1	16207	15937	15817	15817
query24	7521	1700	1229	1229
query24_1	1243	1239	1242	1239
query25	577	523	413	413
query26	1228	272	147	147
query27	2774	491	305	305
query28	4517	1848	1849	1848
query29	863	564	465	465
query30	300	226	194	194
query31	986	939	875	875
query32	85	69	73	69
query33	512	346	282	282
query34	936	882	531	531
query35	641	684	620	620
query36	1105	1092	949	949
query37	138	97	86	86
query38	2923	2931	2940	2931
query39	854	829	818	818
query39_1	791	809	796	796
query40	234	151	138	138
query41	63	60	59	59
query42	264	259	257	257
query43	248	254	231	231
query44	
query45	201	187	184	184
query46	881	986	617	617
query47	2121	2134	2077	2077
query48	369	347	238	238
query49	626	458	390	390
query50	686	292	222	222
query51	4125	4071	4027	4027
query52	267	270	261	261
query53	291	358	298	298
query54	315	292	280	280
query55	95	88	85	85
query56	333	329	328	328
query57	1948	1915	1723	1723
query58	286	278	277	277
query59	2786	2936	2764	2764
query60	349	341	336	336
query61	157	156	156	156
query62	624	596	545	545
query63	314	277	284	277
query64	5107	1272	982	982
query65	
query66	1479	449	365	365
query67	24294	24330	24204	24204
query68	
query69	415	304	296	296
query70	998	976	983	976
query71	359	323	314	314
query72	2897	2718	2655	2655
query73	556	559	332	332
query74	9643	9618	9423	9423
query75	2849	2771	2482	2482
query76	2290	1043	692	692
query77	373	397	340	340
query78	10957	11050	10490	10490
query79	2616	780	572	572
query80	1804	639	572	572
query81	555	261	228	228
query82	1002	153	119	119
query83	340	266	245	245
query84	300	118	106	106
query85	916	498	445	445
query86	407	335	282	282
query87	3118	3183	3033	3033
query88	3584	2678	2658	2658
query89	439	376	357	357
query90	2066	181	187	181
query91	174	160	145	145
query92	82	77	74	74
query93	1150	867	497	497
query94	659	317	266	266
query95	607	414	320	320
query96	650	541	226	226
query97	2424	2508	2394	2394
query98	238	220	215	215
query99	1001	989	915	915
Total cold run time: 252721 ms
Total hot run time: 169627 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.83% (19862/37597)
Line Coverage 36.29% (185307/510634)
Region Coverage 32.54% (143516/440988)
Branch Coverage 33.76% (62857/186212)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.43% (27021/36800)
Line Coverage 56.84% (289291/508924)
Region Coverage 54.26% (241475/445039)
Branch Coverage 55.94% (104442/186700)
