Conversation
This reverts commit 5de8cd7.
String v = uniqueKey.getType().toInternal(params.get(ID));
Term t = new Term(uniqueKey.getName(), v);
docId = searcher.getFirstMatch(t);
if (docId < 0) {
This is a slight change to the existing contract. Previously, if a docId was missing it would throw. Now I propose returning an empty response (more in line with how search works), to handle the distributed case more elegantly. It is theoretically a breaking change, although perhaps it's unlikely anyone was relying on this error? One way I could see this being a problem: someone assumes the doc field will be in the response whenever the response code is 200, without null-checking first. I suppose this could be problematic and perhaps warrants more care...
the change makes sense to me.
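The proposed contract change can be sketched in plain Java (hypothetical names, not the actual handler code): a missing unique key now yields an empty 200 response, mirroring a zero-hit search, instead of an exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocLookupSketch {
    // Hypothetical stand-in for searcher.getFirstMatch(t); returns -1 when absent.
    static int getFirstMatch(String id) {
        return "exists".equals(id) ? 42 : -1;
    }

    // Old contract: throw on a missing doc. New contract: return an empty map,
    // mirroring how a search with zero hits behaves.
    static Map<String, Object> lookup(String id) {
        Map<String, Object> rsp = new LinkedHashMap<>();
        int docId = getFirstMatch(id);
        if (docId >= 0) {
            rsp.put("doc", docId);
        }
        return rsp; // empty (but still a 200) when the id is not found
    }

    public static void main(String[] args) {
        System.out.println(lookup("exists").containsKey("doc")); // true
        System.out.println(lookup("missing").isEmpty());         // true
    }
}
```

The caller-side risk the comment raises is visible here: a client that assumes the `doc` key exists on any 200 response would hit a missing-key case rather than an error status.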
final String originalShardAddr;
final LukeResponse.FieldInfo originalFieldInfo;
private Object indexFlags;
private String indexFlagsShardAddr;
The source of the field info and the index flags can actually be different. If a shard has all the documents of a particular field deleted, it will have field info but without the index flags; those can theoretically come from yet another shard.
A different approach might be to simply get the "first doc" from the index instead of the "first live doc". I don't know if this has other implications, but it would eliminate this edge case. It's funny that we already touched first-live-doc logic here: #4157
How does the edge case result in difficulty? Meaning... difficulty for the client in interpreting the results, difficulty in explaining it, or difficulty in implementing this? I tend to find it useful to err on the side of optimizing for result explain-ability.
Mainly implementation difficulty. To start, this edge case only matters for the shard inconsistency validation, so if we don't care about that then there's no point in even tracking the actual shard that contributed the first index flags.
Assuming we do care for a second, the only reason I can think of that it would be advantageous to read index flags from the first live doc, as opposed to simply the first doc, is if you did something like this:
- Had some index flags set to x on field f
- Deleted all docs with field f
- Set index flags to y on field f
If you don't delete all docs with field f in step 2, you wouldn't be able to ingest any more docs because the indexing chain's new flags wouldn't match the existing indexed fields. That said, I don't think I've actually tried this, so I am not sure the steps above would produce a valid, writeable index either. That's why I was posing the question of whether it matters if the doc is deleted or not.
If we're not sure it matters, let's not over-complicate this. Do the simple thing and ignore "live docs". We're not boxed in from realizing later we'd like something more complicated.
I did a little more testing on this and my initial theory that this distinction only exists because we get the first live doc rather than the first doc is incorrect. It turns out a fully deleted field will lack terms even though it does still have schema flags. If a field lacks terms then we can't even get any doc for it (live or not). A workaround for this would be to ignore all fields that don't have terms (that should have terms) but this can get complicated for other reasons. As such, I am leaning toward keeping it the way it is currently.
// docId is a Lucene-internal integer, not meaningful across shards
if (reqParams.getInt(DOC_ID) != null) {
  throw new SolrException(
The current API returns a single doc in the case of show=doc, so requesting a Lucene docId in distributed mode makes no sense.
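The rule above can be sketched as a plain-Java validation (hypothetical method name and exception type, not the handler's actual code): a Lucene-internal docId is only meaningful on a single core, so a distributed request rejects it up front.

```java
public class DocIdParamCheck {
    // Reject a shard-local docId when the request is distributed; the caller
    // should use the schema's uniqueKey id instead.
    static String validate(boolean distrib, Integer docId) {
        if (distrib && docId != null) {
            throw new IllegalArgumentException(
                "docId is not supported with distrib=true; use the uniqueKey id instead");
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(validate(false, 5));   // local lookup by docId is fine
        System.out.println(validate(true, null)); // distributed lookup by id is fine
        try {
            validate(true, 5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // distributed lookup by docId fails fast
        }
    }
}
```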
String[] shards = rb.shards;
if (shards == null || shards.length == 0) {
  return false;
The distributed response includes extra fields, so delegating to the non-distributed API in the single-shard case is a bit annoying API-wise because the client may have to handle a different response structure per, say, collection or cloud.
// Deal with the chance that the first bunch of terms are in deleted documents. Is there a
// better way?
- for (int idx = 0; idx < 1000 && postingsEnum == null; ++idx) {
+ for (int idx = 0; idx < 1000; ++idx) {
 * different Lucene FieldInfo (and thus different index flags strings) on each shard.
 */
@Test
public void testInconsistentIndexFlagsAcrossShards() throws Exception {
Is this test too brittle and possibly overkill?
If we are ok sweeping inconsistencies under the rug we can remove the validation the distributed luke handler does and remove this test. If we remove this test without removing the validation some important code paths of the validation won't be hit in tests. I guess I wanted to showcase how you can get into odd states in Solr with the right (or wrong) sequence of allowed operations.
String schema;
int docs;
int distinct;
Long docsAsLong;
A bit sad I had to add this. It would have been nice to be able to change docs. If I got this in before Solr 10 it could have been possible?
Oh right, I want to note that (thanks to my efforts) Solr knows the client's version! So we could actually gate the response based on that: req.getHttpSolrCall().getUserAgentSolrVersion()
solr/solr-ref-guide/modules/indexing-guide/pages/luke-request-handler.adoc
To get a collection-wide view:

[source,text]
http://localhost:8983/solr/techproducts/admin/luke?distrib=true

To get detailed field statistics across all shards for a specific field:

[source,text]
http://localhost:8983/solr/techproducts/admin/luke?distrib=true&fl=manu
This distinction reminds me of a direction I want to set for SolrCloud: distrib vs local should be based on the request -- how the destination is addressed. If you address a collection, you thus get the whole thing by default (distrib=true implied), but if you address a specific core, well, you just get that core (distrib=false implied). Just putting this out there for your consideration. WDYT of going this way in this PR?
My only worry is backwards compatibility. The response structure has changed for the distributed case because certain data is not amenable to aggregation. Thus I wanted distrib=true to be an opt-in to not break users who rely on the old structure. Yes, they currently call the collection API and only get one random shard's data but that is exactly how I've been using it until this point (called it at the collection level and still used whatever shard it gave me in the response). Other than that what you said makes sense.
ehh... I think in a 10.1 we can change something of this nature. It's a matter of defaults with a clear way for a user to get the old behavior.
Is there an existing pattern of what you are describing, or is this something new? I couldn't find anything quite like this in existing Solr handlers. Also, is this something that is easier to do specifically with V2HttpCall? I wasn't able to find anything in the request context that could easily distinguish what the original path was (in v1 at least). I'm kind of assuming this would require a routing-layer change, but before I dive in that direction I wanted to see if there was something I was missing.
Ok I do see there is a prefix in V2HttpSolrCall that could be used for this. But I am not sure I see the v1 equivalent.
solr/solr-ref-guide/modules/indexing-guide/pages/luke-request-handler.adoc
solr/solrj/src/java/org/apache/solr/client/solrj/response/LukeResponse.java
return n != null ? n.longValue() : null;
}

public Integer getNumDocs() {
is the back-compat really needed?
If we make this a long it would potentially lead to compilation failures for users of solrj across minor versions. I imagine someone somewhere has:
int docs = lukeResponse.getNumDocs();
and I didn't want to break them on a non-major-version upgrade. But I defer to your intuition here.
That's minor enough for a 10.1 change. What I'd like to retain is a SolrJ 10.0 JAR using default javabin to talk to Solr 10.1. XML & JSON consistency as well but that is not expected to change with Integer vs Long... it's javabin that's the risk (if I recall).
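A minimal sketch of the compatibility shim under discussion (hypothetical class, not the actual LukeResponse): keep an Integer-returning accessor so existing callers compile, and expose the shard-aggregated total as a Long.

```java
public class FieldInfoCompatSketch {
    // Aggregated collection-wide doc count, which may exceed Integer.MAX_VALUE.
    private final Long docsAsLong;

    FieldInfoCompatSketch(Long docsAsLong) {
        this.docsAsLong = docsAsLong;
    }

    // Retained for source compatibility; may truncate for very large
    // collection-wide counts.
    public Integer getDocs() {
        return docsAsLong == null ? null : docsAsLong.intValue();
    }

    // New accessor: safe for aggregated counts across many shards.
    public Long getDocsAsLong() {
        return docsAsLong;
    }

    public static void main(String[] args) {
        FieldInfoCompatSketch fi = new FieldInfoCompatSketch(3_000_000_000L);
        System.out.println(fi.getDocsAsLong()); // the exact long count
        int docs = fi.getDocs(); // the legacy int-typed call pattern still compiles
        System.out.println(docs);
    }
}
```

The point of keeping the Integer accessor is exactly the `int docs = ...` line: changing the return type to `Long` would break that assignment at compile time on a minor-version upgrade.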
/** Per-field accumulation state across shards: aggregated response data and field validation. */
private static class AggregatedFieldData {
  final SimpleOrderedMap<Object> aggregated = new SimpleOrderedMap<>();
key is what? (add a comment)
  }
}

private static class ShardData {
for NamedLists/Maps, please ensure the "key" is clear... like just a comment on the same line is sufficient
String currentStr = currentVal != null ? currentVal.toString() : null;
String expectedStr = expectedVal != null ? expectedVal.toString() : null;
if (!Objects.equals(currentStr, expectedStr)) {
  throw new SolrException(
Hopefully this basically never happens. I haven't thought about it much but it'd suck to be investigating and see this exception, completely hosing the request overall, concealing info I need to investigate.
I wanted to err on the side of correctness, with a clear error that indicates which field+shard combination failed validation. This code has the unique ability to check for distributed corruption which, although rare, is possible with some degenerate sequence of steps (I show one such case in the tests). To address your concern about request hosing: you could still call this in distrib=false mode to get each shard's view. It's just that when aggregating I felt bad assuming consistency. OTOH it is a lot of work to test this validation code, so maybe it's not worth it.
I don't mean to suggest sweeping this problem under the rug. But maybe add as a header in all-caps and log a warning as well. OTOH, what you said about doing distrib=false seems like a perfectly fine work-around for a user in this case. I'm fine either way.
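The validation being debated can be sketched in plain Java (hypothetical method and exception type; the PR's code throws SolrException): compare the string forms of a field property across shards and fail with an error naming the field and shard, so the mismatch is actionable.

```java
import java.util.Objects;

public class ShardConsistencyCheck {
    // Compare a field property's string representation against the value seen
    // from an earlier shard; mismatches indicate cross-shard inconsistency.
    static void validate(String field, String shard, Object current, Object expected) {
        String currentStr = current != null ? current.toString() : null;
        String expectedStr = expected != null ? expected.toString() : null;
        if (!Objects.equals(currentStr, expectedStr)) {
            throw new IllegalStateException(
                "Inconsistent value for field '" + field + "' from shard " + shard
                    + ": got '" + currentStr + "', expected '" + expectedStr + "'");
        }
    }

    public static void main(String[] args) {
        // Consistent index-flags strings across shards: passes silently.
        validate("manu", "shard2", "ITS-------", "ITS-------");
        try {
            // Mismatched flags: the error names the field and shard.
            validate("manu", "shard2", "ITS-------", "I-S-------");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The suggestion in the thread (a warning header plus a log line instead of failing the whole request) would replace the throw with a non-fatal annotation of the response.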
import org.junit.BeforeClass;
import org.junit.Test;

public class LukeRequestHandlerDistribTest extends SolrCloudTestCase {
AIs, and humans for that matter, may not know that Solr has extensive test infrastructure for distributed search: BaseDistributedSearchTestCase, which compares distributed with non-distributed results for equivalence. It hasn't aged very well but it's a brilliant approach IMO, so not using it warrants a justification. BTW, take note of its @since solr 1.5, thus pre-dating SolrCloud :-)
Yes BaseDistributedSearchTestCase is great. I've used it quite a bit before (on custom extension code that had some shard response merging logic as well as in the saved search PR which initially also had custom response merging logic). The implicit distributed v non-distributed response comparison is super useful. I would've certainly used it if I wrote this test from scratch as I'm more familiar with it than with SolrCloudTestCase. I went line by line but apparently I still missed little things like this!
…handler.adoc Co-authored-by: David Smiley <dsmiley@apache.org>
…nto distributed-luke
https://issues.apache.org/jira/browse/SOLR-8127
Description
Currently, `LukeRequestHandler` fetches its data from the local shard handling the request, thus all the statistics it produces are for a single shard only. This is a bit misleading for a multi-shard cloud because an API that exists at the collection level produces a shard-specific response. The solution proposed here distributes the request to one replica from each shard and aggregates the responses whenever possible. I introduce a `shards` response section for data that can't be aggregated, or at least cannot be practically aggregated.

Solution
I used Claude Opus 4.6 to generate the implementation with many manual refinements/iterations.
Design Decisions
One-replica-per-shard aggregation
The distributed Luke handler queries one replica from each shard and aggregates the results. An alternative approach would be to query every replica in the cluster, which could be useful for inspecting low-level Lucene characteristics (directory paths, segment files, version numbers). This implementation takes the simpler one-replica-per-shard approach to match how other distributed handlers in Solr work and to keep the scope manageable. Querying all replicas can be considered as a future enhancement if there's community interest.
distrib=false as default
The handler defaults to distrib=false (local mode), preserving backward compatibility since the response structure changes for distrib=true. When distrib=true, the response adds a shards section containing per-shard data that is either not mergeable or difficult to merge. This includes full index metadata (directory, segmentsFile, version, userData, lastModified) and detailed per-field statistics (topTerms, distinct, histogram).
docsAsLong field in LukeResponse
The existing docs field in LukeResponse.FieldInfo is an int. When summing document counts across many shards, this can overflow. A new docsAsLong accessor was added to LukeResponse to handle collection-wide doc counts safely. The original docs field is preserved for backward compatibility.
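The overflow concern can be shown with plain Java arithmetic (hypothetical shard counts):

```java
public class DocCountOverflow {
    public static void main(String[] args) {
        // Two hypothetical shards, each with a doc count near Integer.MAX_VALUE.
        int shard1Docs = 2_000_000_000;
        int shard2Docs = 2_000_000_000;

        // Summing into an int wraps around to a negative number.
        int intSum = shard1Docs + shard2Docs;
        System.out.println(intSum < 0); // true: the 32-bit sum overflowed

        // Accumulating into a long keeps the collection-wide count correct,
        // which is what the new docsAsLong accessor exposes.
        long longSum = (long) shard1Docs + shard2Docs;
        System.out.println(longSum); // 4000000000
    }
}
```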
show=doc lucene response has shard-local docFreq
When looking up a document with distrib=true, the response includes both a high-level solr section (stored fields) and a low-level lucene section (per-field index analysis including docFreq). The docFreq values in the lucene section are not aggregated across shards; they reflect only the shard where the document was found. Aggregating docFreq across shards would require additional fan-out requests to every shard for each term, which is expensive, and I'm not sure what the demand is for this kind of logic. I acknowledge that it may feel inconsistent since the `docs` logic per field does aggregate across shards.

distrib=true with show=doc only supports the id request param, not docId

Lucene document IDs (docId) are internal to each shard's index and have no meaning across shards, i.e. the same docId value refers to different documents on different shards. Only the logical id parameter (using the schema's uniqueKeyField) is supported for distributed doc lookup, since it can be unambiguously resolved across the cluster. Passing docId with distrib=true returns a clear error. Additionally, if the same id is found on multiple shards (indicating index corruption), the handler returns an error rather than silently picking one.
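The duplicate-id rule can be sketched in plain Java (hypothetical method, not the actual handler code): fan out the id lookup, and treat a doc found on more than one shard as corruption rather than silently picking a winner.

```java
import java.util.List;

public class UniqueIdFanOutCheck {
    // Resolve which shard holds the doc for a uniqueKey id. Zero shards means
    // an empty response (like a zero-hit search); more than one means the
    // index is inconsistent and the request fails loudly.
    static String resolve(String id, List<String> shardsWithDoc) {
        if (shardsWithDoc.isEmpty()) {
            return null; // not found anywhere
        }
        if (shardsWithDoc.size() > 1) {
            throw new IllegalStateException(
                "id '" + id + "' found on multiple shards " + shardsWithDoc
                    + "; index may be corrupt");
        }
        return shardsWithDoc.get(0);
    }

    public static void main(String[] args) {
        System.out.println(resolve("SOLR1000", List.of("shard2"))); // shard2
        System.out.println(resolve("missing", List.of()));          // null
        try {
            resolve("dup", List.of("shard1", "shard2"));
        } catch (IllegalStateException e) {
            System.out.println("corruption detected");
        }
    }
}
```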
Single-shard fallback to local mode
When distrib=true is set on a collection with only one shard, the handler falls back to local mode returning the standard local Luke response without the shards field. This matches a common pattern in other Solr handlers. However, this creates a response structure inconsistency: clients parsing a distributed response need to handle two shapes depending on shard count. For a single-shard collection, per-shard index metadata (like directory, segmentsFile) appears at the top level; for multi-shard collections, it appears under shards. This may be annoying for client code that needs to look in two places for the same data. An alternative would be to always use the distributed response structure when distrib=true is explicitly requested, even for single-shard collections. This is an open question worth community input.
Tests
Added `LukeRequestHandlerDistribTest` to test this feature. I have also built this locally to ensure it works E2E.

Checklist
Please review the following and check all that apply:
- I have developed this patch against the `main` branch.
- I have run `./gradlew check`.