Bug Type
logic (logic design issue)
Before submit
Environment
Expected & Actual behavior
While investigating the hstore CI failure in #2994, we found an existing latent issue in the HStore range-index scan path.
For range-index queries with limit, offset, or paging, HugeGraph's upper layer assumes that backend range scan results are returned in global range-index-key order. It also assumes that the returned PageState.position() can be reused as a HugeGraph range cursor.
However, HStore's multi-node/tablet scan path can return entries in backend iterator order instead of globally sorted key order. The page state is also an internal storage cursor, not necessarily a HugeGraph range-index key. This can make range-index queries return unstable ordering or skip valid entries when paging is involved.
One concrete failure exposed by #2994 was:
graph.traversal().V().hasLabel("person")
     .has("birth", P.between(date2013, date2016))
     .limit(2)
     .toList();
The expected range-index order is:
2013 -> 2014 -> 2015
But the HStore scan returned entries like:
2014 -> 2013 -> 2015
Then limit(2) selected the wrong first two entries.
Another paging-related failure showed that after the first page, the page position was an HStore internal cursor. Reusing it as the range scan start could skip valid range-index entries.
In #2994 we added a narrow workaround in GraphIndexTransaction: for HStore range-index queries whose visible result depends on limit, offset, or paging, the index layer reads the matched range-index entries, sorts them by range-index value, and slices them at the HugeGraph layer. Unbounded range-index scans still use the original streaming path to avoid disturbing count, joint-index, and cleanup paths.
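The sort-then-slice idea behind the workaround can be sketched roughly as follows. This is not the code from #2994: `RangeIndexSliceSketch`, `IndexEntry`, and `sortAndSlice` are hypothetical names used only for illustration, and the entry is reduced to an indexed value plus an element id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RangeIndexSliceSketch {

    // Hypothetical stand-in for a matched range-index entry:
    // the indexed value (e.g. a birth year) and the element it points to.
    static final class IndexEntry {
        private final long value;
        private final String elementId;

        IndexEntry(long value, String elementId) {
            this.value = value;
            this.elementId = elementId;
        }

        long value() { return this.value; }

        String elementId() { return this.elementId; }
    }

    // Sort all matched entries by index value (element id breaks ties),
    // then apply offset and limit on the sorted list instead of trusting
    // the order in which the backend returned them.
    static List<IndexEntry> sortAndSlice(List<IndexEntry> scanned,
                                         int offset, int limit) {
        List<IndexEntry> sorted = new ArrayList<>(scanned);
        sorted.sort(Comparator.comparingLong(IndexEntry::value)
                              .thenComparing(IndexEntry::elementId));
        int from = Math.min(Math.max(offset, 0), sorted.size());
        int to = (int) Math.min((long) from + Math.max(limit, 0),
                                sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

Applied to the failing case above, a scan order of 2014, 2013, 2015 with offset 0 and limit 2 yields the entries for 2013 and 2014. The cost is that every matched entry must be read before slicing, which is why this only suits the bounded queries described above.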
This workaround fixes the immediate user-visible correctness issue, but the lower-level contract is still unclear.
Expected behavior
HStore range scans should have a clear and reliable contract:
- If HugeGraph range-index scan semantics require ordered results, HStore should return globally sorted entries across node/tablet iterators.
- PageState.position() should have a well-defined meaning. It should be clear whether it is a backend-internal cursor or a HugeGraph key cursor.
- Range-index paging should not skip valid entries or depend on accidental backend iterator order.
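To make the "HugeGraph key cursor" reading concrete, here is a minimal sketch of paging where the cursor is the last range-index key returned. It is an illustration under simplifying assumptions, not HugeGraph or HStore code: `KeyCursorPagingSketch`, `Page`, and `fetchPage` are hypothetical, the index is modelled as an in-memory `TreeMap`, and each key maps to a single element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

final class KeyCursorPagingSketch {

    // Hypothetical page result: element ids plus the key to resume after,
    // or null when the range is exhausted.
    static final class Page {
        final List<String> elementIds;
        final Long nextCursor;

        Page(List<String> elementIds, Long nextCursor) {
            this.elementIds = elementIds;
            this.nextCursor = nextCursor;
        }
    }

    // Scan [startInclusive, endExclusive) in key order. A non-null cursor
    // must be a key returned by a previous page of the same range; the
    // scan then resumes strictly after it.
    static Page fetchPage(NavigableMap<Long, String> index,
                          long startInclusive, long endExclusive,
                          Long cursor, int pageSize) {
        NavigableMap<Long, String> rest = (cursor == null)
                ? index.subMap(startInclusive, true, endExclusive, false)
                : index.subMap(cursor, false, endExclusive, false);
        List<String> ids = new ArrayList<>();
        Long last = null;
        for (Map.Entry<Long, String> e : rest.entrySet()) {
            if (ids.size() == pageSize) {
                break;
            }
            ids.add(e.getValue());
            last = e.getKey();
        }
        boolean more = rest.size() > ids.size();
        return new Page(ids, more ? last : null);
    }
}
```

Because the cursor is itself a range-index key, resuming from it cannot skip entries as long as the scan is key-ordered. A backend-internal cursor gives no such guarantee unless the backend defines how it maps to key order, which is the ambiguity this issue asks to resolve.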
Possible fix direction
A more complete fix should probably be handled in the HStore store-client / scan iterator layer:
- define whether IdRangeQuery results must be globally ordered by key;
- merge multiple node/tablet iterators by key order when serving ordered range scans;
- separate backend-internal page cursor semantics from HugeGraph range-key cursor semantics;
- add HStore-specific regression tests for:
  - range index + limit;
  - range index + offset;
  - range index + paging across multiple pages;
  - cross-node/tablet range scans;
  - count / joint-index / left-index cleanup paths to avoid regressions.
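The merge step in the list above could look roughly like the following k-way merge. This is a sketch, not the HStore store-client API: `MergedRangeScanSketch` and `take` are hypothetical names, keys are simplified to `Long`, and it assumes each node/tablet iterator is already sorted by key on its own.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

final class MergedRangeScanSketch implements Iterator<Long> {

    // One heap slot: the next unread key of a source and the source itself.
    private static final class Head {
        final long key;
        final Iterator<Long> source;

        Head(long key, Iterator<Long> source) {
            this.key = key;
            this.source = source;
        }
    }

    // Min-heap ordered by key, holding at most one head per source.
    private final PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.key, b.key));

    // Assumption: every iterator yields its keys in ascending order.
    MergedRangeScanSketch(List<Iterator<Long>> tabletIterators) {
        for (Iterator<Long> it : tabletIterators) {
            if (it.hasNext()) {
                this.heap.add(new Head(it.next(), it));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !this.heap.isEmpty();
    }

    @Override
    public Long next() {
        Head head = this.heap.poll();
        if (head == null) {
            throw new NoSuchElementException();
        }
        // Refill the heap from the source that just produced the smallest key.
        if (head.source.hasNext()) {
            this.heap.add(new Head(head.source.next(), head.source));
        }
        return head.key;
    }

    // Helper for the example: read at most `limit` keys from an iterator.
    static List<Long> take(Iterator<Long> it, int limit) {
        List<Long> out = new ArrayList<>();
        while (it.hasNext() && out.size() < limit) {
            out.add(it.next());
        }
        return out;
    }
}
```

With two sources yielding 2014, 2016 and 2013, 2015, taking the first two merged keys gives 2013 and 2014. The merge stays streaming and only buffers one entry per source, so it would not need the full read-and-sort used by the workaround.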
Related context
This was exposed during #2994, but it does not seem to be caused by the query-condition refactoring itself. The PR only made the latent HStore issue visible in CI.
Vertex/Edge example (sample of the problematic vertex / edge data)
Schema [VertexLabel, EdgeLabel, IndexLabel] (metadata structure)