DAOS-19122 ddb: Fix flag name collision between global and subcommand parsers#18466
DAOS-19122 ddb: Fix flag name collision between global and subcommand parsers#18466knard38 wants to merge 12 commits into
Conversation
Fix test race condition where status was checked outside of the completion callback. Signed-off-by: Alexander A Oganezov <alexander.oganezov@hpe.com>
opts_add_pci_addr() allocates opts.pci_allowed via realloc() but daos_spdk_init() never frees it after calling spdk_env_init(). Add free(opts.pci_allowed) at the out: label so both the success and error paths release the buffer. _collect() allocates ctrlr->pci_addr via calloc() for each controller, but free_ctrlr_fields() freed all other heap fields (model, serial, fw_rev, vendor_id, pci_type) while omitting pci_addr. Add the missing free(). Two new unit tests are added to nvme_control_ut.c to verify under ASan that opts.pci_allowed and pci_addr are both freed on return. Also fix mock_spdk_nvme_ctrlr_get_pci_device() to use a static local pool instead of calloc(), which was itself leaking under ASan. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
It is unnecessary to evict the vos object from cache after related DTX committed; otherwise, other concurrent modification against the same object shard maybe required to retry. Signed-off-by: Fan Yong <fan.yong@hpe.com>
Retire these release branches now that 2.6.5 is released. Signed-off-by: Dalton Bohning <dalton.bohning@hpe.com>
Speedup ftest/nvme/io.py by: - Increasing the client processes - Reducing some aggregate file sizes Signed-off-by: Dalton Bohning <dalton.bohning@hpe.com>
|
Ticket title is 'ddb: |
Replace the two-statement var declaration + assignment with a single ':=' short variable declaration, as suggested in code review. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
… parsers go-flags processes the full argument list before dispatching to grumble. When a subcommand defines a flag with the same name as a global option (e.g. 'rm_pool --db_path', 'open -w', 'feature -s'), go-flags intercepts it as the global option and the subcommand never receives it. This was introduced as a regression by commit 86f31a2 (DAOS-18304), which added a validation that rejected any invocation where --db_path was set (globally, by go-flags) without --vos_path -- even when the user correctly passed --db_path as a subcommand-level argument. Fix: add flags.PassAfterNonOption to the go-flags parser so it stops consuming flags after the first positional argument (the subcommand name). Everything after the subcommand lands in RunCmdArgs and is forwarded to grumble's own flag parser. Also: - Move vosPathMissErr to ddb_commands.go (sole use site after this fix) and add dtxAggrMutuallyExclusiveErr and dtxAggrRequiredOptErr constants to eliminate raw string literals in Go-layer error messages. - Fix a typo in the CLI long description (--vos-path -> --vos_path) and accurately list commands that manage their own pool lifecycle. - Update the two TestDdb_parseOpts cases whose behavior changed with PassAfterNonOption (--help after a subcommand now lands in RunCmdArgs instead of being processed by go-flags). Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
- TestDdb_parseOpts: add two cases verifying PassAfterNonOption behavior:
- flags after the subcommand name are NOT consumed by go-flags (regression case)
- flags before the subcommand name are still consumed globally (validation still fires)
- TestDdb_runDdb: complete noAutoOpen coverage for all 8 commands in the
noAutoOpen list (close, prov_mem, dev_list, dev_replace were missing).
- TestDdb_Cmds: remove the now-obsolete skipCmdLine escape hatch and add
regression coverage for the fixed flag conflicts:
- open: long/short forms of -w/--write_mode and -p/--db_path now work
correctly in command-line mode
- rm_pool: --db_path after subcommand correctly reaches grumble
- prov_mem: -s/--tmpfs_size flag no longer consumed as global VosPath
Also update dtxAggr error assertions to use the new named constants.
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
- Add Go-layer input validation to the feature command Run handler: - Enforce that exactly one of --enable, --disable, --show is provided - Reject --db_path when no VOS path argument is given - Add featureOnlyOneOptErr constant and onlyOne() helper - Add LongHelp to the feature command documenting the validation rules - Replace featureFnCheckingShow with the more general featureFnChecking factory and add string2FlagsCapturing to verify enable/disable routing - Add full test coverage for all feature flags (short/long enable, disable, show, db_path) and all validation error paths Features: recovery Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
8a13591 to
c101fde
Compare
|
Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18466/1/execution/node/1039/log |
The syscall function signature changed, so suppressions needed to be updated. Also added a suppression related to an internal runtime context.Context implementation. Signed-off-by: Kris Jacque <kris.jacque@hpe.com>
|
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18466/2/execution/node/1369/log |
|
The two functional test failures of the build #2 are known issues:
|
| const dtxAggrRequiredOptErr = "'--cmt_time' or '--cmt_date' option has to be defined" | ||
| const featureOnlyOneOptErr = "exactly one of --enable, --disable, --show must be provided" | ||
|
|
||
| func onlyOne(bools ...bool) bool { |
There was a problem hiding this comment.
could this go in a generic utility file for reuse?
Worker processes spawned by _Dfs.parallel_list may raise exceptions that never reached the calling process. This results in indefinite hang during Dataset and IterableDataset construction with no surfaced error to the user. Replacing manual Process + Queue scheme and its queued/processed counter with a multiprocessing.Pool driven by imap_unordered. Pool re-raises worker exceptions in the parent when their results are consumed, so a worker error now propagates as a raised OSError instead of a deadlock, and the Pool context manager reaps all workers on any exit path. `concurrent.futures.ProcessPoolExecutor` would be even better but its initializer/initargs arguments are unavailable before Python 3.7, and the target runtime includes EL8.8 / Python 3.6. Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com>
…/daos-19122/patch-001 Features: recovery
daltonbohning
left a comment
There was a problem hiding this comment.
This PR seems to have a lot of commits already on master
I think, it is because I have recently merged the target PR with master. |
I think it's because the target branch |
|
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18466/3/execution/node/1311/log |
Description
Commit 86f31a2 (DAOS-18304) introduced a regression:
ddb rm_pool --db_path ...fails with:This breaks the
recovery/pool_list_consolidation.pytest (test_lost_majority_ps_replicas) on MD-on-SSD clusters, as observed in PR #17965 CI build 22.Root cause: go-flags processes the full command line before dispatching to grumble. Because
--db_pathis registered as a global option incliOptions, go-flags intercepts it — even when the user supplies it as a subcommand-level argument (e.g.ddb rm_pool --db_path ...). The new validation added by86f31a2bd3then rejects the invocation because the global--vos_pathwas not also provided. The same collision affectsopen(-w,-p),feature(-p,-s), andprov_mem(-s).Solution
This PR fixes the regression and improves the
featurecommand with better input validation, all covered by new unit tests. The PR is split into four focused commits, each independently buildable and testable.Commit 1 — Go style: idiomatic logger initialisation
Replace a two-statement
vardeclaration + assignment with a single:=short variable declaration, as suggested in code review of PR #18086.Commit 2 — Fix:
PassAfterNonOptionAdd
flags.PassAfterNonOptionto the go-flags parser. This makes go-flags stop consuming flags after the first positional argument (the subcommand name). Everything after the subcommand lands inRunCmdArgsand is forwarded to grumble's own flag parser, which correctly handles command-level flags.Also includes the two
TestDdb_parseOptscases whose expected behavior changed withPassAfterNonOption(see Side effects below), a doc fix (--vos-path→--vos_path), and error message constants fordtxAggr.Commit 3 — Regression tests
TestDdb_parseOpts: two new cases —--db_pathafter the subcommand is not consumed globally (core regression case);--db_pathbefore the subcommand still triggers thevosPathMissErrvalidation (correct misuse detection preserved).TestDdb_runDdb: completenoAutoOpencoverage for all 8 commands (close,prov_mem,dev_list,dev_replacewere previously missing).TestDdb_Cmds: removeskipCmdLine(now obsolete); add regression tests foropen(-w,--db_path),rm_pool(--db_path), andprov_mem(-s).Commit 4 —
featurecommand improvements--enable/--disable/--showrequired;--db_pathrequires a VOS path argument.featureOnlyOneOptErrconstant andonlyOne()helper.LongHelpdocumenting the validation rules.featureFnCheckingShowwith the more generalfeatureFnCheckingfactory; addstring2FlagsCapturingto verify enable/disable routing.Side effects
PassAfterNonOptionchanges one existing user-visible behavior: flags that appear after the subcommand name are now passed to grumble instead of go-flags. The most visible effect is thatddb <cmd> --helpnow shows the subcommand-specific help rather than the general ddb help:This is a correction of incorrect behavior — users naturally expect
ddb ls --helpto document thelscommand. Flags placed before the subcommand (e.g.ddb --vos_path /foo ls) continue to work as before.Steps for the author:
After all prior steps are complete: