More test flakes #8907

Merged
rustyrussell merged 7 commits into ElementsProject:master from rustyrussell:guilt/flakes29
Feb 20, 2026

@rustyrussell rustyrussell commented Feb 19, 2026

Mainly revealed by CI rework, which stresses things in different places.

Changelog-None: tests only.

xpay can get upset if askrene goes away first:

lightningd-1 2026-02-18T02:47:44.908Z **BROKEN** plugin-cln-xpay: askrene-create-layer failed with {"code":-32601,"message":"Unknown command 'askrene-create-layer'"}

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell added this to the v26.04 milestone Feb 19, 2026
…c" under valgrind.

Rather than playing whack-a-mole:

```
ERROR tests/test_misc.py::test_emergencyrecover - ValueError: 
Node errors:
 - lightningd-1: had BROKEN or That's weird messages
...
lightningd-1 2026-02-18T02:29:54.826Z UNUSUAL jsonrpc#76: That's weird: Request signpsbt took 7466 milliseconds
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…lish_closed_channels

We can have:
1. A slow test, so we're near the 60 second tolerance for bitcoind failures.
2. On shutdown, we fail and we hit the limit.

```
lightningd-2 2026-02-18T02:21:19.642Z **BROKEN** plugin-bcli: bitcoin-cli -regtest -datadir=/tmp/ltests-f3nd9ykw/test_reestablish_closed_channels_1/lightning-2/ -rpcclienttimeout=60 -rpcport=57403 -rpcuser=... -stdinrpcpass -stdin getblockhash 104 exited 1 (after 58 other errors) 'error: JSON value of type null is not of expected type number
lightningd-2 2026-02-18T02:21:19.642Z **BROKEN** plugin-bcli: '; we have been retrying command for --bitcoin-retry-timeout=60 seconds; bitcoind setup or our --bitcoin-* configs broken?
lightningd-2 2026-02-18T02:21:19.642Z INFO    plugin-bcli: Killing plugin: exited during normal operation
lightningd-2 2026-02-18T02:21:19.642Z **BROKEN** lightningd: The Bitcoin backend died.
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: FATAL SIGNAL 6 (version 7f635ff-modded)
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/common/daemon.c:46 (send_backtrace) 0x562ab3ef1307
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/common/daemon.c:83 (crashdump) 0x562ab3ef2758
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x7fd7e584532f
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: ./nptl/pthread_kill.c:44 (__pthread_kill_implementation) 0x7fd7e589eb2c
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: ./nptl/pthread_kill.c:78 (__pthread_kill_internal) 0x7fd7e589eb2c
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: ./nptl/pthread_kill.c:89 (__GI___pthread_kill) 0x7fd7e589eb2c
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: ../sysdeps/posix/raise.c:26 (__GI_raise) 0x7fd7e584527d
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7fd7e58288fe
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/lightningd/log.c:1128 (fatal_vfmt) 0x562ab3bc7675
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/lightningd/log.c:1138 (fatal) 0x562ab3bc77db
lightningd-2 2026-02-18T02:21:19.865Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/lightningd/bitcoind.c:27 (bitcoin_destructor) 0x562ab3a4ae63
lightningd-2 2026-02-18T02:21:19.866Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:246 (notify) 0x562ab40d5d52
lightningd-2 2026-02-18T02:21:19.866Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:437 (del_tree) 0x562ab40d6d5a
lightningd-2 2026-02-18T02:21:19.866Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:532 (tal_free) 0x562ab40d66f0
lightningd-2 2026-02-18T02:21:19.866Z **BROKEN** lightningd: backtrace: /home/runner/work/lightning/lightning/lightningd/plugin.c:469 (plugin_kill) 0x562ab3cd9112
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With the extra padding pings, we can get more!

```
        # Make sure the noise is within reasonable bounds
        assert tally['query_short_channel_ids'] <= 1
        assert tally['query_channel_range'] <= 1
>       assert tally['ping'] <= 3
E       assert 4 <= 3

tests/test_gossip.py:2396: AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This test restarts l2 twice.  Each time, l1 is reconnecting, and backs
off.  If the test is slow enough, the backoff gets extreme:

```
2026-02-19T02:13:03.7669982Z lightningd-1 2026-02-19T01:50:56.541Z DEBUG   033845802d25b4e074ccfd7cd8b339a41dc75bf9978a034800444b51d42b07799a-lightningd: peer_disconnected
2026-02-19T02:13:03.7670444Z lightningd-1 2026-02-19T01:50:56.547Z DEBUG   033845802d25b4e074ccfd7cd8b339a41dc75bf9978a034800444b51d42b07799a-connectd: Will try reconnect in 256 seconds
```

This isn't a bug!  The backoff caps at 300 seconds, and only gets
reset if we remain connected for that long.

A manual reconnect here not only fixes the flake, but makes the test
much faster, by not *doubling* the time for slow tests as shown on my
laptop (the final test using `taskset -c 1`):

           Normal      Valgrind      Valgrind, 1 CPU
Before:     22sec        124sec               230sec
After:      18sec        102sec               191sec

These are from a single run: it could be much more in the worst case.
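
The escalation can be illustrated with a minimal sketch. It assumes the delay starts at 1 second and doubles after each failed attempt, which is consistent with the 256 seconds in the log above but is not stated here; only the 300 second cap comes from the text. `next_backoff` and `backoff_sequence` are hypothetical names, not connectd functions.

```python
def next_backoff(current, cap=300):
    """Double the delay, clamped to the cap (300 seconds per the text above)."""
    return min(current * 2, cap)


def backoff_sequence(start=1, cap=300, attempts=10):
    """Delays for consecutive failed reconnects.

    ASSUMPTION: the 1 second start and the doubling step are illustrative.
    """
    delays = []
    delay = start
    for _ in range(attempts):
        delays.append(delay)
        delay = next_backoff(delay, cap)
    return delays


if __name__ == "__main__":
    print(backoff_sequence())
```

Under these assumptions the ninth consecutive attempt already waits 256 seconds, and every later one waits the full 300.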

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell force-pushed the guilt/flakes29 branch 2 times, most recently from 35c8be4 to 2effb16 Compare February 19, 2026 08:20
Unfortunately the effect of leaving Nagle enabled is subtle.  Here it
is in v25.12:

Normal: 
    tests/test_connection.py::test_no_delay PASSED
    ====================================================================== 1 passed in 13.87s

Nagle enabled:
    tests/test_connection.py::test_no_delay PASSED
    ====================================================================== 1 passed in 21.70s

So it's hard to both catch this issue and not have false positives.  Improve the
test by deliberately running with Nagle enabled, so we can do a direct comparison.
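
For context, Nagle's algorithm is controlled per socket by the standard `TCP_NODELAY` option: setting it disables Nagle. The sketch below only shows toggling that option from Python's `socket` module. `set_nagle` and `nagle_enabled` are hypothetical helpers, and nothing here describes how lightningd or the test itself switches modes.

```python
import socket


def set_nagle(sock, enabled):
    """Enable or disable Nagle's algorithm on a TCP socket.

    TCP_NODELAY=1 disables Nagle; TCP_NODELAY=0 leaves it enabled.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def nagle_enabled(sock):
    """Return True if Nagle's algorithm is active (TCP_NODELAY is unset)."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


if __name__ == "__main__":
    # No connection is needed just to set and read the option.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_nagle(s, enabled=False)
        print("Nagle enabled:", nagle_enabled(s))
```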

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Not sure how this happens, but it does, and may explain other races:

```
        line = l1.daemon.wait_for_log(f"plugin-all_notifications.py: notification pay_failure: ")
        dict_str = line.split("notification pay_failure: ", 1)[1]
>       data = ast.literal_eval(dict_str)

tests/test_plugin.py:4869: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/ast.py:64: in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

source = "{'origin': 'pay', 'payload': {'payment_hash': 'a9caff840abedac582c3e22b46f78d205d8"
filename = '<unknown>', mode = 'eval'

    def parse(source, filename='<unknown>', mode='exec', *,
              type_comments=False, feature_version=None):
        """
        Parse the source into an AST node.
        Equivalent to compile(source, filename, mode, PyCF_ONLY_AST).
        Pass type_comments=True to get back type comments where the syntax allows.
        """
        flags = PyCF_ONLY_AST
        if type_comments:
            flags |= PyCF_TYPE_COMMENTS
        if isinstance(feature_version, tuple):
            major, minor = feature_version  # Should be a 2-tuple.
            assert major == 3
            feature_version = minor
        elif feature_version is None:
            feature_version = -1
        # Else it should be an int giving the minor version for 3.x.
>       return compile(source, filename, mode, flags,
                       _feature_version=feature_version)
E         File "<unknown>", line 1
E           {'origin': 'pay', 'payload': {'payment_hash': 'a9caff840abedac582c3e22b46f78d205d8
E                                                         ^
E       SyntaxError: unterminated string literal (detected at line 1)

/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/ast.py:50: SyntaxError
...
lightningd-1 2026-02-19T03:32:31.425Z INFO    plugin-all_notifications.py: notification pay_failure: {'origin': 'pay', 'payload': {'payment_hash': 'a9caff840abedac582c3e22b46f78d205d87ff1212de05a64b12a6a53459bf29', 'bolt11': 'lnbcrt100n1p5edpz0sp5l5nj5p28kzm47d0tug2c2hz9q9gknd5sve6nwkf59fnpfvzecvcqpp54890lpq2hmdvtqkrug45daudypwc0lcjzt0qtfjtz2n22dzehu5sdq8v3jhxccxqyjw5qcqp9rzjqvuytqpdyk6wqaxvl47d3vee5swuwklej79qxjqqg394r4ptqaue5qqqvuqqqqgqqqqqqqqpqqqqqzsqqc9qxpqysgqvqzc5wd097xa6shdfkvr3yarvj36gu00ylfwdlfql9wkq3kueqgkk3pa3zdzd8a0de5vny0whzwjzrvhcuf86q6m9yakwjwjs3hzl9gpcf5lzn', 'error': {'message': 'failed: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS (reply from remote)'}}, 'pay_failure': {'payment_hash': 'a9caff840abedac582c3e22b46f78d205d87ff1212de05a64b12a6a53459bf29', 'bolt11': 'lnbcrt100n1p5edpz0sp5l5nj5p28kzm47d0tug2c2hz9q9gknd5sve6nwkf59fnpfvzecvcqpp54890lpq2hmdvtqkrug45daudypwc0lcjzt0qtfjtz2n22dzehu5sdq8v3jhxccxqyjw5qcqp9rzjqvuytqpdyk6wqaxvl47d3vee5swuwklej79qxjqqg394r4ptqaue5qqqvuqqqqgqqqqqqqqpqqqqqzsqqc9qxpqysgqvqzc5wd097xa6shdfkvr3yarvj36gu00ylfwdlfql9wkq3kueqgkk3pa3zdzd8a0de5vny0whzwjzrvhcuf86q6m9yakwjwjs3hzl9gpcf5lzn', 'error': {'message': 'failed: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS (reply from remote)'}}}
```
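
The failure mode is easy to reproduce in isolation: `ast.literal_eval` raises `SyntaxError` when handed only the first part of a dict literal. The sketch below is one hypothetical way for a caller to detect such a partial line. `parse_notification` is an illustrative name, and this is not necessarily the change made in this commit.

```python
import ast


def parse_notification(line, marker="notification pay_failure: "):
    """Return the dict logged after `marker`, or None if it is incomplete.

    ASSUMPTION: the payload is a Python dict repr, as in the traceback above.
    """
    payload = line.split(marker, 1)[1]
    try:
        return ast.literal_eval(payload)
    except (SyntaxError, ValueError):
        # A truncated line (e.g. cut off mid-string) cannot be parsed yet.
        return None


if __name__ == "__main__":
    partial = "plugin: notification pay_failure: {'origin': 'pay', 'payload': {'payment_hash': 'a9ca"
    print(parse_notification(partial))
```

A caller could treat `None` as "wait for the complete line and retry" rather than failing immediately.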

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell merged commit 823a575 into ElementsProject:master Feb 20, 2026
83 of 85 checks passed