-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-37098 Fix Spider test failures in network-less environments #4479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 10.11
Are you sure you want to change the base?
Conversation
6a5d77f to
3fcb625
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the other comment (not sure how to delete this one...) #4479 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If, as mentioned in the description https://jira.mariadb.org/browse/MDEV-37098 only two tests fail with warnings, then why not add mtr.add_suppression to these tests instead of changing the warning to info? Something like:
call mtr.add_suppression(".*\\[Warning\\] mariadbd: Can't get hardware address with error");
storage/spider/spd_table.cc
Outdated
| If we can't get the hardware address, we zero it out. | ||
| The spider_unique_id will then look like: -000000000000-PID- | ||
| This is still unique enough per-process to detect self-loops within | ||
| the same server instance, which is the primary purpose. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove
", which is the primary purpose"
because the detection is not mainly meant for same server - in fact the opposite might be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed those lines from the comment, they are actually not mandatory.
No, it is way more that two tests. In fact so many tests that MTR at default setting stop completely (10+ failures?). Logs are linked in the Jira. This was originally not a warning, but made into one in 734f10f. The commit message does not explain why make it a warning. The main change in that commit was to fix a crash with an addition two lines below, and the warning was just an extra change. As explained in the commit message the warning does not seem like something that should be a warning. |
mariadb-YuchenPei
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is way more that two tests. In fact so many tests that MTR at default setting stop completely (10+ failures?). Logs are linked in the Jira.
My apologies. I should have reread the comments.
This was originally not a warning, but made into one in 734f10f. The commit message does not explain why make it a warning.
I see. Do you know what happens if instead of sql_print_information you simply change the MYF(ME_WARNING) to MYF(ME_NOTE) or MYF(0) - would either fix the problem?
As an aside, maybe spider should do something like in my_uuid_init and generate a random id (see below) - will create a separate MDEV for that.
if (my_gethwaddr(mac))
{
uint i;
/*
Generating random "hardware addr"
Specs explicitly specify that node identifier should NOT
correlate with a clock_seq value, so we use a separate
randominit() here.
*/
/* purecov: begin inspected */
my_rnd_init(&uuid_rand, (ulong) (seed2+ now/2), (ulong) (now+rand()));
for (i=0; i < array_elements(uuid_suffix) -2 ; i++)
mac[i]= (uchar)(my_rnd(&uuid_rand)*255);
/* purecov: end */
}
I didn't try that at all as I think we do want the server admin (who controls what network interfaces are attached to the server) to see that their system does not have a network. I think the best place is what I did now. Putting a NOTE out to SQL users will 1) not be seen as it is even less than a warning 2) seen by users connecting to the server (although they must all have a local connection in this case). I think the server log is the most appropriate place to log issues related to server network config. The server ID string is long, and indeed your suggestion to just generate it randomly would work perfectly well as the chances of a collision with such a large number is very small. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after making the two suggested changes.
Please also target the PR to 10.11 instead of 10.6, given the fixversions of the MDEV is 10.11+
storage/spider/spd_table.cc
Outdated
| { | ||
| my_printf_error(ER_SPIDER_CANT_NUM, ER_SPIDER_CANT_STR1, MYF(ME_WARNING), | ||
| "get hardware address with error ", errno); | ||
| sql_print_information("Spider: Can't get hardware address with error %d", errno); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep my_printf_error but use ME_NOTE instead of ME_WARNING.
I checked and it uses sql_print_information under the hood:
/* in my_message_sql: */
if (MyFlags & ME_NOTE)
{
level= Sql_condition::WARN_LEVEL_NOTE;
func= sql_print_information;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to just doing ME_NOTE and rebased on 10.11 as requested.
When running tests in environments without a network interface (e.g.
Podman container launched with `--network=none`), Spider was not able to
retrieve a hardware address to generate a node ID. This triggered a
warning in the server log, causing MTR to fail multiple Spider tests due
to unexpected warnings and output result mismatches with:
[Warning] mariadbd: Can't get hardware address with error 2
Fix this by logging Spider hardware address errors to server log only
by setting it as a NOTE. This does not pollute the client output.
When`my_gethwaddr` fails, the code zeroes out the address buffer,
resulting in a `spider_unique_id` formatted like `-000000000000-PID-`,
which is fully valid and emitting warnings was a bit overkill to begin
with.
3fcb625 to
25750b8
Compare
Description (=git commit message body)
When running tests in environments without a network interface (e.g. Podman container launched with
--network=none), Spider was not able to retrieve a hardware address to generate a node ID. This triggered a warning in the server log, causing MTR to fail multiple Spider tests due to unexpected warnings and output result mismatches with:Fix this by logging Spider hardware address errors to server log only and does not pollute the client output.
When
my_gethwaddrfails, the code zeroes out the address buffer, resulting in aspider_unique_idformatted like-000000000000-PID-, which is fully valid and emitting warnings was a bit overkill to begin with.Release Notes
This is testing-only, probably not worth mentioning in release notes.
How can this PR be tested?
Run MTR inside a container that has no network as explained in the Jira.
Basing the PR against the correct MariaDB version
This behavior was introduced in 734f10f60155 on 10.5 but it is EOL, thus targeting fix olderst still maintained branch 10.6.
PR quality check