fibre_channel: fix crash when attempting to dereference invalid counters #3573

Open
eaibmz wants to merge 1 commit into prometheus:master from eaibmz:collector-fibrechannel-crash-fix

eaibmz commented Mar 5, 2026

The statistics counters of a disabled FC host cannot be read from sysfs; any read attempt fails with errno ENOENT. As a result, every statistics counter that procfs returns for such a host is a nil pointer, and node_exporter crashes when it dereferences one. This happens on s390x Linux when a zFCP host is disabled with `chzdev -d <host id>`.

Crash stack trace in GDB

$ gdb /root/node_exporter/node_exporter
...
Thread 4 "node_exporter" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 35579]
0x00000000007be86c in github.com/prometheus/node_exporter/collector.(*fibrechannelCollector).Update (c=0xc00012fef0, ch=0xc0002fc8c0, ~r0=...)
    at /root/node_exporter/collector/fibrechannel_linux.go:133
133                     c.pushCounter(ch, "dumped_frames_total", *host.Counters.DumpedFrames, *host.Name)
(gdb) bt
    at /root/node_exporter/collector/fibrechannel_linux.go:133
(gdb) p host
$1 = {Name = 0xc00038faf0, Speed = 0xc00038fb00, PortState = 0xc00038fb10, PortType = 0xc00038fb30, SymbolicName = 0xc00038fba0, NodeName = 0xc00038fb40, PortID = 0xc00038fb50,
  PortName = 0xc00038fb60, FabricName = 0xc00038fb80, DevLossTMO = 0xc00038fb90, SupportedClasses = 0xc00038fbb0, SupportedSpeeds = 0xc00038fbd0, Counters = 0xc0001cb2d0}
(gdb) p host.Name
$2 = (string *) 0xc00038faf0
(gdb) p *host.Name
$3 = "host0"
(gdb) p *host.Counters.DumpedFrames
❌️ Cannot access memory at address 0x0
(gdb) p *host.Counters
$4 = {DumpedFrames = 0x0, ErrorFrames = 0x0, InvalidCRCCount = 0x0, RXFrames = 0x0, RXWords = 0x0, TXFrames = 0x0, TXWords = 0x0, SecondsSinceLastReset = 0x0, InvalidTXWordCount = 0x0,
  LinkFailureCount = 0x0, LossOfSyncCount = 0x0, LossOfSignalCount = 0x0, NosCount = 0x0, FCPPacketAborts = 0x0}
(gdb) bt
    at /root/node_exporter/collector/fibrechannel_linux.go:133
    name="fibrechannel", c=..., ch=0xc0002fc8c0, logger=0xc0002104a0)
    at /root/node_exporter/collector/collector.go:160
(gdb) p *host.Counters
$5 = {DumpedFrames = 0x0, ErrorFrames = 0x0, InvalidCRCCount = 0x0, RXFrames = 0x0,
  RXWords = 0x0, TXFrames = 0x0, TXWords = 0x0, SecondsSinceLastReset = 0x0,
  InvalidTXWordCount = 0x0, LinkFailureCount = 0x0, LossOfSyncCount = 0x0,
  LossOfSignalCount = 0x0, NosCount = 0x0, FCPPacketAborts = 0x0}
(gdb)
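
The failure mode in the session above is a plain nil-pointer dereference, so one way to avoid it is to check each counter pointer before emitting it. The following is a minimal sketch of that idea, not the actual patch: `hostCounters` and `pushIfSet` are hypothetical names invented for illustration, only two of the counter fields are modelled, and the metric name `error_frames_total` is illustrative.

```go
package main

import "fmt"

// hostCounters is a hypothetical stand-in for the procfs counters struct:
// each field is nil when the matching sysfs attribute could not be read.
type hostCounters struct {
	DumpedFrames *uint64
	ErrorFrames  *uint64
}

// pushIfSet is a hypothetical helper. It forwards the counter to emit only
// when the pointer is non-nil and reports whether a value was emitted.
func pushIfSet(emit func(name string, value uint64), name string, value *uint64) bool {
	if value == nil {
		return false // attribute unreadable (e.g. disabled host): skip it
	}
	emit(name, *value)
	return true
}

func main() {
	errFrames := uint64(7)
	// A host where only one counter is readable.
	c := hostCounters{DumpedFrames: nil, ErrorFrames: &errFrames}

	emit := func(name string, value uint64) {
		fmt.Printf("%s = %d\n", name, value)
	}

	pushIfSet(emit, "dumped_frames_total", c.DumpedFrames) // skipped, no panic
	pushIfSet(emit, "error_frames_total", c.ErrorFrames)   // emitted
}
```

With a guard like this, a disabled host simply exposes no statistics series instead of taking the whole exporter down.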

eaibmz force-pushed the collector-fibrechannel-crash-fix branch 2 times, most recently from e099a34 to 5d36126 on March 5, 2026 07:49

Signed-off-by: Alexander Egorenkov <eaibmz@gmail.com>
eaibmz force-pushed the collector-fibrechannel-crash-fix branch from 5d36126 to 5bb3fb4 on March 5, 2026 07:53

discordianfish (Member) left a comment

I think this should be fixed in procfs so that it returns empty (non-nil) Counters.

eaibmz (Author) commented Mar 23, 2026

> I think this should be fixed in procfs to return empty (non-nil) Counters

In particular, PR prometheus/procfs#623 introduced references (pointers) to
distinguish between non-existent values and 0 values.
I don't think we want that undone?
