Introduces numa_maps metrics inside linux_proc pmda and PCP numastat '-p' option support #2527
Conversation
Hi @natoscott, thanks!
Local test with the numastat tool:
[root@localhost pcp]# pcp numastat -p 2 dbus
Linux 5.15.0-316.196.4.2.el8uek.x86_64 (localhost.localdomain) 03/13/26 x86_64 (2 CPU)
Timestamp : 03/13/2026 08:18:36
Per-node process memory usage (in MBs)
PID Node 0 Total
--------------------- --------------- ---------------
2 (kthreadd) 0.00 0.00
1740 (dbus-daemon) 6.70 6.70
5620 (libvirt-dbus) 12.71 12.71
2868087 (dbus-daemon) 5.79 5.79
--------------------- --------------- ---------------
Total 25.20 25.20
[root@ssosharm-ol8-1 pcp]# numastat -p 2 dbus
Per-node process memory usage (in MBs)
PID Node 0 Total
--------------------- --------------- ---------------
2 (kthreadd) 0.00 0.00
1740 (dbus-daemon) 6.70 6.70
5620 (libvirt-dbus) 12.70 12.70
2868087 (dbus-daemon) 5.80 5.80
--------------------- --------------- ---------------
Total 25.20 25.20

Detailed view of a process:
[root@localhost pcp]# numastat -p 2868087
Per-node process memory usage (in MBs) for PID 2868087 (dbus-daemon)
Node 0 Total
--------------- ---------------
Huge 0.00 0.00
Heap 0.16 0.16
Stack 0.02 0.02
Private 5.61 5.61
---------------- --------------- ---------------
Total 5.79 5.79
[root@localhost pcp]#
[root@localhost pcp]# pcp numastat -p 2868087
Linux 5.15.0-316.196.4.2.el8uek.x86_64 (localhost.localdomain) 03/13/26 x86_64 (2 CPU)
Timestamp : 03/13/2026 08:19:17
Per-node process memory usage (in MBs) for PID 2868087 (dbus-daemon)
Node 0 Total
--------------- ---------------
Huge 0.00 0.00
Heap 0.16 0.16
Stack 0.02 0.02
Private 5.61 5.61
---------------- --------------- ---------------
Total 5.79 5.79
(force-pushed 0f100d5 to 6614d5e)
natoscott left a comment:
LGTM - a few suggested changes in-line.
Implemented the -p option in the pcp numastat tool, matching the numactl numastat tool
- Add new per-process NUMA maps metrics derived from /proc/<pid>/numa_maps: proc.numa_maps.huge, proc.numa_maps.heap, proc.numa_maps.stack,
proc.numa_maps.private (per-node usage in MB, exported as nodeN:<mb> pairs).
- Extend pcp numastat/pcp-numastat with -p/--process to display per-process NUMA memory usage in a numastat -p-style layout.
- Convert N<node>=<pages> counts to MB using system page size (and Hugepagesize where applicable, with fallback).
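The conversion in the last bullet can be sketched as follows; this is an illustrative snippet (the function name and signature are hypothetical, not the PMDA source), assuming the page and hugepage sizes are supplied in bytes:

```c
#include <stddef.h>

/* Hypothetical sketch: convert an N<node>=<pages> count to megabytes.
 * Regular mappings use the system page size; lines flagged "huge" use
 * the hugepage size instead, falling back to the page size when
 * /proc/meminfo reported no Hugepagesize. */
double
pages_to_mb(unsigned long pages, size_t page_size, size_t huge_size, int is_huge)
{
    double unit = (double)page_size;

    if (is_huge && huge_size > 0)
        unit = (double)huge_size;
    return (pages * unit) / (1024.0 * 1024.0);
}
```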
How to Test
- pcp numastat -p <pid> (detail view) and pcp numastat -p <pid1> <pid2> (summary view)
Signed-off-by: Sourav Sharma <sourav.ss.sharma@oracle.com>
(force-pushed 6614d5e to 0ea70c7)
Hi @natoscott, I looked closely at your comments above and addressed them. I have made the changes and updated the source. Here's a quick diff for review, since the rebase made the Compare tab huge:

diff --git a/src/pcp/numastat/pcp-numastat.1 b/src/pcp/numastat/pcp-numastat.1
index 4804a5b2c3..2bd9638d47 100644
--- a/src/pcp/numastat/pcp-numastat.1
+++ b/src/pcp/numastat/pcp-numastat.1
@@ -75,7 +75,7 @@ operand is all digits) or regular expressions for command name matching.
Matching for command names is performed against the full command line
(as reported by \fBproc.psinfo.psargs\fR).
When a single process ID operand is provided and it matches exactly one
-process, a detailed table of Huge, Heap, Stack and Private mappings is
+process, a detailed table of Hugepage, Heap, Stack and Private mappings is
shown (as in \fBnumastat \-p\fR). When multiple process ID operands are
provided, a per-process summary table is shown (also like
\fBnumastat \-p\fR) and may be split into multiple node column blocks
diff --git a/src/pcp/numastat/pcp-numastat.py b/src/pcp/numastat/pcp-numastat.py
index 472ea07108..decdf124cc 100755
--- a/src/pcp/numastat/pcp-numastat.py
+++ b/src/pcp/numastat/pcp-numastat.py
@@ -91,14 +91,14 @@ PROCESS_METRICS = [
"proc.psinfo.pid",
"proc.psinfo.cmd",
"proc.psinfo.psargs",
- "proc.numa_maps.huge",
+ "proc.numa_maps.hugepage",
"proc.numa_maps.heap",
"proc.numa_maps.stack",
"proc.numa_maps.private",
]
PROCESS_NUMA_METRICS = [
- ("Huge", "proc.numa_maps.huge"),
+ ("Huge", "proc.numa_maps.hugepage"),
("Heap", "proc.numa_maps.heap"),
("Stack", "proc.numa_maps.stack"),
("Private", "proc.numa_maps.private"),
diff --git a/src/pmdas/linux_proc/help b/src/pmdas/linux_proc/help
index fd61456b4c..b1cfdfcfe7 100644
--- a/src/pmdas/linux_proc/help
+++ b/src/pmdas/linux_proc/help
@@ -75,12 +75,18 @@ kernel threads
@ proc.runq.kernel number of kernel threads
Instantaneous number of processes with virtual size of zero (kernel threads)
-@ proc.numa_maps.huge per-node hugepage mapped memory in MB (/proc/<pid>/numa_maps)
-Values are reported as comma-separated node/value pairs, e.g.
+@ proc.numa_maps.hugepage per-node hugepage mapped memory in MB (/proc/<pid>/numa_maps)
+Hugepage map values are reported as comma-separated node/value pairs, e.g.
node0:0.00,node1:4.00, where each value is in megabytes.
@ proc.numa_maps.heap per-node heap mapped memory in MB (/proc/<pid>/numa_maps)
+Heap map values are reported as comma-separated node/value pairs, e.g.
+node0:0.00,node1:4.00, where each value is in megabytes.
@ proc.numa_maps.stack per-node stack mapped memory in MB (/proc/<pid>/numa_maps)
+Stack map values are reported as comma-separated node/value pairs, e.g.
+node0:0.00,node1:4.00, where each value is in megabytes.
@ proc.numa_maps.private per-node private mapped memory in MB (/proc/<pid>/numa_maps)
+Private map values are reported as comma-separated node/value pairs, e.g.
+node0:0.00,node1:4.00, where each value is in megabytes.
@ proc.control.all.threads process indom includes threads
If set to one, the process instance domain as reported by pmdaproc
diff --git a/src/pmdas/linux_proc/indom.h b/src/pmdas/linux_proc/indom.h
index abe2596ab2..c5c00a6792 100644
--- a/src/pmdas/linux_proc/indom.h
+++ b/src/pmdas/linux_proc/indom.h
@@ -62,6 +62,7 @@ extern FILE *proc_statsfile(const char *, char *, int);
/* Generic globals setup during PMDA startup */
extern size_t _pm_system_pagesize;
+extern size_t _pm_system_hugepagesize;
extern long _pm_hertz;
/*
diff --git a/src/pmdas/linux_proc/pmda.c b/src/pmdas/linux_proc/pmda.c
index a531edaca6..b8cc860ebf 100644
--- a/src/pmdas/linux_proc/pmda.c
+++ b/src/pmdas/linux_proc/pmda.c
@@ -66,6 +66,7 @@ static int autogroup = -1; /* =1 autogroup enabled */
static unsigned int threads; /* control.all.threads */
static char * cgroups; /* control.all.cgroups */
size_t _pm_system_pagesize;
+size_t _pm_system_hugepagesize;
long _pm_hertz;
/*
@@ -1408,7 +1409,7 @@ static pmdaMetric metrictab[] = {
* numa_maps cluster
*/
-/* proc.numa_maps.huge */
+/* proc.numa_maps.hugepage */
{ NULL, { PMDA_PMID(CLUSTER_PID_NUMA_MAPS,0), PM_TYPE_STRING, PROC_INDOM,
PM_SEM_INSTANT, PMDA_PMUNITS(0,0,0,0,0,0)}},
/* proc.numa_maps.heap */
@@ -3595,9 +3596,9 @@ proc_fetchCallBack(pmdaMetric *mdesc, unsigned int inst, pmAtomValue *atom)
return 0;
switch(item) {
- case 0: /* proc.numa_maps.huge */
- atom->cp = proc_strings_lookup(entry->numa_maps.huge_id);
- break;
+ case 0: /* proc.numa_maps.hugepage */
+ atom->cp = proc_strings_lookup(entry->numa_maps.huge_id);
+ break;
case 1: /* proc.numa_maps.heap */
atom->cp = proc_strings_lookup(entry->numa_maps.heap_id);
@@ -3976,6 +3977,25 @@ proc_gidname_lookup(int gid)
return "";
}
+static size_t
+proc_hugepagesize(void)
+{
+ unsigned long huge_page_size_kb = 0;
+ char buf[128];
+ FILE *fs;
+
+ if ((fs = fopen("/proc/meminfo", "r")) == NULL)
+ return 0;
+
+ while (fgets(buf, sizeof(buf), fs)) {
+ if (sscanf(buf, "Hugepagesize: %lu kB", &huge_page_size_kb) == 1)
+ break;
+ }
+ fclose(fs);
+
+ return huge_page_size_kb * 1024;
+}
+
/*
* Initialise the agent (both daemon and DSO).
*/
@@ -3997,6 +4017,10 @@ proc_init(pmdaInterface *dp)
_pm_system_pagesize = atoi(envpath);
else
_pm_system_pagesize = getpagesize();
+ if ((envpath = getenv("PROC_HUGEPAGESIZE")) != NULL)
+ _pm_system_hugepagesize = atoi(envpath);
+ else
+ _pm_system_hugepagesize = proc_hugepagesize();
if ((envpath = getenv("PROC_STATSPATH")) != NULL)
proc_statspath = envpath;
if ((envpath = getenv("PROC_THREADS")) != NULL)
diff --git a/src/pmdas/linux_proc/proc_pid.c b/src/pmdas/linux_proc/proc_pid.c
index 36615a9bcf..2efb2ff29e 100644
--- a/src/pmdas/linux_proc/proc_pid.c
+++ b/src/pmdas/linux_proc/proc_pid.c
@@ -2568,9 +2568,6 @@ fetch_proc_pid_fdinfo(int id, proc_pid_t *proc_pid, int *sts)
#define PROCESS_PRIVATE_INDEX 3
#define PROCESS_CATEGORY_COUNT 4
-#define SMALL_BUF_SIZE 128
-
-#define KILOBYTE (1024.0)
#define MEGABYTE (1024.0 * 1024.0)
static const char *process_mem_tokens[] = {
@@ -2590,29 +2587,6 @@ typedef struct {
double values[PROCESS_CATEGORY_COUNT];
} numa_node_totals_t;
-static double
-get_huge_page_size_in_bytes(void)
-{
- static double cached_huge_page_size = -1.0;
- double huge_page_size_kb = 0.0;
- char buf[SMALL_BUF_SIZE];
- FILE *fs;
-
- if (cached_huge_page_size >= 0.0)
- return cached_huge_page_size;
-
- if ((fs = fopen("/proc/meminfo", "r")) == NULL)
- return 0.0;
-
- while (fgets(buf, sizeof(buf), fs)) {
- if (sscanf(buf, "Hugepagesize: %lf kB", &huge_page_size_kb) == 1)
- break;
- }
- fclose(fs);
- cached_huge_page_size = huge_page_size_kb * KILOBYTE;
- return cached_huge_page_size;
-}
-
static int
append_numa_maps_node(strbuf_t *b, int node_num, double value_mb)
{
@@ -2718,19 +2692,28 @@ parse_proc_numa_maps(proc_pid_entry_t *ep, size_t buflen, char *buf)
int node_count = 0;
int sts = 0;
char *cur = buf;
+ char *end;
- (void)buflen;
+ if (buf == NULL || buflen == 0)
+ return 0;
+ /*
+ * Ensure the proc buffer is NUL-terminated so the string routines below
+ * cannot read past the end. read_proc_entry() allocates len+1 bytes and
+ * uses buflen (len) for the bytes-read value passed here.
+ */
+ buf[buflen] = '\0';
+ end = buf + buflen;
- page_size_bytes = (double)sysconf(_SC_PAGESIZE);
+ page_size_bytes = (double)_pm_system_pagesize;
if (page_size_bytes <= 0.0)
page_size_bytes = 4096.0;
- huge_page_size_bytes = get_huge_page_size_in_bytes();
+ huge_page_size_bytes = (double)_pm_system_hugepagesize;
if (huge_page_size_bytes <= 0.0)
huge_page_size_bytes = page_size_bytes;
- while (cur && *cur) {
- char *nl = strchr(cur, '\n');
+ while (cur < end && *cur) {
+ char *nl = memchr(cur, '\n', (size_t)(end - cur));
char *tok, *saveptr = NULL;
int category;
diff --git a/src/pmdas/linux_proc/root_proc b/src/pmdas/linux_proc/root_proc
index 79c9765be6..d80915e9a9 100644
--- a/src/pmdas/linux_proc/root_proc
+++ b/src/pmdas/linux_proc/root_proc
@@ -459,7 +459,7 @@ proc {
}
proc.numa_maps {
- huge PROC:81:0
+ hugepage PROC:81:0
heap PROC:81:1
stack PROC:81:2
private PROC:81:3

Please take a look. Thanks for your time reviewing this code.
@sourav-sharma796 I notice the values have a trailing comma - could you fix that? There are also a few failing QA tests as a result of this change; I'll take a quick look at those.
Yes, I can fix this. I'll open up a PR asap. |
See: #2544 |
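For context, the usual way to avoid a trailing separator is to emit the comma before each pair after the first rather than after every pair. A hypothetical sketch of that pattern (illustrative only, not the actual #2544 change):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format per-node MB values into buf as
 * "node0:0.00,node1:4.00".  The comma is written before every pair
 * except the first, so no trailing comma can be produced.  Assumes
 * buf is large enough for the formatted output. */
int
format_node_values(char *buf, size_t buflen, const double *mb, int nodes)
{
    int i, len = 0;

    buf[0] = '\0';
    for (i = 0; i < nodes; i++)
        len += snprintf(buf + len, buflen - len, "%snode%d:%.2f",
                        i ? "," : "", i, mb[i]);
    return len;
}
```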
Another thing I forgot about - the overnight Coverity scan came back with this... It's benign since the kernel won't give us garbage (hopefully ;) - but we could easily guard against overflow here (e.g. just check the huge page size is less than something that'd overflow) - can you take a look?
Yes, I'm looking into it. I appreciate your efforts. |
Hi @natoscott, the following change should fix this:

--- a/src/pmdas/linux_proc/pmda.c
+++ b/src/pmdas/linux_proc/pmda.c
@@ -3993,7 +3993,16 @@ proc_hugepagesize(void)
}
fclose(fs);
- return huge_page_size_kb * 1024;
+ /* Guard against overflow (CID 502014) */
+ if (huge_page_size_kb > ULONG_MAX / 1024UL) {
+ /* Unexpected value from /proc - fail safely */
+ return 0;
+ }
+
+ return huge_page_size_kb * 1024UL;
}

What do you think of this?
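The same check can be expressed as a small standalone helper; a hypothetical self-contained version of the guard above:

```c
#include <limits.h>

/* Hypothetical sketch of the overflow guard: convert a Hugepagesize
 * value in kB to bytes, returning 0 when multiplying by 1024 would
 * overflow an unsigned long (unexpected /proc value - fail safely). */
unsigned long
hugepage_kb_to_bytes(unsigned long kb)
{
    if (kb > ULONG_MAX / 1024UL)
        return 0;
    return kb * 1024UL;
}
```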
@sourav-sharma796 LGTM |
Hi @natoscott , Thanks! |