
Introduces numa_maps metrics inside linux_proc pmda and PCP numastat '-p' option support #2527

Merged
natoscott merged 1 commit into performancecopilot:main from sourav-sharma796:add_numa_metrics
Mar 18, 2026

Conversation

@sourav-sharma796
Contributor

Implemented the -p option in the pcp numastat tool, matching the numactl numastat tool

  • Add new per-process NUMA maps metrics derived from /proc/<pid>/numa_maps: proc.numa_maps.huge, proc.numa_maps.heap, proc.numa_maps.stack, proc.numa_maps.private (per-node usage in MB, exported as nodeN:<mb> pairs).
  • Extend pcp numastat/pcp-numastat with -p/--process to display per-process NUMA memory usage in a numastat -p-style layout.
  • Convert N<node>=<pages> counts to MB using the system page size (and Hugepagesize where applicable, with a fallback).
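The page-to-MB conversion described in the last bullet can be sketched roughly as follows (a hypothetical standalone helper, not the PR's actual code, which parses one `N<node>=<pages>` token as found in /proc/<pid>/numa_maps):

```c
#include <stdio.h>

/* Hypothetical helper: convert one "N<node>=<pages>" token from
 * /proc/<pid>/numa_maps into megabytes, given a page size in bytes.
 * On success the node number is stored via *node and the size in MB
 * is returned; a malformed token yields -1.0. */
double
numa_token_to_mb(const char *tok, unsigned long pagesize, int *node)
{
    unsigned long pages;

    if (sscanf(tok, "N%d=%lu", node, &pages) != 2)
        return -1.0;
    return (double)pages * (double)pagesize / (1024.0 * 1024.0);
}
```

With a 4 KB page size, a token like `N0=1024` maps to 4.00 MB on node 0; for hugepage mappings the PR substitutes Hugepagesize for the system page size.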

How to Test

  • pcp numastat -p <pid> (detail view) and pcp numastat -p <pid1> <pid2> (summary view)

@sourav-sharma796
Contributor Author

Hi, @natoscott
Please take a look at this new enhancement.

Thanks!

@sourav-sharma796 sourav-sharma796 changed the title Introduces numa_maps metrics inside linux_proc pmda Introduces numa_maps metrics inside linux_proc pmda and PCP numastat '-p' option support Mar 13, 2026
@sourav-sharma796
Contributor Author

sourav-sharma796 commented Mar 13, 2026

Local test with numastat tool

[root@localhost pcp]# pcp numastat -p 2 dbus
Linux  5.15.0-316.196.4.2.el8uek.x86_64  (localhost.localdomain)  03/13/26  x86_64    (2 CPU)
Timestamp            : 03/13/2026 08:18:36
Per-node process memory usage (in MBs)
PID                             Node 0            Total
---------------------  ---------------  ---------------
2 (kthreadd)                      0.00             0.00
1740 (dbus-daemon)                6.70             6.70
5620 (libvirt-dbus)              12.71            12.71
2868087 (dbus-daemon)             5.79             5.79
---------------------  ---------------  ---------------
Total                            25.20            25.20

[root@ssosharm-ol8-1 pcp]# numastat -p 2 dbus

Per-node process memory usage (in MBs)
PID                             Node 0           Total
---------------------  --------------- ---------------
2 (kthreadd)                      0.00            0.00
1740 (dbus-daemon)                6.70            6.70
5620 (libvirt-dbus)              12.70           12.70
2868087 (dbus-daemon)             5.80            5.80
---------------------  --------------- ---------------
Total                            25.20           25.20

Detailed view of a process

[root@localhost pcp]# numastat -p 2868087

Per-node process memory usage (in MBs) for PID 2868087 (dbus-daemon)
                           Node 0           Total
                  --------------- ---------------
Huge                         0.00            0.00
Heap                         0.16            0.16
Stack                        0.02            0.02
Private                      5.61            5.61
----------------  --------------- ---------------
Total                        5.79            5.79
[root@localhost pcp]#
[root@localhost pcp]# pcp numastat -p 2868087
Linux  5.15.0-316.196.4.2.el8uek.x86_64  (localhost.localdomain)  03/13/26  x86_64    (2 CPU)
Timestamp            : 03/13/2026 08:19:17
Per-node process memory usage (in MBs) for PID 2868087 (dbus-daemon)
                           Node 0            Total
                  ---------------  ---------------
Huge                         0.00             0.00
Heap                         0.16             0.16
Stack                        0.02             0.02
Private                      5.61             5.61
----------------  ---------------  ---------------
Total                        5.79             5.79

Member

@natoscott natoscott left a comment

LGTM - few suggested changes in-line.

Implemented option -p in pcp numastat tool same as numactl numastat tool

  - Add new per-process NUMA maps metrics derived from /proc/<pid>/numa_maps: proc.numa_maps.huge, proc.numa_maps.heap, proc.numa_maps.stack,
    proc.numa_maps.private (per-node usage in MB, exported as nodeN:<mb> pairs).
  - Extend pcp numastat/pcp-numastat with -p/--process to display per-process NUMA memory usage in a numastat -p-style layout.
  - Convert N<node>=<pages> counts to MB using system page size (and Hugepagesize where applicable, with fallback).

  How to Test

  - pcp numastat -p <pid> (detail view) and pcp numastat -p <pid1> <pid2> (summary view)

Signed-off-by: Sourav Sharma <sourav.ss.sharma@oracle.com>
@sourav-sharma796
Contributor Author

Hi,

@natoscott I took a close look at your comments above and addressed them.
I also looked into your suggested approaches for avoiding overhead and slowness.

I have made changes and updated the source.

Here's a quick diff for review, since the rebase made the compare tab huge:

diff --git a/src/pcp/numastat/pcp-numastat.1 b/src/pcp/numastat/pcp-numastat.1
index 4804a5b2c3..2bd9638d47 100644
--- a/src/pcp/numastat/pcp-numastat.1
+++ b/src/pcp/numastat/pcp-numastat.1
@@ -75,7 +75,7 @@ operand is all digits) or regular expressions for command name matching.
 Matching for command names is performed against the full command line
 (as reported by \fBproc.psinfo.psargs\fR).
 When a single process ID operand is provided and it matches exactly one
-process, a detailed table of Huge, Heap, Stack and Private mappings is
+process, a detailed table of Hugepage, Heap, Stack and Private mappings is
 shown (as in \fBnumastat \-p\fR). When multiple process ID operands are
 provided, a per-process summary table is shown (also like
 \fBnumastat \-p\fR) and may be split into multiple node column blocks
diff --git a/src/pcp/numastat/pcp-numastat.py b/src/pcp/numastat/pcp-numastat.py
index 472ea07108..decdf124cc 100755
--- a/src/pcp/numastat/pcp-numastat.py
+++ b/src/pcp/numastat/pcp-numastat.py
@@ -91,14 +91,14 @@ PROCESS_METRICS = [
     "proc.psinfo.pid",
     "proc.psinfo.cmd",
     "proc.psinfo.psargs",
-    "proc.numa_maps.huge",
+    "proc.numa_maps.hugepage",
     "proc.numa_maps.heap",
     "proc.numa_maps.stack",
     "proc.numa_maps.private",
 ]

 PROCESS_NUMA_METRICS = [
-    ("Huge", "proc.numa_maps.huge"),
+    ("Huge", "proc.numa_maps.hugepage"),
     ("Heap", "proc.numa_maps.heap"),
     ("Stack", "proc.numa_maps.stack"),
     ("Private", "proc.numa_maps.private"),
diff --git a/src/pmdas/linux_proc/help b/src/pmdas/linux_proc/help
index fd61456b4c..b1cfdfcfe7 100644
--- a/src/pmdas/linux_proc/help
+++ b/src/pmdas/linux_proc/help
@@ -75,12 +75,18 @@ kernel threads
 @ proc.runq.kernel number of kernel threads
 Instantaneous number of processes with virtual size of zero (kernel threads)

-@ proc.numa_maps.huge per-node hugepage mapped memory in MB (/proc/<pid>/numa_maps)
-Values are reported as comma-separated node/value pairs, e.g.
+@ proc.numa_maps.hugepage per-node hugepage mapped memory in MB (/proc/<pid>/numa_maps)
+Hugepage map values are reported as comma-separated node/value pairs, e.g.
 node0:0.00,node1:4.00, where each value is in megabytes.
 @ proc.numa_maps.heap per-node heap mapped memory in MB (/proc/<pid>/numa_maps)
+Heap map values are reported as comma-separated node/value pairs, e.g.
+node0:0.00,node1:4.00, where each value is in megabytes.
 @ proc.numa_maps.stack per-node stack mapped memory in MB (/proc/<pid>/numa_maps)
+Stack map values are reported as comma-separated node/value pairs, e.g.
+node0:0.00,node1:4.00, where each value is in megabytes.
 @ proc.numa_maps.private per-node private mapped memory in MB (/proc/<pid>/numa_maps)
+Private map values are reported as comma-separated node/value pairs, e.g.
+node0:0.00,node1:4.00, where each value is in megabytes.

 @ proc.control.all.threads process indom includes threads
 If set to one, the process instance domain as reported by pmdaproc
diff --git a/src/pmdas/linux_proc/indom.h b/src/pmdas/linux_proc/indom.h
index abe2596ab2..c5c00a6792 100644
--- a/src/pmdas/linux_proc/indom.h
+++ b/src/pmdas/linux_proc/indom.h
@@ -62,6 +62,7 @@ extern FILE *proc_statsfile(const char *, char *, int);

 /* Generic globals setup during PMDA startup */
 extern size_t _pm_system_pagesize;
+extern size_t _pm_system_hugepagesize;
 extern long _pm_hertz;

 /*
diff --git a/src/pmdas/linux_proc/pmda.c b/src/pmdas/linux_proc/pmda.c
index a531edaca6..b8cc860ebf 100644
--- a/src/pmdas/linux_proc/pmda.c
+++ b/src/pmdas/linux_proc/pmda.c
@@ -66,6 +66,7 @@ static int                    autogroup = -1; /* =1 autogroup enabled */
 static unsigned int            threads;        /* control.all.threads */
 static char *                  cgroups;        /* control.all.cgroups */
 size_t                         _pm_system_pagesize;
+size_t                         _pm_system_hugepagesize;
 long                           _pm_hertz;

 /*
@@ -1408,7 +1409,7 @@ static pmdaMetric metrictab[] = {
 * numa_maps cluster
 */

-/* proc.numa_maps.huge */
+/* proc.numa_maps.hugepage */
   { NULL, { PMDA_PMID(CLUSTER_PID_NUMA_MAPS,0), PM_TYPE_STRING, PROC_INDOM,
     PM_SEM_INSTANT, PMDA_PMUNITS(0,0,0,0,0,0)}},
 /* proc.numa_maps.heap */
@@ -3595,9 +3596,9 @@ proc_fetchCallBack(pmdaMetric *mdesc, unsigned int inst, pmAtomValue *atom)
           return 0;

        switch(item) {
-          case 0: /* proc.numa_maps.huge */
-           atom->cp = proc_strings_lookup(entry->numa_maps.huge_id);
-            break;
+          case 0: /* proc.numa_maps.hugepage */
+                   atom->cp = proc_strings_lookup(entry->numa_maps.huge_id);
+                   break;

           case 1: /* proc.numa_maps.heap */
            atom->cp = proc_strings_lookup(entry->numa_maps.heap_id);
@@ -3976,6 +3977,25 @@ proc_gidname_lookup(int gid)
     return "";
 }

+static size_t
+proc_hugepagesize(void)
+{
+    unsigned long      huge_page_size_kb = 0;
+    char               buf[128];
+    FILE               *fs;
+
+    if ((fs = fopen("/proc/meminfo", "r")) == NULL)
+       return 0;
+
+    while (fgets(buf, sizeof(buf), fs)) {
+       if (sscanf(buf, "Hugepagesize: %lu kB", &huge_page_size_kb) == 1)
+           break;
+    }
+    fclose(fs);
+
+    return huge_page_size_kb * 1024;
+}
+
 /*
  * Initialise the agent (both daemon and DSO).
  */
@@ -3997,6 +4017,10 @@ proc_init(pmdaInterface *dp)
        _pm_system_pagesize = atoi(envpath);
     else
        _pm_system_pagesize = getpagesize();
+    if ((envpath = getenv("PROC_HUGEPAGESIZE")) != NULL)
+       _pm_system_hugepagesize = atoi(envpath);
+    else
+       _pm_system_hugepagesize = proc_hugepagesize();
     if ((envpath = getenv("PROC_STATSPATH")) != NULL)
        proc_statspath = envpath;
     if ((envpath = getenv("PROC_THREADS")) != NULL)
diff --git a/src/pmdas/linux_proc/proc_pid.c b/src/pmdas/linux_proc/proc_pid.c
index 36615a9bcf..2efb2ff29e 100644
--- a/src/pmdas/linux_proc/proc_pid.c
+++ b/src/pmdas/linux_proc/proc_pid.c
@@ -2568,9 +2568,6 @@ fetch_proc_pid_fdinfo(int id, proc_pid_t *proc_pid, int *sts)
 #define PROCESS_PRIVATE_INDEX  3
 #define PROCESS_CATEGORY_COUNT 4

-#define SMALL_BUF_SIZE 128
-
-#define KILOBYTE (1024.0)
 #define MEGABYTE (1024.0 * 1024.0)

 static const char *process_mem_tokens[] = {
@@ -2590,29 +2587,6 @@ typedef struct {
     double     values[PROCESS_CATEGORY_COUNT];
 } numa_node_totals_t;

-static double
-get_huge_page_size_in_bytes(void)
-{
-    static double      cached_huge_page_size = -1.0;
-    double     huge_page_size_kb = 0.0;
-    char       buf[SMALL_BUF_SIZE];
-    FILE       *fs;
-
-    if (cached_huge_page_size >= 0.0)
-       return cached_huge_page_size;
-
-    if ((fs = fopen("/proc/meminfo", "r")) == NULL)
-       return 0.0;
-
-    while (fgets(buf, sizeof(buf), fs)) {
-       if (sscanf(buf, "Hugepagesize: %lf kB", &huge_page_size_kb) == 1)
-           break;
-    }
-    fclose(fs);
-    cached_huge_page_size = huge_page_size_kb * KILOBYTE;
-    return cached_huge_page_size;
-}
-
 static int
 append_numa_maps_node(strbuf_t *b, int node_num, double value_mb)
 {
@@ -2718,19 +2692,28 @@ parse_proc_numa_maps(proc_pid_entry_t *ep, size_t buflen, char *buf)
     int                        node_count = 0;
     int                        sts = 0;
     char               *cur = buf;
+    char               *end;

-    (void)buflen;
+    if (buf == NULL || buflen == 0)
+       return 0;
+    /*
+     * Ensure the proc buffer is NUL-terminated so the string routines below
+     * cannot read past the end.  read_proc_entry() allocates len+1 bytes and
+     * uses buflen (len) for the bytes-read value passed here.
+     */
+    buf[buflen] = '\0';
+    end = buf + buflen;

-    page_size_bytes = (double)sysconf(_SC_PAGESIZE);
+    page_size_bytes = (double)_pm_system_pagesize;
     if (page_size_bytes <= 0.0)
        page_size_bytes = 4096.0;

-    huge_page_size_bytes = get_huge_page_size_in_bytes();
+    huge_page_size_bytes = (double)_pm_system_hugepagesize;
     if (huge_page_size_bytes <= 0.0)
        huge_page_size_bytes = page_size_bytes;

-    while (cur && *cur) {
-       char    *nl = strchr(cur, '\n');
+    while (cur < end && *cur) {
+       char    *nl = memchr(cur, '\n', (size_t)(end - cur));
        char    *tok, *saveptr = NULL;
        int     category;

diff --git a/src/pmdas/linux_proc/root_proc b/src/pmdas/linux_proc/root_proc
index 79c9765be6..d80915e9a9 100644
--- a/src/pmdas/linux_proc/root_proc
+++ b/src/pmdas/linux_proc/root_proc
@@ -459,7 +459,7 @@ proc {
 }

 proc.numa_maps {
-    huge               PROC:81:0
+    hugepage           PROC:81:0
     heap               PROC:81:1
     stack              PROC:81:2
     private            PROC:81:3

Please take a look.

Thanks for your time reviewing this code.

@natoscott natoscott merged commit 5c9d2da into performancecopilot:main Mar 18, 2026
23 checks passed
@natoscott
Member

@sourav-sharma796 I notice the values have a trailing comma - could you fix that?

proc.numa_maps.private
    inst [785462 or "785462 -bash"] value "node0:3.69,"
    inst [429983 or "429983 /var/lib/pcp/pmdas/proc/pmdaproc"] value "node0:5.69,"

There are also a few failing QA tests as a result of this change; I'll take a quick look at those.
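A common way to avoid the trailing separator shown above (a hypothetical sketch, not the PR's actual append_numa_maps_node code) is to emit the comma before each item after the first, rather than after every item:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one "node<N>:<mb>" pair to buf, placing
 * the comma separator *before* every item except the first, so the
 * finished string never carries a trailing comma. */
void
append_node(char *buf, size_t buflen, int node, double mb)
{
    size_t used = strlen(buf);

    snprintf(buf + used, buflen - used, "%snode%d:%.2f",
             used ? "," : "", node, mb);
}
```

Starting from an empty buffer, two calls produce `node0:3.69,node1:0.00` with no stray comma at the end.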

@sourav-sharma796
Contributor Author

@sourav-sharma796 I notice the values have a trailing comma - could you fix that? […]

Yes, I can fix this. I'll open up a PR asap.
Thanks!

@sourav-sharma796
Contributor Author

@sourav-sharma796 I notice the values have a trailing comma - could you fix that? […]

Yes, I can fix this. I'll open up a PR asap. Thanks!

See: #2544

@natoscott
Member

Another thing I forgot about - the overnight Coverity scan came back with this...

*** CID 502014:         Insecure data handling  (INTEGER_OVERFLOW)
/src/pmdas/linux_proc/pmda.c: 3996             in proc_hugepagesize()
3990         while (fgets(buf, sizeof(buf), fs)) {
3991     	if (sscanf(buf, "Hugepagesize: %lu kB", &huge_page_size_kb) == 1)
3992     	    break;
3993         }
3994         fclose(fs);
3995     
>>>     CID 502014:         Insecure data handling  (INTEGER_OVERFLOW)
>>>     "huge_page_size_kb * 1024UL", which might have overflowed, is returned from the function.
3996         return huge_page_size_kb * 1024;
3997     }
3998     
3999     /*
4000      * Initialise the agent (both daemon and DSO).
4001      */

It's benign since the kernel won't give us garbage (hopefully ;) - but we could easily guard against overflow here (e.g. just check the huge page size is less than something that'd overflow) - can you take a look?

@sourav-sharma796
Contributor Author

Another thing I forgot about - the overnight Coverity scan came back with this... [Coverity CID 502014 quoted above]

Yes, I'm looking into it.

I appreciate your efforts.
Thanks!

@sourav-sharma796
Contributor Author

Another thing I forgot about - the overnight Coverity scan came back with this... [Coverity CID 502014 quoted above]

Hi @natoscott, the following change should fix this:

--- a/src/pmdas/linux_proc/pmda.c
+++ b/src/pmdas/linux_proc/pmda.c
@@ -3993,7 +3993,16 @@ proc_hugepagesize(void)
         }
         fclose(fs);
     
-        return huge_page_size_kb * 1024;
+        /* Guard against overflow (CID 502014) */
+        if (huge_page_size_kb > ULONG_MAX / 1024UL) {
+            /* Unexpected value from /proc - fail safely */
+            return 0;
+        }
+
+        return huge_page_size_kb * 1024UL;
 }

What do you think of this?
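Pulled out of the PMDA context, the guarded conversion behaves like this (a standalone sketch with a hypothetical function name, not the patched pmda.c itself):

```c
#include <limits.h>

/* Hypothetical standalone version of the guarded kB-to-bytes scaling:
 * reject any kB value that would overflow unsigned long when multiplied
 * by 1024, returning 0 so callers fall back to the regular page size. */
unsigned long
hugepage_kb_to_bytes(unsigned long kb)
{
    if (kb > ULONG_MAX / 1024UL)
        return 0;       /* unexpected /proc value - fail safely */
    return kb * 1024UL;
}
```

A typical x86_64 Hugepagesize of 2048 kB scales cleanly to 2 MB, while an absurdly large value is rejected instead of wrapping around, which is exactly the property Coverity was asking for.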

@natoscott
Member

@sourav-sharma796 LGTM

@sourav-sharma796
Contributor Author

sourav-sharma796 commented Mar 23, 2026

@sourav-sharma796 LGTM

Hi @natoscott ,
I've opened a PR #2548 for this.
Please check.

Thanks!
