config agent: reduce full config check frequency from 5s to 1m and compare hashes instead #3028

Open
nikw9944 wants to merge 1 commit into main from nikw/config-agent-cpu

nikw9944 commented Feb 17, 2026

Summary

  • Instead of applying config to the device every 5s, apply it only when the config hash changes, or when it has been 60s since it was last applied.

Changes

Controller

  • Refactor GetConfig by extracting config generation into reusable helper functions
  • Add architecture documentation with sequence diagram

Agent

  • Refactor main loop
  • Implement a simple caching scheme for device config
  • Instead of applying the full config from the controller every 5 seconds, apply it only if it has changed
  • Also apply the full config after cache timeout (default 60s)

Testing Verification

  • Added unit tests for controlplane/agent/cmd/main.go
  • No functionality has changed, so e2e tests should run as-is

@nikw9944 nikw9944 linked an issue Feb 17, 2026 that may be closed by this pull request
@nikw9944 nikw9944 changed the title from "agent: reduce network and CPU usage by reducing full config check frequency from 5s to 1m and comparing config hashes instead" to "agent: reduce full config check frequency from 5s to 1m and compare config hashes instead" Feb 17, 2026
@nikw9944 nikw9944 self-assigned this Feb 20, 2026
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch 6 times, most recently from 6fa1ee4 to 2ab7f72 Compare February 24, 2026 21:03

nikw9944 commented Feb 27, 2026

Here's a before and after comparison of total CPU usage by Arista EOS ConfigAgent when running all tests in parallel.

Before change:

     1400 +---------------------------------------------------------------------------------------------------------+
          |                 +               A+                 +                 +                +                 |
          |                                 *                                                                       |
     1200 |-+                              **                                                                     +-|
          |                                * *                                                                      |
          |                               *  *                                                                      |
          |                               A  *                                                                      |
     1000 |-+                             *  *                                                                    +-|
          |                               *   *                                                                     |
          |                               *   *                                                                     |
      800 |-+                            *    A        A                                                          +-|
          |                              *     A       *                                                            |
CPU %     |                              *      *A    * *                                                           |
      600 |-+                            *        *   * A                                                         +-|
          |                              *        *   *  *                                                          |
          |                              *         A *    A  *A                *A*AA*A*A*A*AA*A*A*A*AA*A*A*A*A      |
      400 |-+                           *           *A     *A  *AA*A*A*A*AA*A*A                                   +-|
          |                             *                                                                           |
          |                             A                                                                           |
          |                            *                                                                            |
      200 |-+                         A                                                                           +-|
          |                           *                                                                             |
          |                 +        *       +                 +                 +                +                 |
        0 +---------------------------------------------------------------------------------------------------------+
          0                100              200               300               400              500               600
                                                       Elapsed (seconds)

After change:

     1400 +---------------------------------------------------------------------------------------------------------+
          |                 +                +     A           +                 +                +                 |
          |                                        **                                                               |
     1200 |-+                           A         * *                                                             +-|
          |                             *         *  *                                                              |
          |                            * *        *  A                                                              |
          |                            * *        *   *                                                             |
     1000 |-+                          *  *      *    *                                                           +-|
          |                           *   A      A     *                                                            |
          |                           A   *      *     A                                                            |
      800 |-+                         *    *    *       A                                                         +-|
          |                           *    *    *        *A                                                         |
CPU %     |                           *     **A *          *                                                        |
      600 |-+                         *     A * *           A                                                     +-|
          |                          *         *             *A                                                     |
          |                          *         A               *A *A                                                |
      400 |-+                        *                           A  *A*A                                          +-|
          |                          *                                  *AA*A*A                                     |
          |                        A*A                                         *A*AA*A*A*A                          |
          |                       *                                                       *A*AA*A*A*A*AA*A*A        |
      200 |-+                    A                                                                                +-|
          |                     *                                                                                   |
          |                 +   *            +                 +                 +                +                 |
        0 +---------------------------------------------------------------------------------------------------------+
          0                100              200               300               400              500               600
                                                       Elapsed (seconds)

Peak usage is about the same, but steady-state usage (after all tests have run and the config agents are just polling) drops from about 400% (4 cores) to about 200% (2 cores) across 29 EOS containers.

@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch 2 times, most recently from b80e2f9 to c8454d9 Compare February 27, 2026 21:10
@nikw9944 nikw9944 changed the title from "agent: reduce full config check frequency from 5s to 1m and compare config hashes instead" to "config agent: reduce full config check frequency from 5s to 1m and compare config hashes instead" Feb 27, 2026
@nikw9944 nikw9944 changed the title from "config agent: reduce full config check frequency from 5s to 1m and compare config hashes instead" to "config agent: reduce full config check frequency from 5s to 1m and compare hashes instead" Feb 27, 2026
@nikw9944 nikw9944 marked this pull request as ready for review February 27, 2026 21:13
@nikw9944 nikw9944 requested a review from packethog February 27, 2026 21:13
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch from c8454d9 to ab051f9 Compare March 2, 2026 20:31
@nikw9944 nikw9944 marked this pull request as draft March 2, 2026 21:40
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch 6 times, most recently from e939e25 to 3d6caec Compare March 16, 2026 20:20
@nikw9944 nikw9944 requested a review from packethog March 17, 2026 15:48
@nikw9944 nikw9944 marked this pull request as ready for review March 17, 2026 15:48
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch from 3d6caec to 89aa601 Compare March 17, 2026 15:49
@nikw9944 nikw9944 marked this pull request as draft March 17, 2026 15:51
@nikw9944 nikw9944 removed the request for review from packethog March 17, 2026 15:52
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch 3 times, most recently from 91d1675 to 10b7bbf Compare March 24, 2026 19:31
#3026)

The agent now fetches the full config every 5 seconds but only applies
it to the EOS device when the content has changed (using local SHA256
hash computation) or after a 60-second timeout. This reduces CPU usage
on Arista EOS devices by avoiding unnecessary config applications while
maintaining responsiveness to config changes.
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch from 10b7bbf to 69b9539 Compare March 24, 2026 20:05
@nikw9944 nikw9944 marked this pull request as ready for review March 24, 2026 20:07
@nikw9944 nikw9944 requested a review from packethog March 24, 2026 20:07

Development

Successfully merging this pull request may close these issues.

Reduce config agent resource consumption

2 participants