Skip to content

ROX-33198: Instrument inode tracking on file open lsm hook#391

Open
JoukoVirtanen wants to merge 14 commits intomainfrom
jv-ROX-33198-instrument-inode-tracking-on-file_open-lsm-hook
Open

ROX-33198: Instrument inode tracking on file open lsm hook#391
JoukoVirtanen wants to merge 14 commits intomainfrom
jv-ROX-33198-instrument-inode-tracking-on-file_open-lsm-hook

Conversation

@JoukoVirtanen
Copy link
Contributor

@JoukoVirtanen JoukoVirtanen commented Mar 15, 2026

Description

Host file paths need to be reported correctly. When a file is created in a tracked directory its inode should be added to a hash set in kernel space. In user space an entry needs to be added into a map with the inode as the key and file path as the value.

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added unit tests
  • Added integration tests
  • Added regression tests

If any of these don't apply, please comment below.

Testing Performed

cat .cargo/config.toml
[build]
rustflags = ["-C", "force-frame-pointers=yes"]

[target.'cfg(all())']
runner = "sudo -E"

The following command was run in the main branch and this branch.

FACT_LOGLEVEL=debug cargo run --bin fact -- -p '/etc/sensitive-files/**/*' -p /etc/sensitive-files --expose-metrics

The following commands were also run

etc$ sudo mkdir sensitive-files
/etc/sensitive-files$ sudo touch hello_world.txt
/etc/sensitive-files$ sudo vi hello_world.txt

One of the tests was run with bye_world.txt instead.

In the main branch

{
  "timestamp": 1773981129521946307,
  "hostname": "jvirtane-thinkpadp1gen3.rmtuswa.csb",
  "process": {
    "comm": "vim",
    "args": [
      "/usr/bin/vim",
      "bye_world.txt"
    ],
    "exe_path": "/usr/bin/vim",
    "container_id": null,
    "uid": 0,
    "username": "root",
    "gid": 0,
    "login_uid": 4203274,
    "pid": 1252416,
    "in_root_mount_ns": true,
    "lineage": [
      {
        "uid": 4203274,
        "exe_path": "/usr/bin/sudo"
      },
      {
        "uid": 4203274,
        "exe_path": "/usr/bin/sudo"
      }
    ]
  },
  "file": {
    "Chmod": {
      "inner": {
        "filename": "/etc/sensitive-files/bye_world.txt",
        "host_file": "",
        "inode": {
          "inode": 0,
          "dev": 0
        }
      },
      "new_mode": 33188,
      "old_mode": 33188
    }
  }
}

Note that the host_file is blank.

In this branch

{
  "timestamp": 1773980922622536024,
  "hostname": "jvirtane-thinkpadp1gen3.rmtuswa.csb",
  "process": {
    "comm": "vim",
    "args": [
      "/usr/bin/vim",
      "hello_world.txt"
    ],
    "exe_path": "/usr/bin/vim",
    "container_id": null,
    "uid": 0,
    "username": "root",
    "gid": 0,
    "login_uid": 4203274,
    "pid": 1251608,
    "in_root_mount_ns": true,
    "lineage": [
      {
        "uid": 4203274,
        "exe_path": "/usr/bin/sudo"
      },
      {
        "uid": 4203274,
        "exe_path": "/usr/bin/sudo"
      }
    ]
  },
  "file": {
    "Chmod": {
      "inner": {
        "filename": "/etc/sensitive-files/hello_world.txt",
        "host_file": "/etc/sensitive-files/hello_world.txt",
        "inode": {
          "inode": 74960529,
          "dev": 37
        },
        "parent_inode": {
          "inode": 0,
          "dev": 0
        }
      },
      "new_mode": 33188,
      "old_mode": 33188
    }
  }
}

Note that host_file is now populated.

@JoukoVirtanen JoukoVirtanen requested a review from Molter73 March 15, 2026 18:14
return Ok(());
}

let host_path = host_info::prepend_host_mount(event.get_filename());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial plan was to look up the parent, get its file path, and then add the file name. However, it seems possible to get the entire path from the event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works if the file is not mounted to a different directory. Try monitoring /etc/monitored on your host, then run something like docker run --rm -it -v /etc/monitored:/in-container fedora:43 and access some files in /in-container. You should see the events have a filename in /in-container with a host path of /etc/monitored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has now been changed.

# Wait for creation event
process = Process.from_proc()
creation_event = Event(process=process, event_type=EventType.CREATION,
file=fut, host_path='')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should host_path be populated here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this file.

"""
cwd = os.getcwd()
config = {
'paths': [f'{monitored_dir}/**', '/mounted/**', '/container-dir/**'],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what globbing to use here.

Comment on lines 129 to 130
if path.is_file() || path.is_dir() {
self.metrics.scan_inc(ScanLabels::FileScanned);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have separate metrics here, so we can check how many files and directories we are tracking exactly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return Ok(());
}

let host_path = host_info::prepend_host_mount(event.get_filename());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works if the file is not mounted to a different directory. Try monitoring /etc/monitored on your host, then run something like docker run --rm -it -v /etc/monitored:/in-container fedora:43 and access some files in /in-container. You should see the events have a filename in /in-container with a host path of /etc/monitored.

.with_context(|| format!("Failed to add creation event entry for {}", host_path.display()))?;
} else {
debug!("Creation event for non-existent file: {}", host_path.display());
self.metrics.scan_inc(ScanLabels::FileRemoved);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth it to add a separate label for this case, missing a host file is a separate condition IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is outdated now.

Comment on lines +242 to +243
if event.is_creation() {
if let Err(e) = self.handle_creation_event(&event) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change edition in fact/Cargo.toml to 2024 we can rewrite this like:

if event.is_creation() && 
    let Err(e) = self.handle_creation_event(&event) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think formatting checks started to fail after I updated edition. Perhaps that change could be moved to a different PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR to update edition and make the formatting changes here #413


/// Handle file creation events by adding new inodes to the map.
fn handle_creation_event(&self, event: &Event) -> anyhow::Result<()> {
if self.get_host_path(Some(event.get_inode())).is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition can never be true, the events coming from the kernel don't have the host path, I would remove the block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate test for this? I assume you should be able to tweak the tests in test_file_open.py to test your changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has now been removed.

}

unsigned long magic = inode->i_sb->s_magic;
unsigned long magic = BPF_CORE_READ(inode, i_sb, s_magic);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inode_to_key needs to be able to use untrusted pointers, so that it can get the key for a parent inode. To get it to be able to use untrusted pointers, BPF_CORE_READ is used.

Comment on lines +50 to +62
// For file creation events, check if the parent directory is being
// monitored. If so, add the new file's inode to the tracked set.
if (event_type == FILE_ACTIVITY_CREATION) {
struct dentry* parent_dentry = BPF_CORE_READ(file, f_path.dentry, d_parent);
if (parent_dentry) {
struct inode* parent_inode = BPF_CORE_READ(parent_dentry, d_inode);
inode_key_t parent_key = inode_to_key(parent_inode);

if (inode_is_monitored(inode_get(&parent_key)) == MONITORED) {
inode_add(&inode_key);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move some of this logic to is_monitored? We can change it to return an enum with values like MONITORED, NOT_MONITORED and PARENT_MONITORED, then we can decide what to do based on that returned value (i.e, add the inode if the event is creation and the parent is monitored).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I tried this locally. It seemed to involve a lot of changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial attempt didn't work. I might try again.

const char filename[PATH_MAX],
inode_key_t* inode) {
inode_key_t* inode,
inode_key_t* parent_inode) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent inode was added to the event so that it could be used to find the parent path. It is added for all event types, not just when a file is opened. Perhaps NULL could be passed instead of the actual parent inode for the other type of events for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, passing NULL should be enough for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@JoukoVirtanen JoukoVirtanen requested a review from Molter73 March 20, 2026 05:22
@JoukoVirtanen JoukoVirtanen marked this pull request as ready for review March 20, 2026 05:23
@JoukoVirtanen JoukoVirtanen requested review from a team and rhacs-bot as code owners March 20, 2026 05:23
fut = '/mounted/test.txt'
fut_host = f'{ignored_dir}/test.txt'
fut_backup = f'{fut}~'
fut_backup_host = f'{ignored_dir}/test.txt~'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fut_host and fut_backup_host should probably be blank.

def test_sed(vi_container, server, ignored_dir):
# File Under Test
fut = '/mounted/test.txt'
fut_host = f'{ignored_dir}/test.txt'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a lot of style changes, is this due to you using a newer Rust version? Can you check what version you are using? Is it 1.92+? If this is due to a version difference and yours is newer, can you open a new PR with those style changes? that way we can focus this PR on the actual inode tracking.

@Molter73
Copy link
Contributor

There seems to be a lot of style changes, is this due to you using a newer Rust version? Can you check what version you are using? Is it 1.92+? If this is due to a version difference and yours is newer, can you open a new PR with those style changes? that way we can focus this PR on the actual inode tracking.

Actually, on a second read-through, it might also be related to the bump to version 2024, if so, please move the bump to a separate PR and we can stack this one on top of that one, sorry for the inconvenience!

Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to go through the integration test changes, but I see them failing in the PR, do you need help debugging the failures?

self.update_entry(path.as_path()).with_context(|| {
format!("Failed to update entry for {}", path.display())
})?;
} else if path.is_dir() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly related to a conversation I've been having with @Stringy. IMO, in the case of finding a directory that matches a glob pattern, we should recursively scan into it, the reason being that tracking the directory itself isn't of much value if the pattern doesn't hit any files inside the directory.

For now we can leave it like this, but I think we need to take a close look at what users might expect when configuring different types of glob patterns (i.e do they want that strict set of patterns to be monitored, or are they expressing a set of paths that work as the base to be monitored?)

/// We use the parent inode provided by the eBPF code
/// to look up the parent directory's host path, then construct the full
/// path by appending the new file's name.
fn handle_creation_event(&self, event: &Event) -> anyhow::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping this is still WIP and the debug statements are for testing, we really don't need a debug statement on each return condition, particularly not in the hot path where monitoring sets of files mildly noisy will just flood the logs and make it hard to see what is actually going on.

Please cut down on the debug statements, if something is expected to happen, don't put a statement there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried removing all the debug statements in this method and managed to boil it down from 50+ lines to just this:

        let inode = event.get_inode();
        let parent_inode = event.get_parent_inode();
        if self.get_host_path(Some(inode)).is_some() || parent_inode.empty() {
            return Ok(());
        }

        if let Some(filename) = event.get_filename().file_name()
            && let Some(parent_host_path) = self.get_host_path(Some(parent_inode))
        {
            let host_path = parent_host_path.join(filename);
            self.update_entry_with_inode(inode, host_path)
                .with_context(|| {
                    format!(
                        "Failed to add creation event entry for {}",
                        filename.display()
                    )
                })?;
        }

        Ok(())

Given this is a lot more compact and easy to ready and the additional verbosity in the logs the implementation in the PR doesn't bring much of a benefit IMO, I would propose we try to move the implementation closer to this.

@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-33198-instrument-inode-tracking-on-file_open-lsm-hook branch from 8fc3c95 to 4f1686a Compare March 20, 2026 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants