Backport request: #2765 / Reva #635, #640, #641, #644 to stable-4.0 due to production deletion incident #2971

Description

@tomickc

Hello OpenCloud team,

We experienced a production incident that appears similar to #2765 and would like to ask whether the related Reva fixes are already included in, or planned to be backported to, the OpenCloud stable-4.0 branch.

Environment

OpenCloud: 4.0.7 stable / production branch

Deployment: Docker Compose

Storage backend: POSIX storage via Reva

Reva observed in logs: v2.40.4

Clients: mostly Windows Desktop Client users, OpenCloud Desktop / mirall 3.0.3.2073

Affected data: shared/project spaces

Users report that they did not manually delete the affected folders/files

Impact

Shared/project-space folders disappeared from the web UI and from clients.

At least one affected folder was still physically present on disk with OpenCloud xattrs, but OpenCloud kept reporting it as missing from cache.

The same folder was not visible in the trash bin.

Multiple desktop clients later processed deletions.

Logging out from the affected desktop client stopped the deletion stream.

Server-side symptoms

We observed repeated server-side errors such as:

Failed to delete data: nats: timeout
could not delete id from cache
could not get spaceID and nodeID from cache
path not found in cache
failed to get ids for entry
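
To gauge how often each of these errors occurs, the log lines can be tallied per message. The following is only a minimal sketch: it assumes the storage-users log is available as plain text lines, and the function name `count_signatures` and the sample lines are illustrative, not part of OpenCloud or Reva.

```python
from collections import Counter

# Error messages copied verbatim from the list above.
SIGNATURES = [
    "Failed to delete data: nats: timeout",
    "could not delete id from cache",
    "could not get spaceID and nodeID from cache",
    "path not found in cache",
    "failed to get ids for entry",
]


def count_signatures(lines):
    """Count how many log lines contain each known error message."""
    counts = Counter({sig: 0 for sig in SIGNATURES})
    for line in lines:
        for sig in SIGNATURES:
            if sig in line:
                counts[sig] += 1
    return counts


# Illustrative input; in practice, pass an open log file (hypothetical name):
#   with open("storage-users.log", encoding="utf-8", errors="replace") as fh:
#       counts = count_signatures(fh)
sample_lines = [
    'error="error deleting cache entry: nats: timeout" message="could not delete id from cache"',
    'error="error: not found: path not found in cache:..." message="failed to get ids for entry"',
    'message="unrelated line"',
]
counts = count_signatures(sample_lines)
for sig in SIGNATURES:
    print(f"{counts[sig]:6d}  {sig}")
```

A line can match more than one message (as the second sample line does), so the per-message totals may add up to more than the number of lines.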

Example affected server path pattern, redacted:

/var/lib/opencloud/storage/users/projects///

Example OpenCloud/Reva log locations in stack traces:

github.com/opencloud-eu/reva/v2@v2.40.4/pkg/storage/fs/posix/tree/tree.go:502
github.com/opencloud-eu/reva/v2@v2.40.4/pkg/storage/fs/posix/tree/tree.go:556
github.com/opencloud-eu/reva/v2@v2.40.4/pkg/storage/fs/posix/lookup/store_idcache.go:79

Desktop-client symptoms

The desktop client logs show many successful local-to-server remove operations:

CSYNC_INSTRUCTION_REMOVE | OCC::SyncFileItem::Up | OCC::SyncFileItem::Success | http result code 204

This is important because Up means the desktop client sent removals from the local machine to the server. Users report that they did not intentionally delete these files/folders.

In one affected case, the local sync folder entered a conflicted-copy state shortly before the mass REMOVE Up operations started. The local folder contained a conflicted copy similar to:

(conflicted copy 2026-06-17 144450)

Related server-side WebDAV/access logs show successful DELETE requests from affected desktop client IPs with status 204.

Additional observation: delayed folder reappearance after full Docker Compose restart

One affected folder temporarily disappeared from the web UI and was not visible in the trash bin. Users reported it as deleted/missing.

However, the folder still existed on the server filesystem with OpenCloud xattrs. After running:

docker compose down
docker compose up -d

the folder did not reappear immediately, but became visible again after approximately 4–5 minutes.

Relevant timeline, local time UTC+3:

2026-06-17 20:26:56
storage-users reported:
path not found in cache
failed to get ids for entry
for the affected folder path.

2026-06-17 20:31:12
admin_audit logged a successful file read inside the same affected folder.

This suggests that the data was not physically deleted from disk, but the server-side path/id-cache mapping was temporarily inconsistent or unavailable. After the full restart and some delay, the mapping appears to have recovered and the folder became accessible again.

This behavior seems important for diagnosing the incident, because desktop clients may have interpreted a temporary inconsistent/missing state as removals/deletions.
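
The two timeline entries can be cross-checked against the UTC timestamps in the log excerpts. This is a small sketch of that arithmetic only; the restart time itself is not stated in this report, so it computes just the gap between the cache error and the later successful read. The helper name `parse_utc` is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Local time used in this report: fixed offset UTC+3.
LOCAL = timezone(timedelta(hours=3))


def parse_utc(stamp):
    """Parse an ISO timestamp with a trailing 'Z' as an aware UTC datetime."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


# Timestamps as they appear in the redacted log excerpts.
cache_error = parse_utc("2026-06-17T17:26:56Z")  # path not found in cache
audit_read = parse_utc("2026-06-17T17:31:12Z")   # admin_audit file_read

gap = audit_read - cache_error
error_local = cache_error.astimezone(LOCAL).strftime("%Y-%m-%d %H:%M:%S")
read_local = audit_read.astimezone(LOCAL).strftime("%Y-%m-%d %H:%M:%S")

print(error_local)          # 2026-06-17 20:26:56
print(read_local)           # 2026-06-17 20:31:12
print(gap.total_seconds())  # 256.0, i.e. 4 min 16 s
```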

Evidence excerpts

Below are short redacted excerpts from the incident logs.

  1. NATS / id-cache timeout during delete/cache operation

service=storage-users
driver=posix
error="error deleting cache entry: nats: timeout"
message="could not delete id from cache"
line="github.com/opencloud-eu/reva/v2@v2.40.4/pkg/storage/fs/posix/lookup/store_idcache.go:79"

error="Failed to delete data: nats: timeout"
nodeID=""
path="/"

  2. Existing folder later reported as missing from cache

time="2026-06-17T17:26:56Z"
service="storage-users"
driver="posix"
error="error: not found: path not found in cache:/var/lib/opencloud/storage/users/projects///"
path="/var/lib/opencloud/storage/users/projects///"
line="github.com/opencloud-eu/reva/v2@v2.40.4/pkg/storage/fs/posix/tree/tree.go:502"
message="failed to get ids for entry"

  3. Folder was physically present and later became readable again

At the time above, the folder was not visible in the web UI and was not visible in the trash bin. However, it still existed on the server filesystem with OpenCloud xattrs.

After a full restart:

docker compose down
docker compose up -d

the folder did not reappear immediately, but became visible again after approximately 4–5 minutes.

A few minutes after the path not found in cache error, admin_audit logged a successful read inside the same folder:

time="2026-06-17T17:31:12Z"
service="admin_audit"
action="file_read"
message="user '' read file '$!///.pdf'"

This suggests the data was not physically deleted from disk, but the path/id-cache mapping was temporarily inconsistent or unavailable.

  4. Desktop client sent successful local-to-server removals

The desktop client sync log contains many entries like:

timestamp="2026-06-17T12:10:00Z"
file="//"
instruction="CSYNC_INSTRUCTION_REMOVE"
direction="OCC::SyncFileItem::Up"
status="OCC::SyncFileItem::Success"
http_result_code="204"

Another example:

timestamp="2026-06-17T12:15:xxZ"
file="/"
instruction="CSYNC_INSTRUCTION_REMOVE"
direction="OCC::SyncFileItem::Up"
status="OCC::SyncFileItem::Success"
http_result_code="204"

Up is important here because it means the desktop client sent the remove operation from the local machine to the server.
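
To see when the removal stream started and how dense it was, entries like these can be filtered and grouped per minute. The sketch below assumes the redacted `key="value"` layout used in the excerpts above, with records separated by blank lines; this is not the raw desktop client log format, and the function names are illustrative.

```python
import re
from collections import Counter

# Matches key="value" pairs as used in the redacted excerpts.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_records(text):
    """Split text on blank lines and turn each key="value" block into a dict."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = dict(PAIR.findall(block))
        if record:
            records.append(record)
    return records


def is_upload_remove(record):
    """True for a successful removal sent from the local machine to the server."""
    return (
        record.get("instruction") == "CSYNC_INSTRUCTION_REMOVE"
        and record.get("direction") == "OCC::SyncFileItem::Up"
        and record.get("status") == "OCC::SyncFileItem::Success"
        and record.get("http_result_code") == "204"
    )


def removes_per_minute(records):
    """Count matching removals per minute (timestamp cut to YYYY-MM-DDTHH:MM)."""
    return Counter(r.get("timestamp", "")[:16] for r in records if is_upload_remove(r))


# The two redacted entries shown above.
SAMPLE = '''
timestamp="2026-06-17T12:10:00Z"
file="//"
instruction="CSYNC_INSTRUCTION_REMOVE"
direction="OCC::SyncFileItem::Up"
status="OCC::SyncFileItem::Success"
http_result_code="204"

timestamp="2026-06-17T12:15:xxZ"
file="/"
instruction="CSYNC_INSTRUCTION_REMOVE"
direction="OCC::SyncFileItem::Up"
status="OCC::SyncFileItem::Success"
http_result_code="204"
'''

per_minute = removes_per_minute(parse_records(SAMPLE))
for minute, count in sorted(per_minute.items()):
    print(minute, count)
```

Truncating the timestamp to the minute keeps redacted seconds such as `xx` from breaking the grouping.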

  5. Server-side WebDAV/access logs confirm successful DELETE requests

Server-side access logs show successful DELETE requests from desktop clients:

time="2026-06-17T12:10:xxZ"
client_ip=""
method="DELETE"
status="204"
user_agent="Mozilla/5.0 (Windows) mirall/3.0.3.2073 (OpenCloud, windows-10...)"
path="/remote.php/dav/spaces///"

  6. Local conflicted-copy state before mass REMOVE Up operations

On the affected desktop client, the local sync folder entered a conflicted-copy state shortly before the mass REMOVE Up operations started:

(conflicted copy 2026-06-17 144450)

The first mass REMOVE Up operations started around:

2026-06-17T12:10:00Z

which is approximately:

2026-06-17 15:10:00 local time UTC+3

This was about 25 minutes after the conflicted-copy timestamp (14:44:50, assuming the timestamp in the file name is local time).

Why this looks related to #2765

The observed chain looks similar to the issue described in #2765:

server reports existing paths as not found / cache lookup fails
-> desktop clients later process or send remove/delete operations
-> files/folders are deleted locally/server-side although users did not intentionally delete them

I can see that #2765 was closed via opencloud-eu/reva#635, and related NATS/id-cache fixes also seem to be in:

opencloud-eu/reva#635

opencloud-eu/reva#640

opencloud-eu/reva#641

opencloud-eu/reva#644

However, our OpenCloud 4.0.x production deployment still logs Reva v2.40.4, while OpenCloud 7.x appears to use newer Reva versions where these fixes are listed in release notes.

Could you please confirm:

Are the fixes for #2765 already included in any OpenCloud 4.0.x stable release?

If not, are there plans to backport them to stable-4.0?

Will these fixes be included in the next production/stable 4.0 release?

Is there a recommended mitigation for production users on 4.0.x until the fixes are available?

Is there a safe way to rebuild/repair the affected id-cache mapping for a folder that still exists on disk with OpenCloud xattrs but is reported as path not found in cache?

This is important because the issue can lead to mass deletions on desktop clients without user action.
