[pull] master from axboe:master #314

Open
pull[bot] wants to merge 1363 commits into kubestone:master from axboe:master

Conversation


@pull pull bot commented Dec 10, 2021

See Commits and Changes for more details.



axboe and others added 27 commits June 23, 2025 09:08
…b.com/SuhoSon/fio

* 'fix_real_file_size_when_pi_is_enabled' of https://github.com/SuhoSon/fio:
  io_uring: ensure accurate real_file_size setup for full device access with PI enabled
Cygwin and msys2 now provide nanosleep and clock_gettime, so fio no
longer needs to implement them. The presence of our implementations was
triggering build failures:

https://github.com/axboe/fio/actions/runs/15828051168

Since fio no longer provides clock_gettime, stop unconditionally setting
clock_gettime and clock_monotonic to yes on Windows and start detecting
these features at build time. These two features are successfully
detected by our configure script:

https://github.com/vincentkfu/fio/actions/runs/15832278184

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
For randtrimwrite, we should issue a trim + write pair, and those
offsets should be the same.

This works well for cases without the `offset=` option, but not for
cases with it.  When `offset=` is given, it is necessary to subtract
`file_offset`, the value of the `offset=` option, when calculating the
offset of the write.

This is a bit confusing because `last_start` is an actual offset that
has already been issued through trim.  However, `last_start` is the
value to which `file_offset` is added.  Since we add back `file_offset`
later on after calling `get_next_block` in `get_next_offset`,
`last_start` should be adjusted.

Signed-off-by: Jungwon Lee <jjung1.lee@samsung.com>
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
[+ updated commit title]
* 'fix-randtrimwrite' of https://github.com/minwooim/fio:
  io_u: fix offset calculation in randtrimwrite
Previously when using the HTTP engine and nrfiles > 1, the engine would
upload a single object N times, instead of N files once. This was due to
a file name reference using the first item in the files list, instead of
the file name passed in the IO information.

Signed-off-by: Renar Narubin <renar.narubin@snowflake.com>
Security tokens are an element of S3 authorization in some environments. This
change adds a parameter to allow users to specify a security token, and pass
this to S3 requests with the appropriate header.

Signed-off-by: Renar Narubin <renar.narubin@snowflake.com>
As Commit 813445e ('backend: clean up requeued io_u's') has been
applied, the backend cleans up the remaining io_u's in
td->io_u_requeues. However, with end_fsync=1, the __get_io_u() function
returns an io_u from td->io_u_requeues if any exist, and pops it. As a
result, the synced io_u never puts the file it holds, and the file
ultimately cannot be closed.

This patch returns io_u from td->io_u_free_list when td->runstate is
TD_FSYNCING, so that the io_u's in td->io_u_requeues are cleaned up
and the file is closed appropriately.

Signed-off-by: Jonghwi Jeong <jongh2.jeong@samsung.com>
…ngjonghwi/fio

* 'fsync-get-io-u-from-freelist' of https://github.com/jeongjonghwi/fio:
  io_u: get io_u from io_u_freelist when TD_FSYNCING
…seen events"

This reverts commit ae8646a.

fio_ioring_cqring_reap() returns up to max - events CQEs. However, the
return value of fio_ioring_cqring_reap() is used to both add to events
and subtract from max. This means that if less than min CQEs are
available and the CQ needs to be polled again, max is effectively
lowered by the number of CQEs that were available. Adding to events is
sufficient to ensure the next call to fio_ioring_cqring_reap() will only
return the remaining CQEs. Commit ae8646a ("engines/io_uring:
update getevents max to reflect previously seen events") added an
incorrect subtraction from max as well, so revert it.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: ae8646a ("engines/io_uring: update getevents max to reflect previously seen events")
fio_ioring_cqring_reap() takes both an events and a max argument and
will return up to max - events CQEs. Only one of the two callers passes
an existing events count. So remove the events argument and have
fio_ioring_getevents() pass max - events instead. This simplifies the
function signature and avoids an addition inside the loop over CQEs.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Currently fio_ioring_cqring_reap() loops over each available CQE,
re-loading the tail index, incrementing local variables, and checking
whether the max requested CQEs have been seen.
Avoid the loop by computing the number of available CQEs as tail - head
and capping it to the requested max.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
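The tail - head computation described above can be sketched as follows. `reap_count()` is an illustrative helper, not fio's actual code; the key point is that unsigned subtraction on free-running ring indices yields the available count even across wraparound.

```c
#include <assert.h>

/* Illustrative sketch (not fio's code): number of CQEs to reap without
 * looping. Ring indices are free-running unsigned counters, so
 * tail - head yields the available count even across wraparound,
 * and the result is capped at the caller's requested max. */
static unsigned reap_count(unsigned head, unsigned tail, unsigned max)
{
	unsigned available = tail - head;

	return available < max ? available : max;
}
```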
fio_ioring_cqring_reap() can't fail and returns an unsigned variable. So
change its return type from int to unsigned.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
There is no point in comparing events to min again after calling
io_uring_enter() to wait for events, as it doesn't change either events
or min. So remove the loop condition and only compare events to min
after updating events. Don't bother repeating fio_ioring_cqring_reap()
before calling io_uring_enter() if less than the min requested events
were available, as it's highly unlikely the CQ tail will have changed.
Avoid breaking and then branching on the return value by just returning
the value from inside the loop.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Add a relaxed-ordering atomic store helper, analogous to
atomic_store_release() and atomic_load_relaxed().

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
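A minimal sketch of such a helper, using the GCC/Clang `__atomic` builtins; fio's actual helpers are defined per-arch in its arch headers and may differ in naming and types.

```c
/* Sketch of a relaxed-ordering store/load pair using compiler builtins.
 * Relaxed ordering guarantees atomicity of the access only, with no
 * ordering constraint against surrounding loads and stores. */
static inline void atomic_store_relaxed_u32(unsigned *p, unsigned v)
{
	__atomic_store_n(p, v, __ATOMIC_RELAXED);
}

static inline unsigned atomic_load_relaxed_u32(const unsigned *p)
{
	return __atomic_load_n(p, __ATOMIC_RELAXED);
}
```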
fio_ioring_getevents() advances the io_uring CQ head index in
fio_ioring_cqring_reap() before fio_ioring_event() is called to read the
CQEs. In general this would allow the kernel to reuse the CQE slot
prematurely, but the CQ is sized large enough for the maximum iodepth
and a new io_uring operation isn't submitted until the CQE is processed.
Add a comment to explain why it's safe to advance the CQ head index
early. Use relaxed ordering for the store, as there aren't any accesses
to the CQEs that need to be ordered before the store.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
…/fio

* 'fix/io_uring-cq-reap' of https://github.com/calebsander/fio:
  engines/io_uring: relax CQ head atomic store ordering
  arch: add atomic_store_relaxed()
  engines/io_uring: simplify getevents control flow
  engines/io_uring: return unsigned from fio_ioring_cqring_reap()
  engines/io_uring: remove loop over CQEs in fio_ioring_cqring_reap()
  engines/io_uring: consolidate fio_ioring_cqring_reap() arguments
  Revert "engines/io_uring: update getevents max to reflect previously seen events"
The filetype option enables skipping the 'stat' syscall for each file
defined in jobs at the initialization stage, optimizing the
huge-set-of-files fio usage scenario.

Signed-off-by: Sergei Truschev <s.truschev@yadro.com>
For some reason folks thought this was a good idea, but sprinkling
strcmp() calls in a hot path is pretty crazy. Particularly when
you can just check the io_ops address for the right IO engine,
trading a string compare for a simple address compare.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
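The trade described above can be sketched as follows; `struct ioengine_ops` is reduced to a name field for illustration and is not fio's actual definition. The address compare is a single pointer comparison, while the name compare walks the string byte by byte.

```c
#include <string.h>

/* Reduced illustration of the trade: identify an engine by the address
 * of its ops struct (one pointer compare) instead of strcmp() on its
 * name (a byte-by-byte walk) in the hot path. */
struct ioengine_ops {
	const char *name;
};

static const struct ioengine_ops uring_ops = { .name = "io_uring" };

static int is_uring_by_addr(const struct ioengine_ops *ops)
{
	return ops == &uring_ops;
}

static int is_uring_by_name(const struct ioengine_ops *ops)
{
	return !strcmp(ops->name, "io_uring");
}
```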
Replace the memory compare with ioengine_uring_cmd with checking the
prep pointer, as that should always be sane.

Outside of that, about half the comparisons are either redundant (eg
it's ONLY run in a uring_cmd specific handler), or should be factored
out into separate code.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
For the love of deity, let's use functions where they make sense.
It nicely encapsulates code that is specific to one thing, AND it
avoids having a ton of indented levels making the code utterly
unreadable.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't use an overly long line if it can be avoided.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't repeat the code for open/close file, just have the cmd variants
call the normal helper for the actual open or close part.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_u->numberio is used to keep track of the sequence number of writes
and verify reads. It is entirely feasible to issue millions or even
billions of IOs in a single workload, so let's use enough bits to handle
that.

numberio is copied into io_piece and verify_header, so update those
structs accordingly.

Signed-off-by: Riley Thomasson <riley.thomasson@gmail.com>
kawasaki and others added 30 commits March 3, 2026 19:11
When the -m option is provided to t/zbd/test-zbd-support, the
write_zone_remainder option is specified to fio. In this case, test
case 71 fails because fio writes to the small remainder areas at zone
ends, which changes the number of writes. To avoid the failure, modify
the test condition of the test case.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20260303013159.3543787-9-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
When parsing `ioengine=external:/path`, `td->o.ioengine_so_path` was
previously assigned as a pointer directly into the `td->o.ioengine`
string buffer. If `td->o.ioengine` was subsequently reallocated (e.g.,
due to multiple ioengine definitions or the use of include directives),
`ioengine_so_path` became a dangling pointer, resulting in a
heap-use-after-free during `dlopen_ioengine`.

Fix this by ensuring `ioengine_so_path` owns its own memory allocation
independent of the `ioengine` string. Since this field is not defined
as a standard option entry, manual lifecycle management is implemented:

1.  **str_ioengine_external_cb**: Use `strdup` to store the path and
    free any previously allocated string.
2.  **fio_options_mem_dupe**: Explicitly duplicate the string when
    copying thread options.
3.  **fio_options_free**: Explicitly free the string when tearing down
    thread options.

This approach resolves the UAF while adhering to the requirement of not
adding a new option entry to the parser. Verified with ASan and existing
test suites.

Signed-off-by: Matthew Suozzo <msuozzo@google.com>
* 'push-lnvrzuqpnylp' of https://github.com/msuozzo/fio:
  options: fix heap-use-after-free in ioengine_so_path
The --bandwidth-log option currently uses a hard-coded
agg-[read,write,trim]_bw.log filename for its log files. This patch
provides a means to specify the stub filename for these log files.  The
value assigned to this option (if supplied) will replace the "agg" in
the filename. If no value is supplied the original agg-*_bw.log
filenames will be used.

This is useful for repeated invocations of Fio with the --bandwidth-log
option. Without this option the user would have to rename the
agg-*_bw.log files between invocations to avoid losing data.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
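The filename substitution can be sketched as follows; `build_bw_log_name()` is a hypothetical helper showing only the naming rule described above (the supplied stub replacing "agg"), not fio's actual log-name code.

```c
#include <stdio.h>

/* Hypothetical helper illustrating the naming rule only: the stub
 * (if supplied) replaces "agg" in agg-<ddir>_bw.log. */
static void build_bw_log_name(char *buf, size_t len,
			      const char *stub, const char *ddir)
{
	snprintf(buf, len, "%s-%s_bw.log", stub ? stub : "agg", ddir);
}
```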
Switch to an updated checkout@v6. The original v4 was triggering this
warning:

Node.js 20 actions are deprecated. The following actions are running on
Node.js 20 and may not work as expected: actions/checkout@v4,
actions/upload-artifact@v4. Actions will be forced to run with Node.js
24 by default starting June 2nd, 2026. Please check if updated versions
of these actions are available that support Node.js 24. To opt into
Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true
environment variable on the runner or in your workflow file. Once
Node.js 24 becomes the default, you can temporarily opt out by setting
ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see:
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Switch to v6 of the upload-artifact action. The original v4 action was
triggering this warning:

Node.js 20 actions are deprecated. The following actions are running on
Node.js 20 and may not work as expected: actions/checkout@v4,
actions/upload-artifact@v4. Actions will be forced to run with Node.js
24 by default starting June 2nd, 2026. Please check if updated versions
of these actions are available that support Node.js 24. To opt into
Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true
environment variable on the runner or in your workflow file. Once
Node.js 24 becomes the default, you can temporarily opt out by setting
ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see:
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Currently, rate_iops does not produce the expected I/O rate with
workloads that use 'bssplit' option. Consider the following example
configuration -

[global]
direct=1
time_based
runtime=30s
ioengine=io_uring
thread=1

[bssplit_rate_iops_repro]
filename=/dev/sdX
rw=randread
iodepth=8
bs=64K
rate_iops=50

This works correctly and ~50 IOPS I/O rate is logged during the run.

If we replace 'bs=64K' with the following bssplit option -

bssplit=32ki/20:64ki/40:256ki/10:512ki/25:1mi/5

in the configuration above, then some incorrect (much lower) IOPS values
are observed to be in effect at run time.

This problem happens because fio, in order to derive the required
I/O rate from 'rate_iops' value provided by the user, simply multiplies
the IOPS value by the minimum block size (min_bs). Once bps I/O rate is
calculated this way, the processing for 'rate' and 'rate_iops' becomes
identical.

This works if the I/O issued has a uniform size of min_bs, as in the
case of 'bs=64K'. However, with the 'bssplit' option in effect, fio may
issue I/O with sizes much different from min_bs. Yet the code in
usec_for_io() currently always calculates I/O issue delays based on
min_bs, leading to incorrect IOPS being produced.

Fix this by modifying usec_for_io() function to check for
bssplit+rate_iops being in effect. For this case, derive the IOPS rate
from bps 'rate' member of thread data and then calculate the delay to
the next I/O using the IOPS value, not the bps rate.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://patch.msgid.link/20260310205804.477935-1-dmitry.fomichev@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
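The arithmetic behind the fix can be sketched as follows; `usecs_for_ios()` is an illustrative reduction of the idea, not fio's actual usec_for_io().

```c
/* Illustrative reduction (not fio's usec_for_io()): with an IOPS rate,
 * the time by which ios_done I/Os should have been issued depends only
 * on the I/O count, not on the (variable, bssplit-chosen) block sizes. */
static unsigned long long usecs_for_ios(unsigned long long ios_done,
					unsigned int iops)
{
	return (ios_done * 1000000ULL) / iops;
}
```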
Add new keys to the JSON data specifying the units for latency_target
and latency_window.

Also add units for these values in the normal output.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Add more error numbers (errnos) after ERANGE to support various errno
strings for options like `--ignore_error=ETIMEDOUT`.  The unvme-cli
libunvmed ioengine returns ETIMEDOUT if a command times out.  To mask
this situation with the `--ignore_error=` option, errnos after ERANGE
must be supported in `str2error()`.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
When td->o.io_size > td->o.size, sequential writes wrap around and
revisit the same offsets.  In this case, `fio_offset_overlap_risk()`
must return true so that io_hist uses an rb-tree instead of a plain
flist, allowing overlapping io_pieces to be detected and replaced
correctly.

This check may produce false positives for multi-file jobs where
per-file wrap-around does not actually occur, but that is acceptable
for the sake of simplicity, since the only cost is using the rb-tree
unnecessarily.  A false negative, however, would silently corrupt the
verification history.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
o->comm may be NULL if job initialization fails or the job
structure is only partially initialized before thread creation.
Calling prctl(PR_SET_NAME, NULL) results in a NULL pointer
dereference inside strncpy().

Add a NULL check before calling prctl().

Fixes: #2072
Reported-by: Criticayon Black
Signed-off-by: Criticayon Black <1318083585@qq.com>
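The guard can be sketched as below; the int return value (1 = name applied) is an illustrative choice for demonstration, not fio's actual signature.

```c
#include <sys/prctl.h>

/* Sketch of the guard: per the report above, prctl(PR_SET_NAME, NULL)
 * crashes in strncpy(), so bail out early when the name is NULL. The
 * int return (1 = name applied) is illustrative only. */
static int set_thread_name(const char *comm)
{
	if (!comm)
		return 0;
	return prctl(PR_SET_NAME, comm, 0, 0, 0) == 0;
}
```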
* 'posix-errnos' of https://github.com/minwooim/fio:
  options: add support more POSIX errnos
* 'fix-null-comm-prctl' of https://github.com/Criticayon/fio:
  backend: guard prctl(PR_SET_NAME) against NULL thread name
Issue: __show_running_run_stats() acquires stat_sem then blocks on each
worker's rusage_sem. But workers need stat_sem to reach the code that
posts rusage_sem, creating an ABBA deadlock. The verify path deadlocks
via a blocking fio_sem_down(stat_sem). The IO path's trylock loop can
mitigate this but times out under sustained contention with multiple
workers.

Fix: Moved rusage collection before the stat_sem acquire so the stat
thread never holds stat_sem while waiting on rusage_sem. Added a
double-check of td->runstate after setting update_rusage to guard
against blocking on a worker that has already exited. The trylock
loop and check_update_rusage() calls are retained as precautions.

Signed-off-by: Ryan Tedrick <ryan.tedrick@nutanix.com>
…/fio

* 'fix_statsem_deadlock' of https://github.com/RyanTedrick/fio:
  Fix stat_sem/rusage_sem deadlock during stats collection
prune_io_piece_log() is called only at the start of each loop iteration, so
io_piece entries accumulated during the final do_io() run are never explicitly
freed.

When fio runs as a process this goes unnoticed because the OS reclaims the heap
on exit. When fio is embedded as a pthread, which is a use-case of unvme-cli,
the parent process keeps running, so those allocations become a genuine
memory leak proportional to the number of write IOs logged for verify.

Signed-off-by: Haeun Kim <hanee.kim@samsung.com>
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
* 'ipo' of https://github.com/minwooim/fio:
  iolog: free io_piece log on thread cleanup
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Introduce a new ioengine that mmaps anonymous memory and copies data
on read/write to trigger page faults. This allows us to leverage fio's
powerful framework for MM-related testing, and will ideally allow us to
quickly expand testing by leveraging previously FS-related fio scripts.

Signed-off-by: Nico Pache <npache@redhat.com>
Link: https://patch.msgid.link/20260408012004.198115-2-npache@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Document the new page fault engine.

Signed-off-by: Nico Pache <npache@redhat.com>
Link: https://patch.msgid.link/20260408012004.198115-3-npache@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge page fault engine from Nico:

"This series introduces a new page_fault ioengine for Anonymous memory
  testing. This enables using fio’s existing framework and job files for
  memory management style workloads without relying on a filesystem. An
  example job file is included to demonstrate usage and lays the
  groundwork for how we plan on utilizing fio to test a number of MM
  related workloads."

* anon-fault:
  engines/page_fault: minor style cleanups
  Documentation: update the documentation to include the page_fault engine
  page_fault: add mmap-backed ioengine for anonymous faults
On our ARM platform, select() could return -1 with errno EINTR fairly
often, while we have almost never observed this on x86 platforms.
This breaks the helper_thread loop with A_EXIT, and stops status updates
at stdout as well as bandwidth logging (the kind enabled by
`write_bw_log` and `log_avg_msec`), causing `bw` logs to look cut off
starting at a random time during prolonged runs (~1 hour).
The issue can be easily reproduced on our ARM platform even with
`ioengine=null` and `filename=/dev/null` by spawning ~30 individual
fio processes, each logging `bw`, and counting the lines of all
produced logs with `wc -l` once all processes finish.

Added action enum A_NOOP and a check to handle the situation
as no error.

Tested on both ARM and x86 platforms with and without the
CONFIG_HAVE_TIMERFD_CREATE macro defined. The x86 platform never
reproduces the issue in any situation, and the result looks good. The
ARM platform no longer reproduces the bug and retains the full `bw`
log after the fix.

Signed-off-by: Alex Qiu <xqiu@google.com>
This reverts commit 981c372.

The previous patch "fio.h: treat io_size > size as offset overlap risk"
switched the verify table to an rb-tree rather than the simple flist
when io_size > size, meaning we can simply revert this patch to allow
overlap for sequential readwrite workloads.
* 'arm-select-eintr' of https://github.com/alex310110/fio:
  helper_thread: Handle EINTR errno from select()
Update `total_bytes` not for `td_rw(td)` but for `td_write(td)`, so
that writes keep going when they are overlapped.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Wrap around `f->last_pos` in case of `io_size` > `size` to consider
overlap.  Also, for write-only sequential jobs, `total_bytes` was capped
at size regardless of `io_size`, causing the write phase to stop after
one pass over the file even when `io_size` > `size`.  Unlike rw mode
where reads and writes both consume `bytes_issued`, write-only jobs only
count writes, so there is no risk of `io_size` being consumed by reads.
Allow `io_size` to control how much to write for this case.

This allows online verification when `io_size` > `size` with write-only
sequential jobs.

	fio \
	--name=online \
	--ioengine=io_uring_cmd --filename=/dev/ng0n1 \
	--cmd_type=nvme \
	--rw=write --bs=128k --size=1M --io_size=2M \
	--verify=crc32 --do_verify=1 --debug=io,verify

Before this patch:

- Writes didn't happen twice (no overlap) even though `io_size` >
  `size` since reads consumed the `bytes_issued`.

io       238438 complete: io_u 0x74a23c000d80: off=0x0,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x20000,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x40000,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x60000,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x80000,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0xa0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0xc0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0xe0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x0,len=0x20000,ddir=0,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x20000,len=0x20000,ddir=0,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x40000,len=0x20000,ddir=0,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x60000,len=0x20000,ddir=0,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0x80000,len=0x20000,ddir=0,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0xa0000,len=0x20000,ddir=0,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0xc0000,len=0x20000,ddir=0,file=/dev/ng0n1
io       238438 complete: io_u 0x74a23c000d80: off=0xe0000,len=0x20000,ddir=0,file=/dev/ng0n1

After this patch:

- Writes are overlapped, but each block is verified once against the
  latest `numberio`.

io       237335 complete: io_u 0x71c1f4000d80: off=0x0,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x20000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x40000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x60000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x80000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xa0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xc0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xe0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x0,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x20000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x40000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x60000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x80000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xa0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xc0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xe0000,len=0x20000,ddir=1,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x0,len=0x20000,ddir=0,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x20000,len=0x20000,ddir=0,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x40000,len=0x20000,ddir=0,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x60000,len=0x20000,ddir=0,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0x80000,len=0x20000,ddir=0,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xa0000,len=0x20000,ddir=0,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xc0000,len=0x20000,ddir=0,file=/dev/ng0n1
io       237335 complete: io_u 0x71c1f4000d80: off=0xe0000,len=0x20000,ddir=0,file=/dev/ng0n1

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
reset_io_counters() clears td->nr_done_files so that keep_running()
does not return false prematurely because fio_files_done() sees all
files already marked done.

The existing conditions (time_based, loops > 1, do_verify) cover the
cases where the job is expected to restart file iteration.  A job with
io_size > size also requires restarting: the sequential write pointer
wraps around and visits every offset a second time, so the "file done"
bit must be cleared at the start of each outer iteration.

Without this fix, the first pass sets each fio_file's done flag, and
keep_running() exits the outer loop early instead of writing the
second pass -- the overlap that the io_size > size feature is meant to
produce never happens.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Add a Python test suite testing fio's numberio overlap verification for
the three fixes that preceded this commit:

  - fio.h: treat io_size > size as offset overlap risk
  - io_u.c: check wrap around f->last_pos if verify_only=1
  - backend.c: use io_size as limit for seq write-only
  - libfio.c: reset nr_done_files when io_size > size

The test creates situations where io_size > size so that every offset
is written more than once.  fio_offset_overlap_risk() must return true
to activate the rb-tree io_hist backend, which retains only the latest
io_piece per offset.  The verify phase then reads each block exactly
once and checks that numberio on disk matches the latest write.

13 test cases, grouped by rw mode / verify style, are run with sync and
async ioengines in turn.

Offline verify (OfflineOverlapVerifyTest) runs two separate fio
invocations: a write phase (verify_state_save=1) followed by a verify
phase (verify_only=1, verify_state_load=1).  The verify phase performs
a dry-run write pass to rebuild io_hist, then reads each block and
checks numberio via verify_write_sequence=1.

Online verify (OnlineOverlapVerifyTest) runs a single fio invocation
with do_verify=1, which writes and verifies in the same job.

The filesize < size cases pre-truncate the file smaller than size=
before running fio, verifying that fio_offset_overlap_risk() activates
the rb-tree even before the first I/O because real_file_size < io_size
at setup time.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
…m/fio

* 'check-numberio-read-only' of https://github.com/minwooim/fio:
  t/numberio_overlap: add overlap write verification tests
  libfio: reset nr_done_files when io_size > size
  io_u: check wrap around `f->last_pos` for overlapped case
  backend: update `total_bytes` for TD_DDIR_WRITE
  Revert "backend: fix verify issue during readwrite"
  fio.h: treat io_size > size as offset overlap risk