Official MINIX sources - Automatically replicated from gerrit.minix3.org
David van Moolenbroek 10b7016b5a Fix soft faults in FSes resulting in partial I/O
In order to resolve page faults on file-mapped pages, VM may need to
communicate (through VFS) with a file system.  The file system must
therefore not be the one to cause, and thus end up being blocked on,
such page faults.  To resolve this potential deadlock, the safecopy
system was previously extended with the CPF_TRY flag, which causes the
kernel to return EFAULT to the caller of a safecopy function upon
getting a page fault, bypassing VM and thus avoiding the loop.  VFS was
extended to repeat relevant file system calls that returned EFAULT,
after resolving the page fault, to keep these soft faults from being
exposed to applications.

However, general UNIX I/O semantics dictate that if an I/O transfer
partially succeeded before running into a failure, the partial result
is to be returned.  Proper file system implementations may therefore
end up returning partial success rather than the EFAULT code resulting
from a soft fault.  Since VFS does not get the EFAULT code in this
case, it does not know that a soft fault occurred, and thus does not
repeat the call either.  The end result is that an application may get
partial I/O results (e.g., a short read(2)) even on regular files.
Applications cannot reasonably be expected to deal with this.

Because most of the current file system implementations
do not implement proper partial-failure semantics, this problem is not
yet widespread.  In fact, it has only occurred on direct block device
I/O so far.  However, the next generation of file system services will
be implementing proper I/O semantics, thus exacerbating the problem.

To remedy this situation, this patch changes the CPF_TRY semantics:
whenever the kernel experiences a soft fault during a safecopy call,
in addition to returning EFAULT, the kernel also stores a mark in the
grant created with CPF_TRY.  Instead of testing for EFAULT, VFS checks
whether the grant was marked, as part of revoking the grant.  If the
grant was indeed marked by the kernel, VFS repeats the file system
operation, regardless of its initial return value.  Thus, the EFAULT
code now only serves to make the file system fail the call faster.
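
The retry rule described above can be illustrated with a small user-space
simulation.  This is only a sketch under stated assumptions, not the code
from this patch: every sim_* name, the two-page buffer, the page size, and
the negative-errno return convention are hypothetical, and the real grant
structure and safecopy interface are not reproduced here.  It shows a file
system that returns a partial byte count after a soft fault, and a VFS-like
caller that repeats the call because the grant was marked, regardless of
the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NPAGES    2      /* hypothetical: the buffer spans two pages */
#define SIM_PAGE_SIZE 4096   /* hypothetical page size */

/* Hypothetical stand-in for a grant; not the real MINIX grant structure. */
struct sim_grant {
    bool try_flag;           /* plays the role of CPF_TRY */
    bool marked;             /* set by the simulated kernel on a soft fault */
};

/* Simulated target buffer: each page is either mapped in or not. */
struct sim_buffer {
    bool resident[SIM_NPAGES];
};

/* Simulated kernel copy of one page.  With the try flag set, a soft fault
 * marks the grant and fails with EFAULT.  Without it, the fault is resolved
 * in place, standing in for the kernel involving VM. */
static int sim_safecopy_page(struct sim_grant *g, struct sim_buffer *b, int page)
{
    if (!b->resident[page]) {
        if (!g->try_flag) {
            b->resident[page] = true;
            return 0;
        }
        g->marked = true;
        return -EFAULT;
    }
    return 0;
}

/* Simulated file system read with partial-failure semantics: if a later
 * page faults, return the bytes copied so far; return the error code only
 * if nothing was copied at all. */
static long sim_fs_read(struct sim_grant *g, struct sim_buffer *b)
{
    long done = 0;

    for (int page = 0; page < SIM_NPAGES; page++) {
        int r = sim_safecopy_page(g, b, page);
        if (r != 0)
            return done > 0 ? done : r;
        done += SIM_PAGE_SIZE;
    }
    return done;
}

/* Simulated VFS: repeat the call for as long as the grant comes back
 * marked, ignoring the return value of the marked attempt. */
static long sim_vfs_read(struct sim_buffer *b, int *attempts)
{
    *attempts = 0;
    for (;;) {
        struct sim_grant g = { .try_flag = true, .marked = false };
        long r = sim_fs_read(&g, b);

        (*attempts)++;
        if (!g.marked)       /* checked when the grant is "revoked" */
            return r;

        /* Resolve one soft fault, standing in for VM mapping in the page. */
        for (int page = 0; page < SIM_NPAGES; page++) {
            if (!b->resident[page]) {
                b->resident[page] = true;
                break;
            }
        }
    }
}
```

With only the second page unmapped, the first attempt returns a partial
count rather than EFAULT, yet the simulated VFS still repeats the call and
ends up returning the full transfer size.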

The approach is currently supported for both direct and magic grants,
but is used only with magic grants - arguably the only case where it
makes sense.  Indirect grants should not have CPF_TRY set; in a chain
of indirect grants, the original grant is marked, as it should be.
In order to avoid potential SMP issues, the mark stored in the grant
is its grant identifier, so as to discard outdated kernel writes.
Whether this is necessary or effective remains to be evaluated.
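
The idea of using the grant identifier as the mark can be sketched as
follows.  This is a hypothetical illustration, not the structure layout or
code from this patch: the sim_* names, the slot structure, and the sentinel
value are all assumptions.  A late write that carries the identifier of a
grant previously held in the same slot does not match the identifier of the
current grant, so it is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t sim_grant_id;            /* hypothetical identifier type */
#define SIM_NO_MARK ((sim_grant_id)-1)   /* hypothetical "not marked" value */

/* Hypothetical grant slot that is reused for successive grants. */
struct sim_slot {
    sim_grant_id id;     /* identifier of the grant now held in the slot */
    sim_grant_id mark;   /* last identifier written by the simulated kernel */
};

/* Allocate the slot for a new grant and clear any previous mark. */
static void sim_slot_alloc(struct sim_slot *s, sim_grant_id id)
{
    s->id = id;
    s->mark = SIM_NO_MARK;
}

/* Simulated kernel write: record which grant experienced the soft fault. */
static void sim_kernel_mark(struct sim_slot *s, sim_grant_id faulting_id)
{
    s->mark = faulting_id;
}

/* The slot counts as marked only if the mark names the current grant. */
static bool sim_slot_faulted(const struct sim_slot *s)
{
    return s->mark == s->id;
}
```

A plain boolean mark would not distinguish a write meant for the previous
occupant of the slot from one meant for the current grant; comparing
identifiers does, at least in this simplified model.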

This patch also cleans up the grant structure a bit, removing reserved
space and thus making the structure slightly smaller.  The structure
is used internally between system services only, so there is no need
for binary compatibility.

Change-Id: I6bb3990dce67a80146d954546075ceda4d6567f8
2016-01-16 14:04:21 +01:00
bin Import NetBSD ps(1) 2016-01-13 20:32:52 +01:00
common w(1): switch to libkvm 2016-01-13 20:32:50 +01:00
crypto NetBSD re-synchronization of the source tree 2016-01-13 20:32:14 +01:00
distrib Import NetBSD ipcrm(1) 2016-01-16 14:04:14 +01:00
docs Add PTYFS, Unix98 pseudo terminal support 2015-06-23 17:43:46 +00:00
etc PM: generic process event publish/subscribe system 2016-01-16 14:04:10 +01:00
external Rename top(1) to mtop(1), import NetBSD top(1) 2016-01-13 20:32:53 +01:00
games Fix weird flock uses 2016-01-13 20:32:23 +01:00
gnu NetBSD re-synchronization of the source tree 2016-01-13 20:32:14 +01:00
include Start using sysctl(3) throughout userland 2016-01-13 20:32:45 +01:00
lib Import NetBSD dev_mkdb(8) 2016-01-13 20:32:51 +01:00
libexec Start using sysctl(3) throughout userland 2016-01-13 20:32:45 +01:00
minix Fix soft faults in FSes resulting in partial I/O 2016-01-16 14:04:21 +01:00
releasetools Add MIB service, sysctl(2) support 2016-01-13 20:32:37 +01:00
sbin Import NetBSD sysctl(8) 2016-01-13 20:32:48 +01:00
share Integrate ASR instrumentation into build system 2016-01-13 20:32:34 +01:00
sys IPC server: NetBSD sync, general improvements 2016-01-16 13:58:47 +01:00
tests NetBSD re-synchronization of the source tree 2016-01-13 20:32:14 +01:00
tools NetBSD re-synchronization of the source tree 2016-01-13 20:32:14 +01:00
usr.bin Import NetBSD ipcrm(1) 2016-01-16 14:04:14 +01:00
usr.sbin Import NetBSD dev_mkdb(8) 2016-01-13 20:32:51 +01:00
.gitignore gitignore: ignore some more generated files 2012-12-06 13:29:20 +00:00
.gitreview build:update 'git review' config to match gerrit.minix3.org 2014-07-28 17:05:15 +02:00
build.sh NetBSD re-synchronization of the source tree 2016-01-13 20:32:14 +01:00
LICENSE Fix parameter parsing in cut 2010-01-21 10:16:05 +00:00
Makefile NetBSD re-synchronization of the source tree 2016-01-13 20:32:14 +01:00
Makefile.inc Synchronize on NetBSD-CVS (2013/12/1 12:00:00 UTC) 2014-07-28 17:05:06 +02:00