jimharris [Mon, 11 Jan 2016 17:32:56 +0000 (17:32 +0000)]
MFC r293352:
nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyway.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
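
A minimal sketch of the arithmetic described above, assuming the tunable acts as a lower bound on CPUs per I/O queue and that one MSI-X vector is reserved for the admin queue. The function name and parameters are hypothetical and this is not the driver's code:

```python
def plan_io_queues(ncpus, min_cpus_per_ioq, vectors_available):
    """Return (cpus_per_ioq, num_io_queues) for one controller (illustrative only)."""
    if ncpus < 1 or min_cpus_per_ioq < 1 or vectors_available < 2:
        raise ValueError("need a CPU, a positive tunable, and admin + I/O vectors")
    # One vector is reserved for the admin queue; only the rest serve I/O queues.
    io_vectors = vectors_available - 1
    # Ceiling division: CPUs that must share a queue given the vector budget.
    needed = -(-ncpus // io_vectors)
    # The tunable can only raise the sharing factor, never lower it.
    cpus_per_ioq = max(min_cpus_per_ioq, needed)
    # Ceiling division again: queues required to cover every CPU.
    num_io_queues = -(-ncpus // cpus_per_ioq)
    return cpus_per_ioq, num_io_queues
```

With 16 CPUs, a tunable of 4 and plenty of vectors, this sketch yields 4 I/O queues of 4 CPUs each, leaving the remaining vectors for other devices.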
jimharris [Mon, 11 Jan 2016 17:31:18 +0000 (17:31 +0000)]
MFC r293328:
nvme: do not revert to single I/O queue when per-CPU queues not available
Previously nvme(4) would revert to a single I/O queue if it could not
allocate enough interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.
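
One way to picture the even distribution, as a sketch only: `assign_cpus_to_queues` and `max_queues` are hypothetical names, and the driver's actual mapping may differ.

```python
def assign_cpus_to_queues(ncpus, max_queues):
    """Map each CPU index to an I/O queue index, spreading CPUs evenly."""
    if ncpus < 1 or max_queues < 1:
        raise ValueError("need at least one CPU and one queue")
    # Ceiling division gives the smallest group size that fits in max_queues.
    cpus_per_queue = -(-ncpus // max_queues)
    # Consecutive CPUs share a queue; only the last group may be smaller.
    return [cpu // cpus_per_queue for cpu in range(ncpus)]
```

Note that this sketch may use fewer queues than are available (6 CPUs with 4 queues use only 3) so that no queue serves a lone CPU while others are shared.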
ngie [Sun, 10 Jan 2016 17:39:49 +0000 (17:39 +0000)]
Unbreak stable/10 buildworlds on arm/arm, mips/mips, mips/mips64, mips/mipsel,
mips/mipsn32, powerpc/powerpc, powerpc/powerpc64, sparc64/sparc64 with gcc
after r293307 (some of the BURN_BRIDGES code)
MFC after: 3 days
Pointyhat to: markj
Sponsored by: EMC / Isilon Storage Division
ae [Sun, 10 Jan 2016 13:53:57 +0000 (13:53 +0000)]
MFC r292057:
Make detection of GPT a bit more reliable.
When we are detecting a partition table and didn't find PMBR, try to
read backup GPT header from the last sector and if it is correct,
assume that we have GPT.
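
A rough sketch of such a fallback check, assuming "correct" means the "EFI PART" signature is present and the header CRC32 matches. The function names are hypothetical, the disk is modelled as a bytes object, and the real code may validate more fields:

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
GPT_MIN_HEADER_SIZE = 92  # smallest header size defined by the GPT layout


def looks_like_gpt_header(sector):
    """Check the signature and header CRC32 of a candidate GPT header."""
    if len(sector) < GPT_MIN_HEADER_SIZE or sector[:8] != GPT_SIGNATURE:
        return False
    header_size = struct.unpack_from("<I", sector, 12)[0]
    if header_size < GPT_MIN_HEADER_SIZE or header_size > len(sector):
        return False
    stored_crc = struct.unpack_from("<I", sector, 16)[0]
    header = bytearray(sector[:header_size])
    header[16:20] = b"\x00" * 4  # the CRC is computed with its own field zeroed
    return (zlib.crc32(bytes(header)) & 0xFFFFFFFF) == stored_crc


def detect_gpt_from_backup(disk, sector_size=512):
    """With no PMBR found, fall back to the backup header in the last sector."""
    if len(disk) < sector_size:
        return False
    return looks_like_gpt_header(disk[-sector_size:])
```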
dchagin [Sat, 9 Jan 2016 18:28:15 +0000 (18:28 +0000)]
MFC r288994 (by bdrewery):
Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers.
r161611 added some of the code from sys_vfork() directly into the Linux
module wrappers since they use RFSTOPPED. In r232240, the RFFPWAIT handling
was moved to syscallret(), thus this code in the Linux module is no longer
needed as it will be called later.
This also allows the Linux wrappers to benefit from the fix in r275616 for
threads not getting suspended if their vforked child is stopped while they
wait on them.
dchagin [Sat, 9 Jan 2016 18:07:48 +0000 (18:07 +0000)]
MFC r283544:
When I merged the lemul branch I missed kib@'s r282708 commit.
This is not the final fix as I need to properly clean up thread resources
before other threads suicide.
dchagin [Sat, 9 Jan 2016 18:05:04 +0000 (18:05 +0000)]
MFC r283498:
The Linux nanosleep() and clock_nanosleep() system calls always
write the remaining time into the structure pointed to by rmtp
unless rmtp is NULL. The value of *rmtp can then be used to call
nanosleep() again and complete the specified pause if the previous
call was interrupted.
Note: clock_nanosleep() with an absolute time value does not write
the remaining time.
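
The retry pattern that depends on rmtp can be sketched as follows. The `nanosleep` argument is a hypothetical stand-in that takes a requested duration and returns the time left (0 when the sleep completed); it is not a real Python API:

```python
def sleep_until_done(request, nanosleep):
    """Re-issue the sleep with the reported remainder until nothing is left.

    Returns the number of calls that were needed.
    """
    remaining = request
    attempts = 0
    while remaining > 0:
        # An interrupted call reports how much of the pause is still owed.
        remaining = nanosleep(remaining)
        attempts += 1
    return attempts
```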
dchagin [Sat, 9 Jan 2016 18:03:09 +0000 (18:03 +0000)]
MFC r283496:
The latest cp tool tries to use the btrfs clone operation, which is
implemented via the ioctl interface. First, return ENOTSUP for this
operation, since cp falls back to the usual method in that case. Second,
do not print the message about an unimplemented operation.
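
The caller-side behaviour this relies on looks roughly like the sketch below. The `try_clone` and `fallback_copy` callables are hypothetical stand-ins, and this is not cp's actual code:

```python
import errno
import shutil


def copy_with_clone_fallback(src, dst, try_clone, fallback_copy=shutil.copyfile):
    """Attempt a clone first; fall back to a plain copy only on ENOTSUP."""
    try:
        try_clone(src, dst)
        return "cloned"
    except OSError as exc:
        if exc.errno != errno.ENOTSUP:
            raise  # any other failure is a real error, not a hint to fall back
    fallback_copy(src, dst)
    return "copied"
```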
dchagin [Sat, 9 Jan 2016 17:44:08 +0000 (17:44 +0000)]
MFC r283483:
Convert the signal number to native for the VT_SETMODE ioctl and remove
the strange and invalid ISSIGVALID macro.
The code has not been tested properly, but it was originally broken.
dchagin [Sat, 9 Jan 2016 17:39:41 +0000 (17:39 +0000)]
MFC r283479:
The kernel sends signals to processes via the ABI-specific sv_sendsig method.
The native ABI does not need signal conversion; only emulators may want this.
Usually an emulator implements its own sv_sendsig method. For now, only the ibcs2
emulator does not have its own sv_sendsig implementation and depends on the
native sendsig() method. So, remove any extra attempts to convert signal numbers
from the native sendsig() methods, except on i386, where ibcs2 lives.
dchagin [Sat, 9 Jan 2016 17:29:08 +0000 (17:29 +0000)]
MFC r283474:
Rework the signal code to allow its use by other modules, like linprocfs:
1. The Linux sigset is always 64 bits on all platforms. In order to move the
Linux sigset code to the linux_common module, define it as a 64-bit int. Move
the Linux sigset manipulation routines to the MI path.
2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.
3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.
4. Emulate the Linux SIGPWR signal via the FreeBSD SIGRTMIN signal, which is
outside the signal numbers allowed on Linux.
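
As an illustration of item 1, a 64-bit sigset can be handled as a plain integer where signal n occupies bit n - 1. The helper names below are hypothetical and only sketch the idea; they are not the module's routines:

```python
LINUX_NSIG = 64  # signals 1..64 fit in one 64-bit word


def _bit(sig):
    """Return the mask bit for a signal number, rejecting out-of-range values."""
    if not 1 <= sig <= LINUX_NSIG:
        raise ValueError("signal number out of range")
    return 1 << (sig - 1)


def sigset_add(mask, sig):
    return mask | _bit(sig)


def sigset_del(mask, sig):
    return mask & ~_bit(sig)


def sigset_ismember(mask, sig):
    return bool(mask & _bit(sig))
```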
dchagin [Sat, 9 Jan 2016 17:22:51 +0000 (17:22 +0000)]
MFC r283471:
According to the Linux man page, sigaltstack(3) shall return EINVAL if the ss
argument is not a null pointer and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows the SS_ONSTACK flag, which is simply ignored.
For buggy apps (at least mono), ignore flags other than SS_DISABLE,
as Linux does.
While here, move the MI part of the sigaltstack code to the appropriate place.
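
A small sketch of the resulting flag handling. The constant values are given as I recall them from the Linux and FreeBSD signal headers and should be treated as assumptions; the function name is hypothetical:

```python
# Assumed header values; verify against the actual headers before relying on them.
LINUX_SS_ONSTACK = 1
LINUX_SS_DISABLE = 2
NATIVE_SS_DISABLE = 4


def linux_to_native_ss_flags(linux_flags):
    """Carry over only SS_DISABLE; every other flag is silently ignored."""
    if linux_flags & LINUX_SS_DISABLE:
        return NATIVE_SS_DISABLE
    return 0
```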
dchagin [Sat, 9 Jan 2016 17:08:33 +0000 (17:08 +0000)]
MFC r283461:
Since our tmpfs is no longer considered "highly experimental",
remove the /dev/shm magic committed in r218497 and convert
the tmpfs type to the expected magic number.
dchagin [Sat, 9 Jan 2016 16:44:17 +0000 (16:44 +0000)]
MFC r283441:
Implement the epoll family of system calls. This is a tiny wrapper
around kqueue() that implements a subset of epoll functionality.
The kqueue user data is 32 bits on i386, which is not enough for
epoll user data, so we keep user data in the proc emuldata.
Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.
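
The side-table idea can be pictured with the sketch below. The class and method names are hypothetical, the per-process emuldata is modelled as a dict keyed by file descriptor, and the EPOLLIN/EPOLLOUT to EVFILT_READ/EVFILT_WRITE pairing is an assumption about the mapping rather than a description of the committed code:

```python
EPOLLIN, EPOLLOUT = 0x001, 0x004       # Linux epoll event bits
EVFILT_READ, EVFILT_WRITE = -1, -2     # FreeBSD kqueue filter identifiers


class EpollEmulation:
    """Keep 64-bit epoll user data beside kqueue instead of in kevent udata."""

    def __init__(self):
        self.user_data = {}  # fd -> 64-bit epoll user data

    def ctl_add(self, fd, events, data):
        """Record the user data and return the (fd, filter) pairs to register."""
        if not 0 <= data < 2 ** 64:
            raise ValueError("epoll user data must fit in 64 bits")
        self.user_data[fd] = data
        filters = []
        if events & EPOLLIN:
            filters.append((fd, EVFILT_READ))
        if events & EPOLLOUT:
            filters.append((fd, EVFILT_WRITE))
        return filters

    def report(self, fd, kq_filter):
        """Translate a fired kqueue filter back to (epoll events, user data)."""
        events = EPOLLIN if kq_filter == EVFILT_READ else EPOLLOUT
        return events, self.user_data[fd]
```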
dchagin [Sat, 9 Jan 2016 16:39:15 +0000 (16:39 +0000)]
MFC r283440:
For future use in the Linuxulator:
1. Add a kern_kqueue() counterpart for kqueue() with a flags parameter.
2. Be a bit secure. To avoid a double fp lookup, add a kern_kevent_fp()
counterpart for kern_kevent() with a file pointer parameter instead
of a file descriptor and pass the buck to it.
dchagin [Sat, 9 Jan 2016 16:28:40 +0000 (16:28 +0000)]
MFC r283435:
Convert Linux wait options to their FreeBSD equivalents.
Check wait options as Linux does.
Linux always sets the WEXITED option, not WUNTRACED|WNOHANG,
which is a strange bug.
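
A sketch of what such a conversion involves. The bit values are given as I recall them from the Linux and FreeBSD sys/wait.h headers and should be treated as assumptions; the function name is hypothetical and the Linux-only clone-related flags are left out:

```python
# Assumed header values; verify against the actual headers before relying on them.
LINUX_WAIT_BITS = {"WNOHANG": 0x1, "WUNTRACED": 0x2, "WEXITED": 0x4,
                   "WCONTINUED": 0x8, "WNOWAIT": 0x01000000}
BSD_WAIT_BITS = {"WNOHANG": 0x1, "WUNTRACED": 0x2, "WCONTINUED": 0x4,
                 "WNOWAIT": 0x8, "WEXITED": 0x10}


def linux_to_bsd_wait_options(linux_options):
    """Translate option bits by name; unknown bits are rejected (EINVAL-style)."""
    known = 0
    for bit in LINUX_WAIT_BITS.values():
        known |= bit
    if linux_options & ~known:
        raise ValueError("unsupported wait option bits")
    bsd_options = 0
    for name, bit in LINUX_WAIT_BITS.items():
        if linux_options & bit:
            bsd_options |= BSD_WAIT_BITS[name]
    return bsd_options
```

The point of the sketch is that only WNOHANG and WUNTRACED share values, so passing the Linux bits through unchanged would misread the others.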
dchagin [Sat, 9 Jan 2016 16:26:39 +0000 (16:26 +0000)]
MFC r283433:
Rewrite linux_recvfrom. To avoid double conversion of the sockaddr, use
kern_recvit() directly.
Also check the fromlen parameter before the sockaddr copyin and conversion.
dchagin [Sat, 9 Jan 2016 16:25:30 +0000 (16:25 +0000)]
MFC r283432:
The note.Linux section, exported through the vdso, is used by glibc
to determine the kernel version (this saves one uname call).
Temporarily disable the export of the note.Linux section until I figure
out how to change the kernel version in note.Linux on the fly.
dchagin [Sat, 9 Jan 2016 16:21:39 +0000 (16:21 +0000)]
MFC r283428:
Change the Linux faccessat syscall definition to match the actual Linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat(). If either of these
flags is specified, then the wrapper function employs fstatat() to
determine access permissions.
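
The kind of check a wrapper can perform from stat data alone is sketched below. This is a simplified model of the classic owner/group/other test, not glibc's code; the function and parameter names are hypothetical, and with AT_EACCESS the caller would pass the effective rather than the real IDs:

```python
import stat

R_OK, W_OK, X_OK = 4, 2, 1


def access_allowed(st_mode, st_uid, st_gid, uid, gids, want):
    """Decide access from stat fields and the chosen user/group IDs."""
    if uid == 0:
        # Superuser: read/write always pass; execute needs some x bit
        # unless the target is a directory.
        if want & X_OK and not stat.S_ISDIR(st_mode) and not st_mode & 0o111:
            return False
        return True
    if uid == st_uid:
        granted = (st_mode >> 6) & 7   # owner bits
    elif st_gid in gids:
        granted = (st_mode >> 3) & 7   # group bits
    else:
        granted = st_mode & 7          # other bits
    return (granted & want) == want
```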