trasz [Tue, 12 Jan 2016 10:14:57 +0000 (10:14 +0000)]
MFC r290548:
Userspace part of reroot support. This makes it possible to change
the root filesystem without full reboot, using "reboot -r". This can
be used to, e.g., boot from a temporary md_image preloaded by loader(8),
set up an iSCSI session, and continue booting from rootfs mounted over
iSCSI.
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3693
trasz [Tue, 12 Jan 2016 10:11:29 +0000 (10:11 +0000)]
MFC r287964:
Kernel part of reroot support - a way to change rootfs without reboot.
Note that the mountlist manipulations are somewhat fragile, and not very
pretty. The reason for this is to avoid changing vfs_mountroot(), which
is (obviously) rather mission-critical, but not very well documented,
and thus hard to test properly. It might be possible to rework it to use
its own simple root mount mechanism instead of vfs_mountroot().
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2698
trasz [Tue, 12 Jan 2016 10:09:03 +0000 (10:09 +0000)]
MFC r287107:
Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467
trasz [Tue, 12 Jan 2016 09:27:01 +0000 (09:27 +0000)]
MFC r289110:
Make geom_nop(4) collect statistics on all types of BIOs, not just
reads and writes.
PR: kern/198405
Submitted by: Matthew D. Fuller <fullermd at over-yonder dot net>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3679
gjb [Tue, 12 Jan 2016 02:12:40 +0000 (02:12 +0000)]
MFC r293188:
Prevent memstick installation medium from attempting to mount
the root filesystem read-write. This causes problems booting
the memstick installation medium from write-protected USB flash
drives.
PR: 187161, 205886
Sponsored by: The FreeBSD Foundation
asomers [Mon, 11 Jan 2016 21:12:49 +0000 (21:12 +0000)]
MFC r292218
Don't retry SAS commands in response to protocol errors
sys/dev/mpr/mpr_sas_lsi.c
sys/dev/mps/mps_sas_lsi.c
When mp[rs]sas_get_sata_identify returns
MPI2_IOCSTATUS_SCSI_PROTOCOL_ERROR, don't bother retrying. Protocol
errors aren't likely to be fixed by sleeping.
Without this change, a system that generated many protocol errors due
to signal integrity issues was taking more than an hour to boot, due
to all the retries.
asomers [Mon, 11 Jan 2016 20:25:41 +0000 (20:25 +0000)]
MFC r292019
When iostat(8) receives SIGINT while running with "-w" or "-c", it will now
print statistics one more time before exiting. Also, it now implements the
wait using setitimer instead of sleep, so the waits will be more consistent
when the system is heavily loaded.
asomers [Mon, 11 Jan 2016 20:24:56 +0000 (20:24 +0000)]
MFC r292020
Increase devd's client socket buffer size to 256KB. This is not as large as
it looks, because we'll hit the sockbuf's mbuf limit long before hitting its
data limit. A 256KB data limit allows creating a ZFS pool on about 450
drives without overflowing the client socket buffers.
trasz [Mon, 11 Jan 2016 20:10:14 +0000 (20:10 +0000)]
MFC r287396:
It's 2015, and some people are still trying to use fdisk and then
go asking what debug flags to set for GEOM to make it work. Advise
them to use gpart(8) instead.
Something similar should probably be done with disklabel,
but I need to rewrite the disklabel examples first.
jimharris [Mon, 11 Jan 2016 17:32:56 +0000 (17:32 +0000)]
MFC r293352:
nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
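As a rough illustration of how a "minimum CPUs per I/O queue" setting translates into a queue count, consider the sketch below. It is not the nvme(4) driver's code: the function ioq_count() and its max_queues parameter are assumptions made for this example, with max_queues standing in for whatever limit applies (for instance the MSI-X vectors available minus the one used by the admin queue).

```c
/*
 * Hypothetical helper, not taken from the driver: derive the number of
 * I/O queues from the CPU count and a minimum-CPUs-per-queue setting.
 */
static unsigned
ioq_count(unsigned ncpus, unsigned min_cpus_per_ioq, unsigned max_queues)
{
	unsigned n;

	if (ncpus == 0)
		return (0);

	/* Clamp the setting to the range [1, ncpus]. */
	if (min_cpus_per_ioq < 1)
		min_cpus_per_ioq = 1;
	if (min_cpus_per_ioq > ncpus)
		min_cpus_per_ioq = ncpus;

	/* Round down so every queue serves at least min_cpus_per_ioq CPUs. */
	n = ncpus / min_cpus_per_ioq;

	/* Never exceed the externally imposed queue limit. */
	if (n > max_queues)
		n = max_queues;
	return (n);
}
```

With 16 CPUs, a setting of 4 yields 4 queues under this sketch, while a setting of 1 yields one queue per CPU if the limit allows it.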
jimharris [Mon, 11 Jan 2016 17:31:18 +0000 (17:31 +0000)]
MFC r293328:
nvme: do not revert to single I/O queue when per-CPU queues not available
Previously nvme(4) would revert to a single I/O queue if it could not
allocate enough interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.
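One simple way to spread cores over a smaller number of queues as evenly as possible is shown below. This is a sketch under stated assumptions, not the mapping nvme(4) actually uses: cpu_to_ioq() is a hypothetical name, and the proportional formula is just one scheme that keeps queue sizes within one core of each other.

```c
#include <assert.h>

/*
 * Hypothetical mapping: assign CPU index "cpu" (0 <= cpu < ncpus) to one
 * of "nqueues" I/O queues so that the number of CPUs per queue differs
 * by at most one between any two queues.
 */
static unsigned
cpu_to_ioq(unsigned cpu, unsigned ncpus, unsigned nqueues)
{
	assert(ncpus > 0 && nqueues > 0 && nqueues <= ncpus && cpu < ncpus);

	/* Scale the CPU index into the queue range; widen to avoid overflow. */
	return ((unsigned)(((unsigned long long)cpu * nqueues) / ncpus));
}
```

For example, with 8 cores and 3 queues this sketch places cores 0-2 on queue 0, cores 3-5 on queue 1, and cores 6-7 on queue 2.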
ngie [Sun, 10 Jan 2016 17:39:49 +0000 (17:39 +0000)]
Unbreak stable/10 buildworlds on arm/arm, mips/mips, mips/mips64, mips/mipsel,
mips/mipsn32, powerpc/powerpc, powerpc/powerpc64, sparc64/sparc64 with gcc
after r293307 (some of the BURN_BRIDGES code)
MFC after: 3 days
Pointyhat to: markj
Sponsored by: EMC / Isilon Storage Division
ae [Sun, 10 Jan 2016 13:53:57 +0000 (13:53 +0000)]
MFC r292057:
Make detection of GPT a bit more reliable.
When we are detecting a partition table and didn't find PMBR, try to
read backup GPT header from the last sector and if it is correct,
assume that we have GPT.
dchagin [Sat, 9 Jan 2016 18:28:15 +0000 (18:28 +0000)]
MFC r288994 (by bdrewery):
Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers.
r161611 added some of the code from sys_vfork() directly into the Linux
module wrappers since they use RFSTOPPED. In r232240, the RFFPWAIT handling
was moved to syscallret(), thus this code in the Linux module is no longer
needed as it will be called later.
This also allows the Linux wrappers to benefit from the fix in r275616 for
threads not getting suspended if their vforked child is stopped while they
wait on them.
dchagin [Sat, 9 Jan 2016 18:07:48 +0000 (18:07 +0000)]
MFC r283544:
When I merged the lemul branch I missed kib@'s r282708 commit.
This is not the final fix, as I need to properly clean up thread
resources before other threads suicide.
dchagin [Sat, 9 Jan 2016 18:05:04 +0000 (18:05 +0000)]
MFC r283498:
Linux nanosleep() and clock_nanosleep() system calls always
write the remaining time into the structure pointed to by rmtp
unless rmtp is NULL. The value of *rmtp can then be used to call
nanosleep() again and complete the specified pause if the previous
call was interrupted.
Note: clock_nanosleep() with an absolute time value does not write
the remaining time.
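The way a caller relies on the remaining time can be sketched as follows. This is a generic userland illustration of the POSIX nanosleep() contract, not code from the Linux emulation layer; the helper name sleep_fully() is hypothetical.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for the whole requested duration, resuming
 * with the remaining time whenever a signal interrupts the sleep.
 * Returns 0 on success, or -1 with errno set on any other failure.
 */
static int
sleep_fully(struct timespec req)
{
	struct timespec rem;

	while (nanosleep(&req, &rem) != 0) {
		if (errno != EINTR)
			return (-1);
		req = rem;	/* continue with the part not yet slept */
	}
	return (0);
}
```

If the kernel did not fill in the remaining time on interruption, the loop above would restart with stale or uninitialized data, which is why the emulated calls need to write it.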
dchagin [Sat, 9 Jan 2016 18:03:09 +0000 (18:03 +0000)]
MFC r283496:
The latest cp tool is trying to use the btrfs clone operation that is
implemented via the ioctl interface. First, return ENOTSUP for this
operation, since cp falls back to the usual method in that case. Second,
do not print the message about an unimplemented operation.
dchagin [Sat, 9 Jan 2016 17:44:08 +0000 (17:44 +0000)]
MFC r283483:
Convert signal number to native for VT_SETMODE ioctl and remove
strange and invalid ISSIGVALID macro.
The code has not been tested right way but it was originally broken.
dchagin [Sat, 9 Jan 2016 17:39:41 +0000 (17:39 +0000)]
MFC r283479:
The kernel sends signals to processes via the ABI-specific sv_sendsig method.
The native ABI does not need signal conversion; only emulators may want it.
Usually an emulator implements its own sv_sendsig method. For now, only the
ibcs2 emulator does not have its own sv_sendsig implementation and depends on
the native sendsig() method. So, remove any extra attempts to convert signal
numbers from the native sendsig() methods, except on i386, where ibcs2 lives.
dchagin [Sat, 9 Jan 2016 17:29:08 +0000 (17:29 +0000)]
MFC r283474:
Rework signal code to allow using it by other modules, like linprocfs:
1. The Linux sigset is always 64 bits on all platforms. In order to move the
Linux sigset code to the linux_common module, define it as a 64-bit int. Move
the Linux sigset manipulation routines to the MI path.
2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.
3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.
4. Emulate the Linux SIGPWR signal via the FreeBSD SIGRTMIN signal, which is
outside the range of signal numbers allowed on Linux.
dchagin [Sat, 9 Jan 2016 17:22:51 +0000 (17:22 +0000)]
MFC r283471:
According to the Linux man page, sigaltstack(3) shall return EINVAL if the ss
argument is not a null pointer and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows the SS_ONSTACK flag, which is simply ignored.
For buggy apps (at least mono), ignore flags other than SS_DISABLE,
as Linux does.
While here move MI part of sigaltstack code to the appropriate place.
dchagin [Sat, 9 Jan 2016 17:08:33 +0000 (17:08 +0000)]
MFC r283461:
Now that our tmpfs is no longer considered "highly experimental",
remove the /dev/shm magic committed in r218497 and convert the
tmpfs type to the expected magic number.