marius [Wed, 13 Jan 2016 21:38:52 +0000 (21:38 +0000)]
MFC: r292943, r292960
- (Ab)use udivx for dividing the u_int pc_cpuid when implementing
CPU_ISSET(), CPU_SET() etc. in sparc64 asm. This approach has the
benefit of not clobbering %y, which allows reverting r222827 and,
partially, r222828.
- In r222828, CATR() already was changed to use the equivalent of
PCPU_GET(cpuid) instead of the MD module ID for KTR_MASK, so
belatedly also catch up with KTR_CPU and the C side of ktr(9).
Originally, in r203838 CATR() was moved away from directly reading
the module ID or equivalent as that became impractical with other
CPU types than USI/II supported. With r222828 in place, per-CPU
data generally is set up soon enough, though, that employing
PCPU things in ktr(9) also for use during early stages works.
- Unfortunately, an exception to the latter is the ktr(9) use
in pmap_bootstrap(), which actually is run so early that even
checking for bootverbose being set via the loader doesn't work.
Consequently, replace the ktr(9) use in pmap_bootstrap() with
OF_printf(9) and put it under #ifdef DIAGNOSTIC instead.
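For illustration of the first item: a CPU-set membership test comes down to dividing the CPU ID by the word width to pick a word, and using the remainder as the bit index. The C sketch below shows only that arithmetic. All names (sketch_cpuset, SKETCH_NBITS and so on) are hypothetical, and it is neither the sparc64 assembly from the commit nor the kernel's cpuset code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's cpuset_t or CPU_*() macros. */
#define SKETCH_MAXCPU   64
#define SKETCH_NBITS    (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NWORDS   ((SKETCH_MAXCPU + SKETCH_NBITS - 1) / SKETCH_NBITS)

struct sketch_cpuset {
	unsigned long bits[SKETCH_NWORDS];
};

/* Word index: the quotient of the CPU ID divided by the word width. */
static size_t
sketch_word(unsigned int cpuid)
{
	assert(cpuid < SKETCH_MAXCPU);
	return (cpuid / SKETCH_NBITS);
}

/* Bit mask within that word: built from the remainder of the same division. */
static unsigned long
sketch_mask(unsigned int cpuid)
{
	return (1UL << (cpuid % SKETCH_NBITS));
}

static void
sketch_cpu_set(unsigned int cpuid, struct sketch_cpuset *set)
{
	set->bits[sketch_word(cpuid)] |= sketch_mask(cpuid);
}

static int
sketch_cpu_isset(unsigned int cpuid, const struct sketch_cpuset *set)
{
	return ((set->bits[sketch_word(cpuid)] & sketch_mask(cpuid)) != 0);
}
```

In assembly the division step is where an instruction that leaves %y untouched matters; the C version hides that detail behind the `/` operator.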
delphij [Wed, 13 Jan 2016 08:22:53 +0000 (08:22 +0000)]
MFC r292861:
hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr()
We'll remove the per-channel control_work_queue because it can't properly
do serialization of message handling, e.g., when there are 2 NIC devices,
vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition:
for an SMP VM, vmbus_channel_process_offer() can run concurrently on
different CPUs and if the second NIC's
vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs
first, the second NIC's name will be hn0 and the first NIC's name will
be hn1!
We can fix the race condition by removing the per-channel control_work_queue
and run all the message handlers in the global
hv_vmbus_g_connection.work_queue -- we'll do this in the next patch.
With the coming next patch, we have to run the non-blocking handlers
directly in the kernel thread vmbus_msg_swintr(), because of the special
handling of sub-channels: when a sub-channel (e.g., of the storvsc driver)
is received and being handled in vmbus_channel_on_offer() running on the
global hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer()
invokes channel->sc_creation_callback, i.e., storvsc_handle_sc_creation,
and the callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message
and expect a further reply from the host, but the handling of the further
message can't be done because the current message's handling hasn't finished
yet; as a result, hv_vmbus_channel_open() -> sema_timedwait() will time out
and the device can't work.
Also renamed the handler type from hv_pfn_channel_msg_handler to
vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D4596
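The split described above (non-blocking handlers run inline, handlers that may wait for the host are deferred to one serialized queue) can be sketched in plain C as follows. Only the type name vmbus_msg_handler comes from the commit text. The table entry, the queue and the helper functions are hypothetical and much simpler than the real driver.

```c
#include <assert.h>
#include <stddef.h>

/* Type name from the commit text; the signature is an assumption. */
typedef void (*vmbus_msg_handler)(void *msg);

/* Hypothetical table entry: one handler plus a "may block" flag. */
struct sketch_msg_entry {
	int               blocking;   /* nonzero: may sleep waiting for the host */
	vmbus_msg_handler handler;
};

#define SKETCH_QUEUE_MAX 8

/* Hypothetical single global work queue, drained in FIFO order. */
struct sketch_work_queue {
	struct {
		vmbus_msg_handler fn;
		void             *msg;
	} items[SKETCH_QUEUE_MAX];
	size_t count;
};

/*
 * Returns 1 if the handler ran inline, 0 if it was deferred,
 * and -1 if the queue was full.
 */
static int
sketch_dispatch(const struct sketch_msg_entry *e, void *msg,
    struct sketch_work_queue *wq)
{
	assert(e != NULL && e->handler != NULL && wq != NULL);
	if (!e->blocking) {
		/* Non-blocking: safe to run in the interrupt thread itself. */
		e->handler(msg);
		return (1);
	}
	if (wq->count == SKETCH_QUEUE_MAX)
		return (-1);
	wq->items[wq->count].fn = e->handler;
	wq->items[wq->count].msg = msg;
	wq->count++;
	return (0);
}

/* Run deferred handlers one at a time, in arrival order. */
static void
sketch_drain(struct sketch_work_queue *wq)
{
	for (size_t i = 0; i < wq->count; i++)
		wq->items[i].fn(wq->items[i].msg);
	wq->count = 0;
}

/* Demonstration handler: counts how often it was called. */
static void
sketch_count_handler(void *msg)
{
	(*(int *)msg)++;
}
```

Because replies are handled inline, a deferred handler that is waiting for such a reply is not stuck behind its own queue entry, which is the deadlock the commit message describes.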
MFC r292859:
hyperv: vmbus: remove the per-channel control_work_queue
Now vmbus_channel_on_offer() -> vmbus_channel_process_offer() can
safely run on the global hv_vmbus_g_connection.work_queue.
We remove the per-channel control_work_queue to achieve the proper
serialization of the message handling.
I removed the bogus TODO in vmbus_channel_on_offer(): a vmbus offer
can only come from the parent partition, i.e., the host.
PR: kern/205156
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Howard Su <howard0su gmail com>, delphij
Differential Revision: https://reviews.freebsd.org/D4597
kevlo [Wed, 13 Jan 2016 01:32:04 +0000 (01:32 +0000)]
MFC r293491:
- Add the definition of CHARCLASS_NAME_MAX, as per POSIX.1-2001.
- Avoid namespace pollution and move definitions of _POSIX2_CHARCLASS_NAME_MAX
and _POSIX2_COLL_WEIGHTS_MAX into the .2001 section.
With input from bde.
davidcs [Tue, 12 Jan 2016 22:58:46 +0000 (22:58 +0000)]
MFC r292638
Check that packet_length is greater than 60 bytes and also greater than
len_on_bd before invoking the routine to handle jumbo over SGL
(bxe_service_rxsgl()).
Add counters for number of jumbo_over_SGL packets (rx_bxe_service_rxsgl) and
erroneous jumbo_over_SGL packets (rx_erroneous_jumbo_sge_pkts)
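A minimal sketch of the combined check, assuming both lengths are 16-bit byte counts. The function name, the constant name and the counter argument are hypothetical; only the 60-byte threshold and the two compared quantities come from the commit text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold from the commit text; the macro name is hypothetical. */
#define SKETCH_MIN_SGL_PKT_LEN 60

/*
 * Returns nonzero (and bumps the counter) when the packet should be
 * handed to the jumbo-over-SGL path.
 */
static int
sketch_wants_rxsgl(uint16_t packet_length, uint16_t len_on_bd,
    unsigned long *sgl_count)
{
	assert(sgl_count != NULL);
	if (packet_length > SKETCH_MIN_SGL_PKT_LEN &&
	    packet_length > len_on_bd) {
		(*sgl_count)++;
		return (1);
	}
	return (0);
}
```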
dim [Tue, 12 Jan 2016 19:33:43 +0000 (19:33 +0000)]
MFC r292950:
Drop the clang patch which adds recognition of 'CC' suffixes as aliases
for --driver-mode=g++, since this was never upstreamed. For backwards
compatibility, add a wrapper shell script.
allanjude [Tue, 12 Jan 2016 16:38:09 +0000 (16:38 +0000)]
MFC: r284589
Add the ability to detect ZFS and GELI encrypted file systems to fstyp(8)
MFC: r284644
Fix GCC Warnings
MFC: r284728
Only build ZFS support in absence of WITHOUT_ZFS
MFC: r285426
Remove excess copyrights
MFC: r286569
Use GELI sentinel constant
MFC: r287937
Eliminate unneeded copying of vdev data, goto, etc. and add a note
that checksum of vdev label should be checked (which is not done
currently).
No functional change.
While I'm there, raise WARNS to 2.
MFC: r292757
Fix order of includes in usr.sbin/fstyp/zfs.c
trasz [Tue, 12 Jan 2016 14:18:54 +0000 (14:18 +0000)]
Hide the "unmount of /dev failed (BUSY)" warning at shutdown or reboot,
introduced with r293742, just like it was hidden before that commit.
This is a direct commit to 10-STABLE; this special case is not needed
in 11-CURRENT, because devfs supports forced unmounts there. The forced
unmount could be MFC-ed, but there are some LORs at shutdown, and I have
weird feelings about it.
trasz [Tue, 12 Jan 2016 10:14:57 +0000 (10:14 +0000)]
MFC r290548:
Userspace part of reroot support. This makes it possible to change
the root filesystem without a full reboot, using "reboot -r". This can
be used to, e.g., boot from a temporary md_image preloaded by loader(8),
set up an iSCSI session, and continue booting from a rootfs mounted over
iSCSI.
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3693
trasz [Tue, 12 Jan 2016 10:11:29 +0000 (10:11 +0000)]
MFC r287964:
Kernel part of reroot support - a way to change rootfs without reboot.
Note that the mountlist manipulations are somewhat fragile, and not very
pretty. The reason for this is to avoid changing vfs_mountroot(), which
is (obviously) rather mission-critical, but not very well documented,
and thus hard to test properly. It might be possible to rework it to use
its own simple root mount mechanism instead of vfs_mountroot().
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2698
trasz [Tue, 12 Jan 2016 10:09:03 +0000 (10:09 +0000)]
MFC r287107:
Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
the MNT_FORCE flag.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467
trasz [Tue, 12 Jan 2016 09:27:01 +0000 (09:27 +0000)]
MFC r289110:
Make geom_nop(4) collect statistics on all types of BIOs, not just
reads and writes.
PR: kern/198405
Submitted by: Matthew D. Fuller <fullermd at over-yonder dot net>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3679
gjb [Tue, 12 Jan 2016 02:12:40 +0000 (02:12 +0000)]
MFC r293188:
Prevent memstick installation medium from attempting to mount
the root filesystem read-write. This causes problems booting
the memstick installation medium from write-protected USB flash
drives.
PR: 187161, 205886
Sponsored by: The FreeBSD Foundation
asomers [Mon, 11 Jan 2016 21:12:49 +0000 (21:12 +0000)]
MFC r292218
Don't retry SAS commands in response to protocol errors
sys/dev/mpr/mpr_sas_lsi.c
sys/dev/mps/mps_sas_lsi.c
When mp[rs]sas_get_sata_identify returns
MPI2_IOCSTATUS_SCSI_PROTOCOL_ERROR, don't bother retrying. Protocol
errors aren't likely to be fixed by sleeping.
Without this change, a system that generated many protocol errors due
to signal integrity issues was taking more than an hour to boot, due
to all the retries.
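The retry policy can be sketched as a loop that stops early on a protocol error. The status enum, the callback type and the try limit below are hypothetical stand-ins, not the mpr(4)/mps(4) driver's types or its MPI2_IOCSTATUS_* values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the driver's IOC status. */
enum sketch_status { SKETCH_OK, SKETCH_BUSY, SKETCH_PROTOCOL_ERROR };

#define SKETCH_MAX_TRIES 5

typedef enum sketch_status (*sketch_identify_fn)(void *ctx);

static enum sketch_status
sketch_identify_with_retry(sketch_identify_fn identify, void *ctx,
    int *tries_out)
{
	enum sketch_status st = SKETCH_BUSY;
	int tries = 0;

	assert(identify != NULL);
	while (tries < SKETCH_MAX_TRIES) {
		st = identify(ctx);
		tries++;
		/* Stop on success, and on errors that waiting cannot cure. */
		if (st == SKETCH_OK || st == SKETCH_PROTOCOL_ERROR)
			break;
		/* A real driver would sleep here before the next attempt. */
	}
	if (tries_out != NULL)
		*tries_out = tries;
	return (st);
}

/* Demonstration callbacks. */
static enum sketch_status
sketch_always_protocol_error(void *ctx)
{
	(void)ctx;
	return (SKETCH_PROTOCOL_ERROR);
}

static enum sketch_status
sketch_always_busy(void *ctx)
{
	(void)ctx;
	return (SKETCH_BUSY);
}
```

With a per-attempt sleep, the difference between one try and the full retry budget on every affected device is what turns into the long boot time mentioned above.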
asomers [Mon, 11 Jan 2016 20:25:41 +0000 (20:25 +0000)]
MFC r292019
When iostat(8) receives SIGINT while running with "-w" or "-c", it will now
print statistics one more time before exiting. Also, it now implements the
wait using setitimer instead of sleep, so the waits will be more consistent
when the system is heavily loaded.
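A rough sketch of waiting on an interval timer instead of sleep(3) is shown below. It is not iostat's code and every sketch_* name is made up. The point is that setitimer(2) fires on a fixed period regardless of how long the work between ticks takes, whereas a sleep after each report adds the report's own run time to every cycle.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t sketch_tick_pending;

static void
sketch_on_alarm(int sig)
{
	(void)sig;
	sketch_tick_pending = 1;
}

/* Arm a periodic real-time timer that fires every interval_ms milliseconds. */
static int
sketch_start_interval(long interval_ms)
{
	struct sigaction sa;
	struct itimerval it;

	assert(interval_ms > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return (-1);

	it.it_interval.tv_sec = interval_ms / 1000;
	it.it_interval.tv_usec = (interval_ms % 1000) * 1000;
	it.it_value = it.it_interval;
	return (setitimer(ITIMER_REAL, &it, NULL));
}

/* Disarm the timer. */
static int
sketch_stop_interval(void)
{
	struct itimerval it;

	memset(&it, 0, sizeof(it));
	return (setitimer(ITIMER_REAL, &it, NULL));
}

/* Block until the next tick, without losing one that arrives early. */
static void
sketch_wait_tick(void)
{
	sigset_t block, old, wait;

	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	sigprocmask(SIG_BLOCK, &block, &old);
	wait = old;
	sigdelset(&wait, SIGALRM);
	while (!sketch_tick_pending)
		sigsuspend(&wait);   /* atomically unblock SIGALRM and sleep */
	sketch_tick_pending = 0;
	sigprocmask(SIG_SETMASK, &old, NULL);
}
```

A reporting loop would call sketch_start_interval() once and then alternate between printing statistics and sketch_wait_tick().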
asomers [Mon, 11 Jan 2016 20:24:56 +0000 (20:24 +0000)]
MFC r292020
Increase devd's client socket buffer size to 256KB. This is not as large as
it looks, because we'll hit the sockbuf's mbuf limit long before hitting its
data limit. A 256KB data limit allows creating a ZFS pool on about 450
drives without overflowing the client socket buffers.
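For reference, requesting a larger socket buffer is a single setsockopt(2) call. The sketch assumes the send buffer (SO_SNDBUF) is the one being enlarged, which the commit text does not state. The helper and constant names are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* 256KB, the size named in the commit text; the macro name is hypothetical. */
#define SKETCH_CLIENT_SNDBUF (256 * 1024)

/* Ask the kernel for a send buffer of the given size on fd. */
static int
sketch_set_sndbuf(int fd, int bytes)
{
	assert(bytes > 0);
	return (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)));
}
```

The kernel may clamp or reject the request depending on system-wide socket buffer limits, so the return value should be checked.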
trasz [Mon, 11 Jan 2016 20:10:14 +0000 (20:10 +0000)]
MFC r287396:
It's 2015, and some people are still trying to use fdisk and then
go asking what debug flags to set for GEOM to make it work. Advise
them to use gpart(8) instead.
Something similar should probably be done with disklabel,
but I need to rewrite the disklabel examples first.
jimharris [Mon, 11 Jan 2016 17:32:56 +0000 (17:32 +0000)]
MFC r293352:
nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
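One way to read the tunable is as a cap on the number of I/O queues, combined with the vector budget after setting one vector aside for the admin queue. The following is a guess at that arithmetic for illustration only. It is not the nvme(4) driver's code, and the function and parameter names are hypothetical.

```c
#include <assert.h>

static unsigned int
sketch_min_uint(unsigned int a, unsigned int b)
{
	return (a < b ? a : b);
}

/*
 * ncpus:            number of CPUs in the system
 * min_cpus_per_ioq: value of the tunable (at least 1)
 * msix_vectors:     MSI-X vectors the device was granted
 */
static unsigned int
sketch_num_io_queues(unsigned int ncpus, unsigned int min_cpus_per_ioq,
    unsigned int msix_vectors)
{
	unsigned int by_tunable, by_vectors, n;

	assert(ncpus > 0 && min_cpus_per_ioq > 0);

	/* Cap implied by the tunable: each queue serves at least this many CPUs. */
	by_tunable = ncpus / min_cpus_per_ioq;

	/* One MSI-X vector is set aside for the admin queue. */
	by_vectors = (msix_vectors > 1) ? msix_vectors - 1 : 0;

	n = sketch_min_uint(by_tunable, by_vectors);
	return (n > 0 ? n : 1);   /* always keep at least one I/O queue */
}
```

Under these assumptions, 16 CPUs with the tunable set to 4 yield at most 4 I/O queues, leaving the remaining vectors for other devices.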
jimharris [Mon, 11 Jan 2016 17:31:18 +0000 (17:31 +0000)]
MFC r293328:
nvme: do not revert to single I/O queue when per-CPU queues not available
Previously nvme(4) would revert to a single I/O queue if it could not
allocate enough interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.
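An even assignment of cores to a smaller number of queues can be sketched with a ceiling division. This is one simple scheme under the stated assumption of contiguous CPU IDs starting at 0, not necessarily what nvme(4) does. Both function names are hypothetical.

```c
#include <assert.h>

/* CPUs per queue, rounded up so that every CPU lands on some queue. */
static unsigned int
sketch_cpus_per_ioq(unsigned int ncpus, unsigned int nqueues)
{
	assert(ncpus > 0 && nqueues > 0);
	return ((ncpus + nqueues - 1) / nqueues);
}

/* Queue index for a given CPU: consecutive CPUs share a queue. */
static unsigned int
sketch_queue_for_cpu(unsigned int cpu, unsigned int ncpus,
    unsigned int nqueues)
{
	assert(cpu < ncpus);
	return (cpu / sketch_cpus_per_ioq(ncpus, nqueues));
}
```

With 8 CPUs and 3 queues this gives groups of 3, 3 and 2 CPUs, which is as close to equal as the numbers allow.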
ngie [Sun, 10 Jan 2016 17:39:49 +0000 (17:39 +0000)]
Unbreak stable/10 buildworlds on arm/arm, mips/mips, mips/mips64, mips/mipsel,
mips/mipsn32, powerpc/powerpc, powerpc/powerpc64, sparc64/sparc64 with gcc
after r293307 (some of the BURN_BRIDGES code)
MFC after: 3 days
Pointyhat to: markj
Sponsored by: EMC / Isilon Storage Division
ae [Sun, 10 Jan 2016 13:53:57 +0000 (13:53 +0000)]
MFC r292057:
Make detection of GPT a bit more reliable.
When we are detecting a partition table and didn't find PMBR, try to
read backup GPT header from the last sector and if it is correct,
assume that we have GPT.
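A simplified version of the "is this a GPT header" test is sketched below, applied to a buffer that would hold the last sector of the disk. It checks only the "EFI PART" signature and the header-size field. A real implementation would also verify the header CRC32 and other fields, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GPT_SIG          "EFI PART"
#define SKETCH_GPT_SIG_LEN      8
#define SKETCH_GPT_HDR_SIZE_OFF 12   /* little-endian 32-bit header size */
#define SKETCH_GPT_MIN_HDR_SIZE 92

static uint32_t
sketch_le32(const uint8_t *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Returns nonzero if the sector plausibly starts with a GPT header. */
static int
sketch_looks_like_gpt_header(const uint8_t *sector, size_t len)
{
	uint32_t hdr_size;

	if (sector == NULL || len < SKETCH_GPT_MIN_HDR_SIZE)
		return (0);
	if (memcmp(sector, SKETCH_GPT_SIG, SKETCH_GPT_SIG_LEN) != 0)
		return (0);
	hdr_size = sketch_le32(sector + SKETCH_GPT_HDR_SIZE_OFF);
	/* CRC32 verification is deliberately omitted in this sketch. */
	return (hdr_size >= SKETCH_GPT_MIN_HDR_SIZE && hdr_size <= len);
}
```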
dchagin [Sat, 9 Jan 2016 18:28:15 +0000 (18:28 +0000)]
MFC r288994 (by bdrewery):
Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers.
r161611 added some of the code from sys_vfork() directly into the Linux
module wrappers since they use RFSTOPPED. In r232240, the RFFPWAIT handling
was moved to syscallret(), thus this code in the Linux module is no longer
needed as it will be called later.
This also allows the Linux wrappers to benefit from the fix in r275616 for
threads not getting suspended if their vforked child is stopped while they
wait on them.
dchagin [Sat, 9 Jan 2016 18:07:48 +0000 (18:07 +0000)]
MFC r283544:
When I merged the lemul branch I missed kib@'s r282708 commit.
This is not the final fix as I need to properly clean up thread resources
before other threads suicide.
dchagin [Sat, 9 Jan 2016 18:05:04 +0000 (18:05 +0000)]
MFC r283498:
Linux nanosleep() and clock_nanosleep() system calls always
write the remaining time into the structure pointed to by rmtp
unless rmtp is NULL. The value of *rmtp can then be used to call
nanosleep() again and complete the specified pause if the previous
call was interrupted.
Note: clock_nanosleep() with an absolute time value does not write
the remaining time.
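The calling pattern that depends on this behaviour looks like the following from userland. It uses only the standard nanosleep(2) interface; the wrapper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Sleep for the full requested interval, resuming after signal
 * interruptions with the remaining time reported by nanosleep().
 * Returns 0 on success, -1 on any error other than EINTR.
 */
static int
sketch_sleep_fully(struct timespec req)
{
	struct timespec rem;

	assert(req.tv_sec >= 0);
	assert(req.tv_nsec >= 0 && req.tv_nsec < 1000000000L);
	while (nanosleep(&req, &rem) != 0) {
		if (errno != EINTR)
			return (-1);
		req = rem;   /* continue with what is left of the pause */
	}
	return (0);
}
```

If the emulation layer failed to fill in the remaining time, the loop above would restart with garbage in req, which is why the write has to happen whenever rmtp is not NULL.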
dchagin [Sat, 9 Jan 2016 18:03:09 +0000 (18:03 +0000)]
MFC r283496:
The latest cp tool is trying to use the btrfs clone operation that is
implemented via the ioctl interface. First, return ENOTSUP for this
operation, since cp falls back to the usual method in that case. Secondly, do
not print out the message about unimplemented operation.
dchagin [Sat, 9 Jan 2016 17:44:08 +0000 (17:44 +0000)]
MFC r283483:
Convert signal number to native for VT_SETMODE ioctl and remove
strange and invalid ISSIGVALID macro.
The code has not been properly tested, but it was broken originally anyway.
dchagin [Sat, 9 Jan 2016 17:39:41 +0000 (17:39 +0000)]
MFC r283479:
The kernel sends signals to processes via the ABI-specific sv_sendsig method.
Native ABIs do not need signal conversion; only emulators may want it. Usually
emulators implement their own sv_sendsig method. For now only the ibcs2 emulator
does not have its own sv_sendsig implementation and depends on the native
sendsig() method. So, remove any extra attempts to convert signal numbers from
the native sendsig() methods, except on i386 where ibcs2 lives.
dchagin [Sat, 9 Jan 2016 17:29:08 +0000 (17:29 +0000)]
MFC r283474:
Rework signal code to allow using it by other modules, like linprocfs:
1. The Linux sigset is always 64 bits on all platforms. In order to move the
Linux sigset code to the linux_common module, define it as a 64-bit int. Move
Linux sigset manipulation routines to the MI path.
2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.
3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.
4. Emulate the Linux SIGPWR signal via the FreeBSD SIGRTMIN signal, which is
outside the range of signal numbers allowed on Linux.
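Item 1 can be illustrated with a signal set held in a single 64-bit integer, where Linux signal number n (1 to 64) occupies bit n - 1. The type and function names below are hypothetical and are not the identifiers used in the linux_common module.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit Linux signal set. */
typedef uint64_t sketch_l_sigset_t;

#define SKETCH_LINUX_NSIG 64

/* Bit for Linux signal number sig: signal n maps to bit n - 1. */
static uint64_t
sketch_sigbit(int sig)
{
	assert(sig >= 1 && sig <= SKETCH_LINUX_NSIG);
	return (UINT64_C(1) << (sig - 1));
}

static void
sketch_sigaddset(sketch_l_sigset_t *set, int sig)
{
	*set |= sketch_sigbit(sig);
}

static void
sketch_sigdelset(sketch_l_sigset_t *set, int sig)
{
	*set &= ~sketch_sigbit(sig);
}

static int
sketch_sigismember(const sketch_l_sigset_t *set, int sig)
{
	return ((*set & sketch_sigbit(sig)) != 0);
}
```

Because the representation is a plain integer of fixed width, the same helpers compile unchanged on 32-bit and 64-bit platforms, which is what makes it suitable for machine-independent code.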