mav [Wed, 11 May 2016 11:23:22 +0000 (11:23 +0000)]
MFC r297508: MFV r297505:
6739 userland version of cv_timedwait_hires() always assumes absolute time
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
asomers [Tue, 10 May 2016 17:34:35 +0000 (17:34 +0000)]
MFC 294923
Fix grep_test:recurse with ZFS and TMPFS tmpdirs
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
Fix grep_test:recurse when /tmp is either zfs or tmpfs. The test was
relying on an implicit ordering of directory recursion which happens to
be true when using UFS. grep's specification requires no such ordering.
The solution is to ignore the order of grep's results.
asomers [Tue, 10 May 2016 16:49:50 +0000 (16:49 +0000)]
MFC 297868
Fix rare double free in vdev_geom_attrchanged
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
Don't drop the g_topology_lock before freeing old_physpath. That
opens up a race where one thread can call vdev_geom_attrchanged,
set old_physpath, drop the g_topology_lock, then block trying to
acquire the SCL_STATE lock. Then another thread can come into
vdev_geom_attrchanged, set old_physpath to the same value, and
proceed to free it. When the first thread resumes, it will free
the same location.
It turns out that the SCL_STATE lock isn't needed. It was
originally added by gibbs to protect vd->vdev_physpath while
updating the same. However, the update process subsequently was
switched to an atomic operation (a pointer swap). Now, there is
no need for the SCL_STATE lock, and hence no need to drop the
g_topology_lock.
jhb [Tue, 10 May 2016 03:42:18 +0000 (03:42 +0000)]
MFC 299205: Restore name=value format of PCI location strings.
When devctl was added, the location string for PCI devices was changed to
use the PCI "selector" that pciconf and devctl accept. However, devd
assumes that location strings are formatted as a list of name=value pairs.
As a result, devd is no longer parsing any of the values out of PCI
device events. Restore the previous format of the PCI location strings
to restore the location and slot keywords in case any devd scripts are
using this. Add the "selector" as a new 'dbsf' location variable.
davidcs [Tue, 10 May 2016 02:26:26 +0000 (02:26 +0000)]
MFC r298294
1. modify fwdump (a.k.a grcdump) so that grcdump memory is allocated
and freed on as needed basis.
2. grcdump can be taken at failure points by invoking bxe_grc_dump()
when trigger_grcdump sysctl flag is set. When grcdump is taken
grcdump_done sysctl flag is set.
3. grcdump_done can be monitored by the user to retrieve the grcdump
hselasky [Mon, 9 May 2016 13:09:41 +0000 (13:09 +0000)]
MFC r298771:
Add function to detect the presence of a port module and use this
function to error out early when no port module is present and doing
eeprom access. This also prevents error codes from filling up in
dmesg.
kib [Sun, 8 May 2016 09:02:51 +0000 (09:02 +0000)]
MFC r298890:
Make it explicit that D_MEM cdevsw d_flag is to signify that the
driver is (or behaves identically to) /dev/mem. Remove the D_MEM flag
from random drivers.
rmacklem [Sat, 7 May 2016 20:17:23 +0000 (20:17 +0000)]
MFC: r298523
Allow the NFSv4 server to reply NFSERR_WRONGSEC for the SetClientID operation.
It was reported via email that a Linux client couldn't do a Kerberized
NFS mount when only "sec=krb5" was specified for the exports. The Linux
client attempted a mount via krb5i and the server replied NFSERR_SERVERFAULT.
Although NFSERR_WRONGSEC isn't listed as an error for SetClientID, I
think it is the correct reply, so this patch enables that.
I do not know if this fixes the mount attempt, but adding "krb5i" to the
list of allowed security flavours does allow the mount to work.
rmacklem [Sat, 7 May 2016 20:09:15 +0000 (20:09 +0000)]
MFC: r298495
Fix a LOR in the NFSv4.1 server.
The ordering of acquisition of the state and session mutexes was
reversed in two cases executed when an NFSv4.1 client created/freed
a session. Since clients will typically do this only when mounting
and dismounting, the likelyhood of causing a deadlock was low but possible.
This can only occur for NFSv4.1 mounts, since the others do not
use sessions.
This was detected while testing the pNFS server/client where the
client crashed during dismounting.
The patch also reorders the unlocks, although that isn't necessary
for correct operation.
dchagin [Sat, 7 May 2016 19:05:39 +0000 (19:05 +0000)]
MFC r295856 (by des@):
Implement /proc/$$/limits.
MFC r297781 (by dchagin@):
More complete implementation of /proc/self/limits.
Fix the way the code accesses process limits struct - pointed out by mjg@.
MFC r298318, 298319 (by cem@):
Don't print uninitialized values and initialize error return before use.
dchagin [Sat, 7 May 2016 08:30:21 +0000 (08:30 +0000)]
MFC r298519:
Fix streams and svr4 module dependency. Both modules are complaining about
undefined symbol svr4_delete_socket which was moved from streams to the svr4 module
in r160558 that created a two-way dependency between them.
MFC r298520:
Allow to build svr4 module with SYSV support separatelly from the kernel build.
rmacklem [Sat, 7 May 2016 00:02:28 +0000 (00:02 +0000)]
MFC: r297869
If the VOP_SETATTR() call that saves the exclusive create verifier failed,
the NFS server would leave the newly created vnode locked. This could
result in a file system that would not unmount and processes wedged,
waiting for the file to be unlocked.
Since this VOP_SETATTR() never fails for most file systems, this bug
doesn't normally manifest itself. I found it during testing of an
exported GlusterFS file system, which can fail.
This patch adds the vput() and changes the error to the correct NFS one.
rmacklem [Fri, 6 May 2016 23:44:24 +0000 (23:44 +0000)]
MFC: r297837
Bruce Evans reported that there was a performance regression between
the old and new NFS clients. He did a good job of isolating the problem
which was caused by the new NFS client not setting the post write mtime
correctly. The new NFS client code was cloned from the old client, but
was incorrect, because the mtime in the nfs vnode's cache wasn't yet
updated. This patch fixes this problem. The patch also adds missing mutex
locking.
bcr [Fri, 6 May 2016 17:55:11 +0000 (17:55 +0000)]
MFC r298893:
Provide an example to the kqueue man page, showing
a basic usage example. Although it is an
untypical example for the use of kqueue, it is
better than nothing and should get people started.
sephe [Fri, 6 May 2016 05:44:12 +0000 (05:44 +0000)]
MFC r298385
dhclient: Log a warning instead of bailing upon "illegal" options
In Azure, the DHCP servers add private option (id 0xf5), which contains
binary form of an IPv4 address. Once this option is converted to string
form, it could contain '$', e.g.
dhclient bails upon "illegal" options like the above example, thus the
VM bring-up will fail.
Also as a side note, this "illegal" option detection was added in
OpenBSD ~11years ago:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sbin/dhclient/dhclient.c?rev=1.50&content-type=text/x-cvsweb-markup
And it was removed along with the removal of script support in OpenBSD
~3years ago:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sbin/dhclient/dhclient.c?rev=1.159&content-type=text/x-cvsweb-markup
Reported by: Hongxiong Xian <v-hoxian microsoft com>
Reviewed by: jhb, Dexuan Cui <decui microsoft com>
Tested by: Hongxiong Xian <v-hoxian microsoft com>
Analyzed by: Dong Liu <doliu microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5853
jtl [Fri, 6 May 2016 01:26:58 +0000 (01:26 +0000)]
MFC r298408:
Prevent underflows in tp->snd_wnd if the remote side ACKs more than
tp->snd_wnd. This can happen, for example, when the remote side responds
to a window probe by ACKing the one byte it contains.
glebius [Wed, 4 May 2016 17:27:49 +0000 (17:27 +0000)]
Merge r299077, which provides ability to override NO_INSTALLEXTRAKERNELS.
Override NO_INSTALLEXTRAKERNELS for the release build. Historically, in
the stable/10 branch the regular 'installkernel' target DID NOT install
extra kernels, while the release build DID. This change finally restores
POLA, I hope.
ngie [Wed, 4 May 2016 07:37:02 +0000 (07:37 +0000)]
MFC r298304:
Fix issues identified by Coverity
- Always munmap memory regions after mmap'ing them.
- Make sure getpagesize() returns a value greater than 0 and use a
cached value instead of always calling getpagesize(3).
- Remove intermediate variable for assigning from $TMPDIR if set in the
environment to eliminate warnings about pointer conversions with "/tmp",
and to mute an invalid buffer overflow concern from Coverity
(snprintf and tacking on a NUL terminator was alleviating that concern
before).
- Remove useless self-test of psize before it's initialized.
- Check the return values of getrlimit/setrlimit.
Cosmetic changes:
- Replace a `(void*)0` with NULL.
- Do some minor whitespace clean up.
- Remove an unnecessary cast to mmap.
- Make all munmap calls use ATF_REQUIRE_MSG instead of using the:
> if (munmap(..) == -1)
> atf_tc_fail(..)
idiom. Employ the new idiom consistently when calling munmap.
ngie [Wed, 4 May 2016 07:35:43 +0000 (07:35 +0000)]
MFC r298301:
Fix leaks and test for getpagesize() returning == -1
- close file descriptors after use.
- Always munmap memory regions after mmap'ing them.
- Make sure getpagesize() returns a value greater than 0 and use a
cached value instead of always calling getpagesize(3).
ngie [Wed, 4 May 2016 00:39:03 +0000 (00:39 +0000)]
MFC r298753:
Fix va_list handling
- Add missing va_end's after corresponding va_start's to cleanup state
- Eliminate questionable bzero'ing of va_list passed in to
do_buff_decode(..) and do_encode(..) from buff_{de,en}code_visit(..)
and csio_{de,en}code_visit(..). Make va_list a pointer instead and
pass NULL into the underlying functions to handler this in a portable
way.
- Do some minor style(9) clean up in affected functions.
ngie [Wed, 4 May 2016 00:34:45 +0000 (00:34 +0000)]
MFC r298450:
Simplify always evaluated branch (`e != NULL`)
- xalloc(..) ensures that e will be non-null via malloc + err.
- `e` is already dereferenced above, so logically it's impossible
to hit the lower test without crashing if it was indeed NULL.
freopen handles closing file descriptors on error, with the exception of
fdopen'ed descriptors, so closing an already fclose'd file descriptor is
incorrect
mav [Tue, 3 May 2016 08:35:35 +0000 (08:35 +0000)]
MFC r298103: Simplify memory allocation for NS requests.
Since we no longer need additional buffers for request and response IOCBs,
we can increase receive space by 192 bytes, that is enough for fetching 48
more ports. The new limit is 1020 fabric ports per virtual port.
mav [Tue, 3 May 2016 08:07:38 +0000 (08:07 +0000)]
MFC r297818: Update 25xx chips firmware from 7.03.00 to 8.03.00.
While the same update is also available for 24xx chips, it seems have
a problem with disabling virtual ports -- firmware handles the request,
but does not respong on it, causing timeout in driver.
mav [Tue, 3 May 2016 08:05:31 +0000 (08:05 +0000)]
MFC r297991: Extract virtual port address from RQSTYPE_RPT_ID_ACQ.
This should close the race between request arriving on new target mode
virtual port and its scanner thread finally fetch its address for request
routing.
mav [Tue, 3 May 2016 08:04:34 +0000 (08:04 +0000)]
MFC r297915: Filter Port Database Changed notifications.
For some reason firmware sends Port Database Changed notifications in case
of explicit login requests from the driver when target port is unavailabe.
Those notifications don't give driver any new information, but only cause
infinite scan loop.
mav [Tue, 3 May 2016 08:03:07 +0000 (08:03 +0000)]
MFC r297867: Make all CT Pass-Through (name server requests) asynchronous.
Previously we had to do it synchronously because we could not drop the lock
due to potential scratch memory use conflicts. Previous commits fixed that
collision, so here it goes -- slower and less reliable external requests
are executed asynchronously without spinning in tight loop and with more
safe timeout handling.
mav [Tue, 3 May 2016 08:00:54 +0000 (08:00 +0000)]
MFC r297858: Allocate separate DMA area for synchronous IOCB execution.
Usually IOCBs should be put on queue for asynchronous processing and should
not require additional DMA memory. But there are some cases like aborts and
resets that for external reasons has to be synchronous. Give those cases
separate 2*64 byte DMA area to decouple them from other DMA scratch area
users, using it for asynchronous requests.
mav [Tue, 3 May 2016 07:49:40 +0000 (07:49 +0000)]
MFC r297963: Remove watchdog timer stop check.
There are bunch of reports that this check fails at least on Nuvoton
NCT6776 chips. I don't see why this check needed there, and Linux does
not have it either. So far this check only made watchdogd unstopable.
When a user defines "jail_list" in rc.conf the jails are started in the
order defined. Currently the jails are not are stopped in reverse order
which may break dependencies between jails/services and prevent a clean
shutdown. The new parameter "jail_reverse_stop" will shutdown jails in
"jail_list" in reverse order when set to "YES".
Please note that this does not affect manual invocation of the jail rc
script. If a user runs the command
# service jail stop jail1 jail2 jail3
the jails will be stopped in exactly the order specified regardless of
jail_reverse_stop being defined in rc.conf.
MFC r295568:
Document the new jail_reverse_stop parameter
While here clean up the documentation for jail_list
Note the existence of module-specific jail paramters, starting with the
linux.* parameters when linux emulation is loaded.
MFC r298585:
Encapsulate SYSV IPC objects in jails. Define per-module parameters
sysvmsg, sysvsem, and sysvshm, with the following bahavior:
inherit: allow full access to the IPC primitives. This is the same as
the current setup with allow.sysvipc is on. Jails and the base system
can see (and moduly) each other's objects, which is generally considered
a bad thing (though may be useful in some circumstances).
disable: all no access, same as the current setup with allow.sysvipc off.
new: A jail may see use the IPC objects that it has created. It also
gets its own IPC key namespace, so different jails may have their own
objects using the same key value. The parent jail (or base system) can
see the jail's IPC objects, but not its keys.
Add a new jail OSD method, PR_METHOD_REMOVE. It's called when a jail is
removed from the user perspective, i.e. when the last pr_uref goes away,
even though the jail mail still exist in the dying state. It will also
be called if either PR_METHOD_CREATE or PR_METHOD_SET fail.
MFC r298683:
Delay removing the last jail reference in prison_proc_free, and instead
put it off into the pr_task. This is similar to prison_free, and in fact
uses the same task even though they do something slightly different.
MFC r298566:
Pass the current/new jail to PR_METHOD_CHECK, which pushes the call
until after the jail is found or created. This requires unlocking the
jail for the call and re-locking it afterward, but that works because
nothing in the jail has been changed yet, and other processes won't
change the important fields as long as allprison_lock remains held.
Keep better track of name vs namelc in kern_jail_set. Name should
always be the hierarchical name (relative to the caller), and namelc
the last component.
Remove the PR_REMOVE flag, which was meant as a temporary marker for
a jail that might be seen mid-removal. It hasn't been doing the right
thing since at least the ability to resurrect dying jails, and such
resurrection also makes it unnecessary.
msdosfs: Prevent buffer overflow when expanding win95 names
In win2unixfn() we expand Windows 95 style long names. In some cases that
requires moving the data in the nbp->nb_buf buffer backwards to make room. That
code failed to check for overflows, leading to a stack overflow in win2unixfn().
We now check for this event, and mark the entire conversion as failed in that
case. This means we present the 8 character, dos style, name instead.
MFC r297967:
Ensure the received IP header gets 32-bits aligned.
The FreeBSD's TCP/IP stack assumes that the IP-header is 32-bits aligned
when decoding it. Else unaligned 32-bit memory access can happen, which
not all processor architectures support.
When downing a mlxen network adapter we need to check the port_up variable
to ensure we don't continue to transmit data or restart timers which can
reside in freed memory.
Don't remove the /var/run/jail_name.id file if a jail fails to start.
This messes up ezjail (and possibly others), when attempting to start
a jail that already exists.
MFC r298521;
regex: prevent two improbable signed integer overflows.
In matcher() we used an integer to index nsub of type size_t.
In print() we used an integer to index nstates of type sopno,
typedef'd long.
In both cases the indexes never take negative values.
MFC 297932,298295:
Improvements for PCI passthru devices.
297932:
Handle PBA that shares a page with MSI-X table for passthrough devices.
If the PBA shares a page with the MSI-X table, map the shared page via
/dev/mem and emulate accesses to the portion of the PBA in the shared
page by accessing the mapped page.
298295:
Always emit an error message on passthru configuration errors.
Previously, many errors (such as the PCI device not being attached
to the ppt(4) driver) resulted in bhyve silently exiting without
starting the virtual machine. Now any errors encountered when
configuring a virtual slot for a PCI passthru device should be noted
on stderr.
MFC 297039,297374,297398,297484:
Poll the IPI status while waiting constantly instead of delaying
5 microseconds between checks. This avoids inserting a minimum
latency of 5 microseconds on each IPI.
297039:
Check IPI status more frequently when waiting.
An IPI cannot be sent via the local APIC if a previous IPI is still
being delivered. Attempts to send an IPI will wait for a pending IPI
to clear. Prior to r278325 these checks used a spin loop with a
hardcoded maximum count which broke AP startup on some systems.
However, r278325 also enforced a minimum latency of 5 microseconds if an
IPI was still pending which resulted in a measurable performance hit.
This change reduces that minimum latency to 1 microsecond.
297374:
Calibrate the frequency of the of the native_lapic_ipi_wait() loop,
and avoid a delay while waiting for IPI delivery acknowledgement in
xAPIC mode. This makes the loop exit immediately after the delivery
bit in APIC_ICR register is set, instead of waiting for some
microseconds.
We only need to ensure that some amount of time is allowed for the
LAPIC to react to the command, and we need that the wait time is
finite and reasonable. For that reasons, it is irrelevant if the CPU
frequency or throttling decrease the speed and make the loop,
calibrated for full CPU speed at boot time, execute somewhat slower.
297398:
Fix several bugs in r297374:
- fix UP build [1]
- do not obliterate initial reading of rdtsc by the loop counter [2]
- restore the meaning of the argument -1 to native_lapic_ipi_wait()
as wait until LAPIC acknowledge without timeout
- correct formula for calculating loop iteration count for 1us, it was
inverted, and ensure that even on unlikely slow CPUs at least one
check for ack is performed.
297484:
Style(9), use tabs for the #define LOOPS line.
Print unsigned values with %u.
Make code slightly more compact by inlining loop limit.
The default value of MINFREE is defined to be 8% in
ufs/ffs/fs.h and not 10%. The newfs(8) and tunefs(8)
man pages had this change already, but fs(5) did not.
This change makes it consistent again.