Get rid of the msg_peek() function, which has a problem. If there was less
data in the socket buffer than requested by the caller, the function would busy
loop, as select(2) will always return immediately.
We can just receive nvlhdr now, because some time ago we splitted receive of
data from the receive of descriptors.
gjb [Sun, 15 Dec 2013 20:47:27 +0000 (20:47 +0000)]
Export 'REPOS_DIR' when the selected source medium for package
installation is cdrom. This enables bsdconfig(8) to make use
of the on-disc pkg(8) repository configuration, which fixes
package selection and installation from the dvd installer.
MFC after: 3 days
M-MFC-With: r259426
X-MFC-Before: -RC3
Sponsored by: The FreeBSD Foundation
jhibbits [Sun, 15 Dec 2013 18:07:25 +0000 (18:07 +0000)]
Save r3 before using it for the trap check, else we end up saving the new r3,
containing the trap instruction encoding (0x7c810808), and restoring it back
with the frame on return. This caused it to panic on my ppc32 machine, but
somehow my ppc64 machine overlooked it, because I was using such a simple
dtrace probe.
nwhitehorn [Sun, 15 Dec 2013 16:58:23 +0000 (16:58 +0000)]
Set max_lun to zero. This field is ignored unless we are manually probing
LUNs anyway, and we certainly don't want to probe 2^32 values by hand in
that case.
hrs [Sun, 15 Dec 2013 16:17:00 +0000 (16:17 +0000)]
Replace Sun RPC license for TI-RPC library with a 3-clause BSD license,
with the explicit permission of Sun Microsystems in 2009.
The code in question in this file was copied from lib/libc/rpc/pmap_getport.c.
alfred [Sun, 15 Dec 2013 07:07:13 +0000 (07:07 +0000)]
Defer start/stop port to workqueues.
We need to do this because the Linux compat layer uses sx(9) for
mutex, however the lagg code uses rmlocks and calls into the mellanox
driver. This causes deadlock due to sleeping while holding a rmlock.
Submitted by: Shahar Klein (shahark mellanox.com)
MFC After: 3 days.
nwhitehorn [Sat, 14 Dec 2013 22:07:40 +0000 (22:07 +0000)]
Widen lun_id_t to 64 bits. This is a follow-on to r257345 to let the kernel
support all valid SAM-5 LUN IDs. CAM_VERSION is bumped, as the CAM ABI
(though not API) is changed. No behavior is changed relative to r257345
except that LUNs with non-zero high 32 bits will no longer be ignored
during device enumeration for SIMs that have set PIM_EXTLUNS.
gavin [Sat, 14 Dec 2013 18:49:59 +0000 (18:49 +0000)]
Fix several panics when initialization of an ISA or PC-CARD device fails:
o Assign sc->an_dev in an_probe() (which isn't really a probe function in
the standard newbus sense) as we may need it for printing errors.
o Use device_printf() rather than if_printf() in an_reset() - this is
called from an_probe() long before the ifp structure is initialised
in an_attach().
o Initialize the ifp structure early in an_attach() as we use if_printf()
in cases where allocation of descriptors etc fails.
np [Sat, 14 Dec 2013 03:08:03 +0000 (03:08 +0000)]
Read card capabilities after firmware initialization, instead of setting
them up as part of firmware initialization (which the driver gets to do
only if it's the master driver).
Read the range of tids available for the ETHOFLD functionality if it's
enabled.
New is_ftid() and is_etid() functions to test whether a tid falls within
the range of filter tids or ETHOFLD tids respectively.
asomers [Fri, 13 Dec 2013 22:58:57 +0000 (22:58 +0000)]
sbin/devd/devd.cc
Promoting the SIGINFO handler's log message from LOG_INFO to
LOG_NOTICE, and promoting the "Processing event ..." message from
LOG_DEBUG to LOG_INFO. Setting the logfile to LOG_NOTICE with this
change will have the same result as setting it to LOG_INFO without
this change. Setting it to LOG_INFO with this change will include
the useful "Processing event ..." messages that were previously at
LOG_DEBUG, without including useless messages like "Pushing table".
The intent of this change is that one can log "Processing event ..."
without logging "Pushing table" and related messages that are sent
for every event. The number of lines actually logged is reduced by
about 75% by making this change and setting syslog to LOG_INFO vs
setting syslog to LOG_DEBUG.
etc/syslog.conf
Changing the recommended loglevel to notice instead of info.
asomers [Fri, 13 Dec 2013 21:49:41 +0000 (21:49 +0000)]
sbin/devd/devd.cc
Increase the size of devd's client socket's send buffer from the
default (8k) to 128k. This prevents clients from getting
POLLHUPped during event storms. For example, during zpool creation,
the kernel emits a resource.fs.zfs.statechange event for every vdev
in the pool. A 128k buffer is large enough to hold the statechange
events for a pool with nearly 800 drives.
bjk [Fri, 13 Dec 2013 03:09:29 +0000 (03:09 +0000)]
Apply patch from upstream Heimdal for encoding fix
RFC 4402 specifies the implementation of the gss_pseudo_random()
function for the krb5 mechanism (and the C bindings therein).
The implementation uses a PRF+ function that concatenates the output
of individual krb5 pseudo-random operations produced with a counter
and seed. The original implementation of this function in Heimdal
incorrectly encoded the counter as a little-endian integer, but the
RFC specifies the counter encoding as big-endian. The implementation
initializes the counter to zero, so the first block of output (16 octets,
for the modern AES enctypes 17 and 18) is unchanged. (RFC 4402 specifies
that the counter should begin at 1, but both existing implementations
begin with zero and it looks like the standard will be re-issued, with
test vectors, to begin at zero.)
This is upstream's commit f85652af868e64811f2b32b815d4198e7f9017f6,
from 13 October, 2013:
% Fix krb5's gss_pseudo_random() (n is big-endian)
%
% The first enctype RFC3961 prf output length's bytes are correct because
% the little- and big-endian representations of unsigned zero are the
% same. The second block of output was wrong because the counter was not
% being encoded as big-endian.
%
% This change could break applications. But those applications would not
% have been interoperating with other implementations anyways (in
% particular: MIT's).
Approved by: hrs (mentor, src committer)
MFC after: 3 days
dteske [Thu, 12 Dec 2013 20:47:18 +0000 (20:47 +0000)]
I caught the following snippet at the end of my /var/log/bsdinstall_log:
===
DEBUG: Running installation step: services
local: Not in a function
/usr/libexec/bsdinstall/services: cannot create : Read-only file system
/usr/libexec/bsdinstall/services: /tmp/bsdinstall/etc/rc.conf.services: \
Permission denied
===
The `local: Not in a function' is obvious, and was introduced by myself in
SVN revision 256348.
The latter two are caused by the attempt to use "\" to continue the line
after using the ">>" redirect. This appears to attempt to write a file with
the name " " in the current directory and subsequently attempts to execute
the file that was originally intended for writing (which is not executable;
hence the `Permission denied'). That was introduced in SVN r228192 about
2 years ago, apparently unnoticed until I started going over the debug
outputs very carefully.
bdrewery [Thu, 12 Dec 2013 17:59:09 +0000 (17:59 +0000)]
Fix multi-repository support by properly respecting 'enabled' flag.
This will read the REPOS_DIR env/config setting (default is /etc/pkg
and /usr/local/etc/pkg/repos) and use the last enabled repository.
This can be changed in the environment using a comma-separated list,
or in /usr/local/etc/pkg.conf with JSON array syntax of:
REPOS_DIR: ["/etc/pkg", "/usr/local/etc/pkg/repos"]
mav [Thu, 12 Dec 2013 11:05:48 +0000 (11:05 +0000)]
Fix long known bug with handling device aliases residing not in devfs root.
Historically creation of device aliases created symbolic links using only
name of target device as a link target, not considering current directory.
Fix that by adding number of "../" chunks to the terget device name,
required to get out of the current directory to devfs root first.
asomers [Thu, 12 Dec 2013 00:27:22 +0000 (00:27 +0000)]
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
When a da or ada device dissappears, outstanding IOs fail with
ENXIO, not EIO. The check for EIO was probably copied from Illumos,
where that is indeed the correct errno.
Without this change, pulling a busy drive from a zpool would usually
turn it into UNAVAIL, even though pulling an idle drive would turn
it into REMOVED. With this change, it is REMOVED every time.
Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that
results in devd getting two resource.fs.zfs.removed events. The
comment said that the event had to be sent directly instead of
through the async removal thread because "the DE engine is using
this information to discard prevoius I/O errors". However, the fact
that vdev_geom_io_intr was never actually sending the events until
now, and that vdev_geom_orphan never sent them at all, and that
vdev_geom_orphan usually gets called about 2 seconds after the
actual removal, means that FreeBSD's userland can cope with a late
event just fine.
mav [Wed, 11 Dec 2013 21:48:04 +0000 (21:48 +0000)]
Create own free list for each of the first 32 possible allocation sizes.
In case of 4K allocation quantum that means for allocations up to 128K.
With growth of memory fragmentation these lists may grow to quite a large
sizes (tenths and hundreds of thousands items). Having in one list items
of different sizes in worst case may require full linear list traversal,
that may be very expensive. Having lists for items of single size means
that unless user specify some alignment or border requirements (that are
very rare cases) first item found on the list should satisfy the request.
While running SPEC NFS benchmark on top of ZFS on 24-core machine with
84GB RAM this change reduces CPU time spent in vmem_xalloc() from 8%
and lock congestion spinning around it from 20% to invisible levels.
And that all is by the cost of just 26 more pointers per vmem instance.
If at some point our kernel will start to actively use KVA allocations
with odd sizes above 128K, something may need to be done to bigger lists
also.
jhb [Wed, 11 Dec 2013 21:21:03 +0000 (21:21 +0000)]
- Use <x86/mptable.h> instead of duplicating its definitions.
- Switch to mmaping the table from RAM instead of reading it out of
/dev/mem via read(2).
jhb [Wed, 11 Dec 2013 21:19:04 +0000 (21:19 +0000)]
Use fixed-width types for all fields in MP Table structures and pack
all the structures. While here, move a helper struct only used in
the kernel parser out of this header since it is not part of the MP
specification itself.
gnn [Wed, 11 Dec 2013 17:18:10 +0000 (17:18 +0000)]
Fix a panic when booting with kernels that have FREEBBSD_COMPAT
4, 5, 6 or 43 by only thunking the data parameter for old ioctls
compatability ioctls instead of doing it for all of them.
imp [Wed, 11 Dec 2013 05:32:29 +0000 (05:32 +0000)]
Fix one race and one fence post error. When the TX buffer was
completely full, we'd not complete any of the mbufs due to the fence
post error (this creates a large leak). When this is fixed, we still
leak, but at a much smaller rate due to a race between ateintr and
atestart_locked as well as an asymmetry where atestart_locked is
called from elsewhere. Ensure that we free in-flight packets that
have completed there as well. Also remove needless check for NULL on
mb, checked earlier in the loop and simplify a redundant if.
jmmv [Wed, 11 Dec 2013 04:09:17 +0000 (04:09 +0000)]
Migrate tools/regression/bin/ tests to the new layout.
This change is a proof of concept on how to easily integrate existing
tests from the tools/regression/ hierarchy into the /usr/tests/ test
suite and on how to adapt them to the new layout for src.
To achieve these goals, this change:
- Moves tests from tools/regression/bin/<tool>/ to bin/<tool>/tests/.
- Renames the previous regress.sh files to legacy_test.sh.
- Adds Makefiles to build and install the tests and all their supporting
data files into /usr/tests/bin/.
- Plugs the legacy_test test programs into the test suite using the new
TAP backend for Kyua (appearing in 0.8) so that the code of the test
programs does not have to change.
- Registers the new directories in the BSD.test.dist mtree file.
jmmv [Wed, 11 Dec 2013 03:41:07 +0000 (03:41 +0000)]
Make bsd.progs.mk work in directories with SCRIPTS but no PROGS.
This change fixes some cases where bsd.progs.mk would fail to handle
directories with SCRIPTS but no PROGS. In particular, "install" did
not handle such scripts nor dependent files when bsd.subdir.mk was
added to the mix.
jmmv [Wed, 11 Dec 2013 03:39:50 +0000 (03:39 +0000)]
Add tap.test.mk.
This file provides support to build test programs that comply with the
Test Anything Protocol. Its main goal is to support the painless
integration of existing tests from tools/regression/ into the Kyua-based
test suite.
neel [Tue, 10 Dec 2013 22:56:51 +0000 (22:56 +0000)]
Fix x2apic support in bhyve.
When the guest is bringing up the APs in the x2APIC mode a write to the
ICR register will now trigger a return to userspace with an exitcode of
VM_EXITCODE_SPINUP_AP. This gets SMP guests working again with x2APIC.
Change the vlapic timer lock to be a spinlock because the vlapic can be
accessed from within a critical section (vm run loop) when guest is using
x2apic mode.
kib [Tue, 10 Dec 2013 22:33:02 +0000 (22:33 +0000)]
The opt_*.h headers must be included before any system header, except
sys/cdefs.h. In particular, in case of COMPAT_43, param.h includes
sys/types.h, which includes sys/select.h, which includes
sys/_sigset.h. The _sigset.h customizes the provided definions based
on COMPAT_43, eliminating osigset_t if symbol is not defined. The
sys/proc.h is included after opt_compat.h and needs osigset_t.
kib [Tue, 10 Dec 2013 21:15:18 +0000 (21:15 +0000)]
Fix detection of EOF in kern_physio(). If bio_length was clipped by
the excess code in g_io_check(), bio_resid is also truncated by
g_io_deliver(). As result, bufdonebio() assigns truncated value to
the buffer b_resid field.
Use the residual bio_completed to calculate buffer b_resid from
b_bcount in bufdonebio(), instead of bio_resid, calculated from
bio_length in g_io_deliver().
The issue is seemingly caused by the code rearrange into g_io_check(),
which is not present in stable/10. The change still looks as the
useful change to have in 10 nevertheless.
Reported by: Stefan Hegnauer <stefan.hegnauer@gmx.ch>
Tested by: pho, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Tue, 10 Dec 2013 20:52:31 +0000 (20:52 +0000)]
Only assert the length of the passed bio in the mdstart_vnode() when
the bio is unmapped, so we must map the bio pages into pbuf. This
works around the geom classes which do not follow the MAXPHYS limit on
the i/o size, since such classes do not know about unmapped bios
either.
Reported by: Paolo Pinto <paolo.pinto@netasq.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
mav [Tue, 10 Dec 2013 20:25:43 +0000 (20:25 +0000)]
Do not DELAY() for P-state transition unless we want to see the result.
Intel manual says: "If a transition is already in progress, transition to
a new value will subsequently take effect. Reads of IA32_PERF_CTL determine
the last targeted operating point." So seems it should be fine to just
trigger wanted transition and go. Linux does the same.
trociny [Tue, 10 Dec 2013 20:05:07 +0000 (20:05 +0000)]
In remote_send_thread, if sending a request fails don't take the
request back from the receive queue -- it might already be processed
by remote_recv_thread, which lead to crashes like below:
(primary) Unable to receive reply header: Connection reset by peer.
(primary) Unable to send request (Connection reset by peer):
WRITE(954662912, 131072).
(primary) Disconnected from kopusha:7772.
(primary) Increasing localcnt to 1.
(primary) Assertion failed: (old > 0), function refcnt_release,
file refcnt.h, line 62.
Taking the request back was not necessary (it would properly be
processed by the remote_recv_thread) and only complicated things.
trociny [Tue, 10 Dec 2013 19:56:26 +0000 (19:56 +0000)]
For memsync replication, hio_countdown is used not only as an
indication when a request can be moved to done queue, but also for
detecting the current state of memsync request.
This approach has problems, e.g. leaking a request if memsynk ack from
the secondary failed, or racy usage of write_complete, which should be
called only once per write request, but for memsync can be entered by
local_send_thread and ggate_send_thread simultaneously.
So the following approach is implemented instead:
1) Use hio_countdown only for counting components we waiting to
complete, i.e. initially it is always 2 for any replication mode.
2) To distinguish between "memsync ack" and "memsync fin" responses
from the secondary, add and use hio_memsyncacked field.
3) write_complete() in component threads is called only before
releasing hio_countdown (i.e. before the hio may be returned to the
done queue).
4) Add and use hio_writecount refcounter to detect when
write_complete() can be called in memsync case.
Reported by: Pete French petefrench ingresso.co.uk
Tested by: Pete French petefrench ingresso.co.uk
MFC after: 2 weeks
trasz [Tue, 10 Dec 2013 17:27:11 +0000 (17:27 +0000)]
Fix handling for empty auth-groups. Without it, ctld child process
would either exit on assertion, or, if assertions are not enabled,
fail to authenticate the target.
MFC after: 2 days
Sponsored by: The FreeBSD Foundation
mav [Tue, 10 Dec 2013 12:36:44 +0000 (12:36 +0000)]
Don't even try to read vdev labels from devices smaller then SPA_MINDEVSIZE
(64MB). Even if we would find one somehow, ZFS kernel code rejects such
devices. It is funny to look on attempts to read 4 256K vdev labels from
1.44MB floppy, though it is not very practical and quite slow.
dteske [Mon, 9 Dec 2013 23:58:26 +0000 (23:58 +0000)]
Fix a regression introduced by SVN r257842; resulting in mountroot prompt
after attempting to install to encrypted ZFS root (caused by a typo in a
variable name -- ZFSBOOT_BOOT_FSNAME -> ZFSBOOT_BOOTFS_NAME).
dteske [Mon, 9 Dec 2013 22:58:26 +0000 (22:58 +0000)]
Fix a regression introduced by SVN r257842. Result was that after
successfully installing to encrypted ZFS root, the passphrase is
not accepted and a message about "incorrect key" is displayed.