Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
https://www.illumos.org/issues/7019
Currently zfsdev_ioctl, when confronted by a request with the FKIOCTL flag set,
skips all processing of secpolicy functions. This means that ZFS is not doing
any kind of verification of the credentials or access rights of the caller and
assuming that (as it is an in-kernel client) all such checks have already been
done.
This turns out to be quite a dangerous assumption, especially with respect to
sdev. In general I don't think it's particularly reasonable to offload this
enforcement of access rights onto other kernel subsystems when ZFS has some
particular local semantics in this area (delegated datasets etc) and does not
provide any kind of API to allow other subsystems to avoid code duplication
when doing it. ZFS should apply its normal access policy to requests from
within the kernel, and callers should take care to give it the correct
credentials and call it from the correct context in order to get the results
they need.
You can observe the currently unfortunate consequences of this bug in any non-
global zone that has access to /dev/zvol or any subset of it via sdev profiles.
In particular, a zone used to contain a KVM or similar which has a single zvol
passed through to it using a <device match= block in its zone XML.
Even though sdev makes something of an attempt to control for whether the
caller should have access to nodes in /dev/zvol, it doesn't do this correctly,
or really at all in the lookup call path. So, if we have a zone that's been
given access to any part of /dev/zvol, it can simply look up the full path to
any other zvol on the entire system, and the node will appear and be able to be
used.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alex Wilson <alex.wilson@joyent.com>
https://www.illumos.org/issues/6922
ZFS does not do a config_sync after removing an aux (spare, log, or cache)
device. AFAICT this isn't being done because it is slow and was deemed
unnecessary. However, it should be such a rare operation that speed doesn't
matter, and not doing it results in two problems:
1) It is theoretically possible to remove an aux device from one pool and
attach it to another, then lose power. When power is restored, both pools woul
d
think that they own the aux device.
2) Removal of the aux device doesn't send any useful sysevents to userland.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Alan Somers <asomers@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
https://www.illumos.org/issues/6876
Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking for
trouble. We should check every dataset on import, using a 1024 byte buffer and
checking each time to see if the dataset's new name is longer than 256 bytes.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Paul Dagnelie <pcd@delphix.com>
sephe [Tue, 11 Oct 2016 08:22:17 +0000 (08:22 +0000)]
MFC 302816-302818
302816
hyperv/vmbus: Release vmbus channel lock before detach devices
Device detach method may sleep.
While I'm here, rename the function, fix indentation and function
comment.
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D7110
302817
hyperv/vmbus: Field renaming to reflect reality
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D7111
302818
hyperv/vmbus: Fix the racy channel close.
It is not safe to iterate the sub-channel list w/o lock on the
close path, while it's even more difficult to hold the lock
and iterate the sub-channel list. We leverage the
vmbua_{get,rel}_subchan() functions to solve this dilemma.
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D7112
sephe [Tue, 11 Oct 2016 06:46:24 +0000 (06:46 +0000)]
MFC 302632-302634
302632
hyperv/vmbus: More verbose for GPADL_connect/chan_{rescind,offer}
Reviewed by: Dexuan Cui <decui microsoft com>, Hongjiang Zhang <honzhan microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6976
302633
hyperv/vmbus: Free sysctl properly upon channel close.
Prepare for sub-channel re-open.
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6977
For multi-channel devices, once the primary channel is closed,
a set of 'rescind' messages for sub-channels will be delivered
by Hypervisor. Sub-channel MUST be freed according to these
'rescind' messages; directly re-openning sub-channels in the
same fashion as the primary channel's re-opening does NOT work
at all.
After the primary channel is re-opened, requested # of sub-
channels will be delivered though 'channel offer' messages, and
this set of newly offered channels can be opened along side with
the primary channel.
This unbreaks the MTU setting for hn(4), which requires re-
openning all existsing channels upon MTU change.
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6978
sephe [Tue, 11 Oct 2016 02:25:59 +0000 (02:25 +0000)]
MFC 306480
linuxkpi: Fix PCI BAR lazy allocation support.
FreeBSD supports lazy allocation of PCI BAR, that is, when a device
driver's attach method is invoked, even if the device's PCI BAR
address wasn't initialized, the invocation of bus_alloc_resource_any()
(the call chain: pci_alloc_resource() -> pci_alloc_multi_resource() ->
pci_reserve_map() -> pci_write_bar()) would allocate a proper address
for the PCI BAR and write this 'lazy allocated' address into the PCI
BAR.
This model works fine for native FreeBSD device drivers, but _not_ for
device drivers shared with Linux (e.g. dev/mlx5/mlx5_core/mlx5_main.c
and ofed/drivers/net/mlx4/main.c. Both of them use
pci_request_regions(), which doesn't work properly with the PCI BAR
lazy allocation, because pci_resource_type() -> _pci_get_rle() always
returns NULL, so pci_request_regions() doesn't have the opportunity to
invoke bus_alloc_resource_any(). We now use pci_find_bar() in
pci_resource_type(), which is able to locate all available PCI BARs
even if some of them will be lazy allocated.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: hps
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8071
sevan [Mon, 10 Oct 2016 16:11:51 +0000 (16:11 +0000)]
MFC r306598
ccdconfig first appeared in NetBSD 1.1
From NetBSD man page, confirmed with repo tags in CVS [1]
(there was also no 1.0a release according to [2])
sevan [Mon, 10 Oct 2016 15:44:42 +0000 (15:44 +0000)]
MFC r306584:
Move the description of CHANGER variable to ENVIRONMENT section rather than
in the DESCRIPTION section.
From OpenBSD src/bin/chio/chio.1 r1.23
hselasky [Mon, 10 Oct 2016 11:34:51 +0000 (11:34 +0000)]
MFC r306451:
The IORESOURCE_XXX defines should resemble a bitmask while SYS_RES_XXX
are not bitmasks. Fix return value of pci_resource_flags() to reflect
this change.
hselasky [Mon, 10 Oct 2016 11:25:11 +0000 (11:25 +0000)]
MFC r306441 and r306634:
While draining a timeout task prevent the taskqueue_enqueue_timeout()
function from restarting the timer.
Commonly taskqueue_enqueue_timeout() is called from within the task
function itself without any checks for teardown. Then it can happen
the timer stays active after the return of taskqueue_drain_timeout(),
because the timeout and task is drained separately.
This patch factors out the teardown flag into the timeout task itself,
allowing existing code to stay as-is instead of applying a teardown
flag to each and every of the timeout task consumers.
Add assert to taskqueue_drain_timeout() which prevents parallel
execution on the same timeout task.
Update manual page documenting the return value of
taskqueue_enqueue_timeout().
julian [Mon, 10 Oct 2016 04:57:33 +0000 (04:57 +0000)]
While the thread is sleeping in taskqueue_drain_all() it is
posible that the queue entry it is looking at is removed
from the queue, but we make no effort to account
for this. when we wake up we need to check it's still there.
jch [Sun, 9 Oct 2016 21:35:12 +0000 (21:35 +0000)]
MFC r306443:
Fix an issue with accept_filter introduced with r261242:
As a side effect of r261242 when using accept_filter the
first call to soisconnected() is done earlier in tcp_input()
instead of tcp_do_segment() context. Restore the expected behaviour.
Note: This call to soisconnected() seems to be extraneous in all
cases (with or without accept_filter). Will be addressed in a
separate commit.
julian [Fri, 7 Oct 2016 19:28:45 +0000 (19:28 +0000)]
MFH: r259647
o Remove assertions on ipa_version as sometimes the version detection
using cpuid can be quirky (this is the case of VMWare without the
vPMC support) but fail to probe hwpmc.
o Apply the fix for XEON family of processors as established by
315338-020 document (bug AJ85).
emaste [Fri, 7 Oct 2016 14:46:34 +0000 (14:46 +0000)]
MFC r299199: Add nid_namelen bounds check to nfssvc system call
This is only allowed by root and only used by the nfs daemon, which
should not provide an incorrect value. However, it's still good
practice to validate data provided by userland.
jtl [Fri, 7 Oct 2016 10:47:32 +0000 (10:47 +0000)]
MFC r296454:
Some cleanup in tcp_respond() in preparation for another change:
- Reorder variables by size
- Move initializer closer to where it is used
- Remove unneeded variable
MFC r296455:
As reported on the transport@ and current@ mailing lists, the FreeBSD TCP
stack is not compliant with RFC 7323, which requires that TCP stacks send
a timestamp option on all packets (except, optionally, RSTs) after the
session is established.
This patch adds that support. It also adds a TCP signature option to the
packet, if appropriate.
MFC r300764 (by jhb@):
Don't reuse the source mbuf in tcp_respond() if it is not writable.
Not all mbufs passed up from device drivers are M_WRITABLE(). In
particular, the Chelsio T4/T5 driver uses a feature called "buffer
packing" to receive multiple frames in a single receive buffer. The mbufs
to receive multiple frames in a single receive buffer. The mbufs for
these frames all share the same external storage so are treated as
read-only by the rest of the stack when multiple frames are in flight.
Previously tcp_respond() would blindly overwrite read-only mbufs when
INVARIANTS was disabled or panic with an assertion failure if INVARIANTS
was enabled. Note that the new case is a bit of a mix of the two other
cases in tcp_respond(). The TCP and IP headers must be copied explicitly
into the new mbuf instead of being inherited (similar to the m == NULL
case), but the addresses and ports must be swapped in the reply (similar
to the m != NULL case).
jhb [Thu, 6 Oct 2016 19:41:09 +0000 (19:41 +0000)]
MFC 299458: Fix buffer overrun in gcore(1) NT_PRPSINFO
Use size of destination buffer, rather than a constant that may or may not
correspond to the source buffer, to restrict the length of copied strings. In
particular, pr_fname has 16+1 characters but MAXCOMLEN is 18+1.
Use strlcpy instead of strncpy to ensure the result is nul-terminated. This
seems to be what is expected of these fields.
emaste [Wed, 5 Oct 2016 00:33:06 +0000 (00:33 +0000)]
MFC r306417: portsnap: only move expected snapshot contents from snap/ to files/
Previously it was possible to smuggle in addional files that would
be used by later portsnap runs. Now we only move those files expected
to be in the snapshot into files/ and require that there are no
unexpected files.
This was used by portsnap attacks 2, 3, and 4 in the "non-cryptanalytic
attacks against FreeBSD update components" anonymous gist.
rmacklem [Mon, 3 Oct 2016 22:11:45 +0000 (22:11 +0000)]
MFC: r304026
Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.
Subsequent commits will update nfsstat(8) to use the new fields.
asomers [Mon, 3 Oct 2016 15:17:22 +0000 (15:17 +0000)]
MFC r306048
Fix periodic scripts when an NFS mount covers a local mount
100.chksetuid and 110.neggrpperm try to search through all UFS and ZFS
filesystems. But their logic contains an error. They also search through
remote filesystems that are mounted on top of the root of a local
filesystem. For example, if a user installs a FreeBSD system with the
default ZFS layout, he'll get a zroot/usr/home filesystem. If he then mounts
/usr/home over NFS, these scripts would search through /usr/home.
Fragmented UDP and ICMP packets were corrupted if a firewall with reassembling
feature (like pf'scrub) is enabled on the bridge. This patch fixes corrupted
packet problem and the panic (triggered easly with low RAM) as explain in PR
185633.
bridge_pfil and bridge_fragment relationship:
bridge_pfil() receive (IN direction) packets and sent it to the firewall The
firewall can be configured for reassembling fragmented packet (like pf'scrubing)
in one mbuf chain when bridge_pfil() need to send this reassembled packet to the
outgoing interface, it needs to re-fragment it by using bridge_fragment()
bridge_fragment() had to split this mbuf (using ip_fragment) first then
had to M_PREPEND each packet in the mbuf chain for adding Ethernet
header.
But M_PREPEND can sometime create a new mbuf on the begining of the mbuf chain,
then the "main" pointer of this mbuf chain should be updated and this case is
tottaly forgotten. The original bridge_fragment code (Revision 158140,
2006 April 29) came from OpenBSD, and the call to bridge_enqueue was
embedded. But on FreeBSD, bridge_enqueue() is done after bridge_fragment(),
then the original OpenBSD code can't work as-it of FreeBSD.
MFC 305034: Implement 'devctl clear driver' to undo a previous 'set driver'.
Add a new 'clear driver' command for devctl along with the accompanying
ioctl and devctl_clear_driver() library routine to reset a device to
use a wildcard devclass instead of a fixed devclass. This can be used
to undo a previous 'set driver' command. After the device's name has
been reset to permit wildcard names, it is reprobed so that it can
attach to newly-available (to it) device drivers.
MFC 305502: Reset PCI pass through devices via PCI-e FLR during VM start/end.
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
MFC 304858,305485: Fix various issues with PCI pass through and VT-d.
304858:
Enable I/O MMU when PCI pass through is first used.
Rather than enabling the I/O MMU when the vmm module is loaded,
defer initialization until the first attempt to pass a PCI device
through to a guest. If the I/O MMU fails to initialize or is not
present, than fail the attempt to pass a PCI device through to a
guest.
The hw.vmm.force_iommu tunable has been removed since the I/O MMU is
no longer enabled during boot. However, the I/O MMU support can be
disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent
use of the I/O MMU on any systems where it is buggy.
305485:
Leave ppt devices in the host domain when they are not attached to a VM.
This allows a pass through device to be reset to a normal device driver
on the host and reused on the host. ppt devices are now always active in
some I/O MMU domain when the I/O MMU is active, either the host domain
or the domain of a VM they are attached to.
MFC 303881: Reliably return PCI_GETCONF_LAST_DEVICE from PCIOCGETCONF.
Previously the loop in PCIIOCGETCONF would terminate as soon as it
found enough matches. Now it will continue iterating through the
PCI device list and only terminate if it finds another matching device
for which it has no room to store a conf structure. This means that
PCI_GETCONF_LAST_DEVICE is reliably returned when the number of
matching devices is equal to the number of slots in the matches
buffer. For example, if a program requests the conf structure for a
single PCI function with a specified domain/bus/slot/function it will
now get PCI_GETCONF_LAST_DEVICE instead of PCI_GETCONF_MORE_DEVS.
While here, simplify the loop conditional a bit more by explicitly
breaking out of the loop if copyout() fails and removing a redundant
i < pci_numdevs check.
MFC 303887: Add a dmardump utility to dump the VT-d context tables.
This tool parses the ACPI DMAR table looking for DMA remapping devices.
For each device it walks the root table and any context tables
referenced to display mapping info for PCI devices.
Note that acpidump -t already parses the info in the ACPI DMAR tables
directly. This tool examines some of the data structures the DMAR
remapping engines use to translate DMA requests.
- Add constants for the fields in the root-entry table address register,
namely the root type type (RTT) and root table address (RTA) mask.
- Add macros for the bitmask of the domain ID field in the second word
of context table entries as well as a helper macro (DMAR_CTX2_GET_DID)
to extract the domain ID from a context table entry.
1) Microoptimize %p case.
2) Implememt %u for GNU compatibility.
3) Don't forget to advance buf for %w/%u.
4) Fail with incomplete week (week 0) request and no such week in the
year.
5) Fix yday formula when Sunday requested and the week started from Monday.
6) Fail with impossible yday for incomplete week (week 0) and direct %w/%u
request.
7) Shift yday/wday to the first day of the year, if incomplete week
(week 0) requested and no %w/%u used.
8) For already non-standard %z extension implement GNU compatible formats:
+hh and -hh.
9) Check for incorrect values for %z.
mm [Sun, 25 Sep 2016 22:04:02 +0000 (22:04 +0000)]
MFC r305819:
Sync libarchive with vendor including important security fixes.
Issues fixed (FreeBSD):
PR #778: ACL error handling
Issue #745: Symlink check prefix optimization is too aggressive
Issue #746: Hard links with data can evade sandboxing restrictions
This update fixes the vulnerability #3 and vulnerability #4 as reported in
"non-cryptanalytic attacks against FreeBSD update components".
https://gist.github.com/anonymous/e48209b03f1dd9625a992717e7b89c4f
Fix for vulnerability #2 has already been merged in r305192.
When mlx5e_sq_xmit() returns an error code and the mbuf pointer is set,
we should not free the mbuf, because the caller will keep the mbuf in
the drbr. Make sure the mbuf pointer is correctly set upon function
exit.
MFC r305874:
mlx5en: Allow setting the software MTU size below 1500 bytes
The hardware MTU size can't be set to a value less than 1500 bytes due
to side-band management support. Allow setting the software MTU size
below 1500 bytes, thus creating a mismatch between hardware and
software MTU sizes.