lidl [Fri, 7 Oct 2016 02:33:45 +0000 (02:33 +0000)]
MFC r306508: Fix blacklistd's state restoral at startup
The blacklistd daemon attempted to restore the filtering rules
before the database of blocked addresses was opened, so no rules
were being reloaded. Now the rules are properly recreated when the
daemon is started with '-r'.
This bug was fixed locally, and then sent upstream to NetBSD.
This changeset is the import the NetBSD version of the change,
which added debugging output to alert about a null database.
In icmp6_reflect() use original source address of erroneous packet as
destination address for source selection algorithm when original
destination address is not one of our own.
Reported by: Mark Kamichoff <prox at prolixium com>
jhb [Tue, 4 Oct 2016 22:32:43 +0000 (22:32 +0000)]
MFC 304482: Adjust t4_port_init() to work with VF devices.
Specifically, the FW_PORT_CMD may or may not work for a VF (the PF
driver can choose whether or not to permit access to this command),
so don't attempt to fetch port information on a VF if permission is
denied by the PF.
jhb [Tue, 4 Oct 2016 22:15:42 +0000 (22:15 +0000)]
MFC 305548: Don't break out of the m_advance() loop if len drops to zero.
If a packet contains the Ethernet header (14 bytes) in the first mbuf
and the payload (IP + UDP + data) in the second mbuf, then the attempt
to fetch the l3hdr will return a NULL pointer. The first loop iteration
will drop len to zero and exit the loop without setting 'p'. However,
the desired data is at the start of the second mbuf, so the correct
behavior is to loop around and let the conditional set 'p' to m_data of
the next mbuf (and leave offset as 0).
glebius [Tue, 4 Oct 2016 20:26:18 +0000 (20:26 +0000)]
Merge r306212:
Fix regression from r297400, which truncates headers in case of low socket
buffer and put a small optimization for low socket buffer case:
- Do not hack uio_resid, and let m_uiotombuf() properly take care of it. This
fixes truncation of headers at low buffer.
- If headers ate all the space, jump right to the end of the cycle, to
avoid doing single page I/O and allocating zero length mbuf.
- Clear hdr_uio only if space is positive, which indicates that all uio
was copied in.
303522:
Various fixes to the t4/5nex character device.
- Remove null open/close methods.
- Don't set d_flags to 0 explicitly.
- Remove t5_cdevsw as the .d_name member isn't really used and doesn't
warrant a separate cdevsw just for the name.
- Use ENOTTY as the error value for an unknown ioctl request.
- Use make_dev_s() to close race with setting si_drv1.
303647:
Store the offset of the KDOORBELL and GTS registers in the softc.
VF devices use a different register layout than PF devices. Storing
the offset in a value in the softc allows code to be shared between the
PF and VF drivers.
303860:
Reserve an adapter flag IS_VF to mark VF devices vs PF devices.
303880:
Track the base absolute ID of ingress and egress queues.
Use this to map an absolute queue ID to a logical queue ID in interrupt
handlers. For the regular cxgbe/cxl drivers this should be a no-op as
the base absolute ID should be zero. VF devices have a non-zero base
absolute ID and require this change. While here, export the absolute ID
of egress queues via a sysctl.
304168:
Make SGE parameter handling more VF-friendly.
Add fields to hold the SGE control register and free list buffer sizes to
the sge_params structure. Populate these new fields in
t4_init_sge_params() for PF devices and change t4_read_chip_settings() to
pull these values out of the params structure instead of reading
registers directly. This will permit t4_read_chip_settings() to be reused
for VF devices which cannot read SGE registers directly.
While here, move the call to t4_init_sge_params() to
get_params__post_init(). The VF driver will populate the SGE parameters
structure via a different method before calling t4_read_chip_settings().
304169:
Update mailbox writes to work with VF devices.
- Use alternate register locations for the data and control registers for
VFs.
- Do a dummy read to force the writes to the mailbox data registers to
post before the write to the control register on VFs.
- Do not check the PCI-e firmware register for errors on VFs.
304170:
Add support for register dumps on VF devices.
- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs().
- While here, use t4_get_regs_len() in the ioctl handler for regdump
instead of inlining it.
304479:
Add structures for VF-specific adapter parameters.
While here, mark which parameters are PF-specific and which are
VF-specific.
304485:
Reorder sysctls so that nodes shared with the VF driver are added first.
This permits a single early return for VF devices in the routines that
add sysctl nodes.
305549:
Chelsio T4/T5 VF driver.
The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.
Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.
t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.
t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.
VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.
Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.
Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.
jhb [Mon, 3 Oct 2016 23:15:44 +0000 (23:15 +0000)]
MFC 303405: Add support for zero-copy aio_write() on TOE sockets.
AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer. This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer. The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.
Because these mbufs do not have an associated virtual address, m_data
is not valid. Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().
An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job). The
special mbufs reference this structure via m_ext. Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size). The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer. The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.
Zero-copy aio_write()'s for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.
jhb [Mon, 3 Oct 2016 22:42:23 +0000 (22:42 +0000)]
MFC 303205,303722,305032,305752: Create VF devices on Chelsio T4/T5 NICs.
303205:
Add a driver to create VF devices on Chelsio T4/T5 NICs.
Chelsio NICs are a bit unique compared to some other NICs in that they
expose different functionality on different physical functions. In
particular, PF4 is used to manage the NIC interfaces ('t4nex' and 't5nex').
However, PF4 is not able to create VF devices. Instead, VFs are only
supported by physical functions 0 through 3. This commit adds 't4iov'
and 't5iov' drivers that attach to PF0-3.
One extra wrinkle is that the iov devices cannot enable SR-IOV until the
firwmare has been initialized by the main PF4 driver. To handle this
case, a new t4_if kobj interface has been added to permit cross-calls
between the PF drivers. The PF4 driver notifies sibling drivers when it
is fully attached. It also requests sibling drivers to detach before it
detaches. Sibling drivers query the PF4 driver during their attach
routine to see if it is attached. If not, the sibling drivers defer
their attach actions until the PF4 driver informs them it is attached.
VF devices are associated with a single port on the NIC. VF devices
created from PF0 are associated with the first port on the NIC, VFs
from PF1 are associated with the second port, etc. VF devices can
only be created from a PF device that has an associated port. Thus,
on a 2-port card, VFs are only supported on PF0 and PF1.
303722:
Use the port device name for the iov device for Chelsio T4/T5 cards.
Chelsio T4/T5 adapters are multifunction cards. The main driver uses
physical function 4 (PF4). However, VF devices for SR-IOV are only
supported on physical functions 0 through 3, where PF0 creates VFs tied
to port 0, etc. The t4iov/t5iov driver was previously added to
create VF devices for ports that are present on each adapter. This
change uses the recently added pci_iov_attach_name() function to
name the character device in /dev/iov after the associated port on
the card (e.g. /dev/iov/cxl0 is used to create VFs that share the
cxl0 port). With this in place, mark the t4iov/t5iov devices quiet
to prevent them from cluttering dmesg.
305032:
Use device_verbose() to undo device_quiet() when detaching from t[45]iovX.
The device quiet flag is not automatically reset on detach, so it is
inherited by other device drivers (e.g. when switching a device driver
over to ppt for PCI pass through). Cope with this behavior by explicitly
marking the device verbose during detach so that the next driver can make
its own decision.
305752:
Remove explicit device_verbose() from the t4iov driver detach routine
now that this case is handled generically.
asomers [Mon, 3 Oct 2016 14:59:32 +0000 (14:59 +0000)]
MFC r306048
Fix periodic scripts when an NFS mount covers a local mount
100.chksetuid and 110.neggrpperm try to search through all UFS and ZFS
filesystems. But their logic contains an error. They also search through
remote filesystems that are mounted on top of the root of a local
filesystem. For example, if a user installs a FreeBSD system with the
default ZFS layout, he'll get a zroot/usr/home filesystem. If he then mounts
/usr/home over NFS, these scripts would search through /usr/home.
rmacklem [Mon, 3 Oct 2016 12:02:45 +0000 (12:02 +0000)]
MFC: r304058, r304066, r304194
Update nfsstat.c to use the new kernel nfsstat structure and
add the new "-d" flag from D1626.
The man page will be updated in a subsequent commit.
kib [Mon, 3 Oct 2016 09:41:33 +0000 (09:41 +0000)]
MFC r306350:
For machines which support PCID but not have INVPCID instruction,
i.e. SandyBridge and IvyBridge, correct a race between pmap_activate()
and invltlb_pcid_handler().
rmacklem [Mon, 3 Oct 2016 00:10:14 +0000 (00:10 +0000)]
MFC: r304026
Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.
Subsequent commits will update nfsstat(8) to use the new fields.
Fragmented UDP and ICMP packets were corrupted if a firewall with reassembling
feature (like pf'scrub) is enabled on the bridge. This patch fixes corrupted
packet problem and the panic (triggered easly with low RAM) as explain in PR
185633.
bridge_pfil and bridge_fragment relationship:
bridge_pfil() receive (IN direction) packets and sent it to the firewall The
firewall can be configured for reassembling fragmented packet (like pf'scrubing)
in one mbuf chain when bridge_pfil() need to send this reassembled packet to the
outgoing interface, it needs to re-fragment it by using bridge_fragment()
bridge_fragment() had to split this mbuf (using ip_fragment) first then
had to M_PREPEND each packet in the mbuf chain for adding Ethernet
header.
But M_PREPEND can sometime create a new mbuf on the begining of the mbuf chain,
then the "main" pointer of this mbuf chain should be updated and this case is
tottaly forgotten. The original bridge_fragment code (Revision 158140,
2006 April 29) came from OpenBSD, and the call to bridge_enqueue was
embedded. But on FreeBSD, bridge_enqueue() is done after bridge_fragment(),
then the original OpenBSD code can't work as-it of FreeBSD.
alc [Sat, 1 Oct 2016 19:30:28 +0000 (19:30 +0000)]
MFC r305213,305319,305398
As an optimization to the machine-independent layer, change the machine-
dependent pmap_ts_referenced() so that it updates the page's dirty field
if a modified bit is found while counting reference bits. This
opportunistic update can be performed at low cost and can eliminate the
need for some future calls to pmap_is_modified() by the machine-
independent layer.
Replace the number 4 in sparc64's pmap_ts_referenced() by
PMAP_TS_REFERENCED_MAX, like we've done elsewhere, e.g., amd64.
MFC 305751: Make device_quiet() an attachment property.
In particular, reset the DF_QUIET flag when detaching from a device so
that a driver that marks a device quiet doesn't dictate policy for a
different driver that may claim the device in the future.
MFC 305034: Implement 'devctl clear driver' to undo a previous 'set driver'.
Add a new 'clear driver' command for devctl along with the accompanying
ioctl and devctl_clear_driver() library routine to reset a device to
use a wildcard devclass instead of a fixed devclass. This can be used
to undo a previous 'set driver' command. After the device's name has
been reset to permit wildcard names, it is reprobed so that it can
attach to newly-available (to it) device drivers.
MFC 305502: Reset PCI pass through devices via PCI-e FLR during VM start/end.
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
MFC r303019:
Use g_resize_provider() to change the size of GEOM_DISK provider,
when it is being opened. This should fix the possible loss of a resize
event when disk capacity changed.
MFC r303288:
Do not invoke resize method if geom is being withered.
MFC r303637:
Do not invoke resize event if initial disk size is zero. Some disks
report the size only after first opening. And due to the events are
asynchronous, some consumers can receive this event too late and
this confuses them. This partially restores previous behaviour, and
at the same time this should fix the problem, when already opened
provider loses resize event.
MFC 304858,305485,305497: Fix various issues with PCI pass through and VT-d.
304858:
Enable I/O MMU when PCI pass through is first used.
Rather than enabling the I/O MMU when the vmm module is loaded,
defer initialization until the first attempt to pass a PCI device
through to a guest. If the I/O MMU fails to initialize or is not
present, than fail the attempt to pass a PCI device through to a
guest.
The hw.vmm.force_iommu tunable has been removed since the I/O MMU is
no longer enabled during boot. However, the I/O MMU support can be
disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent
use of the I/O MMU on any systems where it is buggy.
305485:
Leave ppt devices in the host domain when they are not attached to a VM.
This allows a pass through device to be reset to a normal device driver
on the host and reused on the host. ppt devices are now always active in
some I/O MMU domain when the I/O MMU is active, either the host domain
or the domain of a VM they are attached to.
305497:
Update the I/O MMU in bhyve when PCI devices are added and removed.
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
MFC 303881: Reliably return PCI_GETCONF_LAST_DEVICE from PCIOCGETCONF.
Previously the loop in PCIIOCGETCONF would terminate as soon as it
found enough matches. Now it will continue iterating through the
PCI device list and only terminate if it finds another matching device
for which it has no room to store a conf structure. This means that
PCI_GETCONF_LAST_DEVICE is reliably returned when the number of
matching devices is equal to the number of slots in the matches
buffer. For example, if a program requests the conf structure for a
single PCI function with a specified domain/bus/slot/function it will
now get PCI_GETCONF_LAST_DEVICE instead of PCI_GETCONF_MORE_DEVS.
While here, simplify the loop conditional a bit more by explicitly
breaking out of the loop if copyout() fails and removing a redundant
i < pci_numdevs check.
MFC 303887: Add a dmardump utility to dump the VT-d context tables.
This tool parses the ACPI DMAR table looking for DMA remapping devices.
For each device it walks the root table and any context tables
referenced to display mapping info for PCI devices.
Note that acpidump -t already parses the info in the ACPI DMAR tables
directly. This tool examines some of the data structures the DMAR
remapping engines use to translate DMA requests.
- Add constants for the fields in the root-entry table address register,
namely the root type type (RTT) and root table address (RTA) mask.
- Add macros for the bitmask of the domain ID field in the second word
of context table entries as well as a helper macro (DMAR_CTX2_GET_DID)
to extract the domain ID from a context table entry.
MFC 303204: Install a handler for firmware work request error messages.
If a driver sends an malformed or disallowed work request, the firmware
responds with a work request error. Previously the driver treated this is
as an unexpected message and panicked. Now it decodes the error message
to aid in debugging.
MFC 303721: Permit the name of the /dev/iov entry to be set by the driver.
The PCI_IOV option creates character devices in /dev/iov for each PF
device driver that registers support for creating VFs. By default the
character device is named after the PF device (e.g. /dev/iov/foo0).
This change adds a variant of pci_iov_attach() called pci_iov_attach_name()
that allows the name of the /dev/iov entry to be specified by the
driver.
To preserve the ABI, this version does not modify the existing
PCI_IOV_ATTACH kobj method as was done in HEAD. Instead, a new
PCI_IOV_ATTACH_NAME method has been added that accepts the name as an
additional parameter. The PCI bus driver now provides an implementation
of PCI_IOV_ATTACH_NAME. A default implementation of PCI_IOV_ATTACH is
provided that calls PCI_IOV_ATTACH_NAME passing 'device_get_nameunit(dev)'
as the name.
MFC r306417: portsnap: only move expected snapshot contents from snap/ to files/
Previously it was possible to smuggle in addional files that would
be used by later portsnap runs. Now we only move those files expected
to be in the snapshot into files/ and require that there are no
unexpected files.
This was used by portsnap attacks 2, 3, and 4 in the "non-cryptanalytic
attacks against FreeBSD update components" anonymous gist.
1) Microoptimize %p case.
2) Implememt %u for GNU compatibility.
3) Don't forget to advance buf for %w/%u.
4) Fail with incomplete week (week 0) request and no such week in the
year.
5) Fix yday formula when Sunday requested and the week started from Monday.
6) Fail with impossible yday for incomplete week (week 0) and direct %w/%u
request.
7) Shift yday/wday to the first day of the year, if incomplete week
(week 0) requested and no %w/%u used.
8) For already non-standard %z extension implement GNU compatible formats:
+hh and -hh.
9) Check for incorrect values for %z.
MFC r306091:
Add a way for the architecture to specify the calling ABI for methods
in the EFI Runtime Services Table. On amd64, the calling conventions
are MS.
If present, honor the USB port mode (host or peripheral) set on DTS, if not,
keep the beaglebone defaults: USB0 -> peripheral/gadget, USB1 -> host.
This is only a workaround as in fact fact this hardware is capable of detect
the USB port mode based on type of cable and act according with the detected
mode. Unfortunately the driver does not handle that at moment.
mm [Sun, 25 Sep 2016 22:02:27 +0000 (22:02 +0000)]
MFC r305819:
Sync libarchive with vendor including important security fixes.
Issues fixed (FreeBSD):
PR #778: ACL error handling
Issue #745: Symlink check prefix optimization is too aggressive
Issue #746: Hard links with data can evade sandboxing restrictions
This update fixes the vulnerability #3 and vulnerability #4 as reported in
"non-cryptanalytic attacks against FreeBSD update components".
https://gist.github.com/anonymous/e48209b03f1dd9625a992717e7b89c4f
Fix for vulnerability #2 has already been merged in r305188.
When mlx5e_sq_xmit() returns an error code and the mbuf pointer is set,
we should not free the mbuf, because the caller will keep the mbuf in
the drbr. Make sure the mbuf pointer is correctly set upon function
exit.