Implement support for PCI suspend, resume and shutdown events in the
LinuxKPI. Fix a few spaces to tabs. Bump the FreeBSD version to force
recompilation of existing KMODs.
* Use standard IPv6 SAS instead of rt->rt_ifa address.
* Make address lookup work for IPv6 LLA.
* Save address into buffer provided by caller instead of using static vars.
Steven Hartland [Fri, 15 Jan 2016 02:33:47 +0000 (02:33 +0000)]
Add EFI ZFS boot support
This builds on the modular EFI loader support added in r294060, adding a
module to provide ZFS boot support on EFI systems.
It should be noted that EFI uses a fixed size memory block for all
allocations performed by the loader so it may be necessary to tune this
size.
For example, when building an image which uses mfs_root, e.g. mfsbsd, adding
the following to /etc/make.conf would be needed to prevent EFI from running
out of memory when loading the mfs_root image.
EFI_STAGING_SIZE=128
Conrad Meyer [Fri, 15 Jan 2016 01:34:43 +0000 (01:34 +0000)]
ioat(4): Add support for 'fence' bit with DMA_FENCE flag
Some classes of IOAT hardware prefetch reads. DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.
(E.g., I've hit this personally on Broadwell-EP. The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)
Steven Hartland [Fri, 15 Jan 2016 01:06:37 +0000 (01:06 +0000)]
Ensure boot fsread correctly probes all partitions
The boot code fsread was caching the result of a metadata request and
reusing it even for calls with inode = 0, which are used to trigger a
partition probe.
The result was that success was incorrectly returned for all partition
probes after the first valid success, even for partitions which are not
UFS.
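A minimal sketch of the caching rule described above, not the actual boot
code: struct partition, read_metadata() and fs_lookup() are hypothetical
names. It only shows that a request with inode 0 is a probe and therefore has
to drop any cached result before looking at the partition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical partition description: only whether it really holds UFS. */
struct partition {
	bool	is_ufs;
};

/* Set once metadata has been read successfully (the cached result). */
static bool meta_cached;

/* Hypothetical stand-in for reading and validating on-disk metadata. */
static bool
read_metadata(const struct partition *p)
{
	return (p->is_ufs);
}

/*
 * Returns true on success.  inode == 0 means "probe this partition", so
 * the cache is dropped and the metadata is re-read every time.
 */
bool
fs_lookup(const struct partition *p, unsigned int inode)
{
	if (inode == 0)
		meta_cached = false;
	if (!meta_cached) {
		if (!read_metadata(p))
			return (false);
		meta_cached = true;
	}
	return (true);
}
```

Without the inode == 0 reset, a probe of a non-UFS partition issued after one
successful lookup would reuse the cached result and report success.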
Steven Hartland [Fri, 15 Jan 2016 00:55:36 +0000 (00:55 +0000)]
Make common boot file_loadraw name parameter const
Fix compiler warnings about dropping const qualifier by changing file_loadraw
name param to const, and updating method to make that the case (it was
abusing the variable).
Justin Hibbits [Thu, 14 Jan 2016 23:22:43 +0000 (23:22 +0000)]
Adjust VM_MAX_KERNEL_ADDRESS to the max address, not the minimum next.
VM_MAX_KERNEL_ADDRESS is the maximum KVA address. 0xf8000000 is the start of
device mapping space. Since several conditional checks use '<=' against
VM_MAX_KERNEL_ADDRESS, bad things could feasibly happen.
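The off-by-one can be illustrated with a small sketch. Only 0xf8000000 comes
from the message above; the macro names, is_kva() and the assumption that the
adjusted limit is exactly one byte below the device mapping space are
illustrative, not taken from the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start of device mapping space, as stated in the commit message. */
#define	DEVICE_SPACE_START	0xf8000000u
/* Old limit: the first address past KVA ("minimum next"). */
#define	OLD_MAX_KVA		DEVICE_SPACE_START
/* Assumed new limit: the last address that is still KVA. */
#define	NEW_MAX_KVA		(DEVICE_SPACE_START - 1)

/* Hypothetical range check in the style of the '<=' comparisons. */
bool
is_kva(uint32_t addr, uint32_t max_kva)
{
	return (addr <= max_kva);
}
```

With the old limit, is_kva(0xf8000000, OLD_MAX_KVA) is true even though that
address already belongs to device space; with the new limit it is false.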
Improvements to the MDXFileChunk() template function:
- Remove unneeded fstat()/lseek() calls.
- Return NULL and set errno to EINVAL on negative length.
- Fix small style problems and expand variable names.
After this change, it is possible to use this code for some irregular
files. For example, 'md5 /dev/md0' should now succeed.
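A rough sketch of the error handling listed above, assuming a simplified
stand-in for the template: sum_file_chunk() is a hypothetical function that
adds up bytes instead of computing a digest. It rejects a negative length
with EINVAL, does not call fstat(), and treats a length of 0 as "read to end
of file". Calling lseek() only for a non-zero offset is a choice made for this
sketch, not a statement about the committed code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sums the bytes of a chunk of the file into *sum.  Returns sum on
 * success, or NULL with errno set on failure.
 */
unsigned long *
sum_file_chunk(const char *path, unsigned long *sum, off_t ofs, off_t len)
{
	unsigned char buf[4096];
	off_t left = len;
	ssize_t n = 0;
	size_t want;
	int fd, saved;

	if (len < 0) {
		errno = EINVAL;
		return (NULL);
	}
	if ((fd = open(path, O_RDONLY)) < 0)
		return (NULL);
	if (ofs != 0 && lseek(fd, ofs, SEEK_SET) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return (NULL);
	}
	*sum = 0;
	/* Read until the requested length is consumed or EOF is reached. */
	while (len == 0 || left > 0) {
		want = sizeof(buf);
		if (len != 0 && left < (off_t)want)
			want = (size_t)left;
		n = read(fd, buf, want);
		if (n <= 0)
			break;
		for (ssize_t i = 0; i < n; i++)
			*sum += buf[i];
		if (len != 0)
			left -= n;
	}
	/* Preserve errno from read() across close(). */
	saved = errno;
	close(fd);
	errno = saved;
	return (n < 0 ? NULL : sum);
}
```

Because the loop relies only on read() returning 0 at end of file, it does
not need the file size up front, which is what lets it work on files where
fstat() does not report a useful size.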
Ian Lepore [Thu, 14 Jan 2016 19:33:13 +0000 (19:33 +0000)]
Fix the handling of the "PDC write transfer length" erratum for at91. The
problem affects revision 1xx hardware as well as later versions. Also, the
recommended workaround is to set the PDC count register for a 12-byte
transfer when the actual size is less than that, but there is no need to
extend or zero-out the data buffer, because the blklen register contains
the real transfer size and only that many bytes will be transferred.
Also add a sysctl to turn debugging printfs on or off on the fly.
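The workaround amounts to programming two different lengths. A minimal
sketch, assuming byte units and hypothetical names (struct xfer_setup,
plan_transfer()); the real register encoding is not modelled here.

```c
#include <stddef.h>

/* Minimum PDC transfer size required by the erratum workaround. */
#define	PDC_MIN_XFER_BYTES	12

/* Hypothetical pair of values to program for one transfer. */
struct xfer_setup {
	size_t	pdc_bytes;	/* length programmed into the PDC counter */
	size_t	blklen;		/* real size for the blklen register */
};

/*
 * Pads only the PDC count; blklen keeps the real size, so the data
 * buffer itself does not need to be extended or zeroed.
 */
struct xfer_setup
plan_transfer(size_t len)
{
	struct xfer_setup s;

	s.blklen = len;
	s.pdc_bytes = len < PDC_MIN_XFER_BYTES ? PDC_MIN_XFER_BYTES : len;
	return (s);
}
```

For a 4-byte transfer this yields a PDC count of 12 and a blklen of 4.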
Andrew Turner [Thu, 14 Jan 2016 19:00:13 +0000 (19:00 +0000)]
Set -mlong-calls where needed to get a static clang and lldb 3.8.0
linking. These are too large for a branch instruction to branch from an
earlier point in the code to somewhere later.
This will also allow these to be built with Thumb-2 when we get this
infrastructure.
Reviewed by: dim
Differential Revision: https://reviews.freebsd.org/D4855
Alan Somers [Thu, 14 Jan 2016 18:19:05 +0000 (18:19 +0000)]
Fix race condition involving ZFS remove events
When a ZFS drive disappears, ZFS sends a resource.fs.zfs.removed event to
userland. A userland program like zfsd(8) can use that event, for example to
activate a hotspare. The current code contains a race condition: vdev_geom
sends the sysevent _before_ spa.c updates the vdev's status,
causing userland processes to see pool state that does not reflect the
device removal. This change moves the sysevent to spa.c, closing the race.
Reviewed by: delphij, Sean Eric Fagan
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D4902
Fix the code to retry mount attempt in mountcritlocal if there are
any root mount holds. The previous one used a wrong conditional - the
"err=$?" assignment resets "$?" to 0.
Submitted by: jilles@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Warner Losh [Thu, 14 Jan 2016 16:23:07 +0000 (16:23 +0000)]
Document how to enter the debugger here. I'm sure there's some better
canonical place, and the nit-pickers are welcome to move this
information there with a cross reference.
Netflow module is supposed to store (along with fields like
gateway address and interface index) matched netmask for each record.
This (currently) requires returning individual route entries, instead
of optimized next-hop structure. Given that, use control-plane
rib_lookup_info() function to avoid accessing rtentries directly.
While rib_lookup_info() might be slower than the fibX_lookup() flavours,
it is more scalable than rtalloc1_fib(), because the rtentry mutex is
not acquired.
Gleb Smirnoff [Thu, 14 Jan 2016 10:22:45 +0000 (10:22 +0000)]
There is a bug in tcp_output()'s implementation of the TCP_SIGNATURE
(RFC 2385/TCP-MD5) kernel option.
If a tcpcb has the TF_NOOPT flag, then tcp_addoptions() is not called,
and to.to_signature is an uninitialized stack variable. The value
is later used as a write offset, which leads to writing to a random
address.
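A generic sketch of the defensive pattern, not the committed fix: struct opts,
write_signature() and the fixed offset of 22 are hypothetical. The option
state is always initialized, and the digest is written only when options were
actually built and the offset fits inside the header buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	SIG_LEN	16	/* TCP-MD5 digest length in bytes */

/* Hypothetical, simplified option state; not the kernel's struct tcpopt. */
struct opts {
	bool	has_signature;	/* set only when options were built */
	size_t	sig_offset;	/* where the digest goes in the header */
};

/* Returns true if the digest was written into hdr. */
bool
write_signature(unsigned char *hdr, size_t hdrlen, bool noopt,
    const unsigned char digest[SIG_LEN])
{
	struct opts to = { false, 0 };	/* never left uninitialized */

	if (!noopt) {
		/* Stand-in for option building; the offset is illustrative. */
		to.has_signature = true;
		to.sig_offset = 22;
	}
	if (!to.has_signature || to.sig_offset + SIG_LEN > hdrlen)
		return (false);
	memcpy(hdr + to.sig_offset, digest, SIG_LEN);
	return (true);
}
```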
Gleb Smirnoff [Thu, 14 Jan 2016 10:16:25 +0000 (10:16 +0000)]
Call crextend() before copying old credentials to the new credentials
and replace crcopysafe by crcopy, as crcopysafe is not intended to be
safe in a threaded environment: it drops PROC_LOCK() in a while() loop,
which can lead to unexpected results, such as overwriting kernel memory.
In my POV crcopysafe() needs special attention. For now I do not see
any problems with this function, but who knows.
Submitted by: dchagin
Found by: trinity
Security: SA-16:04.linux
Gleb Smirnoff [Thu, 14 Jan 2016 10:13:58 +0000 (10:13 +0000)]
Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call requests the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.
In contrast, our implementation of the get_robust_list system call copies
the known portion of the memory that the pointer recorded by the
set_robust_list system call points to (i.e. the robust list head itself)
to the location pointed to by the head argument.
So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.
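The intended semantics can be sketched with a toy model. All types and
function names below are hypothetical simplifications, not the kernel's
definitions: the "get" side stores the recorded pointer value and never reads
the memory that the pointer refers to.

```c
#include <stddef.h>

/* Hypothetical, simplified list head. */
struct robust_list_head {
	void	*list;
};

/* Hypothetical per-thread state. */
struct thread_state {
	struct robust_list_head	*robust_head;	/* recorded by "set" */
};

/* Record the caller-supplied list head pointer for the thread. */
void
model_set_robust_list(struct thread_state *td, struct robust_list_head *head)
{
	td->robust_head = head;
}

/*
 * Store the recorded pointer itself in *head_out.  The memory behind
 * the pointer is not dereferenced or copied here.
 */
void
model_get_robust_list(const struct thread_state *td,
    struct robust_list_head **head_out, size_t *len_out)
{
	*head_out = td->robust_head;
	*len_out = sizeof(struct robust_list_head);
}
```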
Gleb Smirnoff [Thu, 14 Jan 2016 10:11:10 +0000 (10:11 +0000)]
Verify the packet length in sctp6_input().
The sctp6_ctlinput() function does not properly check the length of the packet
it receives from the ICMP6 input routine. This means that an attacker can craft
a packet that will cause a kernel panic.
When the kernel receives an ICMP6 error message with one of the types/codes
it handles, it calls icmp6_notify_error() to deliver it to the upper-level
protocol. icmp6_notify_error() cycles through the extension headers (if any)
to find the protocol number of the first non-extension header. It does NOT
verify the length of the non-extension header.
It passes information about the packet (including the actual packet) to the
upper-level protocol's pr_ctlinput function. In the case of SCTP for IPv6,
icmp6_notify_error() calls sctp6_ctlinput().
sctp6_ctlinput() assumes that the incoming packet contains a sufficiently-long
SCTP header and calls m_copydata() to extract a copy of that header. In turn,
m_copydata() assumes that the caller has already verified that the offset and
length parameters are correct. If they are incorrect, it will dereference a
NULL pointer and cause a kernel panic.
In short, no one is sufficiently verifying the input, and the result is a
kernel panic.
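The missing check boils down to validating offset and length against the
packet size before copying. A minimal sketch with a hypothetical helper,
copy_header(), operating on a flat buffer rather than an mbuf chain:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies len bytes starting at off out of a packet of pktlen bytes.
 * Returns false, copying nothing, when the requested range is not
 * fully inside the packet.
 */
bool
copy_header(const unsigned char *pkt, size_t pktlen, size_t off, size_t len,
    void *dst)
{
	/* Written this way so that off + len cannot overflow. */
	if (off > pktlen || len > pktlen - off)
		return (false);
	memcpy(dst, pkt + off, len);
	return (true);
}
```

A caller in the position of sctp6_ctlinput() would drop the ICMP6 error when
the check fails instead of handing an unverified range to the copy routine.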
Steven Hartland [Thu, 14 Jan 2016 09:22:01 +0000 (09:22 +0000)]
Fix GCC warnings causing build failure after r293724
Disable some compiler warnings for GCC (non-standard compiler), fixing
build failures introduced by r293724, which enabled WARNS in the EFI boot
code, when compiling with a non-standard compiler (GCC).
Andrew Rybchenko [Thu, 14 Jan 2016 09:19:28 +0000 (09:19 +0000)]
sfxge: add accessors for license-related MCDI calls to common code
Add support for Huntington MCDI licensing interface to common code.
Ported from Linux net driver IOCTL functions with restructuring for
initial support for V3 licensing API.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4918
Andrew Rybchenko [Thu, 14 Jan 2016 09:11:20 +0000 (09:11 +0000)]
sfxge: fix common code VPD iterator and duplicate tag verification
Fix efx_vpd_hunk_next() which has -- since its inception -- failed to
correctly iterate over the tags and keywords contained in the VPD data.
Only the first tag or keyword would be returned and the next call with
*contp == 1 would walk to the end of the data and finish.
This was spotted when fixing up errors reported by Prefast code analysis
(the function neglected to set all of the out parameters in all successful
cases).
Also fix efx_vpd_verify() on Siena and EF10 which (as a side effect of
correctly iterating over all the tags and keywords) was failing as it
detected that both the static VPD and dynamic VPD storage contained an
RV keyword in the VPD-R tag. This is intentional as the static VPD and
dynamic VPD are stored separately (firmware merges their contents and
computes a new RV keyword checksum for the data readable from the VPD
capability in PCIe configuration space).
Submitted by: Andrew Lee <alee at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4915
Andrew Rybchenko [Thu, 14 Jan 2016 09:07:40 +0000 (09:07 +0000)]
sfxge: use correct register definitions for setting interrupt moderation on Medford
The only value which has changed is the number of rows
(ER_DZ_EVQ_TMR_REG_ROWS is 2048 vs 1024 for FR_BZ_TIMER_COMMAND_REGP0_ROWS)
but that isn't used, so this shouldn't change behaviour.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4913
Ed Schouten [Thu, 14 Jan 2016 07:27:42 +0000 (07:27 +0000)]
Remove an unneeded assignment of the return value.
tdelete() is supposed to return the address of the parent of the node that
has been deleted. We already keep track of this node in the loop between
lines 94-107. The GO_LEFT()/GO_RIGHT() macros are used later on as well,
so we must make sure not to change it to something else.
Sepherosa Ziehau [Thu, 14 Jan 2016 03:16:29 +0000 (03:16 +0000)]
hyperv: set receive buffer size according to NVSP protocol version
If the NVSP protocol version is not greater than NVSP_PROTOCOL_VERSION_2,
then the recv buffer size is 15MB, otherwise the buffer size is 16MB.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: royger, Dexuan Cui <decui microsoft com>, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4814
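The selection rule from the message above, as a small sketch. The numeric
value given to NVSP_PROTOCOL_VERSION_2 is a placeholder for illustration only,
and recv_buf_size() and the two size macros are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder value, not the driver's; only the ordering matters here. */
#define	NVSP_PROTOCOL_VERSION_2	2u

#define	RECV_BUF_SIZE_LEGACY	(15u * 1024 * 1024)	/* 15MB */
#define	RECV_BUF_SIZE		(16u * 1024 * 1024)	/* 16MB */

/* Pick the receive buffer size from the negotiated protocol version. */
size_t
recv_buf_size(uint32_t nvsp_version)
{
	if (nvsp_version <= NVSP_PROTOCOL_VERSION_2)
		return (RECV_BUF_SIZE_LEGACY);
	return (RECV_BUF_SIZE);
}
```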
Sepherosa Ziehau [Thu, 14 Jan 2016 03:11:35 +0000 (03:11 +0000)]
hyperv: add interrupt counters
Submitted by: Howard Su <howard0su gmail com>
Reviewed by: royger, Dexuan Cui <decui microsoft com>, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4693
Sepherosa Ziehau [Thu, 14 Jan 2016 03:05:10 +0000 (03:05 +0000)]
hyperv: implement an event timer
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: delphij, royger, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4676
Sepherosa Ziehau [Thu, 14 Jan 2016 02:50:13 +0000 (02:50 +0000)]
hyperv: use x86 generic code to do the hypervisor detection
This is the first step to move the generic part of the HV code into the
kernel instead of the module, so that it is possible to use hypercalls to
implement some other paravirtualization code in the kernel.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: royger, delphij, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D3072
Eric van Gyzen [Thu, 14 Jan 2016 00:31:00 +0000 (00:31 +0000)]
bsdinstall: Suggest the GPT+Active workaround on Dell T5810
The Dell Precision Tower 5810 fails to boot from GPT in Legacy/BIOS mode
without the Active flag in the Protective MBR. Suggest the workaround
during installation.
Since an increasing number of Dell systems exhibit this behavior,
I imagine all Dells past a certain date will do so. I would like
to suggest the workaround for all Dells with a BIOS date of, say,
2014 or later, but I would need to test a variety of systems before
committing such a change.
Reviewed by: allanjude, dteske
MFC after: 5 days
Relnotes: We should probably suggest using GPT+Active on "recent" Dells.
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D4075
Marius Strobl [Wed, 13 Jan 2016 21:47:27 +0000 (21:47 +0000)]
Given that em(4), lem(4) and igb(4) hardware doesn't require the
alignment guarantees provided by m_defrag(9), use m_collapse(9)
instead for performance reasons.
While at it, sanitize the statistics softc members, i.e. retire
unused ones and add SYSCTL nodes missing for actually used ones.
Andrew Turner [Wed, 13 Jan 2016 21:34:15 +0000 (21:34 +0000)]
Add support for relocating AArch64 modules to kldxref. This fixes an error
message where it fails to read the module as the unrelocated addresses
are zero.
Steven Hartland [Wed, 13 Jan 2016 18:33:12 +0000 (18:33 +0000)]
Improve non-interactive forth cmd error reporting
Non-interactive forth command errors were silent even for critical issues
e.g. failing to load a required kernel module or mfs_root.
This resulted in later unexplained and hard to trace errors such as mount
root failures.
This introduces additional command return codes that are treated
appropriately by the non-interactive command processor (bf_command).
* CMD_CRIT = print error
* CMD_FATAL = panic
Also fix minor style(9) issues with command_load return codes.
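A sketch of how a non-interactive processor might map the new return codes to
actions. Only CMD_CRIT and CMD_FATAL are named in the message above; the other
enumerators, the numeric ordering, enum action and noninteractive_action() are
assumptions made for illustration.

```c
/* Hypothetical set of command return codes and their ordering. */
enum cmd_result {
	CMD_OK,
	CMD_WARN,
	CMD_ERROR,
	CMD_CRIT,
	CMD_FATAL
};

/* What the non-interactive processor does with a result. */
enum action {
	ACT_CONTINUE,	/* carry on silently */
	ACT_PRINT,	/* print the error, then carry on */
	ACT_PANIC	/* stop: the boot cannot sensibly continue */
};

enum action
noninteractive_action(enum cmd_result res)
{
	switch (res) {
	case CMD_CRIT:
		return (ACT_PRINT);
	case CMD_FATAL:
		return (ACT_PANIC);
	default:
		return (ACT_CONTINUE);
	}
}
```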
Alan Somers [Wed, 13 Jan 2016 17:33:50 +0000 (17:33 +0000)]
Fix Coverity warnings regarding r293229
rpcbind/check_bound.c
Fix CID1347798, a memory leak in mergeaddr.
rpcbind/tests/addrmerge_test.c
Fix CID1347800 through CID1347803, memory leaks in ATF tests. They
are harmless because each ATF test case runs in its own process, but
they are trivial to fix. Fix a few other leaks that Coverity didn't
detect, too.
Andrew Turner [Wed, 13 Jan 2016 15:54:17 +0000 (15:54 +0000)]
Remove the compat code to handle the kernel passing us an unaligned
stack pointer. Userland expects the kernel to pass it an aligned sp and
pass a pointer to the arguments in x0. The kernel side was updated in
r289502, 3 months ago.
Steven Hartland [Wed, 13 Jan 2016 14:47:13 +0000 (14:47 +0000)]
Increase efiboot.img size used in ISO creation
Due to recent and upcoming changes to add additional functionality to
the EFI loader, it is now bigger than the space allocated for efiboot.img,
so increase this in line with boot1.efifat.
Move the funsetown(9) call from audit_pipe_close() to the cdevpriv
destructor. As a result, the close method becomes trivial and is removed.
The final cdevsw close method might be called without a file
context (e.g. in vn_open_vnode() if the vnode is reclaimed in the meantime),
which leaves ap_sigio registered for notification even though the cdevpriv
destructor frees the memory later.
Call the destructor instead of doing cleanup inline when
devfs_set_cdevpriv() fails in open. This adds the missing funsetown(9)
call and locks ap to satisfy audit_pipe_free() invariants.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Switch the legacy pty clone handler to use make_dev_s(9). Add the
MAKEDEV_CHECKNAME flag to the call; this is required to avoid a panic on a
race between the clone and the destruction of the closed master.
Reported by and discussed with: bde
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Enji Cooper [Wed, 13 Jan 2016 09:14:27 +0000 (09:14 +0000)]
Integrate
tools/regression/geom_{concat,eli,gate,mirror,nop,raid3,shsec,stripe,uzip}
into the FreeBSD test suite as
tests/sys/geom/class/{concat,eli,gate,mirror,nop,raid3,shsec,stripe,uzip}
The tools/regression/geom and tools/regression/geom_part testcases are being
left alone because both test sets are currently broken.
The majority of this work was done on ^/user/ngie/more-tests2 . The differences
are as follows:
- tests/sys/geom/class/Makefile.inc is not present; it was
inlined into the class's Makefiles for explicitness.
- The testcases officially require root via kyua
- The geom_gate(4) tests don't use the pidfile changes proposed in
https://reviews.freebsd.org/D4836 .