Allan Jude [Sat, 16 Jan 2016 19:25:16 +0000 (19:25 +0000)]
Never 4k align the MBR bootpool because zfsldr can not deal with a gap
If the bootpool does not start at the first sector of the BSD partition
then zfsldr seeks to the wrong offset inside the ZFS vdev label, and is
unable to find zfsboot, so the system does not boot
If 4k alignment is requested, align the BSD partition in the MBR table,
and align the swap and data pool, but the bootpool must start at sector 1
While here, if 4k alignment is requested, disable MBR CHS alignment, as
this results in not-4k aligned partitions.
Reported by: Alex Wilkinson
MFC after: 5 days
Sponsored by: ScaleEngine Inc.
Andrew Turner [Sat, 16 Jan 2016 10:12:50 +0000 (10:12 +0000)]
Use __ARM_ARCH to decide when ARM_TP_ADDRESS needs to be set. This fixes
an issue with clang 3.8.0 where none of the __ARM_ARCH_*__ macros were
defined on some ARMv6 kernel configs.
Busy the mount point which is the owner of the audit vnode, around
audit_record_write(). This is important so that VFS_STATFS() is not
done on the NULL or freed mp and the check for free space is
consistent with the vnode used for write.
Add vn_start_write() braces around VOP_FSYNC() calls on the audit vnode.
Move repeated code to fsync vnode and panic to the helper
audit_worker_sync_vp().
Reviewed by: rwatson
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Warner Losh [Sat, 16 Jan 2016 05:47:34 +0000 (05:47 +0000)]
We don't need at91_bs_tag. arm_base_bs_tag works now that we have
better dynamic device mapping that didn't exit when we started this
port. Remove it, since everything works w/o it.
Warner Losh [Sat, 16 Jan 2016 04:47:32 +0000 (04:47 +0000)]
Move ohci files to their proper place in the tree for atmel.
Fix when it is included (we don't have a at91rm9200 device).
From a similar patch in the PR, with tweaked names.
Alan Cox [Sat, 16 Jan 2016 04:41:40 +0000 (04:41 +0000)]
A fix to r292469: Iterate over the physical segments in descending rather
than ascending order in vm_phys_alloc_contig() so that, for example, a
sequence of contigmalloc(low=0, high=4GB) calls doesn't exhaust the supply
of low physical memory resulting in a later contigmalloc(low=0, high=1MB)
failure.
Enji Cooper [Sat, 16 Jan 2016 02:18:36 +0000 (02:18 +0000)]
Fix warnings with clang/gcc
- Get rid of unused argc/argv variables in main
- Exit on failure with a return code of 1 instead of -1 with err/errx as a
return code of -1 is implementation dependent
- Bump WARNS to 6
Enji Cooper [Sat, 16 Jan 2016 02:15:13 +0000 (02:15 +0000)]
Fix warnings with gcc 5.0
reconnect.c:
- Convert the K&R prototype of main to an ANSI prototype to mute a
warning from gcc 4.2.1
- Close s_sock2 after finishing off the last test to plug a leak and
mute a warning from gcc 5.0 about a -Wunused-but-set variable
sendfile.c:
- Fix a -Wunused-but-set warning with gcc 5.0 with pagesize in main(..)
Enji Cooper [Sat, 16 Jan 2016 02:02:50 +0000 (02:02 +0000)]
Test for EPROTOTYPE not EPROTONOSUPPORT
- `SOCK_RAW` is the implied supported type parameter for socket(2) per route(4)
- localsw in `sys/kern/uipc_usrreq.c` doesn't have an entry for `SOCK_RAW`, so
the prototype is invalid (this isn't explicitly documented anywhere I could
find)
Bryan Drewery [Fri, 15 Jan 2016 22:08:58 +0000 (22:08 +0000)]
FAST_DEPEND: Fix incremental builds leading to kernel panics.
This fixes .depend.genassym.o not being included. genassym.o depends on
all of the system headers and when rebuilt regenerates assym.s which
lists offsets for critial .S files to utilize. By having a struct in a
system header change its offsets and not have generassym.o be rebuilt,
this would lead to panics.
The flaw in the initial commit was seeing ${OBJS} in ${SYSTEM_OBJS} and
assuming it had all of ${SRCS} in it. This is not the case though. The
older mkdep code splits out all of the various SRC lists for generating
the .depend file. It also includes ${GEN_CFILES}, which had genassym.c
and was the only significant file lacking from ${SYSTEM_OBJS} upon inspection,
since it is not linked in. Rather than duplicate the likely
soon-to-be-removed mkdep lists, just add genassym.o to the DEPENDOBJS
list. Using ${SRCS} as bsd.dep.mk does would be nice but there are many
files in the build that are only added to ${OBJS} and not ${SRCS}, such
as bf_enc.o derived from bf_enc.S for i386.
Sponsored by: EMC / Isilon Storage Division
Reported by: dhw (several panics on current@)
Pointyhat to: bdrewery
Bryan Drewery [Fri, 15 Jan 2016 22:08:51 +0000 (22:08 +0000)]
FAST_DEPEND: Rework optimization for r290524.
The .MAKEFLAGS check inside of the .for loop is extremely slow for some
reason. Just moving it out of the loop trimmed -V lookup time from 11
seconds to 1 second in the kernel obj directory.
Dimitry Andric [Fri, 15 Jan 2016 21:45:53 +0000 (21:45 +0000)]
MFV r294101: 6527 Possible access beyond end of string in zpool comment
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Gordon Ross <gwr@nexenta.com>
This fixes erroneous double increments of the 'check' variable in a loop
in spa_prop_validate(). I ran into this in the clang380-import branch,
where clang 3.8.0 warns about it. (It is already fixed there.)
Implement support for PCI suspend, resume and shutdown events in the
LinuxKPI. Fix a few spaces to tabs. Bump the FreeBSD version to force
recompilation of existing KMODs.
* Use standard IPv6 SAS instead of rt->rt_ifa address.
* Make address lookup work for IPv6 LLA.
* Save address into buffer provided by caller instead of using static vars.
Steven Hartland [Fri, 15 Jan 2016 02:33:47 +0000 (02:33 +0000)]
Add EFI ZFS boot support
This builds on the modular EFI loader support added r294060 adding a
module to provide ZFS boot support on EFI systems.
It should be noted that EFI uses a fixed size memory block for all
allocations performed by the loader so it may be necessary to tune this
size.
For example when building an image which uses mfs_root e.g. mfsbsd, adding
the following to /etc/make.conf would be needed to prevent EFI from running
out of memory when loading the mfs_root image.
EFI_STAGING_SIZE=128
Conrad Meyer [Fri, 15 Jan 2016 01:34:43 +0000 (01:34 +0000)]
ioat(4): Add support for 'fence' bit with DMA_FENCE flag
Some classes of IOAT hardware prefetch reads. DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.
(E.g., I've hit this personally on Broadwell-EP. The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)
Steven Hartland [Fri, 15 Jan 2016 01:06:37 +0000 (01:06 +0000)]
Ensure boot fsread correctly probes all partitions
The boot code fsread was caching the result of meta data request and
reusing it even for calls with inode = 0, which is used to partitions
trigger a probe.
The result was that success was incorrectly returned for all partition
probes after the first valid success, even for partitions which are not
UFS.
Steven Hartland [Fri, 15 Jan 2016 00:55:36 +0000 (00:55 +0000)]
Make common boot file_loadraw name parameter const
Fix compiler warnings about dropping const qualifier by changing file_loadraw
name param to const, and updating method to make that the case (it was
abusing the variable).
Justin Hibbits [Thu, 14 Jan 2016 23:22:43 +0000 (23:22 +0000)]
Adjust VM_MAX_KERNEL_ADDRESS to the max address, not the minimum next.
VM_MAX_KERNEL_ADDERESS is the maximum KVA address. 0xf8000000 is the start of
device mapping space. Since several conditional checks use '<=' against
VM_MAX_KERNEL_ADDRESS, bad things could feasibly happen.
Improvements to the MDXFileChunk() template function:
- Remove unneeded fstat()/lseek() calls.
- Return NULL and set errno to EINVAL on negative length.
- Fix small style problems and expand variable names.
After this change, it is possible to use this code for some irregular
files. For example, 'md5 /dev/md0' should now succeed.
Ian Lepore [Thu, 14 Jan 2016 19:33:13 +0000 (19:33 +0000)]
Fix the handling of the "PDC write transfer length" erratum for at91. The
problem affects revision 1xx hardware as well as later versions. Also, the
recommended workaround is to set the PDC count register for a 12-byte
transfer when the actual size is less than that, but there is no need to
extend or zero-out the data buffer, because the blklen register contains
the real transfer size and only that many bytes will be transferred.
Also add a sysctl to turn debugging printfs on or off on the fly.
Andrew Turner [Thu, 14 Jan 2016 19:00:13 +0000 (19:00 +0000)]
Set -mlong-calls where needed to get a static clang and lldb 3.8.0
linking. These are too large for a branch instruction to branch from an
earlier point in the code to somewhere later.
This will also allow these to be build with Thumb-2 when we get this
infrastructure.
Reviewed by: dim
Differential Revision: https://reviews.freebsd.org/D4855
Alan Somers [Thu, 14 Jan 2016 18:19:05 +0000 (18:19 +0000)]
Fix race condition involving ZFS remove events
When a ZFS drive disappears, ZFS sends a resource.fs.zfs.removed event to
userland. A userland program like zfsd(8) can use that event, for example to
activate a hotspare. The current code contains a race condition: vdev_geom
will sent the sysevent _before_ spa.c would update the vdev's status,
causing userland processes to see pool state that does not reflect the
device removal. This change moves the sysevent to spa.c, closing the race.
Reviewed by: delphij, Sean Eric Fagan
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D4902
Fix the code to retry mount attempt in mountcritlocal if there are
any root mount holds. The previous one used a wrong conditional - the
"err=$?" assignment resets "$?" to 0.
Submitted by: jilles@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Warner Losh [Thu, 14 Jan 2016 16:23:07 +0000 (16:23 +0000)]
Document how to enter the debugger here. I'm sure there's some better
canonical place, and the nit-pickers are welcome to move this
information there with a cross reference.
Netflow module is supposed to store (along with fields like
gateway address and interface index) matched netmask for each record.
This (currently) requires returning individual route entries, instead
of optimized next-hop structure. Given that, use control-plane
rib_lookup_info() function to avoid accessing rtentries directly.
While rib_lookup_info() might be slower, than fibX_lookup() flavours,
it is more scalable than rtalloc1_fib(), because rtentry mutex is
not acquired.
Gleb Smirnoff [Thu, 14 Jan 2016 10:22:45 +0000 (10:22 +0000)]
There is a bug in tcp_output()'s implementation of the TCP_SIGNATURE
(RFC 2385/TCP-MD5) kernel option.
If a tcpcb has TF_NOOPT flag, then tcp_addoptions() is not called,
and to.to_signature is an uninitialized stack variable. The value
is later used as write offset, which leads to writing to random
address.
Gleb Smirnoff [Thu, 14 Jan 2016 10:16:25 +0000 (10:16 +0000)]
Call crextend() before copying old credentials to the new credentials
and replace crcopysafe by crcopy as crcopysafe is is not intended to be
safe in a threaded environment, it drops PROC_LOCK() in while() that
can lead to unexpected results, such as overwrite kernel memory.
In my POV crcopysafe() needs special attention. For now I do not see
any problems with this function, but who knows.
Submitted by: dchagin
Found by: trinity
Security: SA-16:04.linux
Gleb Smirnoff [Thu, 14 Jan 2016 10:13:58 +0000 (10:13 +0000)]
Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.
In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.
So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.
Gleb Smirnoff [Thu, 14 Jan 2016 10:11:10 +0000 (10:11 +0000)]
Verify the packet length in sctp6_input().
The sctp6_ctlinput() function does not properly check the length of the packet
it receives from the ICMP6 input routine. This means that an attacker can craft
a packet that will cause a kernel panic.
When the kernel receives an ICMP6 error message with one of the types/codes
it handles, it calls icmp6_notify_error() to deliver it to the upper-level
protocol. icmp6_notify_error() cycles through the extension headers (if any)
to find the protocol number of the first non-extension header. It does NOT
verify the length of the non-extension header.
It passes information about the packet (including the actual packet) to the
upper-level protocol's pr_ctlinput function. In the case of SCTP for IPv6,
icmp6_notify_error() calls sctp6_ctlinput().
sctp6_ctlinput() assumes that the incoming packet contains a sufficiently-long
SCTP header and calls m_copydata() to extract a copy of that header. In turn,
m_copydata() assumes that the caller has already verified that the offset and
length parameters are correct. If they are incorrect, it will dereference a
NULL pointer and cause a kernel panic.
In short, no one is sufficiently verifying the input, and the result is a
kernel panic.
Steven Hartland [Thu, 14 Jan 2016 09:22:01 +0000 (09:22 +0000)]
Fix GCC warnings causing build failure after r293724
Disable some compiler warnings for GCC (non-standard compiler) fixing
build failures introduced by r293724, which enabled WARNS in the EFI boot
code, when compiling with none standard compiler (GCC).
Andrew Rybchenko [Thu, 14 Jan 2016 09:19:28 +0000 (09:19 +0000)]
sfxge: add accessors for license-related MCDI calls to common code
Add support for Huntington MCDI licensing interface to common code.
Ported from Linux net driver IOCTL functions with restructuring for
initial support for V3 licensing API.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4918
Andrew Rybchenko [Thu, 14 Jan 2016 09:11:20 +0000 (09:11 +0000)]
sfxge: fix common code VPD iterator and duplicate tag verification
Fix efx_vpd_hunk_next() which has -- since its inception -- failed to
correctly iterate over the tags and keywords contained in the VPD data.
Only the first tag or keyword would be returned and the next call with
*contp == 1 would walk to the end of the data and finish.
This was spotted when fixing up errors spotted by Prefast code analysis
(which neglected to set all of the out parameters in all successful cases)
Also fix efx_vpd_verify() on Siena and EF10 which (as a side effect of
correctly iterating over all the tags and keywords) was failing as it
detected that both the static VPD and dynamic VPD storage contained an
RV keyword in the VPD-R tag. This is intentional as the static VPD and
dynamic VPD are stored separately (firmware merges their contents and
computes a new RV keyword checksum for the data readable from the VPD
capability in PCIe configuration space).
Submitted by: Andrew Lee <alee at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4915
Andrew Rybchenko [Thu, 14 Jan 2016 09:07:40 +0000 (09:07 +0000)]
sfxge: use correct register definitions for setting interrupt moderation on Medford
The only value which has changed is the number of rows
(ER_DZ_EVQ_TMR_REG_ROWS is 2048 vs 1024 for FR_BZ_TIMER_COMMAND_REGP0_ROWS)
but that isn't used, so this shouldn't change behaviour.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4913