asomers [Tue, 28 Nov 2017 17:07:21 +0000 (17:07 +0000)]
MFC r325363:
Fix mpr(4) panics caused by bad drive mapping tables
sys/dev/mpr/mpr_mapping.c
If _mapping_process_dpm_pg0 detects inconsistencies in the drive
mapping table (stored in the HBA's NVRAM), abort reading it and
continue to boot as if the mapping table were blank. I observed
such inconsistencies in several HBAs after upgrading firmware from
14.0.0.0 to 15.0.0.0.
asomers [Tue, 28 Nov 2017 17:04:22 +0000 (17:04 +0000)]
MFC r322258, r324941, r324956, r325018
r322258:
Make p1003_1b.aio_listio_max a tunable
p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
Its value will be bounded from below by the compile-time constant
AIO_LISTIO_MAX and from above by the compile-time constant
MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.
r324941:
Remove artificial restriction on lio_listio's operation count
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:
* To size a UMA zone, which globally limits the number of concurrent
aio_suspend calls.
* To artifically limit the number of operations in a single lio_listio call.
There doesn't seem to be any memory allocation associated with this limit.
This change does two things:
* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.
* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
calls will now be limited by the more generous max_aio_queue_per_proc. The
old p1003_1b.aio_listio_max is now an alias for
vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
_SC_AIO_LISTIO_MAX.
An off-by-one error has been present since the system call was first present
in 185878. It additionally became a memory corruption bug after change
324941. The failure is actually revealed by our existing AIO tests.
However, apparently nobody's been running those in 32-bit emulation mode.
asomers [Tue, 28 Nov 2017 16:52:38 +0000 (16:52 +0000)]
MFC r325011, r325016
r325011:
zfsd should be able to online an L2ARC that disappears and returns
Previously, this didn't work because L2ARC devices' labels don't contain
pool GUIDs. Modify zfsd so that the pool GUID won't be required:
lib/libdevdctl/guid.h
Change INVALID_GUID from a uint64_t constant to a function that
returns an invalid Guid object. Remove the void constructor.
Nothing uses it, and it violates RAII.
cddl/usr.sbin/zfsd/case_file.h
cddl/usr.sbin/zfsd/case_file.cc
Allow CaseFile::Find to match a CaseFile based on Vdev GUID alone.
In CaseFile::ReEvaluate, attempt to online devices even if the newly
arrived device has no pool GUID.
cddl/usr.sbin/zfsd/vdev_iterator.cc
Iterate through a pool's cache devices as well as its regular
devices.
asomers [Tue, 28 Nov 2017 16:34:55 +0000 (16:34 +0000)]
MFC r324940:
Fix the error message when creating a zpool on a too-small device
Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by path.
It's redundant with the check in vdev_open, and failing to attach here
results in the wrong error message being printed. However, still check for
it in some other situations:
* When opening by guids, so we don't get bogged down reading from slow
devices like floppy drives.
* In vdev_geom_read_pool_label for the same reason, because we iterate over
all providers.
* If the caller requests that we verify the guid, because then we'll have to
read from the device before vdev_open verifies the size.
PR: 222227
Reported by: Marie Helene Kvello-Aune <marieheleneka@gmail.com>
Reviewed by: avg, mav
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12531
andrew [Tue, 28 Nov 2017 11:06:17 +0000 (11:06 +0000)]
MFC r326137:
Ensure we check the program state set in the trap frame on arm and arm64.
This value may be set by userspace so we need to check it before using it.
If this is not done correctly on exception return the kernel may continue
in kernel mode with all registers set to a userspace controlled value. Fix
this by moving the check into set_mcontext, and also add the missing
sanitisation from the arm64 set_regs.
emaste [Tue, 28 Nov 2017 00:55:30 +0000 (00:55 +0000)]
MFC r325042: libdtrace: replace "DOODAD" with more descriptive string
Previously some unimplemented libdtrace routines printed the function,
file and line number, followed by "DOODAD." That is not particularly
informative, so replace it with a message reporting the actual issue.
asomers [Tue, 28 Nov 2017 00:39:58 +0000 (00:39 +0000)]
MFC r323275, r324112
r323275:
Add basic tests for chflags, mkdir, rcp, and rmdir
Add basic command line parsing test coverage for these utilities. The tests
were automatically generated based on their man pages. These tests can be
expanded by hand for more thorough coverage. The aim is to generate very
basic amount of test coverage for all the utilities in the base system.
asomers [Tue, 28 Nov 2017 00:19:04 +0000 (00:19 +0000)]
MFC r322854, r323995, r324568, r324991
r322854:
zfsd(8): Close a race condition when onlining a disk paritition
When inserting a partitioned disk, devfs and geom will announce the whole
disk before they announce the partition. If the partition containing ZFS
extends to one of the disk's extents, then zfsd will see a ZFS label on the
whole disk and attempt to online it. ZFS is smart enough to activate the
partition instead of the whole disk, but only if GEOM has already created
the partition's provider.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
Add a zpool_read_all_labels method. It's similar to
zpool_read_label, but it will return the number of labels found.
cddl/usr.sbin/zfsd/zfsd_event.cc
When processing a DevFS CREATE event, only online a VDEV if we can
read all four ZFS labels.
r324991:
Fix zpool_read_all_labels when vfs.aio.enable_unsafe=0
Previously, zpool_read_all_labels was trying to do 256KB reads, which are
greater than the default MAXPHYS and therefore must go through the slow,
unsafe AIO path. Shrink these reads to 112KB so they can use the safe, fast
AIO path instead.
gjb [Mon, 27 Nov 2017 15:12:14 +0000 (15:12 +0000)]
MFC r326068:
Remove /etc/resolv.conf from virtual machine images, which is
copied from the build host. It is renamed to /etc/resolv.conf.bak
on boot, so never used anyway.
ae [Fri, 24 Nov 2017 04:42:21 +0000 (04:42 +0000)]
MFC r325960:
Unconditionally enable support for O_IPSEC opcode.
IPsec support can be loaded as kernel module, thus do not depend from
kernel option IPSEC and always build O_IPSEC opcode implementation as
enabled.
MFC r325962:
Do not invoke IPv4 NAT handler for non IPv4 packets. Libalias expects
a packet is IPv4. And in case when it is IPv6, it just translates them
as IPv4. This leads to corruption and in some cases to panics.
In particular a panic can happen when value of ip6_plen modified to
something that leads to IP fragmentation, but actual packet length does
not match the IP length.
Packets that are not IPv4 will be dropped by NAT rule.
delphij [Wed, 22 Nov 2017 06:33:51 +0000 (06:33 +0000)]
MFC r325532: Update arcmsr(4) to 1.40.00.01:
- Fix clear doorbell queue buffer for ADAPTER_TYPE_B
- Fix release memory resource when detach device
- Add support for ARC-1216, 1226 SAS 12Gb controllers
- Declare some functions as static
- Change checking dword read/write for IOP rqbuffer.
Many thanks to Areca for continuing to support FreeBSD.
emaste [Tue, 21 Nov 2017 15:34:25 +0000 (15:34 +0000)]
MFC r325813 (bz): Unbreak IPv6.
No longer return ENXIO when trying to send an IPv6 packet in
nicvf_sq_add_hdr_subdesc().
Restructure the code so that the upper layer protocol parts are
agnostic of the L3 protocol (and no longer specific to IPv4).
With this basic IPv6 packets go through. We are still seeing
weird behaviour which needs further diagnosis.
emaste [Tue, 21 Nov 2017 13:59:40 +0000 (13:59 +0000)]
MFC r325811: vnic: report that the driver supports multicast
The driver is currently hardcoded to force promiscuous mode, so all of
the MAC filtering code is presently unused and multicast should "just
work." Report to the higher layers that multicast is supported.
PR: 223573
Reported by: bz
Sponsored by: The FreeBSD Foundation
Sync libsysdecode, kdump, and truss with head (aside from changes such
as ino64 that are not applicable to 11).
319493:
Decode the arguments passed to __cap_rights_get() and cap_rights_limit().
319509:
Decode the argument passed to cap_getmode().
The returned integer value is output.
319520:
Decode the 'who' argument passed to getrusage().
Add a new sysdecode_getrusage_who() which decodes the RUSAGE_* constant
passed as the first argument to getrusage(). Use this function in both
kdump and truss to decode the first argument to getrusage().
319595:
Decode arguments to dup, dup2, getdirentries, pread, and pwrite.
- dup and dup2 print fd arguments in decimal.
- pread and pwrite are similar to read and write with the addition of the
file offset.
- getdirentries displays the output entries as a string for now and also
prints the value returned in *basep. Eventually the buffer for
getdirentries should perhaps be decoded as an array of dirent
structures.
319677:
Decode arguments to ACL related system calls.
This only decodes the raw arguments but not the contents of the struct acl
objects.
319679:
Decode arguments passed to extended attribute related system calls.
The cmd argument passed to extattrctl() is not decoded as a string constant
but is just printed in hex. The value is filesystem-specific but in
practice is only used with UFS1 filesystems.
319680:
Decode arguments to minherit().
319681:
Decode arguments to mlock(), mlockall(), and munlock().
319688:
Decode flags passed to mount(), nmount(), and unmount().
319689:
Decode arguments passed to msync().
319761:
Fix decoding of setpriority() arguments.
The PRIO_* 'which' value is stored in the first argument to setpriority(2),
not the last. While here, decode the arguments to getpriority(2).
319762:
Decode arguments to getpriority() and setpriority().
319763:
Decode the arguments to ptrace().
This does not decode structures returned by ptrace().
319764:
Decode the arguments to quotactl().
319765:
Improve decoding of RB_AUTOBOOT in the 'howto' argument to reboot().
The reboot() system call accepts a mode (RB_AUTOBOOT, RB_HALT, RB_POWEROFF,
or RB_REROOT) as well as zero or more optional flags in 'howto'.
However, RB_AUTOBOOT was only displayed if 'howto' was exactly 0.
Combinations like 'RB_AUTOBOOT | RB_DUMP' were decoded as 'RB_DUMP'.
Instead, imply that RB_AUTOBOOT was specified if none of the other "mode"
flags were specified.
319766:
Decode the 'howto' argument to reboot().
319767:
Decode arguments to rtprio_thread() (same as rtprio()).
319768:
Decode arguments to rtprio() and rtprio_thread().
320010:
Decode arguments to sched_* family of system calls.
This includes decoding both scheduler policy constants and the sched_param
structure for sched_get_priority_max(), sched_get_priority_min(),
sched_getparam(), sched_getscheduler(), sched_rr_get_interval(),
sched_setparam(), and sched_setscheduler().
322899:
Decode arguments passed to thr_set_name().
322959:
Decode extra signal information for caught signals.
Decode fields from the siginfo_t stored in the PT_LWPINFO structure when a
signal is caught by a traced process. This includes the signal code
(si_code) as well as additional members such as si_addr, si_pid, etc.
323020:
Trim stale prototype for ioctlname().
323021:
Decode signal information returned by system calls.
Specifically, decode the siginfo structure returned by sigtimedwait(),
sigwaitinfo(), and wait6(). While here, also decode the signal number
returned in the second argument to sigwait().
323151:
Decode pathconf() names, *at() flags, and sysarch() numbers in libsysdecode.
Move tables that were previously in truss over to libsysdecode. truss
output is unchanged, but kdump has been updated to decode these fields.
In addition, sysdecode_sysarch_number() should support all platforms
whereas the old table in truss only supported x86.
gjb [Mon, 20 Nov 2017 15:46:23 +0000 (15:46 +0000)]
MFC r325863:
Only copy /etc/resolv.conf to ${CHROOTDIR} if /etc/resolv.conf does
not already exist within ${CHROOTDIR}. This allows re-using a build
chroot with CHROOTBUILD_SKIP set to a non-empty value and CHROOTDIR
set to '/' in release.conf.
eugen [Mon, 20 Nov 2017 09:23:08 +0000 (09:23 +0000)]
MFC r325436: RTF_PINNED for an interface
Allow a process to assign an IP address to local ppp interface
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).
Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).
delphij [Mon, 20 Nov 2017 06:48:10 +0000 (06:48 +0000)]
MFC r325383:
Avoid calling get_controller_count() until attaching, this would avoid
costly PCI config space operations that slows down systems without the
hardware.
Many thanks to HighPoint for continued support of FreeBSD!
bcr [Sun, 19 Nov 2017 11:24:24 +0000 (11:24 +0000)]
MFC r325441:
Extend the synopsis section of md(4) to look more like other manpages
of this kind. Describe how to compile the driver into the kernel
and how to load it as a module.
This is useful for people using the MINIMAL kernel configuration file.
hselasky [Fri, 17 Nov 2017 15:45:35 +0000 (15:45 +0000)]
MFC r325615:
Make sure the IPv6 scope ID gets zeroed when exchanging CMA messages in ibcore.
Else the IPv6 address matching might fail. This change adds support for both
embedded and non-embedded IPv6 scope IDs when passing a IPv6 link-local socket
address to RDMA. Prior to this change only global IPv6 addresses would work
with RDMA.
hselasky [Fri, 17 Nov 2017 15:37:36 +0000 (15:37 +0000)]
MFC r325614:
Multiple fixes for using IPv6 link-local addresses with RDMA in ibcore.
1) Fail to resolve RDMA address if rtalloc1() returns the loopback
device, lo0, as the gateway interface. Currently RDMA loopback is
not supported.
2) Use ip_dev_find() and ip6_dev_find() to lookup network interfaces
with matching IPv4 and IPv6 addresses, respectivly.
3) In addr_resolve() make sure the "ifa" pointer is always set, also when
the "ifp" is NULL. Else a NULL pointer access might happen trying to
read from the "ifa" pointer later on.
4) In rdma_addr_find_dmac_by_grh() make sure the "bound_dev_if" field
gets set properly instead of passing the scope ID through the IPv6
socket address structure. This is more in line with upstream OFED
in Linux.
5) In rdma_addr_find_smac_by_sgid() there is no need to pass the
scope ID for IPv6. Either it is stored in the "bound_dev_if" field
or ip6_dev_find() will find the correct network device regardless
of the scope ID.
emaste [Fri, 17 Nov 2017 00:38:00 +0000 (00:38 +0000)]
MFC r325683: vnic: apply BPF tap before passing packet to hardware
Previously we passed tx packets to hardware via nicvf_tx_mbuf_locked
and then to the BPF tap, with a possibly invalid mbuf which would result
in a panic.
avg [Thu, 16 Nov 2017 23:27:27 +0000 (23:27 +0000)]
MFC r325035: MFV r325013,r325034: 640 number_to_scaled_string is duplicated in several commands
FreeBSD note: of all libcmdutils functionality ZFS (and other illumos
contrib code) currently uses only nicenum() function (which is similar
to humanize_number but has some formatting differences). For this
reason I decided to not port the whole library. As a result, nicenum.c
from libcmdutils is compiled into libzfs and libzpool. This is a bit
ugly, but works. If one day we are forced to create libillumos, then
the file should be moved to that library.
jhb [Thu, 16 Nov 2017 18:22:03 +0000 (18:22 +0000)]
MFC 325039: Rework pass through changes in r305485 to be safer.
Specifically, devices that do not support PCI-e FLR and were not
gracefully shutdown by the guest OS could continue to issue DMA
requests after the VM was terminated. The changes in r305485 meant
that those DMA requests were completed against the host's memory which
could result in random memory corruption. Instead, leave ppt devices
that are not attached to a VM disabled in the IOMMU and only restore
the devices to the host domain if the ppt(4) driver is detached from a
device.
As an added safety belt, disable busmastering for a pass-through device
when before adding it to the host domain during ppt(4) detach.
gjb [Thu, 16 Nov 2017 15:59:29 +0000 (15:59 +0000)]
MFC r320252, r320686, r325769:
r320252:
In release/release.sh:
- Rename chroot_arm_armv6_build_release() to chroot_arm_build_release()
and make it hardware agnostic (such as armv6 -vs- armv7 -vs- arm64).
- Evaluate EMBEDDED_TARGET differently so release/tools/arm.subr can
be used for arm/armv6 and arm64/aarch64.
- Update comments and copyright.
In release/tools/arm.subr:
- In arm_create_disk(), change the default alignment from 63 to 512k,
fixing a boot issue on arm64 and EFI. [1]
- Update comments and copyright.
r320686:
Fix the ftp-stage target for RPI3 images by loosening the
constraints on the TARGET and TARGET_ARCH variables.
r325769:
Update the GUMSTIX image build to use arm/arm TARGET/TARGET_ARCH.
Update the TARGET/TARGET_ARCH matching in release/release.sh and
release/Makefile.mirrors for simplification.
Note: The RPI3.conf addition from r320252 is not included, as the
11-STABLE image fails to boot in my testing.
ed [Wed, 15 Nov 2017 15:56:08 +0000 (15:56 +0000)]
MFC r324727 and r325555:
Import the latest CloudABI definitions, version 0.16.
The most important change in this release is the removal of the
poll_fd() system call; CloudABI's equivalent of kevent(). Though I think
that kqueue is a lot saner than many of its alternatives, our
experience is that emulating this system call on other systems
accurately isn't easy. It has become a complex API, even though I'm not
convinced this complexity is needed. This is why we've decided to take a
different approach, by looking one layer up.
We're currently adding an event loop to CloudABI's C library that is API
compatible with libuv (except when incompatible with Capsicum).
Initially, this event loop will be built on top of plain inefficient
poll() calls. Only after this is finished, we'll work our way backwards
and design a new set of system calls to optimize it.
Interesting challenges will include integrating asynchronous I/O into
such a system call API. libuv currently doesn't aio(4) on Linux/BSD, due
to it being unreliable and having undesired semantics.
Upgrade to CloudABI v0.17.
Compared to the previous version, v0.16, there are a couple of minor
changes:
- CLOUDABI_AT_PID: Process identifiers for CloudABI processes.
Initially, BSD process identifiers weren't exposed inside the runtime,
due to them being pretty much useless inside of a cluster computing
environment. When jobs are scheduled across systems, the BSD process
number doesn't act as an identifier. Even on individual systems they
may recycle relatively quickly.
With this change, the kernel will now generate a UUIDv4 when executing
a process. These UUIDs can be obtained within the process using
program_getpid(). Right now, FreeBSD will not attempt to store this
value. This should of course happen at some point in time, so that it
may be printed by administration tools.
- Removal of some unused structure members for polling.
With the polling framework being simplified/redesigned, it turns out
some of the structure fields were not used by the C library. We can
remove these to keep things nice and tidy.
jhb [Wed, 15 Nov 2017 02:03:38 +0000 (02:03 +0000)]
MFC 323584: Add a NT_ARM_VFP ELF core note to hold VFP registers for each thread.
The core note matches the format and layout of NT_ARM_VFP on Linux.
Debuggers use the AT_HWCAP flags to determine how many VFP registers
are actually used and their format.
trasz [Tue, 14 Nov 2017 18:07:51 +0000 (18:07 +0000)]
MFC r324857:
Add OID for the vm.overcommit sysctl. This makes it possible to remove
one call to sysctl(2) from jemalloc startup code. (That also requires
changes to jemalloc, but I plan to push those to upstream first.)
trasz [Tue, 14 Nov 2017 17:57:48 +0000 (17:57 +0000)]
MFC r324367:
Fix kvm_getprocs(3) error reporting in ps(1).
Previously it just didn't work at all - kvm_getprocs(3) doesn't update
the &nentries when it returns NULL. The end result was that ps(1) showed
garbage data instead of reporting kinfo_proc size mismatch.
trasz [Tue, 14 Nov 2017 17:03:56 +0000 (17:03 +0000)]
MFC r320672:
Run "resizewin -z" from the default shell profile files. This makes
the terminal work properly out of the box when logging over a serial
line, which is quite important for the user experience on boards like
Raspberry Pi. It doesn't affect cases where the terminal size is
already non-zero, such as SSH or vt(4) sessions.
Note that this doesn't handle a scenario pointed out by rgrimes@:
when the terminal is resized after login, the terminal size won't
get updated even after logging out and back in.
jhb [Tue, 14 Nov 2017 16:03:07 +0000 (16:03 +0000)]
MFC 323580,323933,323934,324814,324817: Enable AT_HWCAP on arm.
I reused the SV_HWCAP stub to cover the sv_hwcap2 field as well.
323580:
Add AT_HWCAP flags for VFP settings for FreeBSD/arm.
These flags match the meaning and value of flags in Linux, though
Linux has many more flags.
323933:
Correct HWCAP_VFP3* values to match Linux.
323934:
Detect NEON and set HWCAP_NEON if present.
324814:
Add AT_HWCAP2 ELF auxiliary vector.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
- expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
same way as for AT_HWCAP.
324817:
Fullify implementation of AT_HWCAP and AT_HWCAP2 for ARMv6,7.
This makes elf_aux_info(3) useable for ARM ports.
truckman [Sun, 12 Nov 2017 01:26:43 +0000 (01:26 +0000)]
MFC r325008
Fix Dummynet AQM packet marking function ecn_mark() and fq_codel /
fq_pie schedulers packet classification functions in layer2 (bridge mode).
Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame. This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2. More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.
ae [Fri, 10 Nov 2017 11:19:33 +0000 (11:19 +0000)]
MFC r325355:
Use correct pointer in key_updateaddresses() when updating NAT-T config.
key_updateaddresses() is used to update SA addresses and NAT-T
configuration in SADB_UPDATE message. This is done using cloning SA
content from old SA into new one. But addresses and NAT-T configuration
are taking from SADB_UPDATE message. Use newsa pointer to set NAT-T
properties into cloned SA.
hselasky [Fri, 10 Nov 2017 08:42:37 +0000 (08:42 +0000)]
MFC r325362:
Allow CUSE(3) to free all memory mapped memory by using regular SWAP objects
instead of malloc(). The SWAP objects are automagically freed when there are no
more consumers. This greatly simplifies the mmap logic inside CUSE(3) in the
kernel. This change fixes an issue where mmapped memory can accumulate and never
get freed, if many different mmap sizes are needed over time. Further this
change fixes memory leaks when the CUSE(3) kernel module is unloaded.
While at it make sure the CUSE_ALLOC_PAGES_MAX limit is treated as an exclusive
limit. CUSE(3) memory maps must be less than CUSE_ALLOC_PAGES_MAX number of pages.