kp [Thu, 26 Oct 2017 20:53:56 +0000 (20:53 +0000)]
pf tests: destroy jails before destroying interfaces
When cleaning up we must destroy the jails before we destroy the interfaces.
Otherwise we might try to destroy interfaces that belong to a jail, which won't
work and fail to completely clean up.
asomers [Thu, 26 Oct 2017 19:45:15 +0000 (19:45 +0000)]
Fix aio_suspend in 32-bit emulation
An off-by-one error has been present since the system call was first present
in 185878. It additionally became a memory corruption bug after change
324941. The failure is actually revealed by our existing AIO tests.
However, apparently nobody's been running those in 32-bit emulation mode.
Reported by: Coverity, cem
CID: 1382114
MFC after: 18 days
X-MFC-With: 324941
Sponsored by: Spectra Logic Corp
jilles [Thu, 26 Oct 2017 18:32:04 +0000 (18:32 +0000)]
libnv: Fix strict-aliasing violation with cookie
In rS323851, some casts were adjusted in calls to nvlist_next() and
nvlist_get_pararr() in order to make scan-build happy. I think these changes
just confused scan-build into not reporting the strict-aliasing violation.
For example, nvlist_xdescriptors() is causing nvlist_next() to write to its
local variable nvp of type nvpair_t * using the lvalue *cookiep of type
void *, which is not allowed. Given the APIs of nvlist_next(),
nvlist_get_parent() and nvlist_get_pararr(), one possible fix is to create a
local void *cookie in nvlist_xdescriptors() and other places, and to convert
the value to nvpair_t * when necessary. This patch implements that fix.
asomers [Thu, 26 Oct 2017 15:28:18 +0000 (15:28 +0000)]
zfsd should be able to online an L2ARC that disappears and returns
Previously, this didn't work because L2ARC devices' labels don't contain
pool GUIDs. Modify zfsd so that the pool GUID won't be required:
lib/libdevdctl/guid.h
Change INVALID_GUID from a uint64_t constant to a function that
returns an invalid Guid object. Remove the void constructor.
Nothing uses it, and it violates RAII.
cddl/usr.sbin/zfsd/case_file.h
cddl/usr.sbin/zfsd/case_file.cc
Allow CaseFile::Find to match a CaseFile based on Vdev GUID alone.
In CaseFile::ReEvaluate, attempt to online devices even if the newly
arrived device has no pool GUID.
cddl/usr.sbin/zfsd/vdev_iterator.cc
Iterate through a pool's cache devices as well as its regular
devices.
des [Thu, 26 Oct 2017 13:23:13 +0000 (13:23 +0000)]
If the user-provided password exceeds the maximum password length, don't
bother passing it to crypt(). It won't succeed and may allow an attacker
to confirm that the user exists.
truckman [Thu, 26 Oct 2017 10:11:35 +0000 (10:11 +0000)]
Fix Dummynet AQM packet marking function ecn_mark() and fq_codel /
fq_pie schedulers packet classification functions in layer2 (bridge mode).
Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame. This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2. More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.
bdrewery [Wed, 25 Oct 2017 21:46:36 +0000 (21:46 +0000)]
Fix native-xtools build to use a proper sysroot.
This takes longer but should reliably produce working binaries.
The old version linked against system libraries and headers
which would be a problem if building a native-xtools against
a newer source than the host system. With a proper sysroot made
first this is not a problem.
This also allows:
- NXBDIRS to be built in parallel
- NXBDIRS to be installed to NXBDESTDIR in parallel
- SYSTEM_COMPILER logic to work again. Note that because this change
is adding a sysroot phase the compiler may be built up to twice now.
The first is the "cross-compiler" even though it is for the native
architecture, but it is still built to ensure the latest compiler
is used to generate the binaries, unless SYSTEM_COMPILER allows
/usr/bin/cc to be used. Then the target compiler is built
which is actually a cross-compiler since it runs on native host
but generates TARGET.TARGET_ARCH binaries.
Note this also changes the path used for the OBJDIR. It used to use
/usr/obj/target.target_arch/nxb/<srcdir> for objects and
/usr/obj/target.target_arch/nxb-bin for installed files, but now uses
/usr/obj/nxb/target.target_arch/<srcdir> for objects and
/usr/obj/nxb/target.target_arch/<srcdir>/nxb-bin for installed files.
- NXBDESTDIR can be specified for where to install or queried with
`make -f Makefile.inc1 TARGET=... TARGET_ARCH=... -V NXBDESTDIR`
This could potentially be improved to reuse an existing sysroot. The
problem is with building the SUBDIR_OVERRIDE list it needs to use a
different OBJDIR since it is building all statically. We don't want to
pollute the existing 'buildworld' OBJDIR and cause confusion on the next
build. Using a separate OBJDIR for the 'everything' phase mostly works
except for some things like linking in INTERNALLIBS that exist in the
other OBJDIR.
bdrewery [Wed, 25 Oct 2017 21:46:33 +0000 (21:46 +0000)]
native-xtools: Override proper NXBDESTDIR.
The new native-xtools uses 'make toolchain' so overriding DESTDIR
as a make argument may interfere with WORLDTMP handling.
The target also does a 'mkdir -p ${NXBDESTDIR}/usr', so we should
be modifying that rather than DESTDIR.
Note this causes the native-xtools binaries to be installed in
NANO_WORLDDIR/usr NANO_WORLDDIR/bin rather than NANO_WORLDDIR/nxb-bin/usr
and NANO_WORLDDIR/nxb-bin/bin. This was the case before this change
as well.
kp [Wed, 25 Oct 2017 19:21:48 +0000 (19:21 +0000)]
Evaluate packet size after the firewall had its chance in the ip6 fast path
Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.
This mirrors what the slow path does.
asomers [Wed, 25 Oct 2017 16:01:19 +0000 (16:01 +0000)]
Fix zpool_read_all_labels when vfs.aio.enable_unsafe=0
Previously, zpool_read_all_labels was trying to do 256KB reads, which are
greater than the default MAXPHYS and therefore must go through the slow,
unsafe AIO path. Shrink these reads to 112KB so they can use the safe, fast
AIO path instead.
imp [Wed, 25 Oct 2017 15:30:53 +0000 (15:30 +0000)]
Implement IPMI support for RB_POWRECYCLE
Some BMCs support power cycling the chassis via the chassis control
command 2 subcommand 2 (ipmitool called it 'chassis power cycle'). If
the BMC supports the chassis device, register a shutdown_final handler
that sends the power cycle command if request and waits up to 10s for
it to take effect. To minimize stack strain, we preallocate a ipmi
request in the softc. At the moment, we're verbose about what we're
doing.
imp [Wed, 25 Oct 2017 15:30:35 +0000 (15:30 +0000)]
Add power cycle support to reboot/halt as -c.
When -c is specified, the system will be power cycled if the
underlying hardware supports it. Otherwise the system will be halted
or rebooted depending on which command was used.
imp [Wed, 25 Oct 2017 15:30:20 +0000 (15:30 +0000)]
Define RB_POWERCYCLE
RB_POWERCYCLE instructs the platform to power off and then power back
on a short time later, if that's possible. Otherwise, degrade to the
RB_POWEROFF behavior.
imp [Wed, 25 Oct 2017 15:27:53 +0000 (15:27 +0000)]
Use BOOTDIR consistently. We need to include bsd.init.mk early to make
this happen. This will cause src.opts.mk to be included, so remove
that. This needs to propigate through the sys/boot tree.
imp [Wed, 25 Oct 2017 15:26:03 +0000 (15:26 +0000)]
Report only the valid slots in the firmware log page.
Printing the entire log page is causing confusion over available
slots. Report only those slots that are valid. In the case where the
firmware download isn't supported, assume that only the first slot is
valid (I have no hardware to test this assumption though)
markj [Wed, 25 Oct 2017 00:51:00 +0000 (00:51 +0000)]
Add support for compressed kernel dumps.
When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.
Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.
savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.
A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.
alc [Tue, 24 Oct 2017 17:14:53 +0000 (17:14 +0000)]
Micro-optimize the handling of fictitious pages in vm_page_free_prep().
A fictitious page is always wired, so there is no point in trying to
remove one from the page queues.
Completely remove one inaccurate comment from vm_page_free_prep() and
correct another.
trasz [Tue, 24 Oct 2017 11:16:38 +0000 (11:16 +0000)]
Reword the conditional; it was ugly, and adding another parameter,
which I'm going to do in a subsequent commit, would make it even uglier.
No functional changes.
np [Tue, 24 Oct 2017 05:41:48 +0000 (05:41 +0000)]
cxgbe(4): Read the MPS buffer group map from the firmware as it could be
different from hardware defaults. The congestion channel map, which is
still fixed, needs to be tracked separately now. Change the congestion
setting for TOE rx queues to match the drivers on other OSes while here.
imp [Tue, 24 Oct 2017 02:25:42 +0000 (02:25 +0000)]
Treat a 'current' value of 0 as unlimited as a failsfe.
When limiting I/O, a value of 0 makes no sense as a limit. No progress
can be made. Trade the possibility that someone might be doing
something clever to achieve ultra-low I/O limits vs the damage of not
ever making progress on an I/O in favor of making progress. Now the
machine won't be useless if this accidentally gets requested.
asomers [Mon, 23 Oct 2017 23:12:01 +0000 (23:12 +0000)]
Remove artificial restriction on lio_listio's operation count
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:
* To size a UMA zone, which globally limits the number of concurrent
aio_suspend calls.
* To artifically limit the number of operations in a single lio_listio call.
There doesn't seem to be any memory allocation associated with this limit.
This change does two things:
* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.
* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
calls will now be limited by the more generous max_aio_queue_per_proc. The
old p1003_1b.aio_listio_max is now an alias for
vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
_SC_AIO_LISTIO_MAX.
asomers [Mon, 23 Oct 2017 23:05:29 +0000 (23:05 +0000)]
Fix the error message when creating a zpool on a too-small device
Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by path.
It's redundant with the check in vdev_open, and failing to attach here
results in the wrong error message being printed. However, still check for
it in some other situations:
* When opening by guids, so we don't get bogged down reading from slow
devices like floppy drives.
* In vdev_geom_read_pool_label for the same reason, because we iterate over
all providers.
* If the caller requests that we verify the guid, because then we'll have to
read from the device before vdev_open verifies the size.
dim [Mon, 23 Oct 2017 21:31:04 +0000 (21:31 +0000)]
After jemalloc was updated to version 5.0.0 in r319971, i386 executables
linked with AddressSanitizer (even those linked on earlier versions of
FreeBSD, or with external versions of clang) started failing with errors
similar to:
This is because AddressSanitizer expects all the TLS data in the program
to be aligned to at least 8 bytes.
Before the jemalloc 5.0.0 update, all the TLS data in the i386 version
of libc.so added up to 80 bytes (a multiple of 8), but 5.0.0 made this
grow to 2404 bytes (not a multiple of 8). This is due to added caching
data in jemalloc's internal struct tsd_s.
To fix AddressSanitizer, ensure this struct is aligned to at least 16
bytes, which can be done unconditionally for all architectures. (An
earlier version of the fix aligned the struct to 8 bytes, but only for
ILP32 architectures. This was deemed unnecessarily complicated.)
shurd [Mon, 23 Oct 2017 20:50:08 +0000 (20:50 +0000)]
Some cache related optimizations
1. prefetch 128 bytes of mbufs.
2. Re-order filling the pkt_info so cache stalls happen at the end
3. Define empty prefetch2cachelines() macro when the function isn't present.
Provides small performance improvments on some hardware
kib [Mon, 23 Oct 2017 16:14:55 +0000 (16:14 +0000)]
Expand explanation of atomicity.
Mention per-location total order, out of thin air, and torn writes
guarantees. Mention C11 standard' memory model and one most important
FreeBSD additional requirement, that is aligned ordinary loads and
stores are atomic on processors.
The text is introductional and informal. Reference the C11 and
C++1{1,4,7} standards for authorative description.
In collaboration with: alc
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
mjoras [Mon, 23 Oct 2017 15:43:38 +0000 (15:43 +0000)]
Move clear_unrhdr to tmpfs_free_tmp.
Clearing the unr in tmpfs_unmount is not correct. In the case of
multiple references to the tmpfs mount (e.g. when there are lookup
threads using it) it will not be the one to finish tmpfs_free_tmp. In
those cases tmpfs_free_node_locked will be the final one to execute
tmpfs_free_tmp, and until then the unr must be valid.
imp [Sun, 22 Oct 2017 22:52:23 +0000 (22:52 +0000)]
Create a shell script to build sys/boot on all the architectures.
One could run this from any directory, but it's designed to do
regression testing on sys/boot (it only tests on a subset of
architectures since all of them would take a lot longer and not help).
This will also ensure that future commits to sys/boot compile
everywhere.
jilles [Sun, 22 Oct 2017 20:01:07 +0000 (20:01 +0000)]
libc: Do not refer to _DefaultRuneLocale in ctype inlines
Referring to _DefaultRuneLocale causes this >4KB structure to be copied to
all executables that use <ctype.h> inlines (except PIE executables).
This only affects the case where thread local storage is available.
_CurrentRuneLocale cannot be NULL, so the check can be removed entirely.
_DefaultRuneLocale needs to remain available for now since libc++ uses it.
The __isctype inline in include/_ctype.h also refers to _DefaultRuneLocale
and remains available because it may still be used by third party software.
markj [Sun, 22 Oct 2017 19:17:25 +0000 (19:17 +0000)]
Address some miscellaneous issues in the CTF man page.
- Fix a number of typos.
- Replace some illumos-specific references.
- Note that a type definition of kind CTF_K_FUNCTION may be followed by
a null type identifier in order to provide 4-byte alignment for the
next type definition.
https://www.illumos.org/issues/8300
Prior to integrating the mdocml update to 1.14.1, fix issues found by
new version, especially the "new sentence, new line" style rule.
FreeBSD note: this revision merges only the changes to the CTF manual
page. The changes to the ZFS pages cannot be applied directly.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
trasz [Sun, 22 Oct 2017 10:35:29 +0000 (10:35 +0000)]
Add OID for the vm.overcommit sysctl. This makes it possible to remove
one call to sysctl(2) from jemalloc startup code. (That also requires
changes to jemalloc, but I plan to push those to upstream first.)
kib [Sun, 22 Oct 2017 08:11:45 +0000 (08:11 +0000)]
Remove the support for mknod(S_IFMT), which created dummy vnodes with
VBAD type.
FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not
check for the type and their behaviour is really undefined. The
comment claims that this support was done for 'badsect' to flag bad
sectors, we do not have such facility in kernel anyway.
Reported by: Dmitry Vyukov <dvyukov@google.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week