brooks [Thu, 15 Feb 2018 17:26:30 +0000 (17:26 +0000)]
Fix getdirentries(2) under 32-bit compat.
The latest version of getdirentries (syscall 554) takes a pointer
an an off_t as the last argument. The old version which copies out
an int32_t was being used instead. Use the standard sys_getdirentries()
implementation instead.
cognet [Thu, 15 Feb 2018 15:46:14 +0000 (15:46 +0000)]
Rename the ACPI variant of the gicv2m driver from "gicv2m" to "gicv2m_acpi".
The FDT variant is called "gicv2m" too, and as both would try to register
on gic, only one of them would succeed, while we want them both in a
GENERIC kernel.
kevans [Thu, 15 Feb 2018 15:01:07 +0000 (15:01 +0000)]
stand: Fix ubldr after r329190
metadata load files were consolidated in r329190, and these relocation fixup
bits were inadvertently dropped in the process. Re-add them to fix boot with
ubldr.
It panicked here:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/
zio.c#430
pio->io_lock is DEAD, thus a panic. Further analysis shows the "pio"
(parent zio of "cio") has already been destroyed.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
eadler [Thu, 15 Feb 2018 03:22:53 +0000 (03:22 +0000)]
devd: don't pass &fds in useless parameters to select(2)
select(2) should be declared as restrict. In addition the only fd in
the fdset is open O_RDONLY, and it's not a socket that can provide OOB
notifications,
rpokala [Thu, 15 Feb 2018 03:22:04 +0000 (03:22 +0000)]
mxge(4) should pass unhandled ioctls to ether_ioctl()
Panasas discovered that ioctl(SIOCGLAGGPORT) returns ENOTTY for mxge(4) when
the NIC is not a member of a lagg. This came as a surprise, because the
SIOCGLAGGPORT handler in if_lagg.c only returns ENOENT (if run against the
laggX interface, rather than a physical port) or EINVAL (if run against a
non-member physical port). This behavior was not seen with other drivers,
such as bge(4), igb(4), and cxl(4). When I compared their respective ioctl
handlers, I found that they all called ether_ioctl() for the default (i.e.
unhandled) case; by contrast, mxge(4) only calls ether_ioctl() for two
specific cases, and returns ENOTTY for the default case.
Remove the two cases which explicitly call ether_ioctl(), and let the
default case call it instead. This matches what the vast majority of the NIC
drivers do.
asomers [Wed, 14 Feb 2018 23:52:39 +0000 (23:52 +0000)]
zfsd: Allow zfsd to work on any type of GEOM provider
cddl/usr.sbin/zfsd/zfsd_event.cc
Remove the check for da and ada devices. This way zfsd can work on md,
geli, glabel, gstripe, etc devices. geli in particular is useful
combined with ZFS. gnop is also useful for simulating drive pulls in
the ZFSD test suite.
Also, eliminate the DevfsEvent class entirely. Move its
responsibilities into GeomEvent. We can get everything we need to know
just from listening to GEOM events.
lib/libdevdctl/event.cc
Fix GeomEvent::DevName for CREATE events. Oddly, the relevant field is
named "cdev" for CREATE events but "devname" for disk events.
eugen [Wed, 14 Feb 2018 21:17:44 +0000 (21:17 +0000)]
ng_pppoe(8): add support for user-supplied Host-Uniq tag.
A few ISP filter PADI requests based on such tag,
to force the use of their own routers.
The custom Host-Uniq tag is passed in the NGM_PPPOE_CONNECT
control message, so it can be used with FreeBSD ppp(8)
and mpd without any other change.
Add support to send and receive PADM messages,
HURL and MOTM, often used by service providers to provide
ACS information and other configuration settings
to the user CPE.
asomers [Wed, 14 Feb 2018 20:26:09 +0000 (20:26 +0000)]
gpart: append partition name to the underlying provider's physical path
If the underlying provider's physical path is null, then the gpart device's
physical path will be, too. Otherwise, it will append the partition name,
such as "/p1" or "/s1/a". This will make gpart work better with zfsd(8).
asomers [Wed, 14 Feb 2018 20:15:32 +0000 (20:15 +0000)]
geli: append "/eli" to the underlying provider's physical path
If the underlying provider's physical path is null, then the geli device's
physical path will be, too. Otherwise, it will append "/eli". This will make
geli work better with zfsd(8).
bdrewery [Wed, 14 Feb 2018 18:43:50 +0000 (18:43 +0000)]
nanosleep(2): Fix bogus incrementing of rmtp by tc_tick_sbt on [EINTR].
sbt is the time in the future that the tsleep_sbt() is expected to be completed
at. sbtt is the current time. Depending on the precision with sysctl
kern.timecounter.alloweddeviation the start time may be incremented by
tc_tick_sbt. The same increment is needed for the current time of sbtt before
calculating the difference. The impact of missing this increment is that rmtp
may increase by one tc_tick_sbt on every early [EINTR] return. If the same
struct is passed in for rqtp as rmtp this can result in rqtp effectively
incrementing by tc_tick_sbt and sleeping longer than originally intended.
This problem was introduced in r247797.
Reviewed by: kib, markj, vangyzen (all on an older version of the test)
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D14362
imp [Wed, 14 Feb 2018 18:21:54 +0000 (18:21 +0000)]
Simple script to image a small test area from a built tree. Build with
'cd stand; make MK_FORTH=no MK_LOADER_LUA=yes' then run this script.
You can then test with lua-test.sh with the same parameter.
manu [Wed, 14 Feb 2018 18:05:37 +0000 (18:05 +0000)]
efi: Only scan the BLKIO MEDIA once
Scan only the BLOCK IO MEDIA once instead of each time for each type of
device (fd, cd and hdd).
Leave the mechanism to free and reprobe all devices if one day we want
to implement a "dev rescan" thing.
imp [Wed, 14 Feb 2018 17:51:51 +0000 (17:51 +0000)]
A quick test script that we can run to use userboot's test mode to
excersize the load loader. Assumes that we already have a suitable
root area that you pass in with the first arg.
asomers [Wed, 14 Feb 2018 15:49:31 +0000 (15:49 +0000)]
Implement .vop_pathconf and .vop_getacl for the .zfs ctldir
zfsctl_common_pathconf will report all the same variables that regular ZFS
volumes report. zfsctl_common_getacl will report an ACL equivalent to 555,
except that you can't read xattrs or edit attributes.
Fixes a bug where "ls .zfs" will occasionally print something like:
ls: .zfs/.: Operation not supported
kevans [Wed, 14 Feb 2018 15:40:13 +0000 (15:40 +0000)]
libsa: Fix IP recv timeout
readip() doesn't, at the moment, properly indicate to callers that it has
timed out. One can tell that it's timed out if errno == EAGAIN when it
returns, but this is not ideal. Restructure it a little bit to explicitly
set errno to ETIMEDOUT if we've exhausted tleft.
I found two places that care about where it timed out or not: sendrecv in
net.c and sendrecv_tftp. Both are structured to pass smaller timeout values
to readip while tracking a larger timeout. Neither of them were able to do
this properly with readip not indicating ETIMEDOUT, so fix it.
While here, straighten out the time (t/t1) usage in sendrecv_tftp.
This would have manifested itself in periodic failures to NFS/TFTP boot for
no apparent reason because MINTMO/MAXTMO were not actually being respected
properly. Problems were not reported with NFS, only TFTP.
eadler [Wed, 14 Feb 2018 07:59:30 +0000 (07:59 +0000)]
msun: signed overflow in atan2
As a component of atan2(y, x), the case of x == 1.0 is farmed out to
atan(y). The current implementation of this comparison is vulnerable
to signed integer underflow (that is, undefined behavior), and it's
performed in a somewhat more complicated way than it need be. Change
it to not be quite so cute, rather directly comparing the high/low
bits of x to the specific IEEE-754 bit pattern that encodes 1.0.
Note that while there are three different e_atan* files in the
relevant directory, only this one needs fixing. e_atan2f.c already
compares against the full bit pattern encoding 1.0f, while
e_atan2l.cuses bitwise-ands/ors/nots and so doesn't require a change.
jhibbits [Wed, 14 Feb 2018 02:51:28 +0000 (02:51 +0000)]
PPC64: Get the timestap from the proper OF field
Summary:
After revision rS328534('PPC64: use hwref instead of cpuid'), FreeBSD on
powerpc64 virtual machine panics since it is unable to read the
timebase, showing the following error:
get-property for timebase-frequency on zero phandle
panic: Unable to determine timebase frequency!
With the change above, cpuref->cr_hwref does not contain the phandle
anymore, thus, it never reads the proper CPU entry in OF.
kib [Wed, 14 Feb 2018 00:31:45 +0000 (00:31 +0000)]
Ensure memory consistency on COW.
From the submitter description:
The process is forked transitioning a map entry to COW
Thread A writes to a page on the map entry, faults, updates the pmap to
writable at a new phys addr, and starts TLB invalidations...
Thread B acquires a lock, writes to a location on the new phys addr, and
releases the lock
Thread C acquires the lock, reads from the location on the old phys addr...
Thread A ...continues the TLB invalidations which are completed
Thread C ...reads from the location on the new phys addr, and releases
the lock
In this example Thread B and C [lock, use and unlock] properly and
neither own the lock at the same time. Thread A was writing somewhere
else on the page and so never had/needed the lock. Thread C sees a
location that is only ever read|modified under a lock change beneath
it while it is the lock owner.
To fix this, perform the two-stage update of the copied PTE. First,
the PTE is updated with the address of the new physical page with
copied content, but in read-only mode. The pmap locking and the page
busy state during PTE update and TLB invalidation IPIs ensure that any
writer to the page cannot upgrade the PTE to the writable state until
all CPUs updated their TLB to not cache old mapping. Then, after the
busy state of the page is lifted, the faults for write can proceed and
do not violate the consistency of the reads.
The change is done in vm_fault because most architectures do need IPIs
to invalidate remote TLBs. More, I think that hardware guarantees of
atomicity of the remote TLB invalidation are not enough to prevent the
inconsistent reads of non-atomic reads, like multi-word accesses
protected by a lock. So instead of modifying each pmap invalidation
code, I did it there.
Discovered and analyzed by: Elliott.Rabe@dell.com
Reviewed by: markj
PR: 225584 (appeared to have the same cause)
Tested by: Elliott.Rabe@dell.com, emaste, Mike Tancsa <mike@sentex.net>, truckman
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14347
kib [Wed, 14 Feb 2018 00:25:18 +0000 (00:25 +0000)]
Do not call pmap_enter() with invalid protection mode.
If the map entry elookup was performed due to the mapping changes, we
need to ensure that there is still some access permission bit
requested which is compatible with the current vm_map_entry mode. If
not, restart the handler from scratch instead of trying to save the
current progress.
Also adjust fault_type to not include cleared permission bits.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14347
landonf [Tue, 13 Feb 2018 20:07:40 +0000 (20:07 +0000)]
bwn(4): Conditionalize "RX decryption attempted" message on a new
BWN_DEBUG_HWCRYPTO debug flag.
The MAC will attempt decryption (and set BWN_RX_MAC_DEC) even if a key has
not been supplied to the hardware; this is expected behavior, and there's
no need to spam users' console with this debugging printf.
markj [Tue, 13 Feb 2018 19:28:02 +0000 (19:28 +0000)]
Add support for zstd-compressed user and kernel core dumps.
This works similarly to the existing gzip compression support, but
zstd is typically faster and gives better compression ratios.
Support for this functionality must be configured by adding ZSTDIO to
one's kernel configuration file. dumpon(8)'s new -Z option is used to
configure zstd compression for kernel dumps. savecore(8) now recognizes
and saves zstd-compressed kernel dumps with a .zst extension.
jhibbits [Tue, 13 Feb 2018 17:40:09 +0000 (17:40 +0000)]
Narrow a race, and fix a leak, in g_part_wither
A race in g_part_wither() can lead to I/O being performed with a freed GEOM
when the device disappears. Close the race as best as we can for now,
following the code patterns from g_part_ctl_destroy() and g_part_ctl_undo().
This also fixes a leak, as g_wither_geom() does not wither providers, it
only orphans them, so the partition entries would never get destroyed in
g_wither_washer().
Note, this is not a complete fix, it can still race with g_part_start(), the
race has merely been narrowed.
hselasky [Tue, 13 Feb 2018 17:04:34 +0000 (17:04 +0000)]
Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.
kib [Tue, 13 Feb 2018 15:44:35 +0000 (15:44 +0000)]
linuxkpi: Do not leak pages on put.
When the owner of the wire reference releases the last reference, it
might be that the page was already attempted to be freed (but free
cannot be performed at that time due to wire). Check that the page
was removed from the object as the indicator of the free attempt and
finish the free operation if so.
kib [Tue, 13 Feb 2018 15:36:28 +0000 (15:36 +0000)]
Do not leak rv->psind in some specific situations.
Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver). Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting. Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld. In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.
Clean psind on vm_reserv_break() to avoid the situation.
kib [Tue, 13 Feb 2018 15:30:31 +0000 (15:30 +0000)]
Fix build with gas.
Do not use C constant suffixes. Bit values are small enough to not
require typing, despite they are used for 64bit MSR writes. The added
cast in hw_ibrs_recalculate() is redundand but I prefer to add it for
clarity.
Reported by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
jhibbits [Tue, 13 Feb 2018 03:44:50 +0000 (03:44 +0000)]
Unify metadata load files for arm, mips, powerpc, sparc64
Summary:
All metadata.c files are very similar, with only trivial changes. Unify them
into a single common file, with minor special-casing where needed.
jeff [Mon, 12 Feb 2018 22:53:00 +0000 (22:53 +0000)]
Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.
vangyzen [Mon, 12 Feb 2018 19:49:20 +0000 (19:49 +0000)]
Update the MTU in affected routes when IPv6 RA changes the MTU
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU. Update the routes so they do not provide
stale MTUs.
This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.
landonf [Mon, 12 Feb 2018 19:36:26 +0000 (19:36 +0000)]
siba(4): Ignore disabled per-core address match entries.
Previously, the address regions described by disabled admatch entries would
be treated as being mapped to the given core; while incorrect, this was
essentially harmless given that the entries describe unused address space
on the few affected devices.
We now perform parsing of per-core admatch registers and interrupt flags in
siba_erom, correctly skip any disabled admatch entries, and use the
siba_erom API in siba_add_children() to perform enumeration of attached
cores.
ian [Mon, 12 Feb 2018 17:41:11 +0000 (17:41 +0000)]
Add a new sysctl, debug.clock_do_io, to allow manully triggering a one-shot
read or write of all registered realtime clocks. In the read case, the
values read are simply discarded. For writes, there's no alternative but
to actually write the current system time to the device.
jtl [Mon, 12 Feb 2018 17:27:50 +0000 (17:27 +0000)]
Mark the pages used for the initial page-table entries as wired. This
makes them consistent with the way other page-table pages are allocated.
It also provides the rest of the VM system a good clue that these pages
are used.
ian [Mon, 12 Feb 2018 16:25:56 +0000 (16:25 +0000)]
Replace the existing print_ct() private debugging function with a set of
three public functions to format and print the three major data structures
used by realtime clock drivers (clocktime, bcd_clocktime, and timespec).
imp [Mon, 12 Feb 2018 15:31:53 +0000 (15:31 +0000)]
Add Lua as a scripting langauge to /boot/loader
liblua glues the lua run time into the boot loader. It implements all
the runtime routines that lua expects. In addition, it has a few
standard 'C' headers that nueter various aspects of the LUA build that
are too specific to lua to be in libsa. Many refinements from the
original code to improve implementation and the number of included lua
libraries. Use int64_t for lua_Number. Have "/boot/lua" be the default
module path. Numerous cleanups from the original GSoC project,
including hacking libsa to allow lua to be built with only one change
outside luaconf.h.
Add the final bit of lua glue to bring in liblua and plug into the
multiple interpreter framework, previously committed.
Add LOADER_LUA option, currently off by default.
Presently, this is an experimental option. One must opt-in to using
this by defining WITH_LOADER_LUA and WITHOUT_FORTH. It's been
lightly tested, so keep a backup copy of your old loader handy.
The menu code, coming in the next commit, hasn't been exhaustively
tested. A LUA boot loader is 60k larger than a FORTH one, which is
80k larger than a no-interpreter one. Subtle changes in size
may tip things past some subtle limit (the binary is ~430k now
when built with LUA). A future version may offer coexistance.
Bump FreeBSD version to 1200058 to mark the milestone.
Pedro Souza's 2014 Summer of Code project. Rui Paulo, Pedro Arthur,
Zakary Nafziger and Wojciech A. Koszek also contributed. Warner Losh
reworked it extensively into its current form.
Obtained from: https://wiki.freebsd.org/SummerOfCode2014/LuaLoader
Sponsored by: Google Summer of Code
Relnotes: Yes
MFC After: 1 month
Differential Review: https://reviews.freebsd.org/D14295
imp [Mon, 12 Feb 2018 14:48:05 +0000 (14:48 +0000)]
Use standard pattern for stdargs.h
We don't support older compilers. Most of the code in these files is
for pre-3.0 gcc, which is at least 15 years obsolete. Move to using
phk's sys/_stdargs.h for all these platforms.
tychon [Mon, 12 Feb 2018 14:45:27 +0000 (14:45 +0000)]
Provide further mitigation against CVE-2017-5715 by flushing the
return stack buffer (RSB) upon returning from the guest.
This was inspired by this linux commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=117cc7a908c83697b0b737d15ae1eb5943afe35b
imp [Mon, 12 Feb 2018 06:51:20 +0000 (06:51 +0000)]
Turn devmatch on by default.
Turn devmatch on by default. However, use 'start' instead of
'onestart' in the devmatch.conf file so the setting of
'devmatch_enable' is honored. Give an example of what to put in
devd.conf if you want to disable just the run-time part of devmatch.
imp [Mon, 12 Feb 2018 04:45:26 +0000 (04:45 +0000)]
Switch to using devmatch to autoload drivers. Remove usb.conf
as obsolete because devmatch gets its information from the same
place as the genration scripts.