mckusick [Fri, 3 May 2019 21:54:14 +0000 (21:54 +0000)]
This update eliminates a kernel stack disclosure bug in UFS/FFS
directory entries that is caused by uninitialized directory entry
padding written to the disk. It can be viewed by any user with read
access to that directory. Up to 3 bytes of kernel stack are disclosed
per file entry, depending on the the amount of padding the kernel
needs to pad out the entry to a 32 bit boundry. The offset in the
kernel stack that is disclosed is a function of the filename size.
Furthermore, if the user can create files in a directory, this 3
byte window can be expanded 3 bytes at a time to a 254 byte window
with 75% of the data in that window exposed. The additional exposure
is done by removing the entry, creating a new entry with a 4-byte
longer name, extracting 3 more bytes by reading the directory, and
repeating until a 252 byte name is created.
This exploit works in part because the area of the kernel stack
that is being disclosed is in an area that typically doesn't change
that often (perhaps a few times a second on a lightly loaded system),
and these file creates and unlinks themselves don't overwrite the
area of kernel stack being disclosed.
It appears that this bug originated with the creation of the Fast
File System in 4.1b-BSD (Circa 1982, more than 36 years ago!), and
is likely present in every Unix or Unix-like system that uses
UFS/FFS. Amazingly, nobody noticed until now.
This update also adds the -z flag to fsck_ffs to have it scrub
the leaked information in the name padding of existing directories.
It only needs to be run once on each UFS/FFS filesystem after a
patched kernel is installed and running.
Submitted by: David G. Lawrence <dg@dglawrence.com>
Reviewed by: kib
MFC after: 1 week
imp [Fri, 3 May 2019 21:06:34 +0000 (21:06 +0000)]
Remove stray '*'
We're storing an EFI_HANDLE, not an pointer to a handle. Since
EFI_HANDLE is a void * anyway, this has little practical effect since
the conversion to / from void * and void ** is silent.
rwatson [Fri, 3 May 2019 20:38:43 +0000 (20:38 +0000)]
When MAC is enabled and a policy module is loaded, don't unconditionally
lock mac_ifnet_mtx, which protects labels on struct ifnet, unless at least
one policy is actively using labels on ifnets. This avoids a global mutex
acquire in certain fast paths -- most noticeably ifnet transmit. This was
previously invisible by default, as no MAC policies were loaded by default,
but recently became visible due to mac_ntpd being enabled by default.
gallatin@ reports a reduction in PPS overhead from 300% to 2.2% with this
change. We will want to explore further MAC Framework optimisation to
reduce overhead further, but this brings things more back into the world
of the sane.
gallatin [Fri, 3 May 2019 14:43:21 +0000 (14:43 +0000)]
Select lacp egress ports based on NUMA domain
This change creates an array of port maps indexed by numa domain
for lacp port selection. If we have lacp interfaces in more than
one domain, then we select the egress port by indexing into the
numa port maps and picking a port on the appropriate numa domain.
This is behavior is controlled by the new ifconfig use_numa flag
and net.link.lagg.use_numa sysctl/tunable (both modeled after the
existing use_flowid), which default to enabled.
bde [Fri, 3 May 2019 13:06:46 +0000 (13:06 +0000)]
Fix copying planar bitmaps when the horizontal start and end are both not
multiples of 8. Then the misaligned pixels at the end were not copied.
Clean up variable misuse related to this bug. The width in bytes was
first calculated correctly and used to do complicated reblocking
correctly, but it was stored in an unrelated scratch variable and later
recalculated with an off-by-1-error, so the last byte (times 4 planes)
in the intermediate copy was not copied.
This doubly-misaligned case is especially slow. Misalignment complicates
the reblocking, and each misaligment requires a read before write, and this
read is still not done from the shadow buffer.
dchagin [Fri, 3 May 2019 08:42:49 +0000 (08:42 +0000)]
In order to reduce duplication between MD parts of the Linuxulator
move bits that are MI out into the headers in compat/linux.
For that remove bogus _packed attribute from struct l_sockaddr
and use MI types for struct members.
And continue to move into the linux_common module a code that is
intended for both Linuxulator modules (both instruction set - 32 & 64 bit)
or for external modules like linsysfs or linprocfs.
To avoid header pollution introduce new sys/compat/linux_common.h header.
jhb [Thu, 2 May 2019 22:46:37 +0000 (22:46 +0000)]
Increase the VirtIO segment count to support modern Windows guests.
The Windows virtio driver ignores the advertized seg_max field and
assumes the host can accept up to 67 segments in indirect descriptors,
triggering an assert in the bhyve process.
This brings back r282922 but with a couple of changes:
- It raises the block interface segment limit to 128 instead of 67.
- Linux's virtio driver assumes that the segment limit is no
larger than the ring size. To avoid breaking Linux guests,
raise the VirtIO ring size to 128, and cap the VirtIO segment
limit at ring size - 2 (effectively 126).
kevans [Thu, 2 May 2019 17:44:46 +0000 (17:44 +0000)]
libbe(3): Properly mount BEs with mountpoint=none
Instead of pretending to successfully mount them while not actually
mounting anything, we'll now actually mount them *and* claim we mounted them
successfully.
kevans [Thu, 2 May 2019 17:01:13 +0000 (17:01 +0000)]
stand: correct mis-merge from r346879
Small mis-merge from multiple WIP resulted in block io media handles getting
double-initialized. This resulted in some installations oddly landing at the
mountroot prompt.
kevans [Thu, 2 May 2019 16:56:03 +0000 (16:56 +0000)]
fdt: Fix installation of aarch64 dtb
r345519 rewrote parts of how we build .dtb, but mistakenly dropped the
vendor dir for aarch64. Simply drop the :T for building ${DTB} in the
aarch64 case- it'll get applied at install-time as-needed, with :H:T for
determining the vendor dir.
Reported by: manu
Tested by: manu
Reviewed by: manu
MFC after: 3 days
kib [Thu, 2 May 2019 15:03:16 +0000 (15:03 +0000)]
Cleanup for rtld_malloc.c.
- Remove dead and most likely rotten MALLOC_DEBUG, MSTAT, and RCHECK options.
- Remove unused headers.
- Remove one case of undefined behavior where left shift could overflow.
It is impossible on practice for rtld and libthr consumer.
PR: 237577
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
manu [Thu, 2 May 2019 12:56:13 +0000 (12:56 +0000)]
arm64: Add support for NanoPI NEO2
Add overlay files and activate devicetree file for NanoPi NEO2 featuring
Allwinner H5 ARM64 core.
To enable sound, dma and codec drivers are enabled for build.
jhibbits [Thu, 2 May 2019 03:39:03 +0000 (03:39 +0000)]
powerpc: Drop OPAL_HANDLE_HMI2 for now, to avoid panicking
It's possible for a Hypervisor Maintenance Interrupt (HMI) to occur while in
the pmap code, holding locks. This can cause WITNESS to panic due to lock
errors in calling pmap_kextract(). Since we don't yet handle the flags
returned by OPAL_HANDLE_HMI2, just stop using it, so that we don't call into
pmap_kextract().
andrew [Wed, 1 May 2019 17:12:49 +0000 (17:12 +0000)]
Restore x18 in efi_arch_leave.
Some UEFI implementations trash this register and, as we use it as a
platform register, the kernel doesn't save it before calling into the UEFI
runtime services. As we have a copy in tpidr_el1 restore from there when
exiting the EFI environment.
kib [Wed, 1 May 2019 13:15:06 +0000 (13:15 +0000)]
Fix another race between vm_map_protect() and vm_map_wire().
vm_map_wire() increments entry->wire_count, after that it drops the
map lock both for faulting in the entry' pages, and for marking next
entry in the requested region as IN_TRANSITION. Only after all entries
are faulted in, MAP_ENTRY_USER_WIRE flag is set.
This makes it possible for vm_map_protect() to run while other entry'
MAP_ENTRY_IN_TRANSITION flag is handled, and vm_map_busy() lock does
not prevent it. In particular, if the call to vm_map_protect() adds
VM_PROT_WRITE to CoW entry, it would fail to call
vm_fault_copy_entry(). There are at least two consequences of the
race: the top object in the shadow chain is not populated with
writeable pages, and second, the entry eventually get contradictory
flags MAP_ENTRY_NEEDS_COPY | MAP_ENTRY_USER_WIRED with VM_PROT_WRITE
set.
Handle it by waiting for all MAP_ENTRY_IN_TRANSITION flags to go away
in vm_map_protect(), which does not drop map lock afterwards. Note
that vm_map_busy_wait() is left as is.
Reported and tested by: pho (previous version)
Reviewed by: Doug Moore <dougm@rice.edu>, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20091
imp [Wed, 1 May 2019 05:42:13 +0000 (05:42 +0000)]
Use D_PARTISGPT rather than bare 255
These three cases dovetail with other places in the code where we use
or set D_PARTISGPT when we mean that the partitioning scheme is
GPT. Use this #define to make the code easier to undertand.
It seems to be incompatible with the OVMF.fd (of unknown provenance)
in use by the Cirrus-CI config. We will soon have a known OVMF build
via a port/package (see review D19869) and we can switch back to q35
once packages are available.
Port the logic used by getifaddrs(3) to handle the case where
NET_RT_IFLIST returns ENOMEM, which can occur if the list size changes
between the buffer allocation and sysctl read.
Reduce the default image size for virtual machine disk images from
30GB to 3GB. The raw images can be resized using truncate(1), and
other formats can be resized with tools included with other tools
included with other hypervisors.
Enable the growfs(8) rc(8) at firstboot if the disk was resized
prior to booting the virtual machine for the first time.
Discussed with: several
PR: 232313 (requested in other context)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Unconditional writing to MAS7, which doesn't exist on the e500v1 core, in a
TLB miss handler has been in the code for several years now. Since this has
gone unnoticed for so long, it's easily concluded that e500v1 is not in use
with FreeBSD. Simplify the code path a bit, by unconditionally zeroing MAS7
instead of calling a subroutine to do it.
r18 is used to hold the old PCB flags, but cpu_throw doesn't populate r18
with PCB flags, since the old thread is gone. This can lead to panics on
cores that don't have the registers guarded by these flags.
Update/reformat maintainer entries that I am a part of
* Replace all instances of freebsd-testing with `#test`. `#test` is the
Phabricator group that focuses on test-related reviews.
* Replace `atf` with contrib/atf, as that's the actual location for the test
framework.
* Remove jmmv@ from the maintainers list for atf. He is the upstream project
owner, but was moved to alumni status after r345787.
* Fix a typo accidentally introduced in r346899 (inpact -> impact).
Add a note to MAINTAINERS requesting pre-commit review from the graphics
team, using phabricator, for changes to the lkpi subsystem. This is done in
order to give us a chance to test the graphics drivers (drm drivers) for
regressions, and to try to avoid breakage, errors and issues with the
graphics drivers.
The review is done via the #x11 group on phabricator.
Please note that hselasky also want to review changes.
Turning on multicast debug made multicast failure worse
because the strings and #define values no longer matched
up. Fix them, and make sure they stay matched-up.
o Rewrite softdma_process_tx() of Altera SoftDMA engine driver
so it does not require a bounce buffer. The only need for this was
to align the buffer address. Implement unaligned access and we don't
need to copy data twice.
o Remove contigmalloc-based bounce buffer from xDMA code since it is
not suitable for arbitrary memory provided by platform, which is
sometimes a dedicated piece of memory that is not managed by OS at all.
Support all reasonable cursor sizes. Reduce the size of the standard
cursor from 16x16 (with 6 columns unused) to 10x16 and rename it to
the "small" cursor. Add a "large" 19x32 cursor and use it for screen
widths larger than 800 pixels. Use libvgl's too-small indentation for
the large data declarations.
MOUSE_IMG_SIZE = 16 is still part of the API. If an application supplies
invalid bitmaps for the cursor, then the results may be different from
before.
Refactor and simplify hiding the mouse cursor and fix bugs caused by
complications in the previous methods.
r346761 broke showing the mouse cursor after changing its state from
off to on (including initially), since showing the cursor uses the
state to decide whether to actually show and the state variable was
not changed until after null showing. Moving the mouse or copying
under the cursor fixed the problem. Fix this and similar problems for
the on to off transition by changing the state variable before drawing
the cursor.
r346641 failed to turn off the mouse cursor on exit from vgl. It hid
the cursor only temporarily for clearing. This doesn't change the state
variable, so unhiding the cursor after clearing restored the cursor if its
state was on. Fix this by changing its state to VGL_MOUSEHIDE using the
application API for changing the state.
Remove the VGLMouseVisible state variable and the extra states given by it.
This was an optimization that was just an obfuscation in at least the
previous version.
Staticize VGLMouseAction(). Remove VGLMousePointerShow/Hide() except as
internals in __VGLMouseMode(). __VGLMouseMouseMode() is the same as the
application API VGLMouseMouseMode() except it returns the previous mode
which callers need to know to restore it after hiding the cursor.
Use the refactoring to make minor improvements in a simpler way than was
possible:
- in VGLMouseAction(), only hide and and unhide the mouse cursor if the
mouse moved
- in VGLClear(), only hide and and unhide the mouse cursor if the clearing
method would otherwise clear the cursor.
Handle HAVE_PROTO flag and print "proto" keyword for O_IP4 and O_IP6
opcodes when it is needed.
This should fix the problem, when printed by `ipfw show` rule could not
be added due to missing "proto" keyword.
When set, we ignore all the hints that the UEFI boot manager has set
for us. We also always fail back to the OK prompt when we can't find
the right thing to boot rather than failing back to the UEFI boot
manager. This has the side effect of also expanding the cases where we
fail back to the OK prompt to include when we're booted under UEFI,
but UEFI::BootCurrent isn't set in the environment and we can't find a
proper place to boot from.
If uefi_rootdev is set in the environment, then treat it like a device
path. Convert the string to a device path and see if we can find a
device that matches. If so, use that device at our root dev no matter
what. If it's bad in any way, the boot will fail.
Read in and parse /efi/freebsd/loader.env from the boot device's
partition as if it were on the command line.
Fetch FreeBSD-LoaderEnv UEFI enviornment variable. If set, read in
loader environment variables from it. Otherwise read in
/efi/freebsd/loader.env. Both are read relative to the device
loader.efi loaded from (they aren't full UEFI device paths)
Next fetch FreeBSD-NextLoaderEnv UEFI environment variable. If
present, read the file it points to in as above and delete the UEFI
environment variable so it only happens once.
This lets one set environment variables in the bootloader.
Unfortunately, we don't have all the mechanisms in place to parse the
file, nor do we have the magic pattern matching in place that
loader.conf has. Variables are of the form foo=bar. No quotes are
supported, so spaces aren't allowed, for example. Also, variables like
foo_load=yes are intercepted when we parse the loader.conf file and
things are done based on that. Since those aren't done here, variables
that cause an action to happen won't work.
Add a trailing empty line to match the test code output
This is added for letting these long failing test case pass, and for
consistency. The test code should be fixed later to not output this extra
empty line.
It was reported that without #ifdef INET6 around the declaration of "nbuf",
a build would report an unused variable. For some reason, I didn't see that
warning when I did a build, but it seems reasonable to add these #ifdef INET6's.
Some test scripts use ncat --sctp --listen port to run an SCTP discard
server in the background. However, when running in the background,
stdin is closed and ncat initiates a graceful shutdown of the SCTP
association. This is not expected by the client. Therefore, the
ncat-based discard server is replaced by a perl-based one.
In addition, to remove the dependency from ncat, which needs to be
installed via the nmap port, also the code testing for a free SCTP port
is changed to use the perl-based client.
Finally, remove some debug output from the report generated.
Make isp(4) suggest loading ispfw(4) when it fails to attach.
It cannot load it automatically at boot, because the root filesystem
is not there yet. An alternative would be adding ispfw(4) to GENERIC,
but it's an additional 1MB.
powerpc: Add support for additional FSCR-managed facilities
Add support to enable, save, and restore the following facilities:
* Target Address Register (bctar) -- seemingly just another register to
branch to.
* Event-based branching -- an interrupt-like userspace event handler
subsystem.
* Load-monitored facility -- A facility that allows monitoring a range of
physical memory, and triggering an event on access. Targeted to garbage
collection software features.
powerpc64: Add the DSCR facility on POWER8 and later
The Data Stream Control Register (DSCR) is privileged on POWER7, but
unprivileged (different register) on POWER8 and later. However, it's now
guarded by a new register, the Facility Status and Control Register, instead of
the MSR like other pre-existing facilities (FPU, Altivec). The FSCR must be
managed explicitly, since it's effectively an extension of the MSR.
manu [Sat, 27 Apr 2019 14:48:27 +0000 (14:48 +0000)]
arm64: allwinner: Add compatible strings for clock devices used on both Allwinner H3 and H5
Allwinner H3 and H5 share many internal components, that's why they can
use the same drivers.
This patch adds the compatible strings to enable clock drivers
probing on Allwinner NanoPI NEO2 device.
The POWER8NVL (POWER8 NVLink) architecturally behaves identically to the
POWER8, with a different PVR identifier. Mark it as such, so it shows up
appropriately to the user.
Since the non-volatile registers are restored at the end of cpu_switchin (of
the new thread) they're free for us to use for our own purposes. Load the
PCB_FLAGS into a non-volatile register so it's preserved across the C
function calls that manage FPU and altivec state. This removes 4 loads from
each file. Might be a trivial performance improvement (~12 clock cycles per
context switch).
Some PPC systems (PowerNV) use msdosfs for /boot, which can't handle either
symlinks or hardlinks. So on PPC, copy the module instead. This change fixes
installkernel on such systems after r345350.
Use __VGLBitmapCopy() directly to show the mouse cursor. The mouse
cursor must be merged with the shadow buffer on the way to the screen,
and __VGLBitmapCopy() now has an option to do exactly that. This is
insignificantly less efficient.
Add map-vdisk and unmap-vdisk commands to create virtual disk interface on top of file. This will allow to use disk image from file system to load and start the kernel.
By mapping file, we create vdiskX device, the device will be listed by lsdev [-v] and can be accessed directly as ls vdisk0p1:/path or can be used as value for currdev variable.
vdisk strategy function does not use bcache as we have bcache used with backing file. vdisk can be unmapped when all consumers have closed the open files.
In first iteration we do not support the zfs images because zfs pools do keep the device open (there is no "zpool export" mechanism). Adding zfs support is relatively simple, we just need to run zfs disk probe after mapping is done.
Merge __VGLGetXY() back into VGLGetXY(). They were split to simplify
the organization of fixes for the mouse cursor, but after optimizations
VGLGetXY() automatically avoids the mouse cursor.
In VGLClear(), check for the overlap of the mouse cursor in the whole
display, not just in the unpanned top left corner. This currently
makes no difference since the kernel erroneously doesn't allow moving
the cursor completely outside of the unpanned corner.