alc [Thu, 8 Aug 2019 06:26:34 +0000 (06:26 +0000)]
Ordinarily, during a superpage promotion or demotion within a pmap, the
pmap's lock ensures that other operations on the pmap don't observe the
old mapping being broken before the new mapping is established. However,
pmap_kextract() doesn't acquire the kernel pmap's lock, so it may observe
the broken mapping. And, if it does, it returns an incorrect result.
This revision implements a lock-free solution to this problem in
pmap_update_entry() and pmap_kextract() because pmap_kextract() can't
acquire the kernel pmap's lock.
cem [Thu, 8 Aug 2019 03:27:46 +0000 (03:27 +0000)]
Disable useless -Wformat-zero-length
It is part of -Wformat, which is enabled by -Wall. Empty format strings are
well defined and it is perfectly reasonable to expect them in a formatting
interface.
jhibbits [Thu, 8 Aug 2019 03:16:32 +0000 (03:16 +0000)]
Change autounmountd(8) to use time_t for duration instead of double
Summary:
autounmountd(8) uses doubles to handle mount time durations. However,
it must convert to integer types, time_t in particular, to do anything
meaningful. Additionally, even though it's a floating-point value in
seconds, the sub-seconds component is never used, so it's unnecessary.
Switching type to time_t fixes an assertion on powerpc64, which checks
that a sleep value that's not -1.0 is greater than 0. On powerpc64, it
happens that the value of -1.0 gets loaded as a float (perhaps a bug in
gcc), but gets compared to a double. This compares as false, so follows
through the 'sleep != -1.0' path, and fails the assert. Since the
sub-second component isn't used in the double, just drop it and deal
with whole-integer seconds.
cem [Thu, 8 Aug 2019 00:42:29 +0000 (00:42 +0000)]
ddb(4): Add 'sysctl' command
Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the
req, we install custom ddb in/out handlers. The out handler prints straight
to the debugger, while the in handler ignores all input. This is intended
to allow us to print just about any sysctl.
There is a known issue when used from ddb(4) entered via 'sysctl
debug.kdb.enter=1'. The DDB mode does not quite prevent all lock
interactions, and it is possible for the recursive Giant lock to be unlocked
when the ddb(4) 'sysctl' command is used. This may result in a panic on
return from ddb(4) via 'c' (continue). Obviously, this is not a problem
when debugging already-paniced systems.
Submitted by: Travis Lane (formerly: <travis.lane AT isilon.com>)
Reviewed by: vangyzen (earlier version), Don Morris <dgmorris AT earthlink.net>
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20219
asomers [Wed, 7 Aug 2019 20:28:27 +0000 (20:28 +0000)]
Remove the fuse.ko -> fusefs.ko symlink
On FreeBSD 13.0, the fuse driver will always be known as fusefs. The
backwards compatibility symlink will still be used for stable/12 and
stable/11, though.
Reported by: jhibbits
Reviewed by: rgrimes, imp, cem
MFC after: Never
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21181
dim [Wed, 7 Aug 2019 20:13:43 +0000 (20:13 +0000)]
Fix a possible segfault in wcsxfrm(3) and wcsxfrm_l(3).
If the length of the source wide character string, passed in via the
"size_t n" parameter, is set to zero, the function should only return
the required length for the destination wide character string. In this
case, it should *not* attempt to write to the destination, so the "dst"
parameter is permitted to be NULL.
However, when the internally called _collate_wxfrm() function returns an
error, such as when using the "C" locale, as a fallback wcscpy(3) or
wcsncpy(3) are used. But if the input length is zero, wcsncpy(3) will
be called with a length of -1! If the "dst" parameter is NULL, this
will immediately result in a segfault, or if "dst" is a valid pointer,
it will most likely result in unexpectedly overwritten memory.
Fix this by explicitly checking for an input length greater than zero,
before calling wcsncpy(3).
Note that a similar situation does not occur in strxfrm(3), the plain
character version of this function, as it uses strlcpy(3) for the error
case. The strlcpy(3) function does not write to the destination if the
input length is zero.
oshogbo [Wed, 7 Aug 2019 19:30:33 +0000 (19:30 +0000)]
cap_filergs: limit size of the file name
The limit of the name in fileargs is twice the size of the MAXPATH.
The nvlist will not add an element with the longer name.
We can detect at this point that the path is too big, and simple return
the same error as open(2) would.
cem [Wed, 7 Aug 2019 19:27:14 +0000 (19:27 +0000)]
sbuf(9): Add sbuf_nl_terminate() API
The API is used to gracefully terminate text line(s) with a single \n. If
the formatted buffer was empty or already ended in \n, it is unmodified.
Otherwise, a newline character is appended to it. The API, like other
sbuf-modifying routines, is only valid while the sbuf is not FINISHED.
cem [Wed, 7 Aug 2019 19:25:56 +0000 (19:25 +0000)]
sbuf(9): Refactor sbuf_newbuf into sbuf_new
Code flow was somewhat difficult to read due to the combination of
multiple return sites and the 4x possible dynamic constructions of an
sbuf. (Future consideration: do we need all 4?) Refactored slightly to
improve legibility.
trasz [Wed, 7 Aug 2019 18:14:45 +0000 (18:14 +0000)]
Add cdceem(4) driver, for virtual ethernet devices compliant
with Communication Device Class Ethernet Emulation Model (CDC EEM).
The driver supports both the device, and host side operation; there
is a new USB template (#11) for the former.
This enables communication with virtual USB NIC provided by iLO 5,
as found in new HPE Proliant servers.
mckusick [Wed, 7 Aug 2019 16:56:00 +0000 (16:56 +0000)]
Correct the location of the first backup superblock in fsck_ffs.8.
Make a note in the newfs.8 manual page to update the first backup
superblock location when changing the default fragment size for
the filesystem.
mav [Wed, 7 Aug 2019 14:45:10 +0000 (14:45 +0000)]
Make `camcontrol modepage` support block descriptors.
It allows to read and write block descriptors alike to mode page parameters.
It allows to change block size or short-stroke HDDs or overprovision SSDs.
Depenting on -P parameter the change can be either persistent or till reset.
In case of block size change device may need reformat after the setting.
In case of SSD overprovisioning format or sanitize may be needed to really
free the flash.
During implementation appeared that csio_encode_visit() can not handle
integers of more then 4 bytes, that makes 8-byte LBA handling awkward.
I had to split it into two 4-byte halves now.
MFC after: 1 week
Relnotes: yes
Sponsored by: iXsystems, Inc.
manu [Wed, 7 Aug 2019 13:13:16 +0000 (13:13 +0000)]
ofw: ofw_reg_to_paddr: Use a 256 static array for the cell
Some hardware needs more than 32, bump this value.
We cannot use the _alloc for of getencprop as this function is called
too early in the boot before pmap is initialized and we only have
2k of stack when cninit is called.
manu [Wed, 7 Aug 2019 13:11:53 +0000 (13:11 +0000)]
arm: dts: am33xx: Fix the region for uart0
The region for uart0 is declared to be 0x2000 in size but the parent
node only declare 0x1000.
As the parent only declare a size of 0x1000 in the ranges for it's children
this cause the device to not be mappable.
markj [Wed, 7 Aug 2019 03:14:45 +0000 (03:14 +0000)]
readelf: Close input files when done with them.
The low fd limit used by poudriere exposed an odd failure mode in
cap_fileargs (used by readelf as of r350516). In particular, when
the limit was hit, both the main process and casper service would
block on their shared socket, waiting forever for the other to send a
message.
Reported by: zeising
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
jhb [Wed, 7 Aug 2019 00:53:27 +0000 (00:53 +0000)]
Tidy up the list of auth and encryption algorithms for IPsec stats.
- Use keyed-md5 and keyed_sha1 instead of md5 and sha1 to match
the names accepted by setkey and to also avoid confusion since
these are not "plain" MD5 or SHA1.
- Remove always-true #ifdef's to make the source a bit easier to
read.
- Add missing mappings for tcp-md5, camellia-cbc, and aes-gmac.
asomers [Wed, 7 Aug 2019 00:38:26 +0000 (00:38 +0000)]
fusefs: merge from projects/fuse2
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
jhb [Tue, 6 Aug 2019 23:22:25 +0000 (23:22 +0000)]
Fix LOCAL_MODULES and improve the make output.
The exists() check guarding the invocation of ls was not working
correctly as it was expanding '$L' to determine the path of the local
modules directory. Fix by using {} around the variable name.
Inline some of the logic from bsd.subdir.mk when invoking local module
builds. This gives output in 'make buildkernel' the same as if there
was a Makefile in /usr/local/sys/modules with SUBDIR =
${LOCAL_MODULES}.
jhb [Tue, 6 Aug 2019 23:15:04 +0000 (23:15 +0000)]
Detect invalid PCI devices more correctly in PCI interrupt router drivers.
- Check for an invalid device (vendor is invalid) before reading the
header type register when examining function 0 of a possible device.
- When iterating over functions of a device, reject any device whose
16-bit vendor is invalid rather than requiring the full 32-bit
vendor+device to be all 1's. In practice the latter check is
probably fine, but checking the vendor is what the PCI spec
recommends.
jeff [Tue, 6 Aug 2019 21:50:34 +0000 (21:50 +0000)]
Add two new kernel options to control memory locality on NUMA hardware.
- UMA_XDOMAIN enables an additional per-cpu bucket for freed memory that
was freed on a different domain from where it was allocated. This is
only used for UMA_ZONE_NUMA (first-touch) zones.
- UMA_FIRSTTOUCH sets the default UMA policy to be first-touch for all
zones. This tries to maintain locality for kernel memory.
tsoome [Tue, 6 Aug 2019 19:27:27 +0000 (19:27 +0000)]
loader.efi: replace HandleProtocol() with OpenProtocol()
The HandleProtocol() is deprecated interface and we should use OpenProtocol()
instead. Moreover, in some firmware implementation(s), the HandleProtocol()
does return device path using static storage, so we can not keep the value
returned there. With same firmware, the OpenProtocol() does return data we
do not need to clone.
imp [Tue, 6 Aug 2019 18:15:26 +0000 (18:15 +0000)]
Fix mismerge.
I merged passthru.c from the wrong branch (it was a branch that went further in
a direction I wound up not taking). Fix the mismerge and turn passthru on.
mckusick [Tue, 6 Aug 2019 18:10:34 +0000 (18:10 +0000)]
A race condition existed between the time a UFS/FFS superblock check
hash was computed and the time that the superblock was copied to a
buffer to be written to disk. The result was a failed superblock
check hash the next time that the superblock was read.
The fix is to compute the check hash after the superblock has been
copied to a buffer to be written.
PR: 236504
Reported by: Peter Holm
Tested by: Peter Holm
Sponsored by: Netflix
kib [Tue, 6 Aug 2019 16:53:25 +0000 (16:53 +0000)]
amd64: prevents speculations over swapgs reload of %gs base.
Such speculations could use user-controlled %gs base, esp. since
FreeBSD supports WRGSBASE instructions.
Place LFENCEs on entry for each basic block after the test for
previous kernel/user mode on the kernel entry, which prevents the
speculation. Code accesses %gs-based PCPU before any serialization
instructions are executed, like %cr3 reload for KPTI.
There is no need for the 64-bit pmap to have a fixed number of page table
buffers. Since the 64-bit pmap has a DMAP, we can effectively have user
page tables limited only by total RAM size.
vangyzen [Mon, 5 Aug 2019 22:59:35 +0000 (22:59 +0000)]
Relax time constraint in pthread_cond_timedwait unit test
pthread_cond_timedwait() should wait _at least_ until the timeout,
but it might appear to wait longer due to system activity and
scheduling. The test ignored fractional seconds when comparing the
actual and expected timeouts, so it allowed anywhere between zero
and one extra second of wait time. Zero is a bit unreasonable.
Compare fractional seconds so we always allow up to one extra second.
jhb [Mon, 5 Aug 2019 21:39:55 +0000 (21:39 +0000)]
Validate guest-supplied length of headers for TSO transmit requests.
When transmitting a large TCP packet, the final transmit descriptor
includes the length of the protocol headers to be duplicated on each
segment. The device model was trusting the guest-supplied value
without validating it. A value of zero would result in the guest
being able to indirect a garbage pointer on the stack to overwrite
arbitrary memory in the bhyve process. A value that was non-zero but
too small for the requested parameters resulted in the device model
reading and writing values beyond the end of the on-stack buffer used
to hold the template header.
To fix, validate the supplied length and drop requests to transmit
packets that would overflow the header buffer. While here, initialize
the header pointer to NULL as a preventive measure so that any access
to an unallocated template header crashes they hypervisor
deterministically.
While here, only read the TCP sequence number if the packet being
split is a TCP packet. The e1000 logic supports a segmentation of UDP
frames, and while UDP segmentation requires this part of the header to
be valid (so there is no buffer overflow), only reading the field when
needed is cleaner.
admbugs: 918
Reported by: Reno Robert <renorobert@gmail.com>
Reviewed by: markj
Approved by: so
Security: CVE-2019-5609
oshogbo [Mon, 5 Aug 2019 20:15:46 +0000 (20:15 +0000)]
procdesc: fix reparenting when the debugger is attached
The process is reparented to the debugger while it is attached.
B B
/ ----> |
A A D
Every time when the process is reparented, it is added to the orphan list
of the previous parent:
A->orphan = B
D->orphan = NULL
When the A process will close the process descriptor to the B process,
the B process will be reparented to the init process.
B B - init
| ---->
A D A D
A->orphan = B
D->orphan = B
In this scenario, the B process is in the orphan list of A and D.
When the last process descriptor is closed instead of reparenting
it to the reaper let it stay with the debugger process and set
our previews parent to the reaper.
Add test case for this situation.
Notice that without this patch the kernel will crash with this test case:
panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1
oshogbo [Mon, 5 Aug 2019 19:59:23 +0000 (19:59 +0000)]
exit1: postpone clearing P_TRACED flag until the proctree lock is acquired
In case of the process being debugged. The P_TRACED is cleared very early,
which would make procdesc_close() not calling proc_clear_orphan().
That would result in the debugged process can not be able to collect
status of the process with process descriptor.
thj [Mon, 5 Aug 2019 11:47:34 +0000 (11:47 +0000)]
Add common firewall test suite
Add a common test suite for the firewalls included in the base system. The test
suite allows common test infrastructure to test pf, ipfw and ipf firewalls from
test files containing the setup for all three firewalls.
Add the pass block test for pf, ipfw and ipf. The pass block test checks the
allow/deny functionality of the firewalls tested.
frag6.c: rename ip6q[] to ipq6b[] and consistently use "bucket"
The hash buckets array is called ip6q. The data structure ip6q is a
description of different object, the one the array holds these days
(since r337776). To clear some of this confusion, rename the array
to ip6qb.
When iterating over all buckets or addressing them directly, we
use at least the variables i, hash, and bucket. To keep the
terminology consistent use the variable name "bucket" and always
make it an uint32_t and not sometimes an int.
jhibbits [Mon, 5 Aug 2019 01:37:18 +0000 (01:37 +0000)]
powerpc: Get 32-bit AIM building with secure-PLT
The last few changes needed before 32-bit AIM builds with secure-PLT with
base GCC. Because ofwcall32.S and swtch32.S were branching to the GOT it
could not use secure PLT.
kevans [Mon, 5 Aug 2019 00:08:25 +0000 (00:08 +0000)]
ipfw: fix jail option after r348215
r348215 changed jail_getid(3) to validate passed-in jids as active jails
(as the function is documented to return -1 if the jail does not exist).
This broke the jail option (in some cases?) as the jail historically hasn't
needed to exist at the time of rule parsing; jids will get stored and later
applied.
Fix this caller to attempt to parse *av as a number first and just use it
as-is to match historical behavior. jail_getid(3) must still be used in
order for name arguments to work, but it's strictly a fallback in case we
weren't given a number.
Reported and tested by: Ari Suutari <ari stonepile fi>
Reviewed by: ae
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21128
kib [Sun, 4 Aug 2019 21:43:34 +0000 (21:43 +0000)]
rtld-elf: Remove x86 elf_rtld.x linker scripts.
First, amd64 version of the script cannot work at least due to the
wrong architecture specification. Second, kernel can activate shared
objects for long time, due to PIE support.
It seems the intent was to allow ld-elf.so.1 to be build and used as
an executable. Since we have direct exec mode implemented for dso
ld-elf.so.1, the non-functional and commented out scripts can be
finally removed.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
jhibbits [Sun, 4 Aug 2019 19:28:10 +0000 (19:28 +0000)]
Add necessary bits for Linux KPI to work correctly on powerpc
PowerPC, and possibly other architectures, use different address ranges for
PCI space vs physical address space, which is only mapped at resource
activation time, when the BAR gets written. The DRM kernel modules do not
activate the rman resources, soas not to waste KVA, instead only mapping
parts of the PCI memory at a time. This introduces a
BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI
driver, to perform this necessary translation without activating the
resource.
In addition to system KPI changes, LinuxKPI is updated to handle a
big-endian host, by adding proper endian swaps to the I/O functions.
Resolve ipfilter kld unload issues related to VNET jails.
When the ipfilter kld is loaded, used within VNET jail, and unloaded,
then subsequent loading, use, and unloading of another packet filters
will cause the subsequently loaded netpfil kld's to panic.
The scenario is as follows:
cd /usr/tests/sys/netpfil/common
kldunload ipl
kldunload pfsync
kldunload ipfw
kyua test pass_block
kldload ipl
kyua test pass_block
kldunload ipl
kldload pfsync
kyua test pass_block
kldunload pfsync
-- page fault panic occurs here --
Reported by: "Ahsan Barkati" <ahsanbarkati@g.....com> via kp@
Discussed with: kp@
Tested by: kp@
MFC after: 3 days
mav [Sat, 3 Aug 2019 19:24:56 +0000 (19:24 +0000)]
Add `nvmecontrol sanitize` command.
It allows to delete all user data from NVM subsystem in one of 3 methods.
It is a close equivalent of SCSI SANITIZE command of `camcontrol sanitize`,
so I tried to keep arguments as close as possible.
While there, fix supported sanitize methods reporting in `identify`.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.