Make linux(4) set the openfiles soft resource limit to 1024 for Linux
applications, which often depend on this being the case. There's a new
sysctl, compat.linux.default_openfiles, to control this behaviour.
Move the compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by the linux_common kernel module and not by both the linux and
linux64 modules.
linuxulator: Map scheduler priorities to Linux priorities.
On Linux the valid range of priorities for the SCHED_FIFO and SCHED_RR
scheduling policies is [1,99]. For SCHED_OTHER the single valid priority is
0. On FreeBSD it is [0,31] for all policies. Programs are supposed to
query the valid range using sched_get_priority_(min|max), but of course some
programs assume the Linux values are valid.
This commit adds a tunable compat.linux.map_sched_prio. When enabled
sched_get_priority_(min|max) return the Linux values and sched_setscheduler
and sched_(get|set)param translate between FreeBSD and Linux values.
Because there are more Linux levels than FreeBSD levels, multiple Linux
levels map to a single FreeBSD level, which means pre-emption might not
happen as it does on Linux, so the tunable allows this behaviour to be disabled.
It is enabled by default because I think it is unlikely that anyone runs
real-time software under Linux emulation on FreeBSD that critically relies
on correct pre-emption.
This fixes FMOD, a commercial sound library used by several games.
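
The range translation described above can be sketched as a linear scaling
between [1,99] and [0,31]. This is only an illustration: the helper names are
hypothetical, and the exact formula used by the linuxulator is not given in
this log and is assumed here.

```c
#include <assert.h>

/* Ranges taken from the commit message above. */
#define LINUX_RT_PRIO_MIN 1
#define LINUX_RT_PRIO_MAX 99
#define BSD_RT_PRIO_MIN   0
#define BSD_RT_PRIO_MAX   31

/*
 * Hypothetical helper: scale a Linux priority in [1,99] down to [0,31].
 * Several adjacent Linux levels collapse onto one FreeBSD level.
 */
int
linux_to_bsd_prio(int lprio)
{
    assert(lprio >= LINUX_RT_PRIO_MIN && lprio <= LINUX_RT_PRIO_MAX);
    return (BSD_RT_PRIO_MIN + (lprio - LINUX_RT_PRIO_MIN) *
        (BSD_RT_PRIO_MAX - BSD_RT_PRIO_MIN) /
        (LINUX_RT_PRIO_MAX - LINUX_RT_PRIO_MIN));
}

/*
 * Hypothetical helper: scale a FreeBSD priority in [0,31] up to [1,99].
 * Because the forward mapping loses information, this is not an exact
 * inverse of linux_to_bsd_prio().
 */
int
bsd_to_linux_prio(int bprio)
{
    assert(bprio >= BSD_RT_PRIO_MIN && bprio <= BSD_RT_PRIO_MAX);
    return (LINUX_RT_PRIO_MIN + (bprio - BSD_RT_PRIO_MIN) *
        (LINUX_RT_PRIO_MAX - LINUX_RT_PRIO_MIN) /
        (BSD_RT_PRIO_MAX - BSD_RT_PRIO_MIN));
}
```

With this scaling the endpoints map onto each other, while Linux priorities
such as 2 and 3 land on the same FreeBSD level, which is the loss of
pre-emption granularity the commit message warns about.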
Add compat.linux.ignore_ip_recverr sysctl. This is a workaround
for missing IP_RECVERR setsockopt(2) support. Without it, DNS
resolution is broken for glibc >= 2.30 (glibc BZ #24047).
From the user's point of view this fixes "yum update" on recent
CentOS 8.
Support SO_SNDBUFFORCE/SO_RCVBUFFORCE by aliasing them to the
standard SO_SNDBUF/SO_RCVBUF. Mostly cosmetics, to get rid
of the warning during 'apt upgrade'.
Emmanuel Vadot [Mon, 24 Aug 2020 10:46:09 +0000 (10:46 +0000)]
MFC r361007, r361138-r361140, r361245-r361246
r361007:
linuxkpi: Add EBADRQC to errno.h
This is used in the amdgpu driver from Linux 5.2
Sponsored-by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24807
r361138:
linuxkpi: Add atomic_dec_and_mutex_lock
This function decrements the counter; if the result is 0 it acquires
the mutex and returns 1, otherwise it simply returns 0.
Needed by DRM from Linux v5.3
Sponsored-by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24847
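
A userland analogue of the semantics described for r361138 might look like the
sketch below. It uses C11 atomics and a pthread mutex instead of the kernel
primitives; the function name is hypothetical and this is not the linuxkpi
implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical userland analogue: decrement *cnt; if it reaches 0,
 * return 1 with the mutex held, otherwise return 0 without the mutex.
 */
int
sketch_dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    int old;

    assert(cnt != NULL && lock != NULL);
    old = atomic_load(cnt);

    /* Fast path: decrement locklessly unless the result would be 0. */
    while (old != 1) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return (0);
    }

    /* Slow path: the counter may hit 0, so decrement under the lock. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) != 1) {
        /* Another thread raised the counter in the meantime. */
        pthread_mutex_unlock(lock);
        return (0);
    }
    return (1);
}
```

Taking the lock before the final decrement means a caller that sees 1 can tear
down the object without racing against another thread that looks it up under
the same mutex.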
r361139:
linuxkpi: Add __mutex_init
Same as mutex_init; the lock_class_key argument seems to be used only for
debugging in Linux, so simply ignore it for now.
Needed by DRM in Linux v5.3
Sponsored-by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24848
r361140:
linuxkpi: Add offsetofend macro
This calculates the offset of the end of the member in the given struct.
Needed by DRM in Linux v5.3
Sponsored-by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24849
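
One plausible way to write such a macro, shown only as a sketch: the definition
below is an assumption rather than a copy of the linuxkpi header, and the
struct and helper function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past MEMBER within TYPE. */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical structure used only for illustration. */
struct example_hdr {
    uint32_t version;
    uint32_t length;
    uint8_t  payload[16];
};

/*
 * Typical use: check whether a caller-supplied structure size is large
 * enough to contain the `length` field.
 */
int
covers_length(size_t user_size)
{
    assert(user_size <= sizeof(struct example_hdr));
    return (user_size >= offsetofend(struct example_hdr, length));
}
```

Here the end of `version` coincides with the start of `length`, since the two
32-bit fields are laid out without padding.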
r361245:
linuxkpi: Add __init_waitqueue_head
The only difference from init_waitqueue_head is that the name and the
lock class key are provided, but we don't use those, so call
init_waitqueue_head directly.
Sponsored-by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24861
r361246:
linuxkpi: add pci_dev_present
pci_dev_present reports whether a set of PCI IDs is present in the system.
It just wraps pci_find_device.
Needed by DRM v5.2
Emmanuel Vadot [Mon, 24 Aug 2020 10:42:04 +0000 (10:42 +0000)]
MFC r360787, r360851, r360870-r360872
r360787:
linuxkpi: Add pci_iomap and pci_iounmap
These functions are used to map/unmap an I/O region of a PCI device.
Different resources can be mapped depending on the BAR, so use a
tailq to store them all.
Sponsored-by: The FreeBSD Foundation
Reviewed by: emaste, hselasky
Differential Revision: https://reviews.freebsd.org/D24696
r360851:
linuxkpi: Add bitmap_copy and bitmap_andnot
bitmap_copy simply copies the bitmaps, no idea why it exists.
bitmap_andnot is similar to bitmap_and but uses ~src2.
Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24782
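
The two operations from r360851 can be illustrated with the word-wise sketch
below. The function and macro names are hypothetical, and bitmap_andnot is
taken here to AND the first source with the bitwise complement of the second,
as in Linux; the actual linuxkpi code may differ, for example in how it masks
the unused bits of the last word.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SKETCH_BITS_PER_ULONG   (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_ULONGS(n) \
    (((n) + SKETCH_BITS_PER_ULONG - 1) / SKETCH_BITS_PER_ULONG)

/* Copy every word that holds at least one of the first nbits bits. */
void
sketch_bitmap_copy(unsigned long *dst, const unsigned long *src,
    unsigned int nbits)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, SKETCH_BITS_TO_ULONGS(nbits) * sizeof(unsigned long));
}

/*
 * dst = src1 & ~src2, word by word. Bits beyond nbits in the last word
 * are not masked in this sketch.
 */
void
sketch_bitmap_andnot(unsigned long *dst, const unsigned long *src1,
    const unsigned long *src2, unsigned int nbits)
{
    size_t i;

    assert(dst != NULL && src1 != NULL && src2 != NULL);
    for (i = 0; i < SKETCH_BITS_TO_ULONGS(nbits); i++)
        dst[i] = src1[i] & ~src2[i];
}
```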
r360870:
linuxkpi: Add bitmap_alloc and bitmap_free
This is a simple call to kmalloc_array/kfree, therefore include linux/slab.h, as
this is where kmalloc_array/kfree are defined.
Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24794
r360871:
linuxkpi: Really add bitmap_alloc and bitmap_zalloc
This was missing in r360870
Sponsored-by: The FreeBSD Foundation
r360872:
qlnx: Do not redefine types.
r360870 added linux/slab.h into linux/bitmap.h, and this includes linux/types.h.
The qlnx driver is redefining some of those types so remove them and add an
explicit linux/types.h include.
Pointy hat: manu
Reported by: Austin Shafer <ashafer@badland.io>
Michael Tuexen [Mon, 24 Aug 2020 09:15:52 +0000 (09:15 +0000)]
MFC r363456:
Clear the pointer to the socket when closing it also in case of
an ungraceful operation.
This fixes a use-after-free bug found and reported by Taylor
Brandstetter of Google by testing the userland stack.
Michael Tuexen [Mon, 24 Aug 2020 09:13:06 +0000 (09:13 +0000)]
MFC r363323:
Add reference counts for inp/stcb/net when timers are running.
This avoids a use-after-free reported for the userland stack.
Thanks to Taylor Brandstetter for suggesting a patch for
the userland stack.
Michael Tuexen [Mon, 24 Aug 2020 09:10:19 +0000 (09:10 +0000)]
MFC r363275:
Improve the locking of address lists by adding some asserts and
rearranging the addition of addresses such that the lock is not
given up during checking and adding.
Michael Tuexen [Mon, 24 Aug 2020 09:06:46 +0000 (09:06 +0000)]
MFC r363194:
Improve the error handling in generating ASCONF chunks.
In case of errors, the cleanup was not consistent.
Thanks to Felix Weinrank for fuzzing the userland stack and making
me aware of the issue.
Michael Tuexen [Mon, 24 Aug 2020 09:00:07 +0000 (09:00 +0000)]
MFC r363076:
Fix a use-after-free bug for the userland stack. The kernel
stack is not affected.
Thanks to Mark Wodrich from Google for finding and reporting the
bug.
Michael Tuexen [Mon, 24 Aug 2020 08:58:45 +0000 (08:58 +0000)]
MFC r363046:
Optimize flushing of receive queues.
This addresses an issue found and reported for the userland stack in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21243
Michael Tuexen [Mon, 24 Aug 2020 08:38:58 +0000 (08:38 +0000)]
MFC r363008:
Improve handling of PKTDROP chunks. This includes the input validation
to address two issues found by ossfuzz testing the userland stack:
* https://oss-fuzz.com/testcase-detail/5387560242380800
* https://oss-fuzz.com/testcase-detail/4887954068865024
and adding support for I-DATA chunks in addition to DATA chunks.
Michael Tuexen [Mon, 24 Aug 2020 08:35:13 +0000 (08:35 +0000)]
MFC r362722:
Don't send packets containing ERROR chunks in response to unknown
chunks when being in a state where the verification tag to be used
is not known yet.
Michael Tuexen [Mon, 24 Aug 2020 08:32:16 +0000 (08:32 +0000)]
MFC r362581:
Fix the accounting for fragmented unordered messages when using
interleaving.
This was reported for the userland stack in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=19321
Michael Tuexen [Mon, 24 Aug 2020 08:25:00 +0000 (08:25 +0000)]
MFC r362473:
Cleanup the definition of struct sctp_getaddresses. This structure
is used by the IPPROTO_SCTP level socket options SCTP_GET_PEER_ADDRESSES
and SCTP_GET_LOCAL_ADDRESSES, which are used by libc to implement
sctp_getladdrs() and sctp_getpaddrs().
These changes allow an old libc to work on a newer kernel.
Michael Tuexen [Mon, 24 Aug 2020 08:19:25 +0000 (08:19 +0000)]
MFC r362451:
Use a struct sockaddr_in or struct sockaddr_in6 as the option value
for the IPPROTO_SCTP level socket options SCTP_BINDX_ADD_ADDR and
SCTP_BINDX_REM_ADDR. These socket options are intended for internal
use only to implement sctp_bindx().
This is one user of struct sctp_getaddresses less.
struct sctp_getaddresses is strange and will be changed shortly.
Michael Tuexen [Sun, 23 Aug 2020 23:24:38 +0000 (23:24 +0000)]
MFC r361895:
Retire SCTP_SO_LOCK_TESTING.
This was intended to test the locking used in the MacOS X kernel on a
FreeBSD system, to make use of WITNESS and other debugging infrastructure.
This hasn't been used for ages, so take it out to reduce the #ifdef
complexity.
Michael Tuexen [Sun, 23 Aug 2020 23:19:32 +0000 (23:19 +0000)]
MFC r361243:
Replace snprintf() by SCTP_SNPRINTF() and let SCTP_SNPRINTF() map
to snprintf() on FreeBSD. This allows checking for failures of snprintf()
on platforms other than the FreeBSD kernel.
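
The idea behind r361243 can be sketched as follows. Only the macro name
SCTP_SNPRINTF comes from the commit message; the checked wrapper, its name and
its policy of emptying the buffer on failure are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__FreeBSD__) && defined(_KERNEL)
/* In the FreeBSD kernel the macro maps straight to snprintf(). */
#define SCTP_SNPRINTF(...)  snprintf(__VA_ARGS__)
#else
/*
 * Hypothetical checked variant for other platforms: treat an encoding
 * error or a truncated result as a failure and leave an empty string.
 */
int
sketch_checked_snprintf(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && len > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len) {
        buf[0] = '\0';
        return (-1);
    }
    return (n);
}
#define SCTP_SNPRINTF(...)  sketch_checked_snprintf(__VA_ARGS__)
#endif
```

Routing every call through one macro keeps the call sites identical across
platforms while letting each platform decide how strictly to treat failures.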
Michael Tuexen [Sun, 23 Aug 2020 22:48:19 +0000 (22:48 +0000)]
MFC r360869:
Only drop DATA chunks with lower priorities as specified in RFC 7496.
This issue was found by looking at a reproducer generated by syzkaller.
Michael Tuexen [Sun, 23 Aug 2020 22:39:06 +0000 (22:39 +0000)]
MFC r360662:
Fix the computation of the number of entries of the mapping array to
look at when generating a SACK. This was wrong in case of sequence
number wrap-arounds.
Thanks to Gwenael FOURRE for reporting the issue for the userland stack:
https://github.com/sctplab/usrsctp/issues/462
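
Why wrap-around matters for this kind of count can be seen in the sketch below,
which uses unsigned 32-bit modular arithmetic. The helper names are
hypothetical and the code is not taken from the SCTP sources.

```c
#include <assert.h>
#include <stdint.h>

/* Serial-number comparison: true if a comes after b modulo 2^32. */
int
sketch_tsn_gt(uint32_t a, uint32_t b)
{
    return (a != b && (uint32_t)(a - b) < 0x80000000U);
}

/*
 * Number of sequence numbers from base up to and including highest.
 * The unsigned subtraction stays correct when highest has wrapped
 * past 2^32 - 1 and is numerically smaller than base.
 */
uint32_t
sketch_tsn_span(uint32_t base, uint32_t highest)
{
    assert(highest == base || sketch_tsn_gt(highest, base));
    return ((uint32_t)(highest - base) + 1);
}
```

For example, with base 0xFFFFFFFE and highest 1 the span covers 0xFFFFFFFE,
0xFFFFFFFF, 0 and 1, whereas a plain comparison of the two values would
conclude that highest lies before base.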
Michael Tuexen [Sun, 23 Aug 2020 22:26:38 +0000 (22:26 +0000)]
MFC r359405:
Handle integer overflows correctly when converting msecs and secs to
ticks and vice versa.
These issues were caught by recently added panic() calls on INVARIANTS
systems.
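
One overflow-safe way to do such a conversion is to compute in 64 bits and
saturate, as sketched below. The function name and the `hz` parameter (standing
in for the tick rate) are hypothetical, and saturation is an assumed policy;
the log does not say how r359405 handles the overflow case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert milliseconds to ticks, rounding up, without overflowing:
 * the product is formed in 64 bits and clamped to UINT32_MAX.
 */
uint32_t
sketch_msecs_to_ticks(uint32_t msecs, uint32_t hz)
{
    uint64_t ticks;

    assert(hz > 0);
    ticks = ((uint64_t)msecs * hz + 999) / 1000;
    if (ticks > UINT32_MAX)
        ticks = UINT32_MAX;
    return ((uint32_t)ticks);
}
```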
Make linux stat(2) return the same st_dev for every devfs instance.
The reason for this is to work around an idiosyncrasy of glibc
getttynam(3) implementation: it checks whether st_dev returned for
fd 0 is the same as st_dev returned for the target of /proc/self/fd/0
symlink, and with linux chroots having their own devfs instance,
the check will fail if you chrooted into it.
PR: kern/240767
Sponsored by: The FreeBSD Foundation
Make Linux stat(2) et al distinguish between block and character
devices. It's required for LTP, among other things. It's not
complete, but good enough for now.
Michael Tuexen [Sun, 23 Aug 2020 22:19:39 +0000 (22:19 +0000)]
MFC r359306:
Remove an optimization, which was incorrect a couple of times and
therefore doesn't seem worth keeping.
In this case COOKIE chunks were not retransmitted anymore when the
socket was already closed.
Michael Tuexen [Sun, 23 Aug 2020 22:07:49 +0000 (22:07 +0000)]
MFC r359287:
Another cleanup of the timer code. Also be more pedantic about the
parameters of the timer start and stop routines. Several inconsistencies
have been fixed in earlier commits. Now they will be caught when running
an INVARIANTS system.
Move futex_list definition to linux.c which is included once
in linux.ko (i386) and in linux_common.ko (amd64 and aarch64)
allowing 32/64 bit linux programs to access the same futexes
in the latter case.
It is assembled using "${CC} -x assembler-with-cpp", which by convention
(bsd.suffixes.mk) uses the .asm extension.
This is a portion of the review referenced below (D18344). That review
also renamed linux_support.s to .S, but that is a functional change
(using the compiler's integrated assembler instead of as) and will be
revisited separately.