Mateusz Guzik [Thu, 20 Feb 2020 16:58:19 +0000 (16:58 +0000)]
vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.
This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.
On machines with SMAP, fueword executes two serializing instructions
which can be seen in microbenchmarks.
As a measure to restore microbenchmark numbers, only read the word on
the attempt to deliver signal in ast(). If the word is set, signal is
not delivered and word is kept, preventing interruption of
interruptible sleeps by signals until userspace calls
sigfastblock(UNBLOCK) which clears the word.
This way, the spurious EINTR that userspace can see while in critical
section is on first interruptible sleep, if a signal is pending, and
on signal posting. It is believed that it is not important for rtld
and libthr critical sections. It might be visible for the application
code e.g. for the callback of dl_iterate_phdr(3), but again the belief
is that the non-compliance is acceptable. Most important is that the
retry of the sleeping syscall does not interrupt unless additional
signal is posted.
For now I added the knob kern.sigfastblock_fetch_always to enable the
word read on syscall entry to be able to diagnose possible issues due
to spurious EINTR.
While there, do some code restructuring to have all sigfastblock()
handling located in kern_sig.c.
Reviewed by: jeff
Discussed with: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23622
Bjoern A. Zeeb [Thu, 20 Feb 2020 10:56:12 +0000 (10:56 +0000)]
ip6_output: improve extension header handling
Move IPv6 source address checks from after extension header handling
to the top of the function. If we do not pass these checks there is
no reason to do a lot of work upfront.
Fold extension header preparations and length calculations together into
a single branch and macro rather than doing them sequentially.
Likewise move extension header concatenation into a single branch block
only doing it if we recorded any extension header length.
The ABI changed between ncurses 5 and 6. While ncurses 6 is theoretically buildable with
backward compatibility, I failed to build it in a way where applications linked against
the previous version of ncurses render properly.
Let's go with the new ABI, which provides all the latest features.
A compat12x package is in the works for backward compatibility.
Adrian Chadd [Thu, 20 Feb 2020 07:12:43 +0000 (07:12 +0000)]
[ath] Attempt to fix epoch handling.
The epoch stuff with taskqueues works fine if the driver never calls
the receive path in other contexts, but this driver does. If there was
a chip reset during active receive then part of the reset will call
the receive path to flush out any active packets before reinitialising
the receive queue and that needs to be done with the epoch held.
So:
* make the receive task a normal task again
* explicitly call epoch enter/exit around the legacy and newer DMA
receive paths
* add a couple of epoch asserts to ensure that the receive packet
path itself is called with epoch held.
This fixes it on my Atom eeepc laptop (circa 2010!) on which I did
all of my initial 802.11n work for this driver and net80211.
Tested:
* AR9285, STA mode
TODO:
* Test on EDMA chipset (AR9380)
* Test in AP/adhoc modes, just to be sure (eg for beacon
receive processing in particular.)
Pedro F. Giffuni [Thu, 20 Feb 2020 03:54:07 +0000 (03:54 +0000)]
/etc/services: attempt to bring the database into this century.
Document this file better, updating the URL to the IANA registry and matching
the official services more closely.
For system ports (0 to 1023) we now try to follow the registry closely, noting
some historical differences where applicable.
For the User ports (1024 - 49151) we try to keep only services that are
likely to be found on FreeBSD/UNIX systems. This attempts
to strike a balance between complexity and usefulness.
As a side effect: drop references to unofficial Kerberos IV which was EOL'ed
on Oct 2006[1]. While it is conceivable some people may still use it in some
very old FreeBSD machines that can't be replaced easily, the use of it is
considered a security risk. Also drop the unofficial netatalk, which we
supported long ago in the kernel but was dropped long ago.
Hiroki Sato [Thu, 20 Feb 2020 03:01:27 +0000 (03:01 +0000)]
Improve performance of "read" built-in command when using a seekable
fd.
The read built-in command calls read(2) with a 1-byte buffer because
newline characters need to be detected even on a byte stream which
comes from a non-seekable file descriptor. Because of this, the
following script makes more than 6,000 read(2) calls to show a 6KiB file:
while read IN; do echo "$IN"; done < /COPYRIGHT
When the input byte stream is seekable, it is possible to read a data
block and then reposition the file pointer to where a newline
character was found. This change adds a small buffer to do this and
reduces the number of read(2) calls.
Theoretically, multiple built-in commands reading the same seekable
byte stream in a single pipe chain can share the buffer. However,
this change just makes a single invocation of the read built-in
allocate a buffer and deallocate it every time for simplicity.
Although this causes read(2) to read the same regions multiple times,
the performance penalty should be small compared to the reduction of
read(2) calls.
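The block-read-then-reposition idea can be sketched as follows. This is an illustration, not the code from sh(1): the names `read_line_seekable()`, `count_lines_demo()` and `SKETCH_BUFSZ` are hypothetical, and a line longer than one block is returned in pieces.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define	SKETCH_BUFSZ	64	/* hypothetical block size */

/*
 * Read one line (without its newline) from a seekable fd into out.
 * Returns the line length, or -1 on EOF or error.
 */
static ssize_t
read_line_seekable(int fd, char *out, size_t outsz)
{
	char buf[SKETCH_BUFSZ];
	size_t want, len, consumed;
	ssize_t n;
	char *nl;

	if (outsz < 2)
		return (-1);
	want = outsz - 1 < sizeof(buf) ? outsz - 1 : sizeof(buf);
	n = read(fd, buf, want);		/* one read(2) per block */
	if (n <= 0)
		return (-1);
	nl = memchr(buf, '\n', (size_t)n);
	len = nl != NULL ? (size_t)(nl - buf) : (size_t)n;
	consumed = nl != NULL ? len + 1 : (size_t)n;
	/* Move the file offset back to just after the newline. */
	if (consumed < (size_t)n &&
	    lseek(fd, (off_t)consumed - (off_t)n, SEEK_CUR) == -1)
		return (-1);
	memcpy(out, buf, len);
	out[len] = '\0';
	return ((ssize_t)len);
}

/* Usage example: count the lines of text by way of a temporary file. */
static int
count_lines_demo(const char *text)
{
	char tmpl[] = "/tmp/readsketch.XXXXXX";
	char line[SKETCH_BUFSZ];
	int fd, lines = 0;

	if ((fd = mkstemp(tmpl)) < 0)
		return (-1);
	unlink(tmpl);
	if (write(fd, text, strlen(text)) == (ssize_t)strlen(text) &&
	    lseek(fd, 0, SEEK_SET) == 0) {
		while (read_line_seekable(fd, line, sizeof(line)) >= 0)
			lines++;
	}
	close(fd);
	return (lines);
}
```

Because the offset is restored after every line, the next reader of the same descriptor starts exactly where this one stopped, at the cost of re-reading the tail of each block.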
Warner Losh [Thu, 20 Feb 2020 01:33:01 +0000 (01:33 +0000)]
Don't convert all lower-layer errors to EIO.
Don't convert all lower layer errors to EIO. Instead, pass the actual error up
the stack. This will allow the upper layers that look for ENXIO to react
properly to that signal from the lower layers and, for UFS, unmount the
filesystem.
Warner Losh [Thu, 20 Feb 2020 00:46:22 +0000 (00:46 +0000)]
Move smbios.c to libsa.
smbios used to be an i386 only kinda weird quirk to the x86
architecture. But UEFI picked it up, dusted it off and now it's in many
other locations. Make it base technology by moving it to libsa and
fixing up the compilation. The code has issues with unaligned access
still, but that will be addressed in a followup commit.
Warner Losh [Thu, 20 Feb 2020 00:46:16 +0000 (00:46 +0000)]
Create ptov() function.
Create a ptov() function. It's basically the same as the btx PTOV
macro, but works everywhere. smbios needs this to translate addresses,
but the translation differs between BIOS booting and EFI booting. Make
it a function so one smbios.o can be used everywhere. Provide
definitions for it in the two loaders affected.
Warner Losh [Thu, 20 Feb 2020 00:34:46 +0000 (00:34 +0000)]
Don't spam the console with an additional, and useless, error message.
There's no need to spam the console with this error message. If there's an I/O
error, the disk/cam driver will report it at the lower levels. If that's an
actual problem, the upper layers will report that.
Jeff Roberson [Wed, 19 Feb 2020 22:34:22 +0000 (22:34 +0000)]
Silence a gcc warning about no return from a function that handles every
possible enum in a switch statement. I verified that this emits nothing
as expected on clang. radix relies on constant propagation to eliminate
any branching from these access routines.
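For illustration, one common way to silence this class of warning is shown below. It is a sketch only: the enum and function names are hypothetical, and the commit may well have used a different construct (for example a panic) than the GCC/Clang builtin used here.

```c
/* Hypothetical access modes, standing in for the real enum. */
enum sketch_access { SKETCH_LOCKED, SKETCH_UNSERIALIZED, SKETCH_SMR };

static inline int
sketch_access_cost(enum sketch_access a)
{
	switch (a) {
	case SKETCH_LOCKED:
		return (2);
	case SKETCH_UNSERIALIZED:
		return (0);
	case SKETCH_SMR:
		return (1);
	}
	/*
	 * Every enumerator returns above, but gcc may still warn that
	 * control reaches the end of a non-void function.  Marking this
	 * point unreachable emits no code when the argument is a constant.
	 */
	__builtin_unreachable();
}
```

When the argument is a compile-time constant, constant propagation reduces the call to the single matching return value.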
Dimitry Andric [Wed, 19 Feb 2020 21:12:59 +0000 (21:12 +0000)]
Take LINKER_FREEBSD_VERSION from numerical field after dash
Summary:
With COMPILER_FREEBSD_VERSION, we use a numeric value that we bump each
time we make a change that requires re-bootstrapping, but with the
linker variant, we instead take the entire part after "FreeBSD", as in
this example version output:
We should only look at the numerical field we append after a dash
instead. This review attempts to make it so.
The only thing I am not happy about is the post-processing of awk output
in Makefile.inc1. I notice that our awk does not have gensub(), so it
can't substitute a numbered sub-regex with \1, \2, etc. Suggestions
welcome. :)
Jeff Roberson [Wed, 19 Feb 2020 19:58:31 +0000 (19:58 +0000)]
Use SMR to provide a safe unlocked lookup for vm_radix.
The tree is kept correct for readers with store barriers and careful
ordering. The existing object lock serializes writers. Consumers
will be introduced in later commits.
Jeff Roberson [Wed, 19 Feb 2020 18:48:46 +0000 (18:48 +0000)]
Use per-domain locks for the bucket cache.
This gives much better concurrency when there are a large number of
cores per-domain and multiple domains. Avoid taking the lock entirely
if it will not be productive. ROUNDROBIN domains will have mixed
memory in each domain and will load balance to all domains.
While here refactor the zone/domain separation and bucket limits to
simplify callers.
Kristof Provost [Wed, 19 Feb 2020 16:44:16 +0000 (16:44 +0000)]
bridge tests: Remove unneeded 'All rights reserved.'
The FreeBSD foundation no longer requires this, as per
https://lists.freebsd.org/pipermail/svn-src-all/2019-February/177215.html and
private communications.
Kyle Evans [Wed, 19 Feb 2020 14:52:32 +0000 (14:52 +0000)]
libsysdecode: grab shmflags from sys/mman.h, add decode method
Any SHM_* flag here is (and likely will continue to be) a shmflag that may
be passed to shm_open2(), with the exception of SHM_ANON. This is a prereq to
adding appropriate support to truss/kdump.
Kyle Evans [Wed, 19 Feb 2020 14:32:55 +0000 (14:32 +0000)]
kdump: decode SHM_ANON as first arg to legacy shm_open(2)
The first argument to shm_open(2) as well as shm_open2(2) may be a path or
SHM_ANON. Decode SHM_ANON, at least; paths will show up as namei results in
kdump output, which may be sufficient; in those cases, we'll have printed an
address.
Future commits will add support for shm_open2() to libsysdecode/truss/kdump.
environ(7) was in AT&T Version 7
ac(8): Add a HISTORY section
sa(8): Add a HISTORY section
sqrt(3): Add the actual sqrt function to the HISTORY section
Jeff Roberson [Wed, 19 Feb 2020 08:15:20 +0000 (08:15 +0000)]
Type validating smr protected pointer accessors.
This API is intended to provide some measure of safety with SMR
protected pointers. A struct wrapper provides type checking and
a guarantee that all access is mediated by the API unless abused. All
modifying functions take an assert as an argument to guarantee that
the required synchronization is present.
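The struct-wrapper idea can be sketched in portable C11. This is not the FreeBSD API: the type and function names below are hypothetical, and C11 atomics stand in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
	int value;
};

/*
 * Hypothetical wrapper type.  Because the pointer lives inside a struct,
 * plain assignment from or to a raw pointer fails to compile, so access
 * has to go through the accessors below.
 */
typedef struct {
	_Atomic(struct sketch_node *) ptr;
} sketch_node_ptr_t;

/* Reader side: load the protected pointer. */
static inline struct sketch_node *
sketch_ptr_load(sketch_node_ptr_t *p)
{
	return (atomic_load_explicit(&p->ptr, memory_order_acquire));
}

/* Writer side: the caller passes an assertion that it is serialized. */
static inline void
sketch_ptr_store(sketch_node_ptr_t *p, struct sketch_node *n,
    int writer_serialized)
{
	assert(writer_serialized);
	atomic_store_explicit(&p->ptr, n, memory_order_release);
}

/* Usage example: publish a node and read it back. */
static int
sketch_ptr_demo(void)
{
	static struct sketch_node n = { 42 };
	sketch_node_ptr_t slot;

	atomic_init(&slot.ptr, NULL);
	sketch_ptr_store(&slot, &n, 1);
	return (sketch_ptr_load(&slot)->value);
}
```

Passing the assertion as an argument makes each modifying call site state which synchronization it relies on.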
Hiroki Sato [Wed, 19 Feb 2020 06:28:55 +0000 (06:28 +0000)]
Add _BIX (Battery Information Extended) object support.
ACPI Control Method Batteries have a _BIF and/or _BIX object which
provide static properties of the battery. FreeBSD acpi_cmbat module
supported _BIF object only, which was deprecated as of ACPI 4.0.
_BIX is an extended version of _BIF defined in ACPI 4.0 or later.
As of writing, _BIX has two revisions. One is in ACPI 4.0 (rev.0) and
another is in ACPI 6.0 (rev.1). It seems that hardware vendors still
stick to _BIF only or _BIX rev.0 + _BIF for the maximum compatibility.
Microsoft requires _BIX rev.0 for Windows machines, so there are some
laptop machines with _BIX rev.0 only. In this case, FreeBSD does not
recognize the battery information.
After this change, the acpi_cmbat module gets battery information from
_BIX or _BIF object and internally uses _BIX rev.1 data structure as
the primary information store in the kernel. ACPIIO_BATT_GET_BI[FX]
returns an acpi_bi[fx] structure built by using information obtained
from a _BIF or a _BIX object found on the system. The revision number
field can be used to check which field is available. The acpiconf(8)
utility will show additional information if _BIX is available.
Although ABIs of ACPIIO_BATT_* were changed, the existing APIs for
userland utilities are not changed and the backward-compatible ABIs
are provided. This means that older versions of acpiconf(8) can also
work with the new kernel. The (union acpi_battery_ioctl_arg) was
padded to 256 byte long to avoid another ABI change in the future.
A _BIX object with its revision number >1 will be treated as
compatible with the rev.1 _BIX format.
Ryan Libby [Wed, 19 Feb 2020 04:46:41 +0000 (04:46 +0000)]
powerpc: unconditionally mark SLB zones UMA_ZONE_CONTIG
PR: 244118
Reported by: Francis Little <oggy at farscape.co.uk>
Tested by: Francis Little, Mark Millard <marklmi at yahoo.com>
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23729
Kyle Evans [Wed, 19 Feb 2020 02:34:56 +0000 (02:34 +0000)]
certctl(8): switch to install(1) to fix DESTDIR support
"Oops" - ln(1) is fine and dandy, but when you're using DESTDIR...it's not-
the path will almost certainly be invalid once the root you've just
installed to is relocated, perhaps to /.
Switch to install(1) using `-l rs` to calculate the relative symlink between
the two, which should work just fine in all cases.
Dimitry Andric [Tue, 18 Feb 2020 17:55:24 +0000 (17:55 +0000)]
Fix the following -Werror warning from clang 10.0.0:
sys/arm/allwinner/clkng/aw_clk_mipi.c:144:6: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
m++;
^
sys/arm/allwinner/clkng/aw_clk_mipi.c:142:5: note: previous statement is here
if (best == *fout)
^
Move the increment operations into the for loop headers instead.
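The shape of such a fix can be sketched with a hypothetical loop (this is not the code from aw_clk_mipi.c). With the increment in the loop header, no statement in the body can look as if it were guarded by the `if`.

```c
/*
 * Warning-prone shape, shown only as a comment:
 *
 *	for (m = 1; m <= max; ) {
 *		if (n % m == 0)
 *			best = m;
 *			m++;		<- indented as if part of the if
 *	}
 */

/* Hypothetical example: largest m in [1, max] that divides n, or -1. */
static int
largest_divisor_upto(int n, int max)
{
	int best = -1;

	for (int m = 1; m <= max; m++) {	/* increment in the header */
		if (n % m == 0)
			best = m;
	}
	return (best);
}
```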
Ed Maste [Tue, 18 Feb 2020 16:37:48 +0000 (16:37 +0000)]
remove old perl entries from ObsoleteFiles.inc
Each entry in ObsoleteFiles.inc adds to the time `make delete-old` and
friends take to run. Perl was removed from the FreeBSD base system a
very long time ago (FreeBSD 5); source updates have not been supported
from that version for years.
Perl was a single component responsible for thousands of entries so
provides significant benefit with little effort/investigation required.
We could still use a more comprehensive cleanup to remove old entries.
Also add an UPDATING note (with wordsmithing by imp) indicating that
`make delete-old` is required along each step of a source upgrade from
an old, unsupported release.
Discussed with: imp
Sponsored by: The FreeBSD Foundation
Cy Schubert [Tue, 18 Feb 2020 11:27:08 +0000 (11:27 +0000)]
This commit makes significant changes to pam_login_access(8) to bring it
up to par with the Linux pam_access(8).
Like the Linux pam_access(8) our pam_login_access(8) is a service module
for pam(3) that allows an administrator to limit access from specified
remote hosts or terminals. Unlike the Linux pam_access, pam_login_access
is missing some features which are added by this commit:
Access file can now be specified. The default remains /etc/access.conf.
The syntax is consistent with Linux pam_access.
By default usernames are matched. If the username fails to match, a match
against a group name is attempted. The new nodefgroup module option will
only match a username and no attempt to match a group name is made.
Group names must be specified in brackets, "()" when nodefgroup is
specified. Otherwise the old backward compatible behavior is used.
This is consistent with Linux pam_access.
A new field separator module option allows the replacement of the default
colon (:) with any other character. This facilitates potential future
specification of X displays. This is also consistent with Linux pam_access.
A new list separator module option to replace the default space/comma/tab
with another character. This too is consistent with Linux pam_access.
Linux pam_access options not implemented in this commit are the debug
and audit options. These will be implemented at a later date.
Cy Schubert [Tue, 18 Feb 2020 11:26:56 +0000 (11:26 +0000)]
When pam_login_access(5) fails to match a username it attempts to
match the primary group a user belongs to. This commit extends the
match to secondary groups a user belongs to as well, just as the Linux
pam_access(5) does.
Cy Schubert [Tue, 18 Feb 2020 11:26:52 +0000 (11:26 +0000)]
The words ALL, LOCAL, and EXCEPT have special meaning and are documented
as in the login.access(5) man page. However strcasecmp() is used to compare
for these special strings. Because of this, user accounts and groups with
the corresponding lowercase names are misinterpreted to have special
meaning whereas they should not.
This commit fixes this, conforming to the man page and to how the Linux
pam_access(8) handles these special words.
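The difference between the two comparisons is easy to demonstrate. The helper names below are hypothetical; only strcmp(3) and strcasecmp(3) are real.

```c
#include <string.h>
#include <strings.h>

/* Case-sensitive check: only the documented uppercase words match. */
static int
is_special_exact(const char *tok)
{
	return (strcmp(tok, "ALL") == 0 || strcmp(tok, "LOCAL") == 0 ||
	    strcmp(tok, "EXCEPT") == 0);
}

/* Case-insensitive check: a user or group named "all" also matches. */
static int
is_special_nocase(const char *tok)
{
	return (strcasecmp(tok, "ALL") == 0 || strcasecmp(tok, "LOCAL") == 0 ||
	    strcasecmp(tok, "EXCEPT") == 0);
}
```

With the case-insensitive version, an account literally named `all` or `local` is treated as a keyword, which is the misinterpretation described above.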
Cy Schubert [Tue, 18 Feb 2020 11:26:49 +0000 (11:26 +0000)]
As with ipf(8), give ippool(8) the ability to load IP pools from multiple
files. This allows for loading, during the same invocation of ippool, of
multiple sources of input using multiple tools to concurrently maintain the
files such as fail2ban, macro preprocessors, and manually.
Among the changes from before:
- Add support for extended colors on widechar version
- Enable ncurses extended functions
- Enable version 2 of the extended mouse support
- Enable SCREEN extensions
Modification that differs from upstream:
- _nc_delink_entries used to be exposed and was turned static;
turn it back to dynamic to not break the ABI
- Adapt our old termcap.c to modern ncurses
Hiroki Sato [Tue, 18 Feb 2020 01:50:44 +0000 (01:50 +0000)]
Use 0x5c for the scan code 0x7d.
Japanese keyboards traditionally use 0x5c for
both Japanese yen sign key and backslash key.
While a Japanese yen sign is depicted on the keytop,
most Japanese users expect that the scan code 0x7d gives
a backslash (0x5c), not a Japanese yen sign (0xa5).
This is because JIS X 0201 encoding (aka ISO/IEC 646-JA,
an extended version of ASCII which is very popular
in Japan) has Japanese yen sign at 0x5c and
no backslash. On the other hand, ISO/IEC 8859-1
has Japanese yen sign at 0xa5. This difference has
caused a confusion after Unicode became popular since
ISO/IEC 10646 adopted 8859-1 for the plane 0.
Chuck Silvers [Tue, 18 Feb 2020 00:02:20 +0000 (00:02 +0000)]
amd64: keep PTE bitmasks in sync with target pmap during pv reclaim
In reclaim_pv_chunk_domain(), when we switch to a new target pmap from which
we are trying to reclaim a pv chunk, always update the current PTE bitmasks
to match.
Dimitry Andric [Mon, 17 Feb 2020 20:24:21 +0000 (20:24 +0000)]
Merge r358042 from the clang1000-import branch:
Add casts and L suffixes to libc quad support, to work around various
-Werror warnings from clang 10.0.0, such as:
lib/libc/quad/fixdfdi.c:57:12: error: implicit conversion from 'long long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Werror,-Wimplicit-int-float-conversion]
if (x >= QUAD_MAX)
~~ ^~~~~~~~
/usr/obj/usr/src/powerpc.powerpc/tmp/usr/include/sys/limits.h:89:19: note: expanded from macro 'QUAD_MAX'
#define QUAD_MAX (__QUAD_MAX) /* max value for a quad_t */
^~~~~~~~~~
/usr/obj/usr/src/powerpc.powerpc/tmp/usr/include/machine/_limits.h:91:20: note: expanded from macro '__QUAD_MAX'
#define __QUAD_MAX __LLONG_MAX /* max value for a quad_t */
^~~~~~~~~~~
/usr/obj/usr/src/powerpc.powerpc/tmp/usr/include/machine/_limits.h:75:21: note: expanded from macro '__LLONG_MAX'
#define __LLONG_MAX 0x7fffffffffffffffLL /* max value for a long long */
^~~~~~~~~~~~~~~~~~~~
and many instances of:
lib/libc/quad/fixunsdfdi.c:73:17: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
toppart = (x - ONE_HALF) / ONE;
^~~~~~~~
lib/libc/quad/fixunsdfdi.c:45:19: note: expanded from macro 'ONE_HALF'
#define ONE_HALF (ONE_FOURTH * 2.0)
^~~~~~~~~~
lib/libc/quad/fixunsdfdi.c:44:23: note: expanded from macro 'ONE_FOURTH'
#define ONE_FOURTH (1 << (LONG_BITS - 2))
^ ~~~~~~~~~~~~~~~
lib/libc/quad/fixunsdfdi.c:73:29: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
toppart = (x - ONE_HALF) / ONE;
^~~
lib/libc/quad/fixunsdfdi.c:46:15: note: expanded from macro 'ONE'
#define ONE (ONE_FOURTH * 4.0)
^~~~~~~~~~
lib/libc/quad/fixunsdfdi.c:44:23: note: expanded from macro 'ONE_FOURTH'
#define ONE_FOURTH (1 << (LONG_BITS - 2))
^ ~~~~~~~~~~~~~~~
Dimitry Andric [Mon, 17 Feb 2020 20:14:59 +0000 (20:14 +0000)]
Add casts and L suffixes to libc quad support, to work around various
-Werror warnings from clang 10.0.0, such as:
lib/libc/quad/fixdfdi.c:57:12: error: implicit conversion from 'long long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Werror,-Wimplicit-int-float-conversion]
if (x >= QUAD_MAX)
~~ ^~~~~~~~
/usr/obj/usr/src/powerpc.powerpc/tmp/usr/include/sys/limits.h:89:19: note: expanded from macro 'QUAD_MAX'
#define QUAD_MAX (__QUAD_MAX) /* max value for a quad_t */
^~~~~~~~~~
/usr/obj/usr/src/powerpc.powerpc/tmp/usr/include/machine/_limits.h:91:20: note: expanded from macro '__QUAD_MAX'
#define __QUAD_MAX __LLONG_MAX /* max value for a quad_t */
^~~~~~~~~~~
/usr/obj/usr/src/powerpc.powerpc/tmp/usr/include/machine/_limits.h:75:21: note: expanded from macro '__LLONG_MAX'
#define __LLONG_MAX 0x7fffffffffffffffLL /* max value for a long long */
^~~~~~~~~~~~~~~~~~~~
and many instances of:
lib/libc/quad/fixunsdfdi.c:73:17: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
toppart = (x - ONE_HALF) / ONE;
^~~~~~~~
lib/libc/quad/fixunsdfdi.c:45:19: note: expanded from macro 'ONE_HALF'
#define ONE_HALF (ONE_FOURTH * 2.0)
^~~~~~~~~~
lib/libc/quad/fixunsdfdi.c:44:23: note: expanded from macro 'ONE_FOURTH'
#define ONE_FOURTH (1 << (LONG_BITS - 2))
^ ~~~~~~~~~~~~~~~
lib/libc/quad/fixunsdfdi.c:73:29: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
toppart = (x - ONE_HALF) / ONE;
^~~
lib/libc/quad/fixunsdfdi.c:46:15: note: expanded from macro 'ONE'
#define ONE (ONE_FOURTH * 4.0)
^~~~~~~~~~
lib/libc/quad/fixunsdfdi.c:44:23: note: expanded from macro 'ONE_FOURTH'
#define ONE_FOURTH (1 << (LONG_BITS - 2))
^ ~~~~~~~~~~~~~~~
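The two kinds of change named in the message, an `L` suffix and an explicit cast, can be sketched as follows. The `SKETCH_` macros and the function are hypothetical stand-ins, not the contents of lib/libc/quad.

```c
#include <limits.h>

#define	SKETCH_LONG_BITS	(sizeof(long) * CHAR_BIT)
/* 1L rather than 1: the shift is carried out in long, not in int. */
#define	SKETCH_ONE_FOURTH	(1L << (SKETCH_LONG_BITS - 2))
#define	SKETCH_ONE_HALF		(SKETCH_ONE_FOURTH * 2.0)
#define	SKETCH_ONE		(SKETCH_ONE_FOURTH * 4.0)

/*
 * LLONG_MAX (2^63 - 1) is not representable as a double and rounds up
 * to 2^63 under the usual round-to-nearest mode.  The explicit cast makes
 * that conversion visible instead of implicit.
 */
static int
sketch_exceeds_quad_max(double x)
{
	return (x >= (double)LLONG_MAX);
}
```

The comparison behaves the same with or without the cast; the cast only states the intended conversion so the compiler no longer flags it.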
Dimitry Andric [Mon, 17 Feb 2020 18:51:52 +0000 (18:51 +0000)]
Work around riscv buildworld failure where it cannot link the final
clang binary, with:
ld: error: undefined symbol: llvm::EnableABIBreakingChecks
>>> referenced by PlistDiagnostics.cpp
>>> PlistDiagnostics.o:(.sdata+0x0) in archive /usr/obj/usr/src/riscv.riscv64/lib/clang/libclang/libclang.a
[... many more like this ...]
Dimitry Andric [Mon, 17 Feb 2020 18:31:32 +0000 (18:31 +0000)]
Tentatively apply D23730:
Fix compile errors in altera_sdcard_io.c after r357647
Summary:
After rS357647, building universe results in compilation errors for
_.mips.BERI_DE4_SDROOT:
```
sys/dev/altera/sdcard/altera_sdcard_io.c: In function 'altera_sdcard_io_start_internal':
sys/dev/altera/sdcard/altera_sdcard_io.c:299:13: error: '*bp' is a pointer; did you mean to use '->'?
switch (*bp->bio_cmd) {
^~
->
sys/dev/altera/sdcard/altera_sdcard_io.c:301:38: error: '*bp' is a pointer; did you mean to use '->'?
altera_sdcard_write_cmd_arg(sc, *bp->bio_pblkno *
^~
->
sys/dev/altera/sdcard/altera_sdcard_io.c:307:42: error: '*bp' is a pointer; did you mean to use '->'?
altera_sdcard_write_rxtx_buffer(sc, *bp->bio_data,
^~
->
sys/dev/altera/sdcard/altera_sdcard_io.c:308:10: error: '*bp' is a pointer; did you mean to use '->'?
*bp->bio_bcount);
^~
->
sys/dev/altera/sdcard/altera_sdcard_io.c:309:38: error: '*bp' is a pointer; did you mean to use '->'?
altera_sdcard_write_cmd_arg(sc, *bp->bio_pblkno *
^~
->
sys/dev/altera/sdcard/altera_sdcard_io.c: In function 'altera_sdcard_io_start':
sys/dev/altera/sdcard/altera_sdcard_io.c:336:20: error: incompatible types when assigning to type 'struct bio *' from type 'struct bio'
sc->as_currentbio = *bp;
^
```
The first few are because `->` has a higher precedence than `*`, so the
expressions should use `(*bp)->foo` instead. I also renamed the
variable to `bpp` to make it clearer that it is a pointer-to-pointer.
The last one is because `sc->as_currentbio` is already a `struct bio *`,
there is no need to dereference `bp` there.
Last but not least, I would really suggest rewriting the
`altera_sdcard_io_start_internal()` function to just return success or
failure, so the caller can decide to set `bp` to NULL.
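The precedence point can be shown with a cut-down example. `struct sketch_bio` and the functions below are hypothetical; only the member name `bio_cmd` and the variable name `bpp` are borrowed from the text above.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct bio. */
struct sketch_bio {
	int bio_cmd;
};

/*
 * bpp is a pointer-to-pointer.  Writing *bpp->bio_cmd would parse as
 * *(bpp->bio_cmd), because -> binds tighter than unary *, and fails to
 * compile.  The parentheses dereference bpp first.
 */
static int
sketch_io_start(struct sketch_bio **bpp)
{
	int cmd = (*bpp)->bio_cmd;

	*bpp = NULL;		/* clear the caller's pointer */
	return (cmd);
}

/* Usage example: returns the command if the pointer was cleared. */
static int
sketch_io_demo(void)
{
	struct sketch_bio bio = { 7 };
	struct sketch_bio *bp = &bio;
	int cmd = sketch_io_start(&bp);

	return (bp == NULL ? cmd : -1);
}
```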
Andrew Turner [Mon, 17 Feb 2020 15:32:21 +0000 (15:32 +0000)]
Use EARLY_DRIVER_MODULE in the acpi bus.
We need this to use EARLY_DRIVER_MODULE in child drivers on arm64. This
should be a no-op on x86 as it has DRIVER_MODULE in the nexus driver making
all later drivers attach in the last pass.
Mark Johnston [Mon, 17 Feb 2020 15:11:07 +0000 (15:11 +0000)]
Remove swblk_t.
It was used only to store the bounds of each swap device. However,
since swblk_t is a signed 32-bit int and daddr_t is a signed 64-bit
int, swp_pager_isondev() may return an invalid result if swap devices
are repeatedly added and removed and sw_end for a device ends up
becoming a negative number.
Note that the removed comment about maximum swap size still applies.
Reviewed by: jeff, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23666
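The width mismatch behind the swblk_t removal can be illustrated in isolation. This is a sketch with hypothetical helper names and does not reproduce swp_pager_isondev(). Converting an out-of-range value to a signed type is implementation-defined in C; GCC and Clang wrap modulo 2^32, which is what the last helper assumes.

```c
#include <stdint.h>

/* Nonzero if a 64-bit block number is representable in 32 signed bits. */
static int
sketch_fits_in_int32(int64_t blk)
{
	return (blk >= INT32_MIN && blk <= INT32_MAX);
}

/*
 * Narrow a 64-bit block number to 32 signed bits.
 * Assumption: the compiler wraps out-of-range values (GCC, Clang).
 */
static int32_t
sketch_narrow(int64_t blk)
{
	return ((int32_t)blk);
}
```

A block number of 2^31 no longer fits and, on the assumed compilers, comes back negative, so a bounds check written against the 32-bit value can give the wrong answer.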
Mark Johnston [Mon, 17 Feb 2020 15:10:41 +0000 (15:10 +0000)]
Fix a swap block allocation race.
putpages' allocation of swap blocks is done under the global sw_dev
lock. Previously it would drop that lock before inserting the allocated
blocks into the object's trie, creating a window in which swap blocks
are allocated but are not visible to swapoff. This can cause
swp_pager_strategy() to fail and panic the system.
Fix the problem bluntly, by allocating swap blocks under the object
lock.
Reviewed by: jeff, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23665
Mark Johnston [Mon, 17 Feb 2020 15:09:40 +0000 (15:09 +0000)]
Fix object locking races in swapoff(2).
swap_pager_swapoff_object()'s goal is to allocate pages for all valid
swap blocks belonging to the object, for which there is no resident
page. If the page corresponding to a block is already resident and
valid, the block can simply be discarded.
The existing implementation tries to minimize the number of I/Os used.
For each cluster of swap blocks, it finds maximal runs of valid swap
blocks not resident in memory, and valid resident pages. During this
processing, the object lock may be dropped in several places: when
calling getpages, or when blocking on a busy page in
vm_page_grab_pages(). While the lock is dropped, another thread may
free swap blocks, causing getpages to page in stale data.
Fix the problem following a suggestion from Jeff: use getpages'
readahead capability to perform clustering rather than doing it
ourselves. This simplifies the code a bit without reintroducing the old
behaviour of performing one I/O per page.
Reviewed by: jeff
Reported by: dhw, gallatin
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23664
Michael Tuexen [Mon, 17 Feb 2020 14:54:21 +0000 (14:54 +0000)]
Don't use uninitialised stack memory if the sysctl variable
net.inet.tcp.hostcache.enable is set to 0.
The bug resulted in using possibly a too small MSS value or wrong
initial retransmission timer settings. Possibly the value used
for ssthresh was also wrong.
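This class of bug and one way to avoid it can be sketched as follows. All names are hypothetical and the values are placeholders; the sketch does not reproduce the TCP host cache code.

```c
#include <string.h>

/* Hypothetical result structure filled in by a cache lookup. */
struct sketch_metrics {
	unsigned mss;
	unsigned ssthresh;
	unsigned rtt;
};

static int sketch_cache_enabled = 0;	/* stands in for the sysctl */

static void
sketch_cache_lookup(struct sketch_metrics *m)
{
	/*
	 * Zero the result first, so the early return below hands back
	 * defined values instead of whatever was on the caller's stack.
	 */
	memset(m, 0, sizeof(*m));
	if (!sketch_cache_enabled)
		return;
	m->mss = 1460;		/* placeholder values for the sketch */
	m->ssthresh = 65535;
	m->rtt = 100;
}

/* Usage example: the caller does not initialise its stack variable. */
static unsigned
sketch_lookup_mss(void)
{
	struct sketch_metrics m;

	sketch_cache_lookup(&m);
	return (m.mss);
}
```

With the cache disabled the caller sees zeroes, which it can treat as "no cached value" rather than acting on garbage.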
pciconf: List names of all known extended PCIe capabilities.
Some ids are redundant because the list_ecaps() function decodes them
by explicit switch case. But listing them all makes it easier to not
miss ecaps, while not changing the functionality.
Initial submission by: Dmitry Luhtionov <dmitryluhtionov@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Bjoern A. Zeeb [Mon, 17 Feb 2020 11:08:50 +0000 (11:08 +0000)]
Partially revert VNET change and expand VNET structure.
Revert parts of r353274 replacing vnet_state with a shutdown flag.
Not having the state flag for the current SI_SUB_* makes it harder to debug
kernel or module panics related to VNET bringup or teardown.
Not having the state also does not allow us to check for other dependency
levels between components, e.g. for moving interfaces.
Expand the VNET structure with the new boolean flag indicating that we are
doing a shutdown of a given vnet and update the vnet magic cookie for the
change.
Update libkvm to compile with a bool in the kernel struct.
Bump __FreeBSD_version for (external) module builds to more easily detect
the change.
Fix kernel panic while trying to read multicast stream.
When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set
for all mbufs being input by the IGMP/MLD6 code. Else there will be a
NULL-pointer dereference in the netisr code when trying to set the
VNET based on the incoming mbuf. Add an assert to catch this when
queueing mbufs on a netisr to make debugging of similar cases easier.
Found by: Vladislav V. Prodan
PR: 244002
Reviewed by: bz@
MFC after: 1 week
Sponsored by: Mellanox Technologies