Add support for INET6 addresses to the kernel code that dumps open/lock state.
PR#223036 reported that INET6 callback addresses were not printed by
nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure,
so that nfsdumpstate(8) can print them out, post-r346190.
The patch also includes the addition of #ifdef INET, INET6 as requested
by bz@.
Followup to -r344552 in which fsck_ffs checks for a size past the
last allocated block of the file and if that is found, shortens the
file to reference the last allocated block thus avoiding having it
reference a hole at its end.
This update corrects an error where fsck_ffs miscalculated the last
logical block of the file when the file contained a large hole.
When sending IPv4 packets on a SOCK_RAW socket using the IP_HDRINCL option,
ensure that the ip_hl field is valid. Furthermore, ensure that the complete
IPv4 header is contained in the first mbuf. Finally, move the length checks
before relying on them when accessing fields of the IPv4 header.
Reported by: jtl@
Reviewed by: jtl@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19181
Move mpr/mps drivers from per-arch NOTES files into the MI notes
file. They are in more arches they they aren't. Add appropriate
nodevice directives in powerpc and arm.
cem [Sat, 13 Apr 2019 04:42:17 +0000 (04:42 +0000)]
sort(1): Memoize MD5 computation to reduce repeated computation
Experimentally, reduces sort -R time of a 148160 line corpus from about
3.15s to about 0.93s on this particular system.
There's probably room for improvement using some digest other than md5, but
I don't want to look at sort(1) anymore. Some discussion of other possible
improvements in the Test Plan section of the Differential.
Summary:
Initial NUMA support:
- associate CPU with domain
- associate memory ranges with domain
- identify domain for devices
- limit device interrupt binding to appropriate domain
- Additionally fixes a bug in the setting of Maxmem which led to
only memory attached to the first socket being enabled for DMA
A pmap variant can opt in to numa support by by calling `numa_mem_regions`
at the end of pmap_bootstrap - registering the corresponding ranges with the
VM.
This yields a ~20% improvement in build times of llvm on dual socket POWER9
over non-NUMA.
powerpc/dtrace: Fix dtrace powerpc asm, and simplify stack walking
Fix some execution bugs in the dtrace powerpc asm. addme pulls in the carry
flag which we don't want, and the result wasn't recorded anyways, so the
following beq to check for exit condition wasn't checking the right
condition.
Simplify the stack walking in dtrace_isa.c, so there's only a single walker
that handles both pc and sp. This should make it easier to follow, and any
bugfix that may be needed for walking only needs to be made in one place
instead of two now.
Forgot to add the changes for DELAY(), which lowers priority during the
delay period. Also, mark the timebase read as volatile so newer GCC does
not optimize it away, as it reportedly does currently.
powerpc: Adjust priority NOPs, and make them functions
PowerISA 2.07 and PowerISA 3.0 both specify special NOPs for priority
adjustments, with "medium" priority being normal. We had been setting
medium-low as our normal priority. Rather than guess each time as to what
we want and the right NOP, wrap them in inline functions, and replace the
occurrances of the NOPs with the functions. Also, make DELAY() drop to very
low priority while waiting, so we don't burn CPU.
Coupled with r346143, this shaves off a modest 5-8% on buildworld times with
-j72. There may be more room for improvement with judicious use of these
NOPs.
powerpc64: Increase the nap level on power9 idling
The POWER9 documentation specifies that levels 0-3 are the 'lightest' sleep
level, meaning lowest latency and with no state loss. However, state 3 is
not implemented, and is instead reserved for future chips. This now
properly configures the PSSCR, specifying state 2 as the lowest level to
enter, but request level 0 for quickest sleep level. If the OCC determines
that the CPU can enter states 1 or 2 it will trigger the transition to those
states on demand.
It was pointed out that manually loading a .dtb to be used rather than
relying on platform-specific method for loading .dtb will result in overlays
not being applied. This was true because overlay loading was hacked into
fdt_platform_load_dtb, rather than done in a way more independent from how
the .dtb is loaded.
Instead, push overlay loading (for now) out into an
fdt_platform_load_overlays. This method easily allows ubldr to pull in any
fdt_overlays specified in the ub env, and omits overlay-checking on
platforms where they're not tested and/or not desired (e.g. powerpc). If we
eventually stop caring about fdt_overlays from ubenv (if we ever cared),
this method should get chopped out in favor of just calling
fdt_load_dtb_overlays() directly.
Reported by: Manuel Stühn (freebsdnewbie freenet de)
Cirrus-CI: pass OVMF env var to test script for upcoming changes
In review D19876 ian@ has some proposed improvements to the
tools/boot/ci-qemu-test.sh script. Start specifying the location of
OVMF.fd fetched by the Cirrus-CI build in advance of those changes.
Reinitialize multicast source filter structures after invalidation.
When leaving a multicast group, a hole may be created in the inpcb's
source filter and group membership arrays. To remove the hole, the
succeeding array elements are copied over by one entry. The multicast
code expects that a newly allocated array element is initialized, but
the code which shifts a tail of the array was leaving stale data
in the final entry. Fix this by explicitly reinitializing the last
entry following such a copy.
Reported by: syzbot+f8c3c564ee21d650475e@syzkaller.appspotmail.com
Reviewed by: ae
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19872
cem [Thu, 11 Apr 2019 05:08:49 +0000 (05:08 +0000)]
sort(1): Simplify and bound random seeding
Bound input file processing length to avoid the issue reported in [1]. For
simplicity, only allow regular file and character device inputs. For
character devices, only allow /dev/random (and /dev/urandom symblink).
32 bytes of random is perfectly sufficient to seed MD5; we don't need any
more. Users that want to use large files as seeds are encouraged to truncate
those files down to an appropriate input file via tools like sha256(1).
(This does not change the sort algorithm of sort -R.)
Implement CMD53 block mode support for SDHCI and AllWinner-based boards
If a custom block size requested, use it, otherwise revert to the previous logic
of using just a data size if it's less than MMC_BLOCK_SIZE, and MMC_BLOCK_SIZE otherwise.
Add new fields to mmc_data in preparation to SDIO CMD53 block mode support
SDIO command CMD53 (IO_RW_EXTENDED) allows data transfers using blocks of 1-2048 bytes,
with a maximum of 511 blocks per request.
Extend mmc_data structure to properly describe such requests,
and initialize the new fields in kernel and userland consumers.
No actual driver changes happen yet, these will follow in the separate changes.
manu [Wed, 10 Apr 2019 19:18:05 +0000 (19:18 +0000)]
arm: dts: Remove some old DTS
RPI is using the firmware provided DTS since 12.0
Pandaboard works with the Linux DTS
RK* Exynos* and Meson*/Odroid* don't even work with current
source code, if someone wants to make them work again they
better use the Linux DTS.
Fix a small bug in the tcp_log_id where the bucket
was unlocked and yet the bucket-unlock flag was not
changed to false. This can cause a panic if INVARIANTS
is on and we go through the right path (though rare).
This fixes the correct bug :)
libbe(3): use libzfs name validation for datasets/snapshot names
Our home-rolled solution didn't quite capture all of the details, and we
didn't actually validate snapshot names at all. zfs_name_valid captures the
important details, but it doesn't necessarily expose the errors that we're
wanting to see in the be_validate_* functions. Validating lengths
independently, then the names, should make this a non-issue.
Refine r330113 to honor the ProducerConsumer flag most of the time.
While it is true that the ACPI spec says that the flag is only valid
on Extended Address Space Descriptors, examples of other descriptors
in the spec use the ProducerConsumer flag explicitly, and real
hardware uses it as well. In fact, even in the ASL of the Thunder X2
for which r330113 was a workaround, some devices use this flag on
non-Extended Address Space Descriptors correctly. Instead, only
ignore the flag for resources associated with the UART devices on the
Thunder X2 using the "ARMH0011" HID to identify these devices.
This should fix regressions from ignoring this flag in other contexts
such as Hyper-V.
Fix dirty buf exhaustion easily triggered with msdosfs.
If truncate(2) is performed on msdosfs file, which extends the file by
system-depended large amount, fs creates corresponding amount of dirty
delayed-write buffers, which can consume all buffers. Such buffers
cannot be flushed by the bufdaemon because the ftruncate() thread owns
the vnode lock. So the system runs out of free buffers, and even
truncate() thread starves, which means deadlock because it owns the
vnode lock.
Fix this by doing vnode fsync in extendfile() when low memory or low
buffers condition detected, which flushes all dirty buffers belonging
to the file being extended.
Note that the more usual fallback to bawrite() does not work
acceptable in this situation, because it would only allow one buffer
to be recycled. Other filesystems, most important UFS, do not allow
userspace to create arbitrary amount of dirty delayed-write buffers
without feedback, so bawrite() is good enough for them.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Don't pre-reserve resources for CPU devices when they are set.
CPUs can use shared (RF_SHAREABLE) resources for the I/O port used for
entering and exiting C states. If this I/O port is included in an ACPI
system resource device, then this happens to still work, but if the port
wasn't part of a system resource device, only the first CPU could allocate
the I/O port and use C states since resource_list_reserve() was always
allocating the resource from nexus0 without RF_SHAREABLE. By avoiding
the reservation, the flags from the bus_alloc_resource() in the CPU driver
(which include RF_SHAREABLE) are honored.
pci_cfgreg.c: Use io port config access for early boot time.
Some early PCIe chipsets are explicitly listed in the white-list to
enable use of the MMIO config space accesses, perhaps because ACPI
tables were not reliable source of the base MCFG address at that time.
For that chipsets, MCFG base was read from the known chipset MCFGbase
config register.
During very early stage of boot, when access to the PCI config space
is performed (see e.g. pci_early_quirks.c), we cannot map 255MB of
registers because the method used with pre-boot pmap overflows initial
kernel page tables.
Move fallback to read MCFGbase to the attachment method of the
x86/legacy device, which removes code duplication, and results in the
use of io accesses until MCFG is parsed or legacy attach called.
For amd64, pre-initialize cfgmech with CFGMECH_1, right now we
dynamically assign CFGMECH_1 to it anyway, and remove checks for
CFGMECH_NONE.
There is a mention in the Intel documentation for corresponding
chipsets that OS must use either io port or MMIO access method, but we
already break this rule by reading MCFGbase register, so one more
access seems to be innocent.
1. Not all kernels have netmap(4) support. Check for netmap(4) support before
attempting to run the tests via the `PLAIN_REQUIRE_KERNEL_MODULE(..)` macro.
2. Libraries shouldn't be added to LDFLAGS; they should be added to LIBADD
instead. This allows the build system to evaluate dependencies for sanity.
3. Sort some of the Makefile variables per bsd.README.
1., in particular, will resolve failures when running this testcase on kernels
lacking netmap(4) support, e.g., the i386 GENERIC kernels on ^/stable/11 and
^/stable/12.
Final cleanup routines shouldn't be called from testcases; it should be called
from the testcase cleanup routine.
Furthermore, `geli_test_cleanup` should take care of cleaning up geli providers
and the memory disks used for the geli providers. `geli_test_cleanup` will always
be executed whereas the equivalent logic in `geli_test_body`, may not have been
executed if the test failed prior to the logic being run.
Prior to this change, the test case was trying to clean up `$md` twice: once in
at the end of the test case body function, and the other in the cleanup function.
The cleanup function logic was failing because there wasn't anything to clean up
in the cleanup function and the errors weren't being ignored.
In some cases like NanoPI R1, its second USB ethernet
RTL8152 (chip version URE_CHIP_VER_4C10) doesn't
have hardwired MAC address, in other words, it is all zeros.
This commit fixes it by setting random MAC address
when MAC address is all zeros.
$() is more modern and also nests. Convert the mix of styles to using
only the former (although the latter was more common). It's the more
dominant style in other shell scripts these days as well.
In the unlinkat syscall, the operation is performed on the directory
descriptor, not the file descriptor. The file descriptor is used only for
verification so do not expect any additional capabilities on it.
Reported by: antoine
Tested by: antoine
Discussed with: kib, emaste, bapt
Sponsored by: Fudo Security
Fix URE_WDT6_SET_MODE value in the register definition.
Both linux and u-boot sources for RTL8152 driver has this value.
RTL8152 USB ethernet is used in NanoPI R1 board as second ethernet.
This fixes for me RTL8152 USB ethernet not detected problem after
reboot on NanoPI R1 board.
Both NetBSD and OpenBSD have a wrong value so far.
Fix copying of MEMBUFs to MEMBUFs. This case was implemented by using
the same code as the VIDBUF8 case, so it only worked for depths <= 8.
The 2 directions for copying between VIDBUFs and MEMBUFs worked by using
a Read/Write organization which makes the destination a VIDBUF so the
MEMBUF case was not reached, and the VIDBUF cases have already been fixed.
Fix this by removing "optimizations" for the VIDBUF8 case so that the
MEMBUF case can fall through to the general (non-segmented) case. The
optimizations were to duplicate code for the VIDBUF8 case so as to
avoid 2 multiplications by 1 at runtime. This optimization is not useful
since the multiplications are not in the inner loop.
Remove the same "optimization" for the VIDBUF8S case. It was even less
useful there since it duplicated more to do relatively less.
Fix restoring the geometry when recovering from an error. Just restore the
previous geometry, and don't do extra work to calculate the default geometry
so as to reset to that.
All variable assignments that start in column 1 have to be on a single
line for amd to build due to as weird dependency there (most likely it
can be fixed to use the new VARS_ONLY feature, but it isn't
today). usr.sbin/amd/include/Makefile calls
usr.sbin/amd/include/newvers.sh which does:
eval `LC_ALL=C egrep '^[A-Z]+=' $1 | grep -v COPYRIGHT`
which is where that requirement comes from. It handles COPYRIGHT since
that's an exception. Rather than add additional exceptions, cope with
the long line in newvers.sh instead. Note: it no longer needs to
filter COPYRIGHT because the assignment doesn't start in column 1
anymore.
I had done a universe when I had an earlier version of r346018 that
had it as one line. When I changed it to multi-line as suggested in
the review, I only built kernels on a couple of architectures to make
sure it didn't break anything.
In certain scenarios, it is possible for PCPU data to be
accessed before it has been initialized (e.g. during printf
if the kernel was built with the TSLOG option).
Initialize the PCPU pointer for hart 0 at the beginning of
initriscv() rather than near the end.
Such processes will be reparented to the reaper when the current
parent is done with them (i.e., ptrace detached), so p_oppid must be
updated accordingly.
Add a regression test to exercise this code path. Previously it
would not be possible to reap an orphan with a stale oppid.
Reviewed by: kib, mjg
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19825
loader: mod_loadkld() error: we previously assumed 'last_file' could be null
The last_file variable is used to reset the loadaddr variable back to original
value; however, it is possible the last_file is NULL, so we can not blindly
trust it. But then again, we can just save the original loadaddr and use
the saved value for recovery.