CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

iwm(4): Fix version string formatting.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation

Move ipf_pcksum6() to its rightful place, in ip_fil_freebsd.c. This
FreeBSD-only function should live in the O/S specific source file.

This essentially reverts r349929 Now that ipftest and ipfreplay are
disabled in FreeBSD 11-stable.

MFC after: 3 days

Save a little stack by removing a used once intermediate variable.

MFC after: 3 days

Remove redundant #ifdef'd function definitions.

MFC after: 3 days

Fix a logic bug when "mask" contains a ?: operator.

Newer versions of clang warn that '&' evaluates before '?:'.

Reviewed by: markj
MFC after: 3 days
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22573

bus_dma_dmar_load_ident(9): load identity mapping into the map.

Requested, reviewed and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22559

uma: trash memory when ctor/dtor supplied too

On INVARIANTS kernels, UMA has a use-after-free detection mechanism.
This mechanism previously required that all of the ctor/dtor/uminit/fini
arguments to uma_zcreate() be NULL in order to function. Now, it only
requires that uminit and fini be NULL; now, the trash ctor and dtor will
be called in addition to any supplied ctor or dtor.

Also do a little refactoring for readability of the resulting logic.

This enables use-after-free detection for more zones, and will allow for
simplification of some callers that worked around the previous
restriction (see kern_mbuf.c).

Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20722

Plug two mbuf leaks during INIT-ACK handling.
One leak happens when there is not enough memory to allocate the
the resources for streams. The other leak happens if the are
unknown parameters in the received INIT-ACK chunk which require
reporting and the INIT-ACK requires sending an ABORT due to illegal
parameter combinations.
Hopefully this fixes
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=19083

MFC after: 1 week

Clean up and clarify meta commentary on TAA. Add a state to denote
that TSX doesn't exist on the CPU.

MFC after: 3 days
Sponsored by: Intel

Support kernels larger than EFI_STAGING_SIZE in loader.efi

With a very large kernel or module the staging area may be too small to
hold it. When this is the case try to allocate more space before failing
in the efi copyin/copyout/readin functions.

Reviewed by: imp, tsoome
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22569

Fix typos.

MFC after: 2 weeks
Sponsored by: Klara, Inc

Add support for dummy ESP packets with next header field equal to
IPPROTO_NONE.

According to RFC4303 2.6 they should be silently dropped.

Submitted by: aurelien.cazuc.external_stormshield.eu
MFC after: 10 days
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D22557

Update leap-seconds to leap-seconds.3676924800.

Obtained from: ftp://ftp.nist.gov/pub/time/leap-seconds.3676924800
MFC after: 3 days

witness: sleepable rm locks are not sleepable in read mode

There are two classes of rm lock, one "sleepable" and one not. But even
a "sleepable" rm lock is only sleepable in write mode, and is
non-sleepable when taken in read mode.

Warn about sleepable rm locks in read mode as non-sleepable locks. Do
this by defining a new lock operation flag, LOP_NOSLEEP, to indicate
that a lock is non-sleepable despite what the LO_SLEEPABLE flag would
indicate, and defining a new witness lock instance flag, LI_SLEEPABLE,
to track the product of LO_SLEEPABLE and LOP_NOSLEEP on the lock
instance.

Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22527

cache: stop reusing .. entries on enter

It almost never happens in practice anyway. With this eliminated ->nc_vp
cannot change vnodes, removing an obstacle on the road to lockless
lookup.

cache: fix numcache accounting on entry

. entries are never created and .. can reuse existing entries,
meaning the early count bump is both spurious and leading to
overcounting in certain cases.

cache: hide "doingcache" behind DEBUG_CACHE

Use atomics in more cases for object references.  We now can completely
omit the object lock if we are above a certain threshold.  Hold only a
single vnode reference when the vnode object has any ref > 0.  This
allows us to only lock the object and vnode on 0-1 and 1-0 transitions.

Differential Revision: https://reviews.freebsd.org/D22452

Refactor uma_zalloc_arg(). It is a mess of gotos and code which doesn't
make sense after many partial refactors. Attempt to make a smaller cache
footprint for the fast path.

Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D22470

The fdlibm hypot() implementations shouldn't potentially left-shift
negative numbers (invoking undefined behavior)

Summary:
Various paths through hypot(x, y) will multiply x and y by a power of
two, perform the calculation in a range where IEEE-754 provides greater
precision, then undo the multiplication to determine the true result.
Undoing that multiplication is implemented as t1*w, where t1=2**k.

2**k is often computed by taking the high word of 1.0, then adding k<<20
(for doubles or long doubles) or k<<23 (for floats) to it, then
overwriting that high word. But when k is negative this left-shifts a
negative value -- and that's undefined behavior in many editions of C
and C++.

This patch should fix all hypot implementations to compute 2**k without
triggering this particular bit of undefined behavior.

Test Plan: I've only very lightly tested out the hypot(double, double)
change, in SpiderMonkey's JavaScript engine, for consistency with prior
behavior. The other functions' changes have more or less only been
eyeballed. Careful examination appreciated! Do note, however, that an
error in any of these changes would most likely produce a value that is
incorrect by a factor of two, so any mistake would most likely be
glaring if invoked.

Submitted by: Jeff Walden <jwalden@mit.edu>
Obtained from: https://github.com/freebsd/freebsd/pull/414
Reviewed by: dim, lwhsu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22354

stop building arm LINT-V5 kernel

r354290 removed arm.arm from universe, but arm.arm kernels were still
found and built during the kernel stage. r354934 tagged armv5 kernel
configs as NO_UNIVERSE, but LINT-V5 remained. Stop building it as well.
Leave the clean rule in place for now so folks don't end up with a stale
LINT-V5.

Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22560

Finish implementation of RK3299 clocks.
- implement of all but mmc clocks. MMC clocks will be added later by own commit.
- use 'link' clock type for external clocks.
- use macros for initialization of structure's named members.

MFC after: 3 weeks
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D22441

Revert r355021. In my haste to grep for Giant, I missed that it was in
conditional ifdefs for this driver. We will consider removing those ifdefs
in the future.

Reported by: imp

Add some IDs of Intel Wildcat Point-LP.

MFC after: 1 week

Update Makefile.inc1 dtc comment

We use the BSDL dtc by default now (as long as we're using a C++11
compiler).

Fix panic when loading kernel modules before root file system is mounted.
Make sure the rootvnode is always NULL checked.

Differential Revision: https://reviews.freebsd.org/D22545
PR: 241639
MFC after: 1 week
Sponsored by: Mellanox Technologies

cxgbe(4): Allow the driver to specify multiple FECs that the firmware
should try in order to link up with the peer.

Various FEC variables within the driver can now have multiple bits set
instead of being powers of 2. 0 and -1 in the user knobs still mean no
FEC and auto (driver decides) respectively for backward compatibility,
but no-FEC and auto now have their own bits in the internal
representation. There is a new bit that can be set to request the FEC
recommended by the cable/transceiver module.

Add sysctls to display link related capabilities of the local side as
well as the link partner.

Note that all this needs a new firmware and the documentation for the
driver FEC knobs will be updated after that firmware is added to the
driver.

MFC after: 1 week
Sponsored by: Chelsio Communications

ping, ping6: Use setitimer(2) instead of obsolete alarm(3)

Submitted by: Ján Sučan <sucanjan@gmail.com>
Differential Revision: https://reviews.freebsd.org/D22103

cfi: #include <limits.h> for ULONG_MAX after r355101

Reported by: rlibby
MFC with: r355101

in_mcast.c: need if_addr_lock around inm_release_deferred

Apply a similar fix as for in6_mcast.c.

Reviewed by: hselasky
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20740

in6_joingroup_locked: need if_addr_lock around in6m_disconnect_locked

It looks like the call that requires the lock was introduced in r337866.

Reviewed by: hselasky
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20739

cfi: check for inter overflow in cfi_devioctl

Reported by: Pietro Oliva
Reviewed by: markj
MFC after: 3 days
Security: Possible OOB read in root-only ioctl
Sponsored by: The FreeBSD Foundation

Allow opt-out of automatic ntpd leapfile checking/fetching.

When a system has no internet connection, or when it is configured to obtain
ntpd leapfiles from some source other than the internet, or even when the
sysadmin has decided for some reason to customize ntp.conf to eliminate use
of the leapfile, the rc.d/ntpd script emits various error messages related
to the file.

This change allows setting the rc var ntp_db_leapfile to NONE to disable all
automatic processing related to that file in rc.d/ntpd.

Differential Revision: https://reviews.freebsd.org/D22461

In ffs_freefile(), use a separate variable to hold the inode number within
the cg rather than reusuing "ino" for this purpose. This reduces the diff
for an upcoming change that improves handling of I/O errors.

No functional change.

Reviewed by: mckusick
Approved by: mckusick (mentor)
Sponsored by: Netflix

procdesc: allow to collect status through wait(1) if process is traced

The debugger like truss(1) depends on the wait(2) syscall. This syscall
waits for ALL children. When it is waiting for ALL child's the children
created by process descriptors are not returned. This behavior was
introduced because we want to implement libraries which may pdfork(1).

The behavior of process descriptor brakes truss(1) because it will
not be able to collect the status of processes with process descriptors.

To address this problem the status is returned to parent when the
child is traced. While the process is traced the debugger is the new parent.
In case the original parent and debugger are the same process it means the
debugger explicitly used pdfork() to create the child. In that case the debugger
should be using kqueue()/pdwait() instead of wait().

Add test case to verify that. The test case was implemented by markj@.

Reviewed by: kib, markj
Discussed with: jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20362

update comment (about llvm target config) to match r355095

remove armv6 LLVM workaround introduced in r341812

r341812 enabled only arm target support in LLVM on arm and armv6,
because ld.bfd 2.17.50 lacked support for range extensions required for
linking such large binaries/libraries. r341812 indicated that the
workaround should be removed once the userland can be linked by lld.

r354289 switched armv6 to use lld by default, so remove the workaround
on armv6. The workaround remains in place for arm (v5), and will
presumably be removed when arm is retired.

Sponsored by: The FreeBSD Foundation

[PPC64] Enable phyp vty use as a GDB DBGPORT

This change makes it possible to use a POWER Hypervisor virtual
terminal device (phyp vty) as a GDB debug port.

Similar to the uart debug port, it has to be enabled by setting
the hw.uart_phyp.dbgport variable to the vty node of the device
tree.

Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22205

Limit bus_dma_dmar_set_buswide() definition to kernel only.

The header is abused for inclusion into userspace, and on stable
branches neither device_t nor bool types are not defined when used
from userspace.

Sponsored by: The FreeBSD Foundation
X-MFC after: now

MFV r355071: libbsdxml (expat) 2.2.9.

MFC after: 2 weeks
Relnotes: yes

vm_object_collapse_scan_wait: drop locks before reacquiring

Regression from r352174. In the vm_page_rename() failure case we forgot
to unlock the vm object locks before sleeping and reacquiring them.

Reviewed by: jeff
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22542

sysctl sysctls: wire old buf before output with sysctl lock

Several sysctl sysctls output to a user buffer while holding a
non-sleepable lock that protects the sysctl topology. They need to wire
the output buffer, or else they may try to sleep on a page fault.

Reviewed by: cem, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22528

Move anonymous object copying for fork into its own routine and so that we
can avoid locking non-anonymous objects.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D22472

Where 'current' is used to index over vm_map entries, use
'entry'. Where 'entry' is used to identify the starting point for
iteration, use 'first_entry'. These are the naming conventions used in
most of the vm_map.c code. Where VM_MAP_ENTRY_FOREACH can be used, do
so. Squeeze a few lines to fit in 80 columns. Where lines are being
modified for these reasons, look to remove style(9) violations.

Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D22458

Report XLAT0 register for completeness.

Vendor import of expat 2.2.9

Allow kernel to compile without BPF.

r297816 added some bpf magic for VIMAGE unconditionally which no longer
allows kernels to compile without bpf (but with other networking).
Add the missing ifdef checks and allow a kernel to compile without bpf
again.

PR: 242136
Reported by: dave mischler.com
MFC after: 2 weeks

When doing ARM stack unwinding as part of stack_save(9), do not search
loaded modules (pass 0/false for the can_lock arg). Searching the unwind
info in modules acquires an exclusive sxlock, and the stack(9) functions can
be called in a context where unbounded sleeps are forbidden (such as from
the witness checkorder code).

Just ignoring the existence of modules in stack_save() is not ideal, so I'm
looking for a better solution, but this commit will make it possible to boot
an ARM kernel with WITNESS enabled again, until I get something better.

PR: 242200

Linux epoll: Allow passing of any negative timeout value to epoll_wait

Linux epoll allow passing of any negative timeout value to epoll_wait()
to cause unbound blocking

Reviewed by: emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22517

Linux epoll: Register events with zero event mask

Such an events are legal and should be interpreted as EPOLLERR | EPOLLHUP.
Register a disabled kqueue event in that case as we do not support EPOLLHUP yet.

Required by Linux Steam client.

PR: 240590
Reported by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22516

Linux epoll: Check both read and write kqueue events existence in EPOLL_CTL_ADD

Linux epoll EPOLL_CTL_ADD op handler should always check registration
of both EVFILT_READ and EVFILT_WRITE kevents to deceide if supplied
file descriptor fd is already registered with epoll instance.

Reviewed by: emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22515

Linux epoll: Don't deregister file descriptor after EPOLLONESHOT is fired

Linux epoll does not remove descriptor after one-shot event has been triggered.
Set EV_DISPATCH kqueue flag rather then EV_ONESHOT to get the same behavior.

Required by Linux Steam client.

PR: 240590
Reported by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22513

Ignore object->handle for OBJ_ANON objects.

Note that the change in vm_object_collapse() is arguably a correctness
fix. We must not collapse into content-identity carrying objects.

Reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22467

bsd-family-tree: correct macOS release date

Reported by: Herbert J. Skuhra <herbert@gojira.at>
Reported by: Maxim Konovalov <maxim.konovalov@gmail.com>

Record part of the owner struct thread pointer into busy_lock.

Record as much bits from curthread into busy_lock as fits.  Low bits
for struct thread * representation are zero due to struct and zone
alignment, and they leave space for busy flags (perhaps except
statically allocated thread0).  Upper bits are not very interesting
for assert, and in most practical situations recorded value should
allow to manually identify the owner with certainity.

Assert that unbusy is performed by the owner, except few places where
unbusy is done in io completion handler.  For this case, add
_unchecked variants of asserts and unbusy primitives.

Reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22298

tmpfs: resolve deadlock between rename and unmount.

Top-level kern_renameat() increases the writecount on the mount point,
which, together with tmpfs unmount suspending the mount, already
ensures that unmount cannot proceed while rename unlocks and relocks
all operated vnodes.

Remove vfs_busy() call from tmpfs_rename() which was done while
holding a vnode lock, creating the deadlock. The only intent of the
busy operation seems to be the prevention of unmount, which is already
ensured.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

amd64: assert that EARLY_COUNTER does not corrupt memory.

Reviewed by: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22514

cxgbe(4): sysctl to reset the temperature/voltage sensor.

# sysctl dev.<nexus>.<inst>.reset_sensor=1
# sysctl dev.t6nex.0.reset_sensor=1

MFC after: 1 week
Sponsored by: Chelsio Communications

Don't need giant for these drivers dev nodes.

Also, Giant isn't required to busy / unbusy a device, so drop that too while I'm
here. It's not done elsewhere in the tree and in the future will likely be
handled by a node lock to ensure consistency. Leave Giant in place for attach
and removing childing, as that's actually still needed, even if imperfect.

Remove stale comment about contigmalloc taking Giant and calling w/o the lock
held. Neither of these is still true.

Hoist locking giant back up into the ioctl handler

Move the locking back into the ioctl handler. This "fixes" the race where we hve
a hot plug event just after the dropping of Giant in pci_find_dbsf, assuming the
driver doesn't then call anything that drops and picks up Giant again... It's a
little safer since don't think it doesn't, but we lack the tools to know for
sure.

Fix leak in state machine for commands.

When we get a device departed message from the firmware, we send a TARGET_REST
to the device to let the firmware know we're done and as part of the recovery
process. This will abort all the commands. While the documentation says the IOC
is responsible for writing the completion message for all the commands pending
with an aborted status, we sometimes have queued commands for the target that
haven't been completed so are in the INQUEUE state. So, when we later complete
the pending CCB as aborted, these commands are freed and we hit the "state not
busy" panic.

Elsewhere where we dequeue commands, we move the state to BUSY from INQUEUE. Do
that here as well. In talking to Ken, Scott and Justin, they recommended a
series of tests to see if this is 100% safe. Those tests are ongoing, but
preliminary tests suggest this is safe as we see no duplicate completions when
we hit this case at work. We have a machine that has a dodgy powersupply which
usually doesn't apply power to a few drives, but sometimes does when the machine
is under heavy load so we get a rash of the connect / disconnect messages over
half an hour. Without this change, we'd see state not busy panic. With this
change, the drives just annoyingly come and go without affecting the rest of the
machine, but without a complete error injection test suite, it's hard to know if
all edge cases are now covered or not.

Discussed with: scottl, ken, gibbs

Fix gcc build

We have -Werror=strict-overflow so gcc complains:

In file included from /tmp/obj/workspace/src/amd64.amd64/tmp/usr/include/bitstring.h:36:0,
                 from /workspace/src/tests/sys/sys/bitstring_test.c:34:
/workspace/src/tests/sys/sys/bitstring_test.c: In function 'bit_ffc_at_test':
/workspace/src/sys/sys/bitstring.h:239:5: error: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Werror=strict-overflow]
  if (_start >= _nbits) {
     ^

Disable assuming overflow of signed integer will never happen by specifying
-fno-strict-overflow

Sponsored by: The FreeBSD Foundation

pf: Add endline to all DPFPRINTF()

DPFPRINTF() doesn't automatically add an endline, so be consistent and
always add it.

bsd-family-tree: add several new entries

Reviewed by: imp, scottl
Differential Revision: https://reviews.freebsd.org/D22529

[PowerPC] Fix stack padding issue on ppc32.

Four bytes of padding are needed in the regular powerpc case to bring the
stack frame size up to a multiple of 16 bytes to meet ABI requirements.

Fixes odd hangs I was encountering during testing.

cxgbe(4): Update the firmware interface header.

This allows the driver to be updated for the next firmware without
waiting for it to be released.

MFC after: 2 weeks
Sponsored by: Chelsio Communications

rtld/powerpc: Fix _rtld_bind_start for powerpcspe

Summary:
We need to save off the full 64-bit register, not just the low 32 bits,
of all registers getting saved off in _rtld_bind_start. Additionally,
we need to save off the other SPE registers (SPEFSCR and accumulator),
so that their program state is not affected by the PLT resolver.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D22520

Add a warning about Giant Locked devices

Add a warning when a device registers with devfs and requests
D_NEEDGIANT. The warning says the device will go away before
13.0. This is needed to flush out the devices in the tree that are
still Giant locked. This warning, or some variant of it, should have
gone into the tree a long time ago...

The intention is to require all devices be converted to not use
automatic giant in this way, or remove any such devices that remain
that we don't have the hardware to test a conversion of.

kbd so far is the only device that can't leave the tree, yet needs
something sensible done to avoid the auto giant lock (even if it is
just doing the wrapping itself). There may be others added to this
list... Any discussions of this topic will take place on arch@.

We don't even need Giant here. It isn't protecting anything internal
to geom, and nothing we call requires it to be held. It's left over
from a time when the latter wasn't the case. Retire it.

Reviewed in concept: scottl@

Push Giant down one layer

The /dev/pci device doesn't need GIANT, per se. However, one routine
that it calls, pci_find_dbsf implicitly does. It walks a list that can
change when PCI scans a new bus. With hotplug, this means we could
have a race with that scanning. To prevent that, take out Giant around
scanning the list.

However, given that we have places in the tree that drop giant, if
held when we call into them, the whole use of Giant to protect newbus
may be less effective that we desire, so add a comment about why we're
talking it out, and we'll address the issue when we lock newbus with
something other than Giant.

[PowerPC] Fix typo in _ctx_start on ppc32

Theoretically, this was breaking the size calculation for the symbol.

Noticed when doing a readthrough.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D22525

[PowerPC] Use QEMU-compatible version of SPE accumulator save

Switch from "evaddumiaaw 0,0" to "evmwumiaa 0,0,0" when persisting the
accumulator. This has the benefit of actually being implemented in QEMU
as it is the form Linux uses for the same task.

Both instructions are functionally equivilent, as we are using them for
their side effect of copying the accumulator to GPRs rather than for the
actual math operation that they are performing.

Reviewed by: jhibbits

libclang_rt: enable on powerpc*

Summary:
Enable on powerpc64 and in lib/libclang_rt/Makefile change
MACHINE_CPUARCH to MACHINE_ARCH because on powerpc64
MACHINE_ARCH==MACHINE_CPUARCH so the 32-bit library overwrites 64-bit
library during installworld.

This patch doesn't enable any other libclang_rt libraries because they
need to be separately ported.

I have verified that games/julius (which fails on powerpc64 elfv2
without this change because of no libclang_rt profiling library) builds.

Test Plan: Ship it, test on powerpc and powerpcspe

Submitted by: pkubaj
Reviewed by: dim, jhibbits
Differential Revision: https://reviews.freebsd.org/D22425
MFC after: 1 month
X-MFC-With: r353358

The error messages that indicate bugs in 'area' bitstring functions
should identify accurately which function exhibited the bug.

Reviewed by: asomers
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22519

bcm2835_sdhci: fix non-INVARIANTS build

sc is now only used to make sure we're not re-entering the data handling
path erroneously.

Reported by: Mark Millard

arm64/NOTES: add SOC_BRCM_BCM2838

This should have been done back when it was added, but it was not. It only
really adds an extra entry for memory mapping bits in bcm2835_vcbus.c, so
nothing too extensive yet.

bcm2835_dma: rip out the "use_dma" flag, make it non-optional

Now that it works for the Raspberry Pi 4, we can discontinue our workarounds
that were put in place to at least get a bootable kernel for other testing.

bcm2835_sdhci: "fix" DMA on the RPi 4

According to the documentation I have, DREQ pacing should be required here.
The DREQ# hasn't changed since the BCM2835. As soon as we attempt to setup
DREQ, DMA stalls and there's no clear reason why as of yet. Setting this
back to NONE seems to work just as well, though it's yet to be determined if
this is a sustainable model in high-throughput scenarios.

Add explicit SI_SUB_EPOCH

Add explicit SI_SUB_EPOCH, after SI_SUB_TASKQ and before SI_SUB_SMP
(EARLY_AP_STARTUP). Rename existing "SI_SUB_TASKQ + 1" to SI_SUB_EPOCH.

epoch(9) consumers cannot epoch_alloc() before SI_SUB_EPOCH:SI_ORDER_SECOND,
but likely should allocate before SI_SUB_SMP. Prior to this change,
consumers (well, epoch itself, and net/if.c) just open-coded the
SI_SUB_TASKQ + 1 order to match epoch.c, but this was fragile.

Reviewed by: mmacy
Differential Revision: https://reviews.freebsd.org/D22503

Do not retry long ready waits if previous gave nothing.

I have some disks reporting "Logical unit is in process of becoming ready"
for about half an hour before finally reporting failure. During that time
CAM waits for the readiness during ~2 minutes for each request, that makes
system boot take very long time.

This change reduces wait times for the following requests to ~1 second if
previously long wait for that device has timed out.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.

random(4): De-export random_sources list

The internal datastructures do not need to be visible outside of
random_harvestq, and this helps ensure they are not misused.

No functional change.

Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22485

Mark hpt27xx for removal in 13.0; all CAM drivers will be Giant-free by then.

Relnotes: yes

random(4): Use ordinary sysctl definitions

There's no need to dynamically populate them; the SYSCTL_ macros take care
of load/unload appropriately already (and random_harvestq is 'standard' and
cannot be unloaded anyway).

Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22484

dhclient: support option 114, default-url ascii

This will enable further automation of HTTP UEFI boot loader support by
providing a specific option for providing the boot URL to FreeBSD.

Documented in:

https://www.iana.org/assignments/bootp-dhcp-parameters/bootp-dhcp-parameters.xhtml
https://kb.isc.org/docs/isc-dhcp-44-manual-pages-dhcp-options
https://tools.ietf.org/html/rfc3679

Approved by: emaste
MFC after: 2 weeks
Sponsored by: SkunkWerks, GmbH
Differential Revision: https://reviews.freebsd.org/D22475

random(4): Abstract loader entropy injection

Break random_harvestq_prime up into some logical subroutines. The goal
is that it becomes easier to add other early entropy sources.

While here, drop pre-12.0 compatibility logic. loader default configuration
should preload the file as expeced since 12.0.

Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22482

random(4): Remove unused definitions

Approved by: csprng(gordon, markm)
Differential Revision: https://reviews.freebsd.org/D22481

bcm2835_vcbus: add the *other* rpi4 compat string

The DTS I used initially had brcm,bcm2838; the new one uses brcm,bcm2711.
Add that one as well.

MMCCAM: defer release of ccb until we're done with it

If we've found a device, we attempt to call xpt_action() on a ccb that's
already been released. Simply defer release until after we're done with it.

Reviewed by: imp, scottl
MFC after: 1 week

random/ivy: Provide mechanism to read independent seed values from rdrand

On x86 platforms with the intrinsic, rdrand is a deterministic bit generator
(AES-CTR) seeded from an entropic source.  On x86 platforms with rdseed, it
is something closer to the upstream entropic source.  (There is more nuance;
a block diagram is provided in [1].)

On devices with rdrand and without rdseed, there is no good intrinsic for
acecssing the good entropic soure directly.  However, the DRBG is guaranteed
to reseed every 8 kB on these platforms.  As a conservative option, on such
hardware we can read an extra 7.99kB samples every time we want a sample
from an independent seed.

As one can imagine, this drastically slows the effective read rate of
RDRAND (a factor of 1024 on amd64 and 2048 on ia32).  Microbenchmarks on AMD
Zen (has RDSEED) show an RDRAND rate of 25 MB/s and Intel Haswell (no
RDSEED) show RDRAND of 170 MB/s.  This would reduce the read rate on Haswell
to ~170 kB/s (at 100% CPU).  random(4)'s harvestq thread periodically
"feeds" from pure sources in amounts of 128-1024 bytes.  On Haswell,
enabling this feature increases the CPU time of RDRAND in each "feed" from
approximately 0.7-6 µs to 0.7-6 ms.

Because there is some performance penalty to this more conservative option,
a knob is provided to enable the change.  The change does not affect
platforms with RDSEED.

[1]: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide#inpage-nav-4-2

Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22455

Remove xpt_lock mutex.

CAM does not require SIM locks for years, and obviously does not require
it for completely virtual XPT SIM.

MFC after: 2 weeks

Schedule the trm(4) driver for removal. It relies on Giant and thus has
required compat shims in CAM for 12 years.

Relnotes: yes

Revert r354909: Make the warning for deprecated NO_ variables an error.

An unexpectidly large number of ports define NO_MAN (and sometimes the
long-dead NOMAN). I'll fix ports and then re-commit.

Make CAM use root_mount_hold_token() to delay boot.

Before this change CAM used config_intrhook_establish() for this purpose,
but that approach does not allow to delay it again after releasing once.

USB stack uses root_mount_hold() to delay boot until bus scan is complete.
But once it is, CAM had no time to scan SCSI bus, registered by umass(4),
if it already done other scans and called config_intrhook_disestablish().
The new approach makes it work smooth, assuming the USB device is found
during the initial bus scan. Devices appearing on USB bus later may still
require setting kern.cam.boot_delay, but hopefully those are minority.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.

Remove NEEDGIANT from the scsi_sg /dev node. It likely has not been
needed for many years.

Reported by: imp

Add and document options to allow rpc.lockd and rpc.statd to run in the
foreground.

This allows a separate process to monitor when and how those programs exit.
That process can then restart them if needed.

Submitted by: Alex Burlyga
Reviewed by: bcr, imp
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D22474

Simplify vm_pageout_init_domain() and add a "big picture" comment.

Stop subtracting 1024/200 from vmd_page_count/200. I cannot see how
such precise accounting can make a difference on modern systems.

Add some explanation of what the page daemon does and how it handles
memory shortages.

Reviewed by: dougm
Discussed with: jeff, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22396

Reclaim memory from UMA if the page daemon is struggling.

Use the UMA reclaim thread to asynchronously drain all caches if
there is a severe shortage in a domain.  Otherwise we only trigger UMA
reclamation every 10s even when the system has completely run out of
memory.

Stop entirely draining the caches when one domain falls below its min
threshold.  In some workloads it is normal for one NUMA domain to end
up being nearly depleted by kernel memory allocations, for example for
the ZFS ARC.  The domainset iterators skip domains below the
vmd_min_free theshold on the first iteration, so we should allow that
mechanism to limit further depletion of the domain's free pages before
taking the extreme step of calling uma_reclaim(UMA_RECLAIM_DRAIN_CPU).

Discussed with: jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22395

Update the checks in vm_page_zone_import().

- Remove the cnt == 1 check.  UMA passes cnt == 1 when it has disabled
  per-CPU caching.  In this case we might as well just allocate a single
  page and return it to the caller, since the caller is going to do
  exactly that anyway if the UMA cache allocation attempt fails.
- Don't replenish caches if the domain is severely short on free pages.
  With large buckets we may otherwise quickly exacerbate a situation
  where the page daemon is failing to keep up.
- Don't replenish caches if the calling thread belongs to the page
  daemon, which should avoid creating extra memory pressure when it is
  trying to free memory.  Virtually all such allocations while occur in
  the context of laundering, where the laundry thread must allocate
  slabs for various swap and I/O-related UMA zones.

Reviewed by: kib
Discussed with: alc, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22394

Revise the page cache size policy.

In r353734 the use of the page caches was limited to systems with a
relatively large amount of RAM per CPU.  This was to mitigate some
issues reported with the system not able to keep up with memory pressure
in cases where it had been able to do so prior to the addition of the
direct free pool cache.  This change re-enables those caches.

The change modifies uma_zone_set_maxcache(), which was introduced
specifically for the page cache zones.  Rather than using it to limit
only the full bucket cache, have it also set uz_count_max to provide an
upper bound on the per-CPU cache size that is consistent with the number
of items requested.  Remove its return value since it has no use.

Enable the page cache zones unconditionally, and limit them to 0.1% of
the domain's pages.  The limit can be overridden by the
vm.pgcache_zone_max tunable as before.

Change the item size parameter passed to uma_zcache_create() to the
correct size, and stop setting UMA_ZONE_MAXBUCKET.  This allows the page
cache buckets to be adaptively sized, like the rest of UMA's caches.
This also causes the initial bucket size to be small, so only systems
which benefit from large caches will get them.

Reviewed by: gallatin, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22393

Fix locking in vm_reserv_reclaim_contig().

We were not properly handling the case where the trylock of the
reservaton fails, in which case we could leak reservation lock.

Introduce a marker reservation to implement precise scanning in
vm_reserv_reclaim_contig().  Before, a race could result in early
termination of the scan in rare situations.  Use the marker's lock to
serialize scans of the partpop queue so that a global marker structure
can be used.  Modify vm_reserv_reclaim_inactive() to handle the presence
of a marker while minimizing the hold time of domain-global locks.

Reviewed by: alc, jeff, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22392