ian [Tue, 24 Apr 2018 17:13:31 +0000 (17:13 +0000)]
MFC r332518, r332527
r332518:
Add an option to daemon(8) to specify a delay between restarts of a
supervised program. The existing -r option has a hard-coded delay of one
second. This change adds a -R option which takes a delay in seconds. This
can be used to prevent log spam and rapid restarts, similar to init(8)'s
behavior of adding a delay between rapid restarts when it's supervising a
program.
r332527:
Fix cut-and-pasted line to have the right option letter.
r331868:
Add opt_platform.h for several modules that have #ifdef FDT in the source.
Submitted by: Andre Albsmeier <Andre.Albsmeier@siemens.com>
r332046:
Add a missing MODULE_DEPEND().
r332194:
Add support for writing/changing spi device ivars. The SPI mode (polarity
and phase) and the maximum bus speed can be changed. The chip select
number cannot be changed, because the device instances which are children
of spibus are inherently associated with the chip select number they were
instantiated for.
r332195:
A couple minor improvements to spibus.c...
- Change the description string to "SPI bus" (was "spibus bus").
- This is the default driver for a SPI bus, not a generic implementation,
so return the probe value that indicates such.
- Use device_delete_children() at detach time, instead of a local loop
to enumerate the children and detach each one individually.
r332196:
Return BUS_PROBE_DEFAULT, not zero, because this is not the one driver
implementation that must be used, it's just the base system default driver.
Also add a comment noting that we're being more liberal about the bus
frequency property than the dts binding documents require.
r332198:
Arrange the list of generated sources as 1-per-line alphbetical, and add
the files required when building for FDT-based systems.
r332219:
Remove the existing identify() hack to force-add a spigen device on
FDT-based systems, and instead add proper FDT probe code. Because this
driver is freebsd-specific and just provides generic userland access to run
spibus transactions, there is no bindings document to mandate a compatible
string, so just arbitrarily use "freebsd,spigen".
r332231:
Generate a spibus_set_[ivarname]() convenience function for each ivar,
now that they can be set.
r332233:
Add an ioctl to get/set the SPI transfer mode. Also, make the bus clock
frequency ioctl actually set the corresponding ivar instead of just storing
the value locally in the softc (and then not using it for anything). Also,
return the correct error code if the ioctl cmd is not recognized.
r332240:
Add the ioctl definitions for spigen get/set spi mode. Should have been
part of r332233.
r332258:
Don't check for impossible NULL return from malloc(..., M_WAITOK).
r332259:
Cast the data pointer to the correct type for the data being accessed (as
opposed to one that accidentally worked on the one arch I test-compiled for
on my first try).
Reported by: np@, O. Hartmann <ohartmann@walstatt.org>
Pointy hat: ian@
r332261:
Add a manpage for spigen(4).
r332292:
Allow hinted attachment on FDT-based systems. Instead of returning ENXIO
when the FDT data doesn't enable the device instance, return
BUS_PROBE_NOWILDCARD, the same as for non-FDT systems.
MFC r332789: pwd_mkdb: warn that legacy support is deprecated (if specified)
r283981 switched pwd_mkdb to emit only v4 database entries by default,
and introduced a -l (legacy) option emit v3 entries in addition. The
commit message claims that legacy support will be removed in 12.0, so
emit a warning now if it is used.
MFC r332875: pwd_mkdb: add deprecation notice in manpage too
Followon to r332789; as reported on the -current and -stable lists and
in review D15144 the -l option will be removed before FreeBSD 12.0.
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
r332385:
hyperv/storvsc: storvsc_io_done(): do not use CAM_SEL_TIMEOUT
CAM_SEL_TIMEOUT was introduced in
https://reviews.freebsd.org/D7521 (r304251), which claimed:
"VM shall response to CAM layer with CAM_SEL_TIMEOUT to filter those
invalid LUNs. Never use CAM_DEV_NOT_THERE which will block LUN scan
for LUN number higher than 7."
But it turns out this is not correct:
I think what really filters the invalid LUNs in r304251 is that:
before r304251, we could set the CAM_REQ_CMP without checking
vm_srb->srb_status at all:
ccb->ccb_h.status |= CAM_REQ_CMP.
r304251 checks vm_srb->srb_status and sets ccb->ccb_h.status properly,
so the invalid LUNs are filtered.
I changed my code version to r304251 but replaced the CAM_SEL_TIMEOUT
with CAM_DEV_NOT_THERE, and I confirmed the invalid LUNs can also be
filtered, and I successfully hot-added and hot-removed 8 disks to/from
the VM without any issue.
CAM_SEL_TIMEOUT has an unwanted side effect -- see cam_periph_error():
For a selection timeout, we consider all of the LUNs on
the target to be gone. If the status is CAM_DEV_NOT_THERE,
then we only get rid of the device(s) specified by the
path in the original CCB.
This means: for a VM with a valid LUN on 3:0:0:0, when the VM inquires
3:0:0:1 and the host reports 3:0:0:1 doesn't exist and storvsc returns
CAM_SEL_TIMEOUT to the CAM layer, CAM will detech 3:0:0:0 as well: this
is the bug I reported recently:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226583
MFC r332674:
Increase the msdosfs partition size on arm SoC images where the
current size may not be sufficiently large for development and/or
testing.
r288291 added a call to limits(1), which isn't available before partitions
are mounted. This broke the ddb rc script, which does not provide its own
start_cmd.
Alleviate the situation here by providing a start_cmd. We still have other
problems with diskless setups that need to be considered, but this is a
start.
MFC r319216:
Fix an unnecessary/incorrect check in the PKTOPT_EXTHDRCPY macro.
This macro allocates memory and, if malloc does not return NULL, copies
data into the new memory. However, it doesn't just check whether malloc
returns NULL. It also checks whether we called malloc with M_NOWAIT. That
is not necessary.
While it may be that malloc() will only return NULL when the M_NOWAIT flag
is set, we don't need to check for this when checking malloc's return
value. Further, in this case, the check was not completely accurate,
because it checked for flags == M_NOWAIT, rather than treating it as a bit
field and checking for (flags & M_NOWAIT).
MFC r319215:
Fix two places in the ICMP6 code where we could dereference a NULL pointer
in the icmp6_input() function.
When processing an ICMP6_ECHO_REQUEST, if IP6_EXTHDR_GET fails, it will
set nicmp6 and n to NULL. Therefore, we should condition our modification
to nicmp6 on n being not NULL.
And, when processing an ICMP6_WRUREQUEST in the (mode != FQDN) case, if
m_dup_pkthdr() fails, the code will set n to NULL. However, the very next
line dereferences n. Therefore, when m_dup_pkthdr() fails, we should
discontinue further processing and follow the same path as when m_gethdr()
fails.
Reported by: clang static analyzer
Sponsored by: Netflix, Inc.
MFC r319214:
Enforce the limit on ICMP messages before doing work to formulate the
response.
Delete an unneeded rate limit for UDP under IPv6. Because ICMP6
messages have their own rate limit, it is unnecessary to apply a
second rate limit to UDP messages.
MFC r314286:
Do some minimal work to better conform to the 802.3ad (LACP) standard.
In particular, don't set the synchronized bit for the peer unless it truly
appears to be synchronized to us. Also, don't set our own synchronized bit
unless we have actually seen a remote system.
Prior to this change, we were seeing some strange behavior, such as:
1. We send an advertisement with the Activity, Aggregation, and Default
flags, followed by an advertisement with the Activity, Aggregation,
Synchronization, and Default flags. However, we hadn't seen an
advertisement from another peer and were still advertising the default
(NULL) peer. A closer examination of the in-kernel data structures (using
kgdb) showed that the system had added the default (NULL) peer as a valid
aggregator for the segment.
2. We were receiving an advertisement from a peer that included the
default (NULL) peer instead of including our system information. However,
we responded with an advertisement that included the Synchronization flag
for both our system and the peer. (Since the peer's advertisement did not
include our system information, we shouldn't add the synchronization bit
for the peer.)
MFC r314116:
Fix a panic during boot caused by inadequate locking of some vt(4) driver
data structures.
vt_change_font() calls vtbuf_grow() to change some vt driver data
structures. It uses TF_MUTE to prevent the console from trying to use
those data structures while it changes them.
During the early stage of the boot process, the vt driver's tc_done
routine uses those data structures; however, it is currently called
outside the TF_MUTE check.
Move the tc_done routine inside the locked TF_MUTE check.
MFC r313447:
Ensure the idle thread's loop services interrupts in a timely way when
using the ACPI C1/mwait sleep method.
Previously, the mwait instruction would return when an interrupt was
pending; however, the idle loop did not actually enable interrupts when
this occurred. This led to a situation where the idle loop could quickly
spin through the C1/mwait sleep method a number of times when an interrupt
was pending. (Eventually, the situation corrected itself when something
other than an interrupt triggered the idle loop to either enable
interrupts or schedule another thread.)
MFC r307083:
Currently, when tcp_input() receives a packet on a session that matches a
TCPCB, it checks (so->so_options & SO_ACCEPTCONN) to determine whether or
not the socket is a listening socket. However, this causes the code to
access a different cacheline. If we first check if the socket is in the
LISTEN state, we can avoid accessing so->so_options when processing packets
received for ESTABLISHED sessions.
If INVARIANTS is defined, the code still needs to access both variables to
check that so->so_options is consistent with the state.
MFC r330511:
We shouldn't need to execute code in the recursive page table mappings;
therefore, it should be safe to set the NX bit on the PML4E for the
recursive page table mappings. According to the Intel docs, the effect
of the NX bit should propogate to any page reached through a PML4E which
has the NX bit set.
MFC r330510:
Prior to r329071, pmap_bootstrap() used pmap_kmem_choose() to round the
first available virtual address to a 2MB boundary. After r329071,
create_pagetables() rounds firstaddr up to a 2MB boundary. This ensures
the kernel is mapped in super-pages, which is the point of the logic
in pmap_kmem_choose(). Therefore, it is no longer necessary for
pmap_bootstrap() to round up to the 2MB boundary again.
As pmap_bootstrap() was the only user of pmap_kmem_choose(), we can
delete pmap_kmem_choose().
MFC r332780,r332783:
Intel drives have an optimal alignment for I/O. While they honor I/Os
that cross this boundary, they perform better when this isn't the
case. Intel uses the 3rd byte in the vendor specific area for
this. The DC P3500 was previously listed without any explanation. Add
the DC P3520 and DC P4500 to the list.
There won't be any others drives needing this quirk. Intel has
standardized a field in the namespace data in 1.3 (noiob). A future
patch will use that if it exists, with fallback to this method.
Submitted by: Keith Busch
Reviewed by: jimharris@
[[ plus tweak comments from 332783 ]]
MFC r329171:
Mark the pages used for the initial page-table entries as wired. This
makes them consistent with the way other page-table pages are allocated.
It also provides the rest of the VM system a good clue that these pages
are used.
MFC r329071:
On bootup, the amd64 pmap initialization code creates page-table
mappings for the pages used for the kernel and some initial allocations
used for the page table. It maps the kernel and the blocks used for
these initial allocations using 2MB pages.
However, if the kernel does not end on a 2MB boundary, it still maps the
last portion using a 2MB page, but reports that the unused 4K blocks
within this 2MB allocation are free physical blocks. This means that
these same physical blocks could also be mapped elsewhere - for example,
into a user process. Given the proximity to the kernel text and data
area, it seems wise to avoid allowing someone to write data to physical
blocks also mapped into these virtual addresses.
(Note that this isn't a security vulnerability: the direct map makes
most/all memory on the system mapped into kernel space. And, nothing
in the kernel should be trying to access these pages, as the virtual
addresses are unused. It simply seems wise to avoid reusing these
physical blocks while they are mapped to virtual addresses so close
to the kernel text and data area.)
Consequently, let's reserve the physical blocks covered by the
page-table mappings for these initial allocations.
MFC r331488:
This change adds a flag to the DAD entry to indicate whether it is
currently on the queue. This prevents accidentally doubly-removing a DAD
entry from the queue, while also simplifying some of the logic in
nd6_dad_stop().
MFC r331926:
r330675 introduced an extra window check in the LRO code to ensure it
captured and reported the highest window advertisement with the same
SEQ/ACK. However, the window comparison uses modulo 2**16 math, rather
than directly comparing the absolute values. Because windows use
absolute values and not modulo 2**16 math (i.e. they don't wrap), we
need to compare the absolute values.
MFC r332120:
If a user closes the socket before we call tcp_usr_abort(), then
tcp_drop() may unlock the INP. Currently, tcp_usr_abort() does not
check for this case, which results in a panic while trying to unlock
the already-unlocked INP (not to mention, a use-after-free violation).
Make tcp_usr_abort() check the return value of tcp_drop(). In the case
where tcp_drop() returns NULL, tcp_usr_abort() can skip further steps
to abort the connection and simply unlock the INP_INFO lock prior to
returning.
Update stable/11 from 11.1-STABLE to 11.2-PRERELEASE, marking the
official start of the code slush.
Set the default mdoc(7) version to 11.2, and update the clang(1)
TARGET_TRIPLE to reflect 11.2. While here, add missing FreeBSD
major versions to mdoc(7).
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
When the compressed ARC feature was added in commit d3c2ae1
the method of reference counting in the ARC was modified. As
part of this accounting change the arc_buf_add_ref() function
was removed entirely.
This would have be fine but the arc_buf_add_ref() function
served a second undocumented purpose of updating the ARC access
information when taking a hold on a dbuf. Without this logic
in place a cached dbuf would not migrate its associated
arc_buf_hdr_t to the MFU list. This would negatively impact
the ARC hit rate, particularly on systems with a small ARC.
This change reinstates the missing call to arc_access() from
dbuf_hold() by implementing a new arc_buf_access() function.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
MFC r332457:
Use cfg->nomatch_verdict as return value from NAT64LSN handler when
given mbuf is considered as not matched.
If mbuf was consumed or freed during handling, we must return
IP_FW_DENY, since ipfw's pfil handler ipfw_check_packet() expects
IP_FW_DENY when mbuf pointer is NULL. This fixes KASSERT panics
when NAT64 is used with INVARIANTS. Also remove unused nomatch_final
field from struct nat64lsn_cfg.
Reported by: Justin Holcomb <justin at justinholcomb dot me>
MFC r331666: ZFS vn_rele_async: catch up with the use of refcount(9) for the vnode use count
It's not sufficient nor required to use the vnode interlock when
checking if we are going to drop the last use count as the code in
vputx() uses refcount (atomic) operations for both checking and
decrementing the use code. Apply the same method to vn_rele_async().
While here, remove vn_rele_inactive(), a wrapper around vrele() that
didn't add any value.
Also, the change required making vfs_refcount_release_if_not_last()
public. I've made vfs_refcount_acquire_if_not_zero() public as well.
They are in sys/refcount.h now. While making the move I've dropped the
vfs_ prefix.
MFC r331616: vfs_donmount: in certain cases try r/o mount if r/w mount fails
If the operation is not an update, if neither r/w nor r/o mode is
explicitly requested, if the error code hints at the possibility of the
media being read-only, and if the fallback is allowed, then we can try
to automatically downgrade to the readonly mode.
This is especially useful for auto-mounting of removable media that
sometimes can happen to be write-protected.
The fallback to r/o is not enabled by default. It can be requested on a
per-mount basis with a new mount option, 'autoro'. Or it can be
globally allowed by setting vfs.default_autoro.
Refactor the currdev setting to find the device we booted from. Limit
searching when we don't already have a reasonable currdev from that to
the same device only. Search a little harder for ZFS volumes as that's
needed for loader.efi to live on an ESP.
r332429:
cron(8): Reload database if an existing job in cron.d changed as well
Directory mtime will only change if a file is added or removed, not
modified. For /var/cron/tabs, this is fine because of how crontab(1) manages
it using temp files so all crontab(1) changes will trigger a reload of the
database.
For /etc/cron.d and /usr/local/etc/cron.d, this is not necessarily the case.
Instead of checking their mtime, we should descend into them and check mtime
on all jobs also.
r332431:
cron(8): Correct test sense
We're about to use the result of fstat(2) either way, so don't do that if it
fails...
Harry Schmalzbauer reports that some firmware, in his experience, trips
over the ESP we install due to the volume label. It has been theorized that
this is due to some confusion with the label and the path on the ESP to
boot1.efi.
Regardless, Harry found that renaming the label seems to fix it.
MFC r332573: Regenerate FAT templates after r332561
MFC r332452: Update vt(4) "Terminus BSD Console" font to v4.46
"Terminus BSD Console" is a derivative of Terminus that is provided
by Mr. Dimitar Zhekov under the 2-clause BSD license for use by the
FreeBSD vt(4) console and other BSDs.
MFC 331466:
Add a workaround to the hypervisor detection for older versions of KVM.
Originally KVM set %eax to 0 in the cpuid leaf 0x4000000 rather than
to the highest supported leaf in the hypervisor "branch". Detect this
case and fixup the %eax value so that the hypervisor is still
detected.
growfs: Commit the changes after expanding the partition
This fix the problem in arm snapshot present since at least 6 months
where growfs was failing at firstboot and dropped you in a single
user shell.
Note: In addition to this merge, kern.geom.part.mbr.enforce_chs has
been enabled on the build machine to mitigate against the issue in
the PR referenced.