Warner Losh [Thu, 1 Sep 2022 16:38:53 +0000 (10:38 -0600)]
acpi: arm64 doesn't support ACPI 1.0 RSDP, report when we see one
arm64 requires ACPI RSDP Revision 2.0 since it requires 64-bit physical
addresses. It is an error worth reporting if we have an RSDP pointer, but
it points to the wrong version.
Sponsored by: Netflix
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D36404
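The check described above can be sketched roughly as follows. This is a minimal illustration and not the committed code: the struct is a cut-down stand-in for the real RSDP layout, and the names rsdp_sketch and rsdp_check_arm64_sketch() are hypothetical. It assumes the Revision field is 0 for ACPI 1.0 and 2 for ACPI 2.0 and later.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the RSDP; only the fields this sketch needs. */
struct rsdp_sketch {
	uint8_t  revision;	/* 0 for ACPI 1.0, 2 for ACPI 2.0 and later */
	uint32_t rsdt_address;	/* 32-bit RSDT pointer (ACPI 1.0) */
	uint64_t xsdt_address;	/* 64-bit XSDT pointer (ACPI 2.0+) */
};

enum rsdp_status_sketch { RSDP_MISSING, RSDP_TOO_OLD, RSDP_OK };

/*
 * Hypothetical helper: classify an RSDP for arm64.  A missing RSDP is
 * not an error by itself, but a present RSDP with an old revision is
 * worth reporting to the user.
 */
static enum rsdp_status_sketch
rsdp_check_arm64_sketch(const struct rsdp_sketch *rsdp)
{
	if (rsdp == NULL)
		return (RSDP_MISSING);
	if (rsdp->revision < 2)
		return (RSDP_TOO_OLD);	/* caller prints a diagnostic */
	return (RSDP_OK);
}
```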
Warner Losh [Thu, 1 Sep 2022 16:34:30 +0000 (10:34 -0600)]
stand: Document EFI consoles
Document how EFI consoles work, at least on x86. There are a number of
weird quirks and limitations that are generally known, but not
documented until now. Include information on how EFI decides what the
default console is, how to set it and how to cope with common
situations. Note limitations and mismatch between ACPI (which uses UID
to identify a device) and our console code (which uses a raw address)
and explain why we can't translate between them in the loader.
Fix problem getting gpio version during attach.
Both RK3328 and RK3399 don't have GPIO_VER_ID register.
Set gpio version depending on compat string of the parent.
Rick Macklem [Wed, 31 Aug 2022 23:19:22 +0000 (16:19 -0700)]
mount_nfs.8: Reword sentence so .Pa macro works
Commit 603677334a64 added a sentence with a file path
in it. However, it did not use .Pa since it would leave
a space after it, where ('s) was supposed to go.
This patch rewords the sentence so that .Pa can
be used.
Bjoern A. Zeeb [Wed, 31 Aug 2022 23:01:36 +0000 (23:01 +0000)]
iwlwifi: move an ieee80211_get_tid() call
Introduce a local change. It seems ieee80211_get_tid() does not deal
with non-dataqos packets unlike net80211's ieee80211_gettid().
Generally all calls in Linux drivers to ieee80211_get_tid() seem to
be preceded by an ieee80211_is_data_qos() check.
Moving the ieee80211_get_tid() makes no difference to the result, but
(a) saves us the call if we do not need it due to an earlier return,
and (b) allows us to put an assert into the LinuxKPI ieee80211_get_tid()
implementation to avoid accidentally returning random frame header data
in case of a missing earlier ieee80211_is_data_qos() check in (future/
other) drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
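The call ordering described above can be sketched as below. This is an illustration under assumptions, not the LinuxKPI code: the header struct is simplified, frame_control is treated as host byte order, and every name ending in _sketch is hypothetical. The frame control masks follow the usual 802.11 layout (type in bits 2-3, QoS bit 0x80 in the subtype).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified header; the layout is an assumption of this sketch. */
struct hdr_sketch {
	uint16_t frame_control;	/* host byte order in this sketch */
	uint16_t qos_ctrl;	/* only meaningful for QoS data frames */
};

#define	FC_TYPE_MASK	0x000c
#define	FC_TYPE_DATA	0x0008
#define	FC_STYPE_QOS	0x0080	/* QoS bit in the subtype field */
#define	QOS_TID_MASK	0x000f

static bool
is_data_qos_sketch(const struct hdr_sketch *h)
{
	return ((h->frame_control & (FC_TYPE_MASK | FC_STYPE_QOS)) ==
	    (FC_TYPE_DATA | FC_STYPE_QOS));
}

/* The assert catches callers that skipped the QoS data check. */
static uint8_t
get_tid_sketch(const struct hdr_sketch *h)
{
	assert(is_data_qos_sketch(h));
	return (h->qos_ctrl & QOS_TID_MASK);
}

/* Caller: return early first, look up the TID only when it is needed. */
static int
classify_sketch(const struct hdr_sketch *h)
{
	if (!is_data_qos_sketch(h))
		return (-1);	/* no TID lookup, no stray header bytes */
	return (get_tid_sketch(h));
}
```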
The AccECN handshake and TCP header flags are supported,
no support yet for the AccECN option. This minimalistic
implementation is sufficient to support DCTCP while
dramatically cutting the number of ACKs, and provide ECN
response from the receiver to the CC modules.
tcp: finish SACK loss recovery on sudden lack of SACK blocks
While a receiver should continue sending SACK blocks for the
duration of a SACK loss recovery, if for some reason the
TCP options no longer contain these SACK blocks, but we
already started maintaining the Scoreboard, keep on handling
incoming ACKs (without SACK) as belonging to the SACK recovery.
Gleb Smirnoff [Tue, 30 Aug 2022 22:09:21 +0000 (15:09 -0700)]
divert(4): maintain own cb database and stop using inpcb KPI
Here go cons of using inpcb for divert:
- divert(4) uses only 16 bits (local port) out of struct inpcb,
which is 424 bytes today.
- The inpcb KPI isn't able to provide hashing for divert(4),
thus it uses global inpcb list for lookups.
- divert(4) uses INET-specific part of the KPI, making INET
a requirement for IPDIVERT.
Maintain our own very simple hash lookup database instead. It
has mutex protection for write and epoch protection for lookups.
Since now so->so_pcb no longer points to struct inpcb, don't
initialize protosw methods to methods that belong to PF_INET.
Also, drop support for setting options on a divert socket. My
review of software in base and ports confirms that this has no
use and is unlikely to have worked before.
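A port-keyed hash database of the kind described above could look roughly like this. It is a sketch only: the table size, the struct and all function names are hypothetical, and the locking is left out. As the message says, the real code takes a mutex for writes and runs lookups inside an epoch section.

```c
#include <stddef.h>
#include <stdint.h>

#define	DCB_HASH_SIZE	32	/* power of two; the size is an assumption */

struct dcb_sketch {
	uint16_t		 port;	/* the only key divert needs */
	struct dcb_sketch	*next;	/* hash chain */
};

static struct dcb_sketch *dcb_hash[DCB_HASH_SIZE];

static unsigned int
dcb_bucket_sketch(uint16_t port)
{
	return (port & (DCB_HASH_SIZE - 1));
}

/* Lookup by port; would run under epoch protection in a kernel. */
static struct dcb_sketch *
dcb_lookup_sketch(uint16_t port)
{
	struct dcb_sketch *dcb;

	for (dcb = dcb_hash[dcb_bucket_sketch(port)]; dcb != NULL;
	    dcb = dcb->next)
		if (dcb->port == port)
			return (dcb);
	return (NULL);
}

/* Insert; would hold the write mutex.  Returns -1 if the port is taken. */
static int
dcb_insert_sketch(struct dcb_sketch *dcb)
{
	unsigned int b = dcb_bucket_sketch(dcb->port);

	if (dcb_lookup_sketch(dcb->port) != NULL)
		return (-1);
	dcb->next = dcb_hash[b];
	dcb_hash[b] = dcb;
	return (0);
}
```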
Gleb Smirnoff [Tue, 30 Aug 2022 22:09:21 +0000 (15:09 -0700)]
protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created
by socket(2) syscall and at the same time to demultiplex incoming IPv4
datagrams (later copied to IPv6). This story ended with 78b1fc05b20.
These entries (e.g. IPPROTO_ICMP) in inetsw that were added to catch
packets in ip_input(), they would also be returned by pffindproto()
if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw
sockets to work correctly, all the entries were pointing at raw_usrreq
differentiating only in the value of pr_protocol.
With 78b1fc05b20 all these entries are no longer needed, as ip_protox
is independent of protosw. Any socket syscall requesting SOCK_RAW type
would end up with rip_protosw. And this protosw has its pr_protocol
set to 0, allowing to mark socket with any protocol.
For IPv6 raw socket the change required two small fixes:
o Validate user provided protocol value
o Always use protocol number stored in inp in rip6_attach, instead
of protosw value, which is now always 0.
Gleb Smirnoff [Tue, 30 Aug 2022 22:09:21 +0000 (15:09 -0700)]
divert: declare PF_DIVERT domain and stop abusing PF_INET
The divert(4) is not a protocol of IPv4. It is a socket to
intercept packets from ipfw(4) to userland and re-inject them
back. It can divert and re-inject IPv4 and IPv6 packets today,
but potentially it is not limited to these two protocols. The
IPPROTO_DIVERT does not belong to known IP protocols, it
doesn't even fit into u_char. I guess, the implementation of
divert(4) was done the way it is done basically because it was
easier to do it this way, back when protocols for sockets were
intertwined with IP protocols and domains were statically
compiled in.
Moving divert(4) out of inetsw accomplished two important things:
1) IPDIVERT is getting much closer to not depending on INET.
This will be finalized in following changes.
2) Now divert socket no longer aliases with raw IPv4 socket.
Domain/proto selection code won't need a hack for SOCK_RAW and
multiple entries in inetsw implementing different flavors of
raw socket can merge into one without requirement of raw IPv4
being the last member of dom_protosw.
Some controllers like the XHCI(4) lose track of the data toggle value when
USB receive transfers are cancelled at close. This in turn can lead to
data loss after the next open.
To avoid data loss, make sure both the receive and transmit data toggles
get reset, before trying to read or write any data.
Gleb Smirnoff [Tue, 30 Aug 2022 02:15:01 +0000 (19:15 -0700)]
domains: merge domain_init() into domain_add()
domain_init() called at SI_SUB_PROTO_DOMAIN/SI_ORDER_SECOND is always
called right after domain_add(), that had been called at SI_ORDER_FIRST.
Note that protocols aren't initialized yet at this point, since they are
usually scheduled to initialize at SI_ORDER_THIRD.
After this merge it becomes clear that DOMF_SUPPORTED / DOMF_INITED
can be garbage collected as they are set & checked in the same function.
For initialization of the domain system itself it is now clear that
domaininit() can be garbage collected and static initializer is enough.
Gleb Smirnoff [Tue, 30 Aug 2022 02:14:25 +0000 (19:14 -0700)]
mbufs: isolate max_linkhdr and max_protohdr handling in the mbuf code
o Statically initialize max_linkhdr to default value without relying
on domain(9) code doing that.
o Statically initialize max_protohdr to a sane value, without relying
on TCP being always compiled in.
o Retire max_datalen. Set, but not used.
o Don't make the domain(9) system responsible in validating these
values and updating max_hdr. Instead provide KPI max_linkhdr_grow()
and max_protohdr_grow().
o Call max_linkhdr_grow() from IEEE802.11 and max_protohdr_grow() from
TCP. Those are the only protocols today that may want to grow.
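The grow-only KPI described above can be sketched as follows. The default values and the _sketch names are assumptions made for illustration; the point is only that a protocol may raise a limit but never lower it, and that max_hdr is recomputed whenever either part grows.

```c
/* Statically initialized defaults; the numbers are assumptions. */
static unsigned int max_linkhdr_sketch = 16;
static unsigned int max_protohdr_sketch = 40;
static unsigned int max_hdr_sketch = 16 + 40;

/* Raise the link header limit if the caller needs more room. */
static void
max_linkhdr_grow_sketch(unsigned int new_len)
{
	if (new_len > max_linkhdr_sketch) {
		max_linkhdr_sketch = new_len;
		max_hdr_sketch = max_linkhdr_sketch + max_protohdr_sketch;
	}
}

/* Same rule for the protocol header limit. */
static void
max_protohdr_grow_sketch(unsigned int new_len)
{
	if (new_len > max_protohdr_sketch) {
		max_protohdr_sketch = new_len;
		max_hdr_sketch = max_linkhdr_sketch + max_protohdr_sketch;
	}
}
```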
John Baldwin [Mon, 29 Aug 2022 22:35:15 +0000 (15:35 -0700)]
bhyve e1000: Sanitize transmit ring indices.
When preparing to transmit pending packets, ensure that the head (TDH)
and tail (TDT) indices are in bounds. Note that validating values
when they are written is not sufficient alone as the transmit length
(TDLEN) could be changed turning a value that was valid when written
into an out of bounds value.
While here, add further restrictions to the head register (TDH). The
manual states that writing to this value while transmit is enabled can
cause unexpected behavior and that it should only be written after a
reset. As such, ignore attempts to write while transmit is active,
and also ignore writes of non-zero values. Later e1000 chipsets have
this register as read-only.
Also ignore any attempts to transmit packets if the transmit ring's
size is zero.
PR: 264567
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36269
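The transmit-time validation described above can be sketched like this. It is not the bhyve code: the function name is hypothetical, and the sketch assumes 16-byte transmit descriptors, so the ring holds TDLEN / 16 entries. The check runs when transmitting, because TDLEN may have shrunk since TDH or TDT was written.

```c
#include <stdbool.h>
#include <stdint.h>

#define	TX_DESC_SIZE	16	/* bytes per descriptor; an assumption */

/*
 * Hypothetical check run before walking the transmit ring: refuse to
 * transmit when the ring is empty-sized or an index is out of bounds.
 */
static bool
tx_ring_ok_sketch(uint32_t tdlen, uint32_t tdh, uint32_t tdt)
{
	uint32_t ndesc = tdlen / TX_DESC_SIZE;

	if (ndesc == 0)
		return (false);		/* zero-sized ring: nothing to send */
	return (tdh < ndesc && tdt < ndesc);
}
```

For example, a tail index of 200 is valid for a 4096-byte ring (256 descriptors) but becomes out of bounds if the guest later shrinks TDLEN to 1024 (64 descriptors).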
Doug Moore [Mon, 29 Aug 2022 16:11:31 +0000 (11:11 -0500)]
rb_tree: avoid extra reads in rebalancing
In RB_INSERT_COLOR and RB_REMOVE_COLOR, avoid reading a parent pointer
from memory, and then reading the left-color bit from memory, and then
reading the right-color bit from memory, since they're all in the same
field. The compiler can't infer that only the first read is really
necessary, so write the code in a way so that it doesn't have to.
Drop RB_RED_LEFT and RB_RED_RIGHT macros that reach into memory to get
those bits. Drop RB_COLOR, the only thing left using RB_RED_LEFT and
RB_RED_RIGHT after the other changes, and go straight to DIAGNOSTIC
code in subr_stats to implement RB_COLOR for its single, dubious use
there.
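The single-read idea can be sketched as below. The node layout and names are hypothetical stand-ins for the tree.h internals: the sketch only assumes that the parent pointer and the two color bits share one word, with the colors in the two low bits.

```c
#include <stdint.h>

#define	RB_L_SKETCH	((uintptr_t)1)	/* left-color bit */
#define	RB_R_SKETCH	((uintptr_t)2)	/* right-color bit */
#define	RB_LR_SKETCH	((uintptr_t)3)

struct node_sketch {
	uintptr_t up;	/* parent pointer plus two low color bits */
};

/*
 * Load the combined field once, then decode the parent and both color
 * bits from the local copy instead of going back to memory three times.
 */
static void
decode_up_sketch(const struct node_sketch *n, struct node_sketch **parent,
    int *red_left, int *red_right)
{
	uintptr_t up = n->up;	/* the only read of the field */

	*parent = (struct node_sketch *)(up & ~RB_LR_SKETCH);
	*red_left = (up & RB_L_SKETCH) != 0;
	*red_right = (up & RB_R_SKETCH) != 0;
}
```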
Add IF_DEBUG_LEVEL() macro to ensure all debug output preparation
is run only if the current debug level is sufficient. Consistently
use it within routing subsystem.
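A guard macro of this kind can be sketched as follows. The macro body, the level variable and the helper names are assumptions for illustration and may differ from the committed IF_DEBUG_LEVEL(); the sketch only shows that the formatting work sits inside the guarded block and is skipped when the level is too low.

```c
#include <stdio.h>

static int debug_level_sketch = 0;	/* hypothetical current level */
static int prep_calls_sketch = 0;	/* counts formatting work done */

/* Run the following block only if the current level is sufficient. */
#define	IF_DEBUG_LEVEL_SKETCH(lvl)	if (debug_level_sketch >= (lvl))

/* Stand-in for costly debug output preparation. */
static const char *
format_route_sketch(char *buf, size_t len, int value)
{
	prep_calls_sketch++;
	snprintf(buf, len, "value=%d", value);
	return (buf);
}

static void
log_route_sketch(int value)
{
	IF_DEBUG_LEVEL_SKETCH(2) {
		char buf[32];

		fprintf(stderr, "%s\n",
		    format_route_sketch(buf, sizeof(buf), value));
	}
}
```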
* add nhop_get_unlinked() used to prepare referenced but not
linked nexthop, that can later be used as a clone source.
* add nhop_check_gateway() to check for allowed address family
combinations between the rib family and neighbor family (useful
for 4o6 or direct routes)
* add nhop_set_upper_family() to allow copying IPv6 nexthops to
IPv4 rib.
* add rt_get_rnd() wrapper, returning both nexthop/group and its
weight attached to the rtentry.
* Add CHT_SLIST_FOREACH_SAFE(), allowing to delete items during
iteration.
Multiple consumers in the kernel space want to install an IPv4 or IPv6
default route. Provide a convenient wrapper to simplify the code
inside the consumers.
routing: install prefix and loopback routes using new nhop-based KPI.
Construct the desired nexthops directly instead of using the
"translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
rib_copy_route() to propagate the routes to the non-primary address
fibs.
Wei Hu [Mon, 29 Aug 2022 05:03:33 +0000 (05:03 +0000)]
mana: some code refactoring and export apis for future RDMA driver
- Record the physical address for doorbell page region
For supporting RDMA device with multiple user contexts with their
individual doorbell pages, record the start address of doorbell page
region for use by the RDMA driver to allocate user context doorbell IDs.
- Handle vport sharing between devices
For outgoing packets, the PF requires the VF to configure the vport with
corresponding protection domain and doorbell ID for the kernel or user
context. The vport can't be shared between different contexts.
Implement the logic to exclusively take over the vport by either the
Ethernet device or RDMA device.
- Add functions for allocating doorbell page from GDMA
The RDMA device needs to allocate doorbell pages for each user context.
Implement those functions and expose them for use by the RDMA driver.
- Export Work Queue functions for use by RDMA driver
RDMA device may need to create Ethernet device queues for use by Queue
Pair type RAW. This allows a user-mode context to access Ethernet hardware
queues. Export the supporting functions for use by the RDMA driver.
- Define max values for SGL entries
The maximum number of SGL entries should be computed from the maximum
WQE size for the intended queue type and the corresponding OOB data
size. This guarantees the hardware queue can successfully queue requests
up to the queue depth exposed to the upper layer.
- Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
When doing memory registration, the PF may respond with
GDMA_STATUS_MORE_ENTRIES to indicate a follow-up request is needed. This is
not an error and should be processed as expected.
- Define data structures for protection domain and memory registration
The MANA hardware supports protection domain and memory registration for use
in RDMA environment. Add those definitions and expose them for use by the
RDMA driver.
Rick Macklem [Sun, 28 Aug 2022 21:36:45 +0000 (14:36 -0700)]
nfscl: Fix setup of Sequence when all slots marked bad
Commit 40ada74ee1da modified the NFSv4.1/4.2 client so
that it would issue a DestroySession to the server when
all session slots are marked bad. Once this is done,
the Sequence operation should get a NFSERR_BADSESSION
reply from the server.
Without this patch, the code was setting ND_HASSLOTID
when, in fact, there was no slot marked in use by
nfsv4_sequencelookup(). This would result in the
code freeing a slot not in use. The effect of this
was minimal, since the session was already destroyed.
This patch fixes the code so that it does not set
ND_HASSLOTID for this case.
Rick Macklem [Sun, 28 Aug 2022 21:24:39 +0000 (14:24 -0700)]
nfscl: Add a console message for session recovery
The NFSv4.1/4.2 client does recovery when it receives a
NFSERR_BADSESSION reply from the server. If the server has
not rebooted, this is often caused by multiple clients using
the same /etc/hostid and, as such, not being recognized as
different clients by the server.
This trivial patch adds a console message to suggest that
client's /etc/hostid's need to be checked for uniqueness.
Eugene Grosbein [Sun, 28 Aug 2022 05:45:23 +0000 (12:45 +0700)]
rc.conf(5): add <service>_umask to run the service using this value
None of the tools working with login classes change umask(1)
and we had no way to specify a non-default umask for a service
without touching its startup script. This change makes it possible.
Some file-sharing services that create new files may benefit from it.
Differential: https://reviews.freebsd.org/D36309
MFC-after: 3 days
Rick Macklem [Sun, 28 Aug 2022 01:31:20 +0000 (18:31 -0700)]
nfsd: Update console message for no session found
The NFSv4.1/4.2 server generates a console message that indicates
that there is no session. I was until recently perplexed w.r.t. how
this could occur. It turns out that the common cause is multiple NFS
clients with the same /etc/hostid.
The host uuid is used by the FreeBSD NFSv4.1/4.2 client as a unique
identifier for the client. If multiple clients use the same host uuid,
this indicates to the NFSv4.1/4.2 server that they are the same client
and confusion occurs.
This trivial patch modifies the console message to suggest that the
client's /etc/hostid needs to be checked for uniqueness.
Rick Macklem [Sat, 27 Aug 2022 23:03:18 +0000 (16:03 -0700)]
nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION
When the NFSv4.1/4.2 client is handling a server error
of NFSERR_BADSESSION, it retries RPCs with a new session.
Without this patch, the nd_slotid was not being updated
for the new session.
This would result in a bogus console message like
"Wrong session srvslot=X slot=Y" and then it would
free the incorrect slot, often generating a
"freeing free slot!!" console message as well.
This patch fixes the problem.
Note that FreeBSD NFSv4.1/4.2 servers only
generate a NFSERR_BADSESSION error after a reboot
or after a client does a DestroySession operation.
Bjoern A. Zeeb [Sat, 27 Aug 2022 14:48:09 +0000 (14:48 +0000)]
LinuxKPI 802.11: change type of bssid in struct ieee80211_bss_conf
Enabling other driver code found that the bssid in
struct ieee80211_bss_conf is not an array but expected to be
a const pointer (const, != NULL checks).
Adjust accordingly in the header and in the LinuxKPI compat code.
The initialization now needs to be a static array always present
as we need a value before we will have a BSS (node in scan_to_auth)
as the mac80211 driver (*handlers) are expecting the pointer to be
not NULL (copying without checks).
This is a pre-req to enable d3 (CONFIG_PM[_SLEEP]) in the future.
Tested by: Tomoaki AOKI (junchoon dec.sakura.ne.jp)
Tested by: Berislav Purgar (bpurgar gmail.com)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Some non-compliant USB devices do not implement the
clear endpoint halt feature. Silently ignore such
failures, when they at least responded correctly
passing up a valid STALL PID packet.
Warner Losh [Fri, 26 Aug 2022 21:47:21 +0000 (15:47 -0600)]
stand: Document that boot0 uses BIOS
And thus has a limited range of supported baud rates. Also add that
setting BOOT_BOOT0_COMCONSOLE_SPEED=0 will leave it unchanged which
sometimes can give you 115200 if the BIOS initialized things outside of
the normal BIOS baud rates (which many x86 embedded-targeted boards
do).
Warner Losh [Fri, 26 Aug 2022 21:46:33 +0000 (15:46 -0600)]
stand: More sensible defaults when ConOut is missing
When ConOut is missing, we used to default to serial. Except we did it
in the worst way possible by just setting the howto bits and not
updating the console setting, which led to weird behavior where we'd
get some things on the video port, others on serial.
Instead, set console to "efi,comconsole" for this case. Also set
RB_MULTIPLE always (so we get dual consoles from the kernel) and or in
RB_SERIAL when we can't find GOPs that suggest the presence of a video
console. This will put output in the most places and have a sensible
default for 'primary' console.
Sponsored by: Netflix
Reviewed by: emaste, manu
Differential Revision: https://reviews.freebsd.org/D36299
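The new default can be sketched roughly as below. This is not the loader code: the struct and function are hypothetical, and the flag values are assumptions meant to mirror RB_SERIAL and RB_MULTIPLE from sys/reboot.h. The sketch covers only the case where ConOut is missing.

```c
#include <stdbool.h>

/* Assumed to mirror sys/reboot.h; treat the values as illustrative. */
#define	RB_SERIAL_SKETCH	0x1000
#define	RB_MULTIPLE_SKETCH	0x20000000

struct console_choice_sketch {
	const char	*console;	/* value for the console variable */
	int		 howto;		/* boot howto bits */
};

/* Defaults when ConOut is missing; have_gop says whether a GOP exists. */
static struct console_choice_sketch
default_console_sketch(bool have_gop)
{
	struct console_choice_sketch c = {
		"efi,comconsole", RB_MULTIPLE_SKETCH
	};

	if (!have_gop)
		c.howto |= RB_SERIAL_SKETCH;	/* serial becomes primary */
	return (c);
}
```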
Warner Losh [Fri, 26 Aug 2022 17:39:37 +0000 (11:39 -0600)]
efi: Create a define for memory descriptor version
For true EFI platforms, the EFI BIOS will return version 1 (since no
other version is defined as of this commit). However, for environments
that wish to create an EFI memory mapping table that aren't actually
EFI, we need to know this. Add EFI_MEMORY_DESCRIPTOR_VERSION for this
constant.
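A non-EFI environment building its own memory map might use the constant along these lines. The header struct and both helpers are hypothetical; only the constant name and its value of 1 come from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

#define	EFI_MEMORY_DESCRIPTOR_VERSION	1

/* Hypothetical header placed in front of a synthesized memory map. */
struct efi_map_header_sketch {
	uint64_t memory_size;		/* bytes of descriptors that follow */
	uint64_t descriptor_size;	/* bytes per descriptor */
	uint32_t descriptor_version;
};

/* Fill the header the way real EFI firmware would report it. */
static void
fill_map_header_sketch(struct efi_map_header_sketch *h, uint64_t nbytes,
    uint64_t descsz)
{
	h->memory_size = nbytes;
	h->descriptor_size = descsz;
	h->descriptor_version = EFI_MEMORY_DESCRIPTOR_VERSION;
}

/* A consumer can reject maps with a version it does not understand. */
static bool
map_header_ok_sketch(const struct efi_map_header_sketch *h)
{
	return (h->descriptor_version == EFI_MEMORY_DESCRIPTOR_VERSION &&
	    h->descriptor_size != 0);
}
```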
firk [Fri, 26 Aug 2022 08:05:56 +0000 (11:05 +0300)]
Fix compat10 semaphore interface race
Wrong has-waiters and missing unconditional _count==0 check may cause
infinite waiting with already non-zero count.
1) properly clear _has_waiters flag when waiting failed to start
2) always check _count before start waiting
Gleb Smirnoff [Fri, 26 Aug 2022 15:16:15 +0000 (08:16 -0700)]
socket(2): bring documentation up to date
o Undocument sockets that are no longer supported, or never were.
o Add AF_HYPERV. Note: PF_HYPERV isn't defined, no typo here.
o Point at ip(4) and ip6(4) instead of unwelcoming "not described here".
Rick Macklem [Fri, 26 Aug 2022 03:48:04 +0000 (20:48 -0700)]
nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION
When the NFSv4.1/4.2 client is handling a server error
of NFSERR_BADSESSION, it retries RPCs with a new session.
Without this patch, the nd_slotid was not being updated
for the new session.
This would result in a bogus console message like
"Wrong session srvslot=X slot=Y" and then it would
free the incorrect slot, often generating a
"freeing free slot!!" console message as well.
This patch fixes the problem.
Note that FreeBSD NFSv4.1/4.2 servers only
generate a NFSERR_BADSESSION error after a reboot
or after a client does a DestroySession operation.
Rick Macklem [Fri, 26 Aug 2022 03:33:31 +0000 (20:33 -0700)]
nfscl: Fix handling of a bad session slot (NFSv4.1/4.2)
When a session has been marked defunct by the server
sending a NFSERR_BADSESSION reply to the NFSv4.1/4.2
client, nfsv4_sequencelookup() returns NFSERR_BADSESSION
without actually assigning a session slot.
Without this patch, newnfs_request() would erroneously
free slot 0.
This could result in the slot being reused prematurely,
but most likely just generated a "freeing free slot!!"
console message.
This patch fixes the code to not do the erroneous
freeing of the slot for this case.
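The slot handling described above can be sketched as follows. The structures, the flag value and the function names are hypothetical stand-ins for the nfscl code, and the error number is assumed to match NFS4ERR_BADSESSION. The point is that a failed lookup assigns no slot, so the completion path must not free one.

```c
#include <stdint.h>

#define	ND_HASSLOTID_SKETCH	0x1	/* hypothetical flag value */
#define	ERR_BADSESSION_SKETCH	10052	/* assumed NFS4ERR_BADSESSION */

struct session_sketch {
	uint64_t slot_bitmap;	/* bit i set: slot i is in use */
	int	 defunct;	/* server replied with a bad session */
};

struct req_sketch {
	int flags;
	int slotid;
};

/* Assign a slot, or fail without touching the request's slot state. */
static int
sequence_lookup_sketch(struct session_sketch *s, struct req_sketch *r)
{
	int i;

	if (s->defunct)
		return (ERR_BADSESSION_SKETCH);	/* no slot assigned */
	for (i = 0; i < 64; i++) {
		if ((s->slot_bitmap & ((uint64_t)1 << i)) == 0) {
			s->slot_bitmap |= (uint64_t)1 << i;
			r->slotid = i;
			r->flags |= ND_HASSLOTID_SKETCH;
			return (0);
		}
	}
	return (-1);	/* all slots busy */
}

/* Free the slot only if this request really owns one. */
static void
request_done_sketch(struct session_sketch *s, struct req_sketch *r)
{
	if ((r->flags & ND_HASSLOTID_SKETCH) == 0)
		return;
	s->slot_bitmap &= ~((uint64_t)1 << r->slotid);
	r->flags &= ~ND_HASSLOTID_SKETCH;
}
```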
Notable upstream pull request merges:
#13717 Fix zpool status in case of unloaded keys
#13753 Prevent zevent list from consuming all of kernel memory
#13767 arcstat: fix -p option
#13785 Updates for snapshots_changed property
The target modifiers (-g, -p, -u) may occur in any position except
between -n and its argument; furthermore, we support both the old
absolute form (without -n) and the modern relative form (with -n).
Andrew Turner [Mon, 22 Aug 2022 17:02:13 +0000 (18:02 +0100)]
Add an IDC only arm64 icache sync function
When the IDC flag is set in the cache type register we don't need to
clean the data cache to the point of unification. Previously we
supported this flag being set only when the DIC flag was also set.
Add a new handler for when this is not the case.
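The handler selection can be sketched as below. The enum and function names are hypothetical; the bit positions assume IDC is bit 28 and DIC is bit 29 of CTR_EL0. Treating a DIC-only system like the full case is a simplification of this sketch, not something the message states.

```c
#include <stdint.h>

#define	CTR_IDC_SKETCH	((uint64_t)1 << 28)	/* no dcache clean to PoU */
#define	CTR_DIC_SKETCH	((uint64_t)1 << 29)	/* no icache invalidate */

enum icache_sync_sketch {
	SYNC_DCACHE_AND_ICACHE,	/* clean dcache, then invalidate icache */
	SYNC_ICACHE_ONLY,	/* IDC only: skip the dcache clean */
	SYNC_BARRIER_ONLY	/* IDC and DIC: barriers are enough */
};

/* Pick a sync routine from the cache type register value. */
static enum icache_sync_sketch
pick_icache_sync_sketch(uint64_t ctr)
{
	if ((ctr & CTR_IDC_SKETCH) != 0 && (ctr & CTR_DIC_SKETCH) != 0)
		return (SYNC_BARRIER_ONLY);
	if ((ctr & CTR_IDC_SKETCH) != 0)
		return (SYNC_ICACHE_ONLY);	/* the newly added case */
	return (SYNC_DCACHE_AND_ICACHE);
}
```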