mav [Mon, 1 May 2017 06:03:44 +0000 (06:03 +0000)]
MFC r317064: Optimize pathologic case of telldir() for Samba.
When application reads large directory, calling telldir() for each entry,
like Samba does, it creates exponential performance drop as number of
entries reach tenths to hundreds of thousands. It is caused by full search
through the internal list, that never finds matches in that scenario, but
creates O(n^2) delays. This patch optimizes that search, limiting it to
entries of the same buffer, turning time closer to O(n) in case of linear
directory scan.
dim [Sat, 29 Apr 2017 23:26:36 +0000 (23:26 +0000)]
MFC r317214:
Turn off llvm/clang's ENABLE_BACKTRACES setting, since it never worked
properly anyway. (Upstream has reorganized this somewhat in the mean
time, but for proper backtraces we would need llvm-symbolizer in base.)
MFC r317215:
Add function and data sections when building llvm, clang, lld and lldb,
and allow the linker to garbage collect them. This shaves off up to a
few MB from the final executables.
MFC: r316829
Remove unused "cred" argument to ncl_flush().
The "cred" argument of ncl_flush() is unused and it was confusing to have
the code passing in NULL for this argument in some cases. This patch deletes
this argument.
There is no semantic change because of this patch.
In order to preserve KBI in stable branches, replace the existing
td_sigqueue slot with padding and move the expanded (as of r315949)
td_sigqueue to the end of the struct.
MFC: r316792
Add an NFSv4.1 mount option for "use one openowner".
Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number
of open_owner4s. This patch adds a mount option called "oneopenown" that
can be used for NFSv4.1 mounts to make the client do all Opens with the
same open_owner4 string. This option can only be used with NFSv4.1 and
may not work correctly when Delegations are is use.
MFC: r316782
Add call to svcpool_close() for the NFSv4 callback pool (svcpool_nfscbd).
A function called svcpool_close() was added to the server side krpc by
r313735, so that a pool could be closed without destroying the data structures.
This little patch adds a call to it for the callback pool (svcpool_nfscbd),
so that the nfscbd daemon can be killed/restarted and continue to work
correctly.
Some dummynet modules used strcpy() to copy from a larger buffer
(dn_aqm->name) to a smaller buffer (dn_extra_parms->name). It happens that
the lengths of the strings in the dn_aqm buffers were always hardcoded to be
smaller than the dn_extra_parms buffer ("CODEL", "PIE").
Use strlcpy() instead, to appease static checkers. No functional change.
hyperv/hn: Use channel0, i.e. TX ring0, for TCP SYN/SYN|ACK.
Hyper-V hot channel effect:
Operation latency on hot channel is only _half_ of the operation
latency on cold channels.
This commit takes the advantage of the above Hyper-V host channel
effect, and can reduce more than 75% latency and more than 50%
latency stdev, i.e. lower and more stable/predictable latency,
for various types of web server workloads.
MFC: r316745
Fix the NFS client for "text file modified, process killed" mmap'd case.
When an mmap'd text file is written and then executed immediately
afterwards, it was possible that the modify time would change after the
text file was executing, resulting in the process executing the file
being killed. This was usually only observed when the file system's
times were set to higher resolution, but could have occurred for any
time resolution.
This patch adds a VOP_SET_TEXT() to the NFS client which flushed all
dirty pages to the NFS server and then makes sure that n_mtime is up
to date to avoid this from occurring.
Thanks go to kib@ and pho@ for their help with developing this patch.
The call to ncl_flush() has been removed. If r316532 is merged into stable/10,
this call needs to go back into nfs_set_text().
MFC: r316719
Don't throw away Open state when a NFSv4.1 client recovery fails.
If the ExchangeID/CreateSession operations done by an NFSv4.1 client
after the server crashes/reboots fails, it is possible that some process/thread
is waiting for an open_owner lock. If the client state is free'd, this
can cause a crash.
This would not normally happen, but has been observed on a mount of the
AmazonEFS service.
MFC: r316717
During a server crash recovery, fix the NFSv4.1 client for a NFSERR_BADSESSION
during recovery.
If the NFSv4.1 client gets a NFSv4.1 NFSERR_BADSESSION reply to an Open/Lock
operation while recovering from the server crash/reboot, allow the opens
to be retained for a subsequent recovery attempt. Since NFSv4.1 servers
should only reply NFSERR_BADSESSION after a crash/reboot that has lost
state, this case should almost never happen.
However, for the AmazonEFS file service, this has been observed when
the client does a fresh TCP connection for RPCs.
MFC: r316694
Fix a crash during unmount of an NFSv4.1 mount.
Larry Rosenman reported a crash on freebsd-current@ which was caused by
a premature release of the krpc backchannel socket structure.
I believe this was caused by a race between the SVC_RELEASE() in clnt_vc.c
and the xprt_unregister() in the higher layer (clnt_rc.c), which tried
to lock the mutex in the xprt structure and crashed.
This patch fixes this by removing the xprt_unregister() in the clnt_vc
layer and allowing this to always be done by the clnt_rc (higher reconnect
layer).
Keep state incorrectly assumes keep frags. This is counter to the
ipfilter man pages. This also currently restricts keep frags to only when
keep state is used, which is redundant because keep state currently
assumes keep frags. This commit fixes this.
To the user this change means that to maintain the current behaviour
one must add keep frags to any ipfilter keep state rule (as documented
in the man pages).
This patch also allows the flexability to specify and use keep frags
separate from keep state, as documented in an example in ipf.conf.5,
instead of the currently broken behaviour.
MFC: r316692
Set initial values for nfsstatfs in the NFSv4 client.
The AmazonEFS NFSv4.1 server does not support the FILES_FREE and FILES_TOTAL
attributes. As such, an NFSv4.1 mount to the server would return garbage
for these values. This patch initializes the fields of the nfsstatfs structure,
so that "df" and friends will at least return consistent bogus values.
MFC: r316669
Avoid starvation of the server crash recovery thread for the NFSv4 client.
This patch gives a requestor of the exclusive lock on the client state
in the NFSv4 client priority over shared lock requestors. This avoids
the server crash recovery thread being starved out by other threads doing
RPCs.
MFC: r316667
Fix the NFSv4 client hndling of a stale write verifier in the Commit operation.
When the NFSv4 client Commit operation encountered a stale write verifier,
it erroneously mapped that to EIO. This could have caused recently written
data to be lost when a server crashes/reboots between an UNSTABLE write
and the subsequent commit. This patch fixes this.
The bug was only for the NFSv4 client and did not affect NFSv3.
MFC: r316666
Fix the NFSv4.1 client for NFSERR_BADSESSION recovery via ReclaimComplete.
For the ReclaimComplete operation, the RPC layer should not loop on
NFSERR_BADSESSION. If it does, the recovery thread (nfscl) can get stuck
looping and will not do a recovery.
This patch fixes it so it does not loop. This bug only affects NFSv4.1 and
only when a server reboots.
MFC: r316655
Fix parsing failure for NFSv4 Setattr operation for failed case.
If an operation that preceeds a Setattr in an NFSv4 compound fails,
there is no bitmap of attributes to parse. Without this patch, the
parsing would fail and return EBADRPC instead of the correct failure
error. This could break recovery from a server crash/reboot.
MFC: r310491
Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
it would fail the RPC/syscall instead of initiating recovery and then
looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
until after a new session was created for a NFS4ERR_BAD_SESSION error,
it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
structure (which is always the current metadata session) was slightly
racey. With changes for the above problems it became more racey, so all
uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
and, as such, this wouldn't have caused problems, the allocation of a
session structure was 1 byte smaller than it should have been.
(Null termination byte for the string not included in byte count.)
There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.
Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.
Use estimated RTT for receive buffer auto resizing instead of timestamps.
This is a partial MFC as stable/10 doesn't include the TCP stack
modularisation.
MFC r313045:
Add an mbuf to ipinfo_t translator to finish cleanup of mbuf passing to TCP
probes. This is a partial MFC (missing debug__output & debug__drop changes)
due to the massive amount of additional dtrace changes that would be
required for a full MFC.
MFC r316677: Do not register in CTL portal groups without portals.
From config synthax point of view such portal groups are not incorrect,
but they are useless since can not receive any connection. And since
CTL port resource is very limited, it is good to save it.
When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.
For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).
We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.
Reported by: Antonios Atlasis <aatlasis at secfu net>
Fix a use after free panic in ipfilter's fragment processing.
Memory is malloc'd, then a search for a match in the fragment table
is made and if the fragment matches, the wrong fragment table is
freed, causing a use after free panic. This commit fixes this.
A symptom of the problem is a kernel page fault in bcopy() called by
ipf_frag_lookup() at line 715 in ip_frag.c. Another symptom is a
kernel page fault in ipf_frag_delete() when called by ipf_frag_expire()
via ipf_slowtimer().
MFC r316720
Fix defects reported by Coverity
1. Deadcode in ecore_init_cache_line_size(), qlnx_ioctl() and
qlnx_clean_filters()
2. ARRAY_VS_SINGLETON issue in qlnx_remove_all_mcast_mac() and
qlnx_update_rx_prod()
MFC r316310
Update man page for commit r316309 "Add support for optional Soft LRO".
The driver provides the ability to select either HW or Software LRO, when
LRO is enabled (default HW LRO).
cleanup routines can be executed at any point during the execution of the
body, including even before the body has done any real work. In those
cases, cleanup routines should be careful to not raise spurious errors so
as to not "override" the actual result of the test case.
This is just general good coding style but is not a problem in practice
for these specific tests. (The way I discovered the issue, though, was
due to a regression I introduced in Kyua itself while refactoring some
internals.)
MFC r316716:
Inherit IPv6 checksum offloading flags to vlan interfaces.
if_vlan(4) interfaces inherit IPv4 checksum offloading flags from the
parent when VLAN_HWCSUM and VLAN_HWTAGGING flags are present on the
parent interface. Do the same for IPv6 checksum offloading flags.
MFC r314686: sh: Fix crash if a -T trap is taken during command substitution.
Code like t=$(stat -f %m "$file") segfaulted if -T was active and a trap
was taken while the shell was waiting for the child process to finish.
What happened was that the dotrap() call in waitforjob() was hit. This
re-entered command execution (including expand.c) at a point not expected by
expbackq(), and global state (unallocated stack string and argbackq) was
corrupted.
To fix this, change expbackq() to prepare for command execution to be
re-entered.
In stable/10, there is more global state that needs to be restored than in
stable/11 and head.
Point out that -F probably does not do what the user expects.
Users attempting to create images from mtree METALOG files created by
installworld often use -F when they should be passing the METALOG file
in place of a directory. This is often produces difficult to debug
error reports.
ian [Thu, 13 Apr 2017 19:48:45 +0000 (19:48 +0000)]
MFC r291310:
Stop building vers.c in include/ and only build the needed osreldate.h.
Because of how osreldate.h was being built with newvers.sh, which always
spat out a vers.c dependent on SVN or git, the meta mode build was
considering osreldate.h to depend on the current git or SVN index. This
would lead to entire tree rebuilds when modifying git's index. There's
no reason to be generating vers.c here so just skip it.
While here, in mk-osreldate.sh rename PARAM_H to proper PARAMFILE (which
newvers.sh already has a default for) and remove unneeded export.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pedro Giffuni <pfg@freebsd.org>
hyperv/hn: Fixat RNDIS rxfilter after the successful RNDIS init.
Under certain conditions on certain versions of Hyper-V, the RNDIS
rxfilter is _not_ zero on the hypervisor side after the successful
RNDIS initialization, which breaks the assumption of any following
code (well, it breaks the RNDIS API contract actually). Clear the
RNDIS rxfilter explicitly, drain packets sneaking through, and drain
the interrupt taskqueues scheduled due to the stealth packets.
Reported by: dexuan@
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D10230
Leap-second smearing is an experimental option that may be specified in
ntp.conf(5) and the -x option on the command line to spread the effect
of a leap-second over an interval as specified by the leapsmearinterval
config file statement. Recommended values are between 7200 (2 hours) and
86400 (24 hours).
It is advised that leap-second smearing not be used for public NTP
servers (https://www.meinbergglobal.com/download/burnicki/Leap\
%20Second%20Smearing%20With%20NTP.pdf). It is also advised that NTP
clients not use a mix of NTP servers using leap-second smearing with
NTP servers not using leap-second smearing as that could cause
undefined client behaviour.
Leap-second smearing was committed to ports net/ntp and net/ntp-devel
by r426825 on 2016-11-22.
For three years now CAM does not use SIM lock, but still enforces SIM to
use it. Remove this requirement, allowing SIMs to have any locking they
prefer, if they pass no mutex to cam_sim_alloc().