marius [Thu, 11 May 2017 21:13:06 +0000 (21:13 +0000)]
MFC: r315431
- Adds macros for the content of SDHCI_ADMA_ERR and SDHCI_HOST_CONTROL2
registers.
- Add slot type capability bits. These bits should allow recognizing
removable card slots, embedded cards and shared buses (shared bus
supposedly is always comprised of non-removable cards).
- Dump CAPABILITIES2, ADMA_ERR, HOST_CONTROL2 and ADMA_ADDRESS_LO
registers in sdhci_dumpregs().
- The drive type support flags in the CAPABILITIES2 register are for
drive types A,C,D, drive type B is the default setting (value 0) of
the drive strength field in the SDHCI_HOST_CONTROL2 register.
o Move the DRIVER_MODULE() statements that declare mmc(4) to be a child
of the various bridge drivers out of dev/mmc.c and into the bridge
drivers.
o Add ACPI platform support for SDHCI driver.
o Fix some overly long lines, whitespace and other bugs according to
style(9) as well as spelling etc. in mmc(4), mmcsd(4) and sdhci(4).
o In the mmc(4) bridges and sdhci(4) (bus) front-ends:
- Remove redundant assignments of the default bus_generic_print_child
device method,
- use DEVMETHOD_END,
- use NULL instead of 0 for pointers.
o Trim/adjust includes.
o Add and use a MMC_DECLARE_BRIDGE macro for declaring mmc(4) bridges
as kernel drivers and their dependency onto mmc(4).
o Add support for eMMC "partitions". Besides the user data area, i. e.
the default partition, eMMC v4.41 and later devices can additionally
provide up to:
1 enhanced user data area partition
2 boot partitions
1 RPMB (Replay Protected Memory Block) partition
4 general purpose partitions (optionally with a enhanced or extended
attribute)
Besides simply subdividing eMMC devices, some Intel NUCs having UEFI
code in the boot partitions etc., another use case for the partition
support is the activation of pseudo-SLC mode, which manufacturers of
eMMC chips typically associate with the enhanced user data area and/
or the enhanced attribute of general purpose partitions.
CAVEAT EMPTOR: Partitioning eMMC devices is a one-time operation.
o Now that properly issuing CMD6 is crucial (so data isn't written to
the wrong partition for example), make a step into the direction of
correctly handling the timeout for these commands in the MMC layer.
Also, do a SEND_STATUS when CMD6 is invoked with an R1B response as
recommended by relevant specifications.
o Add an IOCTL interface to mmcsd(4); this is sufficiently compatible
with Linux so that the GNU mmc-utils can be ported to and used with
FreeBSD (note that due to the remaining deficiencies outlined above
SANITIZE operations issued by/with `mmc` currently most likely will
fail). These latter have been added to ports as sysutils/mmc-utils.
Among others, the `mmc` tool of mmc-utils allows for partitioning
eMMC devices (tested working).
o For devices following the eMMC specification v4.41 or later, year 0
is 2013 rather than 1997; so correct this for assembling the device
ID string properly.
o Let mmcsd.ko depend on mmc.ko. Additionally, bump MMC_VERSION as at
least for some of the above a matching pair is required.
jhb [Thu, 11 May 2017 17:26:34 +0000 (17:26 +0000)]
MFC 313407,313449: Copy ELF machine/flags from binaries to core dumps.
313407:
Copy the e_machine and e_flags fields from the binary into an ELF core dump.
In the kernel, cache the machine and flags fields from ELF header to use in
the ELF header of a core dump. For gcore, the copy these fields over from
the ELF header in the binary.
This matters for platforms which encode ABI information in the flags field
(such as o32 vs n32 on MIPS).
313449:
Trim trailing whitespace (mostly introduced in r313407).
jhb [Thu, 11 May 2017 04:29:20 +0000 (04:29 +0000)]
MFC 313999: Consolidate statements to initialize files.
Previously, the first lines of various generated files from system call
tables were generated in two sections. Some of the initialization was
done in BEGIN, and the rest was done when the first line was encountered.
The main reason for this split before r313564 was that most of the
initialization done in the second section depended on the $FreeBSD$ tag
extracted from the system call table. Now that the $FreeBSD$ tag is no
longer used, consolidate all of the file initialization in the BEGIN
section.
This change was tested by confirming that the content of generated files
did not change.
jhb [Wed, 10 May 2017 23:09:17 +0000 (23:09 +0000)]
MFC 313564:
Drop the "created from" line from files generated by makesyscalls.sh.
This information is less useful when the generated files are included in
source control along with the source. If needed it can be reconstructed
from the $FreeBSD$ tag in the generated file. Removing this information
from the generated output permits committing the generated files along
with the change to the system call master list without having inconsistent
metadata in the generated files.
- Allow overriding the FDT slicer with a custom slicer.
- Teach the flashmap code about SPI flash.
- Allow different slicers for different flash types to be registered
with geom_flashmap(4) and teach it about MMC for slicing enhanced
user data area partitions. The FDT slicer still is the default for
CFI, NAND and SPI flash on FDT-enabled platforms.
- In addition to a device_t, also pass the name of the GEOM provider
in question to the slicers as a single device may provide more than
one provider.
- Build a geom_flashmap.ko.
- Use MODULE_VERSION() so other modules can depend on geom_flashmap(4).
- Remove redundant/superfluous GEOM routines that either do nothing
or provide/just call default GEOM (slice) functionality.
- Trim/adjust includes
marius [Wed, 10 May 2017 20:46:59 +0000 (20:46 +0000)]
MFC: r311817
In dummynet(4), random chunks of memory are casted to struct dn_*,
potentially leading to fatal unaligned accesses on architectures with
strict alignment requirements. This change fixes dummynet(4) as far
as accesses to 64-bit members of struct dn_* are concerned, tripping
up on sparc64 with accesses to 32-bit members happening to be correctly
aligned there. In other words, this only fixes the tip of the iceberg;
larger parts of dummynet(4) still need to be rewritten in order to
properly work on all of !x86.
In principle, considering the amount of code in dummynet(4) that needs
this erroneous pattern corrected, an acceptable workaround would be to
declare all struct dn_* packed, forcing compilers to do byte-accesses
as a side-effect. However, given that the structs in question aren't
laid out well either, this would break ABI/KBI.
While at it, replace all existing bcopy(9) calls with memcpy(9) for
performance reasons, as there is no need to check for overlap in these
cases.
marius [Wed, 10 May 2017 20:29:01 +0000 (20:29 +0000)]
MFC: r310712
- Use correct offsets into the keys set array. As the elements of this
zero-length array are dynamically sized at run-time based on the use
of hints, compilers can't be expected to figure out these offsets on
their own. [1]
- Fix incorrect comparison in cmp_nans(). [2]
PR: 204571 [1], 202301 [2]
Submitted by: David Binderman [2]
marius [Wed, 10 May 2017 20:12:23 +0000 (20:12 +0000)]
MFC: r293642
- Add support for Advantech PCI-1602 Rev. B1 and PCI-1603 cards. [1]
- Add a description of Advantech PCI-1602 Rev. A boards. [1]
- Properly set up REG_ACR also for PCI-1602 Rev. A based on what the
Advantech-supplied Linux driver does.
- Additionally use the macros of <dev/ic/ns16550.h> to replace existing
magic values and get rid of trivial comments.
- Fix the style of some comments.
PR: 205359 [1]
Submitted by: Jan Mikkelsen (original patch) [1]
ken [Wed, 10 May 2017 18:59:20 +0000 (18:59 +0000)]
MFC r317740:
Correct loop mode CRN resets to adhere to FCP-4 section 4.10
Prior to this change, the CRN (Command Reference Number) is reset on any
firmware LIP, LOOP DOWN, or LOOP RESET event in violation of FCP-4 which
specifies that the CRN should only be reset in response to a LIP Reset
(LIPyx) primitive. FCP-4 also indicates PLOGI/LOGO and PRLI/PRLO ELS
actions as conditions for resetting the CRN for the associated initiator
port.
These violations manifest themselves when the HBA is removed from the
loop, or a target device is removed (especially during an outstanding
command) without power cycling. If the HBA and and the target device
determine upon re-establishing the loop that no PLOGI or PRLI is
required, and the target does not issue a LIPxy to the initiator, the
CRN for the target will have been improperly reset by the isp driver. As
a result, the target port will silently ignore all FCP commands issued
during the device probe (which will time out) preventing the device from
attaching.
This change corrects thie CRN reset behavior in response to loop state
changes, also introduces CRN resets for the above mentioned ELS actions
as encountered through async PDB change events.
This change also adds cleanup of outstanding commands in isp_loop_dead()
that was previously missing.
sys/dev/isp/isp.c
Add the last login state to debug output when syncing the pdb
sys/dev/isp/isp_freebsd.c
Replace binary statement setting aborted ccb status in
isp_watchdog() with the XS_SETERR macro used elsewhere
In isp_loop_dead(), abort or complete pending commands as done
in isp_watchdog()
In isp_async(), segregate the ISPASYNC_LOOP_RESET action from
ISPASYNC_LIP, ISPASYNC_LOOP_DOWN, and ISPASYNC_LOOP_UP
fallthroughs, and only reset the CRN in the RESET case. Also add
checks to handle false LOOP RESET actions that do not have a
proper associated LIP primitive, and log the primitive in the
debug messages
In isp_async(), remove the goto from ISP_ASYNC_DEV_STAYED, and
only reset the CRN in the DEV_CHANGED action
In isp_async(), when processing an ISPASYNC_CHANGE_PDB status,
reset CRN(s) for the associated nphdl (or all ports) if the
change reason is some form of ELS login/logout. Also remove
assignment to fc since it is not used in the scope
sys/dev/isp/ispmbox.h
Add macro definition for the global N-Port handle, and correct a
macro typo 'PDB24XX_AE_PRLI_DONJE'
sys/dev/isp/ispvar.h
Add macros FCP_AL_DA_ALL, FCP_AL_PA, and FCP_IS_DEST_ALPD for
more legible code when determining if an AL_PD port matches the
portid for a given struct fcparam* by value or by virtue of the
AL_PD port being 0xFF
ken [Wed, 10 May 2017 15:20:39 +0000 (15:20 +0000)]
MFC r317775:
Fix error recovery behavior in the pass(4) driver.
After FreeBSD SVN revision 236814, the pass(4) driver changed from
only doing error recovery when the CAM_PASS_ERR_RECOVER flag was
set on a CCB to sometimes doing error recovery if the passed in
retry count was non-zero.
Error recovery would happen if two conditions were met:
1. The error recovery action was simply a retry. (Which is most
cases.)
2. The retry_count is non-zero. (Which happened a lot because of
cut-and-pasted code.)
This explains a bug I noticed in with camcontrol:
# camcontrol tur da34 -v
Unit is ready
# camcontrol reset da34
Reset of 1:172:0 was successful
At this point, there should be a Unit Attention:
# camcontrol tur da34 -v
Unit is ready
No Unit Attention.
Try it again:
# camcontrol reset da34
Reset of 1:172:0 was successful
Now set the retry_count to 0 for the TUR:
# camcontrol tur da34 -v -C 0
Unit is not ready
(pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00
(pass42:mps1:0:172:0): CAM status: SCSI Status Error
(pass42:mps1:0:172:0): SCSI status: Check Condition
(pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset
occurred)
(pass42:mps1:0:172:0): Field Replaceable Unit: 2
There is the unit attention. camcontrol(8) has a default
retry_count of 1, in case someone sets the -E flag without
setting -C.
The CAM_PASS_ERR_RECOVER behavior was only broken with the
CAMIOCOMMAND ioctl, which is the synchronous pass(4) API. It has
worked as intended (error recovery is only done when the flag
is set) in the asynchronous API (CAMIOQUEUE ioctl).
sys/cam/scsi/scsi_pass.c:
In passsendccb(), when calling cam_periph_runccb(), only
specify the error routine when CAM_PASS_ERR_RECOVER is set.
share/man/man4/pass.4:
Document that CAM_PASS_ERR_RECOVER is needed to enable
error recovery.
rmacklem [Wed, 10 May 2017 01:39:21 +0000 (01:39 +0000)]
MFC: r317465
Fix handling of a NFSv4.1 callback reply from the session cache.
The nfsv4_seqsession() call returns NFSERR_REPLYFROMCACHE when it has a
reply in the session, due to a requestor retry. The code erroneously
assumed a return of 0 for this case. This patch fixes this and adds
a KASSERT(). This would be an extremely rare occurrence. It was found
during code inspection during the pNFS server development.
dim [Tue, 9 May 2017 16:58:08 +0000 (16:58 +0000)]
MFC r317888 and two upstream prerequisites:
Pull in r227097 from upstream libc++ trunk (by Marshall Clow):
Fix PR21428. Buffer was one byte too small in octal formatting case.
Add test
Pull in r268009 from upstream libc++ trunk (by Eric Fiselier):
Fix PR21428 for long. Buffer was one byte too small in octal
formatting case. Rename previously added test
Pull in r302362 from upstream libc++ trunk (by me):
Ensure showbase does not overflow do_put buffers
Summary:
In https://bugs.freebsd.org/207918, Daniel McRobb describes how using
std::showbase with ostreams can cause truncation of unsigned long long
when output format is octal. In fact, this can even happen with
unsigned int and unsigned long.
To ensure this does not happen, add one additional character to the
do_put buffers if std::showbase is on. Also add a test case.
brooks [Tue, 9 May 2017 16:29:06 +0000 (16:29 +0000)]
MFC r317707:
Correct an out-of-bounds read in regcomp when the RE is bad.
When passed the invalid regular expression "a**", the error is
eventually detected and seterr() is called. It sets p->error
appropriatly and p->next and p->end to nuls which is a never used char
nuls[10] which is zeros due to .bss initialization. Unfortunatly,
p_ere_exp() and p_simp_re() both have fall through cases where they set
the error, decrement p->next and access it which means a read from
whatever .bss variable comes before nuls.
Found with regex_test:repet_multi and CHERI bounds checking.
rmacklem [Mon, 8 May 2017 21:58:29 +0000 (21:58 +0000)]
MFC: r317382
Allow use of a write open stateid for reading in the NFSv4 server.
The NFSv4 RFCs give a server the option of allowing the use of an open
stateid for write access to be used for a Read operation.
This patch enables this by default and adds a sysctl to disable it,
for anyone who does not want this capability.
Allowing this is particularily useful for a pNFS Data Server (DS), since
they are not permitted to allow the use of special stateids.
Discovered during recent testing of the pNFS server under development.
rmacklem [Mon, 8 May 2017 21:40:42 +0000 (21:40 +0000)]
MFC: r317345
Make the NFSv4 client to use a write open for reading if allowed by the server.
An NFSv4 server has the option of allowing a Read to be done using a Write
Open. If this is not allowed, the server will return NFSERR_OPENMODE.
This patch attempts the read with a write open and then disables this
if the server replies NFSERR_OPENMODE.
This change will avoid some uses of the special stateids. This will be
useful for pNFS/DS Reads, since they cannot use special stateids.
It will also be useful for any NFSv4 server that does not support reading
via the special stateids. It has been tested against both types of NFSv4 server.
rmacklem [Mon, 8 May 2017 20:30:30 +0000 (20:30 +0000)]
MFC: r317305
Fix the NFSv4.1/pNFS client return layout on close.
The "return layout on close" case in the pNFS client was badly broken.
Fortunately, extant pNFS servers that I have tested against do not
do this. This patch fixes it. It also changes the way the layout stateid.seqid
is set for LayoutReturn. I think this change is correct w.r.t. the RFC,
but I am not 100% sure.
This was found during recent testing of the pNFS server under development.
rmacklem [Mon, 8 May 2017 19:57:43 +0000 (19:57 +0000)]
MFC: r317296
Fix some krpc leaks for the NFSv4.1/pNFS client.
The NFSv4.1/pNFS client wasn't doing a newnfs_disconnect() call for the
connection to the Data Server (DS) under some circumstances. The main
effect of this was a leak of malloc'd structures in the krpc. This patch
adds the newnfs_disconnect() calls to fix this.
Detected during recent testing against the pNFS server under development.
ken [Mon, 8 May 2017 18:30:56 +0000 (18:30 +0000)]
MFC r317854:
When editing a mode page on a tape drive, do not clear the device
specific parameter.
Tape drives include write protect (WP), Buffered Mode and Speed
settings in the device-specific parameter. Clearing this
parameter on a mode select can have the effect of turning off
write protect or buffered mode, or changing the speed setting of
the tape drive.
Disks report DPO/FUA support via the device specific parameter
for MODE SENSE, but the bit is reserved for MODE SELECT. So we
clear this for disks (and other non-tape devices) to avoid
potential errors from the target device.
sbin/camcontrol/modeedit.c:
Clear the device-specific parameter in the mode page
header if we're not operating on a tape drive.
ken [Mon, 8 May 2017 17:55:51 +0000 (17:55 +0000)]
MFC r317848:
Add basic programmable early warning error injection to the sa(4) driver.
This will help application developers simulate end of tape conditions.
To inject an error in sa0:
sysctl kern.cam.sa.0.inject_eom=1
This will return the next read or write request queued with 0 bytes
written. Any subsequent writes or reads will go along as usual.
This will also cause the early warning position flag to get set
for the next position query. So, 'mt status' will show the BPEW
(Beyond Programmable Early Warning) flag on the first query after
an error injection. After that, the position flags will be as they
are in the underlying tape drive.
Also, update the sa(4) man page to describe tape parameters,
which can be set via 'mt param'.
sys/cam/scsi/scsi_sa.c:
In saregister(), create the inject_eom sysctl variable.
In sastart(), check to see whether inject_eom is set. If
so, return the read or write with 0 bytes written to
indicate EOM. Set the set_pews_status flag so that we
fake PEWS status in the next position call for reads, and the
next 3 calls for writes. This allows the user to see the BPEW
flag one time via 'mt status'.
In sagetpos(), check the set_pews_status flag and fake
PEWS status and decrement the counter if it is set.
share/man/man4/sa.4:
Document the inject_eom sysctl variable.
Document all of the parameters currently supported via
'mt param'.
usr.bin/mt/mt.1:
Point the user to the sa(4) man page for more details on
supported parameters.
ken [Mon, 8 May 2017 17:21:57 +0000 (17:21 +0000)]
MFC r317799:
Add the SCSI Solid State Media Log page (0x11) definition.
sys/cam/scsi/scsi_all.h:
Add the SCSI Solid State Media log page (0x11) structure
definition. This gives the percentage used (in terms of
lifetime flash wear) of an SSD.
ken [Mon, 8 May 2017 17:02:03 +0000 (17:02 +0000)]
MFC r317774, r317776
r317774:
Add the ability to rescan or reset devices specified by peripheral
name and unit number in camcontrol(8).
Previously camcontrol(8) only supported rescanning or resetting
devices specified by bus:target:lun. This is because for
rescanning at least, you don't have a peripheral name and unit
number (e.g. da4) for devices that don't exist yet.
That is still the case after this change, but in other cases, when
the device does exist in the CAM EDT (Existing Device Table), we
do a careful lookup of the bus/target/lun if the user supplies a
peripheral name and unit number to find the bus:target:lun and then
issue the requested reset or rescan.
The lookup is done without actually opening the device in question,
since a rescan is often done to make a device go away after it has
been pulled. (This is especially true for busses/controllers, like
parallel SCSI controllers, that don't automatically detect changes
in topology.) Opening a device that is no longer there to
determine the bus/target/lun might result in error recovery actions
when the user really just wanted to make the device go away.
sbin/camcontrol/camcontrol.c:
In dorescan_or_reset(), if the use hasn't specified a
numeric argument, assume he has specified a device. Lookup
the pass(4) instance for that device using the transport
layer CAMGETPASSTHRU ioctl. If that is successful, we can
use the returned bus:target:lun to rescan or reset the
device.
Under the hood, resetting a device using XPT_RESET_DEV is
actually sent via the pass(4) device anyway. But this
provides a way for the user to specify devices in a more
convenient way, and can work on device rescans when the
device is going away, assuming it still exists in the EDT.
sbin/camcontrol/camcontrol.8:
Update the man page for the rescan and reset subcommands
to reflect that you can now use a device name and unit
number with them.
ken [Mon, 8 May 2017 14:48:39 +0000 (14:48 +0000)]
MFC r317745:
Don't bother retrying errors for encrypted drives that are locked.
sys/cam/scsi/scsi_all.c:
In the asc_table, if we get a 0x20,0x02 error ("Access denied -
no access rights"), don't bother retrying. Instead, immediately
fail the command.
This is the error returned by Self Encrypting Drives (SED) when
they are locked.
rmacklem [Sun, 7 May 2017 22:18:05 +0000 (22:18 +0000)]
MFC: r317276
Don't set ND_NOMOREDATA for a failed Setattr operation (NFSv4).
The NFSv4 Setattr operation always has reply data even when it fails,
so don't set the ND_NOMOREDATA for it. This would only affect unusual
cases where Setattr fails and the RPC code wants to parse the rest of
the compound. Detected during recent development related to the pNFS server.
rmacklem [Sun, 7 May 2017 21:57:46 +0000 (21:57 +0000)]
MFC: r317275, r317344
Don't create a backchannel for a DS connection.
An NFSv4.1 client connection to a Data Server (DS) should not have a
backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel
for this case.
Found during recent testing with the pNFS server under development.
rmacklem [Sun, 7 May 2017 21:32:55 +0000 (21:32 +0000)]
MFC: r317272
Add checks for failed operations to the NFSv4 client function nfscl_mtofh().
The nfscl_mtofh() function didn't check for failed operations and, as such,
would have returned EBADRPC for these cases, due to parsing failure.
This patch adds checks, so that it returns with ND_NOMOREDATA set.
This is needed for future use in the pNFS server and acts as a safety
belt in the meantime.
rmacklem [Sun, 7 May 2017 20:50:32 +0000 (20:50 +0000)]
MFC: r317350
Fix the default uid/gid values in nfsuserd.c
This patch sets the default uid/gid values for "nobody" and "nogroup"
to the values in the password and group databases. Normally nfsuserd(8)
will override these with whatever is in the password/group databases,
so these values are only used when the databases entries aren't available.
It would be nice to use the definitions in sys/conf.h, but those are
in the _KERNEL section of the file.
rmacklem [Sun, 7 May 2017 20:21:59 +0000 (20:21 +0000)]
MFC: r317269
Set default uid/gid to nobody/nogroup for NFSv4 mapping.
The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. The default
values of 0 meant that setting a group to "wheel" would fail even when
done by root.
It also adds a definition of GID_NOGROUP to sys/conf.h.
rmacklem [Sun, 7 May 2017 19:57:45 +0000 (19:57 +0000)]
MFC: r317236
Fix the setting of atime for Linux client NFSv4 mounts.
The FreeBSD NFSv4 server did not set the attribute bit for TimeAccess in
the reply to an Open with exclusive_create, as required by the RFCs.
(This is required since the FreeBSD NFS server stores the create_verifier
in the va_atime attribute.)
As such, the Linux NFSv4 client did not set the TimeAccess (atime) in
the Setattr done in an RPC after the one with the Open/exclusive_create.
This patch fixes the server to set the TimeAccess bit in the reply.
I believe that storing the create_verifier in an extended attribute for
file systems that support extended attributes might be a good idea,
but I will wait for a discussion of this on the freebsd-fs@ email list
before considering committing a patch to do this.
mav [Mon, 1 May 2017 06:03:44 +0000 (06:03 +0000)]
MFC r317064: Optimize pathologic case of telldir() for Samba.
When application reads large directory, calling telldir() for each entry,
like Samba does, it creates exponential performance drop as number of
entries reach tenths to hundreds of thousands. It is caused by full search
through the internal list, that never finds matches in that scenario, but
creates O(n^2) delays. This patch optimizes that search, limiting it to
entries of the same buffer, turning time closer to O(n) in case of linear
directory scan.
dim [Sat, 29 Apr 2017 23:26:36 +0000 (23:26 +0000)]
MFC r317214:
Turn off llvm/clang's ENABLE_BACKTRACES setting, since it never worked
properly anyway. (Upstream has reorganized this somewhat in the mean
time, but for proper backtraces we would need llvm-symbolizer in base.)
MFC r317215:
Add function and data sections when building llvm, clang, lld and lldb,
and allow the linker to garbage collect them. This shaves off up to a
few MB from the final executables.
MFC: r316829
Remove unused "cred" argument to ncl_flush().
The "cred" argument of ncl_flush() is unused and it was confusing to have
the code passing in NULL for this argument in some cases. This patch deletes
this argument.
There is no semantic change because of this patch.
In order to preserve KBI in stable branches, replace the existing
td_sigqueue slot with padding and move the expanded (as of r315949)
td_sigqueue to the end of the struct.
MFC: r316792
Add an NFSv4.1 mount option for "use one openowner".
Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number
of open_owner4s. This patch adds a mount option called "oneopenown" that
can be used for NFSv4.1 mounts to make the client do all Opens with the
same open_owner4 string. This option can only be used with NFSv4.1 and
may not work correctly when Delegations are is use.
MFC: r316782
Add call to svcpool_close() for the NFSv4 callback pool (svcpool_nfscbd).
A function called svcpool_close() was added to the server side krpc by
r313735, so that a pool could be closed without destroying the data structures.
This little patch adds a call to it for the callback pool (svcpool_nfscbd),
so that the nfscbd daemon can be killed/restarted and continue to work
correctly.
Some dummynet modules used strcpy() to copy from a larger buffer
(dn_aqm->name) to a smaller buffer (dn_extra_parms->name). It happens that
the lengths of the strings in the dn_aqm buffers were always hardcoded to be
smaller than the dn_extra_parms buffer ("CODEL", "PIE").
Use strlcpy() instead, to appease static checkers. No functional change.
hyperv/hn: Use channel0, i.e. TX ring0, for TCP SYN/SYN|ACK.
Hyper-V hot channel effect:
Operation latency on hot channel is only _half_ of the operation
latency on cold channels.
This commit takes the advantage of the above Hyper-V host channel
effect, and can reduce more than 75% latency and more than 50%
latency stdev, i.e. lower and more stable/predictable latency,
for various types of web server workloads.
MFC: r316745
Fix the NFS client for "text file modified, process killed" mmap'd case.
When an mmap'd text file is written and then executed immediately
afterwards, it was possible that the modify time would change after the
text file was executing, resulting in the process executing the file
being killed. This was usually only observed when the file system's
times were set to higher resolution, but could have occurred for any
time resolution.
This patch adds a VOP_SET_TEXT() to the NFS client which flushed all
dirty pages to the NFS server and then makes sure that n_mtime is up
to date to avoid this from occurring.
Thanks go to kib@ and pho@ for their help with developing this patch.
The call to ncl_flush() has been removed. If r316532 is merged into stable/10,
this call needs to go back into nfs_set_text().
MFC: r316719
Don't throw away Open state when a NFSv4.1 client recovery fails.
If the ExchangeID/CreateSession operations done by an NFSv4.1 client
after the server crashes/reboots fails, it is possible that some process/thread
is waiting for an open_owner lock. If the client state is free'd, this
can cause a crash.
This would not normally happen, but has been observed on a mount of the
AmazonEFS service.
MFC: r316717
During a server crash recovery, fix the NFSv4.1 client for a NFSERR_BADSESSION
during recovery.
If the NFSv4.1 client gets a NFSv4.1 NFSERR_BADSESSION reply to an Open/Lock
operation while recovering from the server crash/reboot, allow the opens
to be retained for a subsequent recovery attempt. Since NFSv4.1 servers
should only reply NFSERR_BADSESSION after a crash/reboot that has lost
state, this case should almost never happen.
However, for the AmazonEFS file service, this has been observed when
the client does a fresh TCP connection for RPCs.
MFC: r316694
Fix a crash during unmount of an NFSv4.1 mount.
Larry Rosenman reported a crash on freebsd-current@ which was caused by
a premature release of the krpc backchannel socket structure.
I believe this was caused by a race between the SVC_RELEASE() in clnt_vc.c
and the xprt_unregister() in the higher layer (clnt_rc.c), which tried
to lock the mutex in the xprt structure and crashed.
This patch fixes this by removing the xprt_unregister() in the clnt_vc
layer and allowing this to always be done by the clnt_rc (higher reconnect
layer).
Keep state incorrectly assumes keep frags. This is counter to the
ipfilter man pages. This also currently restricts keep frags to only when
keep state is used, which is redundant because keep state currently
assumes keep frags. This commit fixes this.
To the user this change means that to maintain the current behaviour
one must add keep frags to any ipfilter keep state rule (as documented
in the man pages).
This patch also allows the flexability to specify and use keep frags
separate from keep state, as documented in an example in ipf.conf.5,
instead of the currently broken behaviour.
MFC: r316692
Set initial values for nfsstatfs in the NFSv4 client.
The AmazonEFS NFSv4.1 server does not support the FILES_FREE and FILES_TOTAL
attributes. As such, an NFSv4.1 mount to the server would return garbage
for these values. This patch initializes the fields of the nfsstatfs structure,
so that "df" and friends will at least return consistent bogus values.
MFC: r316669
Avoid starvation of the server crash recovery thread for the NFSv4 client.
This patch gives a requestor of the exclusive lock on the client state
in the NFSv4 client priority over shared lock requestors. This avoids
the server crash recovery thread being starved out by other threads doing
RPCs.
MFC: r316667
Fix the NFSv4 client hndling of a stale write verifier in the Commit operation.
When the NFSv4 client Commit operation encountered a stale write verifier,
it erroneously mapped that to EIO. This could have caused recently written
data to be lost when a server crashes/reboots between an UNSTABLE write
and the subsequent commit. This patch fixes this.
The bug was only for the NFSv4 client and did not affect NFSv3.
MFC: r316666
Fix the NFSv4.1 client for NFSERR_BADSESSION recovery via ReclaimComplete.
For the ReclaimComplete operation, the RPC layer should not loop on
NFSERR_BADSESSION. If it does, the recovery thread (nfscl) can get stuck
looping and will not do a recovery.
This patch fixes it so it does not loop. This bug only affects NFSv4.1 and
only when a server reboots.
MFC: r316655
Fix parsing failure for NFSv4 Setattr operation for failed case.
If an operation that preceeds a Setattr in an NFSv4 compound fails,
there is no bitmap of attributes to parse. Without this patch, the
parsing would fail and return EBADRPC instead of the correct failure
error. This could break recovery from a server crash/reboot.
MFC: r310491
Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
it would fail the RPC/syscall instead of initiating recovery and then
looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
until after a new session was created for a NFS4ERR_BAD_SESSION error,
it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
structure (which is always the current metadata session) was slightly
racey. With changes for the above problems it became more racey, so all
uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
and, as such, this wouldn't have caused problems, the allocation of a
session structure was 1 byte smaller than it should have been.
(Null termination byte for the string not included in byte count.)
There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.
Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.
Use estimated RTT for receive buffer auto resizing instead of timestamps.
This is a partial MFC as stable/10 doesn't include the TCP stack
modularisation.
MFC r313045:
Add an mbuf to ipinfo_t translator to finish cleanup of mbuf passing to TCP
probes. This is a partial MFC (missing debug__output & debug__drop changes)
due to the massive amount of additional dtrace changes that would be
required for a full MFC.
MFC r316677: Do not register in CTL portal groups without portals.
From config synthax point of view such portal groups are not incorrect,
but they are useless since can not receive any connection. And since
CTL port resource is very limited, it is good to save it.
When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.
For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).
We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.
Reported by: Antonios Atlasis <aatlasis at secfu net>