Andriy Gapon [Tue, 15 May 2018 13:27:29 +0000 (13:27 +0000)]
Fix 'zpool create -t <tempname>'
Creating a pool with a temporary name fails when we also specify custom
dataset properties: this is because we mistakenly call
zfs_set_prop_nvlist() on the "real" pool name which, as expected,
cannot be found because the SPA is present in the namespace with the
temporary name.
Fix this by specifying the correct pool name when setting the dataset
properties.
Marcelo Araujo [Tue, 15 May 2018 05:55:29 +0000 (05:55 +0000)]
vq_getchain() can return -1 if some descriptor(s) are invalid and prints
a diagnostic message. So we do a sanity checking on the return value
of vq_getchain().
Navdeep Parhar [Tue, 15 May 2018 04:24:38 +0000 (04:24 +0000)]
cxgbe(4): Filtering related features and fixes.
- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.
Marius Strobl [Mon, 14 May 2018 21:46:06 +0000 (21:46 +0000)]
The broken DDR52 support of Intel Bay Trail eMMC controllers rumored
in the commit log of r321385 has been confirmed via the public VLI54
erratum. Thus, stop advertising DDR52 for these controllers.
Note that this change should hardly make a difference in practice as
eMMC chips from the same era as these SoCs most likely support HS200
at least, probably even up to HS400ES.
Stephen Hurd [Mon, 14 May 2018 20:06:49 +0000 (20:06 +0000)]
Replace rmlock with epoch in lagg
Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data. This fixes LORs
reported where the rwlock was obtained when the sxlock was held.
John Baldwin [Mon, 14 May 2018 17:27:53 +0000 (17:27 +0000)]
Make the common interrupt entry point labels local labels.
Kernel debuggers depend on symbol names to find stack frames with a
trapframe rather than a normal stack frame. The labels used for the
shared interrupt entry point for the PTI and non-PTI cases did not
match the existing patterns confusing debuggers. Add the '.L' prefix
to mark these symbols as local so they are not visible in the symbol
table.
Nathan Whitehorn [Mon, 14 May 2018 04:00:52 +0000 (04:00 +0000)]
Final fix for alignment issues with the page table first patched with
r333273 and partially reverted with r333594.
Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.
The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
Matt Macy [Mon, 14 May 2018 00:14:00 +0000 (00:14 +0000)]
epoch(9): allow sx locks to be held across epoch_wait()
The INVARIANTS checks in epoch_wait() were intended to
prevent the block handler from returning with locks held.
What it in fact did was preventing anything except Giant
from being held across it. Check that the number of locks
held has not changed instead.
Rick Macklem [Sun, 13 May 2018 23:38:01 +0000 (23:38 +0000)]
Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.
In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only affect not having this patch has is that Linux clients don't
reclaim opens and locks after a server reboot, which meant they lost
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.
Matt Macy [Sun, 13 May 2018 23:24:48 +0000 (23:24 +0000)]
epoch(9): cleanups, additional debug checks, and add global_epoch
- GC the _nopreempt routines
- to really benefit we'd need a separate routine
- they're not currently in use
- they complicate the API for no benefit at this time
- check that we're actually in a epoch section at exit
Rick Macklem [Sun, 13 May 2018 12:42:53 +0000 (12:42 +0000)]
Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plus) on a machine
running Linux with a NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.
Rick Macklem [Sun, 13 May 2018 12:29:09 +0000 (12:29 +0000)]
The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK.
When an NFSv4.1 session is busy due to a callback being in progress,
nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK.
The only effect this has is that the DestroySession operation will report
the failure for this case and this probably has little or no effect on a
client. Spotted by inspection and no failures related to this have been
reported.
- Create getblkx(9) variant of getblk(9) which can return error.
- Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP
was performed before the buffer is created, and EJUSTRETURN returned
in case the requested block does not exist.
- Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and
allocating the pages for it), copying from zero_region instead.
The end result is less page allocations and buffer recycling when a
hole is read, which is important for some benchmarks.
Requested and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D14917
Matt Macy [Sat, 12 May 2018 20:00:29 +0000 (20:00 +0000)]
hwpmc/epoch - don't reference domain if NUMA is not set
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.
Mark Johnston [Sat, 12 May 2018 15:35:26 +0000 (15:35 +0000)]
DTrace aarch64: Avoid calling unwind_frame() in the probe context.
unwind_frame() may be instrumented by FBT, leading to recursion into
dtrace_probe(). Manually inline unwind_frame() as we do with stack
unwinding code for other architectures.
According to the Intel SDM (Volme 3, 9.11.7) the BIOS signature MSR
should be zeroed before executing cpuid (although in practice it does
not seem to matter).
PR: 192487
Submitted by: Dan Lukes
Reported by: Henrique de Moraes Holschuh
MFC after: 3 days
Emmanuel Vadot [Sat, 12 May 2018 13:14:01 +0000 (13:14 +0000)]
aw_mmc: Rework regulator handling
Don't enable regulator on attach but dealt with them on power_up/power_off
Only set the voltage for the signaling regulator since I don't have boards
that can change the supply voltage.
Enable 1.8v signaling voltage.
Emmanuel Vadot [Sat, 12 May 2018 13:13:34 +0000 (13:13 +0000)]
aw_mmc: Do not fully init the controller in attach
Only do a reset of the controller at attach and init it at power_up.
We use to enable some interrupts in reset, only enable the interrupts
we are interested in when doing a request.
While here remove the regulators handling in power_on as it is very wrong
and will be dealt with in another commit.
I've been holding back on this because 1.7.0 requires OpenSSL 1.1.0 or
newer for full DANE support. But we can't wait forever, and nothing in
base uses DANE anyway, so here we go.
Kernel entry from vm86 mode, where PCB_VM86CALL pcb flag is not set,
is executed on the right stack already. No copy from the entry stack
to the kstack must be performed for vm86 bios call code to function.
To access the pcb flags on kernel entry, unconditionally switch to
kernel address space if vm86 mode is detected.
This fixes very early vm86 bios calls, typically done when boot is
performed by boot2 without loader, and kernel falls back to BIOS calls
to get SMAP.
Reported by: bde
Sponsored by: The FreeBSD Foundation
Fix use of the custom TSS on i386 after the 4/4 split.
Record common_tssd, the descriptor to be written in GDT to point to
the common TSS, before LTR is executed. The LTR instruction sets the
loaded descriptor type to 386 TSS busy, which traps on reloads.
Devin Teske [Sat, 12 May 2018 06:18:15 +0000 (06:18 +0000)]
dwatch(1): Expose process for ip/tcp/udp
Knowing the value of execname during these probes is of some value even
if it is commonly the interrupt kernel thread (intr[12]) -- quite often
it is not, but that depends on the probe.
Devin Teske [Sat, 12 May 2018 05:49:31 +0000 (05:49 +0000)]
dwatch(1): Export ARGV to profiles loaded via load_profile()
A module that wishes to post-process the output needs to know which
arguments were passed in order to re-execute a child in a pipe-chain.
Further, the expansion of ARGV needs to be such that items are escaped
properly.
Devin Teske [Sat, 12 May 2018 05:41:28 +0000 (05:41 +0000)]
dwatch(1): Separate default values so `-[BK] num' don't affect usage
If you were to pass an invalid option after `-B num' or `-K num' you
would see that the usage statement would show the value you passed
instead of the actual default.
Moving the default values to separate variables that are unaffected
by the options-parsing allows the usage statement to correctly show
the hard-coded default values if no flags are used.
Devin Teske [Sat, 12 May 2018 05:36:47 +0000 (05:36 +0000)]
dwatch(1): Bugfix, usage displayed with `-1Q'
A return statement should have been an exit in list_profiles().
If the user passed `-Q' to list profiles and asked for one-line
per profile (`-1'), list_profiles() would not exit as should.
Matt Macy [Sat, 12 May 2018 01:26:34 +0000 (01:26 +0000)]
hwpmc(9): Make pmclog buffer pcpu and update constants
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.
Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &
Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total
After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total
For more realistic overhead measurement set the sample rate for ~2khz
on a 2.1Ghz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &
Collecting 10 samples of `make -j96 buildkernel` from each:
x before
+ after
real time:
N Min Max Median Avg Stddev
x 10 76.4 127.62 84.845 88.577 15.100031
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
-28.398 +/- 10.0344
-32.0602% +/- 7.69825%
(Student's t, pooled s = 10.6794)
system time:
N Min Max Median Avg Stddev
x 10 2277.96 6948.53 2949.47 3341.492 1385.2677
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
-2277.47 +/- 920.425
-68.1574% +/- 8.77623%
(Student's t, pooled s = 979.596)
x no pmc
+ pmc running
real time:
HEAD:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 76.4 127.62 84.845 88.577 15.100031
Difference at 95.0% confidence
29.73 +/- 10.0335
50.5208% +/- 17.0525%
(Student's t, pooled s = 10.6785)
patched:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
1.332 +/- 0.248939
2.2635% +/- 0.426506%
(Student's t, pooled s = 0.264942)
system time:
HEAD:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 2277.96 6948.53 2949.47 3341.492 1385.2677
Difference at 95.0% confidence
2309.97 +/- 920.443
223.937% +/- 89.3039%
(Student's t, pooled s = 979.616)
patched:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
32.493 +/- 16.0042
3.15% +/- 1.5794%
(Student's t, pooled s = 17.0331)
Rick Macklem [Fri, 11 May 2018 22:16:23 +0000 (22:16 +0000)]
Add support for the TestStateID operation to the NFSv4.1 server.
The Linux client now uses the TestStateID operation, so this patch adds
support for it to the NFSv4.1 server. The FreeBSD client never uses this
operation, so it should not be affected.
Stephen Hurd [Fri, 11 May 2018 19:37:18 +0000 (19:37 +0000)]
Fix mld6query(8) and add a new -g option
The mld6query command relies on KAME behaviour which allows the
ipv6mr_multiaddr member of the request object in a IPV6_JOIN_GROUP
setsockopt() call to be INADDR6_ANY. The FreeBSD stack doesn't allow
this, so mld6query has been non-functional.
Also, add a -g option which sends a General Query (query INADDR6_ANY)
Sean Bruno [Fri, 11 May 2018 17:26:59 +0000 (17:26 +0000)]
vxge(4): deprecation notice
This hardware isn't totally ancient, about equal to a mxge(4) or mlx4en(4),
but the company was sold to Exar which then promptly exited the Ethernet
business so the card was commercially available for under 2 years. On deep
search, the only usage of these cards I found was by the importing of the
driver. There are code quality issues identified by Brooks and Hiren and
no visible use nor maintainership that warrant removal from FreeBSD 12.0.
Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. decrypted and decapsultaed packet has its own IP header.
Thus we can consider it as new packet and clear the protocols flags.
This allows ICMP/ICMPv6 properly handle errors that may cause this packet.
- Use Fx when referring to FreeBSD.
- Use Ql instead of Cm for command invocations.
- Remove some redundant Pp macros.
- Use a literal indented Bd instead of a series of Dl macros.