mjg [Mon, 6 Feb 2017 09:40:14 +0000 (09:40 +0000)]
locks: fix recursion support after recent changes
When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.
tsoome [Mon, 6 Feb 2017 08:58:40 +0000 (08:58 +0000)]
loader: bcache read-ahead block count should take large sectors into account
The loader bcache implements simple read-ahead to boost the cache.
The bcache is built on 512B blocks, and the read-ahead attempts to read a
number of cache blocks based on the amount of free bcache space.
However, some devices use sector sizes larger than 512B; most obviously,
CD media uses 2k sectors. This means the read-ahead cannot be an arbitrary
number of blocks; the value must also suit larger sectors. With CD devices,
for example, we should read a multiple of 2KB.
Since the sector size reported by the disk interface is not too reliable, I
guess we can just use a "good enough" value, so the implementation rounds the
read-ahead block count down to a multiple of 16.
This covers sector sizes up to 8k.
In addition, the update implements an end-of-cache marker to help detect
possible memory corruption. I have not seen it happen so far, but it does not
hurt to have the detection mechanism in place.
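The rounding described above can be sketched as follows. This is a minimal illustration, not the loader's code: the helper name readahead_round() and both macros are hypothetical, and the 512-byte block size is taken from the commit message.

```c
#include <stddef.h>

/* Assumed for illustration: bcache blocks are 512 bytes, as stated above. */
#define BCACHE_BLOCK_SIZE 512
/* 16 blocks * 512 B = 8 KB, so power-of-two sector sizes up to 8k divide it. */
#define READAHEAD_ALIGN 16

/*
 * Hypothetical helper: round a candidate read-ahead block count down to
 * a multiple of 16. Returns 0 when fewer than 16 blocks are available.
 */
static size_t
readahead_round(size_t nblocks)
{
	return (nblocks - (nblocks % READAHEAD_ALIGN));
}
```

With 37 free cache blocks, for example, the sketch would read ahead 32 blocks (16 KB), which is a whole number of 2 KB CD sectors.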
o Almost all IPsec-related code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
option IPSEC_SUPPORT added. It enables support for loading
and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
support was removed. Added TCP/UDP checksum handling for
inbound packets that were decapsulated by transport mode SAs.
setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
built as part of the ipsec.ko module (or with an IPSEC kernel).
It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
methods. Only one header file, <netipsec/ipsec_support.h>,
needs to be included to declare everything needed to work
with IPsec.
o All IPsec protocol handlers (ESP/AH/IPCOMP protosw) were removed.
Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be closer to the RFC.
o PF_KEY SADB was reworked:
- all security associations are now stored in a single SPI namespace,
and all SAs MUST have a unique SPI.
- several hash tables were added to speed up lookups in the SADB.
- the SADB now uses an rmlock to protect access, and concurrent threads
can do SA lookups at the same time.
- many PF_KEY message handlers were reworked to reflect changes
in SADB.
- SADB_UPDATE message was extended to support new PF_KEY headers:
SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were substantially changed to
avoid locking protection for ipsecrequest. Only a limited number (4)
of bundled SAs is now supported, but they are supported
for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
used security policies to avoid SP lookup for each packet.
o A mode was added for inbound security policies in which the kernel
checks the full history of applied IPsec transforms.
o Reference counting rules for security policies and security
associations were changed. Proper SA locking was added to the xform
code.
o xform code was also changed. Now it is possible to unregister xforms.
tdb_xxx structures were changed and renamed to reflect changes in
SADB/SPDB, and changed rules for locking and refcounting.
tsoome [Mon, 6 Feb 2017 08:26:45 +0000 (08:26 +0000)]
loader: Implement disk_ioctl() to support DIOCGSECTORSIZE and DIOCGMEDIASIZE.
We need an interface to extract information about the disk abstraction,
to read the disk or partition size depending on the provided argument,
and to adjust the disk size based on information in the partition table.
The disk handle from disk_open() has a d_offset field that points to the
partition start. We can use this fact to return either the whole disk
size or the partition size. For this we only need to record the partition
size, which we get from disk_open() anyhow.
In addition, this makes it possible to adjust the disk media size
based on information from the partition table. The problem is that
some BIOS systems report a bogus disk size for 2+TB disks, but
since such disks use GPT partitioning, and GPT does have information
about the disk size (alternate LBA + 1), we can use this fact to record
the disk size based on the partition table.
This patch does exactly this: it implements the DIOCGSECTORSIZE and
DIOCGMEDIASIZE ioctls, and DIOCGMEDIASIZE reports either the disk media size
or the partition size.
Adds a ptable_getsize() call to read the partition size in bytes from a ptable
pointer.
Updates disk_open() to use ptable_getsize() to update the mediasize value.
Implements a GPT detection function to update the ptable size (used by
ptable_getsize()) according to the alternate LBA (which is the location of the
backup copy of the GPT header).
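The "alternate LBA + 1" rule can be sketched as below. The function name gpt_disk_size() is hypothetical and is not the loader's API; the sketch only shows the arithmetic, assuming the backup header occupies the last sector of the disk.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the disk size in bytes from the alternate
 * LBA field of the primary GPT header. The backup header sits in the
 * last sector, so the sector count is alternate LBA + 1.
 */
static uint64_t
gpt_disk_size(uint64_t alternate_lba, uint32_t sector_size)
{
	return ((alternate_lba + 1) * (uint64_t)sector_size);
}
```

The multiplication is done in 64 bits, so sizes above 2 TB with 512-byte sectors do not overflow in this sketch.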
imp [Mon, 6 Feb 2017 06:15:38 +0000 (06:15 +0000)]
o Add mkimg to the cross tools, and use the TMPPATH as PATH to pick up
mkimg for building on systems like FreeBSD 11.0 that don't have my
-a changes.
o Set NANO_ROOT and NANO_ALTROOT for std-* since their values don't
change when we set NANO_SLICE*.
cxgbe(4): Allow tunables that control the number of queues to be set to
'-n' to tell the driver to create _up to_ 'n' queues if enough cores are
available. For example, setting hw.cxgbe.nrxq10g="-32" will result in
16 queues if the system has 16 cores, 32 if it has 32.
There is no change in the default number of queues of any type.
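A minimal sketch of how such a '-n' value could be interpreted. This is not the driver's code: resolve_nqueues() is a hypothetical name, and passing non-negative values through unchanged is an assumption made for illustration.

```c
/*
 * Hypothetical helper: interpret a queue-count tunable. A negative
 * value -n means "up to n queues, limited by the number of cores".
 * Non-negative values are returned unchanged (an assumption here).
 */
static int
resolve_nqueues(int tunable, int ncores)
{
	int limit;

	if (tunable >= 0)
		return (tunable);
	limit = -tunable;
	return (limit < ncores ? limit : ncores);
}
```

Under these assumptions, a value of -32 yields 16 queues on a 16-core system and 32 queues on a 32-core system, matching the example in the commit message.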
adrian [Mon, 6 Feb 2017 05:03:41 +0000 (05:03 +0000)]
[iwm] Sync valid_tx_ant and valid_rx_ant mask handling with iwlwifi.
* This fixes the phy_cfg field sent in the iwm_send_phy_cfg_cmd()
command, which wasn't taking into account the valid_rx_ant and
valid_tx_ant masks from nvm_data before.
ian [Sun, 5 Feb 2017 15:45:31 +0000 (15:45 +0000)]
Add tsw_busy support to usb_serial (ucom).
The tty layer uses tsw_busy to poll for busy/idle status of the transmitter
hardware during close() and tcdrain(). The ucom layer defines ULSR_TXRDY and
ULSR_TSRE bits for the line status register; when both are set, the
transmitter is idle. Not all chip drivers maintain those bits in the sc_lsr
field, and if the bits never get set the transmitter will always appear
busy, causing hangs in tcdrain().
These changes add a new sc_flag bit, UCOM_FLAG_LSRTXIDLE. When this flag is
set, ucom_busy() uses the lsr bits to return busy vs. idle state, otherwise
it always returns idle (which is effectively what happened before this
change because tsw_busy wasn't implemented).
For the uftdi chip driver, these changes stop masking out the tx idle bits
when processing the status register (because now they're useful), and it
calls ucom_use_lsr_txbits() to indicate the bits are maintained by the
driver and can be used by ucom_busy().
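The busy/idle decision described above can be sketched as follows. The function name tx_is_busy() is hypothetical, and the numeric values given to the macros are placeholders for illustration only; the real definitions live in the ucom headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration; not copied from the driver. */
#define ULSR_TXRDY          0x20
#define ULSR_TSRE           0x40
#define UCOM_FLAG_LSRTXIDLE 0x01

/*
 * Hypothetical helper: report whether the transmitter is busy.
 * Without UCOM_FLAG_LSRTXIDLE the lsr bits are not trusted and the
 * transmitter is always reported idle. With the flag set, it is idle
 * only when both ULSR_TXRDY and ULSR_TSRE are set.
 */
static bool
tx_is_busy(uint32_t sc_flag, uint8_t sc_lsr)
{
	const uint8_t txidle = ULSR_TXRDY | ULSR_TSRE;

	if ((sc_flag & UCOM_FLAG_LSRTXIDLE) == 0)
		return (false);
	return ((sc_lsr & txidle) != txidle);
}
```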
dchagin [Sun, 5 Feb 2017 14:17:09 +0000 (14:17 +0000)]
Update syscall.master to 4.10-rc6. Also fix comments, a typo,
and wrong numbering for a few unimplemented syscalls.
For the 32-bit Linuxulator, the socketcall() syscall was historically
the entry point for the sockets API. Starting in Linux 4.3, direct
syscalls are provided for the sockets API. Enable them.
The initial version of the patch was provided by trasz@ and extended by me.
mjg [Sun, 5 Feb 2017 13:37:23 +0000 (13:37 +0000)]
rwlock: move lockstat handling out of inline primitives
See r313275 for details.
One difference here is that recursion handling was removed from the fallback
routine. As it is, it was never supposed to see a recursed lock in the first
place. Future changes will move it out of inline variants, but right now
there is no easy way to test if the lock is recursed without reading
additional words.
mjg [Sun, 5 Feb 2017 08:04:11 +0000 (08:04 +0000)]
mtx: move lockstat handling out of inline primitives
Lockstat requires checking if it is enabled and, if so, calling a 6-argument
function. Further, determining whether to call it on unlock requires
pre-reading the lock value.
This is problematic in at least 3 ways:
- more branches in the hot path than necessary
- additional cacheline ping pong under contention
- bigger code
Instead, check first if lockstat handling is necessary and if so, just fall
back to regular locking routines. For this purpose a new macro is introduced
(LOCKSTAT_PROFILE_ENABLED).
LOCK_PROFILING uninlines all primitives. Fold the current inline lock
variant into _mtx_lock_flags to retain the support. With this change
the inline variants are not used when LOCK_PROFILING is defined and thus
can ignore its existence.
mjg [Sun, 5 Feb 2017 05:20:29 +0000 (05:20 +0000)]
sx: uninline slock/sunlock
Shared locking routines explicitly read the value and test it. If the
change attempt fails, they fall back to a regular function which would
retry in a loop.
The problem is that with many concurrent readers the risk of failure is pretty
high and even the value returned by fcmpset is very likely going to be stale
by the time the loop in the fallback routine is reached.
Uninline said primitives. It gives a throughput increase when doing concurrent
slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s.
Interestingly, rwlock primitives are already not inlined.
mjg [Sun, 5 Feb 2017 03:26:34 +0000 (03:26 +0000)]
mtx: switch to fcmpset
The found value is passed to locking routines in order to reduce cacheline
accesses.
mtx_unlock grows an explicit check for regular unlock. On ll/sc architectures
the routine can fail even if the lock could have been handled by the inline
primitive.
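The idea of passing the found value along can be sketched with C11 atomics, whose compare-exchange writes the observed value back into the "expected" argument on failure. This is a stand-in for illustration, not the kernel's fcmpset primitive; try_lock() and LOCK_UNOWNED are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_UNOWNED ((uintptr_t)0)	/* hypothetical "free" value */

/*
 * Hypothetical helper: try to take the lock for thread id tid. On
 * failure *found holds the value that was observed, so a slow path
 * can start from it without re-reading the lock word.
 */
static bool
try_lock(_Atomic uintptr_t *lock, uintptr_t tid, uintptr_t *found)
{
	*found = LOCK_UNOWNED;
	return (atomic_compare_exchange_strong_explicit(lock, found, tid,
	    memory_order_acquire, memory_order_relaxed));
}
```

A caller that fails here would hand *found to its slow path instead of loading the lock word again, which is the cacheline access the commit message aims to save.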
markj [Sun, 5 Feb 2017 02:44:08 +0000 (02:44 +0000)]
Fix a double free of libelf data buffers in the USDT link code.
libdtrace needs to append to the input object files' string and symbol
tables. Currently it does so by allocating larger buffers, copying the
existing sections into them, and swapping pointers in the libelf data
descriptors. However, it also frees those buffers when its processing is
complete, which leads to a double free since the elftoolchain libelf
owns them and also frees them in elf_end(3). Instead, free the buffers
originally allocated by libelf.
markj [Sun, 5 Feb 2017 02:39:12 +0000 (02:39 +0000)]
Use PC-relative relocations for USDT probe sites on i386 and amd64.
When recording probe site addresses in the output DOF file, dtrace -G
needs to emit relocations for the .SUNW_dof section in order to obtain
the addresses of functions containing probe sites. DTrace expects the
addresses to be relative to the base address of the final ELF file,
and the amd64 USDT implementation was relying on some unspecified and
incorrect behaviour in the base system GNU ld to achieve this.
This change reimplements the probe site relocation handling to allow
USDT to be used with lld and newer GNU binutils. Specifically, it
makes use of R_X86_64_PC64/R_386_PC32 relocations to obtain the
probe site address relative to the DOF file address, and adds and uses a
new DOF relocation type which computes the final probe site address using
these relative offsets.
Reported by and discussed with: Rafael Espíndola
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9374
markj [Sun, 5 Feb 2017 02:27:04 +0000 (02:27 +0000)]
Make witness_warn() always print to the console.
witness_warn() either breaks into the debugger or panics the system, so its
output should go to the console regardless of the witness(4) output channel
configuration.
imp [Sun, 5 Feb 2017 01:20:39 +0000 (01:20 +0000)]
Use ssize_t instead of uint32_t to prevent warnings about a comparison
with different signs. Due to the promotion rules, this would only
happen on 32-bit platforms.
imp [Sun, 5 Feb 2017 00:55:07 +0000 (00:55 +0000)]
Add the ability to dump log pages directly in binary to stdout.
Update man page to include this flag, and an example of dumping a
vendor-specific page while I'm here.
imp [Sun, 5 Feb 2017 00:45:02 +0000 (00:45 +0000)]
Add some descriptions to the man page for the supported log pages as
well as the new wdc commands. Make wdc be an alias for hgst when
specifying the vendor to use to interpret the page.
def [Sat, 4 Feb 2017 14:10:16 +0000 (14:10 +0000)]
Fix bugs found by Coverity in decryptcore(8) and savecore(8):
- Perform final decryption and write decrypted data in case of non-block aligned
input data;
- Use strlcpy(3) instead of strncpy(3) to verify if paths aren't too long;
- Check errno after calling unlink(2) instead of calling stat(2) in order to
verify if a decrypted core was created by a child process;
- Free dumpkey.
kib [Sat, 4 Feb 2017 12:26:38 +0000 (12:26 +0000)]
Define the vm_ooffset_t and vm_pindex_t types as machine-independent.
The types are for the byte offset and page index in a vm object. They
are similar to off_t, which is defined as a 64-bit MI integer. Using MI
definitions allows consistent MD values of vm object-related maximum
sizes to be provided.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
imp [Sat, 4 Feb 2017 05:53:00 +0000 (05:53 +0000)]
Implement 5 wdc-specific nvme control options for their HGST drives:
wdc cap-diag Capture diagnostic data from drive
wdc drive-log Capture drive history data from drive
wdc get-crash-dump Retrieve firmware crash dump from drive
alc [Sat, 4 Feb 2017 05:23:10 +0000 (05:23 +0000)]
Over the years, the code and comments in vm_page_startup() have diverged in
one respect. When determining how many page structures to allocate,
contrary to what the comments say, the code does not account for the
overhead of a page structure per page of physical memory. This revision
changes the code to match the comments.
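One way to account for that overhead is to charge each managed page for its own page structure, as in the sketch below. This is an assumption about the shape of the calculation, not the code from vm_page_startup(); managed_pages() and SKETCH_PAGE_SIZE are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for illustration */

/*
 * Hypothetical helper: number of pages that can be managed when every
 * page of physical memory also costs one page structure of
 * page_struct_size bytes taken from the same memory.
 */
static size_t
managed_pages(size_t avail_bytes, size_t page_struct_size)
{
	return (avail_bytes / (SKETCH_PAGE_SIZE + page_struct_size));
}
```

Dividing by the page size alone would overestimate the count, because part of the available memory is consumed by the page structures themselves.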
bdrewery [Sat, 4 Feb 2017 02:15:49 +0000 (02:15 +0000)]
Remove LOCAL_LIB_DIRS warning added in r275839.
The case for which this was added, r274807, causes this warning to
always show. LOCAL_DIRS=foo LOCAL_LIB_DIRS=foo/lib. The only case in
which r274807 is a problem is if foo/Makefile does not contain
SUBDIR+=lib, which is a normal convention. LOCAL_LIB_DIRS is a special
hack only to get a library into the _generic_libs list for the
'make libraries' bootstrapping phase. The old behavior changed in
r274807 was only in head during the 10.0 cycle, so the warning was
only ever needed until release anyhow.
vangyzen [Sat, 4 Feb 2017 00:34:00 +0000 (00:34 +0000)]
PCIe HotPlug: remove tests for DL active link capability
As of r313097, the HotPlug code requires the link to support
reporting of the data-link status. Remove tests for this capability
from code that can now assume its presence.
gnn [Fri, 3 Feb 2017 22:26:19 +0000 (22:26 +0000)]
Replace the implementation of DTrace's RAND subroutine for generating
low-quality random numbers with a modern implementation (xoroshiro128+)
that is capable of generating better quality randomness without compromising performance.
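For reference, a minimal sketch of the xoroshiro128+ step function. It uses the rotation constants (55, 14, 36) from the generator's original 2016 publication; whether the DTrace code uses exactly these constants is not stated in the commit message, and the function names here are illustrative.

```c
#include <stdint.h>

/* Rotate x left by k bits; k must be in the range 1..63. */
static inline uint64_t
rotl64(uint64_t x, int k)
{
	return ((x << k) | (x >> (64 - k)));
}

/*
 * Advance the 128-bit state s and return the next 64-bit output.
 * The state must not be all zero.
 */
static uint64_t
xoroshiro128plus_next(uint64_t s[2])
{
	const uint64_t s0 = s[0];
	uint64_t s1 = s[1];
	const uint64_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl64(s0, 55) ^ s1 ^ (s1 << 14);
	s[1] = rotl64(s1, 36);
	return (result);
}
```

Each call costs a handful of shifts, XORs and one addition, which is why the generator is cheap enough for a tracing context.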
jilles [Fri, 3 Feb 2017 20:33:23 +0000 (20:33 +0000)]
Clean up documentation of AF_UNIX control messages.
Document AF_UNIX control messages in unix(4) only, not split between unix(4)
and recv(2).
Also, warn about LOCAL_CREDS effective uid/gid fields, since the write could
be from a setuid or setgid program (with the explicit SCM_CREDS and
LOCAL_PEERCRED, the credentials are read at such a time that it can be
assumed that the process intends for them to be used in this context).
pkelsey [Fri, 3 Feb 2017 17:02:57 +0000 (17:02 +0000)]
Fix VIMAGE-related bugs in TFO. The autokey callout vnet context was
not being initialized, and the per-vnet fastopen context was only
being initialized for the default vnet.
PR: 216613
Reported by: Alex Deiter <alex dot deiter at gmail dot com>
MFC after: 1 week
bdrewery [Fri, 3 Feb 2017 16:27:23 +0000 (16:27 +0000)]
native-xtools: Add missing readelf.
The switch to elftoolchain's readelf in r280859 caused native-xtools
to no longer build readelf. This fixes poudriere builds not using
a native readelf when expected.
pfg [Fri, 3 Feb 2017 16:08:58 +0000 (16:08 +0000)]
resolvconf: restore RESTARTCMD=, CMD1=, CMD2= and sed pattern as before.
r312992 removed RESTARTCMD_WITH_ARG for @RESTARTCMD something@ but
reverted the sed to be '@RESTARTCMD \(.*\)@' and RESTARTCMD= to be
the value of RESTARTCMD_WITH_ARG.
kib [Fri, 3 Feb 2017 12:51:40 +0000 (12:51 +0000)]
For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE
and device npx.
This means that the FPU is always initialized and handled when available,
and the SSE+ register file and exceptions are handled when available. This
makes the kernel FPU code much easier to maintain at the cost of
slight bloat for CPUs older than 25 years.
CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment
explaining the original purpose.
Suggested by and discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
adrian [Fri, 3 Feb 2017 06:04:06 +0000 (06:04 +0000)]
[net80211] don't update quiet time counter values every probe request.
The quiet time counter update is happening each time the IE is added,
which also means it happens for each quiet time IE addition to the probe
response.
Only update the countdown if we request it (i.e., for beacon updates).
markj [Fri, 3 Feb 2017 03:22:47 +0000 (03:22 +0000)]
Sync the x86 dis_tables.c with upstream.
This corresponds to the following illumos issues:
5755 want support for Intel FMA instrs
5756 want support for Intel BMI1 instrs
5757 want support for Intel BMI2 instrs
5758 want support for Intel AVX2 instrs
7204 Want broadwell rdseed and adx support
7208 Want stac/clac disasm support
7733 Need SHA Instruction dis support
7756 dis can't handle x86 SSE 3 instructions
7757 want avx2 disasm tests
7758 want SSE 4.1 disasm tests
imp [Thu, 2 Feb 2017 23:04:06 +0000 (23:04 +0000)]
Ensure that the passthrough request will fit in MAXPHYS bytes after it
has been rounded to full pages. This avoids a panic in
vm_fault_quick_hold_pages due to this off-by-one error passing one
page too many into vmapbuf.
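The check can be sketched as below: a request fits only if the span from its first page boundary to its last page boundary is at most MAXPHYS. The function name and the two constants are assumptions for illustration (4 KB pages, 128 KB MAXPHYS), not the values or code used by the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u			/* assumed page size */
#define SKETCH_MAXPHYS   (128u * 1024u)	/* assumed MAXPHYS */

/*
 * Hypothetical helper: true if the buffer [addr, addr + len), once
 * expanded to full pages, spans no more than SKETCH_MAXPHYS bytes.
 */
static bool
fits_in_maxphys(uintptr_t addr, size_t len)
{
	const uintptr_t mask = ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
	const uintptr_t start = addr & mask;
	const uintptr_t end = (addr + len + SKETCH_PAGE_SIZE - 1) & mask;

	return (end - start <= SKETCH_MAXPHYS);
}
```

A buffer of exactly MAXPHYS bytes passes when it is page aligned but fails when it starts one byte into a page, since it then touches one extra page.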
imp [Thu, 2 Feb 2017 23:04:00 +0000 (23:04 +0000)]
Use an aligned buffer for the firmware data. Otherwise, when loading
MAXPHYS bytes of data, the I/O would require MAXPHYS + PAGE_SIZE worth
of pages and we'd hit an assertion in
vm_fault_quick_hold_pages unless MAXPHYS was larger than 1M +
PAGE_SIZE.
danfe [Thu, 2 Feb 2017 20:30:50 +0000 (20:30 +0000)]
Try to fix the old "he capability is stupid" bug in gettytab(5)/getty(8)
There is one capability explicitly documented in gettytab(5) as stupid: he.
And it is indeed. It was meant to facilitate system hostname modification,
but is hardly usable in practice because it allows very limited editing
(e.g., it depends on a particular hostname length, making it non-generic).
Replace it with a simple implementation that treats ``he'' as a POSIX extended
regular expression which is matched against the hostname. If there are no
parenthesized subexpressions in the pattern, the entire matched string is used
as the final hostname. Otherwise, use the first matched subexpression.
If the pattern does not match, the original hostname is not modified.
Using regex(3) gives more freedom, does not complicate the code very much,
and makes a lot more sense, in turn making ``he'' less stupid and actually
useful (e.g., it is now possible to obtain node or domain names from the
original hostname string, without knowing it in advance).
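The matching rules above can be sketched with regex(3) as follows. This is not the getty(8) code: edit_hostname() is a hypothetical helper, and falling back to the whole match when the first subexpression did not participate in the match is an assumption.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: apply the extended regular expression pattern
 * to host and store the resulting hostname in out (outlen bytes).
 * If the pattern fails to compile or does not match, out receives the
 * unmodified hostname.
 */
static void
edit_hostname(const char *pattern, const char *host, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m[2];
	const regmatch_t *pick;
	size_t len;

	if (outlen == 0)
		return;
	/* Default result: the original hostname. */
	snprintf(out, outlen, "%s", host);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return;
	if (regexec(&re, host, 2, m, 0) == 0) {
		/* re_nsub is the number of parenthesized subexpressions. */
		pick = (re.re_nsub > 0 && m[1].rm_so != -1) ? &m[1] : &m[0];
		len = (size_t)(pick->rm_eo - pick->rm_so);
		if (len >= outlen)
			len = outlen - 1;
		memcpy(out, host + pick->rm_so, len);
		out[len] = '\0';
	}
	regfree(&re);
}
```

With the hostname "node.example.org", the pattern "^[^.]+" would select the node name, and "^[^.]+\.(.*)$" would select the domain through its first subexpression.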