chuck [Fri, 15 Mar 2019 02:11:27 +0000 (02:11 +0000)]
Fix bhyve's NVMe Identify Namespace data
The NVMe Identify Namespace data structure's Number of LBA Formats
(NLBAF) field is a 0's based value (i.e. 0x0 means 1). Since the
emulation only supports a single format, set NLBAF to 0x0, not 1.
glebius [Thu, 14 Mar 2019 22:52:16 +0000 (22:52 +0000)]
PFIL_MEMPTR for ipfw link level hook
With new pfil(9) KPI it is possible to pass a void pointer with length
instead of mbuf pointer to a packet filter. Until this commit no filters
supported that, so pfil run through a shim function pfil_fake_mbuf().
Now the ipfw(4) hook named "default-link", that is instantiated when
net.link.ether.ipfw sysctl is on, supports processing pointer/length
packets natively.
- ip_fw_args now has union for either mbuf or void *, and if flags have
non-zero length, then we use the void *.
- through ipfw_chk() we handle mem/mbuf cases differently.
- ether_header goes away from args. It is ipfw_chk() responsibility
to do parsing of Ethernet header.
- ipfw_log() now uses different bpf APIs to log packets.
Although ipfw_chk() is now capable to process pointer/length packets,
this commit adds support for the link level hook only, see
ipfw_check_frame(). Potentially the IP processing hook ipfw_check_packet()
can be improved too, but that requires more changes since the hook
supports more complex actions: NAT, divert, etc.
glebius [Thu, 14 Mar 2019 22:28:50 +0000 (22:28 +0000)]
- Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and
IPFW_ARGS_OUT are utilized. They are intented to substitute the "dir"
parameter that is often passes together with args.
- Rename ip_fw_args.oif to ifp and now it is set to either input or
output interface, depending on IPFW_ARGS_IN/OUT bit set.
kevans [Thu, 14 Mar 2019 19:48:43 +0000 (19:48 +0000)]
ether_fakeaddr: Use 'b' 's' 'd' for the prefix
This has the advantage of being obvious to sniff out the designated prefix
by eye and it has all the right bits set. Comment stolen from ffec.
I've removed bryanv@'s pending question of using the FreeBSD OUI range --
no one has followed up on this with a definitive action, and there's no
particular reason to shoot for it and the administrative overhead that comes
with deciding exactly how to use it.
kevans [Thu, 14 Mar 2019 17:18:00 +0000 (17:18 +0000)]
ether: centralize fake hwaddr generation
We currently have two places with identical fake hwaddr generation --
if_vxlan and if_bridge. Lift it into if_ethersubr for reuse in other
interfaces that may also need a fake addr.
Reviewed by: bryanv, kp, philip
Differential Revision: https://reviews.freebsd.org/D19573
0mp [Thu, 14 Mar 2019 14:34:36 +0000 (14:34 +0000)]
chroot.8: Add examples & clean up
- Sort arguments in synopsis.
- Clarify that it is possible to specify arguments to the command (and that
they could be passed as further arguments to chroot(1)).
- Standardize the description of the flags.
- Improve formatting (e.g., do not use macros in strings specifying width).
- Add examples.
kib [Wed, 13 Mar 2019 17:30:03 +0000 (17:30 +0000)]
Some fixes for proccontrol(1) man page.
- Fix markup.
- Mention that process can only allow tracing for itself. This is already
stated in procctl(2), but requiring knowledge of the syscall description
is too much for the tool user.
- Clearly state that query mode only works for existing process.
Noted and reviewed by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
bz [Wed, 13 Mar 2019 17:00:15 +0000 (17:00 +0000)]
Enhance IPv6 autoconf startup.
Before this change we would only run rtsol on an interface which was
set to accept_rtadv and did not have rtsold enabled. This change
removes the latter condition and always runs rtsol (rather than the
deferred rtsold) to reduce the delay until we send the first RS.
This change will also handle the accept_rtadv before dhcp hence
starting IPv6 auto-configuration before IPV4 DHCP.
This change is intended for FreeBSD 13 and later only and will not be MFCed.
eugen [Wed, 13 Mar 2019 09:48:33 +0000 (09:48 +0000)]
mfi.4, mrsas.4: document how to get ATA TRIM support for SSD
while using LSI RAID adapters as it was completely obscure before:
mfi has no TRIM support at all and mrsas provides TRIM
if underlying adapter does it (for Non-RAID drives generally).
bcr [Tue, 12 Mar 2019 20:08:37 +0000 (20:08 +0000)]
Extend descriptions and comments about the need to create /etc/pf.conf.
FreeBSD removed the default /etc/pf.conf file in previous releases, but
the documentation kept mentioning it like any other file present in the
system. Change pf.conf(5) to mention in the description of the default
ruleset location that this file needs to be created manually. Also, the
default rc.conf file had it's comment extended a bit to let people know
that this file does not exist by default.
kib [Tue, 12 Mar 2019 19:33:25 +0000 (19:33 +0000)]
hwpmc/core: Adopt to upcoming Skylake TSX errata.
The forthcoming microcode update will fix a TSX bug by clobbering PMC3
when TSX instructions are executed (even speculatively). There is an
alternate mode where CPU executes all TSX instructions by aborting
them, in which case PMC3 is still available to OS. Any code that
correctly uses TSX must be ready to handle abort anyway.
Since it is believed that FreeBSD population of hwpmc(4) users is
significantly larger than the population of TSX users, switch the
microcode into TSX abort mode whenever a pmc is allocated, and back to
bug avoidance mode when the last pmc is deallocated.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
mckusick [Tue, 12 Mar 2019 19:08:41 +0000 (19:08 +0000)]
This is an additional fix for bug report 230962. When using
extended attributes, the kernel can panic with either "ffs_truncate3"
or with "softdep_deallocate_dependencies: dangling deps".
The problem arises because the flushbuflist() function which is
called to clear out buffers is passed either the V_NORMAL flag to
indicate that it should flush buffer associated with the contents
of the file or the V_ALT flag to indicate that it should flush the
buffers associated with the extended attribute data. The buffers
containing the extended attribute data are identified by having
their BX_ALTDATA flag set in the buffer's b_xflags field. The
BX_ALTDATA flag is set on the buffer when the extended attribute
block is first allocated or when its contents are read in from the
disk.
On a busy system, a buffer may be reused for another purpose, but
the contents of the block that it contained continues to be held
in the main page cache. Each physical page is identified as holding
the contents of a logical block within a specified file (identified
by a vnode). When a request is made to read a file, the kernel first
looks for the block in the existing buffers. If it is not found
there, it checks the page cache to see if it is still there. If
it is found in the page cache, then it is remapped into a new
buffer thus avoiding the need to read it in from the disk.
The bug is that when a buffer request made for an extended attribute
is fulfilled by reconstituting a buffer from the page cache rather
than reading it in from disk, the BX_ALTDATA flag was not being
set. Thus the flushbuflist() function would never clear it out and
the "ffs_truncate3" panic would occur because the vnode being cleared
still had buffers on its clean-buffer list. If the extended attribute
was being updated, it is first read, then updated, and finally
written. If the read is fulfilled by reconstituting the buffer
from the page cache the BX_ALTDATA flag was not set and thus the
dirty buffer would never be flushed by flushbuflist(). Eventually
the buffer would be recycled. Since it was never written it would
have an unfinished dependency which would trigger the
"softdep_deallocate_dependencies: dangling deps" panic.
The fix is to ensure that the BX_ALTDATA flag is set when a buffer
has been reconstituted from the page cache.
dim [Tue, 12 Mar 2019 18:19:44 +0000 (18:19 +0000)]
Revert r308867 (which was originally committed in the clang390-import
project branch):
Work around LLVM PR30879, which is about a bad interaction between
X86 Call Frame Optimization on i386 and libunwind, by disallowing the
optimization for i386-freebsd12.
This should fix some instances of broken exception handling when
frame pointers are omitted, in particular some unittests run during
the build of editors/libreoffice.
This hack will be removed as soon as upstream has implemented a more
permanent fix for this problem.
And indeed, after r345018 and r345019, which updated LLVM libunwind to
the most recent version, the above workaround is no longer needed. The
upstream commit which fixed this is:
Specifically, 32 bit (i386-freebsd) executables optimized with omitted
frame pointers and Call Frame Optimization should now behave correctly
when a C++ exception is thrown, and the stack is unwound.
kib [Tue, 12 Mar 2019 16:49:08 +0000 (16:49 +0000)]
isci(4): Use controller->lock for busdma tags.
isci(4) uses deferred loading. Typically on amd64 and i386 non-PAE
the tag does not create any restrictions, but on i386 PAE-tables but
non-PAE configs callbacks might be used.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
kevans [Tue, 12 Mar 2019 16:21:39 +0000 (16:21 +0000)]
stand: Improve some debugging experience
Some of these files using <FOO>_DEBUG defined a DEBUG() macro to serve as a
debug-printf. -DDEBUG is useful to enable some debugging output across
multiple ELF/common parts, so switch the DEBUG-as-printf macros over to
something more like DPRINTF that is more commonly used for this kind of
thing and less likely to conflict.
userboot/elf64_freebsd debugging also assumed %llx for uint64; use PRIx64
instead.
0mp [Tue, 12 Mar 2019 09:27:37 +0000 (09:27 +0000)]
ports.7: Add an example of how to use flavors
At the moment the manual page is not documenting how to build
a flavored package. Let's start documenting flavors with
an example of a typical use case.
Reported by: cem, dim
Reviewed by: bcr, cem, mat, matthew
Approved by: cem (src)
Differential Revision: https://reviews.freebsd.org/D19531
kadesai [Tue, 12 Mar 2019 09:24:58 +0000 (09:24 +0000)]
fw_outstanding"(outstanding IOs at firmware level) counter gets screwed up when R1 fastpath
writes are running. Some of the cases which are not handled properly in driver are:
1. With R1 fastpath supported, single write from CAM layer can consume 2 MPT frames
at driver/firmware level for fastpath qualification(if fw_outstanding < controller Queue Depth).
Due to this driver has to throttle IOs coming from CAM layer as well as second fastpath
write(of R1 write) against Adapter Queue Depth.
If "fw_outstanding" reaches to adapter queue depth, driver should return IOs from CAM layer with
device busy status.While allocating second MPT frame(corresponding to R1 FP write) also, driver
should ensure fw_outstanding should not exceed adapter QD.
2. For R1 fastpath writes completion, driver decrements "fw_oustanding" counter without
really returning MPT frame to free pool. It may cause IOs(with heavy IOs running, consuming whole
adapter Queue Depth) consuming MPT frames reserved for DCMDs(management commands) and
DCMDs(internal and sent by application) not getting MPT frame will start failing.
Below is one test case to hit the issue described above-
1. Run heavy IOs (outstanding IOs should hit adapter Queue Depth).
2. Run management tool (Broadcom's storcli tool) querying adapter in loop (run command- "storcli64 /c0 show" in loop).
3. Management tool's requests would start failing due to non-availability of free MPT frames as all frames would be consumed by IOs.
Fix: Increment/decrement of "fw_outstanding" counter should be in sync with MPT frame get/return.
imp [Tue, 12 Mar 2019 05:10:41 +0000 (05:10 +0000)]
Fix botched merge with 355066
When merging from Netflix's tree, resetting the carsize was dropped
accidentally. This fix fixes that revision by properly resetting how
many are in the car.
imp [Tue, 12 Mar 2019 04:57:05 +0000 (04:57 +0000)]
Add -l to camcontrol readcap.
The -l flag sends only the READ CAPACITY (16) sevice action. Normally
we send the READ CAPACITY (10) command, and only send RC16 when the
capacity is larger than 2TB (since that's the max RC10 can
report). However, some badly programmed drives report different
numbers for RC10 and RC16. This can be hard to diagnose, but generally
there's a "Logical block address out of range" error when RC16 reports
a larger number than RC10 and the RC10 number is the correct one. By
comparing the output of readcap with and without the -l argmuent, one
can determine if there's a mismatch and if the DA_Q_NO_RC16 quirk is
needed.
imp [Tue, 12 Mar 2019 04:49:59 +0000 (04:49 +0000)]
Remove now useless -d and -t flags.
These were used to set dst flag and minutes west of UTC
respectively. These are obsolete and have been removed form the
kernel. These existed primarily to faithfully emulate early
Unix ABIs that have been removed from FreeBSD.
imp [Tue, 12 Mar 2019 04:49:47 +0000 (04:49 +0000)]
Kill tz_minuteswest and tz_dsttime.
Research Unix, 7th Edition introduced TIMEZONE and DSTFLAG
compile-time constants in sys/param.h to communicate these values for
the machine. 4.2BSD moved from the compile-time to run-time and
introduced these variables and used for localtime() to return the
right offset from UTC (sometimes referred to as GMT, for this purpose
is the same). 4.4BSD migrated to using the tzdata code/database and
these variables were basically unused.
FreeBSD removed the real need for these with adjkerntz in
1995. However, some RTC clocks continued to use these variables,
though they were largely unused otherwise. Later, phk centeralized
most of the uses in utc_offset, but left it using both tz_minuteswest
and adjkerntz.
POSIX (IEEE Std 1003.1-2017) states in the gettimeofday specification
"If tzp is not a null pointer, the behavior is unspecified" so there's
no standards reason to retain it anymore. In fact, gettimeofday has
been marked as obsolecent, meaning it could be removed from a future
release of the standard. It is the only interface defined in POSIX
that references these two values. All other references come from the
tzdata database via tzset().
These were used to more faithfully implement early unix ABIs which
have been removed from FreeBSD. NetBSD has completely eliminated
these variables years ago. Linux has migrated to tzdata as well,
though these variables technically still exist for compatibility
with unspecified older programs.
So, there's no real reason to have them these days. They are a
historical vestige that's no longer used in any meaningful way.
mckusick [Mon, 11 Mar 2019 22:42:33 +0000 (22:42 +0000)]
Update the main loop in the flushbuflist() routine to properly select
buffers for flushing when requested to flush both normal and extended
attributes buffers.
mckusick [Mon, 11 Mar 2019 22:05:34 +0000 (22:05 +0000)]
Augment the UFS filesystem specific print function (called by the
kernel vn_printf() routine when printing out vnodes associated with
a UFS filesystem) to also include the inode's link count, effective
link count, generation number, owner, group, flags, size, and for
UFS2 filesystems, the extent size.
mckusick [Mon, 11 Mar 2019 21:49:44 +0000 (21:49 +0000)]
Augment DDB "show buffer" command to print the buffer's referenced
vnode pointer (b_vp). The value of b_vp can be used by "show vnode"
to print the vnode and "show vnodebufs" to print all the clean and
dirty buffers associated with the vnode (which should include this
buffer).
imp [Mon, 11 Mar 2019 20:57:54 +0000 (20:57 +0000)]
Upgrade Chipfancier SLC quirk to all versions
The 16GB, 32GB and 128GB versions of this product all have the same
problem. For some reason, the RC10 size is correct, while the RC16
size is larger (oddly by the capacity size / 1024 bytes). Using the
RC16 size results in illegal LBA range errors when geom tastes the
device. So, expand the quirk to cover all versions of this chip.
Ideally, we'd get both READ CAPACITY 10 and READ CAPACITY 16 sizes and
print a warnnig if they differ and use the smaller of the two numbers,
though that may be problematical as well. Furthermore, SBC-4
encourages users transition to RC16 only, which suggests that in the
future RC10 may disappear from some drives. It's unclear how to cope
with these drives generically.
dim [Mon, 11 Mar 2019 19:15:57 +0000 (19:15 +0000)]
Pull in r355854 from upstream llvm trunk (by Jonas Paulsson):
[RegAlloc] Avoid compile time regression with multiple copy hints.
As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive
compile time building opencollada"), this patch makes sure that no
phys reg is hinted more than once from getRegAllocationHints().
This handles the case were many virtual registers are assigned to the
same physreg. The previous compile time fix (r343686) in
weightCalcHelper() only made sure that physical/virtual registers are
passed no more than once to addRegAllocationHint().
dim [Mon, 11 Mar 2019 18:45:36 +0000 (18:45 +0000)]
Merge LLVM libunwind trunk r351319, from just before upstream's
release_80 branch point. Afterwards, we will merge the rest of the
changes in the actual release_80 branch.
mav [Mon, 11 Mar 2019 17:39:09 +0000 (17:39 +0000)]
Revert minor part of r344934.
I tried to save some CPU time on hopeless aggregation attempts, but it seems
the condition I added is overly strict, blocking also aggregation of optional
I/Os in cases which previously were possible. Revert just to be safe.
hselasky [Mon, 11 Mar 2019 14:29:50 +0000 (14:29 +0000)]
Improve support for switching to and from command polling mode in mlx4core.
Make sure the enter and leave polling routines can be called multiple times
with same setting. Ignore setting polling or event mode twice. This fixes a
deadlock during shutdown if polling mode was already selected.
dab [Mon, 11 Mar 2019 14:26:45 +0000 (14:26 +0000)]
Fix a scribbler in the PMS driver.
The ESGL bit was left uninitialized when executing the REPORT LUNS
ioctl. This could allow a zeroed data buffer to be treated as a
scatter/gather list. The firmware would eventually walk past the end
of the data buffer, potentially find what looked like a valid
address/length pair, and write the result to semi-random memory.
ken [Mon, 11 Mar 2019 14:21:14 +0000 (14:21 +0000)]
Fix CRN resets in the isp(4) driver in certain situations.
The Command Reference Number (CRN) is part of the FC-Tape features
that we enable when talking to tape drives. It starts at 1, and
goes to 255 and wraps around to 1. There are a number of reset
type conditions that result in the CRN getting reset to 1. These
are detailed in section 4.10 (table 8) of the FCP-4r02b specification.
One of the conditions is when a PRLI (Process Login) is sent by
the initiator, and the Establish Image Pair bit is set in Word 0
of the PRLI.
Previously, the isp(4) driver core sent a notification via
isp_async() that the target had changed or stayed in place, but
there was no indication of whether a PRLI was sent and whether the
Establish Image Pair bit was set.
The result of this was that in some situations, notably
switching back and forth between a direct connection and a switch
connection to a tape drive, the isp(4) driver would fail to reset
the CRN in situations that require it according to the spec. When
the CRN isn't reset in a situation that requires it, the tape drive
then rejects every subsequent command that is sent to the drive.
It is assuming that the commands are being sent out of order.
So, modify the isp(4) driver to include Word 0 of the PRLI command
when it sends isp_async() notifications of target changes. Look at
the Establish Image Pair bit, and reset the CRN if that bit is set.
With this change, I am able to switch a tape drive back and forth
between a direct connection and a switch connection, and the isp(4)
driver resets the CRN when it should.
sys/dev/isp_stds.h:
Add bit definitions for PRLI Word 0.
sys/dev/ispmbox.h:
Add PRLI Word 0 to the port database type, isp_pdb_t.
sys/dev/ispvar.h
Add PRLI Word 0 to fcportdb_t.
sys/dev/isp.c:
Populate the new prli_word0 parameter in the port database.
In isp_pdb_add_update(), add a check to see if the
Establish Image Pair bit is set in PRLI Word 0. If it is,
then that is an additional reason to create a change
notification.
sys/dev/isp_freebsd.c:
In isp_async(), if the device changed or stayed, look at
PRLI Word 0 to see if the Establish Image Pair bit is set.
If it is, reset the CRN if we haven't already.
lidl [Mon, 11 Mar 2019 13:33:03 +0000 (13:33 +0000)]
Remove an unneeded 'tail -n 1' from a pipeline
When piping to awk, it's almost always an anti-pattern to use 'grep'
first.
When not in a pipeline, sometimes it is faster to use tail, as awk
must process all the lines in the input stream, and won't 'seek'.
In a pipeline, both grep and awk must process all lines, so we might
as well skip the extra process creation for tail and just use awk
for all the processing.
ae [Mon, 11 Mar 2019 10:42:09 +0000 (10:42 +0000)]
Add IP_FW_NAT64 to codes that ipfw_chk() can return.
It will be used by upcoming NAT64 changes. We use separate code
to avoid propogating EACCES error code to user level applications
when NAT64 consumes a packet.
ae [Mon, 11 Mar 2019 10:33:32 +0000 (10:33 +0000)]
Add NULL pointer check to nat64_output().
It is possible, that a processed packet was originated by local host,
in this case m->m_pkthdr.rcvif is NULL. Check and set it to V_loif to
avoid NULL pointer dereference in IP input code, since it is expected
that packet has valid receiving interface when netisr processes it.