kib [Sun, 4 Nov 2018 00:32:28 +0000 (00:32 +0000)]
Remove Obj_Entry textsize member.
It is unused after r340102, and more important, I do not see how to
define textsize in both practically useful and correct way, for binaries
with more that one executable segments.
The buffer is already zeroed in compile_rule() function, and also it
may contain configured F_NOT flag in o.len field. This fixes the filling
for "not icmp6types" opcode.
emaste [Sat, 3 Nov 2018 19:31:11 +0000 (19:31 +0000)]
libcompat: disable retpoline when building build tools
These are built with the host toolchain which may not support retpoline.
While here, move the MK_ overrides to a separate line and sort them
alphabetically to support future changes.
MFC with: r339511
Sponsored by: The FreeBSD Foundation
Update the "flag" for draft-ietf-6man-ipv6only-flag.
Having the flag named "6" can possibly be a problem for configurations
where parsing strings and numbers can produce ambivalent results.
Rename the "6" flag to the "S"ix (or Silence-IPv4) flag.
mmacy [Sat, 3 Nov 2018 03:43:32 +0000 (03:43 +0000)]
Convert epoch to read / write records per cpu
In discussing D17503 "Run epoch calls sooner and more reliably" with
sbahra@ we came to the conclusion that epoch is currently misusing the
ck_epoch API. It isn't safe to do a "write side" operation (ck_epoch_call
or ck_epoch_poll) in the middle of a "read side" section. Since, by definition,
it's possible to be preempted during the middle of an EPOCH_PREEMPT
epoch the GC task might call ck_epoch_poll or another thread might call
ck_epoch_call on the same section. The right solution is ultimately to change
the way that ck_epoch works for this use case. However, as a stopgap for
12 we agreed to simply have separate records for each use case.
mav [Sat, 3 Nov 2018 03:10:06 +0000 (03:10 +0000)]
9952 Block size change during zfs receive drops spill block
Replication code in receive_object() falsely assumes that if received
object block size is different from local, then it must be a new object
and calls dmu_object_reclaim() to wipe it out. In most cases it is not a
problem, since all dnode, bonus buffer and data block(s) are immediately
rewritten any way, but the problem is that spill block (if used) is not.
This means loss of ACLs, extended attributes, etc.
This issue can be triggered in very simple way:
1. create 4KB file with 10+ ACL entries;
2. take snapshot and send it to different dataset;
3. append another 4KB to the file;
4. take another snapshot and send incrementally;
5. witness ACL loss on receive side.
emaste [Sat, 3 Nov 2018 01:53:26 +0000 (01:53 +0000)]
Remove apparently unused 0-byte files that cause grief on Windows
r235274 added a sort regression test (it operates by comparing output
against GNU sort). The commit included a number of 0-byte files, one
of which ends in a trailing . which reportedly breaks svn/git checkouts
on Windows.
It appears these were added accidentally, so just remove them.
imp [Sat, 3 Nov 2018 00:37:51 +0000 (00:37 +0000)]
Implement ability to turn on/off PHYs for AHCI devices.
As part of Chuck's work on fixing kernel crashes caused by disk I/O
errors, it is useful to be able to trigger various kinds of
errors. This patch allows causing an AHCI-attached disk to disappear,
by having the driver keep the PHY disabled when the driver would
otherwise enable the PHY. It also allows making the disk reappear by
having the driver go back to setting the PHY enable/disable state as
it normal would and simulating the hardware event that causes a bus
rescan.
imp [Fri, 2 Nov 2018 22:15:30 +0000 (22:15 +0000)]
Document r226775: tell why we omit usbus[0-9]+
tcpdump can capture packet traces from the usb bus. usbus[0-9] are
registered as ifnet devices so this can work. When these devices come
up, devd was trying to run pccard_ether on those interfaces, which
didn't exist and generated an error.
emaste [Fri, 2 Nov 2018 21:20:46 +0000 (21:20 +0000)]
newvers.sh: fix git false positive -dirty tag
Assuming that any output from `git diff-index --name-only` implies
changes in the working tree results in false positives: files with
metadata, but not content, changes are also listed.
Check that content differences exist before adding the -dirty tag to
the git hash.
PR: 229230
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15968
emaste [Fri, 2 Nov 2018 20:48:29 +0000 (20:48 +0000)]
sys/types.h: avoid using terse macro _M
Although _M is reserved for use by the implemenation it is rather non-
descriptive and conflicted with a libc++ test. Just rename to _Major
and _Minor to avoid conflicts.
Reviewed by: dim
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16734
r340061 included a number of assertions pf_frent_remove(), but these assertions
were the only use of the 'prev' variable. As a result builds without
INVARIANTS had an unused variable, and failed.
pfsync touches pf memory (for pf_state and the pfsync callback
pointers), not the other way around. We need to ensure that pfsync is
torn down before pf.
Notify that the ifnet will go away, even on vnet shutdown
pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.
Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.
pf: Limit the fragment entry queue length to 64 per bucket.
So we have a global limit of 1024 fragments, but it is fine grained to
the region of the packet. Smaller packets may have less fragments.
This costs another 16 bytes of memory per reassembly and devides the
worst case for searching by 8.
pf: Split the fragment reassembly queue into smaller parts
Remember 16 entry points based on the fragment offset. Instead of
a worst case of 8196 list traversals we now check a maximum of 512
list entries or 16 array elements.
pf: Count holes rather than fragments for reassembly
Avoid traversing the list of fragment entris to check whether the
pf(4) reassembly is complete. Instead count the holes that are
created when inserting a fragment. If there are no holes left, the
fragments are continuous.
brooks [Fri, 2 Nov 2018 14:42:36 +0000 (14:42 +0000)]
Make vop_symlink take a const target path.
This will enable callers to take const paths as part of syscall
decleration improvements.
Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.
Bump __FreeBSD_version for external API consumers.
tsoome [Fri, 2 Nov 2018 11:41:58 +0000 (11:41 +0000)]
loader: biosdisk should check if the media is present
The bd_print/bd_open/bd_strategy need to make sure the device does have
media, before getting into performing IO operations. Some systems can
hung if the device without a media is accessed.
cem [Thu, 1 Nov 2018 23:46:23 +0000 (23:46 +0000)]
kern_poll: Restore explanatory comment removed in r177374
The comment isn't stale. The check is bogus in the sense that poll(2)
does not require pollfd entries to be unique in fd space, so there is no
reason there cannot be more pollfd entries than open or even allowed
fds. The check is mostly a seatbelt against accidental misuse or
abuse. FD_SETSIZE, while usually unrelated to poll, is used as an
arbitrary floor for systems with very low kern.maxfilesperproc.
Additionally, document this possible EINVAL condition in the poll.2
manual.
emaste [Thu, 1 Nov 2018 23:11:47 +0000 (23:11 +0000)]
Retire CLANG_NO_IAS34
CLANG_NO_IAS34 was introduced in r276696 to allow then-HEAD kernels to
be built with clang 3.4 in FreeBSD 10. As FreeBSD 11 and later includes
a version of Clang with a sufficiently capable integrated assembler we
do not need the workaround any longer.
jhb [Thu, 1 Nov 2018 22:23:15 +0000 (22:23 +0000)]
Restrict setting PTE execute permissions on RISC-V.
Previously, RISC-V was enabling execute permissions in PTEs for any
readable page. Now, execute permissions are only enabled if they were
explicitly specified (e.g. via PROT_EXEC to mmap). The one exception
is that the initial kernel mapping in locore still maps all of the
kernel RWX.
While here, change the fault type passed to vm_fault and
pmap_fault_fixup to only include a single VM_PROT_* value representing
the faulting access to match other architectures rather than passing a
bitmask.
jhb [Thu, 1 Nov 2018 22:17:51 +0000 (22:17 +0000)]
Set PTE_A and PTE_D for user mappings in pmap_enter().
This assumes that an access according to the prot in 'flags' triggered
a fault and is going to be retried after the fault returns, so the two
flags are set preemptively to avoid refaulting on the retry.
While here, only bother setting PTE_D for kernel mappings in pmap_enter
for writable mappings.
jhb [Thu, 1 Nov 2018 22:15:25 +0000 (22:15 +0000)]
SBI calls expect a pointer to a u_long rather than a pointer.
This is just cosmetic.
A weirder issue is that the SBI doc claims the hart mask pointer should
be a physical address, not a virtual address. However, the implementation
in bbl seems to just dereference the address directly.
jhb [Thu, 1 Nov 2018 21:46:37 +0000 (21:46 +0000)]
Add support for port unit wiring to cxgbe(4).
- Add a bus_child_location_str method to the nexus drivers that prints
out 'port=N' as the location string exported via devinfo and the
'%location' sysctl node.
- We can't use a bus_hint_device_unit to wire the unit numbers of
devices with a fixed devclass as the device gets assigned a unit in
make_device() before the device creator can set softc, etc.
Instead, when adding a child device, use a helper function much like
a bus_hint_device_unit method to look for wiring hints or to return
-1 to let the system choose a unit number. This function requires
an "at" hint for the port pointing to the nexus device and a "port"
hint listing the port number. For example:
hint.cxl.4.at="t5nex0"
hint.cxl.4.port="0"
wires cxl4 to the first port on the t5nex0 adapter.
jhb [Thu, 1 Nov 2018 21:34:17 +0000 (21:34 +0000)]
Don't enter DDB for fatal traps before panic by default.
Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic'
and make the calls to kdb_trap() in MD fatal trap handlers prior to
calling panic() conditional on this new knob instead of
'debugger_on_panic'. Disable the new knob by default. Developers who
wish to recover from a fatal fault by adjusting saved register state
and retrying the faulting instruction can still do so by enabling the
new knob. However, for the more common case this makes the user
experience for panics due to a fatal fault match the user experience
for other panics, e.g. 'c' in DDB will generate a crash dump and
reboot the system rather than being stuck in an infinite loop of fatal
fault messages and DDB prompts.
carpstats are the last virtualised variable in the file and end up at the
end of the vnet_set. The generated code uses an absolute relocation at
one byte beyond the end of the carpstats array. This means the relocation
for the vnet does not happen for carpstats initialisation and as a result
the kernel panics on module load.
This problem has only been observed with carp and only on i386.
We considered various possible solutions including using linker scripts
to add padding to all kernel modules for pcpu and vnet sections.
While the symbols (by chance) stay in the order of appearance in the file
adding an unused non-file-local variable at the end of the file will extend
the size of set_vnet and hence make the absolute relocation for carpstats
work (think of this as a single-module set_vnet padding).
This is a (tmporary) hack. It is the least intrusive one as we need a
timely solution for the upcoming release. We will revisit the problem in
HEAD. For a lot more information and the possible alternate solutions
please see the PR and the references therein.
rmacklem [Thu, 1 Nov 2018 15:27:22 +0000 (15:27 +0000)]
Fix NFS client vnode locking to avoid a crash during forced dismount.
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.
kevans [Thu, 1 Nov 2018 14:00:56 +0000 (14:00 +0000)]
libbe(3): Don't promote non-cloned BEs
Most easily reproducible by attempting to activate the currently activated
BE, one would get a "not a cloned filesystem" error instead of success or a
sane message.