tsoome [Fri, 2 Nov 2018 11:41:58 +0000 (11:41 +0000)]
loader: biosdisk should check if the media is present
The bd_print/bd_open/bd_strategy need to make sure the device does have
media, before getting into performing IO operations. Some systems can
hung if the device without a media is accessed.
cem [Thu, 1 Nov 2018 23:46:23 +0000 (23:46 +0000)]
kern_poll: Restore explanatory comment removed in r177374
The comment isn't stale. The check is bogus in the sense that poll(2)
does not require pollfd entries to be unique in fd space, so there is no
reason there cannot be more pollfd entries than open or even allowed
fds. The check is mostly a seatbelt against accidental misuse or
abuse. FD_SETSIZE, while usually unrelated to poll, is used as an
arbitrary floor for systems with very low kern.maxfilesperproc.
Additionally, document this possible EINVAL condition in the poll.2
manual.
emaste [Thu, 1 Nov 2018 23:11:47 +0000 (23:11 +0000)]
Retire CLANG_NO_IAS34
CLANG_NO_IAS34 was introduced in r276696 to allow then-HEAD kernels to
be built with clang 3.4 in FreeBSD 10. As FreeBSD 11 and later includes
a version of Clang with a sufficiently capable integrated assembler we
do not need the workaround any longer.
jhb [Thu, 1 Nov 2018 22:23:15 +0000 (22:23 +0000)]
Restrict setting PTE execute permissions on RISC-V.
Previously, RISC-V was enabling execute permissions in PTEs for any
readable page. Now, execute permissions are only enabled if they were
explicitly specified (e.g. via PROT_EXEC to mmap). The one exception
is that the initial kernel mapping in locore still maps all of the
kernel RWX.
While here, change the fault type passed to vm_fault and
pmap_fault_fixup to only include a single VM_PROT_* value representing
the faulting access to match other architectures rather than passing a
bitmask.
jhb [Thu, 1 Nov 2018 22:17:51 +0000 (22:17 +0000)]
Set PTE_A and PTE_D for user mappings in pmap_enter().
This assumes that an access according to the prot in 'flags' triggered
a fault and is going to be retried after the fault returns, so the two
flags are set preemptively to avoid refaulting on the retry.
While here, only bother setting PTE_D for kernel mappings in pmap_enter
for writable mappings.
jhb [Thu, 1 Nov 2018 22:15:25 +0000 (22:15 +0000)]
SBI calls expect a pointer to a u_long rather than a pointer.
This is just cosmetic.
A weirder issue is that the SBI doc claims the hart mask pointer should
be a physical address, not a virtual address. However, the implementation
in bbl seems to just dereference the address directly.
jhb [Thu, 1 Nov 2018 21:46:37 +0000 (21:46 +0000)]
Add support for port unit wiring to cxgbe(4).
- Add a bus_child_location_str method to the nexus drivers that prints
out 'port=N' as the location string exported via devinfo and the
'%location' sysctl node.
- We can't use a bus_hint_device_unit to wire the unit numbers of
devices with a fixed devclass as the device gets assigned a unit in
make_device() before the device creator can set softc, etc.
Instead, when adding a child device, use a helper function much like
a bus_hint_device_unit method to look for wiring hints or to return
-1 to let the system choose a unit number. This function requires
an "at" hint for the port pointing to the nexus device and a "port"
hint listing the port number. For example:
hint.cxl.4.at="t5nex0"
hint.cxl.4.port="0"
wires cxl4 to the first port on the t5nex0 adapter.
jhb [Thu, 1 Nov 2018 21:34:17 +0000 (21:34 +0000)]
Don't enter DDB for fatal traps before panic by default.
Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic'
and make the calls to kdb_trap() in MD fatal trap handlers prior to
calling panic() conditional on this new knob instead of
'debugger_on_panic'. Disable the new knob by default. Developers who
wish to recover from a fatal fault by adjusting saved register state
and retrying the faulting instruction can still do so by enabling the
new knob. However, for the more common case this makes the user
experience for panics due to a fatal fault match the user experience
for other panics, e.g. 'c' in DDB will generate a crash dump and
reboot the system rather than being stuck in an infinite loop of fatal
fault messages and DDB prompts.
carpstats are the last virtualised variable in the file and end up at the
end of the vnet_set. The generated code uses an absolute relocation at
one byte beyond the end of the carpstats array. This means the relocation
for the vnet does not happen for carpstats initialisation and as a result
the kernel panics on module load.
This problem has only been observed with carp and only on i386.
We considered various possible solutions including using linker scripts
to add padding to all kernel modules for pcpu and vnet sections.
While the symbols (by chance) stay in the order of appearance in the file
adding an unused non-file-local variable at the end of the file will extend
the size of set_vnet and hence make the absolute relocation for carpstats
work (think of this as a single-module set_vnet padding).
This is a (tmporary) hack. It is the least intrusive one as we need a
timely solution for the upcoming release. We will revisit the problem in
HEAD. For a lot more information and the possible alternate solutions
please see the PR and the references therein.
rmacklem [Thu, 1 Nov 2018 15:27:22 +0000 (15:27 +0000)]
Fix NFS client vnode locking to avoid a crash during forced dismount.
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.
kevans [Thu, 1 Nov 2018 14:00:56 +0000 (14:00 +0000)]
libbe(3): Don't promote non-cloned BEs
Most easily reproducible by attempting to activate the currently activated
BE, one would get a "not a cloned filesystem" error instead of success or a
sane message.
0mp [Thu, 1 Nov 2018 11:37:19 +0000 (11:37 +0000)]
mount_smbfs(8): Add the STANDARDS and HISTORY sections
- Document that mount_smbfs(8) only supports SMB1 and that SMB2 and SMB3
are not supported at the moment. Suggest users to browse ports for
software compatible with newer versions of the protocol.
- Copy supported servers list from README.
- Add a SEE ALSO section and reference the chapter about Samba in the
FreeBSD Handbook.
- Add a HISTORY section.
- Style changes:
- Use Dq instead of Em in the EXAMPLES section.
- Mark command modifiers with Cm.
Reviewed by: bcr
Approved by: krion (mentor, implicit), mat (mentor, implicit)
MFC after: 1 week
Sponsored by: Bally Wulff Games & Entertainment GmbH
Differential Revision: https://reviews.freebsd.org/D17798
mckusick [Thu, 1 Nov 2018 03:38:57 +0000 (03:38 +0000)]
In preparation for adding inode check-hashes, convert the clri(8)
program to use the libufs library interface. No functional change
(as for now the libufs library does not do inode check-hashes).
kib [Wed, 31 Oct 2018 23:17:00 +0000 (23:17 +0000)]
Add pci_early function to detect Intel stolen memory.
On some Intel devices BIOS does not properly reserve memory (called
"stolen memory") for the GPU. If the stolen memory is claimed by the
OS, functions that depend on stolen memory (like frame buffer
compression) can't be used.
A function called pci_early_quirks that is called before the virtual
memory system is started was added. In Linux, this PCI early quirks
function iterates through all PCI slots to check for any device that
require quirks. While this more generic solution is preferable I only
ported the Intel graphics specific parts because I think my
implementation would be too similar to Linux GPL'd solution after
looking at the Linux code too much.
The code regarding Intel graphics stolen memory was ported from
Linux. In the case of Intel graphics stolen memory this
pci_early_quirks will read the stolen memory base and size from north
bridge registers. The values are stored in global variables that is
later read by linuxkpi_gplv2. Linuxkpi stores these values in a
Linux-specific structure that is read by the drm driver.
Relevant linuxkpi code is here:
https://github.com/FreeBSDDesktop/kms-drm/blob/drm-v4.16/linuxkpi/gplv2/src/linux_compat.c#L37
For now, only amd64 arch is suppor ted since that is the only arch
supported by the new drm drivers. I was told that Intel GPUs are
always located on 0:2:0 so these values are hard coded for now.
Note that the structure and early execution of the detection code is
not required in its current form, but we expect that the code will be
added shortly which fixes the potential BIOS bugs by reserving the
stolen range in phys_avail[]. This must be done as early as possible
to avoid conflicts with the potential usage of the memory in kernel.
kevans [Wed, 31 Oct 2018 22:38:19 +0000 (22:38 +0000)]
Compile in VERBOSE_SYSINIT support by default, remain silent by default
The loader tunable 'debug.verbose_sysinit' may be used to toggle verbosity.
This is added to the debugging section of these kernconfs to be turned off
in stable branches for clarity of intent.
dteske [Wed, 31 Oct 2018 20:37:12 +0000 (20:37 +0000)]
Add new rc keywords: enable, disable, delete
This adds new keywords to rc/service to enable/disable a service's
rc.conf(5) variable and "delete" to remove the variable.
When the "service_delete_empty" variable in rc.conf(5) is set to "YES"
(default is "NO") an rc.conf.d file (in /etc/ or /usr/local/etc) is
deleted if empty after modification using "service $foo delete".
0mp [Wed, 31 Oct 2018 17:47:08 +0000 (17:47 +0000)]
ps(1): Pet mandoc and igor
- Use Xr to reference other manual pages.
- Reference execve(2) instead of exec(2) as exec(2) does not exist.
- Remove the deprecated "Tn" macro.
- Improve the formatting of the etime description.
Reviewed by: bcr
Approved by: krion (mentor, implicit), mat (mentor, implicit)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17780
andrew [Wed, 31 Oct 2018 17:41:53 +0000 (17:41 +0000)]
Always set the MP_QUIRK_CPULIST quirk under ACPI. This needs a run time
check to only set it for emulators as the CPU list may be changed when
the emulator starts. Until this is working just always set it.
tsoome [Wed, 31 Oct 2018 16:42:40 +0000 (16:42 +0000)]
loader: issue edd probe before legacy ah=08 and detect no media
while probing for drives, use int13 extended info before standard one and
provide workaround for case we are not getting needed information in case
of floppy drive.
In case of INT13 errors, there are (at least) 3 error codes appearing in case
of missin media - 20h, 31h and 80h. Flag the no media and do not print an
error.
brooks [Wed, 31 Oct 2018 16:17:45 +0000 (16:17 +0000)]
Reformat syscalls.master for better readability.
This takes advantage of two recents changes to makesyscalls.sh:
r328598: Permit a range of syscall numbers for UNIMPL
r339624: Remove the need for backslashes in syscalls.master
Syscall declerations are now split across multiple lines with the
syscall name and variables each on seperate lines (with an exception for
syscalls taking no arguments.)
emaste [Wed, 31 Oct 2018 14:19:58 +0000 (14:19 +0000)]
Add __used to __CTOR_LIST__ and __DTOR_LIST__
Enabling BSD_CRTBEGIN on amd64 resulted in
error: unused variable '__CTOR_LIST__'.
__CTOR_LIST__ is indeed unused in crtbegin.c; it marks the beginning of
the .ctors array and is used in crtend.c. Annotate __DTOR_LIST__ as
well for consistency.
Discussed with: andrew
MFC with: r339738
Sponsored by: The FreeBSD Foundation
andrew [Wed, 31 Oct 2018 12:00:35 +0000 (12:00 +0000)]
Use pmap_invalidate_all rather than invalidating 512 level 2 entries in
the early pmap_mapbios/unmapbios code. It is even worse when there are
multiple L2 entries to handle as we would need to iterate over all pages.
arichardson [Wed, 31 Oct 2018 10:45:28 +0000 (10:45 +0000)]
Don't run cc --version during cleandir/obj stages
This will no work when there is no cc in $PATH (which is the case before the
cross-tools stage once we no longer inherit $PATH in $WMAKE).
The variables set by bsd.compiler.mk/bsd.linker.mk are not needed in these
stages so this avoids a little bit of makefile parsing.
mckusick [Wed, 31 Oct 2018 05:17:53 +0000 (05:17 +0000)]
In preparation for adding inode check-hashes, change the fsck_ffs
inodirty() function to have a pointer to the inode being dirtied.
No functional change (as for now the parameter is ununsed).
bz [Tue, 30 Oct 2018 20:51:03 +0000 (20:51 +0000)]
As a follow-up to r339930 and various reports implement logging in case
we fail during module load because the pcpu or vnet module sections are
full. We did return a proper error but not leaving any indication to
the user as to what the actual problem was.
Even worse, on 12/13 currently we are seeing an unrelated error (ENOSYS
instead of ENOSPC, which gets skipped over in kern_linker.c) to be
printed which made problem diagnostics even harder.
bz [Tue, 30 Oct 2018 20:45:15 +0000 (20:45 +0000)]
With more excessive use of modules, more kernel parts working with
VIMAGE, and feature richness and global state increasing the 8k of
vnet module space are no longer sufficient for people and loading
multiple modules, e.g., pf(4) and ipl(4) or ipsec(4) will fail on
the second module.
Increase the module space to 8 * PAGE_SIZE which should be enough
to hold multiple firewalls, ipsec, multicast (as in the old days was
a problem), epair, carp, and any kind of other vnet enabled modules.
Sadly this is a global byte array part of the vnet_set, so we cannot
dynamically change its size; otherwise a TUNABLE would have been
a better solution.
PR: 228854
Reported by: Ernie Luzar, Marek Zarychta
Discussed with: rgrimes on current
MFC after: 3 days
bz [Tue, 30 Oct 2018 20:08:48 +0000 (20:08 +0000)]
Initial implementation of draft-ietf-6man-ipv6only-flag.
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.
If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.
The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.
Further changes to tcpdump (contrib code) are availble and will
be upstreamed.
Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).
We may also want to (a) implement and RX filter, and (b) over
time enahnce user space to, say, stop dhclient from running
when the interface flag is set. Also we might want to start
IPv6 before IPv4 in the future.
All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.
markj [Tue, 30 Oct 2018 18:26:34 +0000 (18:26 +0000)]
Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain. The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted. Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.
This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.
Discussed with: gallatin, jeff
Reported and tested by: pho (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17418
markj [Tue, 30 Oct 2018 17:57:40 +0000 (17:57 +0000)]
Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
first allocation attempt, not after. Fix this by switching
uma_prealloc() to use a vm_domainset iterator, which addresses the
secondary issue of using a signed domain identifier in round-robin
iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
excluding empty domains; otherwise we may frequently specify an empty
domain when calling in to the page allocator, wasting CPU time.
Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
total page counts for now: some vm_phys segments are created before
the SRAT is parsed and are thus always identified as being in domain 0
even when they are not. Then, when bootstrap pages are freed, they
are added to a domain that we had previously thought was empty. Until
this is corrected, we simply exclude them from the per-domain page
count.
Reported and tested by: Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17704
bz [Tue, 30 Oct 2018 15:46:30 +0000 (15:46 +0000)]
Introduce an EXPERIMENTAL option for both src.conf(5) and the kernel.
In the last decade(s) we have seen both short term or long term projects
committed to the tree which were considered or even marked "experimental".
While out-of-tree development has become easier than it used to be in
CVS times, there still is a need to have the code shipping with HEAD but
not enabled by default.
While people may think about VIMAGE as one of the recent larger, long term
projects, early protocol implementations (before they are standardised)
are others. (Free)BSD historically was one of the operating systems
which would have running code at early stages and help develop and
influence standardisation and the industry.
Give developers an opportunity to be more pro-active for early adoption
or running large scale code changes stumbling over each others but not
the user's feet. I have not added the option to NOTES in order to avoid
breaking supported option builds, which require constant compile testing.
trasz [Tue, 30 Oct 2018 15:43:06 +0000 (15:43 +0000)]
Remove useless call to access(2) from tzcode. Quoting OpenBSD:
> Remove doaccess variable and access(2) call since this interfers with
> applications like zdump(8) because pledge(2) doesn't allow access(2) to
> /usr/share/zoneinfo.
>
> millert@ better described why this call can go away:
>
> "This looks like an attempt to do access checks based on the real uid instead
> of the effective uid. Basically for setuid programs we don't want to allow a
> user to set TZ to a path they should not be able to otherwise access.
>
> However, we already have a check for issetugid() above so I think the doaccess
> bits can just be removed and we can rely on open()."
>
> After discussion with tb@, deraadt@ and millert@, this was also OK'ed by them
vangyzen [Tue, 30 Oct 2018 14:54:15 +0000 (14:54 +0000)]
Always stop the scheduler when entering kdb
Set curthread->td_stopsched when entering kdb via any vector.
Previously, it was only set when entering via panic, so when
entering kdb another way, mutexes and such were still "live",
and an attempt to lock an already locked mutex would panic.
araujo [Tue, 30 Oct 2018 09:53:57 +0000 (09:53 +0000)]
Allow changing lagg(4) MTU.
Previously, changing the MTU would require destroying the lagg and
creating a new one. Now it is allowed to change the MTU of
the lagg interface and the MTU of the ports will be set to match.
If any port cannot set the new MTU, all ports are reverted to the original
MTU of the lagg. Additionally, when adding ports, the MTU of a port will be
automatically set to the MTU of the lagg. As always, the MTU of the lagg is
initially determined by the MTU of the first port added. If adding an
interface as a port for some reason fails, that interface is reverted to its
original MTU.
Submitted by: Ryan Moeller <ryan@freqlabs.com>
Reviewed by: mav
Relnotes: Yes
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D17576
andrew [Tue, 30 Oct 2018 09:43:26 +0000 (09:43 +0000)]
Run the csu tests on a DSO. This builds the tests into a shared library,
then runs these from the base test programs. With this we can check
crtbeginS.o and crtendS.o are working as expected.