dim [Thu, 5 Mar 2020 18:09:19 +0000 (18:09 +0000)]
Revert r357259, after the merge from head which added linker scripts for
stand/i386 boot:
Revert upstream lld r371957 (git commit 06bb7dfbd) by Fangrui Song:
[ELF] Map the ELF header at imageBase
If there is no readonly section, we map:
* The ELF header at imageBase+maxPageSize
* Program headers at imageBase+maxPageSize+sizeof(Ehdr)
* The first section .text at imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers)
Due to the interaction between Writer<ELFT>::fixSectionAlignments and
LinkerScript::allocateHeaders,
`alignDown(p_vaddr(R PT_LOAD)) = alignDown(p_vaddr(RX PT_LOAD))`.
The RX PT_LOAD will override the R PT_LOAD at runtime, which is not ideal:
```
// PHDR at 0x401034, should be 0x400034
PHDR 0x000034 0x00401034 0x00401034 0x000a0 0x000a0 R 0x4
// R PT_LOAD contains just Ehdr and program headers.
// At 0x401000, should be 0x400000
LOAD 0x000000 0x00401000 0x00401000 0x000d4 0x000d4 R 0x1000
LOAD 0x0000d4 0x004010d4 0x004010d4 0x00001 0x00001 R E 0x1000
```
* createPhdrs allocates the headers to the R PT_LOAD.
* fixSectionAlignments assigns `imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers)` (formula: `alignTo(dot, maxPageSize) + dot % config->maxPageSize`) to addrExpr of .text
* allocateHeaders computes the minimum address among SHF_ALLOC sections, i.e. addr(.text)
* allocateHeaders sets address of ELF header to `addr(.text)-sizeof(Ehdr)-sizeof(program headers) = imageBase+maxPageSize`
The main observation is that when the SECTIONS command is not used, we
don't have to call allocateHeaders. This requires an assumption that
the presence of PT_PHDR and addresses of headers can be decided
regardless of address information.
This may seem natural because dot is not manipulated by a linker script.
The other thing is that we have to drop the special rule for -T<section>
in `getInitialDot`. If -Ttext is smaller than the image base, the headers
will not be allocated with the old behavior (allocateHeaders is called)
but always allocated with the new behavior.
The behavior change is not a problem. Whether and where headers are
allocated can vary among linkers, or ld.bfd across different versions
(--enable-separate-code or not). It is thus advised to use a linker
script with the PHDRS command to have a consistent behavior across
linkers. If PT_PHDR is needed, an explicit --image-base can be a simpler
alternative.
kib [Thu, 5 Mar 2020 15:52:34 +0000 (15:52 +0000)]
buffer pager: deref ucred immediately after read.
Ucred is passed to bread(9) so that non-local filesystems use proper
credentials. But, since clean buffer might be cached unless
buf_pager_relbuf is not enabled, it makes credentials to have extra
reference until buffer is reclaimed. Ucred reference would prevent
jail from destroying if creds are jailed.
Dereferencing the read credentials on the valid buffer avoid that, and
should be fine because the buffer is valid and does not need re-read.
PR: 238032
Reported by: bz
Reproduced and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23775
tijl [Thu, 5 Mar 2020 14:41:27 +0000 (14:41 +0000)]
Move compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by linux_common kernel module and not both linux and linux64
modules.
alfredo [Thu, 5 Mar 2020 14:13:22 +0000 (14:13 +0000)]
[PowerPC64] restrict memcpy/bcopy optimization to POWER ISA >=V2.07
VSX instructions were added in POWER ISA V2.06 (POWER7), but it
requires data to be word-aligned. Such requirement was removed in
ISA V2.07B (POWER8).
Since current memcpy/bcopy optimization relies on VSX instructions
handling misalignment transparently, and kernel doesn't currently
implement an alignment error handler, this optimzation should be
restrict to ISA V2.07 onwards.
SIGBUS on stxvd2x instruction was reproduced in POWER7+ CPU.
imp [Thu, 5 Mar 2020 06:20:17 +0000 (06:20 +0000)]
xpt_async is submitting a CCB, not finishing it up, so use xpt_action() instead
of xpt_done(). Add the missing XPT_ASYNC case to xpt_action_default. xpt_async
wants to use the side-effect of the xpt_done() routine to queue this to the
camisr thread so it can be done in that context. However, this breaks the
symmetry that you create a ccb and call xpt_action() for it to be
dispatched. Restore that symmetry by having it go through that path. As far as I
can tell, this is the only CCB that we create and call xpt_done() on directly.
glebius [Wed, 4 Mar 2020 22:27:16 +0000 (22:27 +0000)]
When a machine boots the NFS mounting script is executed after
interfaces are configured, but for many interfaces (e.g. all Intel)
ifconfig causes link renegotiation, so the first attempt to mount
NFS always fails. After that mount_nfs sleeps for 30 seconds, while
only a couple seconds are actually required for interface to get up.
Instead of sleeping, do select(2) on routing socket and check if
some interface became UP and in this case retry immediately.
brooks [Wed, 4 Mar 2020 21:27:12 +0000 (21:27 +0000)]
Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal. We take advantage of the mandatory
zeroing of fields not listed in the initializer.
Remove kern_mmap_fpcheck() and use kern_mmap_req().
The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations. In CheriBSD we have already added two more arguments.
I reported this in https://bugs.llvm.org/show_bug.cgi?id=44715, and
initially reverted the upstream change in r357259 to work around it.
However, after some discussion with Fangrui Song in the upstream ticket,
I think we can classify this as an unfortunate interaction between using
-Ttext=0 in combination with --no-rosegment. (We added the latter
in r332090, because btxld does not correctly handle input with more
than 2 PT_LOAD segments.)
Fangrui suggested to use a linker script instead, and Warner was already
attempting this in r305353, but had to revert it due to "crypto-using
boot problems" (not sure what those were :).
This review updates the stand/i386/boot.ldscript to handle more
sections, inserts some symbols like _edata and such that we use in
libsa, and also discards any .interp section.
It uses ORG which is defined on the linker command line using
--defsym ORG=value to set the start of all the sections.
emaste [Wed, 4 Mar 2020 20:29:49 +0000 (20:29 +0000)]
readelf: check note namesz and descsz
Previously corrupt note namesz or descsz (perhaps caused by readelf's
current lack of endian support for notes) resulted in a crash. Check
that namesz and descsz do not extend beyond the end of the buffer before
trying to access name and desc data.
Reported by: jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
manu [Wed, 4 Mar 2020 20:01:03 +0000 (20:01 +0000)]
dwmmc: Rework the DMA engine
Each segment can be up to 4096 bytes in chain structure according to the
RK3399 TRM Part 2.
Set the buffers in full ring where the last one point to the first one.
Correctly reports the MMC_IVAR_MAX_DATA.
Use CACHE_LINE_SIZE for bus_dma alignment.
hselasky [Wed, 4 Mar 2020 17:23:20 +0000 (17:23 +0000)]
Implement a detaching flag for the sound(4) subsystem to take
appropriate actions when we are trying to detach an audio device,
but cannot because someone is using it.
This avoids applications having to wait for the DSP read data
timeout before they receive any error indication.
Tested with virtual_oss(8).
Implement optional table entry limits for if_llatbl.
Implement counting of table entries linked on a per-table base
with an optional (if set > 0) limit of the maximum number of table
entries.
For that the public lltable_link_entry() and lltable_unlink_entry()
functions as well as the internal function pointers change from void
to having an int return type.
Given no consumer currently sets the new llt_maxentries this can be
committed on its own. The moment we make use of the table limits,
the callers of the link function must check the return value as
it can change and entries might not be added.
Adjustments for IPv6 (and possibly IPv4) will follow.
tuexen [Wed, 4 Mar 2020 16:41:25 +0000 (16:41 +0000)]
When using automatically generated flow labels and using TCP SYN
cookies, use the same flow label for the segments sent during the
handshake and after the handshake.
This fixes a bug by making sure that sc_flowlabel is always stored in
network byte order.
Reviewed by: bz@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23957
Add four new counters for ND6 related Anti-DoS measures.
We split these out into a separate upfront commit so that we only
change the struct size one time. Implementations using them will
follow.
tuexen [Wed, 4 Mar 2020 12:22:53 +0000 (12:22 +0000)]
Don't send an uninitilised traffic class in the IPv6 header, when
sending a TCP segment from the TCP SYN cache (like a SYN-ACK).
This fix initialises it to zero. This is correct for the ECN bits,
but is does not honor the DSCP what an application might have set via
the IPPROTO_IPV6 level socket options IPV6_TCLASS. That will be
fixed separately.
Reviewed by: Richard Scheffenegger
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23900
luporl [Wed, 4 Mar 2020 12:21:38 +0000 (12:21 +0000)]
[aacraid] Add missing unmap call for SYNC mode
This issue was observed on a PowerPC64 machine with an Adaptec RAID Controller
with PCI device ID 0x028d. After several read/write operations, the kernel was
panic'ing in bus_dmamap_sync(). This was due to a missing aac_unmap_command()
in the SYNC path.
chs [Wed, 4 Mar 2020 00:22:50 +0000 (00:22 +0000)]
if vm_pager_get_pages_async() returns an error, release the sfio->nios
refcount that we took earlier that represents the I/O that ended up
not being started.
imp [Tue, 3 Mar 2020 17:40:29 +0000 (17:40 +0000)]
Get rid of silly /* FALLTHROUGH */ lines
Consistently omit /* FALLTHROUGH */ when we have a case statement that does
nothing. Since compilers don't warn about stacked case statements, and we were
inconsistent, resolve by removing extras.
mav [Tue, 3 Mar 2020 15:05:13 +0000 (15:05 +0000)]
Increase number of write completion threads, matching ZoL.
Our iSCSI benchmarks on a large 80-core system show that previous limit
of 8 threads can be a bottleneck. At some points this change increases
write IOPS by as much as 50%. I am still not sure that so many threads
is really required, but we tested lower amounts and got no significant
benefits, while latencies were a bit worse, so decided to not diverge.
Add proper #includes, and #ifdefs and some style fixes to make RSS
kernels compile again. There are still possible issues with uin16_t
vs. uint_t cpuid which I am not going near.
The results of ktls_get_cpu() are stored in u_int and NETISR_CPUID_NONE
requires u_int. Adjust uint16_t to uint_t in order to make RSS kernels
compile some more again.
HPTS still has to be fixed, which is a bit more complicated.
ip6: retire in6_selectroute_fib() as promised 8 years ago
In r231852 I added in6_selectroute_fib() as a compat function with the
fibnum as an extra argument compared to in6_selectroute() to keep the
KPI stable.
Way too late retire this function again and add the fib to in6_selectroute()
which also only has a single consumer now and was an orphan function before.
0mp [Tue, 3 Mar 2020 13:25:08 +0000 (13:25 +0000)]
powerd.8: Improve style & fix typos
- Sort options.
- Do not use macros (like .Ar) to specify width for Bl (macros within that
string are not expanded).
- Use Cm instead of Ar for mode names.
- Fix some typos reported by mandoc.
- Move the documentation of the PID file from the -P flag description to
the FILES section.
ip6_output: use new routing KPI when not passed a cached route
Implement the equivalent of r347375 (IPv4) for the IPv6 output path.
In IPv6 we get passed a cached route (and inp) by udp6_output()
depending on whether we acquired a write lock on the INP.
In case we neither bind nor connect a first UDP packet would come in
with a cached route (wlocked) and all further packets would not.
In case we bind and do not connect we never write-lock the inp.
When we do not pass in a cached route, rather than providing the
storage for a route locally and pass it over the old lookup code
and down the stack, use the new route lookup KPI and acquire all
details we need to send the packet.
Compared to the IPv4 code the IPv6 code has a couple of possible
complications: given an option with a routing hdr/caching route there,
and path mtu (ro_pmtu) case which now equally has to deal with the
possibility of having a route which is NULL passed in, and the
fwd_tag in case a firewall changes the next hop (something to
factor out in the future).
fib6_rte_to_nh_*: return a link-local gw address with scope embedded
In fib6_rte_to_nh_* when returning a link-local gateway address
currently we do clear the scope. That could be recovered using
the ifp returned as well, but the code in general seems to
expect a link-local address with scope embeedded as otherwise
the "dst" (gw) passed to the output routines will not include
scope and not send the packet out (the right interface).
Do not clear the scope when returning a link-local address and
allow packets to go out (the right interface).
Remove the (now) extra scope recovery in the IPv6 fast-fwd code.
andrew [Tue, 3 Mar 2020 08:28:16 +0000 (08:28 +0000)]
Store the boot exception level on arm64 so it can be queried later
A hypervisor, e.g. bhyve, will need to know what exception levelthe kernel
was in when it started booting. If it was EL2 we can then enable said
hypervisor.
Store the boot exception level and allow the kernel to later query it.
Obtained from: https://github.com/FreeBSD-UPB/freebsd (earlier version)
Sponsored by: Innovate UK
jhibbits [Tue, 3 Mar 2020 03:22:00 +0000 (03:22 +0000)]
powerpc/powernv: powernv_node_numa_domain() fix non-NUMA case
If NUMA is not enabled in the kernel config, or is disabled at boot, this
function should just return domain 0 regardless of what's in the device
tree.
csjp [Tue, 3 Mar 2020 01:46:35 +0000 (01:46 +0000)]
In r358471, we interrupted the case block that would eventually lead
to the path related tokens not being processed. Restore this behavior and
and move AUE_JAIL_SET in this block, as it may conditionally contain a
path token.
cem [Tue, 3 Mar 2020 00:20:08 +0000 (00:20 +0000)]
Add extremely useful calendar(1) application to FreeBSD
It does extremely useful things like execute sendmail and spew dubiously
accurate factoids.
From the feedback, it seems like it is an essential utility in a modern unix
and not at all a useless bikeshed. How do those Linux people live without it?
Reverts r358561.
manu [Mon, 2 Mar 2020 21:19:51 +0000 (21:19 +0000)]
cpufreq_dt: Improve multiple opp support
When looking for cpu with the same OPP starts from the root /cpus node
so each instance of cpufreq_dt will now each cpu with the same operating
point.
Also test that the node we are testing have the property "device_type" set
to be equal to "cpu".
While here add more debug printfs (off by defaults).
emaste [Mon, 2 Mar 2020 20:14:27 +0000 (20:14 +0000)]
Add deprecation notices to ctau and cx drivers
These support outdated or obsolete ISA WAN (T1/E1) sync serial cards,
and these drivers haven't really been touched (other than in tree-wide
sweeps to keep them building) for 15+ years.
Related PCI devices ce and cp are still in the tree, with deprecation
proposed in D23928.
MFC after: 1 week
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
kevans [Mon, 2 Mar 2020 18:40:34 +0000 (18:40 +0000)]
hexdump: tests: take into account byte order
Hexdump test was failling on big endian systems when testing decimal, octal
and hexa outputs as the tests were designed on a little endian system. This
revision adds the two distinct flavors of output expected and determines at
runtime which to compare against.
luporl [Mon, 2 Mar 2020 16:11:25 +0000 (16:11 +0000)]
[aacraid] Prevent sense data from causing a buffer overflow
This issue was observed on a PowerPC64 machine with an Adaptec RAID
Controller with PCI device ID 0x028d, where sense data was causing a
buffer overflow because of wrong max sense length logic.
kevans [Mon, 2 Mar 2020 15:58:50 +0000 (15:58 +0000)]
pkgbase: remove logic for _profile packages
We don't produce these anymore as of r356797, remove the remnant in
generate-ucl.sh that accounted for them. This isn't strictly necessary, but
future work is needed for the various packages that can be generated on a
lib build.
Namely, we may produce -development packages for private/internal libs that
should be installed but won't have the base FreeBSD-libfoo pkg to depend on
because it's internal (e.g. liby, libpmcstat, libifconfig) but we want the
headers installed. It may be a better move to just shove these into
-runtime-development instead, but if not then we've just simplified the
cases that need to take private/internal libs into account.
hselasky [Mon, 2 Mar 2020 09:45:06 +0000 (09:45 +0000)]
Expose the ACPI power button, sleep button and LID state as evdev's.
This allows libinput to disable touchpads when the lid is closed and
various desktop environments can show power-off dialogs when the power
button is pressed. While the latter is doable with devd a
cross-platform solution is nicer.
kevans [Mon, 2 Mar 2020 04:22:38 +0000 (04:22 +0000)]
elfctl: initialize features
GCC points out a couple levels down in convert_to_features that this may be
used uninitialized. Indeed, this is true- initialize it to NULL so that we
at least deref a null pointer.
kevans [Mon, 2 Mar 2020 02:45:57 +0000 (02:45 +0000)]
if_edsc: generate an arbitrary MAC address
When generating an cloned interface instance in edsc_clone_create(),
generate a MAC address from the FF OUI with ether_gen_addr(). This allows us
to have unique local-link addresses. Previously, the MAC address was zero.
emaste [Mon, 2 Mar 2020 02:36:41 +0000 (02:36 +0000)]
Move ELF feature note tool to usr.bin/elfctl
elfctl is a tool for modifying the NT_FREEBSD_FEATURE_CTL ELF note,
which contains a set of flags for enabling or disabling vulnerability
mitigations and other features.
Reviewed by: csjp, kib
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23910
grog [Sun, 1 Mar 2020 22:10:37 +0000 (22:10 +0000)]
Remove comment about Blackthorn winds, apparently imported from 4.4BSD
Lite. Nowadays it's trivial to find the explanation, such as at
https://www.deseret.com/2000/2/27/19493013/blackthorn-winds-make-bushes-bud.
It doesn't seem appropriate to replace it with an explanation.
mjg [Sun, 1 Mar 2020 21:53:46 +0000 (21:53 +0000)]
fd: move vnodes out of filedesc into a dedicated structure
The new structure is copy-on-write. With the assumption that path lookups are
significantly more frequent than chdirs and chrooting this is a win.
This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where jail vnode was never referenced,
meaning subsequent access on lookup could run into use-after-free.