commit log from upstream:
Fix race in parallel mount's thread dispatching algorithm
Strategy of parallel mount is as follows.
1) Initial thread dispatching is to select sets of mount points that
don't have dependencies on other sets, hence threads can/should run
lock-less and shouldn't race with other threads for other sets. Each
thread dispatched corresponds to top level directory which may or may
not have datasets to be mounted on sub directories.
2) Subsequent recursive thread dispatching for each thread from 1)
is to mount datasets for each set of mount points. The mount points
within each set have dependencies (i.e. child directories), so child
directories are processed only after parent directory completes.
The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts threads under race condition. This race
appeared as mount/unmount issues on ZoL for ZoL having different
timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
`zfs unmount -a` which expects proper mount order can't unmount if the
mounts were reordered by the race condition.
There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.
1) #8833 case where input is `/a /a /a/b` after sorting.
The problem is that libzfs_path_contains() can't correctly handle an
input list with two same top level directories.
There is a race between two POSIX threads A and B,
* ThreadA for "/a" for test1 and "/a/b"
* ThreadB for "/a" for test0/a
and in case of #8833, ThreadA won the race. Two threads were created
because "/a" wasn't considered as `"/a" contains "/a"`.
2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
The problem is that libzfs_path_contains() can't correctly handle an
input list containing "/".
There is a race between two POSIX threads A and B,
* ThreadA for "/" and "/var/data/test"
* ThreadB for "/var/data"
and in case of #8450, ThreadA won the race. Two threads were created
because "/var/data" wasn't considered as `"/" contains "/var/data"`.
In other words, if there is (at least one) "/" in the input list,
the initial thread dispatching must be single-threaded since every
directory is a child of "/", meaning they all directly or indirectly
depend on "/".
In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to consider above two.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
PR: 237517, 237397, 239243
Submitted by: Matthew D. Fuller <fullermd@over-yonder.net> (by email)
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as: FS-22-EXT2-9: Denial of service in ftruncate-0 (ext2_balloc)
FS-11-EXT2-6: Denial Of Service in write-1 (ext2_balloc)
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
- the FreeBSD IEEE OUI
- PCI bus, device/slot, function values
- Namespace ID
The NVMe specification defines bits 13:4 of BAR0 as Reserved (i.e. 0x0).
Most drivers do not enforce this, but the Windows NVMe driver does and
will refuse to start the device (i.e. error 10) if any of these bits are
set.
bhyve's NVMe emulation was transferring Identify data back to the guest
incorrectly causing memory corruptions. These corruptions resulted in
core dumps and other system level errors in the guest.
Pedro F. Giffuni [Fri, 26 Jul 2019 21:08:01 +0000 (21:08 +0000)]
MFC r349802 (from fsu@):
Add additional check for 'blocks per group' and 'fragments per group'
superblock fields.
These fields will not be equal only in case if bigalloc filesystem feature is
turned on. This feature is not supported for now.
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as: FS-27-EXT2-12: Denial of Service in openat-0 (vm_fault_hold/ext2_clusteracct)
MFC r344120:
Unify i386 and amd64 getcontextx.c, and use ifuncs while there.
This is yet another attempt of the merge, previously done as r344436 and
reverted in r344463. It is redone since ld was changed to ifunc-capable
linker on i386.
r349380:
libbe(3): mount: the BE dataset is mounted at /
Other parts of libbe(3) were fairly strict on the mountpoint property of the
BE dataset, and be_mount was not much better. It was improved in r347027 to
allow mountpoint=none for depth==0, but this bit was still sensitive to
mountpoint != / and mountpoint != none. Given that other parts of libbe(3)
no longer restrict the mountpoint property here, and the rest of the base
system is generally OK and will assume that a BE is mounted at /, let's do
the same.
r349383:
libbe(3): restructure be_mount, skip canmount check for BE dataset
Further cleanup after r349380; loader and kernel will both ignore canmount
on the root dataset as well, so we should not be so strict about it when
mounting it. be_mount is restructured to make it more clear that depth==0 is
special, and to not try fetching these properties that we won't care about.
bectl advertises that it has the ability to create recursive and
non-recursive boot environments. This patch implements that functionality
using the be_create_depth API provided by libbe. With this patch, bectl now
works as bectl(8) describes in regards to creating recursive/non-recursive
boot environments.
r344226:
Fix memory corruption bug introduced in r325310
The bug occurred when a bounce buffer was used and the requested read
size was greater than the size of the bounce buffer. This commit also
rewrites the read logic so that it is easier to systematically verify
all alignment and size cases.
r344234:
It turns out r344226 narrowed the overrun bug but did not eliminate it entirely
This commit fixes a remaining output buffer overrun in the
single-sector case when there is a non-zero tail.
CID 1400451: case 0 is missing a break/return and falling through to the
default case. waitpid(0, ...) makes little sense in the child, we likely
wanted to terminate immediately.
CID 1400453: size argument uses sizeof(char **) instead of sizeof(char *)
and is assigned to a char **; sizeof's match but "this isn't a portable
assumption".
Brooks Davis [Thu, 25 Jul 2019 17:33:44 +0000 (17:33 +0000)]
MFC r350116:
Document that setmode(3) is not thread safe.
In some circumstances, setmode(3) may call umask(2) twice to retrieve
the current mode and then restore it. Between calls, the process will
have a umask of 0.
Simon J. Gerraty [Thu, 25 Jul 2019 00:07:10 +0000 (00:07 +0000)]
loader: ignore some variable settings if input unverified
libsecureboot can tell us if the most recent file opened was
verfied or not.
If it's state is VE_UNVERIFIED_OK, skip if variable
matches one of the restricted prefixes.
Brooks Davis [Wed, 24 Jul 2019 21:39:00 +0000 (21:39 +0000)]
MFC r350067:
Add missing mode in open(2) calls with O_CREAT.
When O_CREAT is specified, the third, variadic argument is
required as the permission. If on is not passed, then depending
on the ABI, either the contents of the third argument register
or some arbitrary stuff on the stack will be used as the permission.
Brooks Davis [Wed, 24 Jul 2019 20:17:39 +0000 (20:17 +0000)]
MFC r350049:
Fix two mismatches between function declaration and definition.
In both cases, function pointer arguments were inconsistently declared
and the result worked because of C's odd rules around function pointer
(de)references. With a stricter compiler these fail to compile.
Ed Maste [Wed, 24 Jul 2019 19:21:16 +0000 (19:21 +0000)]
MFC r343606: Enable lld as the system linker on i386
The migration to LLVM's lld linker has been in progress for quite some
time - I opened an LLVM tracking bug (23214) in April 2015 to track
issues using lld as FreeBSD's linker, and requested the first exp-run
using lld as /usr/bin/ld in November 2016.
In 12.0 LLD is the system linker on amd64, arm64, and armv7. i386 was
not switched [for 12.0] as there were additional ports failures not found
on amd64. Those have largely been addressed now, although there are a
small number of issues that are still being worked on. In some of these
cases having lld as the system linker makes it easier for developers and
third parties to investigate failures.
Thanks to antoine@ for handling the exp-runs and to everyone in the
FreeBSD and LLVM communites who have fixed issues with lld to get us to
this point.
Note for 12.1: There are still some issues to resolve in the ports tree,
but having the bootstrap linker (to build the kernel and installed
userland) be lld and the installed system linker (/usr/bin/ld) be GNU ld
causes other problems. In addition having having a different linker
configuration for i386 and amd64 in the same release causes some grief
for the ports team. So, switch to lld as the system linker on i386 in
stable/12 and plan to address remaining ports issues before 12.1.
PR: 214864 [exp-run]
Discussed with: jbeich, antoine
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Let linuxulator mprotect mask unsupported bits before calling kern_mprotect.
After r349240 kern_mprotect returns EINVAL for unsupported bits in the prot
argument. Linux rtld uses PROT_GROWSDOWN and PROT_GROWS_UP when marking the
stack executable. Mask these bits like kern_mprotect used to do. For other
unsupported bits EINVAL is returned like Linux does.
We can't use a u_int to compute the physical address in
pmap_early_vtophys(). Our int is 32-bit, but the physical address is
64-bit. This works fine if everything lives below 0x100000000, but as
soon as it doesn't this breaks.
Since the non-volatile registers are restored at the end of cpu_switchin (of
the new thread) they're free for us to use for our own purposes. Load the
PCB_FLAGS into a non-volatile register so it's preserved across the C
function calls that manage FPU and altivec state. This removes 4 loads from
each file. Might be a trivial performance improvement (~12 clock cycles per
context switch).
Add a facility for transmitting "raw" work requests on regular NIC queues.
- Use PH_loc.eight[1] as a general 'cflags' (Chelsio flags) field to
describe properties of a queued packet. The MC_RAW_WR flag
indicates an mbuf holding a raw work request. mbuf_cflags() returns
the current flags.
- Raw work request mbufs are allocated via alloc_wr_mbuf() which will
allocate a single contiguous range to hold the mbuf data. The
consumer can use mtod() to obtain the start of the work request and
write the required work request in the buffer. The mbuf can then be
enqueued directly to the txq via mp_ring_enqueue().
- Since raw work requests might potentially send arbitrary work
requests, only set the EQUIQ and EQUEQ bits on work requests that
support them such as the normal tunneled Ethernet packet work
requests.
cxgbe(4): Completely ignore all top level interrupts that are not enabled.
The driver used to log any non-zero cause and when running with a single
line interrupt it would spam the console/logs with reports of interrupts
that are of no interest to anyone.
r348504 moved llvm-symbolizer from the CLANG_EXTRAS knob to CLANG, but
the man page was still in the CLANG_EXTRAS section in
OptionalObsoleteFiles.inc.
llvm-symbolizer: Move out of CLANG_EXTRAS, into CLANG
ASAN reports become a lot more useful with llvm-symbolizer in $PATH, and the
build is not much more time-consuming. The added benefit is that the
resulting reports will actually include symbol information; without, thread
trace information includes a bunch of addresses that immediately resolve to
an inline function in
^/contrib/compiler-rt/lib/sanitizer_common/sanitizer_common.h and take a
little more effort to examine.
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt, libc++,
libunwind and openmp to the upstream release_80 branch r363030
(effectively, 8.0.1 rc2). The 8.0.1 release should follow this within a
week or so.
MFC r349351 (by jhibbits, partially):
powerpc: Transition to Secure-PLT, like most other OSs (Toolchain part)
Summary:
Toolchain follow-up to r349350. LLVM patches will be submitted upstream for
9.0 as well.
The bsd.cpu.mk change is required because GNU ld assumes BSS-PLT if it
cannot determine for certain that it needs Secure-PLT, and some binaries do
not compile in such a way to make it know to use Secure-PLT.
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt, libc++,
libunwind and openmp to the upstream release_80 branch r364487
(effectively, 8.0.1 rc3). The 8.0.1 release will most likely
have no further changes.
MFC r350177:
Merge llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and openmp
8.0.1 final release r366581. The only functional change is a fix for a
mismerge of upstream r360816, which properly restores the r2 register
when unwinding on PowerPC64 (See https://reviews.freebsd.org/D20337).
Ed Maste [Tue, 23 Jul 2019 18:11:42 +0000 (18:11 +0000)]
bhyve: Fix resource leak when using strdup
MFC r340044 (araujo):
Fix resource leak when using strdup(3).
MFC r344160 (rgrimes):
In r340044 an attempt to quiet coverity warning cid 1357336
was incorrectly implemented leading to a possible double free.
It is possible for both the conditional free,
and the unconditional free added in r340044 to be done,
fix that by initializing uopt to NULL,
removing the conditional free,
and only using the unconditional free at the end.
This changes the return code however the caller only tests for 0 and != 0.
One might ask then, why multiple return codes when the caller only tests
for 0 and != 0? From what I can tell, Darren probably passed various
return codes for sake of debugging. The debugging code is long gone
however we can still use the different return codes using DTrace FBT
traces. We can still determine why the compare failed by examining the
differences between the fr1 and fr2 frentry structs, which is a simple
test in DTrace. This allows reducing the number of tests, improving the
code while not affecting our ability to capture information for
diagnostic purposes.
Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000,
and connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and
backend functionalities. As a result, the backend code is duplicated between
the two virtual NIC implementations and also within the same virtual NIC.
Also, e1000 cannot currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.
Calculate the offset of the interface name using FR_NAME rather than
calclulating it "by hand". This improves consistency with the rest of
the code and is in line with planned fixes and other work.
Recycle the unused FR_CMPSIZ macro which became orphaned in ipfilter 5
prior to its import into FreeBSD. This macro calculates the size to be
compared within the frentry structure. The ipfilter 4 version of the
macro calculated the compare size based upon the static size of the
frentry struct. Today it uses the ipfilter 5 method of calculating the
size based upon the new to ipfilter 5 fr_size value found in the
frentry struct itself.
Fix VOP_PUTPAGES(9) in regards to the use of VM_PAGER_CLUSTER_OK
Submitted by: Ka Ho Ng <khng300 at gmail.com>
Reviewed by: mckusick
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20695
MFC r349940:
Correctly truncate the rule in case when it has several action opcodes.
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.
Alan Somers [Fri, 19 Jul 2019 14:13:50 +0000 (14:13 +0000)]
MFC r349041:
open(2): fix the description of O_FSYNC
The man page claims that with O_FSYNC (aka O_SYNC) the kernel will not cache
written data. However, that's not true. Nor does POSIX require it.
Perhaps it was true when that section of the man page was written in r69336
(I haven't checked). But it's not true now. Now the effect is simply that
writes are sent to disk immediately and synchronously, but they're still
cached.
See also: https://pubs.opengroup.org/onlinepubs/9699919799/
See also: ffs_write in sys/ufs/ffs/ffs_vnops.c
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20641
ipfilter commands, in this case ipf(8), passes its operations and rules
via an ioctl interface. Rules can be added or removed and stats and
counters can be zeroed out. As the ipfilter interprets these
instructions or operations they are stored in an integer called
addrem (add/remove). 0 is add, 1 is remove, and 2 is clear stats and
counters. Much of this is not documented. This commit documents these
operations by replacing simple integers with a self documenting
enum along with a few basic comments.
- Add rcu list functions.
- Make rcu hlist's foreach macro use rcu calls instead of the non-rcu macro.
- Bump FreeBSD version so we have a checkpoint for the vboxvideo drm driver.
Pull in r365760 from upstream lld trunk (by Fangrui Song):
[ELF] Handle non-glob patterns before glob patterns in version
scripts & fix a corner case of --dynamic-list
This fixes PR38549, which is silently accepted by ld.bfd.
This seems correct because it makes sense to let non-glob patterns
take precedence over glob patterns.
lld issues an error because
`assignWildcardVersion(ver, VER_NDX_LOCAL);` is processed before
`assignExactVersion(ver, v.id, v.name);`.
Move all assignWildcardVersion() calls after assignExactVersion()
calls to fix this.
Also, move handleDynamicList() to the bottom. computeBinding() called
by includeInDynsym() has this cryptic rule:
if (versionId == VER_NDX_LOCAL && isDefined() && !isPreemptible)
return STB_LOCAL;
Before the change:
* foo's version is set to VER_NDX_LOCAL due to `local: *`
* handleDynamicList() is called
- foo.computeBinding() is STB_LOCAL
- foo.includeInDynsym() is false
- foo.isPreemptible is not set (wrong)
* foo's version is set to V1
After the change:
* foo's version is set to VER_NDX_LOCAL due to `local: *`
* foo's version is set to V1
* handleDynamicList() is called
- foo.computeBinding() is STB_GLOBAL
- foo.includeInDynsym() is true
- foo.isPreemptible is set (correct)
This makes it longer necessary to patch the version scripts for the
samba ports, to avoid "duplicate symbol 'pdb_search_init' in version
script" errors.
Eric van Gyzen [Tue, 16 Jul 2019 16:05:42 +0000 (16:05 +0000)]
MFC r349834
Ignore kern.vt.splash_cpu without graphics
When the system has no graphical console, such as bhyve in common
configurations, ignore kern.vt.splash_cpu, instead of panicking
on INVARIANTS kernels.
MFC r349579: nctgpio: change default pin names to those used by the datasheet(s)
That is, instead of the current GPIO00 - GPIO15 the names will be GPIO00
- GPIO07, GPIO10 - GPIO17. The first digit is a GPIO "bank" / group
number and the second one is a pin number within the bank. Alternative
view is that the pin names are changed from decimal numbering scheme to
octal one (as there are 8 pins per bank).
MFC r349460: gpiobus: provide a new hint, pin_list
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
MFC r349645:
Remove dead code added after r348743 in the LinuxKPI. The
LINUXKPI_VERSION macro is not defined for any compiled LinuxKPI code
which basically means __GFP_NOTWIRED is never checked when allocating
pages. This should work fine with the existing external DRM code as
long as the page wiring and unwiring is balanced.
This patch fixes 2 panics. The first one is due to the current VNET not
being set in the emulated adapter transmission path. The second one
is caused by the M_PKTHDR flag not being set when preallocated mbufs
are recycled in the transmit path.
Apply a workaround to be able to build clang 8.0.0 headers with clang
3.4.1, which is still in the stable/10 branch.
It looks like clang 3.4.1 implements static_asserts by instantiating a
temporary static object, and if those are in an anonymous union, it
results in "error: anonymous union can only contain non-static data
members".
To work around this implementation limitation, move the static_asserts
in question out of the anonymous unions.
This should make building the latest stable/11 from stable/10 possible
again.
John Baldwin [Sat, 13 Jul 2019 00:23:20 +0000 (00:23 +0000)]
MFC 343068:
Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.
It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.
John Baldwin [Fri, 12 Jul 2019 22:31:12 +0000 (22:31 +0000)]
MFC 339911,339936,343075,343166,348592: Various AMD CPU-specific fixes.
339911:
Emulate machine check related MSR_EXTFEATURES to allow guest OSes to
boot on AMD FX Series.
339936:
Merge cases with upper block.
This is a cosmetic change only to simplify code.
343075:
vmm(4): Take steps towards multicore bhyve AMD support
vmm's CPUID emulation presented Intel topology information to the guest, but
disabled AMD topology information and in some cases passed through garbage.
I.e., CPUID leaves 0x8000_001[de] were passed through to the guest, but
guest CPUs can migrate between host threads, so the information presented
was not consistent. This could easily be observed with 'cpucontrol -i 0xfoo
/dev/cpuctl0'.
Slightly improve this situation by enabling the AMD topology feature flag
and presenting at least the CPUID fields used by FreeBSD itself to probe
topology on more modern AMD64 hardware (Family 15h+). Older stuff is
probably less interesting. I have not been able to empirically confirm it
is sufficient, but it should not regress anything either.
343166:
vmm(4): Mask Spectre feature bits on AMD hosts
For parity with Intel hosts, which already mask out the CPUID feature
bits that indicate the presence of the SPEC_CTRL MSR, do the same on
AMD.
Eventually we may want to have a better support story for guests, but
for now, limit the damage of incorrectly indicating an MSR we do not yet
support.
Eventually, we may want a generic CPUID override system for
administrators, or for minimum supported feature set in heterogenous
environments with failover. That is a much larger scope effort than
this bug fix.
348592:
Emulate the AMD MSR_LS_CFG MSR used for various Ryzen errata.
Using dominance vs a set membership check is indistinguishable from a
compile time perspective, and the two queries return equivelent
results. Simplify code by using the existing function.
Pull in r360978 from upstream llvm trunk (by Philip Reames):
[LFTR] Strengthen assertions in genLoopLimit [NFCI]
Pull in r362292 from upstream llvm trunk (by Nikita Popov):
[IndVarSimplify] Fixup nowrap flags during LFTR (PR31181)
Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix
for LFTR poison handling issues in general.
When LFTR moves a condition from pre-inc to post-inc, it may now
depend on value that is poison due to nowrap flags. To avoid this, we
clear any nowrap flag that SCEV cannot prove for the post-inc addrec.
Additionally, LFTR may switch to a different IV that is dynamically
dead and as such may be arbitrarily poison. This patch will correct
nowrap flags in some but not all cases where this happens. This is
related to the adoption of IR nowrap flags for the pre-inc addrec.
(See some of the switch_to_different_iv tests, where flags are not
dropped or insufficiently dropped.)
Finally, there are likely similar issues with the handling of GEP
inbounds, but we don't have a test case for this yet.
Pull in r362971 from upstream llvm trunk (by Philip Reames):
Prepare for multi-exit LFTR [NFC]
This change does the plumbing to wire an ExitingBB parameter through
the LFTR implementation, and reorganizes the code to work in terms of
a set of individual loop exits. Most of it is fairly obvious, but
there's one key complexity which makes it worthy of consideration.
The actual multi-exit LFTR patch is in D62625 for context.
Specifically, it turns out the existing code uses the backedge taken
count from before a IV is widened. Oddly, we can end up with a
different (more expensive, but semantically equivelent) BE count for
the loop when requerying after widening. For the nestedIV example
from elim-extend, we end up with the following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)
This is the only test in tree which seems sensitive to this
difference. The actual result of using the wider BETC on this example
is that we actually produce slightly better code. :)
In review, we decided to accept that test change. This patch is
structured to preserve the old behavior, but a separate change will
immediate follow with the behavior change. (I wanted it separate for
problem attribution purposes.)
Pull in r362975 from upstream llvm trunk (by Philip Reames):
[LFTR] Use recomputed BE count
This was discussed as part of D62880. The basic thought is that
computing BE taken count after widening should produce (on average)
an equally good backedge taken count as the one before widening.
Since there's only one test in the suite which is impacted by this
change, and it's essentially equivelent codegen, that seems to be a
reasonable assertion. This change was separated from r362971 so that
if this turns out to be problematic, the triggering piece is obvious
and easily revertable.
For the nestedIV example from elim-extend.ll, we end up with the
following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)
Note that before is an i32 type, and the after is an i64. Truncating
the i64 produces the i32.
Pull in r362980 from upstream llvm trunk (by Philip Reames):
Factor out a helper function for readability and reuse in a future
patch [NFC]
Pull in r363613 from upstream llvm trunk (by Philip Reames):
Fix a bug w/inbounds invalidation in LFTR (recommit)
Recommit r363289 with a bug fix for crash identified in pr42279.
Issue was that a loop exit test does not have to be an icmp, leading
to a null dereference crash when new logic was exercised for that
case. Test case previously committed in r363601.
Original commit comment follows:
This contains fixes for two cases where we might invalidate inbounds
and leave it stale in the IR (a miscompile). Case 1 is when switching
to an IV with no dynamically live uses, and case 2 is when doing
pre-to-post conversion on the same pointer type IV.
The basic scheme used is to prove that using the given IV (pre or
post increment forms) would have to already trigger UB on the path to
the test we're modifying. As such, our potential UB triggering use
does not change the semantics of the original program.
As was pointed out in the review thread by Nikita, this is defending
against a separate issue from the hasConcreteDef case. This is about
poison, that's about undef. Unfortunately, the two are different, see
Nikita's comment for a fuller explanation, he explains it well.
(Note: I'm going to address Nikita's last style comment in a separate
commit just to minimize chance of subtle bugs being introduced due to
typos.)
Pull in r363875 from upstream llvm trunk (by Philip Reames):
[LFTR] Rename variable to minimize confusion [NFC]
(Recommit of r363293 which was reverted when a dependent patch was.)
As pointed out by Nikita in D62625, BackedgeTakenCount is generally
used to refer to the backedge taken count of the loop. A conditional
backedge taken count - one which only applies if a particular exit is
taken - is called a ExitCount in SCEV code, so be consistent here.
Pull in r363877 from upstream llvm trunk (by Philip Reames):
[LFTR] Stylistic cleanup as suggested in last review comment of
D62939 [NFC]
(Resumbit of r363292 which was reverted along w/an earlier patch)
Pull in r364346 from upstream llvm trunk (by Philip Reames):
[LFTR] Adjust debug output to include extensions (if any)
Pull in r364693 from upstream llvm trunk (by Philip Reames):
[IndVars] Remove a bit of manual constant folding [NFC]
SCEV is more than capable of folding (add x, trunc(0)) to x.
Pull in r364709 from upstream llvm trunk (by Nikita Popov):
[LFTR] Fix post-inc pointer IV with truncated exit count (PR41998)
Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we
have a truncated exit count we'll truncate the IV when comparing
against the limit, in which case exit count overflow in post-inc form
doesn't matter. However, for pointer IVs we don't do that, so we have
to be careful about incrementing the IV in the wide type.
I'm fixing this by removing the IVCount variable (which was ExitCount
or ExitCount+1) and replacing it with a UsePostInc flag, and then
moving the actual limit adjustment to the individual cases (which
are: pointer IV where we add to the wide type, integer IV where we
add to the narrow type, and constant integer IV where we add to the
wide type).