Alan Somers [Sun, 22 Sep 2019 00:12:43 +0000 (00:12 +0000)]
MF stable/12 r352489
Approved by: re (kib)
r351192:
periodic: fix anticongestion for scripts run after security
Revision 316342, which introduced the anticongestion feature, failed to
consider that the periodic scripts are executed by a recursive invocation of
periodic. The recursive invocation wrongly cleaned up a temporary file that
should've been cleaned up only by the original invocation. The result is
that if the first script that requests an anticongestion sleep runs after
the security scripts, the sleep won't happen.
Fix this bug by delaying cleanup until the end of the original invocation.
MFS r352565: SIOCSIFNAME: Do nothing if we're not actually changing
Instead of throwing EEXIST, just succeed if the name isn't actually
changing. We don't need to trigger departure or any of that because there's
no change from consumers' perspective.
- Copy stable/12@r352480 to releng/12.1 as part of the 12.1 release
cycle.
- Update from PRERELEASE to BETA1.
- Set the default pkg(7) repository to 'quarterly'.
- Bump __FreeBSD_version.
- Prune svn:mergeinfo from the new branch.
Approved by: re (implicit)
Sponsored by: Rubicon Communications, LLC (Netgate)
Alan Cox [Wed, 18 Sep 2019 07:25:04 +0000 (07:25 +0000)]
MFC r350463
In pmap_advise(), when we encounter a superpage mapping, we first demote
the mapping and then destroy one of the 4 KB page mappings so that there
is a potential trigger for repromotion. Currently, we destroy the first
4 KB page mapping that falls within the (current) superpage mapping or the
virtual address range [sva, eva). However, I have found empirically that
destroying the last 4 KB mapping produces slightly better results,
specifically, more promotions and fewer failed promotion attempts.
Accordingly, this revision changes pmap_advise() to destroy the last 4 KB
page mapping. It also replaces some nearby uses of boolean_t with bool.
Jayachandran C. [Wed, 18 Sep 2019 07:22:37 +0000 (07:22 +0000)]
MFC r340602:
gitv3_its: fixes for multiple GIC ITS blocks
First pass of support for multiple GIC ITS blocks with ACPI.
Changes are to:
* register the correct subset of interrupts with pic_register
in case of ACPI.
* initialize just the cpu interface for the first ITS, when
domain information is not avialable. This has to be done
until we split the per-CPU init to do LPI setup just once.
* remove duplicate check for the GIC ITS domain, the sc_cpus
are setup from domain, so the check again in per-CPU init
seems unnecessary.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17841
This is a major update for pci_host_generic_acpi.c, the current
implementation has some gaps that are better fixed up in one go.
The changes are to:
* Follow x86 method of not adding PCI resources to PCI host bridge in
ACPI code. This has been moved to pci_host_generic_acpi.c, where we
walk thru its resources of the host bridge and add them.
* Fixup code in pci_host_generic_acpi.c to read all decoded ranges
and update the 'ranges' property. This allows us to share most of
the code with generic implementation (and the FDT one).
* Parse and setup IO ranges and bus ranges when walking the resources
above. Drop most of the changes related to this from acpica code.
* Add the ECAM memory area as mem resource 0. Implement the logic to
get the ECAM area from MCFG (using bus range which we now decode),
or from _CBA (using _BBN/bus range). Drop aarch64 ifdefs from acpica
code which did part of this.
* Switch resource activation to similar code as FDT implementation,
this can be moved into generic implementation in a later pass.
* Drop the mechanism of using the 7th bit of bus number as the domain,
this is not correct and will work only in very specific cases. Use
_SEG as PCI domain and use the bus ranges of the host bridge to
provide start bus number.
This commit should not make any functional change to dev/acpica/acpi.c
for other architectures, almost all the changes there are to revert
earlier additions in this file done for aarch64.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17791
r340600:
pci_host_generic, acpi_resource: drop unneeded code
Now that we are handling PCI resources in pci_host_generic_acpi.c, we
don't need these change (made by r336129)
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17792
r340601:
pci_host_generic : move activate/release to generic code
Now that the ACPI and FDT implementations for activating and
deactivating resources are the same, we can move it to
pci_host_generic.c. No functional changes.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17793
Jayachandran C. [Wed, 18 Sep 2019 07:09:16 +0000 (07:09 +0000)]
MFC r340598:
acpica: rework INTRNG interrupts
On arm64 (where INTRNG is enabled), the interrupts have to be mapped
with ACPI_BUS_MAP_INTR() before adding them as resources to devices.
The earlier code did the mapping before calling acpi_set_resource(),
which bypassed code that checked for PCI link interrupts.
To fix this, move the call to map interrupts into acpi_set_resource()
and that requires additional work to lookup interrupt properties.
The changes here are to:
* extend acpi_lookup_irq_handler() to lookup an irq in the ACPI
resources
* create a helper function acpi_map_intr() which uses the updated
acpi_lookup_irq_handler() to look up an irq, and then map it
with ACPI_BUS_MAP_INTR()
* use acpi_map_intr() in acpi_pcib_route_interrupt() to map
pci link interrupts.
With these changes, we can drop the ifdefs in acpi_resource.c, and
we can also drop the call for mapping interrupts in generic_timer.c
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17790
The current quirk implementation writes a fixed address to the PCI BAR
to fix a firmware bug. The PCI BARs are allocated by firmware and will
change depending on PCI devices present. So using a fixed address here
is not correct.
This quirk worked around a firmware bug that programmed the MSI-X bar
of the SATA controller incorrectly. The newer firmware does not have
this issue, so it is better to drop this quirk altogether.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17655
r340596:
pci_host_generic: allocate resources against devices
Fix up pci_host_generic.c and pci_host_generic_fdt.c to allocate
resources against devices that requested them. Currently the
allocation happens against the pcib, which is incorrect.
This is needed for the upcoming changes for fixing up
pci_host_generic_acpi.c
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17656
r340597:
pci_host_generic*: basic implementation of bus range
Both ACPI and FDT support bus ranges for pci host bridges. Update
pci_host_generic*.[ch] with a default implementation to support this.
This will be used in the next set of changes for ACPI based host
bridge. No functional changes in this commit.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17657
r349117:
Three enhancements to arm64's pmap_protect():
Implement protection changes on superpage mappings. Previously, a superpage
mapping was unconditionally demoted by pmap_protect(), even if the
protection change applied to the entire superpage mapping.
Precompute the bit mask describing the protection changes rather than
recomputing it for every page table entry that is changed.
Skip page table entries that already have the requested protection changes
in place.
r349122:
Three changes to arm64's pmap_unwire():
Implement wiring changes on superpage mappings. Previously, a superpage
mapping was unconditionally demoted by pmap_unwire(), even if the wiring
change applied to the entire superpage mapping.
Rewrite a comment to use the arm64 names for bits in a page table entry.
Previously, the bits were referred to by their x86 names.
Use atomic_"op"_64() instead of atomic_"op"_long() to update a page table
entry in order to match the prevailing style in this file.
r349183:
Correct an error in r349122. pmap_unwire() should update the pmap's wired
count, not its resident count.
r349897: (by markj)
Rename pmap_page_dirty() to pmap_pte_dirty().
This is a precursor to implementing dirty bit management.
r349943: (by markj)
Apply some light cleanup to uses of pmap_pte_dirty().
- Check for ATTR_SW_MANAGED before anything else.
- Use pmap_pte_dirty() in pmap_remove_pages().
r350004: (by markj)
Implement software access and dirty bit management for arm64.
Previously the arm64 pmap did no reference or modification tracking;
all mappings were treated as referenced and all read-write mappings
were treated as dirty. This change implements software management
of these attributes.
Dirty bit management is implemented to emulate ARMv8.1's optional
hardware dirty bit modifier management, following a suggestion from alc.
In particular, a mapping with ATTR_SW_DBM set is logically writeable and
is dirty if the ATTR_AP_RW_BIT bit is clear. Mappings with
ATTR_AP_RW_BIT set are write-protected, and a write access will trigger
a permission fault. pmap_fault() handles permission faults for such
mappings and marks the page dirty by clearing ATTR_AP_RW_BIT, thus
mapping the page read-write.
r350029: (by markj)
Propagate attribute changes during demotion.
After r349117 and r349122, some mapping attribute changes do not trigger
superpage demotion. However, pmap_demote_l2() was not updated to ensure
that the replacement L3 entries carry any attribute changes that
occurred since promotion.
r350038: (by markj)
Always use the software DBM bit for now.
r350004 added most of the machinery needed to support hardware DBM
management, but it did not intend to actually enable use of the hardware
DBM bit.
r350191:
Introduce pmap_store(), and use it to replace pmap_load_store() in places
where the page table entry was previously invalid. (Note that I did not
replace pmap_load_store() when it was followed by a TLB invalidation, even
if we are not using the return value from pmap_load_store().)
Correct an error in pmap_enter(). A test for determining when to set
PGA_WRITEABLE was always true, even if the mapping was read only.
In pmap_enter_l2(), when replacing an empty kernel page table page by a
superpage mapping, clear the old l2 entry and issue a TLB invalidation. My
reading of the ARM architecture manual leads me to believe that the TLB
could hold an intermediate entry referencing the empty kernel page table
page even though it contains no valid mappings.
Replace a couple direct uses of atomic_clear_64() by the new
pmap_clear_bits().
In a couple comments, replace the term "paging-structure caches", which is
an Intel-specific term for the caches that hold intermediate entries in the
page table, with wording that is more consistent with the ARM architecture
manual.
r350202:
With the introduction of software dirty bit emulation for managed mappings,
we should test ATTR_SW_DBM, not ATTR_AP_RW, to determine whether to set
PGA_WRITEABLE. In effect, we are currently setting PGA_WRITEABLE based on
whether the dirty bit is preset, not whether the mapping is writeable.
Correct this mistake.
r350422: (by markj)
Remove an unneeded trunc_page() in pmap_fault().
r350427: (by markj)
Have arm64's pmap_fault() handle WnR faults on dirty PTEs.
If we take a WnR permission fault on a managed, writeable and dirty
PTE, simply return success without calling the main fault handler. This
situation can occur if multiple threads simultaneously access a clean
writeable mapping and trigger WnR faults; losers of the race to mark the
PTE dirty would end up calling the main fault handler, which had no work
to do.
r350525: (by markj)
Use ATTR_DBM even when hardware dirty bit management is not enabled.
The ARMv8 reference manual only states that the bit is reserved in
this case; following Linux's example, use it instead of a
software-defined bit for the purpose of indicating that a managed
mapping is writable.
Andrew Turner [Tue, 17 Sep 2019 10:09:59 +0000 (10:09 +0000)]
MFC r343042, r343875
r343042:
Ensure the I-Cache is correctly handled in arm64_icache_sync_range
The cache_handle_range macro to handle the arm64 instruction and data
cache operations would return when it was complete. This causes problems
for arm64_icache_sync_range and arm64_icache_sync_range_checked as they
assume they can execute the i-cache handling instruction after it has been
called.
Fix this by making this assumption correct.
While here add missing instruction barriers and adjust the style to
match the rest of the assembly.
Andrew Turner [Tue, 17 Sep 2019 10:00:53 +0000 (10:00 +0000)]
MFC r342552:
Pass VM_PROT_EXECUTE to vm_fault for instruction faults.
We need to tell vm_fault the reason for the fault was because we tried to
execute from the memory location. Without this it may return with success
as we only request read-only memory, then we return to the same location
and try to execute from the same memory address. This leads to an infinite
loop raising the same fault and returning to the same invalid location.
While here, mark all non-standard ones as FreeBSD-only as
other systems (at least, GNU/Linux and illumos) do not handle
them, so we should not encourage their use.
- make abday, day, abmon, mon, am_pm output quoting match linux
- workaround localeconv() issue for mon_grouping and grouping (PR172215)
- for other values not available in default locale, output -1 instead of
127 (CHAR_MAX) as returned by localeconv()
Andrew Turner [Mon, 16 Sep 2019 15:00:11 +0000 (15:00 +0000)]
MFC r342937:
Fix the location of td->td_frame at the top of the kernel stack.
In cpu_thread_alloc we would allocate space for the trap frame at the top of
the kernel stack. This is just below the pcb, however due to a missing cast
the pointer arithmetic would use the pcb size, not the trapframe size. As
the pcb is larger than the trapframe this is safe, however later in cpu_fork
we include the case leading to the two disagreeing on the location.
Fix by using the same arithmetic in both locations.
Found by: An early KASAN patch
Sponsored by: DARPA, AFRL
Andrew Turner [Mon, 16 Sep 2019 14:35:02 +0000 (14:35 +0000)]
MFC r339948:
Use pmap_invalidate_all rather than invalidating 512 level 2 entries in
the early pmap_mapbios/unmapbios code. It is even worse when there are
multiple L2 entries to handle as we would need to iterate over all pages.
Andrew Turner [Mon, 16 Sep 2019 14:25:51 +0000 (14:25 +0000)]
MFC r348323:
The alignment is passed into contigmalloc_domainset in the 7th argument.
KUBSAN was complaining the pointer contigmalloc_domainset returned was
misaligned. Fix this by using the correct argument to find the alignment
in the function signature.
Andrew Turner [Mon, 16 Sep 2019 14:07:30 +0000 (14:07 +0000)]
MFC r343876:
Add missing data barriers after storeing a new valid pagetable entry.
When moving from an invalid to a valid entry we don't need to invalidate
the tlb, however we do need to ensure the store is ordered before later
memory accesses. This is because this later access may be to a virtual
address within the newly mapped region.
Add the needed barriers to places where we don't later invalidate the
tlb. When we do invalidate the tlb there will be a barrier to correctly
order this.
This fixes a panic on boot on ThunderX2 when INVARIANTS is turned off:
panic: vm_fault_hold: fault on nofault entry, addr: 0xffff000040c11000
Andrew Turner [Mon, 16 Sep 2019 13:45:31 +0000 (13:45 +0000)]
MFC r346996:
Restore x18 in efi_arch_leave.
Some UEFI implementations trash this register and, as we use it as a
platform register, the kernel doesn't save it before calling into the UEFI
runtime services. As we have a copy in tpidr_el1 restore from there when
exiting the EFI environment.
MFC the BSD crtbegin to stable/12 but keep it disabled.
r339738:
Implement a BSD licensed crtbegin/crtend
These are needed for .ctors/.dtors and .jcr handling. The former needs
all the function pointers to be called in the correct order from the
.init/.fini section. The latter just needs to call a gcj specific function
if it exists with a pointer to the start of the .jcr section.
This is currently disabled until __dso_handle support is added.
r339744:
Add a missing include for src.opts.mk. Without it MK_TESTS isn't defined.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339770:
Drop the csu tests WARNS to 5 to fix the powerpc64 build.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339773:
Add __dso_handle to the BSD crtbegin. This is used to identify shared
objects.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339864:
Check __dso_handle is NULL in non-DSO objects. It should only be non-NULL
when accessed from a shared object.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339865:
Include the csu test directories in BSD.tests.dist
MFC with: r339738
Sponsored by: DARPA, AFRL
r339866:
Make the .ctors, .dtors, and .jcr markers as static. They shouldn't be
accessible from out of the files they are defined in.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339907:
The jcr argument to _Jv_RegisterClasses is used, stop marking it otherwise.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339908:
Run the csu tests on a DSO. This builds the tests into a shared library,
then runs these from the base test programs. With this we can check
crtbeginS.o and crtendS.o are working as expected.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339912:
Fix the location of the static keyword.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339913:
Disable the .preinit_array test in DSOs, ld.bfd fails to link objects with
the section.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339916:
Build the csu tests on all architectures.
The tests haven't been run them, but this is enough to build them so I can
get feedback on if the various crt.h headers are correct.
MFC with: r339738
Sponsored by: DARPA, AFRL
r339954:
Add __used to __CTOR_LIST__ and __DTOR_LIST__
Enabling BSD_CRTBEGIN on amd64 resulted in
error: unused variable '__CTOR_LIST__'.
__CTOR_LIST__ is indeed unused in crtbegin.c; it marks the beginning of
the .ctors array and is used in crtend.c. Annotate __DTOR_LIST__ as
well for consistency.
Discussed with: andrew
MFC with: r339738
Sponsored by: The FreeBSD Foundation
r340213:
Add the (untested) mips and sparc64 .init call sequences.
The BSD crtbegin/crtend code now builds on all architectures, however
further work is needed to check if it works correctly.
MFC with: r339738
Sponsored by: DARPA, AFRL
r340395:
Run __cxa_finalize in shared objects in the destructor path.
When we have .dtors call them before .dtor handling, otherwise call from
a destructor.
r340840:
Mark the function called by the MIPS .init/.fini sequence with .local.
As with r328939 we need to mark local symbols as such. Without this the
assembly parser treats the symbols as global and created relocations
against these private symbols.
MFC with: r339738
Sponsored by: DARPA, AFRL
r340910:
Add the missing 0 at the end of the .jcr section.
Without this the dynamic library test was failing as it was calling
_Jv_RegisterClasses multiple times.
r340911:
Re-enable the dynamiclib tests. These should be fixed by r340910.
r341424:
Disable the BSD CRT code on powerpc and sparc64, they need extra crt*.o
files that haven't been implemented.
lib/csu/tests/dynamiclib requires libh_csu.so be built first. I'm not
sure this is the most correct/best way to address this but it solves
the issue in my testing.
PR: 233734
Sponsored by: The FreeBSD Foundation
r342974:
Create crtsavres.o for powerpc builds
Summary:
GCC expects to link in a crtsavres.o on powerpc platforms. On
powerpc64 this is an empty file, but on powerpc and powerpcspe this does contain
some save/restore functions, which may not actually be necessary for newer
modern GCC and clang. This appeases the in-tree gcc, though, and is needed in
order to switch to the BSD CRTRBEGIN.
PR: 233751
Reviewed By: andrew
Differential Revision: https://reviews.freebsd.org/D18826
r351027:
Enable BSD_CRTBEGIN on powerpc
In r342974 jhibbits added support to build crtsavres.o. This was the
blocker for BSD_CRTBEGIN to be enabled there. As such enable this
option again.
Alan Cox [Mon, 16 Sep 2019 04:54:17 +0000 (04:54 +0000)]
MFC r349323, r349442, r349866, r349975
pmap_enter_quick_locked() never replaces a valid mapping, so it need not
perform a TLB invalidation. A barrier suffices. (See r343876.)
Add a comment to pmap_enter_quick_locked() in order to highlight the fact
that it does not replace valid mappings.
Correct a typo in one of pmap_enter()'s comments.
Introduce pmap_clear(), which zeroes a page table entry, and use it,
instead of pmap_load_clear(), in places where we don't care about the page
table entry's prior contents.
Eliminate an unnecessary pmap_load() from pmap_remove_all(). Instead, use
the value returned by the pmap_load_clear() on the very next line.
A KASSERT() in pmap_enter(), which originated in the amd64 pmap, was meant
to check the value returned by the pmap_load_clear() on the previous
line. However, we were ignoring the value returned by the
pmap_load_clear(), and so the KASSERT() was not serving its intended
purpose. Use the value returned by the pmap_load_clear() in the
KASSERT().
Alan Cox [Mon, 16 Sep 2019 02:31:58 +0000 (02:31 +0000)]
MFC r349003, r349031, r349042, r349129, r349290, r349618, r349798
Change pmap_demote_l2_locked() so that it removes the superpage mapping on
a demotion failure. Otherwise, some callers to pmap_demote_l2_locked(),
such as pmap_protect(), may leave an incorrect mapping in place on a
demotion failure.
Change pmap_demote_l2_locked() so that it handles addresses that are not
superpage aligned. Some callers to pmap_demote_l2_locked(), such as
pmap_protect(), may not pass a superpage aligned address.
Optimize TLB invalidation in pmap_remove_l2().
Change the arm64 pmap so that updates to the global count of wired pages
are not performed directly by the pmap. Instead, they are performed by
vm_page_free_pages_toq().
Batch the TLB invalidations that are performed by pmap_protect() rather
than performing them one at a time.
Eliminate a redundant call to pmap_invalidate_page() from
pmap_ts_referenced().
Introduce pmap_remove_l3_range() and use it in two places: (1)
pmap_remove(), where it eliminates redundant TLB invalidations by
pmap_remove() and pmap_remove_l3(), and (2) pmap_enter_l2(), where it may
optimize the TLB invalidations by batching them.
Implement pmap_copy().
Three changes to pmap_enter():
1. Use _pmap_alloc_l3() instead of pmap_alloc_l3() in order to handle the
possibility that a superpage mapping for "va" was created while we slept.
2. Eliminate code for allocating kernel page table pages. Kernel page
table pages are preallocated by pmap_growkernel().
3. Eliminate duplicated unlock operations when KERN_RESOURCE_SHORTAGE is
returned.
Alan Somers [Mon, 16 Sep 2019 00:59:10 +0000 (00:59 +0000)]
MFC r352231:
getsockopt.2: clarify that SO_TIMESTAMP is not 100% reliable
When SO_TIMESTAMP is set, the kernel will attempt to attach a timestamp as
ancillary data to each IP datagram that is received on the socket. However,
it may fail, for example due to insufficient memory. In that case the
packet will still be received but not timestamp will be attached.
r351318:
ping: Add tests of the Internet checksum function
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google LLC (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21340
r351330:
ping: do reverse DNS lookup of the target address
When printing replies, ping will now attempt a reverse DNS lookup of the
target. That can be suppressed by using the "-n" option. Curiously, ping
has always done reverse lookups in certain error paths, but never in the
success path.
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google LLC (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21351
r351393:
ping: add a basic functional test
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21289
r351398:
ping: By default, don't reverse lookup IP addresses
ping's default is now not to attempt reverse DNS lookups. The -H flag will
enable them. This change is not quite a reversion of r351330. That change
made the happy path and error path do reverse lookups consistently; this
change changes the default for both paths.
Submitted by: Ján Sučan <sucanjan@gmail.com>
Discussed with: cem
MFC-With: 351330
Sponsored by: Google LLC (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21364
r351440:
ping: Fix alignment errors
This fixes -Wcast-align errors when compiled with WARNS=6.
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google LLC (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21327
r351461:
ping: fix unaligned access to ancillary data
Use CMSG_FIRSTHDR rather than assume that an array is correctly aligned.
Fixes warnings on sparc64 and powerpcspe.
Submitted by: Ján Sučan <sucanjan@gmail.com>
MFH: 2 weeks
Sponsored by: Google LLC (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21406
r351548:
ping: raise WARNS level to 6
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google LLC (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21405
r352226:
ping: fix a string in an error message
r352229:
ping: Verify whether a datagram timestamp was actually received.
ping(8) uses SO_TIMESTAMP, which attaches a timestamp to each IP datagram at
the time it's received by the kernel. Except that occasionally it doesn't.
Add a check to see whether such a timestamp was actually set before trying
to read it. This fixes segfaults that can happen when the kernel doesn't
attach a timestamp.
The bug has always existed, but prior to r351461 it manifested as an
implausible round-trip-time, not a segfault.
Alan Cox [Sun, 15 Sep 2019 21:32:19 +0000 (21:32 +0000)]
MFC r349070
Previously, when pmap_remove_pages() destroyed a dirty superpage mapping,
it only called vm_page_dirty() on the first of the superpage's constituent
4KB pages. This revision corrects that error, calling vm_page_dirty() on
all of superpage's constituent 4KB pages.
Alan Cox [Sun, 15 Sep 2019 21:27:14 +0000 (21:27 +0000)]
MFC r349905
According to Section D5.10.3 "Maintenance requirements on changing System
register values" of the architecture manual, an isb instruction should be
executed after updating ttbr0_el1 and before invalidating the TLB.
ig4(4): Fix SDA HOLD time set too low on Skylake controllers
Execution of "Soft reset" command (IG4_REG_RESETS_SKL) at controller init
stage sets SDA_HOLD register value to 0x0001 which is often too low for
normal operation.
Set SDA_HOLD back to 28 after reset to restore controller functionality.
Alan Cox [Sun, 15 Sep 2019 17:22:29 +0000 (17:22 +0000)]
MFC r348828
Implement an alternative solution to the amd64 and i386 pmap problem that
we previously addressed in r348246 (and MFCed in r348479).
This pmap problem also exists on arm64 and riscv. However, the original
solution developed for amd64 and i386 cannot be used on arm64 and riscv.
In particular, arm64 and riscv do not define a PG_PROMOTED flag in their
level 2 PTEs. (A PG_PROMOTED flag makes no sense on arm64, where unlike
x86 or riscv we are required to break the old 4KB mappings before making
the 2MB mapping; and on riscv there are no unused bits in the PTE to
define a PG_PROMOTED flag.)
This commit implements an alternative solution that can be used on all
four architectures. Moreover, this solution has two other advantages.
First, on older AMD processors that required the Erratum 383 workaround,
it is less costly. Specifically, it avoids unnecessary calls to
pmap_fill_ptp() on a superpage demotion. Second, it enables the
elimination of some calls to pagezero() in pmap_kernel_remove_{l2,pde}().
In addition, remove a related stale comment from pmap_enter_{l2,pde}().
r350994:
ping: fix data type of a variable for a packet sequence number
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21244
r350998:
ping: use the monotonic clock to measure durations
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21245
r351030:
ping: fix triptime calculation after r350998
That revision changed the internal clock to the monotonic, but neglected to
change the datagram's timestamp source.
Reported by: Oliver Hartmann, Michael Butler
Reviewed by: Ján Sučan <sucanjan@gmail.com>, allanjude
MFC-With: r350998
Differential Revision: https://reviews.freebsd.org/D21258
r351033:
ping: Make in_cksum() operate on u_char buffer
This fixes -Wcast-align errors for in_cksum() calls when compiled with
WARNS=6.
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21261
r351171:
ping: Move in_cksum() to a separate source file
This is a preparation step for adding ATF tests of in_cksum(), which has been
modified to operate on unaligned data. ping.o cannot be linked to the test
executable because both of them contain 'main' symbol.
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21288
r351223:
ping: fix -Wformat-truncating warning with GCC
Increase buffer size for the string representation of n_time
ICMP timestamp is a 32-bit number. In pr_ntime(), number of minutes
and seconds is always 2 characters wide. Max. number of hours is 4
characters wide. The buffer size should be at least:
4 + 2 + 2 + 1 (':') + 1 (':') + 1 ('\0') = 11
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21325
r351226:
Fix uninitialized variable warnings when MK_CASPER=no
Submitted by: Ján Sučan <sucanjan@gmail.com>
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21322
r351424:
ping: fix include guard symbol name to reflect the header file name
Submitted by: Ján Sučan <sucanjan@gmail.com>
MFC-With: 351171
Sponsored by: Google LLC (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21374
MFC r351399: Fix the build with WITHOUT_GOOGLETEST
Attempting to build the fusefs tests WITHOUT_GOOGLETEST will result in an
error if the host system or sysroot doesn't already have googletest headers
in /usr/include/private (e.g. host built/installed WITHOUT_GOOGLETEST, clean
cross-buildworld WITHOUT_GOOGLETEST).
Fix multiple possible locking problems found by syzkaller and
update comment (which was wrong already anyway due to previous
changes).
Improve KASSERTs for debugging lock related issues.
Fold two RSS sections together.
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
r350990:
fusefs: add SVN Keywords to the test files
Reported by: SVN pre-commit hooks
MFC-With: r350665
Sponsored by: The FreeBSD Foundation
r350992:
fusefs: skip some tests when unsafe aio is disabled
MFC-With: r350665
Sponsored by: The FreeBSD Foundation
r351039:
fusefs: fix intermittency in the default_permissions.Unlink.ok test
The test needs to expect a FUSE_FORGET operation. Most of the time the test
would pass anyway, because by chance FUSE_FORGET would arrive after the
unmount.
MFC-With: 350665
Sponsored by: The FreeBSD Foundation
r351042:
fusefs: Fix the size of fuse_getattr_in
In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased.
However, the fusefs driver is currently not sending the additional fields.
In our implementation, the additional fields are always zero, so I there
haven't been any test failures until now. But fusefs-lkl requires the
request's length to be correct.
Fix this bug, and also enhance the test suite to catch similar bugs.
PR: 239830
MFC-With: 350665
Sponsored by: The FreeBSD Foundation
r351061:
fusefs: fix the 32-bit build after 351042
Reported by: jhb
MFC-With: 351042
Sponsored by: The FreeBSD Foundation
r351066:
fusefs: fix conditional from r351061
The entirety of r351061 was a copy/paste error. I'm sorry I've been
comitting so hastily.
Reported by: rpokala
Reviewed by: rpokala
MFC-With: 351061
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21265
r351113:
fusefs: don't send the namespace during listextattr
The FUSE_LISTXATTR operation always returns the full list of a file's
extended attributes, in all namespaces. There's no way to filter the list
server-side. However, currently FreeBSD's fusefs driver sends a namespace
string with the FUSE_LISTXATTR request. That behavior was probably copied
from fuse_vnop_getextattr, which has an attribute name argument. It's
been there ever since extended attribute support was added in r324620. This
commit removes it.
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21280
r351560:
fusefs: Fix some bugs regarding the size of the LISTXATTR list
* A small error in r338152 let to the returned size always being exactly
eight bytes too large.
* The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the
caller does not provide enough space, then the server should return ERANGE
rather than return a truncated list. That's true even though in FUSE's
case the kernel doesn't provide space to the client at all; it simply
requests a maximum size for the list. We previously weren't handling the
case where the server returns ERANGE even though the kernel requested as
much size as the server had told us it needs; that can happen due to a
race.
* We also need to ensure that a pathological server that always returns
ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an
infinite loop in the kernel. As of this commit, it will instead cause an
infinite loop that exits and enters the kernel on each iteration, allowing
signals to be processed.
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21287
r351961:
Coverity fixes in fusefs(5)
CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap.
It could potentially have resulted in VOP_BMAP reporting too many
consecutive blocks.
CID 1404364 is much worse. It was an array access by an untrusted,
user-provided variable. It could potentially have resulted in a malicious
file system crashing the kernel or worse.
Reported by: Coverity
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21466
r351963:
fusefs: coverity cleanup in the tests
Address the following defects reported by Coverity:
* Structurally dead code (CID 1404366): set m_quit before FAIL, not after
* Unchecked return value of sysctlbyname (CID 1404321)
* Buffer overflows. These are all false positives caused by the fact that
Coverity thinks I'm using a buffer to store strings, when in fact I'm
really just using it to store a byte array that happens to be initialized
with a string. I'm changing the type from char to uint8_t in the hopes
that it will placate Coverity. (CID 1404338, 1404350, 1404367, 1404376, 1404379, 1404381, 1404388, 1404403, 1404425, 1404433, 1404434, 1404474, 1404480, 1404484, 1404503, 1404505)
* False positive file descriptor leak. I'm going to try to fix this with
Coverity modeling, but I'll also change an EXPECT to ASSERT so we don't
perform meaningless assertions after the failure. (CID 1404320, 1404324, 1404440, 1404445).
* Uninitialized variables in C++ constructors (CID 1404327, 1404346). In the
case of m_maxphys, this actually led to part of the FUSE_INIT's response
being set to stack garbage during the WriteCluster::clustering test.
* Uninitialized sun_len field in struct sockaddr_un (CID 1404330, 1404371, 1404429).
Reported by: Coverity
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21457
r352021:
fusefs: suppress some Coverity resource leak CIDs in the tests
The fusefs tests deliberately leak file descriptors. To do otherwise would
add extra complications to the tests' mock FUSE server. This annotation
should hopefully convince Coverity to shut up about the leaks.
Reviewed by: uqs
Sponsored by: The FreeBSD Foundation
r352025:
mount_fusefs: fix a segfault on memory allocation failure
Reported by: Coverity
Coverity CID: 1354188
Sponsored by: The FreeBSD Foundation
r352230:
fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode
When communicating with a FUSE server that implements version 7.8 (or older)
of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter
than normal. The protocol version check wasn't applied universally, leading
to an extra 16 bytes being sent to such servers. The extra bytes were
allocated and bzero()d, so there was no information disclosure.
Reviewed by: emaste
MFC-With: r350665
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21557
MFC r352194: lualoader: Revert to ASCII menu frame for serial console
The box drawing characters we use aren't necessarily safe with a serial
console; for instance, in the report by npn@, these were causing his xterm
to send back a sequence that lua picked up as input and halted the boot.
This is less than ideal.
Fallback to ASCII frames for console with 'comconsole' in it. This is a
partial revert r338108 by imp@ -- instead of removing the menu entirely and
disabling color/cursor sequences, just reverting the default frame to ASCII
is enough to not break in this setup.
This command simply returns 0 at the moment and explicitly takes no
arguments. This should be used by utilities wanting to see if bectl can
operate on the system they're running, or with a specific root (`bectl -r`).
It may grow more checks than "will libbe successfully init" in the future,
but for now this is enough as that checks for the dataset mounted at "/" and
that it looks capable of being a BE root (e.g. it's not a top-level dataset)
bectl commands can now specify if they want to be silent, and this will turn
off libbe_print_on_error so they can control the output as needed. This is
already used in `bectl check`, and may be turned on in the future for some
other commands where libbe errors are better suppressed as the failure mode
may be obvious.
MFC r351813: bectl(8): implement sorting for 'bectl list' output
Allow 'bectl list' to sort output by a given property name. The property
name is passed in using a command-line flag, '-c' for ascending order and
'-C' for descending order. The properties allowed to sort by are:
- name (the default output, even if '-c' or '-C' are not used)
- creation
- origin
- used
- usedds
- usedsnap
- usedrefreserv
The default output for 'bectl list' is now ascending alphabetical order of
BE name.
To sort by creation time from earliest to latest, the command would be
'bectl list -c creation'
MFC r352092: bectl(8): initialize reverse earlier
This turns into a warning in GCC 4.2 that 'reverse' may be used
uninitialized in this function. While I don't immediately see where it's
deciding this from (there's only two paths that make column != NULL, and
they both set reverse), initializing reverse earlier is good for clarity.
Ian Lepore [Sat, 14 Sep 2019 20:26:50 +0000 (20:26 +0000)]
MFC r351885, r351887
r351885:
Ensure a measurement is complete before reading the result in ads111x.
Also, disable the comparator by default; it's not used for anything.
The previous logic would start a measurement, and then pause_sbt() for the
averaging time currently configured in the chip. After waiting that long,
the code would blindly read the measurement register and return its value.
The problem is that the chip's idea of averaging time is based on its
internal free-running 1MHz oscillator, which may be running at a wildly
different rate than the kernel clock. If the chip's internal timer was
running slower than the kernel clock, we'd end up grabbing a stale result
from an old measurement.
The driver now still uses pause_sbt() to yield the cpu while waiting for
the measurement to complete, but after sleeping it checks the chip's status
register to ensure the measurement engine is idle. If it's not, the driver
uses a retry loop to wait a bit (5% of the original wait time) then check
again for completion.
r351887:
Use a single write of 3 bytes instead of iicdev_writeto() in ads111x.
The iicdev_writeto() function basically does scatter-gather IO by filling
in a pair of iic_msg structs to write the register address then the data
from different locations but with a single bus START/xfer/STOP sequence.
It turns out several low-level i2c controller drivers do not honor the
IIC_NOSTART flag, so the second piece of the write gets a new START on
the bus, and that confuses the ads111x chips which expect a continuous
write of 3 bytes to set a register.
A proper fix for this is to track down all the misbehaving controllers
drivers and fix them. For now this change makes this driver work again.
MFC r351693:
Use DEVICE memory instead of UNCACHEABLE on aarch64 in ioremap() in the LinuxKPI.
This fixes system hangs on reading device registers on aarch64.
ti,dual-volt property say that the eMMC support 1.8V and 3.3V not 3.0V
Use the correct caps for the mmc stack.
Note that the MMCHS_SD_CAPA register can only be written once after bootup
so if one is using a u-boot compiled with eMMC support (this is the default)
this code is a no-op but just in case someone have u-boot compiled without
eMMC support this make eMMC works when the kernel is booted.
r351184:
Add method for getting of syscon handle from parent device.
If simple multifuction device also provides syscon interface, its
childern should be able to consume it. Due to this:
- declare coresponding method in syscon interface
- implement it in simple multifunction device driver
r351189:
Fix bug introduced by r351184.
We should check the returned handle, not the pointer to it.
Noticed by: ian
X-MFC with: r351184
r351217:
arm64: a37x0_gpio: Use syscon instead of MMIO region
The fdt node for this driver is a simple-mfd and syscon compatible one
meaning that simplemfd will be the driver attached for it. The gpio driver
is attached to the 'gpio' subnode so use syscon_get_handle_default to
obtain the handle of the syscon from the parent device and use this
to read/write to the memory region.
Alexander Motin [Fri, 13 Sep 2019 15:15:58 +0000 (15:15 +0000)]
MFC r352082, r352103: Fix number of problems found while testing on SAT devices.
- Remove incomplete and dangerous ata_res decoding from ata_do_cmd().
Instead switch all functions that need the result to use get_ata_status(),
doing the same, but more careful, also reducing code duplication.
- Made get_ata_status() to also decode fixed format sense. In many cases
it is still not enough to make it useful, since it can only report results
of 28-bit command, but it is slightly better then nothing.
- Organize error reporting in ata_do_cmd(), so that if caller specified
AP_FLAG_CHK_COND, it is responsible for command errors (non-ioctl ones).
- Make HPA/AMA errors not fatal for `identify` subcommand.
- Fix reprobe() not being called on HPA/AMA when in quiet mode.
- Remove not very useful messages from `format` and `sanitize` commands
with -y flag. Once they started, they often can't be stopped any way.
Error there mean that command was not even executed, and all information
we have about it is errno, and cam_error_print() call is not very useful.
Plus it is most likely a programmatic error, that shoud not happen.
Alexander Motin [Fri, 13 Sep 2019 14:42:37 +0000 (14:42 +0000)]
MFC r352011: Supply SAT layer with valid transfer sizes.
This is a rework of r344701, that noticed that number of bytes passes to
8 bit sector count field gets truncated. First decision was to not pass
anything, since ATA specs define the field as N/A. But it appeared to be a
problem for some SAT devices, that require information about data transfer
to operate properly. Some additional investigation shown that it is quite
a common practice to set unused fields of ATA commands (fortunately ATA
specs formally allow it) to supply the information to SAT layer. I have
found SAS-SATA interposer that does not allow pass-through without it.
As side effect, reduce code duplication by removing ata_do_28bit_cmd()
function, replacing it with more universal ata_do_cmd().
Simon J. Gerraty [Fri, 13 Sep 2019 05:54:09 +0000 (05:54 +0000)]
Use file destdir for stage_as sets
We cannot use file (without :T) to name targets
but we can use the destination directory (with / replaced by _)
This has the benefit of minimizing the targets created.
r351540:
cxgbe/t4_tom: Initialize all TOE connection parameters in one place.
Remove now-redundant items from toepcb and synq_entry and the code to
support them.
Let the driver calculate tx_align, rx_coalesce, and sndbuf by default.
cxgbe/t4_tom: Limit work requests with immediate payload to a single
descriptor. The per-tid tx credits are in demand during active Tx and
it's best not to use too many just for payload.
Alexander Motin [Wed, 11 Sep 2019 23:45:58 +0000 (23:45 +0000)]
MFC r348268 (by sef), r348293 (by cem):
Add an AESNI-optimized version of the CCM/CBC cryptographic and authentication
code. The primary client of this is probably going to be ZFS encryption.
Right now, aesni_cipher_alloc does a bit of special-casing
for CRYPTO_F_IOV, to not do any allocation if the first uio
is large enough for the requested size. While working on ZFS
crypto port, I ran into horrible performance because the code
uses scatter-gather, and many of the times the data to encrypt
was in the second entry. This code looks through the list, and
tries to see if there is a single uio that can contain the
requested data, and, if so, uses that.
This has a slight impact on the current consumers, in that the
check is a little more complicated for the ones that use
CRYPTO_F_IOV -- but none of them meet the criteria for testing
more than one.