Implement software access and dirty bit management for arm64.
Previously the arm64 pmap did no reference or modification tracking;
all mappings were treated as referenced and all read-write mappings
were treated as dirty. This change implements software management
of these attributes.
Dirty bit management is implemented to emulate ARMv8.1's optional
hardware dirty bit modifier management, following a suggestion from alc.
In particular, a mapping with ATTR_SW_DBM set is logically writeable and
is dirty if the ATTR_AP_RW_BIT bit is clear. Mappings with
ATTR_AP_RW_BIT set are write-protected, and a write access will trigger
a permission fault. pmap_fault() handles permission faults for such
mappings and marks the page dirty by clearing ATTR_AP_RW_BIT, thus
mapping the page read-write.
Reviewed by: alc
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20907
Fix reference counting in pmap_ts_referenced() on RISC-V.
pmap_ts_referenced() does not necessarily clear the access bit from
all accessed mappings of a given page. Thus, if a scan of the mappings
needs to be restarted, we should be careful to avoid double-counting
accessed mappings whose access bits were not cleared in a previous
attempt.
Reported by: alc
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20926
Remove RELEASE_CRUNCH here. It's obsolete and hasn't worked in a while. The
build options need to be revisited, since many older ones are listed, while
newer useful ones are not. But that rototilling I'll leave to others.
Remove all the RELEASE_CRUNCH instances that partially disable IPSEC
We remove IPSEC only in parts of the tree, and not others. RELEASE_CRUNCH to
disable it has not kept up with all its uses. Remove it. Should there be a real
need to disable IPSEC, one that hasn't shown up in the base system to date,
it can be re-added behind a WITHOUT_IPSEC build option.
If umtxq_check_susp() indicates an exit, we should clean the resources
before returning. Do it by breaking out of the loop and relying on
post-loop cleanup.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Differential revision: https://reviews.freebsd.org/D20949
Since these things are more completely controlled by the MK_OPENSSL knob, remove
RELEASE_CRUNCH here. It's no longer needed for the release and other users can
use the more proper knob if they so desire.
Now that we have MK_LS_COLORS, we don't need RELEASE_CRUNCH check here.
The RELEASE_CRUNCH check is redundant here. We don't need it for releases
anymore, and picobsd can control this more directly without making it a special
case.
Improve the input validation for l_linger.
When using the SOL_SOCKET level socket option SO_LINGER, the structure
struct linger is used as the option value. The component l_linger is of
type int, but internally copied to the field so_linger of the structure
struct socket. The type of so_linger is short, but it is assumed to be
non-negative and the value is used to compute ticks to be stored in a
variable of type int.
Therefore, perform input validation on l_linger similar to the one
performed by NetBSD and OpenBSD.
Thanks to syzkaller for making me aware of this issue.
Thanks to markj@ for pointing out that a similar check should be added
to so_linger_set().
This is the second in a number of patches needed to
get BBRv1 into the tree. This fixes the DSACK bug but
is also needed by BBR. We have yet to go two more
one will be for the pacing code (tcp_ratelimit.c) and
the second will be for the new updated LRO code that
allows a transport to know the arrival times of packets
and (tcp_lro.c). After that we should finally be able
to get BBRv1 into head.
When calling sctp_initialize_auth_params(), the inp must have at
least a read lock. To avoid more complex locking dances, just
call it in sctp_aloc_assoc() when the write lock is still held.
Calculate the offset of the interface name using FR_NAME rather than
calclulating it "by hand". This improves consistency with the rest of
the code and is in line with planned fixes and other work.
Recycle the unused FR_CMPSIZ macro which became orphaned in ipfilter 5
prior to its import into FreeBSD. This macro calculates the size to be
compared within the frentry structure. The ipfilter 4 version of the
macro calculated the compare size based upon the static size of the
frentry struct. Today it uses the ipfilter 5 method of calculating the
size based upon the new to ipfilter 5 fr_size value found in the
frentry struct itself.
Revert r349442, which was a workaround for bus errors caused by an errant
TLB entry. Specifically, at the start of pmap_enter_quick_locked(), we
would sometimes have a TLB entry for an invalid PTE, and we would need to
issue a TLB invalidation before exiting pmap_enter_quick_locked(). However,
we should never have a TLB entry for an invalid PTE. r349905 has addressed
the root cause of the problem, and so we no longer need this workaround.
ian [Sat, 13 Jul 2019 16:07:38 +0000 (16:07 +0000)]
Limit access to system accounting files.
In 2013 the security chapter of the Handbook was updated in r42501 to
suggest limiting access to the system accounting file [*1] by creating the
initial file with a mode of 0600. This was in part based on a discussion in
the forums [*2]. Unfortunately, this advice is overridden by the fact that a
new file is created as part of periodic daily processing, and the file mode
is set by the rc.d/accounting script.
These changes update the accounting script to create the directory with mode
0750 if it doesn't already exist, and to create the daily file with mode
0640. This limits write access to root only, read access to root and members
of wheel, and eliminates world access completely. For admins who want to
prevent even members of wheel from accessing the files, the mode of the
/var/account directory can be manually changed to 0700, because the script
never creates or changes that directory if it already exists.
The accounting_rotate_log() function now also handles the error cases of no
existing log file to rotate, and attempting to rotate the file multiple
times (.0 file already exists).
Another small change here eliminates the complexity of the mktemp/chmod/mv
sequence for creating a new acct file by using install(1) with the flags
needed to directly create the file with the desired ownership and
modes. That allows coalescing two separate if checkyesno accounting_enable
blocks into one.
These changes were inspired by my investigation of PR 202203.
ian [Sat, 13 Jul 2019 15:34:29 +0000 (15:34 +0000)]
Add arm_sync_icache() and arm_drain_writebuf() sysarch syscall wrappers.
NetBSD and OpenBSD have libc wrapper functions for the ARM_SYNC_ICACHE and
ARM_DRAIN_WRITEBUF sysarch operations. This change adds compatible functions
to our library. This should make it easier for various upstream sources to
support *BSD operating systems with a single variation of cache maintence
code in tools like interpreters and JIT compilers.
I consider the argument types passed to arm_sync_icache() to be especially
unfortunate, but this is intended to match the other BSDs.
dim [Sat, 13 Jul 2019 15:04:30 +0000 (15:04 +0000)]
Pull in r365760 from upstream lld trunk (by Fangrui Song):
[ELF] Handle non-glob patterns before glob patterns in version
scripts & fix a corner case of --dynamic-list
This fixes PR38549, which is silently accepted by ld.bfd.
This seems correct because it makes sense to let non-glob patterns
take precedence over glob patterns.
lld issues an error because
`assignWildcardVersion(ver, VER_NDX_LOCAL);` is processed before
`assignExactVersion(ver, v.id, v.name);`.
Move all assignWildcardVersion() calls after assignExactVersion()
calls to fix this.
Also, move handleDynamicList() to the bottom. computeBinding() called
by includeInDynsym() has this cryptic rule:
if (versionId == VER_NDX_LOCAL && isDefined() && !isPreemptible)
return STB_LOCAL;
Before the change:
* foo's version is set to VER_NDX_LOCAL due to `local: *`
* handleDynamicList() is called
- foo.computeBinding() is STB_LOCAL
- foo.includeInDynsym() is false
- foo.isPreemptible is not set (wrong)
* foo's version is set to V1
After the change:
* foo's version is set to VER_NDX_LOCAL due to `local: *`
* foo's version is set to V1
* handleDynamicList() is called
- foo.computeBinding() is STB_GLOBAL
- foo.includeInDynsym() is true
- foo.isPreemptible is set (correct)
This makes it longer necessary to patch the version scripts for the
samba ports, to avoid "duplicate symbol 'pdb_search_init' in version
script" errors.
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
- the FreeBSD IEEE OUI
- PCI bus, device/slot, function values
- Namespace ID
powerpc64/pmap: No need for moea64_pvo_remove_from_page_locked() wrapper
The only consumer of moea64_pvo_remove_from_page_locked() already has the
page in hand, so there is no need to search for the page while holding the
lock. Drop the wrapper, and rename _moea64_pvo_remove_from_page_locked().
powerpc64/pmap: Reduce scope of PV_LOCK in remove path
Summary:
Since the 'page pv' lock is one of the most highly contended locks, we
need to try to do as much work outside of the lock as we can. The
moea64_pvo_remove_from_page() path is a low hanging fruit, where we can
do some heavy work (PHYS_TO_VM_PAGE()) outside of the lock if needed.
In one path, moea64_remove_all(), the PV lock is already held and can't
be swizzled, so we provide two ways to perform the locked operation, one
that can call PHYS_TO_VM_PAGE outside the lock, and one that calls with
the lock already held.
Summary:
If an illegal instruction is encountered on a process running on a
powerpc64 kernel it would attempt to sync the cache before retrying the
instruction "just in case". However, since curpmap is not set, when
moea64_sync_icache() attempts to lock the pmap, it's locking on a NULL pointer,
triggering a panic. Fix this by adding a (assumed unnecessary) fallback to
curthread's pmap in moea64_sync_icache().
cxgbe(4): Completely ignore all top level interrupts that are not enabled.
The driver used to log any non-zero cause and when running with a single
line interrupt it would spam the console/logs with reports of interrupts
that are of no interest to anyone.
Provide protection against starvation of the ll/sc loops when accessing userpace.
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely. Otherwise, rogue userspace livelocks the
kernel.
To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison. The primitive no longer retries
the operation if it failed spuriously. Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.
On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.
Reviewed by: andrew (arm64, previous version), markj
Tested by: pho
Reported by: https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D20772
Tie the name limit of a VM to SPECNAMELEN from devfs instead of a
hard-coded value. Don't allocate space for it from the kernel stack.
Account for prefix, suffix, and separator space in the name. This
takes the effective length up to 229 bytes on 13-current, and 37 bytes
on 12-stable. 37 bytes is enough to hold a full GUID string.
Correctly truncate the rule in case when it has several action opcodes.
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.
The RELEASE_CRUNCH ifdefs save about 100 bytes of text space. The
complexity is not worth it as they eliminate error messages.
Left the RELEASE_CRUNCH ifdef to eliminate a lot of stuff in place.
That saves an interesting amount of space and change some behaviors,
so absent a more detailed analysis, maintain the status quo.
We've not used this in years since we retired sysinstall, and it
hasn't compiled in at least a year. A full camcontrol is only 180k, so
making it smaller is not as important as it once was.
Stop defining SMALLER. Since we replaced cpio with libarchive version,
there's no options to make it smaller. Also, the comment about the
FreeBSD installer is obsolete. Remove them both.
When the OpenBSD dhclient was brought in 14 years ago, we stopped
supporting building a reduced sized dhclient, yet retained the options
here. Also, the OpenBSD dhclient doesn't need lint defined, so it can
go too.
Move the new ipf_pcksum6() function from ip_fil_freebsd.c to fil.c.
The reason for this is that ipftest(8), which still works on FreeBSD-11,
fails to link to it, breaking stable/11 builds.
ipftest(8) was broken (segfault) sometime during the FreeBSD-12 cycle.
glebius@ suggested we disable building it until I can get around to
fixing it. Hence this was not caught in -current.
The intention is to fix ipftest(8) as it is used by the netbsd-tests
(imported by ngie@ many moons ago) for regression testing.
Summary:
efi loader does not work with static network parameters. It always uses
BOOTP/DHCP and also uses RARP as a fallback. Problems with DHCP servers can
cause the loader to fail to populate network parameters.
Submitted by: Siddharth Tuli <siddharthtuli_gmail.com>
Reviewed by: imp
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D20811
Address problems in blist_alloc introduced in r349777. The swap block allocator could become corrupted
if a retry to allocate swap space, after a larger allocation attempt failed, allocated a smaller set of free blocks
that ended on a 32- or 64-block boundary.
Add tests to detect this kind of failure-to-extend-at-boundary and prevent the associated accounting screwup.
Ensure that mds_handler always points to a valid method.
Depending on system configuration, version, and architecture,
mds_handler might be dereferenced from doreti before
hw_mds_recalculate_boot() initialized it. Statically assign void
method to cover all cases.
Reported by: "Schuendehuette, Matthias (LDA IT PLM)" <matthias.schuendehuette@siemens.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
When a command is finished running, we must transition it from INQUEUE
to busy state. We were failing to do that, so we hit a panic when the
commands were freed. This only affects mpr, mps already did simmilar
things. Now both the polling and interrupt paths properly set BUSY as
appropriate.
powerpc: Only worry about the lower 32 bits of SP in a 32-bit process
Summary:
Running a 32-bit process on a 64-bit POWER CPU may still use all 64-bits
in calculations, while ignoring the upper 32 bits for addressing
storage. It so happens that some processes end up with r1 (SP) having
bit 31 set in some cases (33-bit address). Writing out to this 33-bit
address obviosly fails. Since the CPU ignores the upper bits, we should
as well.
sendsig() and cpu_fetch_syscall_args() appear to be the only functions
that actually rely on userspace register values for copy in/out, and
cpu_fetch_syscall_args() doesn't seem to be bitten in practice yet.
According to Section D5.10.3 "Maintenance requirements on changing System
register values" of the architecture manual, an isb instruction should be
executed after updating ttbr0_el1 and before invalidating the TLB. The
lack of this instruction in pmap_activate() appears to be the reason why
andrew@ and I have observed an unexpected TLB entry for an invalid PTE on
entry to pmap_enter_quick_locked(). Thus, we should now be able to revert
the workaround committed in r349442.
ipfilter commands, in this case ipf(8), passes its operations and rules
via an ioctl interface. Rules can be added or removed and stats and
counters can be zeroed out. As the ipfilter interprets these
instructions or operations they are stored in an integer called
addrem (add/remove). 1 is add, 2 is remove, and 3 is clear stats and
counters. Much of this is not documented. This commit documents these
operations by replacing simple integers with a self documenting
enum along with a few basic comments.
This device cannot cross a 4GB boundary with DMA. Removing the
boundary in r346386 resulted in low frequency memory corruption on
machines with isci(4) controllers.
This commit updates rack to what is basically being used at NF as
well as sets in some of the groundwork for committing BBR. The
hpts system is updated as well as some other needed utilities
for the entrance of BBR. This is actually part 1 of 3 more
needed commits which will finally complete with BBRv1 being
added as a new tcp stack.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D20834
ian [Wed, 10 Jul 2019 14:34:52 +0000 (14:34 +0000)]
De-pollute arm's sysarch.h.
Instead of including stdint.h for uintptr_t, include sys/_types.h and use
__types for everything that isn't a native C keyword type.
Remove the #include of cdefs.h. It appears after the include of armreg.h
which has a precondition of cdefs.h being included before it, so everyone
including sysarch.h is already including cdefs.h. (When armv5 support
goes away, there will be no need include armreg.h here either.)
Unfortunately, the unprefixed struct member names "addr" and "len" cannot
be changed, because 3rd-party software is relying on them (libcompiler_rt
is one known consumer).
Let linuxulator mprotect mask unsupported bits before calling kern_mprotect.
After r349240 kern_mprotect returns EINVAL for unsupported bits in the prot
argument. Linux rtld uses PROT_GROWSDOWN and PROT_GROWS_UP when marking the
stack executable. Mask these bits like kern_mprotect used to do. For other
unsupported bits EINVAL is returned like Linux does.
dim [Wed, 10 Jul 2019 05:57:37 +0000 (05:57 +0000)]
Apply a workaround to be able to build clang 8.0.0 headers with clang
3.4.1, which is still in the stable/10 branch.
It looks like clang 3.4.1 implements static_asserts by instantiating a
temporary static object, and if those are in an anonymous union, it
results in "error: anonymous union can only contain non-static data
members".
To work around this implementation limitation, move the static_asserts
in question out of the anonymous unions.
This should make building the latest stable/11 from stable/10 possible
again.
Reported by: Mike Tancsa <mike@sentex.net>
MFC after: 3 days