Ruslan Bukin [Sun, 12 May 2019 16:17:05 +0000 (16:17 +0000)]
Add support for HiFive Unleashed -- the board with a multi-core RISC-V SoC
from SiFive, Inc.
The first core on this SoC (hart 0) is a 64-bit microcontroller.
o Pick a hart to run boot process using hart lottery.
This allows to exclude hart 0 from running the boot process.
(BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early on boot.
Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
allows to find out the correct destination for IPIs and remote sfence
calls.
Mateusz Guzik [Sun, 12 May 2019 06:32:46 +0000 (06:32 +0000)]
random(4): depessimize arc4random
- __predict_false reseeding on entry as it is almost never true.
- don't blindly atomic_cmpset as on x86 it ends up dirtying the cacheline.
it almost ever succeeds per above
- fetch the timestamp prior to getting the cpu number
Reviewed by: cem
Approved by: secteam (delphij)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20242
Rick Macklem [Sat, 11 May 2019 22:41:58 +0000 (22:41 +0000)]
Factor code into two new functions in preparation for a future commit.
Factor code into two functions.
read_exportfile() a functon which reads the exports file(s) and calls
get_exportlist_one() to process each of them.
delete_export() a function which deletes the exports in the kernel for a file
system.
The contents of these functions is just the same code as was used to do the
operations, moved into separate functions. As such, there is no semantic change.
This is being done in preparation for a future commit that will add an
option to do incremental changes of kernel exports upon receiving SIGHUP.
Doug Moore [Sat, 11 May 2019 16:15:13 +0000 (16:15 +0000)]
A new parameter to blist_alloc specifies an upper bound on the size of
the allocation request, so that the blocks allocated are from the next
set of free blocks big enough to satisfy the minimum requirements of
the request, and the number of blocks allocated are as many as
possible, up to the specified maximum. The implementation of
swp_pager_getswapspace uses this parameter to ask for a number of
blocks between the new halved request size and the previous failed
request size. Thus a request for 32 blocks may fail, but instead of
getting only 16 blocks instead, the caller asks for 16 to 31 next, and
might get 19 or 27, which is closer to what they originally wanted.
I expect this to lead to bigger block allocations and less block
fragmentation, at least in some cases.
Justin Hibbits [Sat, 11 May 2019 15:17:42 +0000 (15:17 +0000)]
revert r346588 for now
The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, making it SIGILL on CPUs older than the POWER6, such as the PPC970 in
the PowerMac G5. Revert this until we get clang+lld, or retire the in-tree
binutils in favor of newer binutils with IFUNC support, whichever comes
first.
Emmanuel Vadot [Sat, 11 May 2019 15:03:51 +0000 (15:03 +0000)]
twsi: Calculate the clock param based on the bus frequency
Instead of precalculating the different speed, respect the bus frequency
and calculate the clock register parameter based on it.
If the platform didn't register the core clk, fallback on the precomputed
values (This is likely do be the case on Marvell boards).
As per https://datacenter.iers.org/data/latestVersion/16_BULLETIN_C16.txt:
INTERNATIONAL EARTH ROTATION AND REFERENCE SYSTEMS SERVICE (IERS)
SERVICE INTERNATIONAL DE LA ROTATION TERRESTRE ET DES SYSTEMES DE REFERENCE
SERVICE DE LA ROTATION TERRESTRE DE L'IERS
OBSERVATOIRE DE PARIS
61, Av. de l'Observatoire 75014 PARIS (France)
Tel. : +33 1 40 51 23 35
e-mail : services.iers@obspm.fr
http://hpiers.obspm.fr/eop-pc
Paris, 07 January 2019
Bulletin C 57
To authorities responsible
for the measurement and
distribution of time
INFORMATION ON UTC - TAI
NO leap second will be introduced at the end of June 2019.
The difference between Coordinated Universal Time UTC and the
International Atomic Time TAI is :
from 2017 January 1, 0h UTC, until further notice : UTC-TAI = -37 s
Leap seconds can be introduced in UTC at the end of the months of December
or June, depending on the evolution of UT1-TAI. Bulletin C is mailed every
six months, either to announce a time step in UTC, or to confirm that there
will be no time step at the next possible date.
Christian BIZOUARD
Director
Earth Orientation Center of IERS
Observatoire de Paris, France
Requested by: rgrimes
Obtained from: ftp://tycho.usno.navy.mil/pub/ntp/leap-seconds.3757622400
MFC after: 3 days
Doug Moore [Sat, 11 May 2019 10:16:43 +0000 (10:16 +0000)]
Callers of swp_pager_getswapspace get either as many blocks as they
requested, or none, and in the latter case it is up to them to pick a
smaller request to make - which they always do by halving the failed
request. This change to swp_pager_getswapspace leaves the task of
downsizing the request to the function and not its caller. It still
does so by halving the original request.
Doug Moore [Sat, 11 May 2019 09:09:10 +0000 (09:09 +0000)]
When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.
Kyle Evans [Sat, 11 May 2019 04:18:06 +0000 (04:18 +0000)]
tuntap: Improve style
No functional change.
tun_flags of the tuntap_driver was renamed to ident_flags to reflect the
fact that it's a subset of the tun_flags that identifies a tuntap device.
This maps more easily (visually) to the TUN_DRIVER_IDENT_MASK that masks off
the bits of tun_flags that are applicable to tuntap driver ident. This is a
purely cosmetic change.
Rick Macklem [Fri, 10 May 2019 23:52:17 +0000 (23:52 +0000)]
Factor out some exportlist list operations into separate functions.
This patch moves the code that removes and frees all exportlist elements
out into a separate function called free_exports().
It does the same for the insertion of a new exportlist entry into a list.
It also adds a second argument to ex_search() for the list to use.
None of these changes have any semantic effect. They are being done to
prepare the code for future patches that convert the single linked list
for the exportlist to a hash table of lists and a patch that will do
incremental changes of exports in the kernel.
And it fixes the argument for SLIST_HEAD_INITIALIZER() to be a pointer,
which doesn't really matter, since SLIST_HEAD_INITIALIZER() doesn't use
the argument.
Conrad Meyer [Fri, 10 May 2019 23:12:59 +0000 (23:12 +0000)]
netdump: Ref the interface we're attached to
Serialize netdump configuration / deconfiguration, and discard our
configuration when the affiliated interface goes away by monitoring
ifnet_departure_event.
Reviewed by: markj, with input from vangyzen@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20206
Doug Moore [Fri, 10 May 2019 22:49:01 +0000 (22:49 +0000)]
When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.
Conrad Meyer [Fri, 10 May 2019 21:55:11 +0000 (21:55 +0000)]
netdump: Don't store sensitive key data we don't need
Prior to this revision, struct diocskerneldump_arg (and struct netdump_conf
with embedded diocskerneldump_arg before r347192), were copied in their
entirety to the global 'nd_conf' variable. Also prior to this revision,
de-configuring netdump would *not* remove the the key material from global
nd_conf.
As part of Encrypted Kernel Crash Dumps (EKCD), which was developed
contemporaneously with netdump but happened to land first, the
diocskerneldump_arg structure will contain sensitive key material
(kda_key[]) when encrypted dumps are configured.
Netdump doesn't have any use for the key data -- encryption is handled in
the core dumper code -- so in this revision, we no longer store it.
Unfortunately, I think this leak dates to the initial import of netdump in
r333283; so it's present in FreeBSD 12.0.
Fortunately, the impact *seems* relatively minor. Any new *netdump*
configuration would overwrite the key material; for active encrypted netdump
configurations, the key data stored was just a duplicate of the key material
already in the core dumper code; and no user interface (other than
/dev/kmem) actually exposed the leaked material to userspace.
John Baldwin [Fri, 10 May 2019 20:15:40 +0000 (20:15 +0000)]
Apply r280991 to ip6_fragment.
This uses m_dup_pkthdr() to copy all of the metadata about a packet to
each of its fragments including VLAN tags, mbuf tags, etc. instead of
hand-copying a few fields.
Justin Hibbits [Fri, 10 May 2019 19:36:14 +0000 (19:36 +0000)]
powerpc: Initialize the Hardware Interrupt Offset Register (HIOR) earlier for ppc970
Since we now have a much larger KVA on powerpc64, it's possible to get SLB
traps earlier in boot, possibly even before the HIOR is properly configured
for us. Move the HIOR setup to immediately after reset, so that we use our
exception handlers instead of Open Firmware's.
PR: 233863
Submitted by: Mark Millard (partial)
Reported by: Mark Millard
MFC after: 2 weeks
Doug Moore [Fri, 10 May 2019 18:25:06 +0000 (18:25 +0000)]
blist_next_leaf_alloc walks over all the meta-nodes between one leaf
and the next one, and if blocks are allocated from the next leaf, it
walks back toward where it started, as long as there are interleaving
meta-nodes to be updated on account of the last free blocks under
those meta-nodes being allocated. Only if the walk goes all the way
back to the starting point must we calculate the position of the
meta-node that is the least-comment parent of one leaf and the next,
and update a bit in that meta-node to indicate the allocation of its
last free block.
There's no need to start calculating the position of that least-common
parent until the walk back reaches the original starting point, and
there's no need for a calculation that updates 'radius' to tell us
when we've walked back to the beginning, since comparing scan to next
suffices for that.
Emmanuel Vadot [Fri, 10 May 2019 16:45:17 +0000 (16:45 +0000)]
arm64: rockchip: Don't always put PLL to normal mode
We used to put every PLL in normal mode (meaning that the output would
be the result of the PLL configuration) instead of slow mode (the output
is equal to the external oscillator frequency, 24-26Mhz) but this doesn't
work for most of the PLLs as when we put them into normal mode the registers
configuring the output frequency haven't been set.
Add a normal_mode member in clk_pll_def/clk_pll_sc struct and if it's true
we then set the PLL to normal mode.
For now only set it to the LPLL and BPLL (Little cluster PLL and Big cluster
PLL respectively).
Mark Johnston [Fri, 10 May 2019 16:43:47 +0000 (16:43 +0000)]
Atomically update the global gMsgId in libnetgraph.
Otherwise concurrently running threads may inadvertently use the same
token for different messages.
Preserve the behaviour of disallowing negative message tokens, but allow
a message token value of zero since this simplifies the code a bit and
tokens are documented to be non-negative.
PR: 234442
Reported and tested by: eugen
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Andrew Gallatin [Fri, 10 May 2019 13:41:19 +0000 (13:41 +0000)]
Bind TCP HPTS (pacer) threads to NUMA domains
Bind the TCP pacer threads to NUMA domains and build per-domain
pacer-thread lookup tables. These tables allow us to use the
inpcb's NUMA domain information to match an inpcb with a pacer
thread on the same domain.
The motivation for this is to keep the TCP connection local to a
NUMA domain as much as possible.
Thanks to jhb for pre-reviewing an earlier version of the patch.
Kyle Evans [Fri, 10 May 2019 13:18:22 +0000 (13:18 +0000)]
ifconfig(8): Add kld mappings for ipsec/enc
Additionally, providing mappings makes the comparison for already loaded
modules a little more strict. This should have been done at initial
introduction, but there was no real reason- however, it proves necessary for
enc which has a standard enc -> if_enc mapping but there also exists an
'enc' module that's actually CAM. The mapping lets us unambiguously
determine the correct module.
Ian Lepore [Fri, 10 May 2019 02:30:16 +0000 (02:30 +0000)]
Allow dcons(4) to be unloaded when loaded as a module.
When the module is unloaded, the tty devices are destroyed. That requires
implementing the tsw_free callback to avoid a panic. This driver requires
no particular cleanup to be done from the callback, but the module itself
must remain in memory until the deferred tsw_free callbacks are invoked.
These changes implement that by incrementing a reference count variable in
the detach routine, and decrementing it in the tsw_free callback. The
MOD_UNLOAD event handler doesn't return until the count drops to zero.
Eric Joyner [Fri, 10 May 2019 00:41:42 +0000 (00:41 +0000)]
iflib: use default ntxd and nrxd when user value is not power of 2
From Jake:
A user may set a sysctl to override the default number of Tx or Rx
descriptors. However, certain calculations in the iflib core expect the
number of descriptors to be a power of 2.
Update _iflib_assert to verify that all of the shared context parameters
for the number of descriptors are powers of 2.
Modify iflib_reset_qvalues to check that the provided isc_nrxd value is
a power of 2. If it's not, print a warning message and then use the
default value.
An alternative might be to try rounding the number down instead.
However, this creates problems in case the rounded down value is below
the minimum value that the driver would support.
Enji Cooper [Fri, 10 May 2019 00:03:32 +0000 (00:03 +0000)]
Refactor tests/sys/opencrypto/runtests
* Convert from plain to TAP for slightly improved introspection when skipping
the tests due to requirements not being met.
* Test for the net/py-dpkt (origin) package being required when running the
tests, instead of relying on a copy of the dpkt.py module from 2014. This
enables the tests to work with py3. Subsequently, remove
`tests/sys/opencrypto/dpkt.py(c)?` via `make delete-old`.
* Parameterize out `python2` as `$PYTHON`.
Andrew Gallatin [Thu, 9 May 2019 22:38:15 +0000 (22:38 +0000)]
Remove IPSEC from GENERIC due to performance issues
Having IPSEC compiled into the kernel imposes a non-trivial
performance penalty on multi-threaded workloads due to IPSEC
refcounting. In my benchmarks of multi-threaded UDP
transmit (connected sockets), I've seen a roughly 20% performance
penalty when the IPSEC option is included in the kernel (16.8Mpps
vs 13.8Mpps with 32 senders on a 14 core / 28 HTT Xeon
2697v3)). This is largely due to key_addref() incrementing and
decrementing an atomic reference count on the default
policy. This cause all CPUs to stall on the same cacheline, as it
bounces between different CPUs.
Given that relatively few users use ipsec, and that it can be
loaded as a module, it seems reasonable to ask those users to
load the ipsec module so as to avoid imposing this penalty on the
GENERIC kernel. Its my hope that this will make FreeBSD look
better in "out of the box" benchmark comparisons with other
operating systems.
Many thanks to ae for fixing auto-loading of ipsec.ko when
ifconfig tries to configure ipsec, and to cy for volunteering
to ensure the the racoon ports will load the ipsec.ko module
- Remove Tn macros
- Refernce sysctl(8) instead of sysctl(1)
- Start new sentences on new lines
- Capitalize NFS where needed
- Use Fx for FreeBSD
- Remove a list block (Bl) that was added to the manual page
by accident in r335174
Kyle Evans [Thu, 9 May 2019 12:58:33 +0000 (12:58 +0000)]
ifconfig(8): Partial revert of r347241
r347241 introduced an ifname <-> kld mapping table, mostly so tun/tap/vmnet
can autoload the correct module on use. It also inadvertently made bogus
some previously valid uses of sizeof().
Revert back to ifkind on the stack for simplicity sake. This reduces the
diff from the previous version of ifmaybeload for easiser auditing.
Toomas Soome [Thu, 9 May 2019 11:04:10 +0000 (11:04 +0000)]
loader: ptable_print() needs two tabs sometimes
Since the partition/slice names do vary in length, check the length
of the fixed part of the line against 3 * 8, if the lenth is less than
3 tab stops, print out extra tab.
In mld_v2_cancel_link_timers() check number of references and disconnect
inm before releasing the last reference. This fixes possible panics and
assertion.
Gleb Smirnoff [Wed, 8 May 2019 23:39:24 +0000 (23:39 +0000)]
Existense of PCB route caching doesn't allow us to use new fast route
lookup KPI in ip_output() like it is already used in ip_forward().
However, when there is no PCB provided we can use fast KPI, gaining
performance advantage.
Typical case when ip_output() is called without a PCB pointer is a
sendto(2) on a not connected UDP socket. In practice DNS servers do
this.
Mateusz Guzik [Wed, 8 May 2019 16:30:38 +0000 (16:30 +0000)]
Reduce umtx-related work on exec and exit
- there is no need to take the process lock to iterate the thread
list after single-threading is enforced
- typically there are no mutexes to clean up (testable without taking
the global umtx lock)
- typically there is no need to adjust the priority (testable without
taking thread lock)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20160
Justin Hibbits [Wed, 8 May 2019 16:15:28 +0000 (16:15 +0000)]
powerpc/booke: Rewrite pmap_sync_icache() a bit
* Make mmu_booke_sync_icache() use the DMAP on 64-bit prcoesses, no need to
map the page into the user's address space. This removes the
pvh_global_lock from the equation on 64-bit.
* Don't map the page with user-readability on 32-bit. I don't know what the
chance of a given user process being able to access the NULL page when
another process's page is added there, but it doesn't seem like a good
idea to map it to NULL with user read permissions.
* Only sync as much as we need to. There are only two significant places
where pmap_sync_icache is used: proc_rwmem(), and the SIGILL second-chance
for powerpc. The SIGILL second chance is likely the most common, and only
syncs 4 bytes, so avoid the other 127 loop iterations (4096 / 32 byte
cacheline) in __syncicache().
Justin Hibbits [Wed, 8 May 2019 16:05:18 +0000 (16:05 +0000)]
powerpc/booke: Do as much work outside of TLB locks as possible
Reduce the surface area of the TLB locks. Unfortunately the same trick for
serializing the tlbie instruction on OEA64 cannot be used here to reduce the
scope of the tlbivax mutex to the tlbsync only, as the mutex also serializes
the TLB miss lock as a side effect, so contention on this lock may not be
reducible any further.
Ruslan Bukin [Wed, 8 May 2019 15:22:27 +0000 (15:22 +0000)]
o Implement a bounce buffer based on device reserved memory.
Grab device reserved physical memory regions from FDT using standard
"memory-region" property and use vmem(9) to allocate buffers from it.
The same vmem could be used by DMA engine drivers to allocate memory for
DMA descriptors.
This is required for platforms that provide uncached memory region
reserved exclusively for DMA operations.
o Change sleepable sx(9) lock type to non-sleepable mutex(9) since
network drivers usually hold mutex during DMA operations. So we don't
take sleepable lock after non-sleepable.
Tested on U.S. Government Furnished Equipment (GFE) 64-bit RISC-V cores.
Conrad Meyer [Wed, 8 May 2019 14:54:32 +0000 (14:54 +0000)]
random(4): Don't complain noisily when an entropy source is slow
Mjg@ reports that RDSEED (r347239) causes a lot of logspam from this printf,
and I don't feel that it is especially useful (even ratelimited). There are
many other quality/quantity checks we're not performing on entropy sources;
lack of high frequency availability does not disqualify a good entropy
source.
There is some discussion in the linked Differential about what logging might
be appropriate and/or polling policy for slower TRNG sources. Please feel
free to chime in if you have opinions.