mjg [Fri, 11 May 2018 08:56:39 +0000 (08:56 +0000)]
amd64: align the .data.exclusive_cache_line section to 128
This aligns the section itself compared to other sections, does not change
internal alignment of fields stored inside. This may or may not come later.
The motivation is partially combating adverse effects of the adjacent cache
line prefetcher. Without the annotation part of read_mostly section was on
the line of fire.
mjg [Fri, 11 May 2018 07:04:57 +0000 (07:04 +0000)]
uma: increase alignment to 128 bytes on amd64
Current UMA internals are not suited for efficient operation in
multi-socket environments. In particular there is very common use of
MAXCPU arrays and other fields which are not always properly aligned and
are not local for target threads (apart from the first node of course).
Turns out the existing UMA_ALIGN macro can be used to mostly work around
the problem until the code get fixed. The current setting of 64 bytes
runs into trouble when adjacent cache line prefetcher gets to work.
An example 128-way benchmark doing a lot of malloc/frees has the following
instruction samples:
mmacy [Fri, 11 May 2018 05:00:40 +0000 (05:00 +0000)]
Allow different bridge types to coexist
if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.
mmacy [Fri, 11 May 2018 04:54:12 +0000 (04:54 +0000)]
epoch(9): fix priority handling, make callback lists pcpu, and other fixes
- Lend priority to preempted threads in epoch_wait to handle the case
in which we've had priority lent to us. Previously we borrowed the
priority of the lowest priority preempted thread. (pointed out by mjg@)
- Don't attempt allocate memory per-domain on powerpc, we don't currently
handle empty sockets (as is the case on jhibbits Talos' board).
- Handle deferred callbacks as pcpu lists and poll the lists periodically.
Currently the interval is 1/hz.
- Drop the thread lock when adaptive spinning. Holding the lock starves
other threads and can even lead to lockups.
- Keep a generation count pcpu so that we don't keep spining if a thread
has left and re-entered an epoch section.
- Actually removed the callback from the callback list so that we don't
double free. Sigh ...
des [Fri, 11 May 2018 00:19:49 +0000 (00:19 +0000)]
Slight cleanup of interface event logging.
Make if_printf() use vlog() instead of vprintf(). This means it can no
longer return the number of characters printed, as it used to, but every
single call to if_printf() in the entire kernel ignores the return value
anyway; just return 0 so we don't have to change the prototype.
Consistently use if_printf() throughout sys/net/if.c, instead of a
mixture of if_printf() and log().
In ifa_maintain_loopback_route(), don't needlessly log an error if we
either failed to add a route because it already existed or failed to
remove one because it did not. We still return an error code, though.
des [Fri, 11 May 2018 00:01:43 +0000 (00:01 +0000)]
Reduce <sys/queue.h> pollution.
While <sys/sysctl.h> includes <sys/queue.h> unconditionally, it is only
actually used in code which is conditional on _KERNEL. Make the #include
itself conditional as well, and fix userland code that uses <sys/queue.h>
for other purposes but relied on <sys/sysctl.h> to bring it in.
gjb [Thu, 10 May 2018 21:46:58 +0000 (21:46 +0000)]
Add a special GCE_LICENSE variable to Makefile.gce, which when set,
will include license metadata in the resultant GCE image.
GCE_LICENSE is unset by default, as it primarily pertains to images
produced by the FreeBSD Project, but for downstream FreeBSD consumers,
it can be set in the make(1) environment in the format of:
The "license" is not a license, per se, but required metadata that
is required by the GCE marketplace. For the FreeBSD Project, the
license name is simply 'freebsd', with the description of 'FreeBSD'.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
imp [Thu, 10 May 2018 20:27:12 +0000 (20:27 +0000)]
Revert r333365
Even though we don't use it, it appears something else requires it to
be != 0 to work. This breaks tftp boot in loader.efi, so revert until
that can be sorted out.
emaste [Thu, 10 May 2018 20:10:02 +0000 (20:10 +0000)]
Error out on attempt to link amd64 kernel with old binutils linker
As of r333461 we require ifunc support to link a working amd64 kernel.
The default in-tree bootstrap linker is lld and it has the required
support, as does any modern out-of-tree binutils linker. The in-tree
GNU ld is from binutils 2.17.50 and it does not have ifunc support,
so produce an error rather than a broken kernel.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15378
dumbbell [Thu, 10 May 2018 17:00:33 +0000 (17:00 +0000)]
vt(4): Use default VGA palette
Before this change, the VGA palette was configured to match the shell
palette (e.g. color #1 was red). There was one glitch early in boot when
the vt(4)'s VGA palette was loaded: the loader's logo would switch from
red to blue. Likewise for the "Booting..." message switching from blue
to red. That's because the loader's logo was drawed with the default VGA
palette where a few colors are swapped compared to the shell palette
(e.g. blue <-> red).
This change configures the default VGA palette during initialization and
converts input's colors from shell to VGA palette index.
There should be no visible changes, except the loader's logo which will
keep its original color.
kib [Thu, 10 May 2018 15:01:43 +0000 (15:01 +0000)]
Make fpusave() and fpurestore() on amd64 ifuncs.
From now on, linking amd64 kernel requires either lld or newer ld.bfd.
Reviewed by: jhb (as part of the large patch)
Discussed with: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D13838
ae [Thu, 10 May 2018 12:25:01 +0000 (12:25 +0000)]
Fix the printing of rule comments.
Change uint8_t type of opcode argument to int in the print_opcode()
function. Use negative value to print the rest of opcodes, because
zero value is O_NOP, and it can't be uses for this purpose.
mw [Thu, 10 May 2018 09:37:54 +0000 (09:37 +0000)]
Do not pass header length to the ENA controller
Header length is optional hint for the ENA device. Because It is not
guaranteed that every packet header will be in the first mbuf
segment, it is better to skip passing any information. If the header
length will be indicating invalid value (different than 0), then the
packet will be dropped.
This kind situation can appear, when the UDP packet will be fragmented
by the stack in the ip_fragment() function.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported by: Krishna Yenduri <kyenduri@brkt.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
manu [Thu, 10 May 2018 09:37:50 +0000 (09:37 +0000)]
arm64: Add ALT_BREAK_TO_DEBUGGER to GENERIC
It is useful to enter kdb with an escape sequence.
While here move the USB_DEBUG with the others debug options and define
nooptions USB_DEBUG for GENERIC-NODEBUG
mw [Thu, 10 May 2018 09:32:59 +0000 (09:32 +0000)]
Skip setting the MTU for ENA if it is not changing
On AWS, a network interface can get reinitialized every 30 minutes due
to the MTU being (re)set when a new DHCP lease is obtained. This can
cause packet drop, along with annoying syslog messages.
Skip setting the MTU in the ena driver if the new MTU is the same as the
old MTU. Note this fix is already in the netfront driver.
Testing: Verified ena up/down messages do not appear every 30 min in
/var/log/messages with the fix in place.
Submitted by: Krishna Yenduri <kyenduri@brkt.com>
Reviewed by: Michal Krawczyk <mk@semihalf.com>
mw [Thu, 10 May 2018 09:25:51 +0000 (09:25 +0000)]
Apply fixes in ena-com
* Change ena-com BIT macro to work on unsigned value.
To make the shifting operations safer, they should be working on
unsigned values.
* Fix a mutex not owned ASSERT panic in ENA control path.
A thread calling cv_broadcast()/cv_signal() must hold the mutex used for
cv_wait(). Fix the ENA control path code that has this problem.
Submitted by: Krishna Yenduri <kyenduri@brkt.com>
Reviewed by: Michal Krawczyk <mk@semihalf.com>
Tested by: Michal Krawczyk <mk@semihalf.com>
np [Thu, 10 May 2018 06:33:54 +0000 (06:33 +0000)]
cxgbe(4): Disable write-combined doorbells by default.
This had been the default behavior but was changed accidentally as part
of the recent iw_cxgbe+OFED overhaul. Fix another bug in that change
while here: the global knob affects all the adapters in the system and
should be left alone by per-adapter code.
MFC after: 3 days
Sponsored by: Chelsio Communications
jhibbits [Thu, 10 May 2018 03:59:48 +0000 (03:59 +0000)]
Fix PPC symbol resolution
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:
1- memory corruption at chrp_attach() - this caused the inital
part of the symbol table to become zeroed, which would cause
the kernel linker to fail to parse it.
(this was probably zeroing out other memory parts as well)
2- DDB symbol resolution wasn't working because symtab contained
not relocated addresses but it was given relocated offsets.
Although relocating the symbol table fixed this, it broke the
linker, that already handled this case.
Thus, the fix for this consists in adding a new DDB macro:
DB_STOFFS(offs) that converts a (potentially) relocated offset
into one that can be compared with symbol table values.
araujo [Thu, 10 May 2018 03:50:20 +0000 (03:50 +0000)]
Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.
This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
Removes ctl_be_arg structure and associated logic and replaces it with
nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
ctld(8).
New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
Those device nodes respond only to CTL_IO ioctl.
New command-line options for ctladm:
# creates new ioctl frontend port with using free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4
New syntax for ctl.conf:
target ... {
port ioctl/<pp>
...
}
target ... {
port ioctl/<pp>/<vp>
...
Note: Most of this work was made by jceel@, thank you.
Submitted by: jceel
Reworked by: myself
Reviewed by: mav (earlier versions and recently during the rework)
Obtained from: FreeNAS and TrueOS
Relnotes: Yes
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D9299
imp [Thu, 10 May 2018 02:31:48 +0000 (02:31 +0000)]
Simplify things a little
Rather than include a copy for memmove to call bcopy to call memcpy
(which handles overlapping copies), make memmove a strong reference to
memcpy to save the two calls.
oshogbo [Wed, 9 May 2018 20:53:38 +0000 (20:53 +0000)]
Introduce the 'n' flag for the geli attach command.
If the 'n' flag is provided the provided key number will be used to
decrypt device. This can be used combined with dryrun to verify if the key
is set correctly. This can be also used to determine which key slot we want to
change on already attached device.
oshogbo [Wed, 9 May 2018 20:51:16 +0000 (20:51 +0000)]
Change option dry-run from 'n' to 'C' in geli attach command.
'n' is used in other commands to define the key index.
We should be consistent with that.
'C' option is used by patch(1) to perform dryrun so lets use that.
mmacy [Wed, 9 May 2018 17:48:52 +0000 (17:48 +0000)]
Remove bogus panic
r333345 added a panic to the default case statement on the incorrect
premise that it should "never happen" when in fact it is simply a
different adapter version.
imp [Wed, 9 May 2018 14:11:35 +0000 (14:11 +0000)]
Minor style nits
Use full copyright year.
Remove 'All Rights Reserved' from new file (rights holder OK'd)
Minor #ifdef motion and #endif tagging
Remove __FBSDID macro from comments
kib [Wed, 9 May 2018 12:09:08 +0000 (12:09 +0000)]
Remove PG_U from the rest of the kernel pmap ptes.
Supposedly, they PG_U bits there were set to easier making some kernel
page accessible to userspace in-place. Since it was not used for the
whole existence of the amd64 pmap.c and current design of the shared
pages prefers double-mapping over the in-place access, remove PG_U
both from the direct map and KVA slots.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Wed, 9 May 2018 12:03:40 +0000 (12:03 +0000)]
Remove PG_U from the recursive pte for kernel pmap' PML4 page.
This PML4 page is never used for the userspace process, so there is no
security implications. But the configuration trips SMAP check, which
should be corrected.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Bring in some last changes in NAT64 implementation:
o Modify ipfw(8) to be able set any prefix6 not just Well-Known,
and also show configured prefix6;
o relocate some definitions and macros into proper place;
o convert nat64_debug and nat64_allow_private variables to be
VNET-compatible;
o add struct nat64_config that keeps generic configuration needed
to NAT64 code;
o add nat64_check_prefix6() function to check validness of specified
by user IPv6 prefix according to RFC6052;
o use nat64_check_private_ip4() and nat64_embed_ip4() functions
instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows
to use any configured IPv6 prefixes that are allowed by RFC6052;
o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is
Well-Known IPv6 prefix. It is used to reduce overhead to check this;
o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config
structure. And respectivelly modify the rest of code;
o remove now unused ro argument from nat64_output() function;
o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions;
o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile
as example.
emaste [Wed, 9 May 2018 11:17:01 +0000 (11:17 +0000)]
lld: Omit PT_NOTE for SHT_NOTE without SHF_ALLOC
A non-alloc note section should not have a PT_NOTE program header.
Found while linking ghc (Haskell compiler) with lld on FreeBSD. Haskell
emits a .debug-ghc-link-info note section (as the name suggests, it
contains link info) as a SHT_NOTE section without SHF_ALLOC set.
For this case ld.bfd does not emit a PT_NOTE segment for
.debug-ghc-link-info. lld previously emitted a PT_NOTE with p_vaddr = 0
and FreeBSD's rtld segfaulted when trying to parse a note at address 0.
kib [Wed, 9 May 2018 10:28:24 +0000 (10:28 +0000)]
Created static libc PIC/no-SSP library to be used by rtld.
Rtld is not compatible with SSP, and since we link libc_pic.a to rtld
to have the basic support like memory and string copy functions, we
have to both carefully limit libc use, and to provide the ssp support
shims. This change makes the libc use in rtld more straighforward but
still limited, and allows to remove the shims, to be done in the next
commit.
eadler [Wed, 9 May 2018 07:46:57 +0000 (07:46 +0000)]
enigma(1) Remove reference to PGP; modernize a bit
- the port was removed 2017-06-07 in r442847
- gnupg1 is the older version of gpg with legacy PGP support
- remove unused macro
- remove now-false statement about export restrictions
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool. Hash and
normal TCAM filters can be used together. The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters. Any T5/T6 card with memory can support at
least half a million hash filters. The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.
The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file. The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto). It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).
Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.
Hash filters support all actions supported by TCAM filters. A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).
imp [Wed, 9 May 2018 02:02:49 +0000 (02:02 +0000)]
Remove 'All Rights Reserved' from the collection copyright and templates.
The original Berkeley Software Distributions were made in the 1980's
and 1990's. At that time, the Buenos Ares Convention of 1910 was in
force in most of the countries in the Americas. It required an
affirmative statement of rights reservation, typically using 'All
Rights Reserved.' The Regents included this phrase in their copyright
notices to invoke this treaty to ensure maximal copyright protection.
In the 1990's, Latin America coutries ratifeid the Berne Convention on
copyrights which prohibited them from requiring an affirmative
statement to reserve the rights. When Nicaragua ratified in 2000, the
Buenos Ares Convention of 1910 was effectively repealed. This made all
the 'All Rights Reserved' phrases obsolete and legal deadweight most
of the time, and certainly in the cases removed here.
Since it's no longer required, and is in fact meaningless, core has
decided to dropped it from the project's collection copyright and
sample templates. It encourages other rights holders to do the same
after consultation with their legal department.
More see https://en.wikipedia.org/wiki/Buenos_Aires_Convention for
more information.
mmacy [Wed, 9 May 2018 00:00:47 +0000 (00:00 +0000)]
Reduce overhead of ktrace checks in the common case.
KTRPOINT() checks both if we are tracing _and_ if we are recursing within
ktrace. The second condition is only ever executed if ktrace is actually
enabled. This change moves the check out of the hot path in to the functions
themselves.
des [Tue, 8 May 2018 23:13:11 +0000 (23:13 +0000)]
Upgrade to OpenSSH 7.6p1. This will be followed shortly by 7.7p1.
This completely removes client-side support for the SSH 1 protocol,
which was already disabled in 12 but is still enabled in 11. For that
reason, we will not be able to merge 7.6p1 or newer back to 11.
imp [Tue, 8 May 2018 20:02:44 +0000 (20:02 +0000)]
Remove ignored command line options
The --device and --part command line options were planned for Linux
compatibility mode. However, that mode will never happen, so remove
them as last vestiges of a false start.
imp [Tue, 8 May 2018 19:43:57 +0000 (19:43 +0000)]
Improve printing the boot variables.
Print the boot variables in the order in the BootOrder variable, if it
exists, and then in verbose mode print any unreferneced BootXXXX
variables. If BootOrder isn't set, fall back to printing all the
variables.
tuexen [Tue, 8 May 2018 18:48:51 +0000 (18:48 +0000)]
When reporting ERROR or ABORT chunks, don't use more data
that is guaranteed to be contigous.
Thanks to Felix Weinrank for finding and reporting this bug
by fuzzing the usrsctp stack.
shurd [Tue, 8 May 2018 17:15:10 +0000 (17:15 +0000)]
iflib: print message when iflib_tx_structures_setup fails
Print a message when iflib_tx_structures_setup fails, like we do for
iflib_rx_structures_setup.
Now that we always print a message from within
iflib_qset_structures_setup when it fails, stop printing one in
iflib_device_register() at the call site.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D15300
kib [Tue, 8 May 2018 17:00:34 +0000 (17:00 +0000)]
Prepare DB# handler for deferred trigger of watchpoints.
Since pop %ss/mov %ss instructions defer all interrupts and exceptions
for the next instruction, it is possible that the userspace watchpoint
trap executes on the first instruction of the kernel entry for
syscall/bpt.
In this case, DB# should be treated similarly to NMI: on amd64 we must
always load GSBASE even if the trap comes from kernel mode, and load
the kernel page table root into %cr3. Moreover, the trap must
use the dedicated stack, because we are still on the user stack when
trapped on syscall entry.
For i386, we must reload %cr3. The syscall instruction is not configured,
so there is no issue with executing on user stack when trapping.
Due to some CPU erratas it is not always possible to detect that the
userspace watchpoint triggered by inspecting %dr6. In trap(), compare the
trap %rip with the known unsafe entry points and if matched pretend that
the watchpoint did not fire at all.
Thank you to the MSRC Incident Response Team, and in particular Greg
Lenti and Nate Warfield, for coordinating the response to this issue
across multiple vendors.
Thanks to Computer Recycling at The Working Center of Kitchener for
making hardware available to allow us to test the patch on additional
CPU families.
Reviewed by: jhb
Discussed with: Matthew Dillon
Tested by: emaste
Sponsored by: The FreeBSD Foundation
Security: CVE-2018-8897
Security: FreeBSD-SA-18:06.debugreg