Ian Lepore [Sat, 10 May 2014 20:03:03 +0000 (20:03 +0000)]
When mapping device memory, use PTE_DEVICE rather than PTE_NOCACHE.
On armv4 these are defined as synonyms right now, but it's a bit ambiguous
what NOCACHE means (is buffering/write-combining also enabled or not?); this
is a first step towards replacing PTE_NOCACHE with a less ambiguous name.
Invalidate the cache entry for a named POSIX semaphore when it is opened
and the actual file storing the semaphore object is different from the file
created on the first open. Store the file st_dev and st_ino members
of the struct stat in the semaphore structure on open, and compare
them with the attributes of the opened file to detect unlink and
re-creation.
This fixes an issue of sem_unlink(3) failing to flush the named entry
in the semaphore list for the current or remote process, which made
sem_unlink(3) operate incorrectly if the unlinked semaphore was
still open.
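The st_dev/st_ino comparison described above can be sketched as follows. This is only an illustration of the idea, not the libc code: the structure `sem_cache_entry` and the helpers `sem_cache_record()` and `sem_cache_stale()` are hypothetical names.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical cache entry; the real libc semaphore structure differs. */
struct sem_cache_entry {
	dev_t dev;	/* st_dev recorded when the semaphore was first opened */
	ino_t ino;	/* st_ino recorded when the semaphore was first opened */
};

/* Remember the identity of the file that backs the semaphore. */
static void
sem_cache_record(struct sem_cache_entry *e, const struct stat *sb)
{
	e->dev = sb->st_dev;
	e->ino = sb->st_ino;
}

/*
 * Return true if the file now found under the semaphore name is not the
 * one recorded earlier, i.e. the name was unlinked and re-created.
 */
static bool
sem_cache_stale(const struct sem_cache_entry *e, const struct stat *sb)
{
	return (e->dev != sb->st_dev || e->ino != sb->st_ino);
}
```

A caller would fstat(2) the freshly opened file and drop the cached entry when sem_cache_stale() returns true.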
Jilles Tjoelker [Sat, 10 May 2014 17:42:21 +0000 (17:42 +0000)]
sh: Don't discard getopts state on unknown option or missing argument.
When getopts finds an invalid option or a missing option-argument, it should
not reset its state and should set OPTIND as normal. This is an old ash bug
that was fixed long ago in dash. Our behaviour now matches most other
shells.
For the upgrade case in vm_fault_copy_entry(), when the entry does not
need COW and is writeable (i.e. becoming writeable due to the
mprotect(2) operation), do not create a new backing object for the
entry. The caller of the function is vm_map_protect(); the call is
made to ensure that the wired entry has all pages resident and wired in
the top-level object and to enable the write. We might need to copy a
read-only page from some backing object into the top object or remap
the page with write access allowed.
This fixes the issue with mishandling of the swap accounting when a
read-only wired mapping is upgraded to write-enabled after fork. The
previous code path did not account for the new object, but its creation
is redundant anyway, and the change provides an optimization for this
uncommon situation.
Reported by: markj
Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Warner Losh [Sat, 10 May 2014 16:39:15 +0000 (16:39 +0000)]
bitrotted compat cruft removal:
o KMODDEPS warning is 15 years stale. Remove it.
o MK_CTF will always be defined now, so no need to test to see if it
is defined.
o no need to define MK_FORMAT_EXTENTIONS if undefined anymore.
Warner Losh [Sat, 10 May 2014 16:39:08 +0000 (16:39 +0000)]
grep -L returns non-zero status if none of the files had the pattern
in them. This is often the case, so just ignore the return
code. Actual errors that are found will also be detected downstream in
the rare cases where the return code is 2 instead of 1.
Warner Losh [Sat, 10 May 2014 16:39:00 +0000 (16:39 +0000)]
Sprinkle a few more .WAITs into the mix after csu, libc, msun and the
early built libraries. This should be sufficient for most cases and
has eliminated the issues I've seen with high -j builds. Races likely
still remain, but this knocks the problem down a notch.
Warner Losh [Sat, 10 May 2014 16:38:45 +0000 (16:38 +0000)]
We haven't done anything with _UPGRADING in ~forever (was present, but
not needed, in FreeBSD 6.x, and has been absent in newer versions).
This was needed to upgrade from 3.x -> 4.x, once upon a time.
Warner Losh [Sat, 10 May 2014 16:38:09 +0000 (16:38 +0000)]
Simplify clang ifdefs in the kernel a bit. Introduce
CFLAGS.${COMPILER_TYPE} to mirror userland. Be explicit about which
compiler needs something ("not clang" isn't necessarily gcc in the
future).
Warner Losh [Sat, 10 May 2014 16:38:03 +0000 (16:38 +0000)]
Eliminate EARLY_BUILD flag. It is redundant and means MK_CLANG_FULL=no
and MK_LLDB=no, so set those explicitly (now that we can do
that). Simplify tests for these variables as well, since we know they
will always be defined regardless of the phase of the build.
Warner Losh [Sat, 10 May 2014 16:37:53 +0000 (16:37 +0000)]
Migrate NO_WARN to MK_WARN. Support legacy NO_WARN usage. Remove a
check for EARLY_BUILD because it isn't necessary (MK_WARN=no will
always be defined for that).
Warner Losh [Sat, 10 May 2014 16:37:44 +0000 (16:37 +0000)]
Support, to the extent we generate proper command lines, compiling
with clang 3.3. Useful for test building -current on a -stable system
in individual directories. Potentially useful if we ever want to
support, say, gcc 4.8 or 4.9's new warnings when building with an
external toolchain (but such support not yet committed). Document
the bsd.compiler.mk interface.
Warner Losh [Sat, 10 May 2014 16:37:39 +0000 (16:37 +0000)]
Optionally allow building the historical FreeBSD make program and
install it as fmake. This defaults to no. This should be viewed as the
first step towards eventual migration of this historic code to ports
and removal from the tree.
When printing the map with the ddb 'show procvm' command, do not dump
page queues for the backing objects. The queues are huge and clutter
the display, when mostly the map entries and their backing storage are
of interest.
The page queues can be seen with ddb 'show object' command.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Alexander Motin [Sat, 10 May 2014 15:21:37 +0000 (15:21 +0000)]
Comment out some pointless device open/close around reading device IDs.
The FreeBSD ZFS port, unlike OpenSolaris, does not use device IDs and does
not implement the respective devid_*() functions. It is pointless to open
devices just to close them again immediately.
Optimise host channel disabling:
- For non-periodic traffic we only need to wait two SOFs before
disabling the channel.
- Make sure we release the TX FIFO tracking level after the host
channel is disabled.
- Make sure the host channel state gets reset/disabled initially.
- Two minor code style changes.
Adrian Chadd [Sat, 10 May 2014 00:53:36 +0000 (00:53 +0000)]
Add in support to optionally pin the swi threads.
Under enough load, the swis can actually be preempted and migrated
to other currently free cores. When doing RSS experiments, this led
to the per-CPU TCP timers no longer lining up with the RX CPU said
flows were ending up on, leading to increased lock contention.
Since there was a little pushback on flipping them on by default,
I've left the default at "don't pin."
The other less obvious problem here is that the default swi
is also the same as the destination swi for CPU #0. So if one
pins the swi on CPU #0, there's no default floating swi.
A nice future project would be to create a separate swi for
the "default" floating swi, as well as per-CPU swis that are
(optionally) pinned.
Tested:
* parallel TCP tests (2 x 1g unfortunately for now);
CPU: Intel(R) Xeon(R) CPU E5-2650
Note:
This is based on some initial investigation into RSS/TCP stack lock
contention on FreeBSD-HEAD whilst at Netflix in January 2014.
Warner Losh [Fri, 9 May 2014 21:11:27 +0000 (21:11 +0000)]
Introduce kern.opts.mk to hold all the options for kernel module
builds. Include this in the right places. Make src.opts.mk optional so
that modules can be built outside of the tree in the ports system.
Ian Lepore [Fri, 9 May 2014 19:14:34 +0000 (19:14 +0000)]
Call idcache_inv_all from the AP core entry code before turning on the MMU.
Also, enable instruction and branch caches, which should be safe now that
they're properly initialized/invalidated first.
Multiple DWC OTG host mode related fixes and improvements:
- Rework how we allocate and free USB host channels, so that we only
allocate a channel if there is a real packet going out on the USB
cable.
- Use BULK type for control data and status, due to instabilities in
the HW it appears.
- Split FIFO TX levels into one for the periodic FIFO and one for the
non-periodic FIFO.
- Use correct HFNUM mask when scheduling host transactions. The HFNUM
register does not count the full 16-bit range.
- Correct START/COMPLETION slot for TT transactions. For INTERRUPT and
ISOCHRONOUS type transactions the hardware always respects the ODDFRM
bit, which means we need to allocate multiple host channels when
processing such endpoints, to not miss any so-called complete split
opportunities.
- When doing ISOCHRONOUS OUT transfers through a TT, send all data
payload in a single ALL-burst. This decreases the likelihood of
isochronous data underruns.
- Fixed unbalanced unlock in case of "dwc_otg_init_fifo()" failure.
Michael Tuexen [Fri, 9 May 2014 14:15:48 +0000 (14:15 +0000)]
Fix a logic bug which prevented the sending of UDP packets with a 0 checksum.
This bug was introduced in r264212 and should be X-MFCed with that
revision, if UDP-Lite support is MFCed.
When a GPIO pin is set to be turned on by kernel hints (hint.gpio.X.pinon)
make sure the GPIO pin is configured as an output as this is not always the
case.
Fix a bug in ip17x switch initialization, which fails as soon as you
disable the debug and diagnostic options from current. We must wait 2ms
after the switch reset, not 2us.
Warner Losh [Fri, 9 May 2014 04:49:48 +0000 (04:49 +0000)]
We have to include bsd.opts.mk (included in bsd.own.mk) after
/etc/src.conf so that options set there will affect the options
defined in bsd.opts.mk. Fix a few comments while I'm here.
Fix TLR (Transport Layer Retry) support in the mps(4) and mpr(4) drivers.
TLR is necessary for reliable communication with SAS tape drives.
This was broken by change 246713 in the mps(4) driver. It changed the
cm_data field for SCSI I/O requests to point to the CCB instead of the data
buffer. So, instead, look at the CCB's data pointer to determine whether
or not we're talking to a tape drive.
Also, take the residual into account to make sure that we don't go off the
end of the request.
MFC after: 3 days
Sponsored by: Spectra Logic Corporation
Ian Lepore [Thu, 8 May 2014 20:02:38 +0000 (20:02 +0000)]
Consolidate all the AP core startup stuff under a single #ifdef SMP block.
Remove some other ifdefs that came in with a copy/paste that mean basically
"if this processor supports multicore stuff", because if you're starting up
an AP core... it does.
Modify Copyright information and other strings to reflect Qlogic Corporation's purchase of Broadcom's NetXtreme business.
Added clean option to Makefile
Submitted by: David C Somayajulu (davidcs@freebsd.org) QLogic Corporation
MFC after: 5 days
Alan Somers [Thu, 8 May 2014 19:10:04 +0000 (19:10 +0000)]
Incorporate feedback from bde and jilles regarding r265472 to dd(1).
* Don't use sysexits.h. Just exit 1 on error and 0 otherwise.
* Don't sacrifice precision by converting the output of clock_gettime() to a
double and then comparing the results. Instead, subtract the values of
the two clock_gettime() calls, then convert to double.
* Don't use CLOCK_MONOTONIC_PRECISE. It's an unportable synonym for
CLOCK_MONOTONIC.
* Use more appropriate names for some local variables.
* In the summary message, round elapsed time to the nearest microsecond.
Reported by: bde, jilles
MFC after: 3 days
X-MFC-With: 265472
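The precision point above can be illustrated with a minimal sketch. It is not the dd(1) source; the helper name `elapsed_seconds()` is hypothetical. The two clock_gettime() samples are subtracted field by field in integer arithmetic, and only the small difference is converted to double.

```c
#include <time.h>

/*
 * Hypothetical helper: seconds elapsed between two clock_gettime() samples.
 * Converting each sample to double first would discard low-order bits of a
 * large tv_sec value; subtracting first keeps the difference exact.
 */
static double
elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
	time_t sec = end->tv_sec - start->tv_sec;
	long nsec = end->tv_nsec - start->tv_nsec;

	/* nsec may be negative here; the sum below still comes out right. */
	return ((double)sec + (double)nsec * 1e-9);
}
```

A caller would sample CLOCK_MONOTONIC before and after the copy and pass both samples to the helper.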
Michael Tuexen [Thu, 8 May 2014 17:27:46 +0000 (17:27 +0000)]
For some UDP packets (for example with 200 byte payload) and IP options,
the IP header and the UDP header are not in the same mbuf.
Add code to in_delayed_cksum() to deal with this case.
Alexander Motin [Thu, 8 May 2014 16:59:36 +0000 (16:59 +0000)]
Import adapted OpenSolaris' thread pool API implementation.
The thread pool is used by libzfs to implement parallel disk scanning.
Without this change our dummy wrapper made the `zpool import ZZZ` command
scan all disks sequentially from a single thread when searching for pools.
This change makes it use two threads per CPU, the same as in OpenSolaris.
On a system with 200 HDDs this change reduces ZFS pool import time from 35
to 22 seconds.
Warner Losh [Thu, 8 May 2014 15:58:34 +0000 (15:58 +0000)]
Add usr/share/mk/src.opts.mk to obsolete files. It never should have
been installed in the first place, and it must be removed ASAP or
weird build errors may start happening in the future if this file is
ever taken from the installed system. Add note to UPDATING.
Since radix has been ignoring sa_family in passed sockaddrs,
no one has ever bothered filling in a valid sa_family in netmasks.
Additionally, radix adjusts sa_len field in every netmask not to
compare zero bytes at all.
This leads us to rt_mask with sa_family of AF_UNSPEC (-1) and
arbitrary sa_len field (0 for default route, for example).
However, rtsock has been passing that rt_mask intact for ages,
requiring all rtsock consumers to make their own local hacks.
We even have an unfixed one in base:
do `route -n monitor` in one window and issue `route -n get addr`
for some directly-connected address. You will probably see the following:
got message of size 304 on Thu May 8 15:06:06 2014
RTM_GET: Report Metrics: len 304, pid: 30493, seq 1, errno 0, flags:<UP,DONE,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
10.0.0.0 link#1 (255) ffff ffff ff em0:8.0.27.c5.29.d4 10.0.0.92
_________________^^^^^^^^^^^^^^^^^^
after the change:
got message of size 312 on Thu May 8 15:44:07 2014
RTM_GET: Report Metrics: len 312, pid: 2895, seq 1, errno 0, flags:<UP,DONE,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
10.0.0.0 link#1 255.255.255.0 em0:8.0.27.c5.29.d4 10.0.0.92
_________________^^^^^^^^^^^^^^^^^^
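One way to picture the kind of fix-up involved is the sketch below. It is an assumption-laden illustration, not the rtsock code: `struct mini_sockaddr` and `fix_netmask()` are hypothetical stand-ins for a BSD-style sockaddr with a length byte. The valid mask bytes are copied into a zeroed buffer, and the length and family are then taken from the destination address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical BSD-style sockaddr with a length byte; not the kernel type. */
struct mini_sockaddr {
	uint8_t sa_len;		/* total length, header included */
	uint8_t sa_family;
	uint8_t sa_data[14];
};

#define MINI_HDRLEN 2		/* sa_len + sa_family */

/*
 * Build a well-formed netmask in 'out': keep the mask bytes that the
 * shortened sa_len says are valid, zero the rest, and take sa_len and
 * sa_family from the destination. 'out' must not alias 'mask'.
 */
static void
fix_netmask(const struct mini_sockaddr *dst, const struct mini_sockaddr *mask,
    struct mini_sockaddr *out)
{
	size_t n = 0;

	memset(out, 0, sizeof(*out));
	if (mask->sa_len > MINI_HDRLEN)
		n = mask->sa_len - MINI_HDRLEN;
	if (n > sizeof(out->sa_data))
		n = sizeof(out->sa_data);
	memcpy(out->sa_data, mask->sa_data, n);
	out->sa_len = dst->sa_len;
	out->sa_family = dst->sa_family;
}
```

With a mask normalized this way, a consumer such as route(8) can print it like any other address of the destination's family.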
Mark Johnston [Thu, 8 May 2014 03:43:18 +0000 (03:43 +0000)]
Re-apply r248644. This fixes an annoying problem which caused dtrace -c to
fail to attach to stripped binaries. With the _r_debug_postinit symbol,
dtrace(1) can now set a breakpoint in the victim process after it has
registered its DOF table(s) with the kernel. r_debug_state cannot be used
for this purpose since it is called before DOF is made available, in which
case dtrace(1) cannot create USDT probes before the program begins
execution.
Mark Johnston [Thu, 8 May 2014 03:26:25 +0000 (03:26 +0000)]
Handle the different event types properly in rd_event_addr(). In particular,
with r265456 _r_debug_postinit can be used for RD_POSTINIT events. rtld(1)
uses r_debug_state for dl state transitions, so we use its address for
RD_DLACTIVITY events.
Ed Maste [Wed, 7 May 2014 21:16:47 +0000 (21:16 +0000)]
Handle ELF files with 65280 or more sections
If e_shnum or e_shstrndx are at least SHN_LORESERVE (0xff00) then an
escape value is used to indicate that the actual value is found in one
of section 0's fields.
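The escape mechanism can be sketched as follows, based on the ELF specification rather than on the committed code. The types `mini_ehdr` and `mini_shdr`, the constants `ESC_SHNUM` and `ESC_SHSTRNDX`, and both helper names are hypothetical; a real reader would use the definitions from its ELF headers.

```c
#include <stdint.h>

/* Escape values as defined by the ELF specification. */
#define ESC_SHNUM	0	/* e_shnum escape (SHN_UNDEF) */
#define ESC_SHSTRNDX	0xffff	/* e_shstrndx escape (SHN_XINDEX) */

/* Minimal stand-ins for the ELF header fields involved (hypothetical). */
struct mini_ehdr {
	uint16_t e_shnum;
	uint16_t e_shstrndx;
};
struct mini_shdr {
	uint32_t sh_link;
	uint64_t sh_size;
};

/*
 * Real section count: when e_shnum holds the escape value, the count is
 * stored in sh_size of section header 0. This assumes a section header
 * table exists; e_shnum is also 0 when the file has none.
 */
static uint64_t
real_shnum(const struct mini_ehdr *eh, const struct mini_shdr *sh0)
{
	return (eh->e_shnum == ESC_SHNUM ? sh0->sh_size : eh->e_shnum);
}

/* Real string table index: escaped values live in sh_link of section 0. */
static uint32_t
real_shstrndx(const struct mini_ehdr *eh, const struct mini_shdr *sh0)
{
	return (eh->e_shstrndx == ESC_SHSTRNDX ? sh0->sh_link : eh->e_shstrndx);
}
```

Both helpers need section header 0 to have been read already, which is why a reader has to fetch that entry before it knows how many sections follow.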