yongari [Tue, 6 Dec 2011 00:58:42 +0000 (00:58 +0000)]
Make et_probe() return BUS_PROBE_DEFAULT so that another driver
with higher precedence for the controller can override et(4).
Add missing callout_drain(9) in device detach and rework detach
routine. While I'm here use rman_get_rid(9) instead of using
cached resource id because bus methods are free to change the
id.
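The effect of the probe return value can be modelled in userland roughly as below. This is only a sketch: the numeric values are assumed to mirror the BUS_PROBE_* constants in sys/bus.h, and struct candidate and pick_driver() are hypothetical names, not newbus API.

```c
#include <stddef.h>

/* Assumed to mirror sys/bus.h; the value closest to 0 wins. */
#define BUS_PROBE_SPECIFIC	0
#define BUS_PROBE_VENDOR	(-10)
#define BUS_PROBE_DEFAULT	(-20)

/* Hypothetical stand-in for a driver and what its probe returned. */
struct candidate {
	const char *name;
	int probe_result;	/* > 0 is an errno, i.e. "no match" */
};

/* Return the best-matching candidate, or NULL if none matched. */
static const struct candidate *
pick_driver(const struct candidate *c, size_t n)
{
	const struct candidate *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].probe_result > 0)
			continue;
		if (best == NULL || c[i].probe_result > best->probe_result)
			best = &c[i];
	}
	return (best);
}
```

With et(4) returning BUS_PROBE_DEFAULT, a driver that returns BUS_PROBE_VENDOR for the same controller is preferred in this model.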
yongari [Tue, 6 Dec 2011 00:18:37 +0000 (00:18 +0000)]
et(4) supports VLAN oversized frame so correctly set header length.
While I'm here remove initializing if_mtu, it is set by
ether_ifattach(9). Also move callout_init_mtx(9) to right below the
driver lock initialization.
yongari [Mon, 5 Dec 2011 22:55:52 +0000 (22:55 +0000)]
Fix altq(4) support. Also add check for number of available TX
descriptors before trying to send frames. If we're not able to
send a frame, make sure to prepend it to if_snd queue such that
altq(4) should work.
While I'm here prefer ETHER_BPF_MTAP to BPF_MTAP. ETHER_BPF_MTAP
should be used for controllers that support VLAN hardware tag
insertion. The controller supports VLAN tag insertion but lacks
VLAN tag stripping in RX path though.
marius [Mon, 5 Dec 2011 21:38:45 +0000 (21:38 +0000)]
- In mii_attach(9) just set the driver for a newly added miibus(4) instance
before calling bus_enumerate_hinted_children(9) (which is the minimum for
this to work) instead of fully probing it so later on we can just call
bus_generic_attach(9) on the parent of the miibus(4) instance. The latter
is necessary in order to work around what seems to be a bizarre race in
newbus affecting a few machines since r227687, causing no driver to be
probed for the newly added miibus(4) instance. Presumably this is the
same race that was the motivation for the work around done in r215348.
Reported and tested by: yongari
- Revert the removal of a static in r221913 in order to help compilers to
produce more optimal code.
trociny [Mon, 5 Dec 2011 19:34:02 +0000 (19:34 +0000)]
Protect kern.proc.auxv and kern.proc.ps_strings sysctls with p_candebug().
Citing jilles:
If we are ever going to do ASLR, the AUXV information tells an attacker
where the stack, executable and RTLD are located, which defeats much of
the point of randomizing the addresses in the first place.
Given that the AUXV information seems to be used by debuggers only anyway,
I think it would be good to move it to p_candebug() now.
The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
under p_candebug().
alc [Mon, 5 Dec 2011 18:29:25 +0000 (18:29 +0000)]
Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to
use superpage reservations. So, for the first time, kernel virtual memory
that is allocated by contigmalloc(), kmem_alloc_attr(), and
kmem_alloc_contig() can be promoted to superpages. In fact, even a series
of small contigmalloc() allocations may collectively result in a promoted
superpage.
Eliminate some duplication of code in vm_reserv_alloc_page().
Change the type of vm_reserv_reclaim_contig()'s first parameter in order
that it be consistent with other vm_*_contig() functions.
yongari [Mon, 5 Dec 2011 18:10:43 +0000 (18:10 +0000)]
Fix off by one error in mbuf access. Previously it caused panic.
While I'm here use NULL to compare mbuf pointer and add additional
check for zero length mbuf before accessing the mbuf.
Get rid of kludgy per-descriptor state handling in acpi_apm.
Where i386/bios/apm.c requires no per-descriptor state, the ACPI version
of this device does. Instead of using hackish clone lists that leave
stale device nodes lying around, use the cdevpriv API.
luigi [Mon, 5 Dec 2011 15:33:13 +0000 (15:33 +0000)]
add netmap support for "em", "lem", "igb" and "re".
On my hardware, "em" in netmap mode does about 1.388 Mpps
on one card (on an Asus motherboard), and 1.1 Mpps on another
card (PCIe bus). Both seem to be NIC-limited, because
i have the same rate even with the CPU running at 150 MHz.
On the "re" driver the tx throughput is around 420-450 Kpps
on various (8111C and the like) chipsets. On the Rx side
performance seems much better, and i can receive the full
load generated by the "em" cards.
luigi [Mon, 5 Dec 2011 12:06:53 +0000 (12:06 +0000)]
1. Fix the handling of link reset while in netmap mode.
A link reset now is completely transparent for the netmap client:
even if the NIC resets its own ring (e.g. restarting from 0),
the client will not see any change in the current rx/tx positions,
because the driver will keep track of the offset between the two.
2. make the device-specific code more uniform across different drivers
There were some inconsistencies in the implementation of the netmap
support routines, now drivers have been aligned to a common
code structure.
3. import netmap support for ixgbe. This is implemented as a very
small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
in total the patch has only 54 lines of new code), as most of
the code is in an external file sys/dev/netmap/ixgbe_netmap.h,
following some initial comments from Jack Vogel about making
changes less intrusive.
(Note, i have emailed Jack multiple times asking if he had
comments on this structure of the code; i got no reply so
i assume he is fine with it).
Support for other drivers (em, lem, re, igb) will come later.
"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.
Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 MHz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't yet tried more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
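The offset tracking described in item 1 can be sketched as follows. The structure and function names are hypothetical and are not the actual netmap data structures; all indexes are assumed to be smaller than num_slots.

```c
#include <stdint.h>

/* Hypothetical bookkeeping; not the actual netmap structures. */
struct ring_map {
	uint32_t num_slots;	/* ring size */
	uint32_t offset;	/* client index minus NIC index, mod num_slots */
};

/* Translate a NIC ring index into the index the client sees. */
static uint32_t
nic_to_client(const struct ring_map *m, uint32_t nic_idx)
{
	return ((nic_idx + m->offset) % m->num_slots);
}

/* Translate a client index back into a NIC ring index. */
static uint32_t
client_to_nic(const struct ring_map *m, uint32_t client_idx)
{
	return ((client_idx + m->num_slots - m->offset) % m->num_slots);
}

/*
 * The NIC restarted its ring at new_nic_idx while the client was at
 * client_idx: recompute the offset so the client position is unchanged.
 */
static void
on_nic_reset(struct ring_map *m, uint32_t client_idx, uint32_t new_nic_idx)
{
	m->offset = (client_idx + m->num_slots - new_nic_idx) % m->num_slots;
}
```

After on_nic_reset() the client keeps using its old positions, and only the translation between the two index spaces changes.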
kib [Sun, 4 Dec 2011 19:25:49 +0000 (19:25 +0000)]
Initialize fifoinfo fi_wgen field on open. Only the difference between
fi_wgen and f_seqcount matters, so the change is purely
cosmetic, but it makes the code easier to understand.
rmacklem [Sun, 4 Dec 2011 16:33:04 +0000 (16:33 +0000)]
This patch adds a sysctl to the NFSv4 server which optionally disables the
check for a UTF-8 compliant file name. Enabling this sysctl results in
an NFSv4 server that is non-RFC3530 compliant, therefore it is not enabled
by default. However, enabling this sysctl results in NFSv3 compatible
behaviour and fixes the problem reported by "dan at sunsaturn.com"
to freebsd-current@ on Nov. 14, 2011 under the subject "NFSV4 readlink_stat".
Tested by: dan at sunsaturn.com
Reviewed by: zack
MFC after: 2 weeks
dumbbell [Sun, 4 Dec 2011 14:44:31 +0000 (14:44 +0000)]
Support domain-search in dhclient(8)
The "domain-search" option (option 119) allows a DHCP server to publish
a list of implicit domain suffixes used during name lookup. This option
is described in RFC 3397.
For instance, if the domain-search option says:
".example.org .example.com"
and one wants to resolve "foobar", the resolver will try:
1. "foobar.example.org"
2. "foobar.example.com"
The file /etc/resolv.conf is updated with a "search" directive if the
DHCP server provides "domain-search".
A regression test suite is included in this patch under
tools/regression/sbin/dhclient.
PR: bin/151940
Sponsored by Yakaz (http://www.yakaz.com)
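The expansion in the "foobar" example above can be sketched as below. build_candidate() is a hypothetical helper written for illustration; it is not part of dhclient(8) or the resolver.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Join a short name and one search suffix into a candidate name.
 * A leading '.' on the suffix is skipped.  Returns 0 on success and
 * -1 if the result does not fit in buf.
 */
static int
build_candidate(char *buf, size_t len, const char *name, const char *suffix)
{
	int n;

	if (*suffix == '.')
		suffix++;
	n = snprintf(buf, len, "%s.%s", name, suffix);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Calling it once per suffix, in the order the DHCP server supplied them, yields the candidates in the order listed in the example.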
adrian [Sun, 4 Dec 2011 11:55:33 +0000 (11:55 +0000)]
Allow the i2c node requirements to be slightly relaxed.
These realtek switch PHYs speak a variant of i2c with some slightly
modified handling.
From the submitter, slightly modified now that some further digging
has been done:
The I2C framework makes an assumption that the read/not-write bit of the first
byte (the address) indicates whether reads or writes are to follow.
The RTL8366 family uses the bus differently: after sending the address+read/not-write byte,
two register address bytes are sent, then the 16-bit register value is sent
or received. While the register write access can be performed as a 4-byte
write, the read access requires the read bit to be set, but the first two bytes
for the register address then need to be transmitted.
This patch maintains the i2c protocol behaviour but allows it to be relaxed
(for these kinds of switch PHYs, and whatever else Realtek may do with this
almost-but-not-quite i2c bus) - by setting the "strict" hint to 0.
The "strict" hint defaults to 1.
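The read access described above might look like this on the wire. This is a sketch only: rtl_read_header() is a hypothetical helper, and the byte order of the register address (low byte first) is an assumption that the text does not state.

```c
#include <stddef.h>
#include <stdint.h>

#define I2C_READ_BIT	0x01	/* LSB of the address byte: 1 = read */

/*
 * Fill tx[] with the bytes the master transmits for a register read
 * and return how many there are.  *rx_len is set to the number of
 * bytes to receive afterwards.
 * Assumption: the register address is sent low byte first.
 */
static size_t
rtl_read_header(uint8_t tx[3], uint8_t addr7, uint16_t reg, size_t *rx_len)
{
	tx[0] = (uint8_t)((addr7 << 1) | I2C_READ_BIT);
	tx[1] = (uint8_t)(reg & 0xff);
	tx[2] = (uint8_t)(reg >> 8);
	*rx_len = 2;		/* the 16-bit register value */
	return (3);
}
```

A strict i2c implementation would refuse to transmit tx[1] and tx[2] once the read bit has been sent, which is the behaviour the "strict" hint relaxes.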
marius [Sat, 3 Dec 2011 13:51:57 +0000 (13:51 +0000)]
Revert r225889 a bit. While it's correct that in total store order there's
no need to additionally add CPU memory barriers to the acquire variants of
atomic(9), these are documented to also include compiler memory barriers.
So add the latter, which were previously included by using membar(), back.
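A compiler-only barrier in an acquire load can be sketched as below, assuming a GCC-compatible compiler. compiler_membar() and load_acq_32() are hypothetical names and this is not the code from the tree.

```c
#include <stdint.h>

/* Compiler barrier only: emits no instruction, but the compiler may
 * not move memory accesses across it. */
#define compiler_membar()	__asm__ __volatile__("" : : : "memory")

/* Acquire load for a machine whose hardware ordering already suffices. */
static inline uint32_t
load_acq_32(volatile uint32_t *p)
{
	uint32_t v;

	v = *p;
	compiler_membar();
	return (v);
}
```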
rmacklem [Sat, 3 Dec 2011 02:27:26 +0000 (02:27 +0000)]
Post r223774, the NFSv4 client no longer has multiple instances
of the same lock_owner4 string. As such, the handling of cleanup
of lock_owners could be simplified. This simplification permitted
the client to do a ReleaseLockOwner operation when the process that
the lock_owner4 string represents has exited. This permits the
server to release any storage related to the lock_owner4 string
before the associated open is closed. Without this change, it
is possible to exhaust a server's storage when a long running
process opens a file and then many child processes do locking
on the file, because the open doesn't get closed. A similar patch
was applied to the Linux NFSv4 client recently so that it wouldn't
exhaust a server's storage.
marius [Fri, 2 Dec 2011 22:03:27 +0000 (22:03 +0000)]
It doesn't make much sense to check whether child is NULL after already
having dereferenced it. We either should generally check the device_t's
supplied to bus functions before using them (which we seem to virtually
never do) or just assume that they are not NULL.
While at it make this code fit 78 columns.
marius [Fri, 2 Dec 2011 21:19:14 +0000 (21:19 +0000)]
- In device_probe_child(9) check the return value of device_set_driver(9)
when actually setting a driver as especially ENOMEM is fatal in these
cases.
- Annotate other calls to device_set_devclass(9) and device_set_driver(9)
without the return value being checked and that are okay to fail.
jhb [Fri, 2 Dec 2011 19:59:46 +0000 (19:59 +0000)]
When changing the user priority of a thread, change the real priority
in addition to the user priority for threads whose current real priority
is equal to the previous user priority or if the new priority is a
real-time priority. This allows priority changes of other threads to
have an immediate effect.
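The rule can be sketched as below. struct thr and set_user_pri() are hypothetical and leave out locking and run-queue handling entirely.

```c
#include <stdbool.h>

/* Hypothetical thread state; not the kernel's struct thread. */
struct thr {
	int user_pri;	/* user priority */
	int real_pri;	/* current real priority */
};

/*
 * Change the user priority.  The real priority follows only if it was
 * equal to the old user priority, or if the new priority is real-time.
 */
static void
set_user_pri(struct thr *t, int new_pri, bool new_is_realtime)
{
	int old_user = t->user_pri;

	t->user_pri = new_pri;
	if (t->real_pri == old_user || new_is_realtime)
		t->real_pri = new_pri;
}
```

A thread whose real priority differs from its user priority, for example because it was temporarily raised, keeps that real priority unless the new priority is real-time.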
jchandra [Fri, 2 Dec 2011 15:24:39 +0000 (15:24 +0000)]
Fix OF_finddevice error return value in case of FDT.
According to the open firmware standard, finddevice call has to return
a phandle with value of -1 in case of error.
This commit is to:
- Fix the FDT implementation of this interface (ofw_fdt_finddevice) to
return (phandle_t)-1 in case of error, instead of 0 as it does now.
- Fix up the callers of OF_finddevice() to compare the return value with
-1 instead of 0 to check for errors.
- Since phandle_t is unsigned, the return value of OF_finddevice should
be checked with '== -1' rather than '<= 0' or '> 0', fix up these cases
as well.
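Why the old checks miss the error can be shown with a small sketch. phandle_t is assumed here to be a 32-bit unsigned type, and fake_finddevice(), check_old() and check_new() are hypothetical stand-ins.

```c
#include <stdint.h>

typedef uint32_t phandle_t;	/* assumed width; unsigned per the commit */

/* Hypothetical stand-in for OF_finddevice(). */
static phandle_t
fake_finddevice(int found)
{
	return (found ? (phandle_t)42 : (phandle_t)-1);
}

/* Old-style check: for an unsigned type this is true only for 0. */
static int
check_old(phandle_t h)
{
	return (h <= 0);
}

/* New-style check: compare against -1 converted to phandle_t. */
static int
check_new(phandle_t h)
{
	return (h == (phandle_t)-1);
}
```

Since (phandle_t)-1 is the largest value of the type, check_old() treats the error return as a valid handle, while check_new() catches it.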
mav [Fri, 2 Dec 2011 12:52:33 +0000 (12:52 +0000)]
Add hw.ahci.force tunable to control whether AHCI drivers should attach
to known AHCI-capable chips (AMD/NVIDIA), configured for legacy emulation.
Enabled by default to get additional performance and functionality of AHCI
when it can't be enabled by BIOS. Can be disabled to honor BIOS settings if
needed for some reason.
nwhitehorn [Fri, 2 Dec 2011 02:05:26 +0000 (02:05 +0000)]
Prevent user astonishment by providing the shell option at the end, after
any installer-provided configuration files have been copied. This allows
users to edit their fstab, if desired, and to see what the installer has
placed in rc.conf.
obrien [Fri, 2 Dec 2011 01:06:33 +0000 (01:06 +0000)]
Tweak the r137233 fix to r136283 -- Code was making two send() attempts
vs. the comment documented "If we are working with a privileged socket,
then take only one attempt". Make the code match.
Furthermore, critical privileged applications that [over] log a vast amount
can look like a DoS to this code. Given it's unlikely the single reattempted
send() will succeed, avoid usurping the scheduler in a library API for a
single non-critical facility in critical applications.
kensmith [Fri, 2 Dec 2011 00:38:47 +0000 (00:38 +0000)]
Add a screen that asks if the user would like to enable crash dumps,
giving them a very brief description of the trade-offs. Whether the
user opts in or out, add an entry to what will become /etc/rc.conf
explaining what dumpdev is and how to turn on/off crash dumps. The folks
who handle interacting with users submitting PRs have asked for this.
jhb [Thu, 1 Dec 2011 18:46:28 +0000 (18:46 +0000)]
Enhance the sequential access heuristic used to perform readahead in the
NFS server and reuse it for writes as well to allow writes to the backing
store to be clustered.
- Use a prime number for the size of the heuristic table (1017 is not
prime).
- Move the logic to locate a heuristic entry from the table and compute
the sequential count out of VOP_READ() and into a separate routine.
- Use the logic from sequential_heuristic() in vfs_vnops.c to update the
seqcount when a sequential access is performed rather than just
increasing seqcount by 1. This lets the clustering count ramp up
faster.
- Allow for some reordering of RPCs and if it is detected leave the current
seqcount as-is rather than dropping back to a seqcount of 1. Also,
when out of order access is encountered, cut seqcount in half rather than
dropping it all the way back to 1 to further aid with reordering.
- Fix the new NFS server to properly update the next offset after a
successful VOP_READ() so that the readahead actually works.
Some of these changes came from an earlier patch by Bjorn Gronwall that was
forwarded to me by bde@.
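The seqcount rules in the list above can be sketched as follows. The constants, the reorder window and every name here are assumptions made for illustration; this is not the NFS server code.

```c
#include <stdint.h>

#define SEQ_MAX		0x7f	/* assumed cap on seqcount */
#define SEQ_CHUNK	16384	/* assumed bytes per seqcount step */
#define HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/* Hypothetical heuristic entry. */
struct heur {
	uint64_t next_off;	/* offset a sequential access would use */
	int seqcount;
};

/* Update seqcount for an access at off of len bytes. */
static void
heur_update(struct heur *h, uint64_t off, uint32_t len, uint64_t window)
{
	uint64_t dist;

	dist = off > h->next_off ? off - h->next_off : h->next_off - off;
	if (dist == 0) {
		/* Sequential: ramp up by the amount of data, not by 1. */
		h->seqcount += (int)HOWMANY(len, SEQ_CHUNK);
		if (h->seqcount > SEQ_MAX)
			h->seqcount = SEQ_MAX;
	} else if (dist <= window) {
		/* Probably a reordered RPC: leave seqcount as-is. */
	} else {
		/* Out of order: halve instead of resetting. */
		h->seqcount /= 2;
	}
}

/* Called after a successful read to record the next expected offset. */
static void
heur_advance(struct heur *h, uint64_t off, uint32_t len)
{
	h->next_off = off + len;
}
```

The caller is expected to invoke heur_advance() only after the I/O succeeded, which corresponds to the next-offset fix in the last list item.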
kib [Thu, 1 Dec 2011 11:36:41 +0000 (11:36 +0000)]
If alloc_unr() call in the pipe_create() failed, then pipe->pipe_ino is
-1. But, because ino_t is unsigned, this case was not covered by the
test ino > 0 in pipeclose(), leading to the free_unr(-1). Fix it by
explicitly comparing with 0 and -1. [1]
Do not access freed memory, the inode number was cached to prevent access
to cpipe after it possibly was freed, but I failed to commit the right
patch.
Noted by: gianni [1]
Pointy hat to: kib
MFC after: 3 days
lstewart [Thu, 1 Dec 2011 07:41:30 +0000 (07:41 +0000)]
Add a man page describing the feed-forward clock kernel support, including how
to enable and configure the functionality.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Discussed with: Julien Ridoux (jridoux at unimelb edu au)
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
lstewart [Thu, 1 Dec 2011 07:19:13 +0000 (07:19 +0000)]
Revise the sysctl handling code and restructure the hierarchy of sysctls
introduced when feed-forward clock support is enabled in the kernel:
- Rename the "choice" variable to "available".
- Streamline the implementation of the "active" variable's sysctl handler
function.
- Create a kern.sysclock sysctl node for general sysclock related configuration
options. Place the "available" and "active" variables under this node.
- Create a kern.sysclock.ffclock sysctl node for feed-forward clock specific
configuration options. Place the "version" and "ffcounter_bypass" variables
under this node.
- Tweak some of the description strings.
Discussed with: Julien Ridoux (jridoux at unimelb edu au)
fjoe [Wed, 30 Nov 2011 18:11:49 +0000 (18:11 +0000)]
- CTF knob is now implemented using common scheme: MK_CTF=yes/no is
defined based on WITH/WITHOUT_CTF settings, default is WITHOUT_CTF,
NO_CTF overrides WITH_CTF (used by Makefile.inc1)
- CTFCONVERT_CMD/NORMAL_CTFCONVERT are now defined to empty string
if make(1) can handle empty commands
fjoe [Wed, 30 Nov 2011 18:07:38 +0000 (18:07 +0000)]
- Fix segmentation fault when running "+command" when run with -jX -n due
to Compat_RunCommand() being called with `cmd' that is not on the node->commands
list
- Make ellipsis ("..." command) handling consistent: check for "..." command
in job make after variables expansion to match compat make behavior
- Fix empty command handling (after variables expansion and @+- modifiers
are processed): now empty commands are ignored in compat make and are not
printed in job make case
- Bump MAKE_VERSION to 5-2011-11-30-0
fjoe [Wed, 30 Nov 2011 13:33:09 +0000 (13:33 +0000)]
Generate ${NORMAL_CTFCONVERT} invocation without '@' modifier:
- ${NORMAL_CC} is also invoked without '@'
- Userland CTF support was changed previously to echo ctfconvert invocations too
fjoe [Wed, 30 Nov 2011 05:49:17 +0000 (05:49 +0000)]
Add three execution tests for make(1):
- plus: execute "+command" when run with -jX -n
- ellipsis: ellipsis ("...") from variable
- empty: empty command (from variable)
Currently make(1) fails all three tests:
- plus: segmentation fault due to incorrect command list handling
- ellipsis: works in compat mode but fails in job (-jX) mode
- empty:
- compat mode: prints error message
- job mode: works but prints empty string
fjoe [Tue, 29 Nov 2011 16:34:44 +0000 (16:34 +0000)]
- fix WITH_CTF when specified in /etc/src.conf [1]
- CTFCONVERT_CMD=... is a hack (should be defined to empty string instead):
make(1) should be taught to ignore empty commands silently in compat mode
(as it does in !compat mode, GNU make also silently ignores empty commands)
and to skip printing empty commands in !compat mode
- config(8) should generate ${NORMAL_CTFCONVERT} invocation without '@':
this will allow simplifying kern.pre.mk even more and lessen the number
of shell invocations during kernel build when CTF is turned off
- WITH_CTF can now be converted to usual MK_CTF=yes/no infrastructure
kib [Tue, 29 Nov 2011 13:07:32 +0000 (13:07 +0000)]
Hide the internals of vm_page_lock(9) from the loadable modules.
Since the address of vm_page lock mutex depends on the kernel options,
it is easy for module to get out of sync with the kernel.
No vm_page_lockptr() accessor is provided for modules. It can be added
later if needed, unless proper KPI is developed to serve the needs.
lstewart [Tue, 29 Nov 2011 08:43:04 +0000 (08:43 +0000)]
Make sysclock_active publicly available to external consumers.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Discussed with: Julien Ridoux (jridoux at unimelb edu au)
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
lstewart [Tue, 29 Nov 2011 08:33:40 +0000 (08:33 +0000)]
Do away with the somewhat clunky sysclock_ops structure and associated code,
reimplementing the [get]{bin,nano,micro}[up]time() wrapper functions in terms of
the new "fromclock" API instead.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Discussed with: Julien Ridoux (jridoux at unimelb edu au)
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
fjoe [Tue, 29 Nov 2011 08:20:23 +0000 (08:20 +0000)]
Allow NO_FOO to override WITH_FOO that could be specified in /etc/src.conf.
This is required to override knobs (e.g. WITH_PROFILE) during buildworld
stages in Makefile.inc1 (otherwise the build is stopped due to both WITH_FOO
and WITHOUT_FOO defined).