Warner Losh [Wed, 16 Dec 2009 02:54:34 +0000 (02:54 +0000)]
Add NO_KERNELOBJ flag, similar to NO_KERNEL{CONFIG,DEPEND,CLEAN},
which disables doing a make obj. Use it when you know it will work
only. KERNFAST now implies NO_KERNELOBJ, since you don't need to keep
doing obj when doing incremental kernel builds.
Luigi Rizzo [Tue, 15 Dec 2009 21:24:12 +0000 (21:24 +0000)]
more splitting of ip_fw2.c, now extract the 'table' routines
and the sockopt routines (the upper half of the kernel).
Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?)
please change the attribution in ip_fw_table.c. I have copied
the copyright line from ip_fw2.c but it carries my name and I have
neither written nor designed the feature so I don't deserve
the credit.
Gavin Atkinson [Tue, 15 Dec 2009 20:44:12 +0000 (20:44 +0000)]
ifconfig(8) is documented to take a ISO 3166-1 country code to set the
regulatory domain with the "country" parameter, but will also take a full
country name. The man page warns that only the ISO code is unambiguous.
In reality, however, the first match on either would be accepted, leading
to "DE" being interpreted as the "DEBUG" country rather than Germany, and
"MO" selecting Morocco rather than the correct country, Macau.
Fix this by always checking for an ISO CC match first, and only search on
the full country name if that fails.
PR: bin/140571
Tested by: Dirk Meyer dirk.meyer dinoex.sub.org
Reviewed by: sam
Approved by: ed (mentor)
MFC after: 1 month
Luigi Rizzo [Tue, 15 Dec 2009 16:15:14 +0000 (16:15 +0000)]
Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h
No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.
Files touched by this commit:
conf/files
now references the two new source files
netinet/ip_fw.h
remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.
netinet/ipfw/ip_fw_private.h
new file with kernel-specific ipfw definitions
netinet/ipfw/ip_fw_log.c
ipfw_log and related functions
netinet/ipfw/ip_fw_dynamic.c
code related to dynamic rules
netinet/ipfw/ip_fw2.c
removed the pieces that goes in the new files
netinet/ipfw/ip_fw_nat.c
minor rearrangement to remove LOOKUP_NAT from the
main headers. This require a new function pointer.
A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.
Luigi Rizzo [Tue, 15 Dec 2009 09:46:27 +0000 (09:46 +0000)]
implement a new match option,
lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N
which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.
Doug Barton [Tue, 15 Dec 2009 05:14:39 +0000 (05:14 +0000)]
The named process needs to have a "working directory" that it can
write to. This is specified in "options { directory }" in named.conf.
So, create /etc/namedb/working with appropriate permissions, and
update the entry in named.conf to match.
In addition to specifying the working directory, file and path names
in named.conf can be specified relative to the directory listed.
However, since that directory is now different from /etc/namedb
(where the configuration, zone, rndc.*, and other files are located)
further update named.conf to specify all file names with fully
qualified paths. Also update the comment about file and path names
so users know this should be done for all file/path names in the file.
This change will eliminate the 'working directory is not writable'
messages at boot time without sacrificing security. It will also
allow for features in newer versions of BIND (9.7+) to work as
designed.
Pyun YongHyeon [Mon, 14 Dec 2009 22:55:20 +0000 (22:55 +0000)]
Tell upper layer vge(4) supports long frames. This should be done
after ether_ifattach(), as ether_ifattach() initializes it with
ETHER_HDR_LEN.
While I'm here remove setting if_mtu, it's already handled in
ether_ifattach().
Pyun YongHyeon [Mon, 14 Dec 2009 22:20:05 +0000 (22:20 +0000)]
Whenever link state change interrupt is raised, vge_tick() is
called and vge(4) used to drive auto-negotiation timer(mii_tick) in
vge_tick(). Therefore the mii_tick was not called for every hz such
that auto-negotiation complete was never handled in vge(4).
Use mii_pollstat to extract current negotiated speed/duplex instead
of mii_tick. The latter is valid only for auto-negotiation case.
While I'm here change the confusing function name vge_tick() to
vge_link_statchg().
Pyun YongHyeon [Mon, 14 Dec 2009 21:16:02 +0000 (21:16 +0000)]
We don't have to reload EEPROM in vge_reset(). Because vge_reset()
is called in vge_init_lock(), vge(4) always used to reload EEPROM.
Also add more comment why vge(4) clears VGE_CHIPCFG0_PACPI bit.
While I'm here add missing new line in vge_reset().
Pyun YongHyeon [Mon, 14 Dec 2009 20:39:42 +0000 (20:39 +0000)]
Save PHY address by reading VGE_MIICFG register. For PCIe
controllers(VT613x), we assume the PHY address is 1.
Use the saved PHY address in MII register access routines and
remove accessing VGE_MIICFG register.
While I'm here save PCI express capability register which will be
used in near future.
Pyun YongHyeon [Mon, 14 Dec 2009 20:17:53 +0000 (20:17 +0000)]
Introduce vge_flags member in softc. The vge_flags member will
record device specific bits. Remove vge_link and use vge_flags.
While here, move clearing link state before mii_mediachg() as
mii_mediachg() may affect link state.
Luigi Rizzo [Mon, 14 Dec 2009 20:12:51 +0000 (20:12 +0000)]
Move the scan for max_keylen into route.c::route_init(),
and make max_keylen an argument for rn_init().
This removes an unnecessary dependency on domain.h from radix.c
Gavin Atkinson [Mon, 14 Dec 2009 19:18:02 +0000 (19:18 +0000)]
Don't panic on failure to attach if we fail before or during the
if_alloc() of ifp. This fixes the panic reported in the PR, but
not the attach failure.
PR: kern/139079
Tested by: Steven Noonan <steven uplinklabs.net>
Reviewed by: thompsa
Approved by: ed (mentor)
MFC after: 2 weeks`
Pyun YongHyeon [Mon, 14 Dec 2009 19:08:11 +0000 (19:08 +0000)]
Clear VGE_TXDESC_Q bit for transmitted frames. The VGE_TXDESC_Q bit
seems to work like a tag that indicates 'not list end' of queued
frames. Without having a VGE_TXDESC_Q bit indicates 'list end'. So
the last frame of multiple queued frames has no VGE_TXDESC_Q bit.
The hardware has peculiar behavior for VGE_TXDESC_Q bit handling.
If the VGE_TXDESC_Q bit of descriptor was set the controller would
fetch next descriptor. However if next descriptor's OWN bit was
cleared but VGE_TXDESC_Q was set, it could confuse controller.
Clearing VGE_TXDESC_Q bit for transmitted frames ensure correct
behavior.
Pyun YongHyeon [Mon, 14 Dec 2009 18:44:23 +0000 (18:44 +0000)]
Overhaul bus_dma(9) usage and fix various things.
o Separate TX/RX buffer DMA tag from TX/RX descriptor ring DMA tag.
o Separate RX buffer DMA tag from common buffer DMA tag. RX DMA
tag has different restriction compared to TX DMA tag.
o Add 40bit DMA address support.
o Adjust TX/RX descriptor ring alignment to 64 bytes from 256
bytes as documented in datasheet.
o Added check to ensure TX/RX ring reside within a 4GB boundary.
Since TX/RX ring shares the same high address register they
should have the same high address.
o TX/RX side bus_dmamap_load_mbuf_sg(9) support.
o Add lock assertion to vge_setmulti().
o Add RX spare DMA map to recover from DMA map load failure.
o Add optimized RX buffer handler, vge_discard_rxbuf which is
activated when vge(4) sees bad frames.
o Don't blindly update VGE_RXDESC_RESIDUECNT register. Datasheet
says the register should be updated only when number of
available RX descriptors are multiple of 4.
o Use __NO_STRICT_ALIGNMENT instead of defining VGE_FIXUP_RX which
is only set for i386 architecture. Previously vge(4) also
performed expensive copy operation to align IP header on amd64.
This change should give RX performance boost on amd64
architecture.
o Don't reinitialize controller if driver is already running. This
should reduce number of link state flipping.
o Since vge(4) drops a driver lock before passing received frame
to upper layer, make sure vge(4) is still running after
re-acquiring driver lock.
o Add second argument count to vge_rxeof(). The argument will
limit number of packets could be processed in RX handler.
o Rearrange vge_rxeof() not to allocate RX buffer if received
frame was bad packet.
o Removed if_printf that prints DMA map failure. This type of
message shouldn't be used in fast path of driver.
o Reduce number of allowed TX buffer fragments to 6 from 7. A TX
descriptor allows 7 fragments of a frame. However the CMZ field
of descriptor has just 3bits and the controller wants to see
fragment + 1 in the field. So if we have 7 fragments the field
value would be 0 which seems to cause unexpected results under
certain conditions. This change should fix occasional TX hang
observed on vge(4).
o Simplify vge_stat_locked() and add number of available TX
descriptor check.
o vge(4) controllers lack padding short frames. Make sure to fill
zero for the padded bytes. This closes unintended information
disclosure.
o Don't set VGE_TDCTL_JUMBO flag. Datasheet is not clear whether
this bit should be set by driver or write-back status bit after
transmission. At least vendor's driver does not set this bit so
remove it. Without this bit vge(4) still can send jumbo frames.
o Don't start driver when vge(4) know there are not enough RX
buffers.
o Remove volatile keyword in RX descriptor structure. This should
be handled by bus_dma(9).
o Collapse two 16bits member of TX/RX descriptor into single 32bits
member.
o Reduce number of RX descriptors to 252 from 256. The
VGE_RXDESCNUM is 16bits register but only lower 8bits are valid.
So the maximum number of RX descriptors would be 255. However
the number of should be multiple of 4 as controller wants to
update 4 RX descriptors at a time. This limits the maximum
number of RX descriptor to be 252.
Tested by: Dewayne Geraghty (dewayne.geraghty <> heuristicsystems dot com dot au)
Carey Jones (m.carey.jones <> gmail dot com)
Yoshiaki Kasahara (kasahara <> nc dor kyushu-u dot ac dotjp)
Xin LI [Mon, 14 Dec 2009 16:54:39 +0000 (16:54 +0000)]
Style improvements:
- Sort function prototypes;
- Apply static on all function bodies. To quote bde@:
> It is a good obfuscation to declare functions as static only in the
> prototype, so that you can't see the static for the actual function.
> The reverse obfuscation (with static only in the function definition)
> would make more sense, but is a constraint error.
Luigi Rizzo [Mon, 14 Dec 2009 12:23:46 +0000 (12:23 +0000)]
Properly fix callout handling by putting all the per-cpu info in
struct callout_cpu. From the comment in the file:
+ * There is one struct callout_cpu per cpu, holding all relevant
+ * state for the callout processing thread on the individual CPU.
+ * In particular:
+ * cc_ticks is incremented once per tick in callout_cpu().
+ * It tracks the global 'ticks' but in a way that the individual
+ * threads should not worry about races in the order in which
+ * hardclock() and hardclock_cpu() run on the various CPUs.
+ * cc_softclock is advanced in callout_cpu() to point to the
+ * first entry in cc_callwheel that may need handling. In turn,
+ * a softclock() is scheduled so it can serve the various entries i
+ * such that cc_softclock <= i <= cc_ticks .
Together with a smaller patch committed in september, this fixes a
bug that affects 8.0 with apps that rely on callouts to fire exactly
in the number of ticks specified (qemu among them).
Right now, callouts in 8.0 fire one tick late.
This was discussed in september with JeffR and jhb
Marcel Moolenaar [Mon, 14 Dec 2009 01:26:01 +0000 (01:26 +0000)]
Work-around a race condition on ia64 while unlocking a contested lock.
The race condition is believed to be in UMTX_OP_MUTEX_WAKE. On ia64,
we simply go to the kernel to unlock.
The big question is why this is only a race condition on ia64...
Robert Watson [Sun, 13 Dec 2009 20:27:46 +0000 (20:27 +0000)]
Add Mark Heily's libkqueue test suite as a general kqueue test suite to
tools/regression. It tests a number of aspects of kqueue behavior,
although not all currently pass (possibly bugs in the test suite?).
Submitted by: Mark Heily <mark at heily.com>
Obtained from: svn://mark.heily.com/libkqueue/trunk/test (r114)
Marius Strobl [Sun, 13 Dec 2009 18:42:06 +0000 (18:42 +0000)]
Properly support M5229 revision 0xc7 and 0xc8:
- These revisions no longer have cable detection capability.
- The UDMA support bit of register 0x4b has been dropped without an
replacement.
- According to Linux it's crucial for working ATAPI DMA support to
also set the reserved bit 1 of regsiter 0x53 with these revisions.
Marius Strobl [Sun, 13 Dec 2009 18:26:19 +0000 (18:26 +0000)]
Specify the capability and media bits of the capabilities page in
native, i.e. big-endian, format and convert as appropriate like we
also do with the multibyte fields of the other pages. This fixes
the output of acd_describe() to match reality on big-endian machines
without breaking it on little-endian ones. While at it, also convert
the remaining multibyte fields of the pages read although they are
currently unused for consistency and in order to prevent possible
similar bugs in the future.
Hiroki Sato [Sun, 13 Dec 2009 15:19:01 +0000 (15:19 +0000)]
- Fix main() to use two separated sockets for the two transports
when "-P port" is specified. It invoked svc{tcp,udp}_create()
for only one of the two allocated sockets, and prevented the
TCP socket from binding to as the result.
- Use TI-RPC functions and handle sockets in a
transport-independent way. At this moment only AF_INET ("udp"
and "tcp") is supported because others need rewrites of ACL
handling and yp clients.
- Add '-h addr' to specify addresses to bind to.
- Convert _msgout() to use variable argument lists and remove
asprintf() for error strings.
Bjoern A. Zeeb [Sun, 13 Dec 2009 13:57:32 +0000 (13:57 +0000)]
Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.
Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.
We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.
Discussed with: julian, zec, jamie, rwatson (back in Sept)
MFC after: 5 days
Xin LI [Sun, 13 Dec 2009 04:50:11 +0000 (04:50 +0000)]
- Remove times.h from C programs that does not manipulate with time at
all.
- Remove pathnames.h from all but io.c since it's the only module that
used these definations.
Xin LI [Sun, 13 Dec 2009 03:29:05 +0000 (03:29 +0000)]
Explicitly say that this is an internal library which is intended to be
used within FreeBSD base system only, and discourage user applications
from using it. User applications should use the expat version from the
ports/package collection.
Reviewed by: simon (earlier version)
MFC after: 2 weeks
Marcel Moolenaar [Sun, 13 Dec 2009 01:20:32 +0000 (01:20 +0000)]
Add support for memory disk (md). The size of the memory disk
is determined by MD_IMAGE_SIZE. A file system can be embedded
into the loader with /sys/tools/embed_mfs.sh.
Note that md.c is not included when MD_IMAGE_SIZE is not set.
Marius Strobl [Sun, 13 Dec 2009 00:13:21 +0000 (00:13 +0000)]
Unbreak the ata_atapi() usage. Since r200171 the mode setting functions
get a ata_device type device passed instead of a ata_channel one, thus
ata_atapi() has to be adjusted accordingly.
Doug Barton [Sat, 12 Dec 2009 21:51:50 +0000 (21:51 +0000)]
Since the change to rc.subr in r198162 it's not necessary to specify
command in the rc.d script if we have a corresponding ${name}_program
entry, which we do for named.
Rename named_precmd to named_prestart to make it more clear and match
convention.
Move the command_args definition related to -u up into _prestart().
It (and the associated $named_uid value) are only used there, and
unlike required_* and pidfile don't need to be used until this stage.
Fix a silly bug that would only have affected people who were using
the new named_wait or named_auto_forward features, AND had set up an
rndc.conf file instead of using the automatically generated rndc.key.
For named_conf:
Add "-c $named_conf" to command_args if it's not set to the
default. If it is set to the default and we're using the base
BIND it's not necessary. If we're using BIND from the ports
the user is likely to have included it in _flags (due to long
necessity for doing so) so don't duplicate that if it's set.
Attilio Rao [Sat, 12 Dec 2009 21:31:07 +0000 (21:31 +0000)]
In current code, threads performing an interruptible sleep (on both
sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will
leave the waiters flag on forcing the owner to do a wakeup even when if
the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" (based on the wakeup algorithm) queue while the other
queue has real waiters on it, because nobody is going to wakeup the 2nd
queue waiters and they will sleep indefinitively.
A similar bug, is present, for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue
is not empty, the waiters won't progress after being awake but they will
just fail, still not taking care of the 2nd queue waiters (as instead the
lock owned doing the wakeup would expect).
In order to fix this bug in a cheap way (without adding too much locking
and complicating too much the semantic) add a sleepqueue interface which
does report the actual number of waiters on a specified queue of a
waitchannel (sleepq_sleepcnt()) and use it in order to determine if the
exclusive waiters (or shared waiters) are actually present on the lockmgr
(or sx) before to give them precedence in the wakeup algorithm.
This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to
cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters
a lockmgr has and if all the waiters on the exclusive waiters queue are
LK_SLEEPFAIL just wake both queues.
The sleepq_sleepcnt() introduction and ABI breakage require
__FreeBSD_version bumping.
Jaakko Heinonen [Sat, 12 Dec 2009 18:18:46 +0000 (18:18 +0000)]
Don't read the newline character to line buffer because lines are passed
to wcscoll(3). Newline characters could cause incorrect results when
comparing lines.
Also, if an input line didn't contain a newline character, it was
omitted from the output. According to my interpretation, SUSv3 requires
that the newline is always printed.
Luigi Rizzo [Sat, 12 Dec 2009 15:49:28 +0000 (15:49 +0000)]
Make the code buildable in userland so it is easier to test it:
this requires a small reordering of headers and a few #defines to
map functions not available in userland.
Remove a useless #ifndef block at the beginning of the file.
Introduce (temporarily) rn_init2(), see the comment in the code
for the proper long term change.
Doug Barton [Sat, 12 Dec 2009 02:19:41 +0000 (02:19 +0000)]
Over time things that used to be files/directories/links can change
to something else. So add code to detect when things don't match and
give the user choices about how to fix it.
If we're using -P and something in the above check needs to be moved
we need to have the directory there for it, so create it at the
beginning and delete empty versions of it at the end.
The case where something used to be a file or link and now is supposed
to be a directory (e.g., /etc/security) is especially dangerous, so
make failure to install a necessary directory in $DESTDIR a fatal error.
Pyun YongHyeon [Sat, 12 Dec 2009 00:06:43 +0000 (00:06 +0000)]
Remove driver lock assertion in MII register access. This change
was made in r199543 to remove MTX_RECURSE. These routines can be
called in device attach phase(e.g. mii_phy_probe()) so checking
assertion here is not right as caller does not hold a driver lock.