Xin LI [Thu, 3 Aug 2006 03:34:36 +0000 (03:34 +0000)]
In DCE 1.1, the time_low value is defined as an unsigned 32-bit
integer. Presently, our implementation employs an approach that
converts the value to int64_t, then back to int, unfortunately,
this approach can be problematic when the the difference between
the two time_low is larger than 0x7fffffff, as the value is then
truncated to int.
To quote the test case from the original PR, the following is
true with the current implementation:
This commit adds a new intermediate variable which uses int64_t
to store the result of subtraction between the two time_low values,
which would not introduce different semantic of the MSB found in
time_low value.
PR: 83107
Submitted by: Steve Sears <sjs at acm dot org>
MFC After: 1 month
Robert Watson [Wed, 2 Aug 2006 18:37:44 +0000 (18:37 +0000)]
Move destroying kqueue state from above pru_detach to below it in
sofree(), as a number of protocols expect to be able to call
soisdisconnected() during detach. That may not be a good assumption,
but until I'm sure if it's a good assumption or not, allow it.
John Baldwin [Wed, 2 Aug 2006 17:41:58 +0000 (17:41 +0000)]
- Use m_getcl(), m_get(), and m_gethdr() rather than the older macros for
alloc'ing mbufs so that there is less error handling required.
- Go ahead and account for the data space in the first mbuf before entering
the loop to alloc more mbuf's. This simplifies the loop logic and avoids
confusing Coverity.
CID: 817
Reviewed by: sam
Tested by: pjd
Found by: Coverity Prevent (tm)
Robert Watson [Wed, 2 Aug 2006 16:22:34 +0000 (16:22 +0000)]
Remove call to soisdisconnected() in at_pcbdetach(): by the time the
socket is being detached, there are no consumers left worth notifying
about the disconnect.
Robert Watson [Wed, 2 Aug 2006 16:18:05 +0000 (16:18 +0000)]
Move soisdisconnected() in tcp_discardcb() to one of its calling contexts,
tcp_twstart(), but not to the other, tcp_detach(), as the socket is
already being torn down and therefore there are no listeners. This avoids
a panic if kqueue state is registered on the socket at close(), and
eliminates to XXX comments. There is one case remaining in which
tcp_discardcb() reaches up to the socket layer as part of the TCP host
cache, which would be good to avoid.
Reported by: Goran Gajic <ggajic at afrodita dot rcub dot bg dot ac dot yu>
John Baldwin [Wed, 2 Aug 2006 15:27:48 +0000 (15:27 +0000)]
Fix some bugs in the previous revision (1.419). Don't perform extra
vfs_rel() on the mountpoint if the MAC checks fail in kern_statfs() and
kern_fstatfs(). Similarly, don't perform an extra vfs_rel() if we get
a doomed vnode in kern_fstatfs(), and handle the case of mp being NULL
(for some doomed vnodes) by conditionalizing the vfs_rel() in
kern_fstatfs() on mp != NULL.
CID: 1517
Found by: Coverity Prevent (tm) (kern_fstatfs())
Pointy hat to: jhb
Robert Watson [Wed, 2 Aug 2006 14:30:58 +0000 (14:30 +0000)]
Remove now unneeded ENOTCONN clause from SOCK_DGRAM side of uipc_send():
we have to check it regardless of the target address, so don't check it
twice.
Bruce M Simpson [Wed, 2 Aug 2006 13:05:38 +0000 (13:05 +0000)]
Block a variety of signals which may afffect reboot(8), before killing
init(8), to avoid losing a race to them and dying before being able
to call reboot(2).
Pyun YongHyeon [Wed, 2 Aug 2006 05:28:52 +0000 (05:28 +0000)]
Replace hard-coded magic constants to system defined constants
(BUS_PROBE_DEFAULT, BUS_PROBE_GENERIC etc). These pseudo PHY
drivers were forgotten from the conversion due to the repo copy
to dc driver location.
Andrew Thompson [Wed, 2 Aug 2006 02:47:27 +0000 (02:47 +0000)]
Add a callback so we can notify the parent bridge that a port state change has
occured, we need to do this from a taskqueue to avoid a LOR with the if_bridge
mutex.
Robert Watson [Wed, 2 Aug 2006 00:45:27 +0000 (00:45 +0000)]
Move updated of 'numopensockets' from bottom of sodealloc() to the top,
eliminating a second set of identical mutex operations at the bottom.
This allows brief exceeding of the max sockets limit, but only by
sockets in the last stages of being torn down.
Maxim Sobolev [Tue, 1 Aug 2006 22:19:01 +0000 (22:19 +0000)]
Add device to access and modify Open Firmware NVRAM settings in
PowerPC-based Apple's machines and small utility to do it from
userland modelled after the similar utility in Darwin/OSX.
Alan Cox [Tue, 1 Aug 2006 19:06:06 +0000 (19:06 +0000)]
Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.
The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.
Qing Li [Tue, 1 Aug 2006 17:28:10 +0000 (17:28 +0000)]
In vlan_input(), if the network interface does not perform h/w based
vlan tag processing, the code will use bcopy() to remove the vlan
tag field but the code copies 2 bytes too many, which essentially
overwrites the protocol type field.
Also, a tag value of -1 is generated for unrecognized interface type,
which would cause an invalid memory access in the vlans[] array.
In addition, removed a line of dead code and its associated comments.
Bill Paul [Tue, 1 Aug 2006 17:18:25 +0000 (17:18 +0000)]
Another small update to the re(4) driver:
- Change the workaround for the autopad/checksum offload bug so that
instead of lying about the map size, we actually create a properly
padded mbuf and map it as usual. The other trick works, but is ugly.
This approach also gives us a chance to zero the pad space to avoid
possibly leaking data.
- With the PCIe devices, it looks issuing a TX command while there's
already a transmission in progress doesn't have any effect. In other
words, if you send two packets in rapid succession, the second one may
end up sitting in the TX DMA ring until another transmit command is
issued later in the future. Basically, if re_txeof() sees that there
are still descriptors outstanding, it needs to manually resume the
TX DMA channel by issuing another TX command to make sure all
transmissions are flushed out. (The PCI devices seem to keep the
TX channel moving until all descriptors have been consumed. I'm not
sure why the PCIe devices behave differently.)
(You can see this issue if you do the following test: plug an re(4)
interface into another host via crossover cable, and from the other
host do 'ping -c 2 <host with re(4) NIC>' to prime the ARP cache,
then do 'ping -c 1 -s 1473 <host with re(4) NIC>'. You're supposed
to see two packets sent in response, but you may only see one. If
you do 'ping -c 1 -s 1473 <host with re(4) NIC>' again, you'll
see two packets, but one will be the missing fragment from the last
ping, followed by one of the fragments from this ping.)
- Add the PCI ID for the US Robotics 997902 NIC, which is based on
the RTL8169S.
- Add a tsleep() of 1 second in re_detach() after the interrupt handler
is disconnected. This should allow any tasks queued up by the ISR
to drain. Now, I know you're supposed to use taskqueue_drain() for
this, but something about the way taskqueue_drain() works with
taskqueue_fast queues doesn't seem quite right, and I refuse to be
tricked into fixing it.
John Baldwin [Tue, 1 Aug 2006 16:32:20 +0000 (16:32 +0000)]
Make system call modules a bit more robust:
- If we fail to register the system call during MOD_LOAD, then note that
so that we don't try to deregister it or invoke the chained event handler
during the subsequent MOD_UNLOAD event. Doing the deregister when the
register failed could result in trashing system call entries.
- Add a SI_SUB_SYSCALLS just before starting up init and use that to
register syscall modules instead of SI_SUB_DRIVERS. Registering system
calls as late as possible increases the chances that any other module
event handlers or SYSINITs in a module are executed to initialize the
data in a kld before a syscall dependent on that data is able to be
invoked.
John Baldwin [Tue, 1 Aug 2006 16:27:14 +0000 (16:27 +0000)]
- Add a new function nfsrv_destroycache() to tear down the server request
cache when unloading the nfsserver module. This fixes a memory leak and
a stale pointer.
- Use callout_drain() rather than callout_stop() when unloading the
nfsserver module.
John Baldwin [Tue, 1 Aug 2006 15:29:46 +0000 (15:29 +0000)]
Some cosmetic tweaks:
- Right justify 'pid' label.
- Move the uid column to the right 2 columns so that the 3 process id
columns (pid, ppid, pgrp) are grouped together.
- Expand the uid column to 5 chars.
- Don't indent the tid for multithreaded processes.
Andrew Gallatin [Tue, 1 Aug 2006 14:02:54 +0000 (14:02 +0000)]
- add read only sysctl to indicate if write-combining was enabled
- enable mxge_dummy_rdma() right after reset, and make sure to disable
when detaching the driver.
Robert Watson [Tue, 1 Aug 2006 10:30:26 +0000 (10:30 +0000)]
Reimplement socket buffer tear-down in sofree(): as the socket is no
longer referenced by other threads (hence our freeing it), we don't need
to set the can't send and can't receive flags, wake up the consumers,
perform two levels of locking, etc. Implement a fast-path teardown,
sbdestroy(), which flushes and releases each socket buffer. A manual
dom_dispose of the receive buffer is still required explicitly to GC
any in-flight file descriptors, etc, before flushing the buffer.
This results in a 9% UP performance improvement and 16% SMP performance
improvement on a tight loop of socket();close(); in micro-benchmarking,
but will likely also affect CPU-bound macro-benchmark performance.
Robert Watson [Mon, 31 Jul 2006 23:00:05 +0000 (23:00 +0000)]
Close a race that occurs when using sendto() to connect and send on a
UNIX domain socket at the same time as the remote host is closing the
new connections as quickly as they open. Since the connect() and
send() paths are non-atomic with respect to another, it is possible
for the second thread's close() call to disconnect the two sockets
as connect() returns, leading to the consumer (which plans to send())
with a NULL kernel pointer to its proposed peer. As a result, after
acquiring the UNIX domain socket subsystem lock, we need to revalidate
the connection pointers even though connect() has technically succeed,
and reurn an error to say that there's no connection on which to
perform the send.
We might want to rethink the specific errno number, perhaps ECONNRESET
would be better.
PR: 100940
Reported by: Young Hyun <youngh at caida dot org>
MFC after: 2 weeks
MFC note: Some adaptation will be required
inetd and telnetd are not included in the standard release
crunched floppies, but they can be included as options in
src/release/picobsd (omitted by default though.) Therefore
preserve the RELEASE_CRUNCH knob in their Makefiles, but
tell its real purpose in a comment.
David E. O'Brien [Mon, 31 Jul 2006 15:44:13 +0000 (15:44 +0000)]
Rather than print out a nice error message giving details sufficent to fix
a 'ufs_dirbad' and then panicing (making it very hard to see the details),
put them in the panic message itself.
Actually skip over undocumented options with "continue"
to avoid artifacts in the manpage generated. Previously
an orphaned paragraph on dependencies of such an option
would appear.
Bill Paul [Sun, 30 Jul 2006 23:25:21 +0000 (23:25 +0000)]
Fix the following bugs in re(4)
- Correct the PCI ID for the 8169SC/8110SC in the device list (I added
the macro for it to if_rlreg.h before, but forgot to use it.)
- Remove the extra interrupt spinlock I added previously. After giving it
some more thought, it's not really needed.
- Work around a hardware bug in some versions of the 8169. When sending
very small IP datagrams with checksum offload enabled, a conflict can
occur between the TX autopadding feature and the hardware checksumming
that can corrupt the outbound packet. This is the reason that checksum
offload sometimes breaks NFS: if you're using NFS over UDP, and you're
very unlucky, you might find yourself doing a fragmented NFS write where
the last fragment is smaller than the minimum ethernet frame size (60
bytes). (It's rare, but if you keep NFS running long enough it'll
happen.) If checksum offload is enabled, the chip will have to both
autopad the fragment and calculate its checksum header. This confuses
some revs of the 8169, causing the packet that appears on the wire
to be corrupted. (The IP addresses and the checksum field are mangled.)
This will cause the NFS write to fail. Unfortunately, when NFS retries,
it sends the same write request over and over again, and it keeps
failing, so NFS stays wedged.
(A simple way to provoke the failure is to connect the failing system
to a network with a known good machine and do "ping -s 1473 <badhost>"
from the good system. The ping will fail.)
Someone had previously worked around this using the heavy-handed
approahch of just disabling checksum offload. The correct fix is to
manually pad short frames where the TCP/IP stack has requested
checksum offloading. This allows us to have checksum offload turned
on by default but still let NFS work right.
- Not a bug, but change the ID strings for devices with hardware rev
0x30000000 and 0x38000000 to both be 8168B/8111B. According to RealTek,
they're both the same device, but 0x30000000 is an earlier silicon spin.
This was missed the first time around since eng_padlock.c was not part
of OpenSSL 0.9.7e and therefor did not have the v0_9_7e CVS tag used
during original resolve of conflicts.
Noticed by: Antoine Brodin <antoine.brodin@laposte.net>
Stephen McKay [Sun, 30 Jul 2006 12:54:37 +0000 (12:54 +0000)]
This script should probably have an enabling variable since it can produce
surprising results. For now, at least make it safe to boot the default
kernel when /boot/kernel is already a symlink.
Tim Kientzle [Sun, 30 Jul 2006 00:29:01 +0000 (00:29 +0000)]
Use 'skip' when ignoring data in tar archives. This dramatically
increases performance when extracting a single entry from a large
uncompressed archive, especially on slow devices such as USB hard
drives.
Requires a number of changes:
* New archive_read_open2() supports a 'skip' client function
* Old archive_read_open() is implemented as a wrapper now, to
continue supporting the old API/ABI.
* _read_open_fd and _read_open_file sprout new 'skip' functions.
* compression layer gets a new 'skip' operation.
* compression_none passes skip requests through to client.
* compression_{gzip,bzip2,compress} simply ignore skip requests.
Thanks to: Benjamin Lutz, who designed and implemented the whole thing.
I'm just committing it. ;-)
TODO: Need to update the documentation a little bit.
Tim Kientzle [Sat, 29 Jul 2006 23:51:10 +0000 (23:51 +0000)]
Don't mention 'pax' in the context of POSIX-1988, since
pax wasn't introduced until the 1993 (?) revision.
(I need to double-check when pax was introduced and
clarify some of the history here. In particular,
I should explain that the 'pax' standard now owns the
'ustar' format spec.)
Add a new sysctl, hw.acpi.handle_reboot. If set, acpi will attempt to
perform the reboot action via the reset register instead of our legacy
method. Default is 0 (use legacy). This is needed because some systems
hang on reboot even though they claim to support the reset register.
Link kldxref(8) static on PowerPC to work around a SIGSEGV that
cannot easily be analyzed due to there being no debugger yet.
The SIGSEGV only happens when kldxref is linked shared.
Since kldxref(8) is needed for a release build, having it not
dump core is important.
Change maketempfile() to return a FILE* so as to eliminate the fopen()
that immediately follows the only call to it. maketempfile() uses
mkstemp(), so the temporary file has already been opened and using
fopen() again just opens the file twice. This also fixes the invalid
mode used on the fopen().
While here, assign NULL to fxref after fclose() because we test for
fxref being !NULL to determine if we have the (temporary) hints file
open.
Remove sio(4) and related options from MI files to amd64, i386
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.