rwatson [Wed, 15 Nov 2006 12:43:45 +0000 (12:43 +0000)]
Add a short regression test to try to exercise races in the non-atomic
nature of implied connect via sendto(). Oddly, uipc_usrreq.c implements
this for stream sockets, but doesn't set the flag in its protocol
definition so that it can actually be used. As such, the stream test is
implemented but doesn't run for now.
kib [Wed, 15 Nov 2006 11:04:37 +0000 (11:04 +0000)]
Group pid and parent are shared in a case of CLONE_THREAD not CLONE_VM.
This fix lets clone02 LTP test pass with 2.6 emulation. In reality 99%
of the cases are that CLONE_VM and CLONE_THREAD are both set so it
seemed to work.
kib [Wed, 15 Nov 2006 10:01:06 +0000 (10:01 +0000)]
In rev 1.188 of linux_misc.c the added check for valid options ommited
__WCLONE. This fixes it thus fixing skype/teamspeak to not keep zombies
after exit.
Submitted by: rdivacky
Reported by: Bakul Shah (bakul at bitblocks com)
kientzle [Wed, 15 Nov 2006 05:33:38 +0000 (05:33 +0000)]
Add archive_write_open_filename()/archive_read_open_filename() as
synonyms for archive_write_open_file()/archive_read_open_file().
The new names are much clearer.
kientzle [Wed, 15 Nov 2006 05:14:20 +0000 (05:14 +0000)]
Change the internal API for writing data to an entry; make the
internal format-specific functions return the same as the public
function, so that the public API layer doesn't have to guess the
correct return value. This addresses an obscure problem that occurs
when someone tries to write more data than the size of the entry (as
indicated in the entry header). In this case, the return value from
archive_write_data() was incorrect, reflecting the requested write
rather than the amount actually written.
ambrisko [Tue, 14 Nov 2006 16:48:00 +0000 (16:48 +0000)]
- Add in FreeBSD native ioctl that models the Linux version.
- Add a translation so the Linux ioctl's don't conflict with
the FreeBSD definition.
- Assume Linux 32bit emulation on amd64.
This was tested on i386 and amd64 with the 32bit Linux MegaCli.
Eventually we should do a 32bit native FreeBSD translation app.
mjacob [Tue, 14 Nov 2006 08:45:48 +0000 (08:45 +0000)]
Push things closer to path failover by implementing loop down and
gone device timers and zombie state entries. There are tunables
that can be used to select a number of parameters.
loop_down_limit - how long to wait for loop to come back up before
declaring
all devices dead (default 300 seconds)
gone_device_time- how long to wait for a device that has appeared
to leave the loop or fabric to reappear (default 30 seconds)
Internal tunables include (which should be externalized):
quick_boot_time- how long to wait when booting for loop to come up
change_is_bad- whether or not to accept devices with the same
WWNN/WWPN that reappear at a different PortID as being the 'same'
device.
Keen students of some of the subtle issues here will ask how
one can keep devices from being re-accepted at all (the answer
is to set a gone_device_time to zero- that effectively would
be the same thing).
jhb [Mon, 13 Nov 2006 22:23:34 +0000 (22:23 +0000)]
MD support for PCI Message Signalled Interrupts on amd64 and i386:
- Add a new apic_alloc_vectors() method to the local APIC support code
to allocate N contiguous IDT vectors (aligned on a M >= N boundary).
This function is used to allocate IDT vectors for a group of MSI
messages.
- Add MSI and MSI-X PICs. The PIC code here provides methods to manage
edge-triggered MSI messages as x86 interrupt sources. In addition to
the PIC methods, msi.c also includes methods to allocate and release
MSI and MSI-X messages. For x86, we allow for up to 128 different
MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
ask the MSI PIC code to allocate resources and IDT vectors.
jhb [Mon, 13 Nov 2006 21:47:30 +0000 (21:47 +0000)]
First cut at MI support for PCI Message Signalled Interrupts (MSI):
- Add 3 new functions to the pci_if interface along with suitable wrappers
to provide the device driver visible API:
- pci_alloc_msi(dev, int *count) backed by PCI_ALLOC_MSI(). '*count'
here is an in and out parameter. The driver stores the desired number
of messages in '*count' before calling the function. On success,
'*count' holds the number of messages allocated to the device. Also on
success, the driver can access the messages as SYS_RES_IRQ resources
starting at rid 1. Note that the legacy INTx interrupt resource will
not be available when using MSI. Note that this function will allocate
either MSI or MSI-X messages depending on the devices capabilities and
the 'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. Also note
that the driver should activate the memory resource that holds the
MSI-X table and pending bit array (PBA) before calling this function
if the device supports MSI-X.
- pci_release_msi(dev) backed by PCI_RELEASE_MSI(). This function
releases the messages allocated for this device. All of the
SYS_RES_IRQ resources need to be released for this function to succeed.
- pci_msi_count(dev) backed by PCI_MSI_COUNT(). This function returns
the maximum number of MSI or MSI-X messages supported by this device.
MSI-X is preferred if present, but this function will honor the
'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. This function
should return the largest value that pci_alloc_msi() can return
(assuming the MD code is able to allocate sufficient backing resources
for all of the messages).
- Add default implementations for these 3 methods to the pci_driver generic
PCI bus driver. (The various other PCI bus drivers such as for ACPI and
OFW will inherit these default implementations.) This default
implementation depends on 4 new pcib_if methods that bubble up through
the PCI bridges to the MD code to allocate IRQ values and perform any
needed MD setup code needed:
- PCIB_ALLOC_MSI() attempts to allocate a group of MSI messages.
- PCIB_RELEASE_MSI() releases a group of MSI messages.
- PCIB_ALLOC_MSIX() attempts to allocate a single MSI-X message.
- PCIB_RELEASE_MSIX() releases a single MSI-X message.
- Add default implementations for these 4 methods that just pass the
request up to the parent bus's parent bridge driver and use the
default implementation in the various MI PCI bridge drivers.
- Add MI functions for use by MD code when managing MSI and MSI-X
interrupts:
- pci_enable_msi(dev, address, data) programs the MSI capability address
and data registers for a group of MSI messages
- pci_enable_msix(dev, index, address, data) initializes a single MSI-X
message in the MSI-X table
- pci_mask_msix(dev, index) masks a single MSI-X message
- pci_unmask_msix(dev, index) unmasks a single MSI-X message
- pci_pending_msix(dev, index) returns true if the specified MSI-X
message is currently pending
- Save the MSI capability address and data registers in the pci_cfgreg
block in a PCI devices ivars and restore the values when a device is
resumed. Note that the MSI-X table is not currently restored during
resume.
- Add constants for MSI-X register offsets and fields.
- Record interesting data about any MSI-X capability blocks we come
across in the pci_cfgreg block in the ivars for PCI devices.
Tested on: em (i386, MSI), bce (amd64/i386, MSI), mpt (amd64, MSI-X)
Reviewed by: scottl, grehan, jfv
MFC after: 2 months
jhb [Mon, 13 Nov 2006 21:14:54 +0000 (21:14 +0000)]
Various fixes:
- Remove an extra entry from the array for 0x0f prefixed instruction groups.
This fixes decoding of instructions where the second opcode >= 0x80.
- Add support for the 64-bit immediate mov instructions.
- When short_addr is enabled, don't parse the modr/m byte for a 16-bit
address, but as a 32-bit address.
- Support %rip relative addressing.
- Don't print a displacement of 0 if there is a base or index register.
ru [Mon, 13 Nov 2006 20:33:54 +0000 (20:33 +0000)]
Fix NKPT comments to match reality. Note that the current value
of NKPT is no longer enough to run amd64 with 16G of RAM, as it
doesn't have space for mapping a kernel (16M kernel would require
additionally 8 page tables).
bz [Mon, 13 Nov 2006 19:07:32 +0000 (19:07 +0000)]
Add SCTP as a known upper layer protocol over v6.
We are not yet aware of the protocol internals but this way
SCTP traffic over v6 will not be discarded.
Reported by: Peter Lei via rrs
Tested by: Peter Lei <peterlei cisco.com>
ru [Mon, 13 Nov 2006 09:46:16 +0000 (09:46 +0000)]
Fix minor formatting issues:
- make document title match filename;
- remove hard sentence breaks, whitespace at EOL, and double whitespace;
- sort SEE ALSO xrefs, adding missing section numbers;
- fix a misspelled macro name.
jkoshy [Mon, 13 Nov 2006 04:28:29 +0000 (04:28 +0000)]
Attempt to improve application portability by marking `struct ar_hdr'
as `packed'.
The C standard leaves the alignment of individual members of a C
struct upto the implementation, so pedantically speaking portable
code cannot assume that the layout of a `struct ar_hdr' in memory
will match its layout in a file. Using a __packed attribute
declaration forces file and memory layouts for this structure to
match.
kientzle [Mon, 13 Nov 2006 00:26:45 +0000 (00:26 +0000)]
Minor cleanup of the standard read/write I/O modules:
* Use public API, don't access struct archive directly. (People should be able to copy these into their applications as a template for custom I/O callbacks.)
* Set "skip" only for regular files. ("skip" allows the low-level library to catch attempts to add an archive to itself or extract over itself.)
* Simplify the write_open functions by just calling stat() at the beginning. Somehow, these functions had acquired some complex logic that tried to avoid the stat() call but never succeeded.
kientzle [Sun, 12 Nov 2006 23:45:40 +0000 (23:45 +0000)]
Correctly handle writing very large blocks (>1M) through to a disk
file. This doesn't happen in normal use, because the file I/O and
decompression layers only pass through smaller blocks. It can happen
with custom read functions that block I/O in larger blocks.
alc [Sun, 12 Nov 2006 21:48:34 +0000 (21:48 +0000)]
Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
andre [Sun, 12 Nov 2006 20:57:00 +0000 (20:57 +0000)]
In kern_sendfile() fix the calculation of sbytes (the total number of bytes
written to the socket). The rewrite in revision 1.240 got confused by the
FreeBSD 4.x bug compatibility code.
For some reason lighttpd, that was used for testing the new sendfile code,
was not affected by the problem but apache and others using headers/trailers
in the sendfile call received incorrect sbytes values after return from non-
blocking sockets. This then lead to restarts with wrong offsets and thus
mixed up file contents when the socket was writeable again. All programs
not using headers/trailers, like ftpd, were not affected by the bug.
keramida [Sun, 12 Nov 2006 19:03:39 +0000 (19:03 +0000)]
In revision 1.14 I broke the -4 and -6 options of sockstat(1).
Using either one of the two would result in an empty protos[]
array, and no sockets were actually listed:
% sockstat -4
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
% sockstat -6
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
%
Fix this bug by tweaking appropriately the logic of handling opt_4,
opt_6, opt_u and protos_defined.
jkoshy [Sun, 12 Nov 2006 18:43:25 +0000 (18:43 +0000)]
- Replace the use of DPSRCS with something more appropriate. DPSRCS
is for when you need something in ".depend", but not compiled in.
- Style fixes: Spell ${.OBJDIR} as ".".
- Neaten a comment.
bz [Sun, 12 Nov 2006 18:38:07 +0000 (18:38 +0000)]
Remove some comments about NetBSD. This in on FreeBSD and we do not
want to confuse people at the very beginning.
Sync TOC/paragraph numbers in the text.
Requested by: Benedikt Stockebrand during his talk at EuroBSDCon 2006
Reviewed by: gnn
ume [Sun, 12 Nov 2006 17:36:58 +0000 (17:36 +0000)]
Teach an IPV6CP to pppd(8).
The eui64.[ch] and ipv6cp.[ch] were taken from ppp-2.3.11.
However, our stock pppd(8) doesn't provide option_t nor some
utility functions. So, I made some hacks to adjust to our
stock pppd(8).
The sys_bsd.c part was taken from NetBSD with some
modifications to adjust to our stock pppd(8).
rrs [Sat, 11 Nov 2006 22:44:12 +0000 (22:44 +0000)]
In a true restart case, the send_lock was
not being aquired. This meant that when we cleanup
the outbound we may have one in transit to be
added with the old sequence number. This is bad
since then we loose a message :(
Also the report_outbound needed to have the right
lock when its called which it did not.. I added
the lock with of course a flag since we want to
have the lock before we call it in the restart
case.
This also fixed the FIX ME case where, in the cookie
collision case, we mark for retransmit any that
were bundled with the cookie that was dropped.
This also means changes to the output routine
so we can assure getting the COOKIE-ACK sent
BEFORE we retransmit the Data.
kris [Sat, 11 Nov 2006 22:24:10 +0000 (22:24 +0000)]
Request pre-commit review of BSD.{local,x11*}.dist by portmgr, since these
files interface with ports and we have policies for how/when they should be
updated.
keramida [Sat, 11 Nov 2006 22:11:54 +0000 (22:11 +0000)]
Add support for filtering sockets by protocol type. The default
behavior of sockstat(1) will still be to show "udp", "tcp" and
"divert" protocols, but we can now provide a (comma-separated)
list of protocols, as in:
% sockstat -P tcp
to list only TCP sockets, or we can filter more than one protocol
by separating the protocol names with a comma:
% sockstat -P tcp,udp
Protocol names are parsed with getprotobyname(3), so any protocol
whose name is listed in `/etc/protocols' should work fine.
Submitted by: Josh Carroll <josh.carroll@psualum.com>
Approved by: des
rrs [Sat, 11 Nov 2006 15:59:01 +0000 (15:59 +0000)]
Turns out we would reset the TSN seq counter during
a colliding INIT. This if fine except when we have
data outstanding... we basically reset it to the
previous value it was.. so then we end up assigning
the same TSN to two different data chunks.
This patch:
1) Finds a missing lock for when we change the stream
numbers during COOKIE and INIT-ACK processing.. we
were NOT locking the send_buffer.. which COULD cause
problems (found by inspection looking for <2>)
2) Fixes a case during a colliding INIT where we incorrectly
reset the sending Sequence thus in some cases duplicately
assigning a TSN.
3) Additional enhancments to logging so we can see strm/tsn in
the receiver AND new tracking to watch what the sender
is doing with TSN and STRM seq's.