Randall Stewart [Fri, 3 Nov 2006 15:23:16 +0000 (15:23 +0000)]
Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0
So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.
I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)
There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..
If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)
Ruslan Ermilov [Fri, 3 Nov 2006 12:02:24 +0000 (12:02 +0000)]
Remove the -C option as it does more harm than good. To be fully
compatible, it would have to (at least):
- support the "compat-compat" -T option,
- *not* support the -l, -O, and -v options,
- default to soft updates being disabled.
Worse, the compatibility mode makes it impossible to mount_mfs(8)
a file system from fstab(5) with soft updates disabled (-S). [1]
Now, the only difference when called as "mount_mfs" or "mfs" (as
opposed to "mdmfs") is that the file mode of the mount point is
set by default to 01777. All options available to mdmfs(8) are
also available to mount_mfs(8); the -C option is still recognized
but ignored for backward compatibility.
Warner Losh [Fri, 3 Nov 2006 07:39:37 +0000 (07:39 +0000)]
MFp4:
o Fix the packet statistics
o Make sure we set the FD bit when in full duplex
o Improve TX side efficency by eliminating a data copy for
unfragmented mbufs (the hardware can't do s/g).
o Minor busdma pedantry
o better comments in some places, more XXX in others
o Minor style nits.
This solves a problem I was seeing where I'd get no ethernet when not
booting with a NFS root. Well, unless I unplugged the cable and
plugged it back in first so I'd get the same up down up messages I get
for NFS root...
Thanks to sam and scottl for suggestions on making this driver more
efficient through better use of approrpiate APIs.
John Birrell [Fri, 3 Nov 2006 06:23:53 +0000 (06:23 +0000)]
Always init the console before trying to cnadd it to
avoid the case where the console name isn't set and
cnadd wants to use printf to complain about it.
Nick Hibma [Thu, 2 Nov 2006 20:43:20 +0000 (20:43 +0000)]
Only use the filename part of the kernel configuration file as an argument to
KERNCONF after the file has been copied to the sys/${ARCH}/conf directory. This
allows the use of one kernel config file for multiple images. E.g.:
Andre Oppermann [Thu, 2 Nov 2006 17:45:28 +0000 (17:45 +0000)]
Use the improved m_uiotombuf() function instead of home grown sosend_copyin()
to do the userland to kernel copying in sosend_generic() and sosend_dgram().
sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported
by m_uiotombuf().
Benchmaring shows significant improvements (95% confidence):
66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO)
65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO)
(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)
Andre Oppermann [Thu, 2 Nov 2006 17:37:22 +0000 (17:37 +0000)]
Rename m_getm() to m_getm2() and rewrite it to allocate up to page sized
mbuf clusters. Add a flags parameter to accept M_PKTHDR and M_EOR mbuf
chain flags. Provide compatibility macro for m_getm() calling m_getm2()
with M_PKTHDR set.
Rewrite m_uiotombuf() to use m_getm2() for mbuf allocation and do the
uiomove() in a tight loop over the mbuf chain. Add a flags parameter to
accept mbuf flags to be passed to m_getm2(). Adjust all callers for the
extra parameter.
Ruslan Ermilov [Thu, 2 Nov 2006 17:28:38 +0000 (17:28 +0000)]
Revert the last change. Masking only 2 MSBs of the virtual address
to get the physical address doesn't work for all values of KVA_PAGES,
while masking 8 MSBs works for all values of KVA_PAGES that are
multiple of 4 for non-PAE and 8 for PAE. (This leaves us limited
with 12MB for non-PAE kernels and 14MB for PAE kernels.)
To get things right, we'd need to subtract the KERNBASE from the
virtual address (but KERNBASE is not easy to figure out from here),
or have physical addresses set properly in the ELF headers.
Andre Oppermann [Thu, 2 Nov 2006 16:53:26 +0000 (16:53 +0000)]
Rewrite kern_sendfile() to work in two loops, the inner which turns as many
VM pages into mbufs as it can -- up to the free send socket buffer space.
The outer loop then drops the whole mbuf chain into the send socket buffer,
calls tcp_output() on it and then waits until 50% of the socket buffer are
free again to repeat the cycle. This way tcp_output() gets the full amount
of data to work with and can issue up to 64K sends for TSO to chop up in
the network adapter without using any CPU cycles. Thus it gets very efficient
especially with the readahead the VM and I/O system do.
The previous sendfile(2) code simply looped over the file, turned each 4K
page into an mbuf and sent it off. This had the effect that TSO could only
generate 2 packets per send instead of up to 44 at its maximum of 64K.
Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of
sleeping on mbuf allocation failures.
Benchmarking shows significant improvements (95% confidence):
45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO)
83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO)
(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)
On trap while inside ddb, the trap handler calls kdb_reenter(), that
longjmp to the default context. As result, "alltrace" command may
be prematurely terminated (without error message). This is happens,
for instance, when system is low on memory and referenced page in
kernel-mode thread stack is swapped out.
Protect "alltrace" against termination on trap by setting temporary
kdb_jmpbuf context.
- Use g_duplicate_bio() instead of g_clone_bio(), so there memory is
allocated with M_WAITOK flag.
- Check 'buf' instead of 'error' so Prevent is not confused.
CID: 1562, 1563
Found by: Coverity Prevent analysis tool
Andrew Thompson [Thu, 2 Nov 2006 08:44:19 +0000 (08:44 +0000)]
Do not test all the conditions if the port is already forwarding. Also print a
debug message if the port is agreed as it is an important condition of the
protocol.
Sync the EFI headers with version 1.10.14.62 of the Intel sample EFI
implementation. This re-introduces C99 style comments that previously
were replaced by original C comments.
Extend struct devdesc with a unit field, called d_unit. Promote the
device (kind) specific unit field to the common field. This change
allows a future version of libefi to work without requiring anything
more than what is defined in struct devdesc and as such makes it
possible to compile said version of libefi for different platforms
without requiring that those platforms have identical derivatives
of struct devdesc.
Don't unconditionally compile-in the bcache code. It's only used on
i386/amd64 and pc98. Remove useless calls to bcache_init() from the
ia64 and sparc64 loaders, as well as from the OFW common code.
o Make sure to clear f->f_devdata if d_dev->dv_open() fails. It
would otherwise cause devclose() to free() the memory again.
o Refactor devopen() so that it's more readable.
Marius Strobl [Thu, 2 Nov 2006 00:01:15 +0000 (00:01 +0000)]
- In sunkbd_probe_keyboard() don't bother to determine the keyboard layout
as we have no use for that info. Instead let this function return the
keyboard ID and verify at its invocation in sunkbd_configure() that we're
talking to a Sun type 4/5/6 keyboard, i.e. a keyboard supported by this
driver.
- Add an option SUNKBD_EMULATE_ATKBD whose code is based on the respective
code in ukbd(4) and like UKBD_EMULATE_ATSCANCODE causes this driver to
emit AT keyboard/KB_101 compatible scan codes in K_RAW mode as assumed by
kbdmux(4). Unlike UKBD_EMULATE_ATSCANCODE, SUNKBD_EMULATE_ATKBD also
triggers the use of AT keyboard maps and thus allows to use the map files
in share/syscons/keymaps with this driver at the cost of an additional
translation (in ukbd(4) this just is the way of operation).
- Implement an option SUNKBD_DFLT_KEYMAP, which like the equivalent options
of the other keyboard drivers allows to specify the default in-kernel
keyboard map. For obvious reasons this made to only work when also using
SUNKBD_EMULATE_ATKBD.
- Implement sunkbd_check(), sunkbd_check_char() and sunkbd_clear_state(),
which are also required for interoperability with kbdmux(4).
- Implement K_CODE mode and FreeBSD keypad compose.
- As a minor hack define KBD_DFLT_KEYMAP also in the !SUNKBD_EMULATE_ATKBD
case so we can obtain fkey_tab from <dev/kbd/kbdtables.h> rather than
having to duplicate it and #ifdef some more code.
- Don't use the TX-buffer for writing the two command bytes for setting the
keyboard LEDs as this consequently requires a hardware FIFO that is at
least two bytes in depth, which the NMOS-variant of the Zilog SCCs doesn't
have. Thus use an inlined version of uart_putc() to consecutively write
the command bytes (a cleaner approach would be to do this via the soft
interrupt handler but that variant wouldn't work while in ddb(4)). [1]
- Fix some minor style(9) bugs.
Now, that we have gjournal in the tree add possibility to configure
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.
John Baldwin [Wed, 1 Nov 2006 15:36:47 +0000 (15:36 +0000)]
Fix botch in last commit (I tested on 6.x which doesn't have TSO):
- Test the mac_type rather than if_hwassist (since ifp doesn't exist yet)
to determine if the adapter supports TSO and thus to change the sizes
for the bus_dma tag.
Do not include both <sys/types.h> and <sys/param.h>, it is a style bug as
sys/types.h is included in sys/param.h, so instead just move the
#include <sys/param.h> before the headers that need it.
Update the code to the current sync(2) version:
- Do not modify mnt_flag without mount interlock held.
- Do not touch MNT_ASYNC flag, as this can lead to a race with nmount(2).
Andrew Thompson [Wed, 1 Nov 2006 09:07:47 +0000 (09:07 +0000)]
Bring in support for the Rapid Spanning Tree Protocol (802.1w).
RSTP provides faster spanning tree convergence, the protocol will exchange
information with neighboring switches to quickly transition to forwarding
without creating loops. The code will default to RSTP mode but will downgrade
any port connected to a legacy STP network so is fully backward compatible.
John Birrell [Wed, 1 Nov 2006 09:05:40 +0000 (09:05 +0000)]
When building an upgraded make, don't worry about processing it for
use with DTrace because the normal buildworld will do that when the
tools are built.
John Birrell [Wed, 1 Nov 2006 09:02:11 +0000 (09:02 +0000)]
Add a build option to support WITH_CDDL and WITHOUT_CDDL, defaulting
to WITH_CDDL.
This option enables building code that is licensed under Sun's CDDL.
The DTrace code is licensed that way, so by default it will get built
unless the WITHOUT_CDDL option is used.
There is another build toggle, NO_CTF, which turns off execution of
ctfconvert and ctfmerge in sys.mk, but this can't be implemented as
WITH_/WITHOUT because bsd.own.mk isn't included in all Makefiles and
sys.mk is included automatically by make.
John Birrell [Wed, 1 Nov 2006 04:54:51 +0000 (04:54 +0000)]
Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.
To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.
String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.
Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.
A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.
Implements gjournal support. If file system has gjournal support enabled
and -p flag was given perform fast file system checking (bascially only
garbage collecting of orphaned objects).
Rename bread() to blread() and bwrite() to blwrite() as we now link to
the libufs library, which also implement functions with that names.
Add -J flag to both newfs(8) and tunefs(8) which allows to enable gjournal
support.
I left -j flag for UFS journal implementation which we may gain at some
point.
Add gjournal specific code to the UFS file system:
- Add FS_GJOURNAL flag which enables gjournal support on a file system.
- Add cg_unrefs field to the cylinder group structure which holds
number of unreferenced (orphaned) inodes in the given cylinder group.
- Add fs_unrefs field to the super block structure which holds
total number of unreferenced (orphaned) inodes.
- When file or a directory is orphaned (last reference is removed, but
object is still open), increase fs_unrefs and cg_unrefs fields,
which is a hint for fsck in which cylinder groups looks for such
(orphaned) objects.
- When file is last closed, decrease {fs,cg}_unrefs fields.
- Add VV_DELETED vnode flag which points at orphaned objects.
Add MNT_GJOURNAL flag which indicates, that file system has gjournal
support enabled.
Add mnt_gjprovider field which keeps gjournal provider's name on which
file system is placed on. This allows to not place file system on gjournal
directly and allows gjournal class to pair gjournal provider with file
system.