Warner Losh [Sat, 4 Feb 2006 08:15:29 +0000 (08:15 +0000)]
Remove ifdef notyet for SIOCGHWADDR
Treat SIOCADDMULTI and SIOCDELMULTI the same, since they had the same code
Remove redundant assignment to error
Convert to using the altq interface completely.
Tai-hwa Liang [Sat, 4 Feb 2006 08:07:00 +0000 (08:07 +0000)]
s/bin/sbin/ for mount_nwfs, mount_portalfs and mount_smbfs. They have not
lived in bin since 1994.
While here, also document the removal time of the aforementioned utilities.
Hajimu UMEMOTO [Sat, 4 Feb 2006 07:59:17 +0000 (07:59 +0000)]
Never select the PCB that has INP_IPV6 flag and is bound to :: if
we have another PCB which is bound to 0.0.0.0. If a PCB has the
INP_IPV6 flag, then we set its cost higher than IPv4 only PCBs.
Robert Watson [Sat, 4 Feb 2006 00:14:06 +0000 (00:14 +0000)]
Cast pointers to (uintptr_t) before down-casting to (int). This avoids
an incompatible conversion from a 64-bit pointer to a 32-bit integer on
64-bit platforms. We will investigate whether Solaris uses a 64-bit
token here, or a new record here, in order to avoid truncating user
pointers that are 64-bit. However, in the meantime, truncation is fine
as these are rarely/never used fields in audit records.
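The two-step cast described above can be sketched as follows. This is a minimal illustration, not the code from the commit; the helper name ptr_to_int_truncated() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* uintptr_t must be able to hold any value an int-sized field will receive. */
static_assert(sizeof(uintptr_t) >= sizeof(int),
    "uintptr_t narrower than int");

/*
 * Hypothetical helper: narrow a pointer to an int in two explicit steps.
 * The first cast converts the pointer to an integer type wide enough to
 * hold it; the second cast then deliberately discards the upper bits on
 * 64-bit platforms. Casting a pointer straight to int is what triggers
 * the size-mismatch diagnostic.
 */
static int
ptr_to_int_truncated(const void *p)
{
	return ((int)(uintptr_t)p);
}
```

On a 64-bit platform the result keeps only the low 32 bits of the address, which is the truncation the message accepts for now.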
Robert Watson [Fri, 3 Feb 2006 15:42:16 +0000 (15:42 +0000)]
In fchdir(), Giant must be separately acquired and dropped if the old
vnode is from a file system that is not MPSAFE, as vrele() expects
Giant to be held when it is called on a non-MPSAFE vnode.
Marius Strobl [Fri, 3 Feb 2006 12:35:42 +0000 (12:35 +0000)]
- Don't shift the clock frequency in MHz left by 8 before assigning it
to sbus_mdvec.dv_clock as sbus_mdvec.dv_clock is meant to be specified
in MHz. While this was a bug it shouldn't have affected FreeBSD/sparc64
as sbus_mdvec.dv_clock is used to limit the clock rate of chips when
a machine isn't able to support them at maximum speed which isn't the
case for sun4u machines.
- Remove the code that checks whether the clock frequency returned by
sbus_get_clockfreq() is 0 and falls back to 25MHz if it is as that's
already done in sbus(4).
Gleb Smirnoff [Fri, 3 Feb 2006 11:38:19 +0000 (11:38 +0000)]
Dropping the lock in transmit_event() is not safe, because we
store some pipe pointers on the stack. If the user reconfigures dummynet
while the lock is dropped, we may work with freed pipes after relocking.
To fix this, we decided not to send packets in transmit_event(),
but fill a queue. At the end of dummynet() and dummynet_io(),
after the lock is dropped, if there is something in the queue
we run dummynet_send() to process the queue.
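The "collect under the lock, send after unlocking" pattern described above might look roughly like this. It is a sketch only: every name is hypothetical, and a plain flag stands in for the real mutex so that the locking rules can be checked with assertions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet type; only the link field matters here. */
struct pkt_sketch {
	struct pkt_sketch *next;
};

static int lock_held;			/* stand-in for the real mutex */
static struct pkt_sketch *ready_list;	/* protected by the lock */

static void sketch_lock(void)   { assert(!lock_held); lock_held = 1; }
static void sketch_unlock(void) { assert(lock_held);  lock_held = 0; }

/*
 * Stand-in for transmit_event(): detach everything that is ready.
 * It runs with the lock held and never drops it.
 */
static struct pkt_sketch *
collect_ready(void)
{
	struct pkt_sketch *head;

	assert(lock_held);
	head = ready_list;
	ready_list = NULL;
	return (head);
}

/*
 * Stand-in for dummynet_send(): runs without the lock and touches only
 * the private queue, never the shared scheduler state.
 */
static int
send_queue(struct pkt_sketch *head)
{
	struct pkt_sketch *next;
	int sent = 0;

	assert(!lock_held);
	for (; head != NULL; head = next) {
		next = head->next;
		head->next = NULL;
		sent++;		/* real code would pass the packet on here */
	}
	return (sent);
}

/* One scheduler pass: fill the queue, unlock, then send. */
static int
scheduler_tick(void)
{
	struct pkt_sketch *head;

	sketch_lock();
	head = collect_ready();
	sketch_unlock();
	return (head != NULL ? send_queue(head) : 0);
}
```

Because the send step only sees packets already detached from shared state, a reconfiguration that happens after the unlock cannot invalidate anything it still holds.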
Peter Wemm [Fri, 3 Feb 2006 00:16:36 +0000 (00:16 +0000)]
Make PV entries dynamic on amd64. i386 has a pre-reserved block of kva
dedicated to storing pv entries, originally so that kva didn't have to be
allocated at inconvenient times. For amd64, we can get the same effect by
using the direct map area. Allocating pages is the same as with the object
backed method, but now we can just lookup the page in the direct map area.
Thus, no more pageable kva is reserved. This is the single largest
consumer of kva on our work machines and this change should help conserve
the fixed size 2GB pageable kva on the amd64 kernel.
There is a pair of sysctl nodes introduced, named the same as their
tunable counterparts: vm.pmap.shpgperproc and vm.pmap.pv_entry_max.
They work just like the tunables of the same path, except the values are
linked. The pv entry cap is now dynamically changeable.
I didn't make them totally unlimited because we need some sort of safety
limit still. One could consume all physical memory without a cap.
Tor Egge [Thu, 2 Feb 2006 21:37:39 +0000 (21:37 +0000)]
For low memory situations, non-VMIO buffers didn't release pages back to
the system when brelse() was called with B_RELBUF set on the buffer. This
could be a problem when the system was low on memory, had many buffers on
QUEUE_EMPTYKVA and started to traverse directories. For each getnewbuf(),
pages were allocated from the system, driving the free reserve downwards.
For each brelse(), the system put the buffer on QUEUE_CLEAN, with B_INVAL
set.
This commit changes the semantics of B_RELBUF to also free pages from
non-VMIO buffers.
Paul Saab [Thu, 2 Feb 2006 17:50:59 +0000 (17:50 +0000)]
- Move the command setup from amr_start1 into the card specific submit
routines.
- Add DELAY(1) to, or replace cpu_spinwait() with DELAY(1) in, a few of
the busy loops when reading from the controller to work around firmware
bugs which can crash the controller.
Marius Strobl [Thu, 2 Feb 2006 14:57:00 +0000 (14:57 +0000)]
Correct and improve the description of le(4) vs. pcn(4); apparently I
was thinking from the pcn(4) perspective instead of the le(4) one when
writing the former version as le(4) supports a superset of the chips
supported by pcn(4) and not the other way round.
Robert Watson [Thu, 2 Feb 2006 10:32:27 +0000 (10:32 +0000)]
Add audit.4 man page, providing basic documentation for configuring the
kernel audit facility, warnings about the experimental nature of this
implementation, and pointers at a large number of other audit related
man pages.
Oleg Bulyzhin [Thu, 2 Feb 2006 09:58:31 +0000 (09:58 +0000)]
Enable 'complete' rx checksum offloading (i.e. let chip calculate checksums
with pseudo header for tcp/udp packets). This could save one in_pseudo() call
per incoming tcp/udp packet.
Somewhat re-factor the read/write locking mechanism associated with the packet
filtering mechanisms to use the new rwlock(9) locking API:
- Drop the variables stored in the pfil_head structure which were specific to
conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
macros into pfil.h
- Move pfil list locking macros into pfil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
hooks to be run by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
hooks to be run, and returns if not. This check is already performed by the
IP stacks when they call:
if (!PFIL_HOOKED(ph))
goto skip_hooks;
- Add an assertion, for good measure, which makes sure that the number of
hooks never drops below 0. This in theory should never happen, and if it
does then there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
API instead of our home rolled version
- Convert the inlined functions to macros
Reviewed by: mlaier, andre, glebius
Thanks to: jhb for the new locking API
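The hook counter, the PFIL_HOOKED check and the new assertion can be sketched as below. This is an illustration under assumptions rather than the committed code: the structure is reduced to the ph_nhooks field, the helper functions are hypothetical, and the rwlock(9) locking itself is left out.

```c
#include <assert.h>

/* Reduced stand-in for the pfil head; only the hook count is modelled. */
struct pfil_head_sketch {
	int ph_nhooks;		/* number of IN/OUT hooks registered */
};

/* True if at least one hook is registered. */
#define	PFIL_HOOKED(ph)	((ph)->ph_nhooks > 0)

/* Hypothetical bookkeeping done when a hook is registered. */
static void
hook_added(struct pfil_head_sketch *ph)
{
	ph->ph_nhooks++;
}

/* Hypothetical bookkeeping done when a hook is removed. */
static void
hook_removed(struct pfil_head_sketch *ph)
{
	ph->ph_nhooks--;
	/* The count dropping below 0 would indicate a bug elsewhere. */
	assert(ph->ph_nhooks >= 0);
}

/*
 * Caller-side check, as in the IP stacks: skip the hook machinery
 * entirely when nothing is registered. Returns 1 if hooks would run.
 */
static int
input_path(struct pfil_head_sketch *ph)
{
	int ran = 0;

	if (!PFIL_HOOKED(ph))
		goto skip_hooks;
	ran = 1;		/* pfil_run_hooks() would be called here */
skip_hooks:
	return (ran);
}
```

With the check done by the caller, the hook runner itself no longer needs its own early return for the empty case.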
Robert Watson [Thu, 2 Feb 2006 00:37:05 +0000 (00:37 +0000)]
Add new fields to process-related data structures:
- td_ar to struct thread, which holds the in-progress audit record during
a system call.
- p_au to struct proc, which holds per-process audit state, such as the
audit identifier, audit terminal, and process audit masks.
In the earlier implementation, td_ar was added to the zero'd section of
struct thread. In order to facilitate merging to RELENG_6, it has been
moved to the end of the data structure, requiring explicit
initialization in the thread constructor.
Much help from: wsalamon
Obtained from: TrustedBSD Project
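Why a field placed after the zeroed section needs explicit initialization can be shown with a small sketch. The structure layout and all names below are hypothetical; only the idea of a zeroed range that ends before the last field is taken from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced thread structure. */
struct thread_sketch {
	int	 t_keep;	/* preserved across reuse */
	int	 t_zero_a;	/* start of the zeroed section */
	int	 t_zero_b;
	void	*t_ar;		/* appended at the end, NOT in the zeroed section */
};

#define	ZERO_START	offsetof(struct thread_sketch, t_zero_a)
#define	ZERO_END	offsetof(struct thread_sketch, t_ar)

/* Hypothetical constructor for a recycled thread structure. */
static void
thread_ctor_sketch(struct thread_sketch *td)
{
	assert(td != NULL);
	/* Bulk-zero only the designated range. */
	memset((char *)td + ZERO_START, 0, ZERO_END - ZERO_START);
	/* The appended field lies outside that range; clear it by hand. */
	td->t_ar = NULL;
}
```

Appending the field keeps the offsets of existing members unchanged, at the cost of the one explicit assignment in the constructor.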
Robert Watson [Wed, 1 Feb 2006 21:00:16 +0000 (21:00 +0000)]
Add 'options AUDIT' and associate various .c files with the AUDIT
option. We always build audit_syscalls.c so that the system call
stubs can return ENOSYS rather than the system call code
generating SIGSYS for the system calls. We are not yet ready to
add AUDIT to LINT, as the prototypes for system call arguments
won't be there until after the system calls for audit are added.
Much work from: wsalamon
Obtained from: TrustedBSD Project
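The "always build the stubs" arrangement could be sketched like this. The macro AUDIT_SKETCH and the function name are hypothetical stand-ins for the real option and system calls, and the enabled branch is only a placeholder.

```c
#include <assert.h>
#include <errno.h>

#ifdef AUDIT_SKETCH
/* Option enabled: placeholder for the real implementation. */
static int
audit_syscall_sketch(int cmd)
{
	assert(cmd >= 0);
	return (0);
}
#else
/*
 * Option disabled: the entry point still exists, so callers get an
 * ordinary ENOSYS error instead of the process being sent SIGSYS.
 */
static int
audit_syscall_sketch(int cmd)
{
	(void)cmd;
	return (ENOSYS);
}
#endif
```

Compiled without the option, a caller sees an error it can test for and handle.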
Robert Watson [Wed, 1 Feb 2006 20:01:18 +0000 (20:01 +0000)]
Import kernel audit framework:
- Management of audit state on processes.
- Audit system calls to configure process and system audit state.
- Reliable audit record queue implementation, audit_worker kernel
thread to asynchronously store records on disk.
- Audit event argument.
- Internal audit data structure -> BSM audit trail conversion library.
- Audit event pre-selection.
- Audit pseudo-device permitting kernel->user upcalls to notify auditd
of kernel audit events.
Much work by: wsalamon
Obtained from: TrustedBSD Project, Apple Computer, Inc.
Robert Watson [Wed, 1 Feb 2006 19:54:22 +0000 (19:54 +0000)]
Update src/sys/bsm include files to match OpenBSM (albeit with a
couple of FreeBSD-specific modifications that may be merged out
later). These include files define the basic audit data
structures, types, and definitions used by the kernel, or shared
by the kernel and user space.
Obtained from: TrustedBSD Project, Apple Computer, Inc.
John Baldwin [Wed, 1 Feb 2006 15:45:29 +0000 (15:45 +0000)]
Don't add an agp child in vgapci's attach routine if the PCIY_AGP
capability is present as not all devices supported by the agp_i810 driver
(such as i915) have the AGP capability. Instead, add an identify routine
to the agp_i810 driver that uses the PCI ID to determine if it should
create an agp child device.
Oleg Bulyzhin [Wed, 1 Feb 2006 15:16:03 +0000 (15:16 +0000)]
Optimize bge_rxeof() & bge_txeof(): return immediately if there are no packets
to process. It could give us [significant?] performance increase if there is a big
difference between RX/TX flows.
Submitted by: Mihail Balikov <mihail.balikov AT interbgc DOT com>
Approved by: glebius (mentor)
MFC after: 3 days
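The early-return optimization can be illustrated with a generic ring of producer and consumer indices. This is a sketch with hypothetical names, not the bge(4) code; the point is only that the empty case exits before any per-call setup.

```c
#include <assert.h>

/* Hypothetical descriptor ring: indices only, no descriptors. */
struct ring_sketch {
	unsigned int prod;	/* next slot the hardware will fill */
	unsigned int cons;	/* next slot the driver will process */
	unsigned int size;	/* number of slots in the ring */
};

/* Process all pending slots; returns how many were handled. */
static int
ring_eof_sketch(struct ring_sketch *r)
{
	int handled = 0;

	assert(r->size > 0);

	/* Nothing pending: return before doing any setup work. */
	if (r->cons == r->prod)
		return (0);

	/* Per-call setup (for example DMA map syncing) would go here. */
	while (r->cons != r->prod) {
		r->cons = (r->cons + 1) % r->size;
		handled++;
	}
	return (handled);
}
```

When one direction is mostly idle, each interrupt that calls both handlers pays for only one of them.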
Andre Oppermann [Wed, 1 Feb 2006 13:55:03 +0000 (13:55 +0000)]
Move the IPSEC related code blocks to their own file to unclutter
and significantly improve the readability of ip_input() and
ip_output() again.
The resulting IPSEC hooks in ip_input() and ip_output() may be
used later on for making IPSEC loadable.
This move is mostly mechanical and should preserve current IPSEC
behaviour as-is. Nothing shall prevent improvements in the way
IPSEC interacts with the IPv4 stack.
Yaroslav Tykhiy [Wed, 1 Feb 2006 13:04:52 +0000 (13:04 +0000)]
Record the change in vnode_create_vobject() argument size,
which broke kernel ABI to filesystem modules on i386, where
sizeof(size_t) != sizeof(off_t).
Yaroslav Tykhiy [Wed, 1 Feb 2006 12:43:13 +0000 (12:43 +0000)]
Use off_t for file size passed to vnode_create_vobject().
The former type, size_t, was causing truncation to 32 bits on i386,
which immediately led to undersizing of VM objects backed by
files >4GB. In particular, sendfile(2) was broken for such files.
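The truncation can be reproduced on any platform by modelling the 32-bit size_t of i386 with uint32_t. The type and function names below are hypothetical and exist only for the demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for size_t as it is on i386: 32 bits wide. */
typedef uint32_t size32_sketch_t;

/* File size as seen after passing through a 32-bit parameter. */
static uint64_t
size_via_32bit(int64_t file_size)
{
	assert(file_size >= 0);
	return ((size32_sketch_t)file_size);	/* upper 32 bits are lost */
}

/* File size as seen after passing through a 64-bit (off_t-like) parameter. */
static uint64_t
size_via_64bit(int64_t file_size)
{
	assert(file_size >= 0);
	return ((uint64_t)file_size);
}
```

For a 5 GB file the 32-bit path reports 1 GB, so an object sized from it covers only the first fifth of the file.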
Jeff Roberson [Wed, 1 Feb 2006 09:34:32 +0000 (09:34 +0000)]
- Solve a problem where a vput could be called on an outgoing directory
without Giant held. Do this by tracking the vfslocked state for
the directory separately from the child. This is only important
in the case where we cross a mountpoint.
Sponsored by: Isilon Systems, Inc.
MFC After: 3 days