Submitted to and merged from NetBSD (rev. 1.23 and 1.24):
- Don't delete the current line when typing `yy'.
- Don't use a possibly stale pointer in cv_paste().
-
Tor Egge [Wed, 10 Aug 2005 00:17:36 +0000 (00:17 +0000)]
Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE
due to the vm object being locked.
When a process writes large amounts of data to a file, the vm object associated
with that file can contain most of the physical pages on the machine. If the
process is preempted while holding the lock on the vm object, pagedaemon would
be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE
to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to
other vm objects.
Temporarily unlock the page queues lock while locking vm objects to avoid lock
order violation. Detect and handle relevant page queue changes.
This change depends on both the lock portion of struct vm_object and normal
struct vm_page being type stable.
John Baldwin [Tue, 9 Aug 2005 21:53:26 +0000 (21:53 +0000)]
Call tulip_start() rather than tulip_ifstart() from the interrupt handler
to avoid recursing on the driver lock. Not sure why my test box didn't
catch this earlier.
Robert Watson [Tue, 9 Aug 2005 17:19:21 +0000 (17:19 +0000)]
Add helper function ip_findmoptions(), which accepts an inpcb, and attempts
to atomically return either an existing set of IP multicast options for the
PCB, or a newlly allocated set with default values. The inpcb is returned
locked. This function may sleep.
Call ip_moptions() to acquire a reference to a PCB's socket options, and
perform the update of the options while holding the PCB lock. Release the
lock before returning.
Remove garbage collection of multicast options when values return to the
default, as this complicates locking substantially. Most applications
allocate a socket either to be multicast, or not, and don't tend to keep
around sockets that have previously been used for multicast, then used for
unicast.
This closes a number of race conditions involving multiple threads or
processes modifying the IP multicast state of a socket simultaenously.
Robert Watson [Tue, 9 Aug 2005 13:27:50 +0000 (13:27 +0000)]
Add an order between UDP inpcb locks and the IPv4 multicast address
list lock, as there has been a report that an alternative lock order
is getting introduced. This should help ferret it out.
Reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
Robert Watson [Tue, 9 Aug 2005 12:56:20 +0000 (12:56 +0000)]
For each interface flag, indicate whether or not it is owned by the
device driver, owned by the network stack, or initialized by the device
driver before attach and read-only from then on.
Not all device drivers and network stack components currently follow
these rules, especially with respect to IFF_UP, and a few exceptions
with IFF_ALLMULTI.
Robert Watson [Tue, 9 Aug 2005 10:20:02 +0000 (10:20 +0000)]
Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Robert Watson [Tue, 9 Aug 2005 10:16:17 +0000 (10:16 +0000)]
Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.
Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.
When exposing the interface flags to user space, via interface ioctls
or routing sockets, or the two fields together. Since the driver flags
cannot currently be set for user space, no new logic is currently
required to handle this case.
Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.
With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.
Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.
Gleb Smirnoff [Tue, 9 Aug 2005 08:37:28 +0000 (08:37 +0000)]
- Use 'error' variable to store error value, instead of 'i'.
- Push 'i' into the only block where it is used.
- Remove redundant check for rt being NULL. If rt_check() hasn't
returned an error, then rt is valid.
Colin Percival [Tue, 9 Aug 2005 03:32:29 +0000 (03:32 +0000)]
When parsing the HTTP_PROXY environment variable, strip a trailing /
from the port number (if any exists). This unbreaks
env HTTP_PROXY="http://localhost:3128/" portsnap fetch
While I'm here, list both the host and the port in the error message
output if getaddrinfo() fails, since either of them could be responsible
for the failure.
Andrew Thompson [Mon, 8 Aug 2005 22:21:55 +0000 (22:21 +0000)]
Use m_copypacket() which is an optimization of the common case
m_copym(m, 0, M_COPYALL, how).
This is required for strict alignment architectures where we align the IP
header in the input path but m_copym() will create an unaligned copy in
bridge_broadcast(). m_copypacket() preserves alignment of the first mbuf.
Noticed by: Petri Simolin
Approved by: mlaier (mentor)
MFC after: 3 days
Drop in a WITNESS_WARN into SYSCTL_IN to make sure that we are
not holding any non-sleep-able-locks locks when copyin is called.
This gets executed un-conditionally since we have no function
to wire the buffer in this direction.
John Baldwin [Mon, 8 Aug 2005 21:03:54 +0000 (21:03 +0000)]
- Use callout_init_mtx() to close a small race between callout_stop() and
the timeout routine.
- Fix locking in detach.
- Add locking in shutdown.
- Don't mess with the PCI command register in resume, the PCI bus driver
already does this for us.
- Add locking to the non-serial ifmedia routines.
- Fix locking in ioctl.
- Remove spls and support for 4.x.
Colin Percival [Mon, 8 Aug 2005 20:10:06 +0000 (20:10 +0000)]
Add portsnap to the base system. This is a secure, easy to use,
fast, lightweight, and generally good way for users to keep their
ports trees up to date.
This is version 0.9.4 from the ports tree (sysutils/portsnap) with
the following changes:
1. The experimental pipelined http code is enabled. No seatbelts
in -CURRENT. (^_^)
2. The working directory has moved from /usr/local/portsnap to
/var/db/portsnap (as discussed on -arch two days ago).
3. Portsnap now fetches a list of mirrors (distributed as DNS SRV
records) and selects one randomly. This should help to avoid the
uneven loading which plagues the cvsup mirror network.
4. The license is now 2-clause BSD instead of 3-clause BSD.
5. Various incidental changes to make portsnap fit into the base
system's build mechanics.
X-MFC-After: 6.0-RELEASE
X-MFC-Before: 5.5-RELEASE X-MFC-To: RELENG_6, RELENG_5, ports
discussed on: -arch and several other places
"yes please" from: simon, remko, flz, Diane Bruce
thinks this is a great idea: bsdimp
Hopes he didn't forget any files: cperciva
Robert Watson [Mon, 8 Aug 2005 19:55:32 +0000 (19:55 +0000)]
Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.
This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.
Ha! This is a very interesting bug.
I copied strcasecmp() from userland to the kernel and it didn't worked!
I started to debug the problem and I find out that this line:
while (tolower(*us1) == tolower(*us2++)) {
was adding _3_ bytes to 'us2' pointer. Am I loosing my minds here?!...
No, in-kernel tolower() is a macro which uses its argument three times.
Bad tolower(9), no cookie.
Check to see if we wired the user-supplied buffers in SYSCTL_OUT, if
the buffer has not been wired and we are holding any non-sleep-able locks,
drop a witness warning. If the buffer has not been wired, it is possible
that the writing of the data can sleep, especially if the page is not in
memory. This can result in a number of different locking issues, including
dead locks.
Sam Leffler [Mon, 8 Aug 2005 18:46:36 +0000 (18:46 +0000)]
Split crypto tx+rx key indices and add a key index -> node mapping table:
Crypto changes:
o change driver/net80211 key_alloc api to return tx+rx key indices; a
driver can leave the rx key index set to IEEE80211_KEYIX_NONE or set
it to be the same as the tx key index (the former disables use of
the key index in building the keyix->node mapping table and is the
default setup for naive drivers by null_key_alloc)
o add cs_max_keyid to crypto state to specify the max h/w key index a
driver will return; this is used to allocate the key index mapping
table and to bounds check table loookups
o while here introduce ieee80211_keyix (finally) for the type of a h/w
key index
o change crypto notifiers for rx failures to pass the rx key index up
as appropriate (michael failure, replay, etc.)
Node table changes:
o optionally allocate a h/w key index to node mapping table for the
station table using the max key index setting supplied by drivers
(note the scan table does not get a map)
o defer node table allocation to lateattach so the driver has a chance
to set the max key id to size the key index map
o while here also defer the aid bitmap allocation
o add new ieee80211_find_rxnode_withkey api to find a sta/node entry
on frame receive with an optional h/w key index to use in checking
mapping table; also updates the map if it does a hash lookup and the
found node has a rx key index set in the unicast key; note this work
is separated from the old ieee80211_find_rxnode call so drivers do
not need to be aware of the new mechanism
o move some node table manipulation under the node table lock to close
a race on node delete
o add ieee80211_node_delucastkey to do the dirty work of deleting
unicast key state for a node (deletes any key and handles key map
references)
Ath driver:
o nuke private sc_keyixmap mechansim in favor of net80211 support
o update key alloc api
These changes close several race conditions for the ath driver operating
in ap mode. Other drivers should see no change. Station mode operation
for ath no longer uses the key index map but performance tests show no
noticeable change and this will be fixed when the scan table is eliminated
with the new scanning support.
Robert Watson [Mon, 8 Aug 2005 16:09:33 +0000 (16:09 +0000)]
Insert a series of place-holder function pointers in mac_policy.h for
entry points that will be inserted over the life-time of the 6.x branch,
including for:
- New struct file labeling (void * already added to struct file), events,
access control checks.
- Additional struct mount access control checks, internalization/
externalization.
- mac_check_cap()
- System call enter/exit check and event.
- Socket and vnode ioctl entry points.
David Xu [Mon, 8 Aug 2005 14:20:10 +0000 (14:20 +0000)]
Try best to keep a preempted thread at front of run queue, this seems
improved performance a bit for some workloads, but still seeing interactive
lagging unless cpu idling race is fixed.
Scott Long [Mon, 8 Aug 2005 12:16:21 +0000 (12:16 +0000)]
Complete the removal of __FreeBSD_version checks from the amr driver. The
driver had advanced enough over the years that direct sharing of code with
FreeBSD 4.x was in no way possible anymore.
Sam Leffler [Mon, 8 Aug 2005 03:30:57 +0000 (03:30 +0000)]
Cleanup beacon/listen interval handling:
o separate configured beacon interval from listen interval; this
avoids potential use of one value for the other (e.g. setting
powersavesleep to 0 clobbers the beacon interval used in hostap
or ibss mode)
o bounds check the beacon interval received in probe response and
beacon frames and drop frames with bogus settings; not clear
if we should instead clamp the value as any alteration would
result in mismatched sta+ap configuration and probably be more
confusing (don't want to log to the console but perhaps ok with
rate limiting)
o while here up max beacon interval to reflect WiFi standard
Noticed by: Martin <nakal@nurfuerspam.de>
MFC after: 1 week
Skip jails which are already running and inform why.
We're checking for /var/run/jail_<name>.id file and if it exists, we don't
start the jail. It should be also safe in case of reboot(8), because
rc.d/cleanvar script is going to remove /var/run/jail_* files.
It helps to avoid potential mess when the same jail is started twice,
because of an administrator mistake (been there, done that).
We don't need to skip /var/run/log socket, as syslogd is always started
after rc.d/cleanvar. And if we wanted to skip /var/run/log we still needed
to skip /var/run/logpriv, which wasn't implemented.
Alan Cox [Sun, 7 Aug 2005 22:00:47 +0000 (22:00 +0000)]
When support for 2MB/4MB pages was added in revision 1.148 an error was
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.
The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.
Sync libedit with recent NetBSD developments. Including improvements to the
vi-mode, removal of clause 3, cleanups and the export of the tokenization
functions.
Revert the replacement of realloc() with reallocf() (el.h:1.2, map.c:1.5 and
tokenizer.c:1.3). Contrary to the commit log there were no memory leaks,
but the change introduced a bug because the free'd pointer was not zeroed
and calling the appropriate _end() function would call free() a second time.
Peter Grehan [Sun, 7 Aug 2005 02:20:35 +0000 (02:20 +0000)]
Export a routine, kobj_machdep_init(), that allows platforms
to use the kobj subsystem as soon at mutex_init() has been called
instead of having to wait for the SI_SUB_LOCK sysinit.
Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides for sufficiently finei-grained locking to make the
ssolution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a seperate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check it's queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.
With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.
The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably/good scalable SMP support.
Reduce the default MAXCPU from 16 to 4. This is in preparation of
allocating a VHPT per CPU. Since we don't yet know how many CPUs
are actually in the system at the time we need to allocate the
VHPTs, we allocate for MAXCPU processors. This can result in a
lot of wasted space for 2-way machines. So, for now, limit MAXCPU
to something smaller until we have something more dynamic.
For ia64_ptc_{e,g,ga,l}(), use instruction serialization. We
typically don't know what the TLB described and need to assume
that it affects the fetching of instructions.