Colin Percival [Tue, 2 May 2006 05:27:30 +0000 (05:27 +0000)]
Teach portsnap to parse the output of the host(1) in BIND 8 as well as
the host(1) from BIND 9. This doesn't matter for HEAD, but will help
people who install portsnap from the ports tree onto older versions of
FreeBSD.
PR: ports/93901
Sponsored by: FreeBSD security development fundraiser
John Baldwin [Mon, 1 May 2006 22:07:00 +0000 (22:07 +0000)]
Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).
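As an illustration of the PAT layout this commit deals with, here is a minimal user-space sketch. It assumes the architectural encoding in which each of the eight PAT entries occupies one byte of the 64-bit MSR. The macro and function names are illustrative and are not taken from the committed headers. It only computes the value; it does not write the MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type encodings used in PAT entries (architectural values). */
#define PAT_UNCACHEABLE      0x00ULL  /* UC  */
#define PAT_WRITE_COMBINING  0x01ULL  /* WC  */
#define PAT_WRITE_THROUGH    0x04ULL  /* WT  */
#define PAT_WRITE_PROTECTED  0x05ULL  /* WP  */
#define PAT_WRITE_BACK       0x06ULL  /* WB  */
#define PAT_UNCACHED         0x07ULL  /* UC- */

/* Each of the eight PAT entries occupies one byte of the MSR. */
#define PAT_VALUE(i, m)  ((uint64_t)(m) << (8 * (i)))
#define PAT_MASK(i)      PAT_VALUE(i, 0xff)

/* Power-on default: WB, WT, UC-, UC, repeated for entries 4..7. */
#define PAT_DEFAULT  0x0007040600070406ULL

/* Return a copy of 'pat' with entry 'idx' (0..7) set to 'type'. */
static uint64_t
pat_set_entry(uint64_t pat, unsigned idx, uint64_t type)
{
	assert(idx < 8);
	pat &= ~PAT_MASK(idx);
	pat |= PAT_VALUE(idx, type);
	return (pat);
}
```

Calling pat_set_entry(PAT_DEFAULT, 2, PAT_WRITE_COMBINING) replaces the UC- entry at index 2 with WC and leaves the other seven entries untouched.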
Robert Watson [Mon, 1 May 2006 21:39:48 +0000 (21:39 +0000)]
Break out socket access control and delivery logic from udp6_input()
into its own function, udp6_append(). This mirrors a similar structure
in udp_input() and udp_append(), and makes the whole thing a lot more
readable.
While here, add missing inpcb locking in UDP6 input path.
John Baldwin [Mon, 1 May 2006 21:36:47 +0000 (21:36 +0000)]
Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction. This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.
Peter Wemm [Mon, 1 May 2006 21:22:38 +0000 (21:22 +0000)]
Using an idea from Stephan Uphoff, use the empty pte's that correspond
to the unused kva in the pv memory block to thread a freelist through.
This allows us to free pages that used to be used for pv entry chunks
since we can now track holes in the kva memory block.
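The general technique of threading a freelist through unused entries can be sketched in user space as below. This is only an analogue under assumed names (pool, slot, NSLOTS). The kernel change stores the links in empty ptes, whereas this sketch stores them in the unused slots of a small array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS   8            /* hypothetical pool size */
#define SLOT_NIL UINT32_MAX   /* end-of-list marker */

/* A slot holds live data when allocated, or the next free index when free. */
union slot {
	uint32_t payload;
	uint32_t next_free;
};

struct pool {
	union slot slots[NSLOTS];
	uint32_t   free_head;
};

/* Link every slot into the freelist in ascending order. */
static void
pool_init(struct pool *p)
{
	for (uint32_t i = 0; i < NSLOTS; i++)
		p->slots[i].next_free = (i + 1 < NSLOTS) ? i + 1 : SLOT_NIL;
	p->free_head = 0;
}

/* Pop the first free slot; returns SLOT_NIL when the pool is exhausted. */
static uint32_t
pool_alloc(struct pool *p)
{
	uint32_t i = p->free_head;

	if (i != SLOT_NIL)
		p->free_head = p->slots[i].next_free;
	return (i);
}

/* Push slot i back; the hole is remembered inside the slot itself. */
static void
pool_free(struct pool *p, uint32_t i)
{
	assert(i < NSLOTS);
	p->slots[i].next_free = p->free_head;
	p->free_head = i;
}
```

Because the link lives in memory that is unused anyway, tracking holes needs no separate allocation, which is the property the commit relies on.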
For some time now, the -i and -P options have been mutually exclusive;
there's even a regression test, init-i-P.t, which asserts this, but it looks
like I forgot to update the nokey.t regression test.
Bruce Evans [Mon, 1 May 2006 07:02:52 +0000 (07:02 +0000)]
For the vmstat sub-display:
vmstat.c:
Move totfr to be under daefr and prcfr since it logically belongs there.
Move all the count fields (wire, act, inact, cache and free) to near
the bottom of the sub-display (after all the rate fields) to reduce
competition with adjoining sub-displays.
systat.1:
Move things as above.
Attempt to improve missing and poor wording in the description of the
fields. The long sentence was hard to parse and didn't say anything
about the different units.
Tim Kientzle [Mon, 1 May 2006 01:02:19 +0000 (01:02 +0000)]
Simplify some of the wide-character handling, inspired
in part by OpenBSD's not-quite-standard-compliant
standard libraries. (No loss of functionality,
just minor recoding to not rely on certain "standard"
facilities that weren't actually needed.)
Bruce Evans [Mon, 1 May 2006 00:26:43 +0000 (00:26 +0000)]
Unbreak the support for 24-row terminals in the vmstat display. The
part that handled the 17th and 18th rows of the vmstat-proper subdisplay
was deleted in rev.1.10 when these rows stopped being used and was not
restored when the 17th row was used again. For such terminals, we now
lose the `buf' field instead of making a mess with it. Terminals with
fewer than 24 rows have never been supported.
The problem is not avoided by using curses since we use the last line
for data entry and don't use a separate subwindow for this line.
Some other things in the vmstat display could be handled better using
subwindows.
Bruce Evans [Sun, 30 Apr 2006 23:52:16 +0000 (23:52 +0000)]
Sort the ex-extended vmstat fields into their documented order in the
output too.
Fine tune all coordinates and most field widths in the vmstat (sub)display
for this and previous changes now that we have to change almost all of them
just to move the ex-extended fields:
- change VMSTATROW back to 7. It was 6 due to a hack in the extended vm
stats changes.
- reduce the maximum field width that we try for from 9 to 8. 4 or 5 is
enough for most fields but we try to use the same width for all fields.
8 is enough to display everything without changing units until memory sizes
exceed 100GB.
Fix some unrelated coordinates and field widths in comments.
Bruce Evans [Sun, 30 Apr 2006 22:34:54 +0000 (22:34 +0000)]
Eliminate the "extended" vm stats. Move all fields in the extended
vm stats to the normal vm stats. Sort them into the normal stats
according to the man page only in the source code so that diffs are
almost readable. Reduce style bugs in printing the value of %ozfod.
Bruce Evans [Sun, 30 Apr 2006 20:31:00 +0000 (20:31 +0000)]
Reduce the namei (sub)display by 5 columns to make enough space for a
new vnstat display to the right of the namei display.
Move the non-vmstat fields {des,num,fre}vn from the vmstat display to a
new vnstat display. Move the dtbuf field there too. The buf and dtbuf
fields are non-vmstat and non-vnstat, so there is no good place to
display them. I need to move at least 1 of them out of the vm stats
for further cleanups of the vm stats, and there is only space for 1
of them in the vn stats. (The best place for the current buf field
is actually /dev/null, since it has been completely broken for about
10 years and broken for longer. It gives an uninteresting virtual
memory count where an interesting real memory count is wanted.)
Bruce Evans [Sun, 30 Apr 2006 09:23:11 +0000 (09:23 +0000)]
Removed the description of the nonexistent want_fd command. want_fd existed
for only 2 weeks in 1998-1999. It was replaced by general commands to
select the set of disk drives displayed.
Bruce Evans [Sun, 30 Apr 2006 09:13:59 +0000 (09:13 +0000)]
Don't redraw the disk names on every update. This was apparently done
to handle changes to the set of disks selected, but it is unnecessary
for that since the whole screen is redrawn when this set is changed.
It was also buggy:
- MAXDRIVES*6 = 42 was hard-coded as only 30 spaces in a string literal,
so the last 2 disk names were not cleared as intended
- when the extended vmstats are active, clearing of even 30 columns
overruns the ozfod value field by 3 columns. This was harmless because
the field is much wider than necessary.
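The first bug above is the usual hazard of a hand-counted string literal. A hedged sketch of the alternative, deriving the blank run from the constants, is shown here. MAXDRIVES is taken as 7 to match MAXDRIVES*6 = 42. DRIVE_WIDTH and the function name are made up for illustration, and systat itself simply stopped redrawing instead.

```c
#include <assert.h>
#include <stdio.h>

#define MAXDRIVES   7   /* matches MAXDRIVES*6 = 42 in the commit message */
#define DRIVE_WIDTH 6   /* columns per disk name (illustrative name) */

/*
 * Write a run of MAXDRIVES*DRIVE_WIDTH blanks into buf, deriving the width
 * from the constants rather than from a hand-counted string literal.
 * Returns the number of blanks written.
 */
static int
blank_drive_names(char *buf, size_t bufsize)
{
	int width = MAXDRIVES * DRIVE_WIDTH;

	assert(bufsize > (size_t)width);
	/* "%*s" with an empty string pads with exactly 'width' spaces. */
	return (snprintf(buf, bufsize, "%*s", width, ""));
}
```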
Bruce Evans [Sun, 30 Apr 2006 07:27:23 +0000 (07:27 +0000)]
Fix "slow (on-the-fly) zero fills percentage (`%slo-z')" some more. The
value printed is actually the optimized (i.e., the non-slow, not-on-the-fly
zero fills percentage) except in overflow cases. Describe it as %ozfod
in the display. Move the field descriptor 1 to the left so that there
is space for 5 characters after the % sign (this leaves no space between
the number and the descriptor but the % character serves well as a
separator).
Fixed integer overflow at z.ozfod = UINT_MAX/100 in the calculation of
%ozfod. This value can be reached just a few hours or minutes after
booting, so %ozfod was usually garbage in boot mode. Now %ozfod is
correct in boot mode for a few days or hours.
Print a non-dummy %ozfod when the division for it isn't division by 0
instead of when the result will be less than 100%. A result of 100%
may be correct, though a result of more than 100% indicates overflow
of one or both counters.
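The overflow described above can be reproduced with a small sketch. The function names are hypothetical, and widening to 64 bits is just one way to avoid the wrap; it is not necessarily the fix that was committed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Naive version: 100 * part is evaluated in unsigned int and wraps
 * once part exceeds UINT_MAX / 100. The caller must pass total != 0.
 */
static unsigned int
pct_naive(unsigned int part, unsigned int total)
{
	assert(total != 0);
	return (100 * part / total);
}

/*
 * Widened version: do the multiplication in 64 bits. Returns -1 when
 * total is 0 so the caller can print a dummy value instead of dividing
 * by zero. A result of exactly 100 is treated as valid.
 */
static int64_t
pct_wide(unsigned int part, unsigned int total)
{
	if (total == 0)
		return (-1);
	return ((int64_t)((uint64_t)part * 100 / total));
}
```

With part = total = UINT_MAX/100 + 1, pct_wide() yields 100 while pct_naive() wraps and yields a small bogus value.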
Bruce Evans [Sun, 30 Apr 2006 04:26:46 +0000 (04:26 +0000)]
Show the load average in the tcp display (it was already shown, perhaps
not very usefully, in all other displays). This was the original point
of the PR.
Move the load average up by 2 so that it starts in row 0 for all windows
(2 lines above it were wasted for all other windows except vmstat).
Move everything below it up by 2 or 3 (3 for icmp and icmp6 which had
an extra blank line due to not compensating for the foot-shooting in
note (3); only ip and ip6 compensated). Reduce the magic numbers related
to this.
Notes by the submitter:
%%%
1. All the subwin() calls are identical using #define MAINWIN_ROW 3
(systat.h).
2. The load average is at the top of the window.
3. Each display starts on the fourth line. I made changes to those
displays that shifted the start line (i.e., icmp). This entailed a
lot of changes within the comments at the top of those displays.
4. For ip6, I shifted the "Input next-header histogram" column down one
row to separate it from "IPv6 Output". I raised "bad scope packets"
and "address selection failed" up one row to stay with "IPv6 Input"
(valid?). They were down one row to probably line up at the bottom,
but I think they should stick with their fellow items in a column.
5. I condensed ifstat a bit. It had a lot of empty rows.
%%%
Bruce Evans [Sun, 30 Apr 2006 01:39:46 +0000 (01:39 +0000)]
Edit the interrupt name strings to shorten them. This is believed to
only affect amd64 and i386. alpha uses "intr N" instead of "irqN" and
mostly has no device names. ia64 uses only device names.
- Edit interrupt names once after they are read from the kernel and not
every time they are displayed.
- Discard bogus trailing spaces so that the next step doesn't move things
to oblivion.
- If an interrupt name starts with "irqN:" (as it usually does on
amd64 and i386), then move "irqN" to the end and strip ":", since we
have no space for the ":" and don't want to start descriptions with
"N" after stripping "irq" in the next step (since "N" would look like
a count). This step may need reworking for interrupt names containing
several device names -- then moving the irq number to the end would
lose it instead of losing some device names.
- Remove "irq" from an interrupt name if and only if the original name is
too long to display.
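The editing steps listed above can be sketched roughly as follows. This is not the systat code. The function name, the buffer sizes and the reading of "original name" as the kernel name with trailing blanks removed are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Edit one interrupt name into 'out'. 'width' is the number of columns
 * available for the description (hypothetical parameter).
 */
static void
edit_intr_name(const char *name, char *out, size_t outsize, size_t width)
{
	char buf[64], irq[16];
	size_t len, n;
	char *rest, *p;

	assert(out != NULL && outsize > 0);

	/* Work on a private copy and drop trailing blanks. */
	snprintf(buf, sizeof(buf), "%s", name);
	len = strlen(buf);
	while (len > 0 && buf[len - 1] == ' ')
		buf[--len] = '\0';

	/* Look for a leading "irqN:" prefix. */
	n = 3;
	if (strncmp(buf, "irq", 3) == 0) {
		while (isdigit((unsigned char)buf[n]))
			n++;
	}
	if (n > 3 && buf[n] == ':') {
		/* Save "irqN", skip ":" and any blanks after it. */
		snprintf(irq, sizeof(irq), "%.*s", (int)n, buf);
		rest = buf + n + 1;
		while (*rest == ' ')
			rest++;
		snprintf(out, outsize, "%s %s", rest, irq);
	} else
		snprintf(out, outsize, "%s", buf);

	/* Only when the trimmed original is too wide, drop the "irq" text. */
	if (len > width && (p = strstr(out, "irq")) != NULL)
		memmove(p, p + 3, strlen(p + 3) + 1);
}
```

For the kernel name "irq1: atkbd0" followed by blank padding, a width of 12 gives "atkbd0 irq1" and a width of 10 gives "atkbd0 1". A name containing several device names would need more care, as the last step of the list notes.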
Bruce Evans [Sun, 30 Apr 2006 00:50:08 +0000 (00:50 +0000)]
Backed out rev.1.49 since it had buffer overruns and only worked
accidentally.
Read buffer overruns:
The size of the target array (TSOTTA == 10) is a wrong limit to use for
scanning the source string.
Write buffer overruns:
TSOTTA is also a wrong limit to use for copying to the target buffer,
since we want to add a NUL terminator afterwards. TSOTTA was also 1
too small for holding both the desired number of visible characters
and the NUL.
Worked accidentally:
There is an error in the algorithm that tends to result in the space saved
by stripping "irq" not actually being used, but some cases worked
accidentally provided "irqN" is near the end of the source string and
"N" is only 1 digit.
Starting with 5.mumble-CURRENT, "irqN" is at the beginning of the
string on all (?) arches that have it and the accidents don't happen.
E.g. on i386's, the keyboard irq is now named
"irq1: atkbd0<bogus blank padding>" by the kernel, and this name was
converted to "1: atkb" -- not only the device number but part of the
device name has been lost --, while before 5.mumble the kernel name
was "atkbd0 irq1" and systat accidentally preserved the irq number to
give "atkbd0 1". The ":" in the string wastes precious space, and
stripping "irq" results in descriptions starting with numbers which
makes them look too much like counts. This commit just fixes the last
problem.
vn_start_write()/vn_finished_write() is not needed here, because
vn_start_write() is always called earlier in the code path and calling
the function recursively may lead to a deadlock.
Bruce Evans [Sat, 29 Apr 2006 21:30:23 +0000 (21:30 +0000)]
Abbreviate long field descriptors at write time so that they don't get
clobbered at runtime:
dirtybuf -> dtbuf
desiredvnodes -> desvn
numvnodes -> numvn
freevnodes -> frevn
The vmstats column has only 5 characters available for descriptors, but up
to 13 were used. The extras get clobbered at runtime by interrupt values
and/or descriptors on systems with more than 12 interrupt sources.
%slo-z -> %sloz
This one is in the "extended" vmstats area and doesn't get clobbered now.
Removed stale documentation of desvn.
Changed a descriptor:
tfree -> totfr
so that it is consistent with the abbreviations for other free counts
(daefr and prcfr) and thus almost decodeable.
Fixed missing documentation of tfree/totfr. This and everything else
in the extended vmstats area is misdocumented as being in a certain
place in the vmstats column.
Gordon Tetlow [Sat, 29 Apr 2006 18:21:43 +0000 (18:21 +0000)]
Add auto upgrade capability to mergemaster.
An mtree description of all non-zero files that make
distribution installs (only size and md5) is built from the
temproot. When the user completes a mergemaster run, the
mtree description file gets installed into /var/db for
safe-keeping.
When the user then decides to do a subsequent upgrade (with
the -U flag), the existing mtree description from /var/db
is called into service looking for files that are different in
DESTDIR. This is stashed away until a file that would normally
end up prompting the user to look at changes is encountered.
Since there are no user modified changes, the new file is
installed without bothering the user.
Add curses ACS line graphics support for iso15 fonts
Now ncurses-based programs such as sysinstall and mc will display the
correct font for graphical lines instead of "-" and "+" characters.
Correct two special characters for cons25l1 in termcap: use real
arrows instead of ">>" and "<<".
Add a lot of additional symbols for line drawing which are taken from
the CP437 font.
Almost all of the ACS symbols are now implemented.
Check the buffer size when copying the line returned by el_gets() into our
own buffer. Interactively typing in long lines (>1023 characters)
previously overflowed the buffer. Unlike the NetBSD people I don't see the
need to subtract 8 from BUFSIZ, so I just used BUFSIZ-1.
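A minimal sketch of this kind of bounded copy follows, assuming a destination of BUFSIZ bytes and a source line whose length is already known. copy_line is a hypothetical helper and not the function that was changed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy at most BUFSIZ-1 bytes of a line of length 'len' into dst (which
 * must hold BUFSIZ bytes) and NUL-terminate it. Longer lines are
 * truncated rather than overflowing. Returns the number of bytes copied.
 */
static size_t
copy_line(char *dst, const char *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	if (len > (size_t)BUFSIZ - 1)
		len = (size_t)BUFSIZ - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';	/* the reserved byte holds the terminator */
	return (len);
}
```

Reserving exactly one byte for the terminator is what BUFSIZ-1 expresses.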
Andrew Thompson [Sat, 29 Apr 2006 05:37:25 +0000 (05:37 +0000)]
Add support for fragmenting ipv4 packets.
The packet filter may reassemble the ip fragments and return a packet that is
larger than the MTU of the sending interface. There is no check for DF or icmp
replies as we can only get a large packet to fragment by reassembling a
previous fragment, and this only happens after a call to pfil(9).
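To make the arithmetic of IPv4 fragmentation concrete, here is a small planning sketch. It assumes a 20-byte header without options and only computes offsets and lengths; it does not touch mbufs. The struct and function names are hypothetical and this is not the committed code.

```c
#include <assert.h>
#include <stddef.h>

#define IP_HDR_LEN 20  /* IPv4 header without options (assumption) */

struct frag {
	size_t off;    /* payload offset in bytes (multiple of 8) */
	size_t len;    /* payload bytes carried by this fragment */
	int    more;   /* 1 if the "more fragments" flag would be set */
};

/*
 * Split 'payload' bytes into fragments that fit 'mtu'. Fills at most
 * 'max' entries of 'out' and returns the number of fragments needed,
 * or 0 if the MTU cannot carry even one 8-byte unit.
 */
static size_t
plan_fragments(size_t payload, size_t mtu, struct frag *out, size_t max)
{
	size_t chunk, off, n;

	assert(out != NULL || max == 0);
	if (mtu < IP_HDR_LEN + 8)
		return (0);
	/* Every fragment except the last carries a multiple of 8 bytes. */
	chunk = (mtu - IP_HDR_LEN) & ~(size_t)7;
	n = 0;
	for (off = 0; off < payload; off += chunk) {
		if (n < max) {
			out[n].off = off;
			out[n].len = (payload - off < chunk) ?
			    payload - off : chunk;
			out[n].more = (off + chunk < payload);
		}
		n++;
	}
	return (n);
}
```

For a reassembled 3000-byte payload and an MTU of 1500, this plans three fragments carrying 1480, 1480 and 40 bytes.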
Robert Watson [Fri, 28 Apr 2006 21:39:57 +0000 (21:39 +0000)]
Also check use_pty in the ptmx clone lookup; this means that when ptmx
support is turned off using the sysctl, we no longer even allow the
ptmx device to be looked up.
Rewrite of puc(4). Significant changes are:
o Properly use rman(9) to manage resources. This eliminates the
need for puc-specific hacks to rman. It also allows devinfo(8)
to be used to find out the specific assignment of resources to
serial/parallel ports.
o Compress the PCI device "database" by optimizing for the common
case and to use a procedural interface to handle the exceptions.
The procedural interface also generalizes the need to set up the
hardware (program chipsets, program clock frequencies).
o Eliminate the need for PUC_FASTINTR. Serdev devices are fast by
default and non-serdev devices are handled by the bus.
o Use the serdev I/F to collect interrupt status and to handle
interrupts across ports in priority order.
o Sync the PCI device configuration to include devices found in
NetBSD and not yet merged to FreeBSD.
o Add support for Quatech 2, 4 and 8 port UARTs.
o Add support for a couple dozen Timedia serial cards as found
in Linux.
John Baldwin [Fri, 28 Apr 2006 20:08:16 +0000 (20:08 +0000)]
The nvidia binary blob sometimes defers tx completion notification to the
OS dependent layer. Thus, the watchdog timer can go off when the tx
engine is working fine but the OS dependent layer just hasn't been called
to clean up finished tx transactions. To work around this, when the watchdog
fires, poke the binary blob to force it to flush any pending tx
completions. If this drops the pending tx count to zero then just return
without logging a message or resetting the chip.
This reportedly fixes the 'device timeout()' errors with at least several
NF4 nve(4) parts.
Submitted by: Nathan Alexander Whitehorn <nathanw@uchicago.edu> (code)
Submitted by: dg (inspiration for comment and explanation)
MFC after: 1 week
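The control flow of this workaround can be modelled in user space as below. Everything here (struct tx_state, flush_tx_completions, the counters) is a hypothetical stand-in. The real driver calls into the binary blob, which this sketch does not attempt to represent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct tx_state {
	int pending;       /* tx requests the OS layer still counts as in flight */
	int hw_completed;  /* completions the lower layer has not yet reported */
	int resets;        /* how many times the chip was reset */
};

/* Hypothetical "poke": make the lower layer report deferred completions. */
static void
flush_tx_completions(struct tx_state *st)
{
	assert(st->hw_completed <= st->pending);
	st->pending -= st->hw_completed;
	st->hw_completed = 0;
}

/* Returns true if a real timeout was handled, false on a false alarm. */
static bool
watchdog(struct tx_state *st)
{
	flush_tx_completions(st);
	if (st->pending == 0)
		return (false);	/* engine was fine; no message, no reset */
	st->resets++;		/* genuine stall: log and reset the chip */
	st->pending = 0;
	return (true);
}
```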
Sam Leffler [Fri, 28 Apr 2006 19:06:15 +0000 (19:06 +0000)]
Ensure outbound data packets in hostap mode are delivered only to
stations that are associated by making ieee80211_find_txnode return
NULL when a unicast frame is to be delivered to an unassociated
station. This will be handled differently in the future but for
now putting the check here allows all drivers to immediately do
the right thing.
Peter Wemm [Fri, 28 Apr 2006 19:05:08 +0000 (19:05 +0000)]
Interim fix for pmap problems I introduced with my last commit.
Remove the code to dynamically change the pv_entry limits. Go back
to a single fixed kva reservation for pv entries, like was done
before when using the uma zone. Go back to never freeing pages
back to the free pool after they are no longer used, just like
before.
This stops the lock order reversal due to acquiring the kernel map
lock while pmap was locked.
This fixes the recursive panic if invariants are enabled.
The problem was that allocating/freeing kva causes vm_map_entry
nodes to be allocated/freed. That can recurse back into pmap as
new pages are hooked up to kvm and hence all the problem.
Allocating/freeing kva indirectly allocates/frees memory.
So, by going back to a single fixed size kva block and an index,
we avoid the recursion panics and the LOR.
The problem is that now with a linear block of kva, we have no
mechanism to track holes once pages are freed. UMA has the same
problem when using custom object for a zone and a fixed reservation
of kva. Simple solutions like having a bitmap would work, but would
be very inefficient when there are hundreds of thousands of bits
in the map. A first-free pointer is similarly flawed because pages
can be freed at random and the first-free pointer would be rewinding
huge amounts. If we could allocate memory for tree structures or
an external freelist, that would work. Except we cannot allocate/free
memory here because we cannot allocate/free address space to use
it in. Anyway, my change here reverts back to the UMA behavior of
not freeing pages for now, thereby avoiding holes in the map.
ups@ had a truly evil idea that I'll investigate. It should allow
freeing unused pages again by giving us a no-cost way to track the
holes in the kva block. But in the meantime, this should get people
booting with witness and/or invariants again.
Footnote: amd64 doesn't have this problem because of the direct map
access method. I'd done all my witness/invariants testing there. I'd
never considered that the harmless-looking kmem_alloc/kmem_free calls
would cause such a problem and it didn't show up on the boot test.
- Don't hold the device sx lock when going to sleep.
- Prevent possible live-lock in case of memory problems by freeing
already completed requests first.
Reported and tested by: markus, Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 1 day
- Remove dead code.
- Comment possible event miss, which isn't critical, but probably can be
fixed by replacing the event lock usage with the queue lock.
Andrew Thompson [Fri, 28 Apr 2006 11:48:53 +0000 (11:48 +0000)]
- use ath(4) in the wireless examples rather than the aging wi(4)
- make the packet filtering its own section and clarify a few points
- note that the interfaces need to be upped [1]
Robert Watson [Fri, 28 Apr 2006 10:45:27 +0000 (10:45 +0000)]
Add a basic man page for the sysctl(9) macro interfaces. Previously man
pages existed only for the dynamic sysctl interfaces. There's probably
more complete and accurate content, better advice, etc, that could be added
here.
Per scottl's suggestion, add a small piece of moralizing text regarding the
fact that sysctl names quickly get embedded in system configuration files,
libraries, third party applications, and even books, so renaming and
removing names after they've been published is a tricky issue.
Mike Silbersack [Fri, 28 Apr 2006 05:27:27 +0000 (05:27 +0000)]
Switch all bus_dmamap_sync calls that used PREREAD to PREWRITE and all
POSTWRITE to POSTREAD.
No guarantee that all busdma usage is perfect, but this change (in
addition to scott's last two commits) makes if_bfe work with > 1GB of
memory in my laptop.
Add some incomplete support for Marvell Yukon EC controllers based on
OpenBSD changes. With these changes, the PHY part of the driver becomes
functional (it senses media changes and negotiates speed just fine;
previously it just hung with no PHY message), but no data goes through
the interface (the error message is "can not stop transfer of Tx/Rx
descriptor"). Hopefully somebody with more clue/free time will be able to
pick up after me.
Jeff Roberson [Fri, 28 Apr 2006 01:05:31 +0000 (01:05 +0000)]
- Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child
buffers to go on the buf daemon's DIRTYGIANT queue.
- Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
runs in the context of the buf daemon and may require Giant.
Jeff Roberson [Fri, 28 Apr 2006 00:59:48 +0000 (00:59 +0000)]
- Consistently track ni_dvp and ni_vp with dvfslocked and vfslocked rather
than trying to optimize it into a single lock. This adds more calls to
lock giant with non smpsafe filesystems but is the only way to reliably
hold the correct lock.
- Remove an invalid assert in the mountedhere case in lookup and fix the
code to properly deal with the scenario. We can actually have a lookup
that returns dp == dvp with mountedhere set with certain unmount races.