Add a simple C program to check mmap calls to various different addresses.
The most important test is the mapping fixed at address 0 depending on the
new sysctl.
Things will be updated and possibly converted to m4/.t style once the
details about the kernel patch will be shaken out.
simon [Sun, 27 Sep 2009 21:01:07 +0000 (21:01 +0000)]
- When we run our trap cleanup handler, echo that we are running this
handler to make it more clear why we are 'suddenly' running df,
umount, and mdconfig.
- Remove trap handler again after we have unconfigured the memory
device etc. Before we could end up running the trap handler if a
later stage failed, which was a bit confusing and not really useful.
ed [Sun, 27 Sep 2009 18:19:41 +0000 (18:19 +0000)]
Add support for VT200-style mouse input.
Right now if applications want to use the mouse on the command line,
they use sysmouse(4) and install a signal handler in the kernel to
deliver signals when mouse events arrive. This conflicts with my plan to
change to TERM=xterm, so implement proper VT200-style mouse input.
Because mouse input is now streamed through the TTY, it means you can
now SSH to another system on the console and use the mouse there as
well. The disadvantage of the VT200 mouse protocol, is that it doesn't
seem to generate events when moving the cursor. Only when pressing and
releasing mouse buttons.
There are different protocols as well, but this one seems to be most
commonly supported.
Reported by: Paul B. Mahol <onemda gmail com>
Tested with: vim(1)
simon [Sun, 27 Sep 2009 14:49:51 +0000 (14:49 +0000)]
Do not allow mmap with the MAP_FIXED argument to map at address zero.
This is done to make it harder to exploit kernel NULL pointer security
vulnerabilities. While this of course does not fix vulnerabilities,
it does mitigate their impact.
Note that this may break some applications, most likely emulators or
similar, which for one reason or another require mapping memory at
zero.
This restriction can be disabled with the security.bsd.mmap_zero
sysctl variable.
des [Sat, 26 Sep 2009 23:05:01 +0000 (23:05 +0000)]
printerr_reply() has never been used for as long as we've had this code in
our tree (13+ years). This is an excellent argument for aggressive use
of "static".
Fix several logic bugs in the previous IPv6 variable change and
re-add $ipv6_enable support for backward compatibility. From
UPDATING:
1. To use IPv6, simply define $ifconfig_IF_ipv6 like $ifconfig_IF
for IPv4. For aliases, $ifconfig_IF_aliasN should be used.
Note that both variables need the "inet6" keyword at the head.
Do not set $ipv6_network_interfaces manually if you do not
understand what you are doing. It is not needed in most cases.
$ipv6_ifconfig_IF and $ipv6_ifconfig_IF_aliasN still work, but
they are obsolete.
2. $ipv6_enable is obsolete. Use $ipv6_prefer and/or
"inet6 accept_rtadv" keyword in ifconfig(8) instead.
If you define $ipv6_enable=YES, it means $ipv6_prefer=YES and
all configured interfaces have "inet6 accept_rtadv" in the
$ifconfig_IF_ipv6. These are for backward compatibility.
3. A new variable $ipv6_prefer has been added. If NO, IPv6
functionality of interfaces with no corresponding
$ifconfig_IF_ipv6 is disabled by using "inet6 ifdisabled" flag,
and the default address selection policy of ip6addrctl(8)
is the IPv4-preferred one (see rc.d/ip6addrctl for more details).
Note that if you want to configure IPv6 functionality on the
disabled interfaces after boot, first you need to clear the flag by
using ifconfig(8) like:
ifconfig em0 inet6 -ifdisabled
If YES, the default address selection policy is set as
IPv6-preferred.
The default value of $ipv6_prefer is NO.
4. If your system need to receive Router Advertisement messages,
define "inet6 accept_rtadv" in $ifconfig_IF_ipv6. The rc(8)
scripts automatically invoke rtsol(8) when the interface becomes
UP. The Router Advertisement messages are used for SLAAC
(State-Less Address AutoConfiguration).
Make malloc(3) superpage aware. Specifically, if getpagesizes(3) returns
a large page size that is greater than malloc(3)'s default chunk size but
less than or equal to 4 MB, then increase the chunk size to match the large
page size.
Most often, using a chunk size that is less than the large page size is not
a problem. However, consider a long-running application that allocates and
frees significant amounts of memory. In particular, it frees enough memory
at times that some of that memory is munmap()ed. Up until the first
munmap(), a 1MB chunk size is just fine; it's not a problem for the virtual
memory system. Two adjacent 1MB chunks that are aligned on a 2MB boundary
will be promoted automatically to a superpage even though they were
allocated at different times. The trouble begins with the munmap(),
releasing a 1MB chunk will trigger the demotion of the containing superpage,
leaving behind a half-used 2MB reservation. Now comes the real problem.
Unfortunately, when the application needs to allocate more memory, and it
recycles the previously munmap()ed address range, the implementation of
mmap() won't be able to reuse the reservation. Basically, the coalescing
rules in the virtual memory system don't allow this new range to combine
with its neighbor. The effect being that superpage promotion will not
reoccur for this range of addresses until both 1MB chunks are freed at some
point in the future.
ed [Sat, 26 Sep 2009 15:26:32 +0000 (15:26 +0000)]
Add 256 color support.
It is quite inconvenient that if an application for xterm uses 256 color
mode, text suddenly starts to blink (because of ;5; in the middle).
We'd better just implement 256 color mode and add a conversion routine
from 256 to 8 color mode, which doesn't seem to be too bad in practice.
Remapping colors is done quite simple. If one of the channels is most
actively represented, primary colors are used. If two channels are most
actively represented, secondary colors are used. If all three channels
are equal (gray), it picks between black and white.
ed [Sat, 26 Sep 2009 15:07:11 +0000 (15:07 +0000)]
Properly get out of origin mode if the cursor has to move outside of it.
In some cases events may occur that move the cursor outside the
scrolling region while in origin mode, which is normally not possible.
Events like these include:
lindev(4) [1] is supposed to be a collection of linux-specific pseudo
devices that we also support, just not by default (thus only LINT or
module builds by default).
While currently there is only "/dev/full" [2], we are planning to see more
in the future. We may decide to change the module/dependency logic in the
future should the list grow too long.
This is not part of linux.ko as also non-linux binaries like kFreeBSD
userland or ports can make use of this as well.
Handle cases where virtual (GFS) vnodes are referenced when doing forced
unmount. In that case we cannot depend on the proper order of invalidating
vnodes, so we have to free resources when we have a chance.
PR: kern/139062
Reported by: trasz
MFC after: 3 days
- Don't depend on value returned by gfs_*_inactive(), it doesn't work
well with forced unmounts when GFS vnodes are referenced.
- Make other preparations to GFS for forced unmounts.
PR: kern/139062
Reported by: trasz
MFC after: 3 days
- Use x86bios_offset() instead of BIOS_PADDRTOVADDR() macro.[1]
- Clear all registers before calling real mode interrupt handlers as we did
for dpms and vesa and re-enable the function as it should be fixed by this.
- Tidy up register access. For example, when we call INT 0x15, AH=0xc0,
we used to initialize AX=0xc000 to clear AL at the same time but it is
very confusing. We don't have to do this any more because we are explicitly
clearing all registers now.
- Check size of system configuration table although it is almost always 8.
This is to make sure we are not reading some random low physical memory.
Hopefully it is just zero in that case. :-)
- Fix some style nits and add more comments.
Reject some VESA graphics modes if the controller does not have enough
memory to support them. Some adapters have expansible memory slots but
video mode table is static. In this case, unusable modes may be reported.
- Reduce BIOS memory mapping. We want 1MB of physical memory, not 12MB[1].
- Remove CS and IP registers from x86bios.h. They have no use for us.
- Adjust register dump to make it little bit more useful for debugging.
ed [Fri, 25 Sep 2009 11:58:51 +0000 (11:58 +0000)]
Conformance: ignore {delete,insert} line while outside the scrolling region.
I noticed a small inconsistency in delete and insert line between xterm
and libteken. libteken allows these actions to happen while the cursor
is placed outside the scrolling region, while xterm does not.
This behaviour seems to be VT100-like. Confirmation:
http://www.vt100.net/docs/vt102-ug/chapter5.html
"This sequence is ignored when cursor is outside scrolling region."
In function do_rw_wrlock, when a writer got an error and before returning,
check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag,
and resume the readers. The error must be EAGAIN, otherwise there must
have memory problem, and nobody can rescue the buggy application.
ed [Thu, 24 Sep 2009 20:33:14 +0000 (20:33 +0000)]
Make SCS work in 8-bit mode.
This means we can finally do things like VT100 box drawing when using
Syscons (8-bit characters). As far as I know, the only remaining issue
is the absense of proper escape sequences for special keyboard
characters (cursor, F1 to F12, etc) and xterm emulation should be ready
for general use.
Enabling xterm would have the following advantages:
- Easier possible migration to Unicode. cons25 termcap entries are very
8-bit centric. They use things like CP437 characters for box drawing,
etc.
- Better support for SSH'ing to other operating systems/devices. Most
switches use VT100-style admin interfaces.
- Reduced bandwidth, because applications can now use things like
scrolling regions.
- You can finally use applications like dtach(1) on both the console and
inside an xterm.
Some broken VESA BIOSes, e.g., IBM T23, return wrong value from
vesa_bios_get_line_length() in graphics mode. Always calculate the value
from known video info instead.
Align Tx/Rx descriptors on 32 bytes boundary instead of PAGE_SIZE.
Also align setup descriptor on 32 bytes boundary. Tx buffer have no
alignment limitation so create dmamap without alignment
restriction[1]. Rx buffer still seems to require 4 bytes alignment
limitation but we can simply use MCLBYTES for size to map the
buffer instead of TULIP_DATA_PER_DESC as the buffer is allocated
with m_getcl(9).
de(4) supports up to TULIP_MAX_TXSEG segments for Tx buffers,
increase maximum dma segment size to TULIP_MAX_TXSEG * MCLBYTES.
While I'm here remove TULIP_DATA_PER_DESC as it is not used anymore.
This should fix de(4) breakage introduced after r176206.
Do not call BUS_DRIVER_ADDED() for detached buses (attach failed) on
driver load. This fixes crash on atapicam module load on systems,
where some ata channels (usually ata1) was probed, but failed to attach.
Before calling vflush(FORCECLOSE) mark file system as unmounted so the
following vnops will fail. This is very important, because without this change
vnode could be reclaimed at any point, even if we increased usecount. The only
way to ensure that vnode won't be reclaimed was to lock it, which would be very
hard to do in ZFS without changing a lot of code. With this change simply
increasing usecount is enough to be sure vnode won't be reclaimed from under
us. To be precise it can still be reclaimed but we won't be able to see it,
because every try to enter ZFS through VFS will result in EIO.
The only function that cannot return EIO, because it is needed for vflush() is
zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks
z_teardown_lock and never returns EIO.
Close race in zfs_zget(). We have to increase usecount first and then
check for VI_DOOMED flag. Before this change vnode could be reclaimed
between checking for the flag and increasing usecount.
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
unsigned before comparison with maximum value to cut off negative
values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
values are already checked in the subsequent switch
rwlock implemented from libthr need to fall through the 'hard path' and
query umtx also if the shared waiters bit is set on a shared lock.
The writer starvation avoidance technique, infact, can lead to shared
waiters on a shared lock which can bring to a missed wakeup and thus
to a deadlock if the right bit is not checked (a notable case is the
writers counterpart to be handled through expired timeouts).
Fix that by checking for the shared waiters bit also when unlocking the
shared locks.
That bug was causing a reported MySQL deadlock.
Many thanks go to Nick Esborn and his employer DesertNet which provided
time and machines to identify and fix this issue.
PR: thread/135673
Reported by: Nick Esborn <nick at desert dot net>
Tested by: Nick Esborn <nick at desert dot net>
Reviewed by: jeff
- Use spin lock instead of default mutex for safety. INT/IRET instructions
save/clear/restore flags but emulated flags have no effect on the host.
I believe BIOS writers never meant to run their code in emulated
environment with interrupt enabled. :-)
- Use memcpy(3) instead of copying individual members. I believe struct
x86regs was intentionally copied from the first half of struct x86emu_regs
for this very purpose.
- Fix some style nits and consistencies.
Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
mapping the RSDT or XSDT and searching for a table with a given signature
out into acpica_machdep.c for both amd64 and i386.
Update 802.11s mesh support to draft 3.03. This includes a revised frame
format for peering and changes to the PERR frames.
Note that this is incompatible with the previous code.
- Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386. This fixes a bug on amd64 where overlapping
entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
instead of an append to support systems with out-of-order SMAP entries.
PR: amd64/138220
Reported by: James R. Van Artsdalen james of jrv org
MFC after: 3 days
Don't reread the command register to see if enabling I/O or memory
decoding "took". Other OS's that I checked do not do this and it breaks
some amdpm(4) devices. Prior to 7.2 we did not honor the error returned
when this failed anyway, so this in effect restores previous behavior.
PR: kern/137668
Tested by: Aurelien Mere aurelien.mere amc-os.com
MFC after: 3 days
marius [Tue, 22 Sep 2009 11:47:21 +0000 (11:47 +0000)]
- Add missing bus_dmamap_sync(9) calls for the work DMA map. Previously
the work area was totally unsynchronized which means this driver only
had a chance of working on x86 when no bounce buffers were involved,
which isn't that likely given that support for 64-bit DMA is currently
broken throughout ata(4).
- Add necessary little-endian conversion of accesses to the work area,
making this driver work on big-endian hosts. While at it, use the
alignment-agnostic byte order encoders in order to be on the safe side.
- Clear the reserved member of the SG list entries in order to be on the
safe side. [1]
marius [Tue, 22 Sep 2009 11:38:45 +0000 (11:38 +0000)]
- According to Linux, the ALi M5451 can do 31-bit DMA instead of just
30-bit like the reset of the controllers supported by this driver.
Actually ALi M5451 can be setup up to generate 32-bit addresses by
setting the 31st bit via the accompanying ISA bridge, which allows
it to work in sparc64 machines whose IOMMU require at least 32-bit
DMA. Even though other architectures would also benefit from 32-bit
DMA, enabling this bit is limited to sparc64 as bus_dma(9) doesn't
generally guarantee that a low address of BUS_SPACE_MAXADDR_32BIT
results in a buffer in the 32-bit range.
- According to Tatsuo YOKOGAWA's ali(4), the the DMA transfer size of
ALi M5451 is fixed to 64k and in fact using the default size of 4k
- The 4DWAVE DX and NX require the recording buffer to be 8-byte
aligned so adjust the bus_dma_tag_create(9) accordingly.
- Unlike the rest of the controllers supported by this driver, the
ALi M5451 only has 32 hardware channels instead of 64 so limit the
loop in tr_intr() accordingly. [1]
Submitted by: yongari [1]
Reviewed by: yongari (superset of what is committed)
MFC after: 3 days
ed [Tue, 22 Sep 2009 11:29:11 +0000 (11:29 +0000)]
Use an unsigned integer for storing the key code.
It seems Clang breaks when checking for SPCLKEY, which is now
0x80000000. Using an unsigned integer fixes this. This is also
consistent with other pieces of kbd/syscons code, because these also use
u_int.
Build x86bios only for i386/amd64 for now. More work is required
to make these functional on other architectures, and the current
code breaks sparc64 and powerpc.
Improve mxge watchdog routine's ability to reliably reset a failed NIC:
- Mark the link as down, so if watchdog reset fails, link watching
failover software can notice it
- Don't send MXGEFW_CMD_ETHERNET_DOWN if the NIC has been reset, it is
not needed, and will fail on a freshly reset NIC.
- Ensure the transmit routines aren't attempting to PIO write to doorbells
while the NIC is being reset.
- Download the correct f/w, rather than using the EEPROM f/w after reset.
- Export a count of the number of watchdog resets via sysctl
- Zero all f/w stats at reset. This will lead to less confusing
diagnostic output when investigating NIC failures.