PR SCTP Bugs. Basically a full sized frame of
PR SCTP FWD-TSN's would not be sent and thus
cause a stalled connection. Also the rwnd
Calculation was also off on the receiver side for
PR-SCTP.
MFC after: 1 month
The emulation of 'ld' and 'sd' instructions only works for ABIs that support
64-bit registers and the instructions 'ldl' and 'ldr' that operate on those
registers.
o) Subtract 64K from the default userland stack pointer. GCC generate code
that with a 32-bit ABI on a system with 64-bit registers can attempt to
access an invalid (well, kernel) memory address rather than the intended
user address for stack-relative loads and stores. Lowering the stack
pointer works around this. [1]
o) Make TRAP_DEBUG code conditional on the trap_debug variable. Make
trap_debug default to 0 instead of 1 now but make it possible to change it
at runtime using sysctl.
o) Kill programs that attempt an unaligned access of a kernel address. Note
that with some ABIs, calling useracc() is not sufficient since the register
may be 64-bit but vm_offset_t is 32-bit so a kernel address could be
truncated to what looks like a valid user address, allowing the user to
crash the kernel.
o) Clean up unaligned access emulation to support unaligned 16-bit and 64-bit
accesses. (For 16-bit accesses it was checking for user access to too much
memory (4 bytes) and there was no 64-bit support.) This still lacks support
for unaligned load-linked and store-conditional.
- Use the traditional behaviour for filename and directory name inclusion
and exclusion patterns [1]
- Some improvements on the exiting code, like replacing memcpy with
strlcpy/strcpy
Approved by: delphij (mentor)
Pointed out by: bf [1], des [1]
des [Wed, 28 Jul 2010 16:11:22 +0000 (16:11 +0000)]
Redo fetch_read() using non-blocking sockets. This is necessary to
avoid a hang in the SSL case if the server sends a close notification
before we are done reading. In the non-SSL case, it can provide a
minor (but probably not noticeable) performance improvement for small
transfers.
Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size. The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class. This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused. At this point inspection or memguard(9)
can be used to catch the offending code.
Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.
This code is based on work by Ron Steinke at Isilon Systems.
Add a parser for the ACPI SRAT table for amd64 and i386. It sets
PCPU(domain) for each CPU and populates a mem_affinity array suitable
for the NUMA support in the physical memory allocator.
Update mlockall(2) to mention that it's superuser-only syscall, just
like the mlock(2) manual page says. Update mlock(2) to say that hitting
RLIMIT_MEMLOCK results in ENOMEM, not EAGAIN.
Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfulling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.
mips/rmi/bus_space_rmi_pci.c is needed even when PCI is disabled. This
file really provides a bus that does byteswapping, and can be used by
non-PCI components too.
Introduce exec_alloc_args(). The objective being to encapsulate the
details of the string buffer allocation in one place.
Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name. The pointer to the interpreter name can simply be
made to point to the appropriate argument string.
Document that the "ngtee" action no longer accepts packet, and
thus don't depend on one_pass flag anymore.
This is a POLA violation, but it is quite difficult to restore
the old behavior with new code. Also, the new behavior matches
behavior of the older "tee" action, and this is more intuitive.
When installing a new ARP entry via 'arp -S', lla_lookup() will
either find an existing entry, or allocate a new one. In the latter
case an entry would have flags, that were supplied as argument to
lla_lookup(). In case of an existing entry, flags aren't modified.
This lead to losing LLE_PUB and/or LLE_PROXY flags.
We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.
PR: kern/148784, kern/146539
Silence from: qingli, 5 days
Fixup mips/rmi for the new mips timer code(r210403). This will get XLR
booting again.
The code is a copy of the mips/mips/tick.c with minor modifications for
XLR interrupt handling. Disable mips/rmi/clock.c for now, the PIC based
timer code will be added later.
- Support two devices made by West Mountain Radio in uslcom(4) [1]
- Bring in several other devices from OpenBSD while here. Use the
official manufacturer name over the OpenBSD name in the case of
GEMALTO. Reorder list slightly to aid future syncing.
- Remove duplicate SILABS CP2102 define from usbdevs
Prevent uhid(4) from attaching to the Gembird Silver Shield remote power
plug. Note that the Vendor ID 0x04b4 is officially assigned to Cypress,
so use that instead of adding a second vendor with an identical ID, in the
same way other similar cases are treated in usb/usbdevs.
Re-implement FPU suspend/resume for amd64. This removes superfluous uses
of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs().
Also, we do not touch PCB flags any more.
- Change the warning about PCI-e links narrower than x8 to only apply to
10G cards. 1G cards are x4 only.
- Use constants from pcireg.h for reading the current link width.
- Use pci_set_max_read_req() rather than implementing it by hand.
Revert r210451, and the similar part of the r210431. The forward-declaration
for the enum tag when enum definition is not complete is not allowed by
C99, and is gcc extension.
Make sure that we report chunks if a socket
still exists that were not sent. In either
case carefully remove the data if it does not
get taken by the reporting routines.
When counting the number of chunks in the
retransmission queue to validate the retran count, we
need to include the chunks in the control send queue
too. Otherwise the count will not match and you will get
the invarient warning if invarients are on.
If an ; or & token was followed by an EOF token, pending here-documents were
left uninitialized. Execution would crash, either in the main shell process
for literal here-documents or in a child process for expanded
here-documents. In the latter case the problem is hard to detect apart from
the core dumps and log messages.
Side effect: slightly different retries on inputs where EOF is not
persistent.
Note that tools/regression/bin/sh/parser/heredoc6.0 still causes a similar
crash in a child process. The text passed to eval is malformed and should be
rejected.
Add an example to encourage people to have a look at either
make(1) or /usr/ports/ports-mgmt/portconf for port-specific
variables/options to compile a port.
PR: docs/145655
Submitted by: Armin Pirkovitsch (armin at frozen dash zone dot org)
Discussed with: dougb
MFC after: 7 days
Change the order in which the file name, arguments, environment, and
shell command are stored in exec*()'s demand-paged string buffer. For
a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces
the number of global TLB shootdowns by 31%. It also eliminates about
330k page faults on the kernel address space.
Change exec_shell_imgact() to use "args->begin_argv" consistently as
the start of the argument and environment strings. Previously, it
would sometimes use "args->buf", which is the start of the overall
buffer, but no longer the start of the argument and environment
strings. While I'm here, eliminate unnecessary passing of "&length"
to copystr(), where we don't actually care about the length of the
copied string.
Clean up the initialization of the exec map. In particular, use the
correct size for an entry, and express that size in the same way that
is used when an entry is allocated. The old size was one page too
large. (This discrepancy originated in 2004 when I rewrote
exec_map_first_page() to use sf_buf_alloc() instead of the exec map
for mapping the first page of the executable.)
Export PCI IDs of ATA/SATA controllers through CAM and ata(4) layers to
GEOM. This information needed for proper soft-RAID's on-disk metadata
reading and writing.
- Fix --color behaviour to only output color sequences if stdout is a tty
or if forced mode is specified [1]
- While here, add some alternative names for the options and make then
case-insensitive
- Fix -q and -l behaviour [2]
- Some small changes to make the code easier to review
Get N64 building by defining VM_FREELIST_DIRECT to be
VM_FREELIST_DEFAULT. I believe this is correct, since KX is set in
n64, and thus all RAM can be direct mapped.
andrew [Sat, 24 Jul 2010 23:41:09 +0000 (23:41 +0000)]
Allow external interrupts.
- Set the external pin to interrupt in bus_setup_intr
- Implement bus_config_intr for external interrupts
- Extend arm_{,un}mask_irq to work with external interrupts
Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate
module that can be used by both the regular and experimental nfs
clients. This fixes the problem reported by jh@ where /dev/nfslock
would be registered twice when both nfs clients were used.
I also defined the size of the lm_fh field to be the correct value,
as it should be the maximum size of an NFSv3 file handle.
Increment td->td_intr_nesting_level for LAPIC timer interrupts. Among other
things it hints SCHED_ULE to run clock swi handlers on their native CPUs,
avoiding many unneeded IPI_PREEMPT calls.
Remove extra commas from KTR_EVENT4() macro to match number of CTR6() args.
Comparing to other macros there should be strings concatenation, not a
separate arguments.
simon [Sat, 24 Jul 2010 10:04:35 +0000 (10:04 +0000)]
Make failed open of /dev/mdctl in the bsnmpd hostres module non-fatal.
This makes it possible to use the hostres module when bsnmpd is not
running as root.