marius [Thu, 1 Apr 2010 15:17:50 +0000 (15:17 +0000)]
MFC: r205409
- The firmware of Sun Fire V1280 has a misfeature of setting %wstate to
7 which corresponds to WSTATE_KMIX in OpenSolaris whenever calling into
it which totally screws us even when restoring %wstate afterwards as
spill/fill traps can happen while in OFW. The rather hackish OpenBSD
approach of just setting the equivalent of WSTATE_KERNEL to 7 also is
no option as we treat %wstate as a bit field. So in order to deal with
this problem actually implement spill/fill handlers for %wstate 7 which
just act as the WSTATE_KERNEL ones except for theoretically also handling
32-bit, turn off interrupts completely so we don't even take IPIs while
in OFW which should ensure we only take spill/fill traps at most and
restore %wstate after calling into OFW once we have taken over the trap
table. While at it, actually set WSTATE_{,PROM}_KMIX before calling into
OFW just like OpenSolaris does, which should at least help testing this
change on non-V1280.
- Remove comments referring to the %wstate usage in BSD/OS.
- Remove the no longer used RSF_ALIGN_RETRY macro.
- Correct some trap table addresses in comments.
- Ensure %wstate is set to WSTATE_KERNEL when taking over the trap table.
- Ensure PSTATE_AM is off when entering or exiting to OFW as well as that
interrupts are also completely off when exiting to OFW as the firmware
trap table shouldn't be used to handle our interrupts.
Update the page table locking for the 64-bit PMAP. One of these revisions
largely reverted the other, so there is a small amount of churn and the
addition of some mtx_assert()s.
Fix two small bugs. The PowerPC 970 does not support non-coherent memory
access, and reflects this by autonomously writing LPTE_M into PTE entries.
As such, we should not panic if LPTE_M changes by itself. While here,
fix a harmless typo in moea64_sync_icache().
MFC: r197542:
- When we run our trap cleanup handler, echo that we are running this
handler to make it more clear why we are 'suddenly' running df,
umount, and mdconfig.
- Remove trap handler again after we have unconfigured the memory
device etc. Before we could end up running the trap handler if a
later stage failed, which was a bit confusing and not really useful.
MFC after: 2 weeks
Log:
- restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
(e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate
r205069:
Log:
fix stats reporting sysctl
r205093:
Log:
re-update copyright to 2010
pointed out by danfe@
r205097:
Log:
flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB
pointed out by jkim@
r205488:
Log:
- boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
when near the flow limit
- stop allocating new flows when within 3% of maxflows; don't start
allocating again until below 12.5%
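The allocation limit described for r205488 behaves like a hysteresis. A minimal C sketch follows, assuming both thresholds are measured as free headroom relative to maxflows (the log does not spell this out). The names flow_limiter and flow_may_alloc are hypothetical and are not taken from the flowtable code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; not the structures used by the flowtable code. */
struct flow_limiter {
	uint32_t maxflows;	/* boot-time sized upper bound */
	bool	 stopped;	/* true while allocation is suspended */
};

/*
 * Decide whether a new flow may be allocated given the current count.
 * Assumption: "within 3%" means less than 3% of maxflows is free, and
 * allocation resumes once more than 12.5% (1/8) is free again.
 */
static bool
flow_may_alloc(struct flow_limiter *fl, uint32_t count)
{
	uint64_t free_flows;

	assert(count <= fl->maxflows);
	free_flows = fl->maxflows - count;
	if (!fl->stopped && free_flows < (uint64_t)fl->maxflows * 3 / 100)
		fl->stopped = true;
	else if (fl->stopped && free_flows > fl->maxflows / 8)
		fl->stopped = false;
	return (!fl->stopped);
}
```

Using two different thresholds keeps the limiter from flapping between the stopped and running states when the flow count hovers near the limit.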
marius [Wed, 31 Mar 2010 22:05:49 +0000 (22:05 +0000)]
MFC: r205399
Improve the KVA space sizing of r186682; on machines with large dTLBs we
can actually use all of the available lockable entries of the tiny dTLB
for the kernel TSB. With this change the KVA space sizing happens to be
more in line with the MI one so up to at least 24GB machines KVA doesn't
need to be limited manually. This is just another stopgap though, the
real solution is to take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs
providing it so we don't need to lock the kernel TSB pages into the dTLB
in the first place.
marius [Wed, 31 Mar 2010 21:57:48 +0000 (21:57 +0000)]
MFC: r205258
- Add TTE and context register bits for the additional page sizes supported
by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs.
- Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK
which just is the complement of TLB_CXR_CTX_MASK instead of trying to
assemble it from the page size bits which vary across CPUs.
- Add macros for the remainder of the SFSR bits, which are useful for at
least debugging purposes.
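The mask replacement in r205258 can be sketched as follows. TLB_CXR_CTX_MASK and TLB_CXR_PGSZ_MASK are named in the log; the field width used here and the helper cxr_set_ctx are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the context field; the real value is CPU-defined. */
#define	TLB_CXR_CTX_BITS	13
#define	TLB_CXR_CTX_MASK	((UINT64_C(1) << TLB_CXR_CTX_BITS) - 1)
/* Everything that is not context is treated as page size bits. */
#define	TLB_CXR_PGSZ_MASK	(~TLB_CXR_CTX_MASK)

/* Replace the context number while preserving the page size bits. */
static uint64_t
cxr_set_ctx(uint64_t cxr, uint64_t ctx)
{
	assert((ctx & ~TLB_CXR_CTX_MASK) == 0);
	return ((cxr & TLB_CXR_PGSZ_MASK) | ctx);
}
```

Deriving the page size mask as a complement means it stays correct even when a CPU model places its page size bits differently.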
marius [Wed, 31 Mar 2010 21:32:52 +0000 (21:32 +0000)]
MFC: r204152, r204164
Some machines can not only consist of CPUs running at different speeds
but also of different types, e.g. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
jkim [Wed, 31 Mar 2010 16:01:48 +0000 (16:01 +0000)]
MFC: r205855
Print memory model of the video mode except for planar memory model.
'P', 'D', 'C', 'H', and 'V' mean packed pixel, direct color, CGA, Hercules,
and VGA X memory models respectively where they have fixed number of planes.
Sync. pixel mode support for VESA and VGA frame buffers with HEAD.
- Map entire video memory again. Although we do not use them all directly,
it seems VGA renderer may access unmapped memory region and cause kernel
panic.
- Fall back to VGA palette functions if VESA function failed and DAC is
still in 6-bit mode. Although we have to check non-VGA compatibility bit
here, it seems there are too many broken VESA BIOSes out to rely on it.
- Be careful when we determine bytes per scan line information. We compare
mode table data against minimum value. If the mode table does not make
sense, we set the minimum in the mode info.
- Teach VGA framebuffer about 8-bit palette format for VESA.
- Add my copyright here.
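The bytes-per-scan-line check from the list above can be sketched in C. How the driver derives its minimum is an assumption here (width times bits per pixel, rounded up to whole bytes), and scanline_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the bytes-per-scan-line value to use for a mode.
 * width is in pixels, bpp in bits per pixel, reported is the mode
 * table value, which may be bogus on a broken BIOS.
 */
static uint32_t
scanline_bytes(uint32_t width, uint32_t bpp, uint32_t reported)
{
	uint32_t minimum;

	assert(bpp > 0);
	minimum = (width * bpp + 7) / 8;	/* round up to whole bytes */
	return (reported < minimum ? minimum : reported);
}
```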
jkim [Wed, 31 Mar 2010 15:39:46 +0000 (15:39 +0000)]
MFC: r205550, r205605, r205865
Sync. pixel mode support for syscons(4) with HEAD.
- Separate 24-bit pixel draw from 32-bit case. Although it is slower, we do
not want to write a useless zero to inaccessible memory region.
- We only want the dummy palette for direct color mode.
- Detect illegal access to unmapped memory within real mode emulator.
- Map EBDA if available and support memory wraparound above 1MB as VM86 does.
- Set initial %ds to 0x40 as X.org int10 handler does.
- Print the initial memory map when bootverbose is set.
- Optimize real mode page table lookup.
- Add strictly aligned memory access for distant future.
- Update copyright date.
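The separate 24-bit pixel draw mentioned in the list above can be illustrated with a small C sketch. The byte order and the helper name put_pixel24 are assumptions; this is not the syscons(4) renderer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store one 24-bit pixel as three byte writes, assuming a
 * little-endian blue/green/red layout. A single 32-bit store would
 * also write a zero into the byte after the pixel, which may lie
 * beyond the mapped frame buffer for the last pixel.
 */
static void
put_pixel24(uint8_t *fb, size_t offset, uint32_t color)
{
	assert(fb != NULL);
	fb[offset + 0] = color & 0xff;
	fb[offset + 1] = (color >> 8) & 0xff;
	fb[offset + 2] = (color >> 16) & 0xff;
}
```

Three byte stores are slower than one word store, which is the trade-off the log entry accepts.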
marcel [Wed, 31 Mar 2010 03:20:14 +0000 (03:20 +0000)]
MFC revs 199502, 199566 and 199574:
Add a seatbelt to the Nested TLB Fault handler to give us a chance
to panic when we have an unexpected TLB fault while interrupt
collection is disabled.
marcel [Wed, 31 Mar 2010 02:43:58 +0000 (02:43 +0000)]
MFC rev 198341 and 198342:
o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).
luigi [Wed, 31 Mar 2010 01:51:08 +0000 (01:51 +0000)]
A last-minute change in the previous commit broke rule deletion,
so I am fixing it, this time with a more detailed description
of what the code is supposed to do.
marius [Tue, 30 Mar 2010 20:44:04 +0000 (20:44 +0000)]
MFC: r203845
- Add the 'cmp' and 'core' pseudo-busses which are used to group CPU cores
to the exclusion lists as the CPU nodes aren't handled as regular devices
either. Also add the pseudo-devices found in Sun Fire V1280.
- Allow nexus_attach() and nexus_alloc_resource() to be used by drivers
derived from nexus(4) for subordinate busses.
- Don't add the zero-sized memory resources of glue devices to the resource
lists.
marius [Tue, 30 Mar 2010 20:29:45 +0000 (20:29 +0000)]
MFC: r203838
- Search the whole OFW device tree instead of only the children of the
root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu'
nodes hang off of 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
It would be great if we could use newbus to deal with the different ways
the 'cpu' devices can hang off of pseudo ones but unfortunately both
cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device
probing.
- Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these
are multi-core each CPU has two Fireplane config registers and thus the
module/target ID has to be determined differently so the one specific
to a certain core is used. Similarly, starting with UltraSPARC IV the
individual cores use a different property in the OFW device tree to
indicate the CPU/core ID as it no longer coincides with the
shared slot/socket ID.
This involves changing the MD KTR code to not directly read the UPA
module ID either. We use the MID stored in the per-CPU data instead of
calling cpu_get_mid() as a replacement in order to prevent clobbering any
registers as side-effect in the assembler version. This requires CATR()
invocations from mp_startup() prior to mapping the per-CPU pages to be
removed though.
While at it additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Make sparc64_shutdown_final() static as it's not used outside of
machdep.c.
marius [Tue, 30 Mar 2010 20:12:42 +0000 (20:12 +0000)]
MFC: r203833
- At least the trap table of the Sun Fire V1280 firmware apparently has
no cleanwindows handler so just remove trying to trigger it from _start
and the AP trampoline code as that leads to a crash there. This should
be okay as leaking data from the OFW via the CPU registers on start of
the kernel should be no real concern.
- Make the comments of _start and the AP trampoline code regarding the
initializations they perform match each other and reality.
- Make the comments of the AP trampoline code regarding iTLB accesses
refer to the right macro.
marius [Tue, 30 Mar 2010 20:05:20 +0000 (20:05 +0000)]
MFC: r203830, r203831
Use the SUNW,{d,i}tlb-load methods for entering locked TLB entries like
OpenBSD and OpenSolaris do instead of fiddling with the MMUs ourselves.
Unlike direct access the firmware methods don't automatically use the
next free (?) TLB slot, instead the slot to be used has to be specified.
We allocate the TLB slots for the kernel top-down as OpenSolaris suggests
that the firmware will always allocate the ones for its own use bottom-up.
Besides being simpler, according to OpenBSD using the firmware methods is
required to allow booting on Sun Fire E10K with multi-systemboard domains.
marius [Tue, 30 Mar 2010 20:02:26 +0000 (20:02 +0000)]
MFC: r203829
- Assert that HEAPSZ is a multiple of PAGE_SIZE as at least the firmware
of Sun Fire V1280 doesn't round up the size itself but instead lets
claiming of non page-sized amounts of memory fail.
- Change parameters and variables related to the TLB slots to unsigned
which is more appropriate.
- Search the whole OFW device tree instead of only the children of the
root nexus device for the BSP as starting with UltraSPARC IV the 'cpu'
nodes hang off of 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
- Add support for UltraSPARC IV and IV+ BSPs. Due to the fact that these
are multi-core each CPU has two Fireplane config registers and thus the
module/target ID has to be determined differently so the one specific
to a certain core is used. Similarly, starting with UltraSPARC IV the
individual cores use a different property in the OFW device tree to
indicate the CPU/core ID as it no longer coincides with the
shared slot/socket ID.
While at it additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Check the return value of init_heap(). This requires moving it after
cons_probe() so we can panic when appropriate. This should be fine as
the PowerPC OFW loader has used that order for quite some time now.
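The HEAPSZ assertion from the first item of the list above can be expressed as a compile-time check. This is a sketch using C11 _Static_assert with stand-in constants EX_HEAPSZ and EX_PAGE_SIZE; the loader's actual values and assertion macro may differ.

```c
#include <assert.h>

/* Stand-in values; the loader defines its own HEAPSZ and PAGE_SIZE. */
#define	EX_PAGE_SIZE	8192
#define	EX_HEAPSZ	(256 * 1024)

/* Fail the build if the heap size is not a whole number of pages. */
_Static_assert(EX_HEAPSZ % EX_PAGE_SIZE == 0,
    "heap size must be a multiple of the page size");

/* Number of pages the heap claim covers. */
static unsigned int
heap_pages(void)
{
	return (EX_HEAPSZ / EX_PAGE_SIZE);
}
```

Checking at build time avoids discovering the problem only when the firmware refuses the claim at boot.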
marius [Tue, 30 Mar 2010 19:37:47 +0000 (19:37 +0000)]
MFC: r203341
- Remove the BUS_HANDLE_MIN checking in the __BUS_DEBUG_ACCESS macro;
for UPA it should have fulfilled its purpose by now and Fireplane-
and JBus-based machines are way too messy in organization to implement
something equivalent.
- Fix a bunch of style(9) bugs.
marius [Tue, 30 Mar 2010 19:36:00 +0000 (19:36 +0000)]
MFC: r203335
- Const'ify the bus_stream_asi and bus_type_asi arrays.
- Replace hard-coded functions names missed in bus_machdep.c with __func__.
- Break some long lines.
marius [Tue, 30 Mar 2010 19:08:02 +0000 (19:08 +0000)]
MFC: r205397
- While SPARC V9 allows tininess to be detected either before or after
rounding (impl. dep. #55), the SPARC JPS1 responsible for SPARC64 and
UltraSPARC processors defines that in all cases tininess is detected
before rounding, therefore rounding up to the smallest normalised
number should set the underflow flag.
- If an infinite result is rounded down, the result should have an
exponent 1 less than the value for infinity.
attilio [Tue, 30 Mar 2010 11:46:43 +0000 (11:46 +0000)]
MFC r205160:
Checkin a facility for specifying a passthrough FIB from userland.
The arcconf tool by Adaptec already seems to use it for identifying the
serial number of the devices.
attilio [Tue, 30 Mar 2010 11:19:29 +0000 (11:19 +0000)]
MFC r204641, r204753:
Improve the clock auto-tuning by first checking whether the atrtc can be
correctly initialized and only then assigning it to softclock/profclock.
jkim [Mon, 29 Mar 2010 15:59:37 +0000 (15:59 +0000)]
MFC: r205647
Fix stupid typos. Some VESA BIOSes directly call BIOS interrupt handlers
within the VBE interrupt handler. Unfortunately it was causing real mode
page faults because we were fetching instructions from bogus addresses.
attilio [Mon, 29 Mar 2010 15:39:17 +0000 (15:39 +0000)]
MFC r199852, r202387, r202441, r202534:
Handling all the three clocks with the LAPIC may lead to aliasing for
softclock and profclock.
Revert the change when the LAPIC started taking charge of all three of
them.
dougb [Mon, 29 Mar 2010 06:31:58 +0000 (06:31 +0000)]
Update to 9.6.2-P1, the latest patchfix release which deals with
the problems related to the handling of broken DNSSEC trust chains.
This fix is only relevant for those who have DNSSEC validation
enabled and configure trust anchors from third parties, either
manually, or through a system like DLV.
emaste [Mon, 29 Mar 2010 00:14:34 +0000 (00:14 +0000)]
MFC r204264:
Minor diff reduction with Adaptec's driver: in aac_release_command() set
cm_queue to AAC_ADAP_NORM_CMD_QUEUE by default. In every place it was
set, it was set to AAC_ADAP_NORM_CMD_QUEUE anyhow.
jh [Sun, 28 Mar 2010 11:22:38 +0000 (11:22 +0000)]
MFC r198175:
- If lstat()/stat() fails with an error other than ENOENT, don't ignore
the error and assume that the file doesn't exist. Touch could return
success with -c option even if the file existed and time was not set.
- If the first utimes_f() call fails with -A option, give up and don't
continue trying to set times to current time. [1]
- Set exit status to 1 when setting of timestamps fails for a directory
or symbolic link even though lstat()/stat() would succeed.
- Don't print bogus error message when rw() succeeds.
trasz [Sat, 27 Mar 2010 18:45:53 +0000 (18:45 +0000)]
MFC r203122:
Improve descriptions, remove turnstiles (since, from what I understand,
they are only used to implement other synchronization primitives), tweak
formatting.
MFC r203127:
Add description of bounded sleep vs unbounded sleep (aka blocking). Move
rules into their own section.
MFC r203131:
Cosmetic fixes.
MFC r203759:
Improve description for Giant and mention blocking inside interrupt threads.
MFC r203762:
Start sentences with a new line.
Submitted by: brueffer
MFC r203825:
Remove list of locking primitives, which is kind of redundant, move
information about witness(9) to the section about interactions, and
expand 'contexts' table.
MFC r203929:
Some rewording and language fixes.
PR: docs/136918, docs/134074
Submitted by: Ben Kaduk <kaduk at mit dot edu>, Haven Hash <havenster at gmail dot com>
trasz [Sat, 27 Mar 2010 18:09:40 +0000 (18:09 +0000)]
MFC r200273:
Don't add VAPPEND if the file is not being opened for writing. Note that this
only affects cases where open(2) is being used improperly - i.e. when the user
specifies O_APPEND without O_WRONLY or O_RDWR.
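The rule can be sketched in userland C. The EX_V* constants are stand-ins for the kernel's access bits and open_flags_to_accmode is a hypothetical helper; the kernel works on its own internal flag representation.

```c
#include <assert.h>
#include <fcntl.h>

/* Stand-in access bits; the kernel's VREAD/VWRITE/VAPPEND values differ. */
#define	EX_VREAD	0x1
#define	EX_VWRITE	0x2
#define	EX_VAPPEND	0x4

/* Translate open(2) flags into the access bits to check. */
static int
open_flags_to_accmode(int flags)
{
	int accmode = 0;
	int acc = flags & O_ACCMODE;

	assert(acc == O_RDONLY || acc == O_WRONLY || acc == O_RDWR);
	if (acc == O_RDONLY || acc == O_RDWR)
		accmode |= EX_VREAD;
	if (acc == O_WRONLY || acc == O_RDWR) {
		accmode |= EX_VWRITE;
		/* O_APPEND only matters when the file is writable. */
		if (flags & O_APPEND)
			accmode |= EX_VAPPEND;
	}
	return (accmode);
}
```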
trasz [Sat, 27 Mar 2010 18:08:14 +0000 (18:08 +0000)]
MFC r200058:
Add change that was somehow missed in r192586. It could manifest by
incorrectly returning EINVAL from acl_valid(3) for applications linked
against pre-8.0 libc.
trasz [Sat, 27 Mar 2010 18:04:33 +0000 (18:04 +0000)]
MFC r199875:
Provide a set of sysctls and tunables to disable device node creation
for specific "kinds" of disk labels - for example, GPT UUIDs. Reason
for this is that sometimes, other GEOM classes attach to these device
nodes instead of the proper ones - e.g. they attach to /dev/gptid/XXX
instead of /dev/ada0p2, which is annoying.
bz [Sat, 27 Mar 2010 17:57:17 +0000 (17:57 +0000)]
MFC r204840:
As statfs.f_flags are uint64_t the local variables should be as well.
We'll start noticing this with the next flag introduced as the lower
32 bits are all used.
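The truncation this guards against can be shown with a short C sketch. EX_MNT_NEWFLAG is a hypothetical flag placed above bit 31; it does not correspond to any real mount flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag that lives above bit 31. */
#define	EX_MNT_NEWFLAG	(UINT64_C(1) << 32)

/* Buggy: a 32-bit local silently drops the upper flag bits. */
static int
has_flag_narrow(uint64_t f_flags, uint64_t flag)
{
	uint32_t flags = (uint32_t)f_flags;

	return ((flags & flag) != 0);
}

/* Fixed: the local has the same width as statfs.f_flags. */
static int
has_flag_wide(uint64_t f_flags, uint64_t flag)
{
	uint64_t flags = f_flags;

	return ((flags & flag) != 0);
}
```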
bz [Sat, 27 Mar 2010 17:54:44 +0000 (17:54 +0000)]
MFC r205626:
Print the pointer to the lock with the panic message. The previous
panic: rw lock not unlocked
was not really helpful for debugging. Now one can at least call
show lock <ptr>
from ddb to learn more about the lock.
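A userland C sketch of including the lock address in the message follows. The message text is taken from the log; format_lock_msg is a hypothetical helper and the exact format used by the kernel is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a diagnostic that includes the lock's address so it can be
 * passed to "show lock <ptr>" in ddb. Returns the snprintf() result.
 */
static int
format_lock_msg(char *buf, size_t len, void *lock)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "rw lock %p not unlocked", lock));
}
```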
bz [Sat, 27 Mar 2010 17:52:56 +0000 (17:52 +0000)]
MFC r205276:
Add ddb support to the "new" link layer code ("new-arp"):
- show all lltables [1] (optional flag to also show the llentries as well)
- show lltable <struct lltable *>
- show llentry <struct llentry *>
bz [Sat, 27 Mar 2010 17:50:02 +0000 (17:50 +0000)]
MFC r204838:
Destroy TCP UMA zones (empty or not) upon network stack teardown
to not leak them, otherwise making UMA/vmstat unhappy with every
stopped vnet.
We will still leak pages (especially for zones marked NOFREE).
Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.
bz [Sat, 27 Mar 2010 17:48:13 +0000 (17:48 +0000)]
MFC r204805:
Rework reference counting in case we queue into the netisr,
or overflow the netisr queue and fall back to the interface
queue so that we can guarantee that the ifnet pointer stays
valid. Formerly we ended up with reference counts <= 0 in
case the netisr had returned ENOBUFS. The idea is to track
any packet in the netisr queue and only change the refcount
on edge operations for the fallback interface queue. This
also avoids problems in case the if_snd.ifq_len lies to us.
Also rework refcount assertions to make sure they trigger if
we go below 1. Formerly a negative reference count did not
trigger the assert as the refcount variable is u_int.
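Why an unsigned counter defeats a "not negative" assertion can be shown in a few lines of C. Both helpers are hypothetical illustrations and are not the code touched by r204805.

```c
#include <assert.h>

/*
 * Drop one reference. Because the counter is unsigned, a check such
 * as "count >= 0" after the decrement can never fail: going below
 * zero wraps to a huge positive value. Testing before the decrement
 * catches the underflow.
 */
static unsigned int
ref_release(unsigned int *count)
{
	assert(*count >= 1);	/* fires on underflow, unlike ">= 0" */
	return (--*count);
}

/* Show what an unchecked decrement of zero produces. */
static unsigned int
unchecked_release(unsigned int count)
{
	return (count - 1);	/* wraps to UINT_MAX when count is 0 */
}
```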
bz [Sat, 27 Mar 2010 17:46:06 +0000 (17:46 +0000)]
MFC r204807:
Destroy UDP UMA zones (empty or not) upon network stack teardown
to not leak them making UMA/vmstat -z unhappy with every stopped vnet.
We will still leak pages (especially as zones are marked NOFREE).
bz [Sat, 27 Mar 2010 17:40:28 +0000 (17:40 +0000)]
MFC r204279:
Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets'
in the db_show_all_table as 'show all ifnets' and with that follow the
convention for showing complete lists.
bz [Sat, 27 Mar 2010 17:39:02 +0000 (17:39 +0000)]
MFC r204145:
Start to implement ifnet DDB support:
- 'show ifnets' prints a list of ifnet *s per virtual network stack,
- 'show ifnet <struct ifnet *>' prints fields matching the given ifp.
We do not yet print the complete set of fields and might want to
factor this out to an extra if_debug.c file in case this grows
a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].
bz [Sat, 27 Mar 2010 17:34:57 +0000 (17:34 +0000)]
MFC r204140:
Split up ip_drain() into an outer lock and iterator part and
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().
Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.
bz [Sat, 27 Mar 2010 17:31:54 +0000 (17:31 +0000)]
MFC r203729:
Add DDB support for printing vnet_sysinit and vnet_sysuninit
ordered call lists. Try to lookup function/symbol names and print
those in addition to the pointers, along with the constants for
subsystem and order.
This is useful for debugging vnet teardown ordering issues.
Make it possible to call the actual printing function from normal
code at runtime, i.e. from vnet_sysuninit(), if DDB support is there.
bz [Sat, 27 Mar 2010 17:29:50 +0000 (17:29 +0000)]
MFC r203727:
Add an SDT provider for "vnet"s along with probes for vnet_alloc
and vnet_destroy.
Use the line number rather than NULL as dummy argument.
Note: the fbt provider does not reliably provide :return probes
(depending on optimization levels used at compile time) making
it unusable for scripts to generate complete call-traces with
well defined boundaries over allocations or destructions of
virtual network stacks.
trasz [Sat, 27 Mar 2010 17:22:11 +0000 (17:22 +0000)]
MFC r197680:
Provide default implementation for VOP_ACCESS(9), so that filesystems which
want to provide VOP_ACCESSX(9) don't have to implement both. Note that
this commit makes implementation of either of these two mandatory.