Change the interface to the Energy Efficient Ethernet (EEE)
setting in the igb and em driver. This was necessitated by
a shared code change that I was given late in the game, a data
type changed from bool to int, in the last update I dealt with
it by a cast, but it was pointed out (thanks jhb) that there
was a potential problem with this. John suggested this safer
approach, and it is fine with me...
Add a new GEOM method, resize(), which is called after provider size changes.
Add a new routine, g_resize_provider(), to use to notify GEOM about provider
change.
Drop page queues mutex on each iteration of vm_pageout_scan over the
inactive queue, unless busy page is found.
Dropping the mutex often should allow the other lock acquires to
proceed without waiting for whole inactive scan to finish. On machines
with lot of physical memory scan often need to iterate a lot before it
finishes or finds a page which requires laundring, causing high
latency for other lock waiters.
Fix orphan() methods of several GEOM classes to not assume that there
is an error set on the provider. With GEOM resizing, class can become
orphaned when it doesn't implement resize() method and the provider size
decreases.
Implement ia64_physmem_alloc() and use it consistently to get memory
before VM has been initialized. This includes:
1. Replacing pmap_steal_memory(),
2. Replace the handcrafted logic to allocate a naturally aligned VHPT,
3. Properly allocate the DPCPU for the BSP.
Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't
cross into the next PBVM page. If we were to cross into the next
page, then there wouldn't be a PTE entry on the page table for it
and we would end up with a MCA following a page fault. As such,
this commit fixes MCAs occasionally seen.
Create a generic way to support multiple boards within an
arm platform. Add all the atmel boards to the ATMEL kernel for
testing purposes. Until boot loader arg parsing of baord type
is done, this won't actually be able to do the runtime selection.
Generalize this for loading the loader into the SPI. Plus trim about
100 bytes from the binary with silly tricks. Hope to get this small
enough to run on the models that have 4k SRAM. We are close compiled
for the at91rm9200, but still need to trim for the target.
Strip out the useless junk. All we really care about is the text,
data and bss sections. All the rest is needed for normal binaries,
but boot loaders aren't normal.
Hide the creation of phys_avail behind an API to make it easier to do it
correctly. We now iterate the EFI memory descriptors once and collect all
the information in a single pass. This includes:
1. The I/O port base address,
2. The PAL memory region. Have the physmem API track this.
3. Memory descriptors of memory we can't use, like bad memory, runtime
services code & data, etc. Have the physmem API track these.
4. memory descriptors of memory we can use or re-use, such as free
memory, boot time services code & data, loader code & data, etc.
These are added by the physmem API.
Since the PBVM page table and pages are in memory described as loader
data, inform the physmem API of chunks that need to be delated from the
available physical memory.
While here, remove Maxmem and replace it with the better named paddr_max.
Maxmem was defined as physmem, which is generally wrong. Now, paddr_max
is properly defined as the largesty physical address.
The upshot of all this is that:
1. We properly determine realmem.
2. We maximize physmem by re-using memory where possible.
3. We remove complexity from ia64_init() in machdep.c.
4. Remove confusion about realmem, physmem & Maxmem.
The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c,
as well as replace the handcrafted allocation of the VHPT for the BSP in
pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation
of phys_avail after it is being created.
agp.c:
Don't use Maxmem when the amount of memory is meant. Use realmem instead.
Maxmem is not only a MD variable, it represents the highest physical memory
address in use. On systems where memory is sparsely layed-out the highest
memory address and the amount of memory are not interchangeable. Scaling the
AGP aperture based on the actual amount of memory (= realmem) rather than
the available memory (= physmem) makes sure there's consistent behaviour
across architectures.
agp_i810.c:
While arguably the use of Maxmem can be considered correct, replace its use
with realmem anyway. agp_i810.c is specific to amd64, i386 & pc98, which
have a dense physical memory layout. Avoiding Maxmem here is done with an
eye on copy-n-paste behaviour in general and to avoid confusion caused by
using realmem in agp.c and Maxmem in agp_i810.c.
r237748 continuation: segment-override prefixes are not invalid in long mode
Update DTrace disassembler accordingly. The code to treat the prefixes
as null prefixes was already in place.
Although in practice compilers seem to generate only cs-prefix for use
in long NOPs, the same treatment is applied to all of cs, ds, es, ss for
consistency.
Reported by: emaste
Tested by: emaste
Obtained from: Illumos commit 13442:4adbe6de60c8 (+ local changes)
MFC after: 5 days
Add support for the 'invept' and 'invvpid' instructions. Beyond simply
adding appropriate table entries, the assembler had to be adjusted as
these are the first non-SSE instructions to use a 3-byte opcode (and a
mandatory prefix to boot).
Several fixes to the amd64 disassembler:
- Add generic support for opcodes that are escape bytes used for
multi-byte opcodes (such as the 0x0f prefix). Use this to replace
the hard-coded 0x0f special case and add support for three-byte
opcodes that use the 0x0f38 prefix.
- Decode all Intel VMX instructions. invept and invvpid in particular are
three-byte opcodes that use the 0x0f38 escape prefix.
- Rework how the special 'SDEP' size flag works such that the default
instruction name (i_name) is the instruction when the data size
prefix (0x66) is not specified, and the alternate name in i_extra is
used when the prefix is included.
- Add a new 'ADEP' size flag similar to 'SDEP' except that it chooses
between i_name and i_extra based on the address size prefix (0x67).
Use this to fix the decoding for jrcxz vs jecxz which is determined
by the address size prefix, not the operand size prefix. Also, jcxz
is not possible in 64-bit mode, but jrcxz is the default instruction
for that opcode.
- Add support for handling instructions that have a mandatory 'rep'
prefix (this means not outputting the 'repe ' prefix until determining
if it is used as part of an opcode). Make 'pause' less of a special
case this way.
- Decode 'cmpxchg16b' and 'cdqe' which are variants of other instructions
but with a REX.W prefix.
Allow threads to finish up when terminated by user
Set a flag and allow worker threads to finish upon ^C, instead of
immediately cancelling them, so that final packet count and rate
stats can be displayed.
Add another PS/2 keyboard PNP ID. This ID is listed as
"Reserved by Microsoft" in the standard PNP ID table, but has been seen
in the wild on at least one laptop.
PR: kern/169571
Submitted by: Matthias Apitz guru unixarea de
MFC after: 3 days
Make pmap_enter()'s management of PV entries consistent with the other pmap
functions that manage PV entries. Specifically, remove the PV entry from
the containing PV list only after the corresponding PTE is destroyed.
Update the pmap's wired mapping count in pmap_enter() before the PV list
lock is acquired.
Update to the ixgbe driver:
- Add a couple of new devices
- Flow control changes in shared and core code
- Bug fix to Flow Director for 82598
- Shared code sync to internal with required core change
Thanks to those helping in the testing and improvements to this driver!
Sync with Intel internal source:
shared code update and small changes in core required
Add support for new i210/i211 devices
Improve queue calculation based on mac type
Remove the "funny targets" make check. We no longer need embedded :: targets
to build FreeBSD (they are used in Perl man pages). We never needed embedded
"!" in targets that I can find.
We got this from OpenBSD and I cannot find any other make that supports
such things -- contrary to their commit message claim: "This behaviour
is also consistent with other versions of make.".
Now that our assembler supports the xsave family of instructions, use them
natively rather than hand-assembled versions. For xgetbv/xsetbv, add a
wrapper API to deal with xcr* registers: rxcr() and load_xcr().
Correct an error in r237513. The call to reserve_pv_entries() must come
before pmap_demote_pde() updates the PDE. Otherwise, pmap_pv_demote_pde()
can crash.
Add support for the 'xsave', 'xrstor', 'xsaveopt', 'xgetbv', and 'xsetbv'
instructions. I reimplemented this from scratch based on the Intel
manuals and the existing support for handling the fxsave and fxrstor
instructions. This will let us use these instructions natively with GCC
rather than hardcoding the opcodes in hex.
Make use of GEOM Gate direct reads feature. This allows HAST to serve
reads with native speed of the underlying provider.
There are three situations when direct reads are not used:
1. Data is being synchronized and synchronization source is the secondary
node, which means secondary node has more recent data and we should read
from it.
2. Local read failed and we have to try to read from the secondary node.
3. Local component is unavailable and all I/O requests are served from the
secondary node.
Extend GEOM Gate class to handle read I/O requests directly within the kernel.
This will allow HAST to read directly from the local component without
even communicating userland daemon.
Prefer sysctl to open/read/close for obtaining random data.
This method is more sandbox-friendly and also should be faster as only
one syscall is needed instead of three.
In case of an error fall back to the old method.
Use correct part of the Master-Key for generating encryption keys.
Before this change the IV-Key was used to generate encryption keys,
which was incorrect, but safe - for the XTS mode this key was unused
anyway and for CBC mode it was used differently to generate IV
vectors, so there is no risk that IV vector collides with encryption
key somehow.
Bump version number and keep compatibility for older versions.
- Change --nthreads parameter to --parallel for GNU compatibility
- Change default sort method to mergesort, which has a better worst case
performance than qsort
Add the possibility to specify a threshold for the number of negative cache
results required to have the cache return lookup failure.
A new configuration parameter is introduced, which must be set to a value
greater than 1 to activate this feature. The default behavior is unchanged.
The purpose of this change is to allow probes for the existence of an entry
(which are expected to fail), before that entry is added to one of the
queried databases, without the cache returning the stale information from
the probe query until that cache entry expires. If, for example, a new user
account is created after checking that the new account name is available,
the negative cache entry would prevent immediate access to the account.
For that example, the new configuration option
negative-confidence-threshold passwd 2
will require a second negative query result to consider the negative cache
entry for a passwd entry valid, but if the user account has been created
between the queries, then the positive query result from the second query
will be cached and returned.
When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.
The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.
- Introduce new flag for struct route/route_in6, that marks route
not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
but empty route should use RO_RTFREE() to free results of
lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
ro->ro_rt == NULL.
Fix panics triggered by older mfiutil binaries run on the new mfi(4) driver.
The new driver changed the size of the mfi_dcmd_frame structure in such a
way that a MFI_IOC_PASSTHRU ioctl from an old amd64 binary is treated as an
MFI_IOC_PASSTHRU32 ioctl in the new driver. As a result, the user pointer
is treated as the buffer length. mfi_user_command() doesn't have a bounds
check on the buffer length, so it passes a really big value to malloc()
which panics when it tries to exhaust the kmem_map. Fix this two ways:
- Only honor MFI_IOC_PASSTHRU32 if the binary has the SV_ILP32 flag set,
otherwise treat it as an unknown ioctl.
- Add a bounds check on the buffer length passed by the user. For now
it fails any user attempts to use a buffer larger than 1MB.
While here, fix a few other nits:
- Remove an unnecessary check for a NULL return from malloc(M_WAITOK).
- Use the ENOTTY errno for invalid ioctl commands instead of ENOENT.
- Make ipfw's sched rules case insensitive, for user-friendliness.
- Add a note to the ipfw(8) man page about the rules no longer being
case sensitive.
- Fix some typos in the man page.
adrian [Tue, 3 Jul 2012 06:59:12 +0000 (06:59 +0000)]
Begin abstracting out the RX path in preparation for RX EDMA support.
The RX EDMA support requires a modified approach to the RX descriptor
handling.
Specifically:
* There's now two RX queues - high and low priority;
* The RX queues are implemented as FIFOs; they're now an array of pointers
to buffers;
* .. and the RX buffer and descriptor are in the same "buffer", rather than
being separate.
So to that end, this commit abstracts out most of the RX related functions
from the bulk of the driver. Notably, the RX DMA/buffer allocation isn't
updated, primarily because I haven't yet fleshed out what it should look
like.
Whilst I'm here, create a set of matching but mostly unimplemented EDMA
stubs.
Tested:
* AR9280, station mode
TODO:
* Thorough AP and other mode testing for non-EDMA chips;
* Figure out how to allocate RX buffers suitable for RX EDMA, including
correctly setting the mbuf length to compensate for the RX descriptor
and completion status area.
Add a driver for the Freescale FCM module in the localbus controller.
This driver does not yet handle multiple chip selects properly.
Note that the NAND infrastructure does not perform full page
reads or writes, which means that this driver cannot make use
of the hardware ECC that is otherwise present.
Support lbc interrupts:
o Save and clear the LTESR register in the interrupt handler.
o In lbc_read_reg(), return the saved LTESR register value if applicable
(i.e. when the saved value is not invalid (read: ~0U)).
o In lbc_write_reg(), clear the bits in the saved register when when it's
written to and when the asved value is not invalid.
o Also in lbc_write_reg(), the LTESR register is unlocked (in H/W) when
bit 1 of LTEATR is cleared. We use this to invalidate our saved LTESR
register value. Subsequent reads and write go to H/W directly.
While here:
o In lbc_read_reg() & lbc_write_reg(), add some belts and suspenders to
catch when register offsets are out of range.
o In lbc_attach(), initialize completely and don't leave something left
for lbc_banks_enable().