Editor: AA003 | 2016-12-28 |
s Guide.

Table 1. MaverickCrunch Load/Store Mnemonics

cfldrs    Cd, [Rn]        cfldrd    Cd, [Rn]
cfldr32   Cd, [Rn]        cfldr64   Cd, [Rn]
cfstrs    Cd, [Rn]        cfstrd    Cd, [Rn]
cfstr32   Cd, [Rn]        cfstr64   Cd, [Rn]
cfmvsr    Cn, Rd          cfmvdlr   Cn, Rd
cfmvdhr   Cn, Rd          cfmv64lr  Cn, Rd
cfmv64hr  Cn, Rd          cfmvrs    Rd, Cn
cfmvrdl   Rd, Cn          cfmvrdh   Rd, Cn
cfmvr64l  Rd, Cn          cfmvr64h  Rd, Cn
cfmval32  Cd, Cn          cfmvam32  Cd, Cn
cfmv32a   Cd, Cn          cfmv64a   Cd, Cn
cfmvsc32  Cd, Cn          cfmv32sc  Cd, Cn
cfcpys    Cd, Cn          cfcpyd    Cd, Cn

AN253
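As a reading aid for the load/store rows of Table 1, the sketch below maps a mnemonic to the width of the operand it transfers. It assumes the usual reading of the suffixes, which the table itself does not spell out: s for single-precision floating point (4 bytes), d for double-precision (8 bytes), and 32 and 64 for integer widths. The function name crunch_ldst_width is hypothetical and is not part of any Cirrus or ARM toolchain.

```c
#include <string.h>

/*
 * Operand width in bytes for a MaverickCrunch load/store mnemonic,
 * or 0 if the name is not one of the cfldr/cfstr forms.
 * ASSUMPTION: suffix meanings (s = single, d = double, 32/64 = integer
 * width) are inferred from the mnemonic names, not taken from the table.
 */
int crunch_ldst_width(const char *mnemonic)
{
    const char *suffix;

    /* Only the cfldr* and cfstr* families are handled here. */
    if (strncmp(mnemonic, "cfldr", 5) != 0 &&
        strncmp(mnemonic, "cfstr", 5) != 0)
        return 0;

    suffix = mnemonic + 5;
    if (strcmp(suffix, "s") == 0 || strcmp(suffix, "32") == 0)
        return 4;
    if (strcmp(suffix, "d") == 0 || strcmp(suffix, "64") == 0)
        return 8;
    return 0;   /* unknown suffix */
}
```

A call such as crunch_ldst_width("cfldrd") returns 8 under these assumptions; move and copy mnemonics such as cfmvsr fall through to 0 because they do not access memory.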
Table 2. MaverickCrunch Data Manipulation Mnemonics

cfcvtsd     Cd, Cn        cfcvtds     Cd, Cn
cfcvt32s    Cd, Cn        cfcvt32d    Cd, Cn
cfcvt64s    Cd, Cn        cfcvt64d    Cd, Cn
cfcvts32    Cd, Cn        cfcvtd32    Cd, Cn
cftruncs32  Cd, Cn        cftruncd32  Cd, Cn
cfrshl32    Cm, Cn, Rd    cfrshl64    Cm, Cn, Rd
cfsh32      Cd, Cn,       cfsh64      Cd, Cn,
cfcmps      Rd, Cn, Cm    cfcmpd      Rd, Cn, Cm
cfcmp32     Rd, Cn, Cm    cfcmp64     Rd, Cn, Cm

Table 3. MaverickCrunch Arithmetic Mnemonics

cfabss     Cd, Cn             cfabsd     Cd, Cn
cfnegs     Cd, Cn             cfnegd     Cd, Cn
cfadds     Cd, Cn, Cm         cfaddd     Cd, Cn, Cm
cfsubs     Cd, Cn, Cm         cfsubd     Cd, Cn, Cm
cfmuls     Cd, Cn, Cm         cfmuld     Cd, Cn, Cm
cfabs32    Cd, Cn             cfabs64    Cd, Cn
cfneg32    Cd, Cn             cfneg64    Cd, Cn
cfadd32    Cd, Cn, Cm         cfadd64    Cd, Cn, Cm
cfsub32    Cd, Cn, Cm         cfsub64    Cd, Cn, Cm
cfmul32    Cd, Cn, Cm         cfmul64    Cd, Cn, Cm
cfmac32    Cd, Cn, Cm         cfmsc32    Cd, Cn, Cm
cfmadd32   Ca, Cd, Cn, Cm     cfmsub32   Ca, Cd, Cn, Cm
cfmadda32  Ca, Cd, Cn, Cm     cfmsuba32  Ca, Cd, Cn, Cm

2.3 Architecture

The MaverickCrunch coprocessor uses the standard ARM coprocessor interface, sharing the ARM920T's memory interface and instruction stream. The coprocessor is pipelined, has data forwarding capabilities, and can run synchronously or asynchronously with respect to the ARM920T pipeline.

There are two separate pipelines in the MaverickCrunch coprocessor (see Figure 1). The first pipeline, five stages long, is used for LDC, STC, MCR, and MRC instructions. Its stages are Fetch (F), Decode (D), Execute (E), Memory Access (M), and Register Write-Back (W). The second pipeline, seven stages long, is used for the CDP instructions. Its stages are Fetch (F), Decode (D), Execute/Operand Fetch (E), Execute (E1), Execute (E2), Execute (E3), and Register Write-Back (W).

The MaverickCrunch LDC/STC/MCR/MRC pipeline is identical to, and 'follows', the ARM920T's pipeline. That is, the contents of the LDC/STC/MCR/MRC pipeline are identical to the contents of the ARM920T pipeline.

Note: The ARM pipeline is not shown in Figure 1, but is identical to the LDC/STC/MCR/MRC pipeline.

The MaverickCrunch CDP pipeline is nearly twice as deep and runs at half the speed of the ARM920T's pipeline. The CDP pipeline may run asynchronously with respect to the ARM920T's pipeline after the initial execution stage. Specifically, the CDP pipeline may run asynchronously in the E1, E2, E3, and W stages. Running MaverickCrunch in synchronous mode forces the CDP pipeline to serialize the instruction stream, resulting in an eight-cycle stall per data path instruction. The CDP pipeline's asynchronous-capable stages are shaded in Figure 1.
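The pipeline description above can be captured in a few lines of C, which may help when estimating the cost of synchronous mode. This is a minimal sketch, not a cycle-accurate model: the stage lists and the eight-cycle figure come from the text, while the identifiers (LDST_STAGES, CDP_STAGES, sync_mode_stall_cycles, and so on) are hypothetical names chosen for illustration. It deliberately does not model asynchronous mode, whose cost the text does not quantify.

```c
#include <assert.h>
#include <limits.h>

/* Stage names exactly as listed in the text. */
const char *const LDST_STAGES[] = { "F", "D", "E", "M", "W" };
const char *const CDP_STAGES[]  = { "F", "D", "E", "E1", "E2", "E3", "W" };

/* 1 marks a CDP stage the text says may run asynchronously (E1, E2, E3, W). */
const int CDP_STAGE_ASYNC[] = { 0, 0, 0, 1, 1, 1, 1 };

enum {
    LDST_DEPTH = sizeof LDST_STAGES / sizeof LDST_STAGES[0],
    CDP_DEPTH  = sizeof CDP_STAGES / sizeof CDP_STAGES[0],
    SYNC_STALL_PER_CDP = 8   /* stall cycles per data path instruction */
};

/* Number of CDP pipeline stages that can run asynchronously. */
int cdp_async_stage_count(void)
{
    int i, n = 0;

    for (i = 0; i < CDP_DEPTH; i++)
        n += CDP_STAGE_ASYNC[i];
    return n;
}

/*
 * Stall cycles added in synchronous mode for a given number of CDP
 * (data path) instructions, using the flat eight-cycle figure.
 * ASSUMPTION: every CDP instruction pays the full stall.
 */
unsigned long sync_mode_stall_cycles(unsigned long cdp_instructions)
{
    /* Guard against overflow of the multiplication below. */
    assert(cdp_instructions <= ULONG_MAX / SYNC_STALL_PER_CDP);
    return cdp_instructions * SYNC_STALL_PER_CDP;
}
```

Under this model a loop body with ten CDP instructions would incur sync_mode_stall_cycles(10), that is 80 stall cycles per iteration in synchronous mode, which gives a rough sense of why asynchronous operation matters for arithmetic-heavy code.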
Figure 1. MaverickCrunch Pipelines

3. Code Optimization for MaverickCrunch

This section describes guidelines for writing optimized code for the MaverickCrunch coprocessor. These guidelines are divided into algorithm, compiler, and architecture sections. It is assumed that the correct algorithm has been chosen, and that all non-hardware-specific optimizations have been completed. However, optimization should not begin until all of the code has been written and tested for functionality.

3.1 Algorithms

This section focuses on methods to reduce algorithm execution time. After the code'