2004 (All Rights Reserved) http://www.cirrus.com
JAN '04 AN253REV1

AN253
Optimizing Code Speed for the MaverickCrunch™ Coprocessor
Brett Davis

1. Introduction

This application note is intended to assist developers in optimizing their source code for use with the MaverickCrunch coprocessor. It begins with a brief overview of the MaverickCrunch coprocessor, continues with optimization guidelines, and concludes with an example that applies the guidelines discussed.

Multiple facets of code optimization must be considered in order to realize the full benefit of the MaverickCrunch coprocessor. The guidelines in this document are categorized as algorithm, compiler, or hardware optimizations. The discussion of algorithm optimization centers on high-level programming details such as compound expressions and loop unrolling. Next, the compiler optimization guidelines deal with the effects of compiler optimization on code performance, primarily code size and execution speed. Finally, the hardware optimization section enumerates guidelines related to the MaverickCrunch coprocessor implementation, such as the IEEE-754 implementation and pipeline stalls.

Note: Algorithm selection will not be discussed in this application note. It is assumed that the developer has selected and implemented the correct algorithm for their application.

2. MaverickCrunch

This section introduces and summarizes the features, instruction set and architecture of the MaverickCrunch coprocessor. For further in-depth information on these topics, please read Chapter 3 of the User's Guide.

2.1 Features

The MaverickCrunch coprocessor accelerates IEEE-754 floating point arithmetic, and 32-bit and 64-bit fixed point arithmetic. The MaverickCrunch coprocessor is an excellent candidate for encoding and decoding digital audio, digital signal processing (such as IIR, FIR, FFT) and numeric approximations. Key features of the MaverickCrunch include:

- IEEE-754 based single and double precision floating point support
- Full IEEE-754 rounding support
- Inexact, Overflow, Underflow, and Invalid Operation IEEE-754 exceptions
- 32/64-bit fixed point integer operations
- Add, multiply, and compare functions for all data types
- Fixed point integer MAC 32-bit input with 72-bit accumulate
- Fixed point integer shifts
- Conversion between floating point and integer data representations
- Sixteen (16) 64-bit general-purpose registers
- Four (4) 72-bit accumulators
- Status and control registers

2.2 Instruction Set

The MaverickCrunch coprocessor's instruction set is robust and includes memory, control, and arithmetic operations. MaverickCrunch mnemonics are translated by the compiler or assembler into ARM coprocessor instructions. For example, the MaverickCrunch mnemonic for double precision floating-point multiply is:

    cfmuld c0, c1, c2

The equivalent ARM coprocessor instruction is:

    cdp p4, 1, c0, c1, c2, 1

There are five categories of ARM coprocessor instructions: Data Path (CDP), Load (LDC), Store (STC), ARM to Coprocessor Moves (MCR), and Coprocessor to ARM Moves (MRC). CDP instructions include all arithmetic operations, and any other operation internal to the coprocessor. LDC and STC instructions include the set of operations responsible for moving data between memory and the coprocessor. MCR and MRC instructions are responsible for moving data between ARM and coprocessor registers.

Table 1, Table 2 and Table 3 summarize all of the MaverickCrunch's instruction mnemonics. For more information on the MaverickCrunch instruction set, please see the table: MaverickCrunch Instruction Set in the User'