ARCHITECTURE OF 6713 DSP PROCESSOR
This chapter provides an overview of the architectural structure of the TMS320C67xx DSP, which comprises the central processing unit (CPU), memory, and on-chip peripherals. The C67xx DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses. Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, three reads and one write can be performed in a single cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle. The C67xx DSP also includes the control mechanisms to manage interrupts, repeated operations, and function calls.
Fig 2 – 1 BLOCK DIAGRAM OF TMS320VC6713
The C67xx DSP architecture is built around eight major 16-bit buses (four program/data buses and four address buses):
_ The program bus (PB) carries the instruction code and immediate operands from program memory.
_ Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.
_ The CB and DB carry the operands that are read from data memory.
_ The EB carries the data to be written to memory.
_ Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.
The C67xx DSP can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1). The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier and adder for multiply/accumulate operations or to a destination in data space for data move instructions (MVPD and READA). This capability, in conjunction with the feature of dual-operand read, supports the execution of single-cycle, 3-operand instructions such as the FIRS instruction. The C67xx DSP also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and writes, depending on the peripheral’s structure.
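The dual-operand read combined with a coefficient fetched from program space is what makes a symmetric FIR step possible in one cycle: two samples are added, and the sum is multiplied by a shared coefficient and accumulated. The following is a minimal C sketch of that arithmetic only, not of the FIRS instruction's actual addressing or timing. The function name fir_symmetric and its parameter layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one output of a symmetric FIR filter.
 * x    : 'taps' input samples (taps is assumed to be even)
 * coef : taps/2 coefficients, shared by the mirrored sample pairs
 * The 64-bit accumulator stands in for a wide hardware accumulator. */
int64_t fir_symmetric(const int16_t *x, const int16_t *coef, size_t taps)
{
    int64_t acc = 0;
    for (size_t i = 0; i < taps / 2; i++) {
        /* Two data operands read together, then added. */
        int32_t pair = (int32_t)x[i] + (int32_t)x[taps - 1 - i];
        /* Third operand (the coefficient) multiplies the sum. */
        acc += (int64_t)pair * coef[i];
    }
    return acc;
}
```

Folding the mirrored samples before the multiply halves the number of multiplications compared with a direct FIR of the same length.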
Central Processing Unit (CPU)
The CPU is common to all C67xx devices. The C67xx CPU contains:
_ 40-bit arithmetic logic unit (ALU)
_ Two 40-bit accumulators
_ Barrel shifter
_ 17 × 17-bit multiplier
_ 40-bit adder
_ Compare, select, and store unit (CSSU)
_ Data address generation unit
_ Program address generation unit
Arithmetic Logic Unit (ALU)
The C67x DSP performs 2s-complement arithmetic with a 40-bit arithmetic logic unit (ALU) and two 40-bit accumulators (accumulators A and B). The ALU can also perform Boolean operations. The ALU uses these inputs:
_ 16-bit immediate value
_ 16-bit word from data memory
_ 16-bit value in the temporary register, T
_ Two 16-bit words from data memory
_ 32-bit word from data memory
_ 40-bit word from either accumulator
The ALU can also function as two 16-bit ALUs and perform two 16-bit operations simultaneously.
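The dual 16-bit mode can be pictured as two independent additions packed into one wider word, with no carry passing from the lower half into the upper half. The sketch below models that idea in C. The helper name dual_add16 is hypothetical, and the wrap-around behaviour on overflow is an assumption of this model rather than a statement about the hardware.

```c
#include <stdint.h>

/* Hypothetical software model of two 16-bit additions done side by side.
 * Each 32-bit argument packs two 16-bit values: upper half and lower half. */
uint32_t dual_add16(uint32_t a, uint32_t b)
{
    /* Upper halves are added on their own; any carry out is discarded. */
    uint16_t hi = (uint16_t)((a >> 16) + (b >> 16));
    /* Lower halves are added on their own; no carry reaches the upper half. */
    uint16_t lo = (uint16_t)((a & 0xFFFFu) + (b & 0xFFFFu));
    return ((uint32_t)hi << 16) | lo;
}
```

For instance, adding 0x0001FFFF and 0x00010001 in this model gives 0x00020000: the lower halves wrap to zero without disturbing the upper sum.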
Fig 2 – 2 ALU UNIT
Accumulators A and B store the output from the ALU or the multiplier/adder block. They can also provide a second input to the ALU; accumulator A can be an input to the multiplier/adder. Each accumulator is divided into three parts:
_ Guard bits (bits 39–32)
_ High-order word (bits 31–16)
_ Low-order word (bits 15–0)
Instructions are provided for storing the guard bits, for storing the high- and the low-order accumulator words in data memory, and for transferring 32-bit accumulator words in or out of data memory. Also, either of the accumulators can be used as temporary storage for the other.
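The three accumulator fields can be illustrated by splitting a 40-bit value held in a wider C integer. This is only a sketch of the bit layout listed above; the type acc_fields_t and the function split_acc are names invented for this example.

```c
#include <stdint.h>

/* Hypothetical view of a 40-bit accumulator as its three fields. */
typedef struct {
    uint8_t  guard; /* bits 39-32 */
    uint16_t high;  /* bits 31-16 */
    uint16_t low;   /* bits 15-0  */
} acc_fields_t;

/* The accumulator is modelled in the low 40 bits of a 64-bit integer. */
acc_fields_t split_acc(int64_t acc)
{
    uint64_t bits = (uint64_t)acc & 0xFFFFFFFFFFull; /* keep bits 39-0 */
    acc_fields_t f;
    f.guard = (uint8_t)(bits >> 32);
    f.high  = (uint16_t)(bits >> 16);
    f.low   = (uint16_t)bits;
    return f;
}
```

The guard bits give headroom above a 32-bit result, so a run of additions can exceed 32 bits without losing the overflow immediately.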
The C67x DSP barrel shifter has a 40-bit input connected to the accumulators or to data memory (using CB or DB), and a 40-bit output connected to the ALU or to data memory (using EB). The barrel shifter can produce a left shift of 0 to 31 bits and a right shift of 0 to 16 bits on the input data. The shift requirements are defined in the shift count field of the instruction, the shift count field (ASM) of status register ST1, or in temporary register T (when it is designated as a shift count register). The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle. The LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended, depending on the state of the sign-extension mode bit (SXM) in ST1. Additional shift capabilities enable the processor to perform numerical scaling, bit extraction, extended arithmetic, and overflow prevention operations.
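The fill rules described above (zeros shifted into the LSBs on a left shift, and zero fill or sign extension of the MSBs on a right shift depending on SXM) can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: the function name barrel_shift, the signed shift-count convention (positive for left, negative for right), and the choice to return out-of-range requests unshifted are all inventions of this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFull /* low 40 bits */

/* Hypothetical model of a 40-bit shift.
 * shift in 0..31   : left shift, LSBs filled with 0s
 * shift in -16..-1 : right shift, MSBs zero filled or sign extended (sxm)
 * Any other shift value returns the masked input unchanged. */
uint64_t barrel_shift(uint64_t acc40, int shift, bool sxm)
{
    uint64_t bits = acc40 & ACC_MASK;
    if (shift < -16 || shift > 31) {
        return bits;
    }
    if (shift >= 0) {
        return (bits << shift) & ACC_MASK;
    }
    int n = -shift;
    uint64_t out = bits >> n;
    if (sxm && (bits >> 39)) {
        /* Copy the sign bit (bit 39) into the n vacated top bits. */
        out |= (ACC_MASK << (40 - n)) & ACC_MASK;
    }
    return out;
}
```

With the sign bit set, a right shift by 4 of 0x8000000000 yields 0xF800000000 when sxm is true and 0x0800000000 when it is false.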
The multiplier/adder unit performs 17 × 17-bit 2s-complement multiplication with a 40-bit addition in a single instruction cycle. The multiplier/adder block consists of several elements: a multiplier, an adder, signed/unsigned input control logic, fractional control logic, a zero detector, a rounder (2s complement), overflow/saturation logic, and a 16-bit temporary storage register (T). The multiplier has two inputs: one input is selected from T, a data-memory operand, or accumulator A; the other is selected from program memory, data memory, accumulator A, or an immediate value. The fast, on-chip multiplier allows the DSP to efficiently perform operations such as convolution, correlation, and filtering. In addition, the multiplier and ALU together execute multiply/accumulate (MAC) computations and ALU operations in parallel in a single instruction cycle. This function is used in determining the Euclidean distance and in implementing symmetrical and LMS filters, which are required for complex DSP algorithms. See section 4.5, Multiplier/Adder Unit, on page 4-19, for more details about the multiplier/adder unit.
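The multiply/accumulate pattern and the Euclidean distance use mentioned above can be sketched in portable C as follows. This shows only the arithmetic that the hardware performs once per cycle, with a 64-bit integer standing in for the wide accumulator. The function names mac16 and sq_distance16 are hypothetical, and the distance is left squared because the text does not describe a square-root step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MAC loop: sum of x[i] * h[i], one product per iteration. */
int64_t mac16(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)h[i]; /* 16 x 16 -> 32-bit product */
    }
    return acc;
}

/* Hypothetical squared Euclidean distance between two 16-bit vectors:
 * the ALU part forms the difference, the multiplier squares it,
 * and the adder accumulates. */
int64_t sq_distance16(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        acc += (int64_t)d * d; /* widen first: d * d can exceed 32 bits */
    }
    return acc;
}
```

On the DSP each loop iteration would map to a single-cycle operation; in C the compiler decides how the loop is scheduled.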
Fig 2 – 3 MULTIPLIER/ADDER
These are some of the important parts of the processor. You are advised to go through the detailed architecture at least once, as this helps in developing optimized code for the required application.