Mobile handset makers face a dilemma: how to increase memory bandwidth to accommodate emerging applications while still minimizing the device's size and cost. New feature-rich phones that enable applications such as video, music, navigation, and networking require memory bandwidths of multiple gigabits per second, which differ greatly from what's generally available today. A better memory controller interface is required to support future generations well beyond the turn of the decade.
For example, an LCD with a 1280- by 1024-pixel resolution, a typical 60-Hz refresh rate, and 24-bit (RGB) color encoding needs about 1.9 Gbits/s (1280 × 1024 × 24 × 60 bits) each for writing to and reading from the memory buffer, resulting in a total memory bandwidth of roughly 3.7 Gbits/s. This bandwidth is required just to drive what is a common resolution for an external display; it doesn't account for rendering and other processing. Comparing handset display applications with today's computer graphics systems isn't so far-fetched, because mobile phones are following the same path as computers by integrating gigahertz processors and DDR-DRAM main memory. Those computer graphics systems require memory bandwidths between 80 and 400 Gbits/s.
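The display figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes that each refresh costs one full write into the frame buffer plus one full read out of it, with 1 Gbit taken as 10^9 bits. Under those assumptions the raw pixel rate comes to about 1.89 Gbits/s in one direction and about 3.77 Gbits/s for read plus write, in line with the roughly 3.7 Gbits/s total quoted in the text.

```python
def display_stream_gbps(width, height, bits_per_pixel, refresh_hz):
    """Raw pixel data rate for one pass over the frame buffer, in Gbit/s (10**9 bits)."""
    return width * height * bits_per_pixel * refresh_hz / 1e9


# Display parameters taken from the text: 1280 x 1024, 24-bit RGB, 60 Hz.
one_way = display_stream_gbps(1280, 1024, 24, 60)

# Assumption: every frame is written to the buffer once and read back once.
total = 2 * one_way

print(f"one direction: {one_way:.2f} Gbit/s, read + write: {total:.2f} Gbit/s")
```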
In a typical, high-end phone with two processor cores (one for baseband processing and one for applications processing), both processors require non-volatile (flash) memory to store code, and volatile memory (SRAM and DRAM) as a temporary buffer when processing data.
A large amount of fast-access memory is needed when application processors execute user-loadable multimedia applications. However, the access speed required exceeds what the fastest flash memory available today can deliver. To achieve the desired performance, inexpensive NAND or MirrorBit ORNAND flash memory stores the application code, which is then copied into and executed from the faster DRAM. Hence, application processors typically require both NAND or ORNAND flash and DRAM.
Baseband processors, by contrast, execute deeply embedded protocol stacks. This type of code is optimally executed directly from flash (execute-in-place, or XIP) and requires random access to the flash. Only NOR flash memory supports an efficient XIP model, so it's typically chosen for storing baseband code. DRAM can be shared with the applications processor as a temporary buffer for the baseband, or the NOR memory may be paired with SRAM or pSRAM dedicated to supporting the baseband.
Integrating these multiple memory types often requires that more than 100 pins be devoted to the memory interface, which can represent roughly 30% of a phone's overall CPU pin count. Texas Instruments' OMAP 1611 and STMicroelectronics' Nomadik "macro" architecture are examples of high-performance CPUs that include a high pin count to which memory contributes significantly.
As users demand new handset features like LAN connectivity, GPS capabilities, and TV-on-mobile, OEMs face a challenge. These features require increased processing power, and more memory bandwidth is needed to support this processing power. This means the number of pins must be increased to support the enhanced data processing. With system memory currently consuming such a large portion of the overall CPU pin count, additional pins bring about myriad concerns and technology challenges:
- Each pin directly adds 0.4 cents to a phone CPU's cost, which results in an extra overall cost of about 60 cents for the signaling pins and their associated power and ground pins.
- Reducing the number of pins allows for smaller and less expensive packages.
- For pad-limited designs, the number of I/Os directly determines die size, so reducing the I/O count directly reduces die cost.
- CPUs with high pin counts require more board layers for signal routing on PCBs, which increases system cost.
- Routing through vias that connect multiple board layers causes noise issues.
- A larger area is needed to accommodate the pins, which opposes the trend of making smaller and thinner mobile phones.
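The per-pin cost in the first bullet can be turned into a quick estimate. The sketch below is a rough illustration with hypothetical function names. It assumes the cost scales linearly with pin count, takes 100 signal pins as the lower bound mentioned earlier, and works backward from the quoted 60 cents to the pin count that figure would imply (about 150, presumably including power and ground pins, although the text does not give that breakdown).

```python
CENTS_PER_PIN = 0.4  # per-pin cost quoted in the text


def pin_cost_cents(pin_count, cents_per_pin=CENTS_PER_PIN):
    """Package cost attributable to a number of pins, assuming linear scaling."""
    return pin_count * cents_per_pin


def pins_for_cost(total_cents, cents_per_pin=CENTS_PER_PIN):
    """Pin count implied by a total cost, assuming linear scaling."""
    return total_cents / cents_per_pin


# 100 memory-interface signal pins: the "more than 100 pins" lower bound.
signal_cost = pin_cost_cents(100)

# Pin count implied by the quoted 60-cent total (an inference, not a source figure).
implied_pins = pins_for_cost(60)

print(f"100 signal pins: {signal_cost:.0f} cents")
print(f"60 cents corresponds to about {implied_pins:.0f} pins")
```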
Simply holding off on integrating advanced features until a faster memory solution can be developed is not an option. While widening the memory bus interface may be a technical possibility, it's not a logical solution as it increases the already high pin count on the CPU. Therefore, memory buses must move toward DRAM-like performance while allowing for new I/O technologies that support frequency scaling and pin-count reduction. Considering the extensive infrastructure that's built around the memory system, it's difficult to radically change memory buses.
A solution overview
Memory controllers consist of two interfaces. The host interface connects the controller to the system bus, and the memory interface connects the controller to memory devices. Both interfaces reside in their own clock domains and are typically separated by FIFO queues. This allows the memory controller to be easily repartitioned into a host interface and a memory interface by replacing the queues with a high-speed, low-pin-count bus.
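The two-interface structure can be pictured with a toy software model. The sketch below is not a hardware description and does not come from the article: the class, method names, and request format are all hypothetical. The two `tick` methods stand in for the independent host and memory clock domains, and the two queues are the FIFOs that a high-speed, low-pin-count bus would replace when the controller is repartitioned.

```python
from collections import deque


class MemoryController:
    """Toy model: host and memory interfaces decoupled by two FIFO queues."""

    def __init__(self, memory_size=16):
        self.request_fifo = deque()      # host interface -> memory interface
        self.response_fifo = deque()     # memory interface -> host interface
        self.memory = [0] * memory_size  # stand-in for the attached memory devices

    def host_tick(self, request=None):
        """One host-clock cycle: queue a request, return a finished response if any."""
        if request is not None:
            self.request_fifo.append(request)
        return self.response_fifo.popleft() if self.response_fifo else None

    def memory_tick(self):
        """One memory-clock cycle: service at most one queued request."""
        if not self.request_fifo:
            return
        op, addr, data = self.request_fifo.popleft()
        if op == "write":
            self.memory[addr] = data
            self.response_fifo.append(("write_ack", addr, None))
        else:
            self.response_fifo.append(("read_data", addr, self.memory[addr]))


ctrl = MemoryController()
ctrl.host_tick(("write", 3, 0xAB))  # host side issues a write
ctrl.host_tick(("read", 3, None))   # then a read of the same address
ctrl.memory_tick()                  # memory side drains the queue at its own pace
ctrl.memory_tick()
first = ctrl.host_tick()            # host collects the responses in order
second = ctrl.host_tick()
print(first, second)
```

Because the host side only ever touches the two queues, nothing on that side depends on how many or what kind of memory devices sit behind `memory_tick`, which is the property that makes the split at the queues a natural one.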
Repartitioning the memory system in this way provides a bus between the host and memory systems that is unified with regard to density, memory types, and speed grades. It also minimizes the loading on the external bus, allowing for high bus frequencies, which is a precondition for lowering the pin count. Additionally, repartitioning confines the high-pin-count interface between memory devices to the inside of the typical multi-chip package (MCP), which lowers the handset cost and allows for wide memory banks within the MCP (Fig. 1).

Fig. 1. A unified bus between the host and memory systems minimizes the loading on the external bus.
The bus between the host interface and the memory-side (client) interface is implemented as a DDR-based memory bus. For a sample memory system, the total external pin count would be:
- Control bus: 15 pins
- Address bus: 12 pins
- Data bus: 8 to 16 pins
The total pin count for a complete memory system would then be 35 to 43 pins.
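The pin budget can be tallied in a few lines, together with an idea of the peak bandwidth such a bus might offer. The sketch below uses the pin counts from the list above; the function names are hypothetical, and the 200-MHz clock is purely an assumed value for illustration, since the text does not state a bus frequency. The only DDR property relied on is that each data pin transfers two bits per clock cycle.

```python
CONTROL_PINS = 15          # from the text
ADDRESS_PINS = 12          # from the text
DATA_PIN_OPTIONS = (8, 16) # narrow and wide data bus options from the text


def total_pins(data_pins):
    """External pin count of the sample memory bus for a given data width."""
    return CONTROL_PINS + ADDRESS_PINS + data_pins


def ddr_peak_gbps(data_pins, clock_mhz):
    """Peak DDR data rate in Gbit/s: two transfers per clock on every data pin."""
    return data_pins * 2 * clock_mhz * 1e6 / 1e9


ASSUMED_CLOCK_MHZ = 200  # hypothetical bus clock, not a figure from the article

for data_pins in DATA_PIN_OPTIONS:
    pins = total_pins(data_pins)
    peak = ddr_peak_gbps(data_pins, ASSUMED_CLOCK_MHZ)
    print(f"{data_pins}-bit data bus: {pins} pins, "
          f"peak {peak:.1f} Gbit/s at {ASSUMED_CLOCK_MHZ} MHz")
```

Either configuration stays well below the 100-plus pins cited earlier for a conventional multi-memory interface.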