One of the features AMD touts most is that Barcelona is a single-chip, native quad-core design, as opposed to Intel's approach of packaging two dual-core dies in one processor. Whether this has any significance in the real world is still debatable and largely unproven, but it is an interesting distinction. The price of that design is a huge die: Barcelona occupies 283mm² and packs 463 million transistors. This could prove expensive for AMD if defect rates are high, since the company has not yet mastered the 65nm process, as seen from its struggle to ramp up clock speeds on the existing Brisbane-based processors. Perhaps this is why we see AMD trumpeting tri-core processor designs to be available next year, but that's another topic for another day.
The individual cores are quite similar in structure to the existing second-generation Opteron processors, but Barcelona has a number of enhancements and tweaks to bring it up to speed for current and future needs. After all, the Barcelona core design is supposed to carry AMD through the next two years, so it needs enough built-in capability to last until the next major redesign is available. One of the more significant changes is that Barcelona can now execute 128-bit SSE operations in a single cycle, as AMD has widened the floating-point pipeline to 128 bits (up from just 64 bits on the previous generation, and now equivalent to Intel's Core architecture). To keep these more powerful processing units fed, AMD also doubled the instruction and data delivery mechanisms: instruction fetch is now 32 bytes per cycle, the data cache can service two 128-bit loads per cycle, and the paths to the L2 cache and the built-in memory controller are 128 bits per cycle wide. On the virtualization front, Barcelona builds on the AMD-V hardware virtualization support introduced in the second-generation Opteron by adding nested paging, primarily to help accelerate the performance of virtualized environments.
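To put the pipeline widening in concrete terms, here is a minimal back-of-the-envelope sketch. The function names are our own, and the model is deliberately simplistic: it only assumes that an operation wider than the pipeline has to be split into several passes, and it ignores latency, scheduling and everything else that affects real-world throughput.

```python
# Rough arithmetic only: how many passes a 128-bit SSE operation needs
# on a floating-point pipeline of a given width, and how many values
# fit in one 128-bit register. Names here are illustrative, not AMD's.

def passes_per_sse_op(op_bits: int, pipe_bits: int) -> int:
    """Number of passes needed to push op_bits through a pipe_bits-wide pipeline."""
    return -(-op_bits // pipe_bits)  # ceiling division


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of element_bits that fit in one register."""
    return register_bits // element_bits


SSE_BITS = 128

previous_gen = passes_per_sse_op(SSE_BITS, 64)   # 64-bit FP pipeline
barcelona = passes_per_sse_op(SSE_BITS, 128)     # 128-bit FP pipeline

print("Passes per 128-bit SSE op, 64-bit pipeline:", previous_gen)
print("Passes per 128-bit SSE op, 128-bit pipeline:", barcelona)
print("Single-precision values per SSE register:", lanes(SSE_BITS, 32))
print("Double-precision values per SSE register:", lanes(SSE_BITS, 64))
```

Under this simple model, the older 64-bit pipeline needs two passes for every 128-bit SSE operation while Barcelona needs one, which is where the talk of doubled SSE throughput comes from.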
The processor's L1 and L2 cache hierarchy remains as it was in the previous Opteron design, with 64KB each of dedicated L1 data and instruction cache per core and 512KB of dedicated L2 cache per core. To augment these, the Barcelona core now incorporates 2MB of L3 cache shared among all four cores. With four processing cores on the die, you can expect significant traffic flowing in and out of the processor, so the shared L3 acts as a go-between that keeps frequently used data close to the cores rather than forcing them to reach out to main memory.
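For a quick sense of how much cache that adds up to across the whole die, the figures above can be tallied as follows. This is only a sketch with our own function and parameter names, and it assumes the 64KB L1 figure applies separately to the instruction and data caches (128KB of L1 per core); if you read it as 64KB combined, pass 64 instead.

```python
# Tally of on-die cache from the per-core and shared figures quoted above.
# Assumption: 64KB L1 instruction + 64KB L1 data per core (128KB total).

def total_cache_kb(cores: int, l1_kb_per_core: int,
                   l2_kb_per_core: int, l3_kb_shared: int) -> int:
    """Total cache in KB: private L1 and L2 per core plus one shared L3."""
    private_kb = cores * (l1_kb_per_core + l2_kb_per_core)
    return private_kb + l3_kb_shared


CORES = 4
L1_KB = 64 + 64      # instruction + data, per core (assumed split)
L2_KB = 512          # per core
L3_KB = 2 * 1024     # shared by all cores

total_kb = total_cache_kb(CORES, L1_KB, L2_KB, L3_KB)
print(f"Total on-die cache: {total_kb} KB ({total_kb / 1024:.1f} MB)")
```

Under these assumptions the private caches alone account for more than the shared L3, which helps explain why the L3 is positioned as a supplement to the per-core caches rather than a replacement for them.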