One might think FPGA vendors know how to generate RAM from their own primitives efficiently. I did, at least, until I tried it out on a small RISC-V implementation.
There are two memory blocks made with the Vivado IP Integrator. For some reason this small RISC-V SoC showed a resource utilization of over 200 LUTs, while I knew it should be smaller than that. The detailed report showed 24 LUTs and 3 flip-flops consumed by the 32 KByte, 8-bit-wide RAM. How can this be? A 32K x 8 bit memory should use 8 BRAM primitives and 0 LUTs. Checking the RTL view after synthesis:
OK, this explains part of the problem: the BRAMs are configured as 8 bits wide, with an 8-to-1 multiplexer at the output. This costs some LUTs, but where did those 3 flip-flops come from? Looking again, this time at the post-implementation RTL view:
Right, they are needed: the BRAM read is synchronous, so the upper address bits that select the block must be delayed by one clock for the output multiplexer to work properly. Those 3 flip-flops really are required.
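To make the timing argument concrete, here is a minimal behavioral sketch in Python. It is not the Xilinx primitive, just a simplified model I am assuming for illustration: eight 4K x 8 banks with a one-clock synchronous read, and an output multiplexer whose select is either registered or taken straight from the address bus. The names `mem`, `read_stream` and `delay_select` are made up for this example.

```python
BANKS, BANK_DEPTH = 8, 4096  # eight 4K x 8 banks, matching the 8-to-1 mux case

# Fill every location with a value that differs between banks at the same offset.
mem = [[(off ^ (bank * 0x21)) & 0xFF for off in range(BANK_DEPTH)]
       for bank in range(BANKS)]

def read_stream(addresses, delay_select):
    """Return the data seen on the mux output one clock after each address."""
    results = []
    for i, addr in enumerate(addresses):
        bank, off = divmod(addr, BANK_DEPTH)
        # Clock edge: every bank latches the word at the shared lower address bits.
        latched = [mem[b][off] for b in range(BANKS)]
        # In the following cycle the bus already carries the next address.
        next_addr = addresses[i + 1] if i + 1 < len(addresses) else 0
        # Registered select remembers which bank this read belonged to;
        # an unregistered select follows whatever is on the bus right now.
        select = bank if delay_select else next_addr // BANK_DEPTH
        results.append(latched[select])
    return results

addrs = [0, 3 * BANK_DEPTH + 5, 7 * BANK_DEPTH + 100, 17]
expected = [mem[a // BANK_DEPTH][a % BANK_DEPTH] for a in addrs]

print(read_stream(addrs, delay_select=True) == expected)
print(read_stream(addrs, delay_select=False) == expected)
```

With the select registered the reads return the right bytes; without it the multiplexer picks the bank of the next address and back-to-back reads from different banks return wrong data.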
So while the complete RISC-V soft CPU takes 59 slices, the "extra overhead" added by the Xilinx RAM generator takes 11 slices! Checking the configuration options:
So where is 32kx1? That would be the one to choose when making a 32K-deep memory, but this option is simply missing. Let's see what happens if we select 16kx1. OK, this is looking better: this time the RAM generator is using 32kx1 even though 16kx1 was selected, so it must have read my mind and done what I wanted.
Both memory blocks now use only BRAM and no logic resources.
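The difference between the two primitive choices can be estimated with a few lines of arithmetic. This is only a rough sketch: the helper `tile` is hypothetical, and the 4K x 8 and 32K x 1 primitive shapes are my reading of the two netlists above, not values taken from the generator itself.

```python
import math

def tile(depth, width, prim_depth, prim_width):
    """Estimate how a depth x width memory tiles onto primitives of one shape."""
    rows = math.ceil(depth / prim_depth)   # primitives stacked along the address range
    cols = math.ceil(width / prim_width)   # primitives placed side by side for data width
    return {
        "brams": rows * cols,
        "mux_inputs": rows,                        # rows:1 mux per data bit, 1 means no mux
        "select_flops": (rows - 1).bit_length(),   # upper address bits delayed one clock
    }

by_8 = tile(32768, 8, prim_depth=4096, prim_width=8)    # 8-bit-wide primitives
by_1 = tile(32768, 8, prim_depth=32768, prim_width=1)   # 32kx1 primitives

print(by_8)
print(by_1)
```

Both choices need 8 BRAMs, but only the 8-bit-wide one needs the 8-to-1 output mux and the 3 select flip-flops. With 32kx1 each primitive supplies one data bit over the full address range, so nothing is left to multiplex.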
Looks nice and works too!







 