Getting to grips with all the XPM generics means making sure they are used in the right combinations, since the valid settings depend on the primitive type required. This feels like an overhead each time you choose to instantiate an XPM_MEMORY. I would prefer to have those generics worked out and assured before each use so that I can plough on with the design task in hand without needing to reconsider each time. Also, there are two facilities made available that I routinely ignore in pursuit of a working design:
- The sleep function.
- Error injection and detection pins.

Contents:
- Automatic Primitive Selection
- MESSAGE_CONTROL and SIM_ASSERT_CHK Generics
- CASCADE_HEIGHT Generic
- regceb Input Pin
- Sleep Pin
- Error Correcting Codes (ECC)
- Memory Initialisation Files
- Cleaner Abstractions
- Conclusions
- References
Automatic Primitive Selection
My chosen device is a Kintex-7, which means the automatic selection chooses between distributed RAM and BlockRAM. So the obvious question is: what determines the choice?

Address bits | Data bits | Memory bits | RAM Type |
---|---|---|---|
5 | 24 | 768 | Distributed / LUTRAM |
5 | 30 | 960 | Distributed / LUTRAM |
5 | 31 | 992 | Distributed / LUTRAM |
2 | 255 | 1020 | Distributed / LUTRAM |
2 | 256 | 1024 | BlockRAM |
4 | 64 | 1024 | BlockRAM |
5 | 32 | 1024 | BlockRAM |
6 | 16 | 1024 | BlockRAM |
These figures show that the "auto" mode switched from distributed to block RAM consistently at a threshold of 1024 bits of memory. The block primitive selected depends on the number of address bits: 4 or fewer uses a RAMB36E1 with 36,864 memory bits, and 5 or more uses the smaller RAMB18E1 with 18,432 memory bits. That means the utilisation of the memory resource can be as poor as 1024/36,864 ≈ 2.8%. This may affect your design choices and cause you to decide on a manual generic specification.
On the subject of utilisation, it's worth looking closer at the distributed RAM's implementation. Here, in the main, the RAM32M block is used, which comprises 1 RAMS32 and 3 RAMD32 primitives. The former primitive is not dual port, so the distributed RAM (LUTRAM) implementation of the simple dual port RAM tends to use at most 3/4 of a RAM32M component. There are also occasions when RAM32X1D primitives are used, and inspection of the synthesis results appears to show they only get utilised to 50% at any one time. The LUTRAM is perhaps best assumed to be 64 bits of memory for simplicity, and realistic designs usually use LUTRAMs with between 50% and 75% efficiency for simple dual port memories.
MESSAGE_CONTROL and SIM_ASSERT_CHK Generics
MESSAGE_CONTROL allows you to enable dynamic message reporting, for example of collision warnings. As you can see from the above transcript from ModelSim, it only works for certain primitive types. It also prints as a warning even when the correct primitive type is selected, which just seems to be "transcript spam". So what is the division of effort between MESSAGE_CONTROL and SIM_ASSERT_CHK? I keep MESSAGE_CONTROL disabled as it does not seem to provide helpful reporting, but I have kept SIM_ASSERT_CHK enabled because with XPM_CDC_* components I have seen very useful warnings reported (e.g. failure to hold a CDC input long enough for two samples by the destination clock).
CASCADE_HEIGHT Generic
The CASCADE_HEIGHT generic works much as you would expect from the synthesis constraint of the same name. For details on how this behaves, see a previous post, Cascade Block RAMs for Larger Memories.
regceb Input Pin
I find this optional signal quite pointless. I assume that it's provided because it's a pin on the RAMB18E1 & RAMB36E1 primitives, and it is easy to create an equivalent for LUTRAMs with registered outputs. The correct way to drive it is by delaying enb by one fewer stages than the read latency requested. It is worth noting that enb is already delayed through a shift register in order to drive each of the previous delay stages on the output of the LUTRAM. So by delaying enb correctly for the regceb input, you have effectively duplicated that delay, but with one more stage (pink and purple coloured registers). The duplication may well be optimised away by the synthesis tool, provided none of the delays for regceb are optimised into an SRL primitive. SRL inference can be avoided by setting the STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE synthesis property large enough during synthesis. Finally, if you value a 'data valid' output to accompany your doutb data bus, then simply add yet another stage of delay to the regceb pin. You will often see the regceb input pin tied high and then ignored.
The circuit tracks in the picture above have been coloured by delay stage to make the delay duplication clearer, and the 'data valid' output register is shown driven by the regceb stage.
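A minimal sketch of the delay arrangement described above may make it more concrete. It is not the code from the GitHub repository: the entity name regceb_delay and the generic read_latency_g (standing in for READ_LATENCY_B) are hypothetical, and the timing simply follows the rule stated in the text, i.e. regceb is enb delayed by one fewer stages than the read latency, and 'data valid' is one stage later.

```vhdl
library ieee;
  use ieee.std_logic_1164.all;

-- Hypothetical helper: derives regceb and a 'data valid' strobe from enb.
entity regceb_delay is
  generic (
    read_latency_g : positive := 3  -- assumed to match READ_LATENCY_B on the XPM
  );
  port (
    clk        : in  std_logic;
    enb        : in  std_logic;
    regceb     : out std_logic;  -- enb delayed by read_latency_g-1 clocks
    data_valid : out std_logic   -- enb delayed by read_latency_g clocks
  );
end entity;

architecture rtl of regceb_delay is
  -- dly(i) is enb delayed by i clock cycles; dly(0) is enb itself.
  signal dly : std_logic_vector(0 to read_latency_g) := (others => '0');
begin

  dly(0) <= enb;

  process(clk)
  begin
    if rising_edge(clk) then
      -- Shift every stage along by one on each clock edge.
      dly(1 to read_latency_g) <= dly(0 to read_latency_g-1);
    end if;
  end process;

  regceb     <= dly(read_latency_g-1);
  data_valid <= dly(read_latency_g);

end architecture;
```

With read_latency_g set to 1 the regceb output is just enb, and data_valid is enb registered once. Whether the synthesis tool merges these registers with the ones inside the XPM depends on the SRL inference settings discussed above.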
Sleep Pin
Both the Block and Ultra RAMs support a dynamic sleep mode. The testbench demonstrates successfully entering and leaving sleep, and the sleep signal timing has been optimised separately on both the write and read cycles within the test bench as a demonstration.
Operation | Minimum Clock Cycles Before Sleep | Minimum Clock Cycles After Sleep |
---|---|---|
Write | 1 | 2 |
Read | 0 | 2 |
Both the test benches for single and dual clocks demonstrate the operation of the sleep pin.
I don't want to cause alarm, but when I synthesise the XPM, I find the sleep pin is not connected to anything, yet the simulation model definitely fails the test bench when you drive the sleep pin incorrectly.
Error Correcting Codes (ECC)
Xilinx have used the (72, 64) Hamming code that provides single bit error correction and double bit error detection. Their implementation is described in detail in XAPP645. It requires the data bus to be a multiple of 64 bits; anything else will compile, but the simulation will fail early on with a Verilog assertion check.
The example simulation randomly injects single or double bit errors via the injectsbiterra and injectdbiterra inputs. It then checks on reading that the sbiterrb and dbiterrb outputs correctly detect the errors when presenting the data for each address. All single bit error data words are verified against the expected value, as they have been corrected. The double bit error words are not checked against expected data, just counted to ensure the correct number of expected errors.
From the simulation, the ECC errors induced do not look realistic, and this is perhaps a simulation model issue. Also, two of the generics, ECC_TYPE and ECC_BIT_RANGE, are described as only affecting synthesis. These two generics are included in the 2022.1 xpm_VCOMP.vhd file, but omitted from the 2022.1 documentation. Only in the 2023.1 documentation are they explained.
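Since the data width restriction for ECC is otherwise only caught by a Verilog assertion inside the simulation model, a wrapper could report it earlier in VHDL. The following is only a sketch of that idea: the entity name ecc_width_check and the generic data_width_g are hypothetical and not part of the XPM library.

```vhdl
-- Hypothetical wrapper fragment: fails at the start of simulation if the
-- data width is not a multiple of 64 bits, as the text says ECC requires.
entity ecc_width_check is
  generic (
    data_width_g : positive := 64
  );
begin
  -- Passive assertion in the entity statement part, evaluated once.
  assert (data_width_g mod 64) = 0
    report "ECC needs a data width that is a multiple of 64 bits, got " &
           integer'image(data_width_g)
    severity failure;
end entity;

architecture empty of ecc_width_check is
begin
end architecture;
```

In practice the assertion would sit in the entity that wraps the XPM_MEMORY instance rather than in an entity of its own.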
Memory Initialisation Files
This requires the setting of two parameters, MEMORY_INIT_FILE gives the file name defining the initial RAM contents and MEMORY_INIT_PARAM must be set to the empty string, "".
Don't forget that, if you use relative path names, you need to copy the .MEM file across to the current directory for the simulation.
The initialisation file format is defined in UG1580. The test_dpram_1clk_init.vhdl test bench provides a verification of the RAM readback after initialisation.
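For experimenting with initialisation, a small simulation-only helper can generate a .MEM file to point MEMORY_INIT_FILE at. This is a sketch under stated assumptions: the entity name write_mem_file, its generics and the file name ramp.mem are all hypothetical, and the format written (an "@" address line followed by one hexadecimal data word per line) is my reading of UG1580 and should be checked against that guide. It relies on hwrite for std_logic_vector, which is part of ieee.std_logic_1164 in VHDL-2008.

```vhdl
library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;
library std;
  use std.textio.all;

-- Hypothetical simulation-only helper: writes an address ramp to a .MEM file.
entity write_mem_file is
  generic (
    file_name_g  : string   := "ramp.mem";  -- assumed name, written to the current directory
    addr_width_g : positive := 4;
    data_width_g : positive := 16           -- keep to a multiple of 4 for clean hex words
  );
end entity;

architecture sim of write_mem_file is
begin

  process
    file     f : text open write_mode is file_name_g;
    variable l : line;
  begin
    -- Start address of the data block (assumed format, see UG1580).
    write(l, string'("@00000000"));
    writeline(f, l);
    -- One data word per line; the word at address i holds the value i.
    -- Note: to_unsigned truncates if i does not fit in data_width_g bits.
    for i in 0 to 2**addr_width_g - 1 loop
      hwrite(l, std_logic_vector(to_unsigned(i, data_width_g)));
      writeline(f, l);
    end loop;
    file_close(f);
    wait;
  end process;

end architecture;
```

Running this once before the RAM test bench leaves the file in the simulator's working directory, which sidesteps the relative path issue mentioned above.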
Cleaner Abstractions
I desire to switch effortlessly between BlockRAM and LUTRAM implementations, and would prefer to have the number of clocks explicitly defined by the entity to remove confusion. Even so, there remain differences in the respective ranges of the READ_LATENCY_B generic, for which the simplest check remains an assertion at run time. Making it compile time checkable is possible with an additional package that defines a function for the minimum latency depending on the specified primitive type, MEMORY_PRIMITIVE/primitive_g. So I propose these two entities; they and their architectures can be found on GitHub. Within the architecture, functions are defined to set up generics based on the primitive type automatically (e.g. WAKEUP_TIME), avoiding assertion failures, e.g. from the WRITE_MODE_B generic setting.
I would generally prefer not to be specifying generics via strings, since VHDL makes string handling unnecessarily unpleasant. For example, testing whether primitive_c (= "distributed") is equal to "block" requires testing the length of the string and truncating the left hand side to avoid compiler warnings. But then using an enumerated type requires the definition of a new package, and hence ideally a new file, with more source files to compile and maintain, making the idea a complication overall.
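As an illustration of the package approach mentioned above, here is a minimal sketch. It is not the code from the GitHub repository: the names xpm_latency_pkg, min_read_latency and latency_checked are hypothetical. The minimum values (0 for distributed, 1 otherwise) reflect my reading of the READ_LATENCY_B description in the libraries guide and should be verified for your device and tool version. Referring to primitive_g later in the same generic list needs VHDL-2008, and the range is checked when the design is elaborated.

```vhdl
-- Hypothetical package: minimum read latency per primitive type.
package xpm_latency_pkg is
  function min_read_latency(primitive : string) return natural;
end package;

package body xpm_latency_pkg is
  function min_read_latency(primitive : string) return natural is
  begin
    if primitive = "distributed" then
      return 0;  -- assumed: combinatorial read is allowed for LUTRAM
    else
      return 1;  -- assumed: "block" (and, to be safe, anything else) needs >= 1
    end if;
  end function;
end package body;

use work.xpm_latency_pkg.all;

-- Hypothetical wrapper entity showing the constrained generic (VHDL-2008).
entity latency_checked is
  generic (
    primitive_g    : string := "block";
    -- Out-of-range values are rejected at elaboration, before simulation runs.
    read_latency_g : natural range min_read_latency(primitive_g) to natural'high := 2
  );
end entity;

architecture empty of latency_checked is
begin
end architecture;
```

For example, instantiating latency_checked with primitive_g => "block" and read_latency_g => 0 should be rejected by the tool, while the same latency with "distributed" is accepted.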
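One way to contain the string handling is to hide the length test in a small function, so the call site stays readable. This is only a sketch with hypothetical names (string_cmp_pkg, str_eq). Because the arguments are unconstrained inside the function, the lengths are not known when it is compiled, so I would expect the different-length warning not to appear, but I have not confirmed that across tools.

```vhdl
-- Hypothetical package: length-safe string equality.
package string_cmp_pkg is
  function str_eq(a, b : string) return boolean;
end package;

package body string_cmp_pkg is
  function str_eq(a, b : string) return boolean is
  begin
    -- Strings of different lengths can never match, so bail out first.
    if a'length /= b'length then
      return false;
    end if;
    -- Same length: element-by-element comparison.
    return a = b;
  end function;
end package body;
```

A call then reads as str_eq(primitive_c, "block"). It still costs an extra package to compile, so it only partly answers the maintenance objection.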
Conclusions
The regceb pin has a simple formula to drive it, the sleep input does not look so complicated to use, and the error injection and detection pins have been exercised and checked. On the latter, in the case where you require a 64-bit data bus from a BlockRAM, the error detection comes "for free"; after all, the 18k and 36k BlockRAMs provide the additional storage required and it would be unused otherwise. Just note that you will need to verify the sbiterrb and dbiterrb output signals remain low, or decide what actions to take when they go high.
References
- GitHub Source Code
- Vivado Design Suite 7 Series FPGA and Zynq-7000 SoC Libraries Guide - XPM_MEMORY_SDPRAM, Xilinx UG953
- Single Error Correction and Double Error Detection application note, Xilinx XAPP645
- Hamming Codes, Wikipedia
- UpdateMEM User Guide, Memory File Format, Xilinx UG1580