"The SRL16 is an alternative mode for the look-up tables where they are used as 16-bit shift registers. Using this Shift Register LUT (SRL) mode can improve performance and rapidly lead to cost savings of an order of magnitude. Although the SRL16 can be automatically inferred by the software tools, considering their effective use can lead to more cost-effective designs."
Ref: Using Look-Up Tables as Shift Registers (SRL16) in Spartan-3 Generation FPGAs
The inference of Xilinx SRLs is often believed to depend on the absence of a reset condition. Since the SRL has no reset input, it seems logical that, to infer SRLs, the HDL code must also omit a reset. Or must it?
Two versions of a bus shift register are tried, one with and one without a reset condition, to test the synthesis results of Vivado 2019.1. Under what conditions will a "Shift Register LUT" (SRL) be inferred?
library ieee;
use ieee.std_logic_1164.all;

entity delay is
    generic(
        cycle : integer := 4;
        width : integer := 18
    );
    port(
        clk    : in  std_logic;
        reset  : in  std_logic;
        input  : in  std_logic_vector(width-1 downto 0);
        output : out std_logic_vector(width-1 downto 0)
    );
end entity;
Shifting With A Reset Condition
architecture rtl of delay is
    type my_type is array (0 to cycle-1) of std_logic_vector(width-1 downto 0);
    signal int_sig : my_type;
begin
    main : process(clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                int_sig <= (others => (others => '0'));
            else
                int_sig <= input & int_sig(0 to cycle-2);
            end if;
        end if;
    end process main;

    output <= int_sig(cycle-1);
end architecture;
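Whatever primitives synthesis chooses, the observable behaviour of this architecture should stay the same: the output holds the reset value until `cycle` clocks after reset is released, then follows the input. The testbench below is a minimal sketch of how that could be checked in simulation. The entity name `delay_tb`, the constants and the stimulus are all assumptions made for this illustration and were not part of the original experiment.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: names and stimulus are assumptions for this sketch.
entity delay_tb is
end entity;

architecture sim of delay_tb is
    constant c_cycle    : integer := 4;
    constant c_width    : integer := 2;
    constant clk_period : time    := 10 ns;

    constant zeros : std_logic_vector(c_width-1 downto 0) := (others => '0');
    constant ones  : std_logic_vector(c_width-1 downto 0) := (others => '1');

    signal clk    : std_logic := '0';
    signal reset  : std_logic := '1';
    signal input  : std_logic_vector(c_width-1 downto 0) := (others => '0');
    signal output : std_logic_vector(c_width-1 downto 0);
    signal done   : boolean := false;
begin
    -- Free-running clock that stops once the checks have finished.
    clk <= not clk after clk_period / 2 when not done else '0';

    -- Explicitly select the architecture with the reset condition.
    dut : entity work.delay(rtl)
        generic map (cycle => c_cycle, width => c_width)
        port map (clk => clk, reset => reset, input => input, output => output);

    stim : process
    begin
        -- Hold reset across two rising edges, then release it and drive all ones.
        wait until rising_edge(clk);
        wait until rising_edge(clk);
        reset <= '0';
        input <= ones;

        -- Sampled on falling edges: fewer than c_cycle rising edges have been
        -- seen with reset low, so the output must still show the reset value.
        for i in 0 to c_cycle-1 loop
            wait until falling_edge(clk);
            assert output = zeros
                report "output left its reset value too early" severity failure;
        end loop;

        -- After c_cycle rising edges the first input word must appear.
        wait until falling_edge(clk);
        assert output = ones
            report "input did not appear after c_cycle clocks" severity failure;

        done <= true;
        wait;
    end process stim;
end architecture;
```

The checks are made on falling edges so that each one sees the register contents settled after the preceding rising edge.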
# 'width' generic >= 2 for SRL inferencing to work with resets,
# otherwise it is just a standard chain of FDREs.
set_property generic {cycle=4 width=2} [current_fileset]
set target_run [get_runs -filter {IS_SYNTHESIS}]
set_property -name {STEPS.SYNTH_DESIGN.ARGS.MORE OPTIONS} -value {-mode out_of_context} -objects $target_run
# Ensure STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE < 'cycle' generic
# Vivado default is 3
set_property STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE 3 $target_run
get_property generic [current_fileset]
#set_property top_arch rtl [current_fileset]
set_property top_arch srlmap [current_fileset]
refresh_design
reset_run -quiet [get_runs]
launch_runs -jobs [get_param general.maxThreads] $target_run
wait_on_run $target_run
open_run synth_1 -name $target_run
Note that the Tcl script sets the actual generic values used in the following synthesis: cycle=4, width=2. The script also allows swapping between the two architectures, with the caveat that I could not put the two architectures in the same file.
Whilst the Tcl property to set the top-level architecture exists, it has only a visual effect in the sources window and no effect on the elaborated RTL, which always comes from the last architecture parsed. This can be verified by selecting a register in the RTL view and pressing F7 to go to the VHDL source.

Results:
SRLs are inferred even with a reset condition, and without needing an XDC constraint such as:
set_property shreg_extract yes [get_cells int_sig*]
There are caveats, however:
- The shift register must be a bus at least 2 bits wide; for a single bit it is cheaper to use an FDRE chain.
- The shift register must be longer than the Vivado synthesis setting STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE. E.g. the default is:
set_property STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE 3 \
    [get_runs -filter {IS_SYNTHESIS}]
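The same SHREG_EXTRACT control can also be attached from the HDL side rather than through XDC. As a sketch, the architecture below repeats the reset-free shift register and sets the attribute to "no" on `int_sig`, which should keep every stage in flip-flops and so gives a baseline to compare the SRL results against. The architecture name `no_srl` is an assumption for this example, and this variant was not part of the synthesis runs reported here.

```vhdl
-- Hypothetical third architecture (the name 'no_srl' is an assumption):
-- the reset-free shift register with SRL extraction disabled on int_sig.
architecture no_srl of delay is
    type my_type is array (0 to cycle-1) of std_logic_vector(width-1 downto 0);
    signal int_sig : my_type;

    -- Vivado synthesis attribute: "yes" allows SRL extraction, "no" prevents it.
    attribute shreg_extract : string;
    attribute shreg_extract of int_sig : signal is "no";
begin
    main : process(clk)
    begin
        if rising_edge(clk) then
            int_sig <= input & int_sig(0 to cycle-2);
        end if;
    end process main;

    output <= int_sig(cycle-1);
end architecture;
```

Changing the attribute value to "yes" corresponds to the XDC constraint shown above.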
When a reset is specified, Vivado includes extra "fudge logic" to fake the reset value: a chain of FDREs passes the reset value up the chain until the SRL holds the first non-reset value. This overhead is why a single-bit bus is cheaper as a plain chain of FDREs. The fudge logic has to be spliced in at the end of the shift register, so two stages of shift are removed from the SRL and replaced, for each bit of the bus width, by a pair of FDREs and one LUT.
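To make the idea behind the fudge logic concrete, the following hand-written architecture gets the same observable reset behaviour from a data path that has no reset at all. It is only an illustration of the principle: it is not the netlist Vivado generated, it has not been synthesised for this article, and its register and LUT counts need not match the report below. The architecture name `manual_fudge` and the signals `data_sr`, `valid_sr` and `out_reg` are invented for this sketch, which assumes `cycle` is at least 3.

```vhdl
-- Hand-written illustration only: NOT the netlist Vivado produced.
-- All names introduced here are assumptions. Assumes cycle >= 3.
architecture manual_fudge of delay is
    -- Data stages carry no reset, so they are candidates for an SRL.
    type data_array is array (0 to cycle-2) of std_logic_vector(width-1 downto 0);
    signal data_sr  : data_array;
    -- One resettable flag per data stage: '1' means "loaded after reset".
    signal valid_sr : std_logic_vector(0 to cycle-2);
    -- Final resettable output stage.
    signal out_reg  : std_logic_vector(width-1 downto 0);
begin
    main : process(clk)
    begin
        if rising_edge(clk) then
            -- Data shifts unconditionally, even while reset is asserted.
            data_sr <= input & data_sr(0 to cycle-3);

            if reset = '1' then
                valid_sr <= (others => '0');
                out_reg  <= (others => '0');
            else
                -- The flag chain travels alongside the data.
                valid_sr <= '1' & valid_sr(0 to cycle-3);
                -- Gate the last data stage with its flag, so anything captured
                -- during reset is replaced by the reset value.
                if valid_sr(cycle-2) = '1' then
                    out_reg <= data_sr(cycle-2);
                else
                    out_reg <= (others => '0');
                end if;
            end if;
        end if;
    end process main;

    output <= out_reg;
end architecture;
```

The flag chain is shared by every bit of the bus, whereas the gating and the output register are needed per bit. That is consistent with the observation that the overhead only pays off for buses wider than one bit.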
Static Shift Register Report:
+------------+-------------------+--------+-------+--------------+--------------------+-------------------+--------+---------+
|Module Name | RTL Name | Length | Width | Reset Signal | Pull out first Reg | Pull out last Reg | SRL16E | SRLC32E |
+------------+-------------------+--------+-------+--------------+--------------------+-------------------+--------+---------+
|delay | int_sig_reg[3][1] | 4 | 2 | YES | NO | YES | 2 | 0 |
+------------+-------------------+--------+-------+--------------+--------------------+-------------------+--------+---------+
Report Cell Usage:
+------+-------+------+
|      |Cell   |Count |
+------+-------+------+
|1     |LUT2   |     2|
|2     |SRL16E |     2|
|3     |FDRE   |     7|
+------+-------+------+
Shifting Without A Reset Condition
architecture srlmap of delay is
    type my_type is array (0 to cycle-1) of std_logic_vector(width-1 downto 0);
    signal int_sig : my_type;
begin
    main : process(clk)
    begin
        if rising_edge(clk) then
            int_sig <= input & int_sig(0 to cycle-2);
        end if;
    end process main;

    output <= int_sig(cycle-1);
end architecture;

As expected, SRLs are inferred automatically when no reset is specified. Unexpectedly, the first and last stages of the shift register are realised in FDREs.
- No fudge logic
- No means of returning to a known state on reset without invoking the GSR
- The first and last stages of the shift register are not included in the SRL, for any value of the 'cycle' generic.
This last point may simply be because these FDREs interface into and out of the component, but this has not been explored because of the extra time needed to identify the conditions that would move them into the SRL. Xilinx application notes allude to the need for a final register after the SRL to make the shift register "fully synchronous". This architecture has been synthesised "out of context" to avoid IBUFs and OBUFs, and there might be some timing reason why "Pull out first Reg" and "Pull out last Reg" are both "YES" below.
Static Shift Register Report:
+------------+-------------------+--------+-------+--------------+--------------------+-------------------+--------+---------+
|Module Name | RTL Name | Length | Width | Reset Signal | Pull out first Reg | Pull out last Reg | SRL16E | SRLC32E |
+------------+-------------------+--------+-------+--------------+--------------------+-------------------+--------+---------+
|delay | int_sig_reg[3][1] | 4 | 2 | NO | YES | YES | 2 | 0 |
+------------+-------------------+--------+-------+--------------+--------------------+-------------------+--------+---------+
Report Cell Usage:
+------+-------+------+
|      |Cell   |Count |
+------+-------+------+
|1     |LUT2   |     2|
|2     |SRL16E |     2|
|3     |FDRE   |     7|
+------+-------+------+
Conclusions
Vivado does an efficient job, inferring SRLs wherever practical. When a reset condition is coded in HDL, extra logic is included to "fudge" the observable effect of any reset. There are caveats in both cases about which stages of the shift register are included in the SRLs, and FDREs will still likely be present unless care is taken with the timing conditions at the "edges" of the shift register.