The Xilinx IP Catalogue provides an IP core to convert data widths on an AXI stream, but the question is "how hard can it be?" As with all AXI signalling applications, the answer is unfortunately usually "harder than you think", because the tight feedback loops formed by the valid and ready handshake signals always find a way to catch you out. At its simplest, converting 16-bit data to 8-bit should just apply enough back pressure on the input to make room on the output for the two halves of each word to go out one after the other. So what is the simple solution when we also want a "pause" function, driven for example by a finite state machine, so that the data width conversion can be used to insert words on the output, for example when converting packet protocols? Thankfully there's a well understood solution for pausing an AXI data stream to start from.

Firstly we'll look at the simple case where we just split the input word into two every time. Then we'll get clever with conditionally splitting the word based on 'byte enables' to select which halves of the input word proceed to the output.
Simple Data Bus Halving Without Selection
In the clocked process below, the first if clause tracks which half of the 16-bit input word is sent next. The swap is gated by both the enable and the output ready signals, and it must not move on to the second half unless the input data is valid.
The second if clause chooses when to drive the output AXI stream. Again the output must be ready, but care must also be taken with the valid signal so that the AXI rule that valid must not wait for ready (which avoids deadlock) is obeyed.
library ieee;
use ieee.std_logic_1164.all;

entity axi_width_conv_pause is
  port(
    clk         : in  std_logic;
    s_axi_data  : in  std_logic_vector(15 downto 0);
    s_axi_valid : in  std_logic;
    s_axi_ready : out std_logic := '0';
    enable      : in  std_logic;
    m_axi_data  : out std_logic_vector(7 downto 0) := (others => '0');
    m_axi_valid : out std_logic := '0';
    m_axi_ready : in  std_logic
  );
end entity;

architecture rtl of axi_width_conv_pause is

  -- We're processing the first half of the input word and hence about to process the second half
  signal s_half : std_logic := '0';

begin

  s_axi_ready <= m_axi_ready and s_half and enable;

  process(clk)
  begin
    if rising_edge(clk) then

      if m_axi_ready = '1' and enable = '1' then
        if s_axi_valid = '1' and s_half = '0' then
          s_half <= '1';
        elsif s_half = '1' then
          s_half <= '0';
        end if;
      end if;

      if m_axi_ready = '1' and s_half = '0' then
        m_axi_data  <= s_axi_data(7 downto 0);
        m_axi_valid <= s_axi_valid and enable;
      elsif (m_axi_ready = '1' or m_axi_valid = '0') and s_half = '1' then
        m_axi_data  <= s_axi_data(15 downto 8);
        m_axi_valid <= enable;
      end if;

    end if;
  end process;

end architecture;
It is pleasing and reassuring to see the similarities with the code for pausing an AXI data stream creeping in.

The test bench uses randomness to vary the AXI signalling, and the enable is strobed high and low for multiple clock cycles at a time.
entity test_axi_width_conv_pause is
end entity;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
library local;
use local.testbench_pkg.all;

architecture test of test_axi_width_conv_pause is

  constant max_loops_c : positive := 512;

  signal clk         : std_logic;
  signal s_axi_data  : std_logic_vector(15 downto 0) := (others => '0');
  signal s_axi_valid : std_logic := '0';
  signal s_axi_ready : std_logic := '0';
  signal enable      : std_logic := '1';
  signal m_axi_data  : std_logic_vector( 7 downto 0) := (others => '0');
  signal m_axi_valid : std_logic := '0';
  signal m_axi_ready : std_logic := '0';

begin

  clkgen : clock(clk, 10 ns);

  axi_delay_i : entity work.axi_width_conv_pause
    port map (
      clk         => clk,
      s_axi_data  => s_axi_data,
      s_axi_valid => s_axi_valid,
      s_axi_ready => s_axi_ready,
      enable      => enable,
      m_axi_data  => m_axi_data,
      m_axi_valid => m_axi_valid,
      m_axi_ready => m_axi_ready
    );

  pause : process
    constant interval_c : time := 200 ns;
  begin
    enable <= '1';
    while true loop
      wait for interval_c;
      wait_nr_ticks(clk, 1);
      enable <= not enable;
    end loop;
    wait;
  end process;

  source : process
    variable i : natural := 1;
  begin
    s_axi_data  <= (others => '0');
    s_axi_valid <= '0';
    wait_nr_ticks(clk, 1);

    while i <= max_loops_c loop
      s_axi_valid <= '0';
      wait_rndr_ticks(clk, 0.25);
      s_axi_valid <= '1';
      s_axi_data  <= std_logic_vector(to_unsigned((i+1) mod 256, m_axi_data'length) & to_unsigned((i mod 256), m_axi_data'length));
      wait_nf_ticks(clk, 1);
      wait_until(s_axi_ready, '1');
      wait_nr_ticks(clk, 1);
      i := i + 2;
    end loop;

    s_axi_valid <= '0';
    wait_nr_ticks(clk, 1);
    wait;
  end process;

  sink : process
    variable i            : natural := 1;
    variable tests_passed : boolean := true;
  begin
    m_axi_ready <= '0';
    wait_nr_ticks(clk, 10);

    while i <= max_loops_c loop
      m_axi_ready <= '0';
      wait_rndr_ticks(clk, 0.1);
      m_axi_ready <= '1';
      wait_nf_ticks(clk, 1);
      wait_until(m_axi_valid, '1');
      if to_integer(unsigned(m_axi_data)) /= (i mod 256) then
        report "Incorrect data read, expected: " & integer'image(i mod 256) & " got: " & integer'image(to_integer(unsigned(m_axi_data)));
        tests_passed := false;
      end if;
      wait_nr_ticks(clk, 1);
      i := i + 1;
    end loop;

    m_axi_ready <= '0';
    wait_nr_ticks(clk, 1);

    if tests_passed then
      report "All tests PASSED";
    else
      report "At least one test FAILED";
    end if;

    stop_clocks;
    wait;
  end process;

end architecture;
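The test bench relies on helper procedures from the author's local.testbench_pkg, which is not listed here. As a rough guide to reading the processes above, the sketch below shows how three of those helpers might behave. The package name testbench_sketch_pkg and all three bodies are assumptions inferred from the call sites, not the author's implementation. The clock, stop_clocks and wait_rndr_ticks helpers are left out because their exact behaviour cannot be inferred from the calls alone.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-in for part of local.testbench_pkg (assumed semantics).
package testbench_sketch_pkg is

  -- Wait for 'n' rising edges of 'clk'.
  procedure wait_nr_ticks(signal clk : in std_logic; constant n : in natural);

  -- Wait for 'n' falling edges of 'clk'.
  procedure wait_nf_ticks(signal clk : in std_logic; constant n : in natural);

  -- Return at once if 's' already equals 'v', otherwise block until it does.
  procedure wait_until(signal s : in std_logic; constant v : in std_logic);

end package;

package body testbench_sketch_pkg is

  procedure wait_nr_ticks(signal clk : in std_logic; constant n : in natural) is
  begin
    for i in 1 to n loop
      wait until rising_edge(clk);
    end loop;
  end procedure;

  procedure wait_nf_ticks(signal clk : in std_logic; constant n : in natural) is
  begin
    for i in 1 to n loop
      wait until falling_edge(clk);
    end loop;
  end procedure;

  procedure wait_until(signal s : in std_logic; constant v : in std_logic) is
  begin
    if s /= v then
      wait until s = v;
    end if;
  end procedure;

end package body;
```

Under these assumptions the source and sink sample the handshake signals just after a falling edge, halfway through the clock period, and then step to the next rising edge where the transfer is registered.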
Data Bus Halving With Selection
The aim is to select which of the two bytes in the input word proceed to the output: neither, one or both. The table below describes the required case statement decoding logic.
| s_axi_byte_en(1:0) | Actions |
|---|---|
| 00 | Emit an invalid data cycle. |
| 01 | Emit low byte and move on to next input word. |
| 10 | Emit high byte and move on to next input word. |
| 11 | Emit low, then high byte and move on to next input word. |
When both bytes are enabled, each of them must wait for the output to be ready, and while the first byte of the pair is being sent the input ready must be held low so that the word stays on the input. Otherwise we are simply outputting one byte of the pair, or none, which is easy.
library ieee;
use ieee.std_logic_1164.all;

entity axi_width_conv_pause_filter is
  port(
    clk           : in  std_logic;
    s_axi_data    : in  std_logic_vector(15 downto 0);
    s_axi_byte_en : in  std_logic_vector(1 downto 0);
    s_axi_valid   : in  std_logic;
    s_axi_ready   : out std_logic := '0';
    enable        : in  std_logic;
    m_axi_data    : out std_logic_vector(7 downto 0) := (others => '0');
    m_axi_valid   : out std_logic := '0';
    m_axi_ready   : in  std_logic
  );
end entity;

architecture rtl of axi_width_conv_pause_filter is

  -- We're processing the first half of the input word and hence about to process the second half
  signal s_half : std_logic := '0';

begin

  s_axi_ready <= m_axi_ready and s_half when s_axi_byte_en = "11" else
                 m_axi_ready and enable;

  process(clk)
  begin
    if rising_edge(clk) then
      if m_axi_ready = '1' then
        -- 's_axi_byte_en(1:0)'
        --
        -- | 1:0 | Actions
        -- +-----+--------------------------------------------------------
        -- | 0 0 | Emit an invalid data cycle
        -- | 0 1 | Emit low byte and move on to next input word
        -- | 1 0 | Emit high byte and move on to next input word
        -- | 1 1 | Emit low, then high byte and move on to next input word
        case to_bitvector(s_axi_byte_en) is

          when "00" =>
            m_axi_valid <= '0';
            if m_axi_valid = '1' then
              s_half <= '0';
            end if;

          when "01" =>
            m_axi_data  <= s_axi_data(7 downto 0);
            m_axi_valid <= s_axi_valid and enable;
            if m_axi_valid = '1' then
              s_half <= '0';
            end if;

          when "10" =>
            m_axi_data  <= s_axi_data(15 downto 8);
            m_axi_valid <= (s_axi_valid or s_half) and enable;
            if m_axi_valid = '1' then
              s_half <= '0';
            end if;

          when "11" =>
            if s_half = '1' then
              -- This is where we are about to process the second half of a word where both bytes are valid
              m_axi_data  <= s_axi_data(15 downto 8);
              m_axi_valid <= '1';
              if m_axi_valid = '1' then
                -- 'enable' is omitted as we're pausing the input only, and finishing the output
                s_half <= '0';
              end if;
            else
              m_axi_data  <= s_axi_data(7 downto 0);
              m_axi_valid <= s_axi_valid and enable;
              if s_axi_valid = '1' and enable = '1' then
                s_half <= '1';
              end if;
            end if;

        end case;
      end if;
    end if;
  end process;

end architecture;
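The decode table can also be read as "one output byte per set byte enable bit". As a small illustration, the sketch below expresses that as a pure function, which might be handy in a test bench for predicting how many output bytes an input word should produce. The package name byte_en_sketch_pkg and the function name beats_for are hypothetical and not part of the design above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package, not part of the original design.
package byte_en_sketch_pkg is

  -- Number of valid output bytes one input word produces, per the decode table:
  -- "00" gives 0, "01" and "10" give 1, "11" gives 2.
  function beats_for(byte_en : std_logic_vector(1 downto 0)) return natural;

end package;

package body byte_en_sketch_pkg is

  function beats_for(byte_en : std_logic_vector(1 downto 0)) return natural is
    variable n : natural := 0;
  begin
    -- Count the byte enable bits that are set.
    for i in byte_en'range loop
      if byte_en(i) = '1' then
        n := n + 1;
      end if;
    end loop;
    return n;
  end function;

end package body;
```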

entity test_axi_width_conv_pause_filter is
end entity;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
library local;
use local.testbench_pkg.all;

architecture test of test_axi_width_conv_pause_filter is

  constant max_loops_c : positive := 2048;

  signal clk           : std_logic;
  signal s_axi_data    : std_logic_vector(15 downto 0) := (others => '0');
  signal s_axi_byte_en : std_logic_vector( 1 downto 0) := "00";
  signal s_axi_valid   : std_logic := '0';
  signal s_axi_ready   : std_logic := '0';
  signal enable        : std_logic := '1';
  signal m_axi_data    : std_logic_vector( 7 downto 0) := (others => '0');
  signal m_axi_valid   : std_logic := '0';
  signal m_axi_ready   : std_logic := '0';

begin

  clkgen : clock(clk, 10 ns);

  axi_delay_i : entity work.axi_width_conv_pause_filter
    port map (
      clk           => clk,
      s_axi_data    => s_axi_data,
      s_axi_byte_en => s_axi_byte_en,
      s_axi_valid   => s_axi_valid,
      s_axi_ready   => s_axi_ready,
      enable        => enable,
      m_axi_data    => m_axi_data,
      m_axi_valid   => m_axi_valid,
      m_axi_ready   => m_axi_ready
    );

  pause : process
    constant interval_c : time := 200 ns;
  begin
    enable <= '1';
    while true loop
      wait for interval_c;
      wait_nr_ticks(clk, 1);
      enable <= not enable;
    end loop;
    wait;
  end process;

  source : process
    variable i  : natural := 1;
    variable be : std_logic_vector(1 downto 0);
  begin
    s_axi_data  <= (others => '0');
    s_axi_valid <= '0';
    wait_nr_ticks(clk, 1);

    while i <= max_loops_c loop
      s_axi_valid <= '0';
      wait_rndr_ticks(clk, 0.25);
      s_axi_valid <= '1';
      be := random_vector(s_axi_byte_en'length);
      case to_bitvector(be) is
        when "00" =>
          s_axi_data <= x"0000";
        when "01" =>
          s_axi_data <= x"00" & std_logic_vector(to_unsigned(i mod 256, m_axi_data'length));
          i := i + 1;
        when "10" =>
          s_axi_data <= std_logic_vector(to_unsigned(i mod 256, m_axi_data'length)) & x"00";
          i := i + 1;
        when "11" =>
          s_axi_data <= std_logic_vector(to_unsigned((i+1) mod 256, m_axi_data'length) & to_unsigned(i mod 256, m_axi_data'length));
          i := i + 2;
      end case;
      s_axi_byte_en <= be;
      wait_nf_ticks(clk, 1);
      loop
        if s_axi_ready = '1' then
          exit;
        end if;
        wait_nf_ticks(clk, 1);
      end loop;
      wait_nr_ticks(clk, 1);
    end loop;

    s_axi_valid <= '0';
    wait_nr_ticks(clk, 1);
    wait;
  end process;

  sink : process
    variable i            : natural := 1;
    variable tests_passed : boolean := true;
    variable od           : natural;
  begin
    m_axi_ready <= '0';
    wait_nr_ticks(clk, 10);

    while i <= max_loops_c loop
      m_axi_ready <= '0';
      wait_rndr_ticks(clk, 0.1);
      m_axi_ready <= '1';
      wait_nf_ticks(clk, 1);
      wait_until(m_axi_valid, '1');
      od := to_integer(unsigned(m_axi_data));
      if od /= (i mod 256) then
        report "Incorrect data read, expected: 0x" & to_hstring(to_unsigned(i mod 256, 8)) &
               " (" & to_string(i mod 256) & ") got: 0x" &
               to_hstring(unsigned(m_axi_data)) &
               " (" & to_string(od) & ")";
        tests_passed := false;
      end if;
      wait_nr_ticks(clk, 1);
      i := i + 1;
    end loop;

    m_axi_ready <= '0';
    wait_nr_ticks(clk, 1);

    if tests_passed then
      report "All tests PASSED";
    else
      report "At least one test FAILED";
    end if;

    stop_clocks;
    wait;
  end process;

end architecture;
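Both test benches check the order of the output data but not the handshake itself. One AXI4-Stream rule that is easy to break when gating valid is that once valid has been asserted it must stay asserted, with unchanged data, until ready accepts the beat. A simulation-only monitor for that rule might look like the sketch below. The entity name axi_stream_hold_monitor and its port names are hypothetical, and it is offered as a starting point rather than a complete protocol checker.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only monitor, not part of the original test benches.
entity axi_stream_hold_monitor is
  port(
    clk   : in std_logic;
    data  : in std_logic_vector(7 downto 0);
    valid : in std_logic;
    ready : in std_logic
  );
end entity;

architecture sim of axi_stream_hold_monitor is
  -- True when the previous clock edge saw valid high without ready
  signal stalled   : boolean := false;
  -- Data seen at the previous clock edge
  signal data_prev : std_logic_vector(7 downto 0) := (others => '0');
begin

  process(clk)
  begin
    if rising_edge(clk) then
      if stalled then
        -- A beat was offered but not accepted, so it must still be on offer
        assert valid = '1'
          report "valid dropped before the beat was accepted" severity error;
        assert data = data_prev
          report "data changed while waiting for ready" severity error;
      end if;
      stalled   <= (valid = '1') and (ready = '0');
      data_prev <= data;
    end if;
  end process;

end architecture;

-- Possible use inside either test bench architecture:
--   monitor_i : entity work.axi_stream_hold_monitor
--     port map (clk => clk, data => m_axi_data, valid => m_axi_valid, ready => m_axi_ready);
```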
Conclusions
Typically you would use the IP core generated by Vivado and then manipulate the ready signal to apply back pressure to the data stream. The aim here is to make it easy for a finite state machine to pause the data stream without having to manage the ready and valid signals itself and ensure that they still meet the AXI standard after being manipulated.