First, a standard Clock Domain Crossing (CDC) solution is presented. Hopefully this does not differ from versions of the same solution provided by others elsewhere! Note that some sources call this a "MUX recirculation technique"; in my code the multiplexer on the destination register is substituted by a chip enable pin, so they are essentially the same. Then the dynamic timing check and its implementation are presented. Finally, OSVVM testing is provided, which must cope with the intimate testing of control signals across the asynchronous clock interface for a sampling of different generics in multiple DUT instantiations. This blog post is significantly Xilinx-specific. The VHDL code will work in other devices, but the constraints and attributes used will need to be amended.
- Standard Solution
- Constraints and Static Timing Analysis
- Dynamic Timing Verification
- Checking for Data Stability
- Testing
- OSVVM Wishlist
- Conclusions
- References
Standard Solution
The following table summarises the range of Clock Domain Crossing (CDC) solutions and their benefits. For this example we'll be using the second row, for a data bus with "data valid" control signal. The idea for this is not at all novel since Xilinx's XPM library uses dynamic timing checks and prints violations to the simulator transcript during simulation. However Xilinx does not provide an "out of the box" XPM solution for this particular type of CDC solution, and it is left to the user to write their own. This process is likely to go wrong even if you are careful with using CDCs because you need to supply:
- The correctly thought out constraints;
- Dynamic runtime checking (the subject of this post) to catch CDC inputs switching faster than metastability settling allows, so that data transitions correctly to the CDC outputs.
Synchroniser Type | Multi-Bit? | Fast clock domain to Slow? | Data every clock cycle on input? |
---|---|---|---|
2+ (ASYNC_REG) Flops | No | No | No, even slow to fast has the dynamic timing requirement for safely sampling. |
Data bus with “data valid” (no handshake) | Yes | No | No, design utilises multiple clock cycles. |
Data bus with “data valid” and acknowledgement handshake | Yes | Yes | No, design utilises multiple clock cycles in two directions. |
FIFO | Yes | Yes | Yes, but in the case of fast-to-slow transfer the aggregate write data rate must be no greater than the aggregate read data rate, or the FIFO will fill. |
library ieee;
use ieee.std_logic_1164.all;
entity bus_data_valid_synch is
generic(
width_g : positive := 4;
-- Synchroniser chain length
len_g : positive := 2;
-- There should be no logic between the final register in the source clock domain
-- and the first register in the destination clock domain in order to maximise
-- the positive slack and hence settling time. This register is advised but
-- optional since data_in and data_valid_in may already come directly from a register.
src_reg_g : boolean := true
);
port(
clk_src : in std_logic;
clk_dest : in std_logic;
reset_dest : in std_logic;
data_in : in std_logic_vector(width_g-1 downto 0);
data_valid_in : in std_logic;
data_out : out std_logic_vector(width_g-1 downto 0) := (others => '0');
data_valid_out : out std_logic := '0'
);
end entity;
architecture rtl of bus_data_valid_synch is
signal dv_in : std_logic := '0';
signal di : std_logic_vector(width_g-1 downto 0) := (others => '0');
-- Retime the data valid only, to give time for data_in to settle and then be sampled.
signal dv : std_logic_vector(0 to len_g-1) := (others => '0');
signal dv_d : std_logic := '0';
attribute ASYNC_REG : boolean;
attribute DIRECT_RESET : boolean;
attribute ASYNC_REG of dv : signal is true;
-- Make sure 'data_valid_out' does not have a LUT in front of it which causes a hold time
-- violation after synthesis.
-- NB. The ASYNC_REG attributes already prevent LUTs in front of the synchronising
-- registers for 'dv'.
attribute DIRECT_RESET of reset_dest : signal is true;
begin
capture : if src_reg_g generate
process(clk_src)
begin
if rising_edge(clk_src) then
dv_in <= data_valid_in;
di <= data_in;
end if;
end process;
else generate
dv_in <= data_valid_in;
di <= data_in;
end generate;
sample : process(clk_dest)
begin
if rising_edge(clk_dest) then
if reset_dest = '1' then
dv <= (others => '0');
dv_d <= '0';
data_valid_out <= '0';
else
dv <= dv_in & dv(0 to dv'high-1);
dv_d <= dv(dv'high);
data_valid_out <= dv(dv'high) and not dv_d;
end if;
if dv(dv'high) = '1' and dv_d = '0' then
data_out <= di;
end if;
end if;
end process;
end architecture;

The simulation illustrates the use of a pulse generator in the destination clock domain to ensure that the "data valid" output signal is never more than one clock cycle long, regardless of how the clock edges fall between the fast (source) and slow (destination) clock domains.
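The pulse generator's behaviour (the `dv(dv'high) and not dv_d` edge detect in the VHDL above) can be modelled in a few lines of Python — a behavioural sketch of the logic only, not the RTL:

```python
def pulse_generator(dv_samples):
    """Model the one-cycle pulse generator: emit 1 only on a
    0 -> 1 transition of the synchronised data-valid signal."""
    out = []
    prev = 0  # models dv_d, the registered copy of data valid
    for dv in dv_samples:
        out.append(1 if (dv == 1 and prev == 0) else 0)
        prev = dv
    return out

# A slow destination clock can sample the fast-domain valid as high for
# several cycles; the pulse generator trims each run to a single cycle.
print(pulse_generator([0, 1, 1, 1, 0, 1, 0]))  # [0, 1, 0, 0, 0, 1, 0]
```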
Constraints and Static Timing Analysis
The VHDL code includes Xilinx's ASYNC_REG attributes for the "data valid" synchronising chain, ensuring the registers in the chain are correctly handled by Vivado, including being packed into the same SLICE/CLB. There are paths in this design that must not be left unconstrained by using set_false_path constraints. The issue here is that false paths turn off timing analysis, leaving open the possibility (as unlikely as it might be) that the connecting wires will circumnavigate the device multiple times before reaching their destination. We wish to avoid an unconstrained delay, so that the routing delay remains smaller than the time taken for the data valid to propagate through its synchronising chain. This means using set_max_delay on both the data bus and the "data valid" hop between clock domains, with absolutely no logic in any path until safely synchronised.

The image above illustrates the two clock domains, the source registers are in green and the destination registers are in pink. The solution works by synchronising the "data valid" control signal to the new clock domain and then using it to re-sample the data originally presented in the source clock domain. The settling time for the data lines is the destination clock period multiplied by the length of the synchronising chain, in this example 3 registers or clock periods. The "data valid" synchronising chain with ASYNC_REG attributes is shown with red diamonds. You will note from the VHDL that the source clock registers are optional. As long as the data and data valid signals leave the source clock domain without any logic on their outputs, the signals do not need to be re-registered on receipt. Vivado has analysis tools for clock domain crossings that are very likely to catch these mistakes, but only if you use them.
set max_delay \
[get_property PERIOD \
[get_clocks -of_objects \
[get_cells {data_out_reg[*]}]]]
set_max_delay -datapath_only \
-from [get_cells {capture.di_reg[*]}] \
-to [get_cells {data_out_reg[*]}] \
$max_delay
set_max_delay -datapath_only \
-from [get_cells {capture.dv_in_reg}] \
-to [get_cells {dv_reg[0]}] \
$max_delay
# The paths between 'di' and 'data_out' still come up as 'warnings' even though they are also marked
# as 'safe' with a set_max_delay exception.
create_waiver -type CDC -id CDC-15 \
-from [get_pins {capture.di_reg[*]/C}] \
-to [get_pins {data_out_reg[*]/D}] \
-user Author \
-description {Controlled clock domain crossing of a data bus} \
-tags {Data Valid CDC}
The aim here is to augment the above VHDL with the code required to report data input 'abuse': values that change too fast for correct sampling. To illustrate, Xilinx's XPM components use SystemVerilog code to issue warnings like the following:
# ** Warning: [XPM_CDC_ARRAY_SINGLE S-1] Input data (src_in[0]) at 599970000000 is not stable long enough to be sampled twice by the destination clock. Data in source domain may not transfer to destination clock domain.
# Time: 599970 ns Started: 599954 ns Scope: <path>.<register>[x] File: C:/Xilinx/Vivado/2020.1/data/ip/xpm/xpm_cdc/hdl/xpm_cdc.sv Line: 1044
Dynamic Timing Verification
The theory is that the input data must settle before it is sampled by the destination clock, or the sampled value could go metastable. In our CDC of interest, the data is sampled by a register with a chip enable, so the sampling time can be controlled. The desire is to print a message to the simulation window if the input data is changed before it can be safely sampled by the destination clock. This is typically done using an assert statement, the least intrusive method of writing to the transcript. The design of the dynamic checking will therefore assume that the time required for the input data to settle is the length of the synchroniser chain specified by the user multiplied by the destination clock period.
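That assumption reduces to simple arithmetic; a quick Python sketch (function names and values are illustrative, not from the design):

```python
def min_settling_time_ns(dest_period_ns, len_g):
    """Minimum time the input data must remain stable: one destination
    clock period per register in the data-valid synchroniser chain."""
    return dest_period_ns * len_g

def change_is_safe(change_interval_ns, dest_period_ns, len_g):
    """Is the gap between input data changes long enough for safe sampling?"""
    return change_interval_ns >= min_settling_time_ns(dest_period_ns, len_g)

# e.g. a 2-register chain clocked at 8 ns needs data stable for 16 ns
print(min_settling_time_ns(8, 2))   # 16
print(change_is_safe(20, 8, 2))     # True
print(change_is_safe(12, 8, 2))     # False
```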
Checking for Data Stability
di_stbl <= di'stable(dest_period_g * len_g);
VHDL already provides what appears to be a suitable means of checking for stability. The S'stable(T) attribute creates "a signal that is true if and only if no event has occurred on signal S for time T" (Ref: Doulos Golden Reference Guide). The value T must be a globally static parameter, like a generic value passed in to the component or a local constant. So the initial attempt used this attribute with the destination clock period passed by a generic. I phased this out in favour of another method because I did not trust the generic value to be updated correctly when the clock period changed, and it was the only reason for passing this generic value in.
I looked for a means to both measure the actual clock period, and then create my own version of the stability signal.
stbl : block
type slv_arr_t is array(integer range <>) of std_logic_vector;
signal di_stbl : boolean := false;
signal di_dly_vec : slv_arr_t(0 to len_g-1)(width_g-1 downto 0) := (others => (others => '0'));
signal clk_period : time := 0 ns;
function is_same(vec : slv_arr_t) return boolean is
variable v : std_logic_vector(width_g-1 downto 0);
begin
v := vec(vec'low);
for i in vec'low+1 to vec'high loop
if v /= vec(i) then
return false;
end if;
end loop;
return true;
end function;
begin
-- We can verify the destination clock period has been correctly set, but we can't determine it and
-- use the value in 'stable().
timer : process(clk_dest)
variable last_ev : time := 0 ns;
variable last_ev_asn : boolean := false;
begin
if rising_edge(clk_dest) then
if last_ev_asn then
clk_period <= now - last_ev;
end if;
last_ev := now;
last_ev_asn := true;
end if;
end process;
-- 'stable attribute requires a globally static parameter, i.e. 'dest_period_g * len_g' not a
-- signal/variable like 'clk_period'.
-- E.g. di_stbl <= di'stable(dest_period_g * len_g);
-- We can recreate this from the measured 'clk_period' instead, NB. must not use a rising clock edge:
di_dly_vec <= di & di_dly_vec(0 to di_dly_vec'high-1) after clk_period;
di_stbl <= (di_dly_vec(di_dly_vec'high) = di) and is_same(di_dly_vec);
end block;
The di_dly_vec shift register became necessary for two reasons. Firstly because the following could under some circumstances fail:
di_dly <= di after (dest_period_g * len_g);
The problem occurs when di changes within the time period dest_period_g * len_g and hence before di_dly has been updated. di_dly then gets assigned the incorrect value in quite a random way. This is inertial delay mode, the default behaviour of signal assignments in VHDL, where pulses shorter than the specified delay are ignored. As it's the default, you've probably never seen the keyword in use and never needed to know about the other options either. Using the transport delay mode gives me what I had intended.
di_dly <= transport di after (dest_period_g * len_g);
Only now I have a second problem: the value of di, say "1101", changes to "1001" and then back again to "1101". The transport delay mode indicates the correct value, di_dly = di, even though di changed immediately afterwards and just happened to change back to the same value in time for the test. We need a stronger test for stability without using 'stable. Hence the use of a delay vector, di_dly_vec, that allows checking for the presence of different values in the delay vector using a local function is_same().
This is not done in a clocked process because the shift must track changes in di, which is in the source clock domain, rather than occurring only on destination clock edges. Now that we have a correctly delayed version of the original data in, we can compare it with the original to test that it has not changed. This is a bit more involved than using the 'stable attribute, but avoids having to specify the clock period via a generic. If you prefer the latter, make sure you pass in a generic for the destination clock period and keep it up to date with any changes in your code.
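The role of is_same() can be modelled in Python — a behavioural sketch (not the RTL) where `history` holds samples of di taken one destination clock period apart, oldest last. It shows why comparing only the oldest sample against the current value misses a flip-and-flip-back:

```python
def is_same(history):
    """Mirror of the VHDL is_same(): every delayed sample holds the same value."""
    return all(h == history[0] for h in history)

def stable_endpoints_only(history, current):
    """Weak test: compare only the oldest delayed sample with the current
    value -- fooled by a value that flips and flips back within the window."""
    return history[-1] == current

def stable(history, current):
    """Stronger test: the whole delay vector is uniform AND matches the input."""
    return history[-1] == current and is_same(history)

# 'di' flipped from "1101" to "1001" and back inside the settling window:
history = ["1101", "1001", "1101"]
print(stable_endpoints_only(history, "1101"))  # True  (instability missed)
print(stable(history, "1101"))                 # False (instability caught)
```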
Having got a measure of stability, we then need to test that the data is stable when the receiving registers are chip enabled. If not we print a warning message. For the purposes of verification in simulation, we also need to create a boolean signal that can be spied on by an external signal in the test bench.
verify : process(clk_dest)
begin
if rising_edge(clk_dest) then
if dv(dv'high) = '1' then
if not di_stbl then
report "Metastability Risk: 'di' was not stable for " & integer'image(len_g) & " destination clock periods when sampled." severity warning;
stbl_at_clk <= false;
else
stbl_at_clk <= true;
end if;
end if;
end if;
end process;
It is possible to use an external signal to pull out this stbl_at_clk signal into a test bench and test it is always in the required state, or cause the test bench to fail in some way. A simple way to achieve this would be to use an assert statement like the following code, or alertIfNot() in OSVVM.
assert <<signal path.to.stbl_at_clk : boolean >>
report "CDC metastability occurred"
severity error;
One final issue uncovered during testing was the effect of sending in new data on consecutive clock cycles. The CDC design uses a pulse generator, which means that if the data valid input stays high to cover multiple data values, then not only has the brief of the CDC design been exceeded, but several consecutive data valid clock cycles get reduced to one on the output. This means we need to detect that the design brief has been exceeded and both print a message to the transcript and provide a signal for the test bench to verify the situation has been correctly detected.

dv_test : process(clk_src)
begin
if rising_edge(clk_src) then
dv_in_d <= dv_in;
if dv_err then
report "Data supplied faster than the CDC solution is designed for." severity warning;
end if;
end if;
end process;
dv_err <= (dv_in = '1' and dv_in_d = '1');
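The dv_in/dv_in_d check above amounts to flagging data valid high on two consecutive source clock cycles; a Python model of the same detection (illustrative only, not the RTL):

```python
def detect_abuse(dv_samples):
    """Flag source clock cycles where data valid was also high on the
    previous cycle, i.e. new data arrived faster than the CDC allows."""
    flags = []
    prev = 0  # models dv_in_d, the registered copy of the incoming valid
    for dv in dv_samples:
        flags.append(dv == 1 and prev == 1)
        prev = dv
    return flags

# Back-to-back valids on cycles 3..5 raise the abuse flag on cycles 4 and 5.
print(detect_abuse([1, 0, 1, 1, 1, 0]))
# [False, False, False, True, True, False]
```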
The assembled dynamic checking code now looks like this in totality and can be appended to the RTL code within the architecture. Note this is wrapped between synthesis translate_off/on pragmas to prevent synthesis from trying to interpret the code.
-- synthesis translate_off
stbl : block
type slv_arr_t is array(integer range <>) of std_logic_vector;
signal dv_in_d : std_logic := '0';
signal dv_err : boolean := false;
signal di_stbl : boolean := false;
signal di_dly_vec : slv_arr_t(0 to len_g-1)(width_g-1 downto 0) := (others => (others => '0'));
-- Signal used by test bench external signal to verify correct operation
signal stbl_at_clk : boolean := false;
signal clk_period : time := 0 ns;
-- Is each element of 'vec' the same value? i.e. has the value remained stable over all
-- clock periods?
--
-- If the value flips and flips back there has been instability that is in danger of being
-- overlooked by the test "di_dly_vec(di_dly_vec'high) = di" alone.
--
function is_same(vec : slv_arr_t) return boolean is
variable v : std_logic_vector(width_g-1 downto 0);
begin
v := vec(vec'low);
for i in vec'low+1 to vec'high loop
if v /= vec(i) then
return false;
end if;
end loop;
return true;
end function;
begin
dv_test : process(clk_src)
begin
if rising_edge(clk_src) then
dv_in_d <= dv_in;
if dv_err then
report "Data supplied faster than the CDC solution is designed for." severity warning;
end if;
end if;
end process;
dv_err <= (dv_in = '1' and dv_in_d = '1');
-- We can verify the destination clock period has been correctly set, but we can't determine it and
-- use the value in 'stable().
timer : process(clk_dest)
variable last_ev : time := 0 ns;
variable last_ev_asn : boolean := false;
begin
if rising_edge(clk_dest) then
if last_ev_asn then
clk_period <= now - last_ev;
end if;
last_ev := now;
last_ev_asn := true;
end if;
end process;
-- 'stable attribute requires a globally static parameter, i.e. 'dest_period_g * len_g' not a
-- signal/variable like 'clk_period'.
-- E.g. di_stbl <= di'stable(dest_period_g * len_g);
-- We can recreate this from the measured 'clk_period' instead, NB. must not use a rising clock edge:
di_dly_vec <= di & di_dly_vec(0 to di_dly_vec'high-1) after clk_period;
di_stbl <= (di_dly_vec(di_dly_vec'high) = di) and is_same(di_dly_vec);
verify : process(clk_dest)
begin
if rising_edge(clk_dest) then
if dv(dv'high) = '1' then
if not di_stbl then
report "Metastability Risk: 'di' was not stable for " & integer'image(len_g) &
" destination clock periods when sampled." severity warning;
stbl_at_clk <= false;
else
stbl_at_clk <= true;
end if;
end if;
end if;
end process;
end block;
-- synthesis translate_on
The entire RTL code with dynamic checking is too large for a practical presentation here, but the latest version can be viewed on GitHub.
Testing
Testing this component was particularly awkward because:
- Verification works across an asynchronous clock boundary, such that clock edges slide relative to each other and non-synchronous sampling of control signals causes timing variations in data checking;
- Not all valid data inputs produce valid data outputs;
- The VHDL test bench cannot verify that a message has been printed to the transcript, hence the proxy signals in the RTL code;
- The use of generics meant that a number of instantiations needed to be tested in order to provide a decent sampling of the "generic space".
OSVVM was particularly helpful, notably in two ways:
- The use of ScoreBoards to provide test bench FIFOs to cope with the different delays across different instantiations. Note that the checking facility in these could not be used out of the box, as the values pushed needed tests applied to them on receipt, as detailed below.
- The constrained random generation of test data, once basic test issues had been resolved (i.e. not immediately), produced a truly useful and demanding set of sequences. These sequences uncovered problems that would easily have been overlooked.
The testbench must cope with the following situations:
- The writer process must anticipate that samples sent deliberately too quickly will get coalesced, and only push one value into the scoreboard.
- When data is sent on the cusp of being too soon, the data output from the synchroniser could be one of two values. This typically happens when the delay between input data samples is the same as the delay through the synchroniser, i.e. the DUT generic len_g destination clock cycles.
- The DUT must correctly predict when data is not stable for checking.
- The DUT must correctly predict when data is applied too soon such that its design remit is exceeded.
In order to create an OSVVM scoreboard for use in this test bench I found I had to create a separate (generic) package sent_pkg for my data type that varied in data size with each test instance, and create the customised scoreboard (itself a generic package) within sent_pkg. If I tried to place this VHDL code in the test bench file to avoid creating an extra file to compile, the compiler seemed unable to find osvvm.ScoreboardGenericPkg. This remains unresolved, and as a result there is an extra level of package indirection through yet another generic package.
As you might expect the DUT is exercised by a writer process and verified by a checker process. The writer process must anticipate that samples sent deliberately too quickly, i.e. on adjacent source clock cycles, will get coalesced and the writer must only push one value into the scoreboard. In this situation, the DUT must detect that the synchroniser is being abused and report a message, backed up by a signal that is inspected. The verifying process then tests how many destination clock cycles the input change delay represents and decides whether there is a clear cut check situation or some ambiguity to resolve. The table below describes the checking decision process. Remember the design of this CDC requires that the destination clock period is less than the source clock period, and the input delay will be a multiple of one or more source clock periods.
\(floor \left( \frac{\text{Input data delay}} {\text{destination clock period}} \right)\) | Stability Check | Input-Output Data Check |
---|---|---|
0 or 1 (i.e. the input delay = 1 source clock period) | Source data abuse must be flagged | Sequence of adjacent values coalesced to one output data value that is not checked. |
< synchroniser chain length | Instability must have been detected | Not checked |
= synchroniser chain length, DUT indicates unstable data | Queried | Not checked |
= synchroniser chain length, DUT indicates stable data | Queried | Must match |
> synchroniser chain length | Must be stable | Must match |
When the time delay between input data changing, divided by the destination clock period, equals the synchronisation chain length, it becomes impossible to predict the output data value with certainty. An example simulation waveform is given below. The input data does not reach the output before the input data is changed. In simulation the second value is output instead of the first; in reality we have to assume a metastable transfer. For verification, as it is tricky to predict how the clock edges will fall without much complication of the code, we now rely on the DUT's stability check and verify the data only if the DUT says it should be correct. In this particular situation we are unable to verify the correct functioning of the stability check in the DUT, and this situation has the DUT informing verification!
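The decision table above can be sketched as a small Python function — purely illustrative of the checker's logic; the names and return strings are my own, not from the test bench:

```python
import math

def check_plan(input_delay_ns, dest_period_ns, len_g):
    """Return (stability expectation, data check) for a given delay between
    input data changes, following the decision table above."""
    ratio = math.floor(input_delay_ns / dest_period_ns)
    if ratio <= 1:
        return ("abuse flagged", "not checked (values coalesced)")
    if ratio < len_g:
        return ("instability detected", "not checked")
    if ratio == len_g:
        # Ambiguous case: rely on the DUT's own stability indication.
        return ("queried", "checked only if DUT says stable")
    return ("stable", "must match")

# Delay of 5 destination periods through a 3-register chain: clear-cut check.
print(check_plan(40, 8, 3))  # ('stable', 'must match')
```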

Within the test bench there are also some metrics collected for different delays to help understand to what level verification could be absolutely determined. These can be turned on by editing the VHDL code constant print_stats_c. They show the distribution of correctly and incorrectly received data values and the state of the DUT's internal stability test.
OSVVM Wishlist
On my wish list for OSVVM is the ability to CreateClock()s not just with different periods, but starting at a specified phase difference. This would allow a 12 ns period clock to never have a coincident clock edge with a 6 ns period clock. I think that could assist with testing CDC solutions. I would also like to verify that every item in all the coverage bins for one DUT had a corresponding affirmation. Note that the test bench instantiates multiple DUTs with different generic values to sample the "generic space". ReportAlerts can be localised to a chosen AlertLogIDType and internally provides a localised affirmation count for a single AlertLogIDType (1 per DUT). GetAffirmCount cannot: it can only return the global affirmation count across all DUT instances, and no provision is made to supply an AlertLogIDType parameter. This seems like an oversight, and not too difficult to implement. As it stands, I cannot verify that the number of affirmations per DUT equals the total number of items in all the coverage bins for that DUT and make this part of the DUT-local pass/fail criteria. I can only do this at the level of all DUTs in the test bench, which does not immediately tell you which DUT missed an affirmation, and requires some information to be passed out of each individual generated test.
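The phase-offset wish can be illustrated numerically — a Python sketch (times in ns, values illustrative) showing that a 3 ns offset stops a 12 ns clock from ever sharing a rising edge with a 6 ns clock:

```python
def rising_edges(period, phase, horizon):
    """Rising-edge times of a clock with the given period and phase offset."""
    return {phase + n * period for n in range(horizon // period + 1)}

fast = rising_edges(6, 0, 600)          # 6 ns clock, no offset
slow_aligned = rising_edges(12, 0, 600)  # 12 ns clock, in phase with fast
slow_offset = rising_edges(12, 3, 600)   # 12 ns clock, 3 ns phase offset

# In phase, every slow edge lands on a fast edge; offset by 3 ns, none do.
print(len(fast & slow_aligned) > 0)  # True
print(len(fast & slow_offset) == 0)  # True
```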
Conclusions
The blog describes how the dynamic checks can be derived in this style of CDC. This has been achieved without hard coding the destination clock period in some form, e.g. a generic, which can then fail to be maintained. The CDC solution is shown to work reliably when the input data rate is limited to one clock cycle in every len_g+1, where len_g is the generic parameter used to specify the length of the synchroniser chain in the CDC solution.