This blog aims to provide a worked example of deriving device physics values for Xilinx UltraScale devices, so that the effect of multiple synchronisation stages can be understood and the synchroniser chain length selected. To achieve this, we need to understand how to calculate the Mean Time Between Failures (MTBF) of a Clock Domain Crossing's (CDC) synchroniser chain, then how to translate the overall MTBF design goal into the MTBF requirement of a single synchroniser, and hence determine the length of each synchroniser required.
- Essential Background
- Reverse Engineering Device Physics
- Change Frequency
- Confirming the MTBF Calculation
- Metastability Window's Relationship to Setup and Hold Times
- Design MTBF
- Choosing the Number of Stages in a Basic CDC Synchroniser
- Conclusions
- References
Essential Background
If any of this section is unfamiliar, read an alternative text first; here I simply state our starting point in order to support the subsequent sections. There are several texts on the subject, and Performance Analysis of Synchronization Circuits provides a good explanation.
The following equation describes the interface between clock domains:

\[ MTBF = \frac{e^{\frac{S}{\tau}}}{W F_c F_d} \]

The destination clock domain's frequency is given by \(F_c\), but we also need to know something about the source clock domain: the "change frequency", \(F_d\). This rate of change depends on the data presented, so if we assume the data (well, a control signal actually) changes once in every n clock cycles, and we know the launch clock domain's frequency, we can calculate \(F_d\). More on this separately later, as Xilinx's Vivado has a TCL command to specify it.
The term \(W F_c\) is a probability, more easily recognised as \(\frac W {P_c}\), where \(P_c\) is the sampling clock period; hence this term is the probability of data arriving during the metastability window in a given destination clock cycle. The settling time, \(S\), can then be spread over multiple clock cycles, and the rate of decay is governed by \(\tau\).
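As a sanity check, the equation above can be evaluated numerically. This is a minimal Python sketch (my illustration, not part of the original toolflow) using values of the same order as those derived later in this post:

```python
import math

# Illustrative values only, of the same order as those derived later in this post
tau = 41.37e-12  # settling time constant, tau, in seconds
W   = 2.16e-12   # metastability window, W, in seconds
Fc  = 250e6      # destination (sampling) clock frequency, Hz
Fd  = 12.5e6     # change frequency of the source data, transitions/s
S   = 7.44e-9    # settling time, S, in seconds

# MTBF = e^(S/tau) / (W * Fc * Fd)
mtbf = math.exp(S / tau) / (W * Fc * Fd)
print(f"ln(MTBF) = {math.log(mtbf):.2f}")
```

Note the astronomical scale of the result: \(\ln(MTBF)\) of around 171 corresponds to an MTBF of roughly \(10^{74}\) seconds.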
Reverse Engineering Device Physics
Xilinx does not provide values for the physical quantities of any of its devices. This might be because they get updated with new releases of the device libraries in Vivado. The last publication was "XAPP 094 Metastable Recovery", dated 24 Nov 1997, which is no longer available via their DocNav. We can, however, reverse engineer the values by using Vivado's TCL command report_synchronizer_mtbf and analysing the results it produces. For this we need a design synthesised by Vivado from which to make measurements. We can use a generic n-stage synchroniser, assign n using TCL, and then create a report of the MTBF calculated by Vivado.
library ieee;
use ieee.std_logic_1164.all;
entity retime is
  generic (
    num_bits_g  : positive := 2;
    reg_depth_g : positive := 2 -- The range deliberately does not start at 2, so the MTBF of a 1-long resync chain can be measured.
  );
  port (
    clk_src    : in  std_logic;
    reset_src  : in  std_logic;
    clk_dest   : in  std_logic;
    reset_dest : in  std_logic;
    flags_in   : in  std_logic_vector(num_bits_g-1 downto 0);
    flags_out  : out std_logic_vector(num_bits_g-1 downto 0)
  );
end entity;
architecture rtl of retime is

  signal reg_capture : std_logic_vector(num_bits_g-1 downto 0);

  type reg_array_t is array(natural range <>) of std_logic_vector(num_bits_g-1 downto 0);
  signal reg_retime : reg_array_t(reg_depth_g-1 downto 0);

  -- Could be placed in a constraints file
  attribute ASYNC_REG : string;
  attribute ASYNC_REG of reg_retime : signal is "TRUE";

begin

  -- Remove glitches from any unregistered combinatorial logic on the source data.
  -- Glitches must not be captured by accident in the new clock domain.
  process(clk_src)
  begin
    if rising_edge(clk_src) then
      if reset_src = '1' then
        reg_capture <= (others => '0');
      else
        reg_capture <= flags_in;
      end if;
    end if;
  end process;

  process(clk_dest)
  begin
    if rising_edge(clk_dest) then
      if reset_dest = '1' then
        reg_retime <= (others => (others => '0'));
      else
        reg_retime <= reg_retime(reg_depth_g-2 downto 0) & reg_capture;
      end if;
    end if;
  end process;

  flags_out <= reg_retime(reg_depth_g-1);

end architecture;
Using out-of-context synthesis seemed to affect the settling times used in the results, so an outer wrapper of registers needs to be provided to isolate the synchroniser from any direct and manual specification of boundary timing constraints.
library ieee;
use ieee.std_logic_1164.all;
entity retime_wrapper is
  generic (
    num_bits_g  : positive := 2;
    reg_depth_g : positive := 2 -- The range deliberately does not start at 2, so the MTBF of a 1-long resync chain can be measured.
  );
  port (
    clk_src    : in  std_logic;
    reset_src  : in  std_logic;
    clk_dest   : in  std_logic;
    reset_dest : in  std_logic;
    flags_in   : in  std_logic_vector(num_bits_g-1 downto 0);
    flags_out  : out std_logic_vector(num_bits_g-1 downto 0)
  );
end entity;
architecture rtl of retime_wrapper is

  signal flags_in_i  : std_logic_vector(num_bits_g-1 downto 0);
  signal flags_out_i : std_logic_vector(num_bits_g-1 downto 0);

begin

  process(clk_src)
  begin
    if rising_edge(clk_src) then
      if reset_src = '1' then
        flags_in_i <= (others => '0');
      else
        flags_in_i <= flags_in;
      end if;
    end if;
  end process;

  retime_i : entity work.retime
    generic map (
      num_bits_g  => num_bits_g,
      reg_depth_g => reg_depth_g
    )
    port map (
      clk_src    => clk_src,
      reset_src  => reset_src,
      clk_dest   => clk_dest,
      reset_dest => reset_dest,
      flags_in   => flags_in_i,
      flags_out  => flags_out_i
    );

  process(clk_dest)
  begin
    if rising_edge(clk_dest) then
      if reset_dest = '1' then
        flags_out <= (others => '0');
      else
        flags_out <= flags_out_i;
      end if;
    end if;
  end process;

end architecture;
Next, we supply the Out of Context (OOC) constraints for timing analysis.
# Clock uncertainty (from a timing report), looks to be device independent
set tcu 0.035
# Getting these from timing reports is painful, but only needs doing once per device/part
#
# Part: xczu2cg-sbva484-2-e
# FDRE Setup Time (Setup_FDRE_C_D) in ns (Slow Process, max delay for Setup times)
set tsus 0.025
# FDRE Hold Time (Hold_FDRE_C_D) in ns (Fast Process, min delay for Hold times)
set ths 0.046
# Choose these:
#
# Extra slack (on hold time), designer's choice
set txs 0.008
# Additional clock uncertainty desired for over constraining the design, set by designer choice
set tcu_add 0.000
create_clock -period 7.000 -name clk_src [get_ports clk_src]
create_clock -period 6.000 -name clk_dest [get_ports clk_dest]
set input_ports_src [get_ports {flags_in[*] reset_src}]
set input_ports_dest [get_ports {reset_dest}]
set output_ports [get_ports {flags_out[*]}]
#
# Standard timing setup, allocate the device delays into the meaningful variables
#
# https://www.xilinx.com/publications/prod_mktg/club_vivado/presentation-2015/paris/Xilinx-TimingClosure.pdf
# Recommended technique for over-constraining a design:
set_clock_uncertainty -setup $tcu_add [get_clocks]
# Input Hold = Input Setup (slow corner)
set input_delay [expr $ths + $tcu + $txs]
# Output Hold = Output Setup (slow corner)
set output_delay $tsus
set_input_delay -clock [get_clocks clk_src] $input_delay $input_ports_src
set_input_delay -clock [get_clocks clk_dest] $input_delay $input_ports_dest
set_output_delay -clock [get_clocks clk_dest] $output_delay $output_ports
# Manage false paths. (Small design, so taking a short cut here; I don't typically recommend blanket disabling of
# static timing analysis between clocks like this. Specify the registers more precisely instead.)
set_false_path -from [get_clocks clk_src] -to [get_clocks clk_dest]
Finally, we can derive measurement results for analysis using the following TCL script.
set_property part xczu2cg-sbva484-2-e [current_project]
set num_bits 4
set design synth_1
set jobs 6
set resultsfile {path\to\results.log}
set logfile [open $resultsfile a]
puts $logfile "------------- Configuration -----------------"
puts $logfile "Part: [get_project_part]"
puts $logfile "Version: [version -short]"
puts $logfile "---------------------------------------------"
close $logfile
for {set reg_depth 2} {$reg_depth <= 5} {incr reg_depth} {
  puts "Loop for reg_depth_g=$reg_depth"
  set_property generic "num_bits_g=$num_bits reg_depth_g=$reg_depth" [current_fileset]
  set d [current_design -quiet]
  if {[llength $d] > 0} {
    puts "Closing design [lindex $d 0]"
    close_design
  }
  reset_run $design
  launch_runs $design -jobs $jobs
  wait_on_run $design
  open_run $design -name $design
  show_schematic [list [get_ports *] [get_cells -hier *]]
  colour_selected_primitives_by_clock_source [get_cells -hier *]
  set logfile [open $resultsfile a]
  puts $logfile ""
  puts $logfile ""
  puts $logfile "------------- New Run -----------------"
  puts $logfile "Generics: [get_property generic [current_fileset]]"
  puts $logfile "clk_src: [get_property PERIOD [get_clocks clk_src]] ns"
  puts $logfile "clk_dest: [get_property PERIOD [get_clocks clk_dest]] ns"
  puts $logfile "---------------------------------------"
  close $logfile
  report_synchronizer_mtbf -no_header -file $resultsfile -append
}
The plan here is to measure the MTBF of the synchroniser with between 2 and 5 flip-flop stages in the chain. For practical reasons, Xilinx will not recognise a single flip-flop synchroniser; that is just reported as a CDC error. The report_synchronizer_mtbf TCL command help text is provided here for reference. The output is a log file that needs parsing; the values are not available directly in TCL for use in script-based calculations. I copied and pasted the results into Excel for analysis with various formulae.
Refer to the report_synchronizer_mtbf documentation.
Change Frequency
The source clock domain's change frequency, \(F_d\), needs a little explanation. Vivado's default switching rate is 12.5% (=0.125). That is the rate at which the output of a synchronous logic element switches compared to a given clock input. A toggle rate of 100% means that on average the output toggles once during every clock cycle, changing on either the rising or falling clock edges, and making the effective output signal frequency half of the clock frequency. For clock and DDR signals only, the toggle rate can be specified up to 200%. You convert the switching activity value to a "change frequency" by multiplying the fractional value (percentage as a fraction) by half the source clock domain's frequency. The factor of a half is because 100% (=1.0) would be "half of the clock frequency".
Refer to the set_switching_activity documentation.
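The conversion above can be sketched as a short Python helper (my illustration, not Vivado's own code):

```python
def change_frequency(toggle_rate_pct: float, clk_freq_hz: float) -> float:
    """Convert a Vivado switching activity (toggle rate, in percent) into a
    change frequency, Fd. A 100% toggle rate means one change per clock cycle
    on average, i.e. an effective signal frequency of half the clock
    frequency, hence the division by 2."""
    return (toggle_rate_pct / 100.0) * clk_freq_hz / 2.0

# Vivado's default 12.5% toggle rate on a 200 MHz source clock
fd = change_frequency(12.5, 200e6)
print(f"Fd = {fd / 1e6:.2f} M transitions/s")
```

With the default 12.5% toggle rate and a 200 MHz source clock this gives 12.5 M transitions/s, the \(F_d\) value used in the analysis later in this post.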
Confirming the MTBF Calculation
Note this is not about confirming the correctness of any method of calculating MTBF, just confirming how Xilinx does it. Experimental data was created for a variety of different source and destination clock frequencies, with the destination clock frequency always faster than the source. The results did vary, meaning the values of \(\tau\) and \(W\) derived from Vivado's TCL function have not been determined precisely. However, we do get sensible values in the expected ranges. For brevity, a single data set is given below.
NB: Take care with the units used: years vs. ns, MHz, etc.
Stages | MTBF (s) | ln(MTBF) | Settling Time (s) |
---|---|---|---|
2 | 111.08E+72 | 170.50 | 7.44E-9 |
3 | 154.00E+111 | 260.62 | 11.10E-9 |
4 | 212.07E+150 | 350.74 | 14.90E-9 |
5 | 292.85E+189 | 440.87 | 18.60E-9 |
The settling time is actually the "sum of all slack" in the n stages of the CDC synchroniser, i.e. not just a multiple of the destination clock domain's clock period. This is confirmed by reading the Quartus Prime documentation, Synchronization Register Chains. As it happens, timing analysis in Vivado consistently reports a slack (after synthesis only) in each stage of the synchroniser (3.711 ns), but it does not match the accumulating values used in the table above. It's close, but the settling times used here are clearly different, and variable too. I expect the best settling times to use would be post-implementation ones, for the precise path delays achieved (if the approximation is not inherent in the function). Plotting \(\ln(MTBF)\) against settling time, \(S\), does yield a clear linear relationship.

Using the linear relationship, we can map the gradient and axis crossing point back to the device's physical values. Here I used linear regression in an Excel spreadsheet to calculate the gradient and offset, then substituted those values to derive sensible physical values.
\[ \begin{align} \ln(MTBF) &= \frac{S}{\tau} + \ln\left(\frac{1}{W F_c F_d}\right) \\ \text{Mapping to:} \quad y &= mx + c \\ \tau &= \frac{1}{m} = 41.37 \text{ ps} \\ W &= \frac{e^{-c}}{F_c F_d} = 2.16 \text{ ps} \end{align} \]

Parameter Set | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Source Clock Frequency (MHz) | 200.0 | 166.7 | 166.7 | 142.9 | 142.9 |
Change frequency, \(F_d\) (M transitions / s) | 12.50 | 10.42 | 10.42 | 8.93 | 8.93 |
Destination Clock Frequency, \(F_c\) (MHz) | 250.0 | 250.0 | 200.0 | 200.0 | 166.7 |
Settling time constant, \(\tau\) (ps) | 41.37 | 41.37 | 41.33 | 41.33 | 41.40 |
Metastability window, \(W\) (ps) | 2.16 | 2.15 | 2.16 | 2.16 | 0.619 |
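The regression need not be done in Excel. A minimal Python sketch (my reconstruction, not the author's spreadsheet) reproduces the fit from the first data table above:

```python
import math

# (settling time S in ns, ln(MTBF)) pairs from the measured data table
points = [(7.44, 170.50), (11.10, 260.62), (14.90, 350.74), (18.60, 440.87)]

# Ordinary least-squares fit of ln(MTBF) = m*S + c
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
m = (sum((x - mean_x) * (y - mean_y) for x, y in points)
     / sum((x - mean_x) ** 2 for x, _ in points))
c = mean_y - m * mean_x

Fc = 250e6   # destination clock frequency, Hz
Fd = 12.5e6  # change frequency, transitions/s

tau_ps = 1000.0 / m                     # m is per ns, so tau comes out in ps
W_ps = math.exp(-c) / (Fc * Fd) * 1e12  # metastability window, in ps

print(f"tau = {tau_ps:.2f} ps, W = {W_ps:.2f} ps")
```

This recovers \(\tau \approx 41.4\) ps and \(W \approx 2.2\) ps, in agreement with the spreadsheet-derived values in the table.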
We can see a trend in the figures, as well as a loss of numerical precision caused by exponentiating a value that may not have been stated sufficiently precisely. We have, however, reverse engineered the physical properties of a Xilinx xczu2cg-sbva484-2-e part, and demonstrated the relationship used by Xilinx to calculate the MTBF of an n-stage CDC synchroniser. Each additional synchroniser stage increases the settling time by approximately one destination clock period, so now \(S\) is a multiple of \(1 / F_c\).
\[ \begin{align} MTBF_n &= \frac {e^{\frac {nS} {\tau}}} {W F_c F_d} \approx MTBF_1^n \\[18pt] \therefore n &\approx \frac {\ln(MTBF_n)} {\ln(MTBF_1)} \end{align} \]

(The equality is only approximate because the denominator \(W F_c F_d\) is not raised to the power \(n\), but the exponential term dominates.) The experimental data confirms Xilinx assumes an exponential improvement of MTBF with the number of stages in the synchroniser. This is worth confirming, since a literature search on the Internet struggled to find any reference article willing to make this claim. Instead, they offer much more imprecise claims like "For most synchronization applications, the two flip-flop synchronizer is sufficient to remove all likely metastability." I have of course assumed Xilinx have modelled their calculations correctly, and the paper MTBF Bounds for Multistage Synchronizers does confirm this; it also includes the subtraction of the intra-synchroniser flip-flop delays from the settling time, and then develops a more accurate model for MTBF prediction of an n-stage synchroniser. A simpler explanation of the n-stage synchroniser can be found in Lecture 11 - Mitigating Metastability. These sources took me some time to find on the Internet.
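As a quick check of the exponential claim, the measured \(\ln(MTBF)\) values from the earlier table step up by a near-constant increment per stage, i.e. each extra stage multiplies the MTBF by a near-constant (and enormous) factor:

```python
# ln(MTBF) per stage count, from the measured data table earlier in this post
ln_mtbf = {2: 170.50, 3: 260.62, 4: 350.74, 5: 440.87}

# Increment in ln(MTBF) contributed by each additional stage
increments = [ln_mtbf[s + 1] - ln_mtbf[s] for s in (2, 3, 4)]
print(increments)
```

All three increments come out at roughly 90.1, i.e. each additional stage multiplies the MTBF by about \(e^{90}\).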
Metastability Window's Relationship to Setup and Hold Times
Quantity | Min (ps) | Max (ps) | Average (ps) |
---|---|---|---|
TSU Setup_FDRE_C_D | 23 | 25 | 24 |
TH Hold_FDRE_C_D | 46 | 60 | 53 |
Sum | 69 | 85 | 77 |

We talk about violating the setup and hold times of a flip-flop as causing a metastable state. Now we see that the metastability window, \(W\), of 2-3 ps is much smaller than the sum of setup and hold times of ~77 ps. So which window actually causes a flip-flop to go metastable? Also, I have carelessly talked about extracting "physical properties" of the devices; here I stand corrected.
\(W\) "is an extrapolated value and is nonphysical; however, [it] is related to the setup/hold window of a flop."
"It is important to understand that [\(W\)] is a mathematical tool to enable one to determine the MTBF of a circuit - it is not a physical property of the circuit in the same manner as the setup/hold window."
Miller and Noise Effects in a Synchronizing Flip-Flop, Charles Dike and Edward Burton
I am not currently aware of a particular relationship that can determine the "mathematical tool" that is \(W\) from the actual physical parameters of setup and hold times.
"All flip-flops have a metastability window around the clock edge which lies somewhere between the setup and hold times. When the data changes during this metastability window a flipflop takes longer to reach its final output value than the normal propagation delay."
Asynchronous Inputs and Flip-Flop Metastability in the CLAS Trigger at CEBAF, David Doughty, Stephan Lemon, Peter Bonneau
"The window of time relative to the clock edge where metastability will actually be triggered is much smaller than the window defined by the setup and hold times (on the order of femtoseconds in modern FPGAs), however it’s exact location is not known and is a function of a number of variables including temperature and voltage. Meeting the setup and hold requirements guarantee a metastable state will not be triggered."
Metastability and Clock Uncertainty in FPGA Designs, Ray Andraka

Xilinx's Xcell Journal Issue 6 from Q4 1990 gives their explanation for the varying position of the metastability window:
"Any particular flip-flop at a particular temperature and supply voltage clocks in the data that happens to be at its input during an extremely narrow picosecond timing window. (If data changes during this narrow window, the flip-flop goes metastable). The width of this window is constant, but its position varies, depending on processing, temperature and Vcc."
The metastability window's location being dependent on external factors like voltage and temperature explains how we have a narrow, movable window within a guaranteed range. Perhaps some statistical distribution accounts for the rest of the explanation. But don't get hung up on the metastability window's size, because...
"Because the time-resolving constant \(\tau\) has the greatest impact on the mean-time-between-failure (MTBF) of the flip-flop due to its exponential relationship, the design of metastable-hardened flip-flops is focused exclusively on the optimization of \(\tau\)."
Design and Analysis of Metastable-Hardened, High-Performance, Low-Power Flip-Flops, PhD Thesis, David Li
Design MTBF
There are two approaches to this section. Firstly, the text Performance Analysis of Synchronization Circuits provides a derivation of how to calculate the MTBF for multiple CDCs. For \(k\) CDCs, each with the same MTBF:

\[ MTBF_{design} = \frac{MTBF_{CDC}}{k} \]

Secondly, Xilinx offer a formula they use for accumulating different CDCs' MTBFs into a single result, in which the reciprocals of the MTBFs add:

\[ \frac{1}{MTBF_{design}} = \sum_{i=1}^{k} \frac{1}{MTBF_i} \]

Thankfully they both converge: the second becomes the first when each CDC being aggregated has the same MTBF. So the second is more general, but the first equation demonstrates clearly that MTBF degrades linearly as the number of CDCs increases. The VHDL analysed in this CDC example uses 4 parallel CDCs for no particular reason, and the results for each are shown along with the aggregated MTBF. The values used for analysis here have been taken from a single CDC.
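The aggregation can be sketched in a few lines of Python (my illustration, not Xilinx's code):

```python
def aggregate_mtbf(mtbfs):
    """Combine the MTBFs of several independent CDCs into a design-level
    MTBF: the failure rates (reciprocals of MTBF) add."""
    return 1.0 / sum(1.0 / m for m in mtbfs)

# Four identical CDCs: the aggregate MTBF is a quarter of the individual MTBF
agg = aggregate_mtbf([4.0e9] * 4)
print(f"{agg:.3e} s")
```

With four identical CDCs the aggregate comes out at exactly one quarter of the individual MTBF, matching the first equation.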
Choosing the Number of Stages in a Basic CDC Synchroniser
The choice for the number of stages in each CDC synchroniser needs to be goal driven. Here the goal is set by the overall design, and we work back to a single synchroniser, using knowledge of the MTBF for a 1-stage synchroniser, which Xilinx's report_synchronizer_mtbf TCL command will not give us. We now know how this is calculated from the MTBF equation, or how to "fudge" it from the results that are returned by report_synchronizer_mtbf. A similar analysis ought to be possible in Intel's Quartus Prime using their report_metastability TCL command.
In summary:
- Estimate the number of CDCs in your design (which could be tough),
- Decide the overall MTBF you should be aiming to achieve,
- Calculate the MTBF you need for a single CDC,
- Calculate the number of synchroniser stages required.
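These steps can be sketched as a short Python calculation. The numbers here are hypothetical: a design goal of 1000 years, 100 CDCs, and \(\ln(MTBF_1) \approx 85\) as a rough extrapolation from the measured data above.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def stages_needed(design_goal_years: float, num_cdcs: int, ln_mtbf_1: float) -> int:
    """Stages per synchroniser, assuming MTBF_n ~ MTBF_1**n and that the
    design-level MTBF degrades linearly with the number of CDCs."""
    # Each of the num_cdcs CDCs must individually be num_cdcs times better
    # than the design goal
    per_cdc_mtbf_s = design_goal_years * SECONDS_PER_YEAR * num_cdcs
    n = math.log(per_cdc_mtbf_s) / ln_mtbf_1
    # Vivado treats a 1-stage chain as a CDC error, so 2 is the minimum
    return max(2, math.ceil(n))

stages = stages_needed(design_goal_years=1000, num_cdcs=100, ln_mtbf_1=85.0)
print(stages)
```

The striking result is that two stages already exceed most realistic goals, which is consistent with the astronomical MTBF figures in the measured data.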
In practice it will be tempting to have Vivado calculate the MTBF for the whole design and then just add a stage if we are not happy. However, this requires the number of stages in each CDC to be driven by a single generic value, and that changes to this generic do not upset any functional timing. Hence some "rule of thumb" worked out in advance might be helpful.
Conclusions
It is now possible to understand the MTBF calculation method and its relationships with precision, even if in practice it is non-trivial to put to use. Adding a synchroniser stage to all CDCs in a design gives a multiplicative benefit, while increasing the number of CDC synchronisers gives a linear degradation in MTBF.
References
- Github Source Code.
- Performance Analysis of Synchronization Circuits, MPhil Thesis, Zhen Zhang.
- Clock Domain Crossing (CDC) Design & Verification Techniques Using System Verilog, Clifford E. Cummings.
- Wikipedia Metastability (electronics)
- A survey and taxonomy of GALS design styles, Paul Teehan, Mark Greenstreet, Guy G. Lemieux
- Quartus Prime Synchronization Register Chains
- Understanding Metastability in FPGAs, Altera
- Metastability and Synchronizers: A Tutorial, Ran Ginosar
- Lecture 11 - Mitigating Metastability, Ryan Robucci
- MTBF Bounds for Multistage Synchronizers, Salomon Beer, Jerome Cox, Tom Chaney and David Zar
- Xilinx's Xcell Journal Issue 6, Q4 1990
- Metastability and Clock Uncertainty in FPGA Designs, Ray Andraka