I recently reviewed some Clock Domain Crossing (CDC) material provided by Doulos that they have made available as free webinars on demand. I thought I was pretty knowledgeable about CDCs, so I was pleased to find two new solutions they suggested in their webinar. To review their material, you will need to register your interest via their website at https://www.doulos.com/webinars/on-demand/clock-domain-crossing/, which means supplying them with your email address so they can send you a personalised link to the video.
Doulos claims copyright on the material, so I have been careful to reproduce only the small fraction that I am critiquing here. I believe I have remained within the bounds laid out in Exceptions to copyright for non-commercial research and private study, criticism and review. I must also state that although I describe two disagreements with their content below, I personally consider Doulos to be a reputable provider of training from whom I have benefitted over the years of my career. I suggest the reader registers for the on-demand CDC video before reading this blog.
Update November 2024: More recently I joined a Doulos webinar that happened to cover the same material. I put the issue I found in this post to Doulos and, to their credit, they later agreed with my findings as follows:
Clock Domain Crossing - Doulos KnowHow Webinars
Audience Question:
Q: If the input to the bin-to-gray converter (pure combinatorial logic) changes from "0111" to "1000", then surely there are still a sequence of possible changes such that more than one output from the converter can change? Meaning in this particular example a more than one metastable bit might be sampled by the destination register? Vivado seems to think there is a CDC-10 error with the design being proposed. (I did some homework before joining.)
Vivado says your gray code solution as presented is not optimal, not completely right. It’s subtle I grant you, but I can see why it points out this issue. I propose that the solution presented needs the outputs from the gray converters registered in both directions to avoid CDC-10 errors. i.e. not LUTs on inputs to the synchroniser in the destination clock domain. This will then avoid sampling of any intermediate metastable events.
A: I agree with you. The diagram is simplified and the bin2gray and gray2bin blocks must include FF outputs.
Please Note: The recording of the video presentation has been edited to address this – thank you.
Their revised diagram is equivalent to mine below.
- Doulos' "CDC for Registers" Solution
  - Code
  - Constraints
  - Results
- Doulos' "CDC for Counters" Solution
  - Code
  - CDC Critical Warnings
  - Constraints
  - Testing
- Conclusions
- References
Doulos' "CDC for Registers" Solution

I've seen data bus solutions before, e.g. a 'data valid' control signal that gets synchronised by 2+ flops whilst the bits of the data bus settle, so that the data bus can be resampled safely in the new clock domain. That's fine for slow to fast clock domain crossings. When crossing from fast to slow clock domains, a handshake is required in order to avoid the data changing before it can be safely sampled. The solution that I have used so far is for slowly changing data where there is no need for any back pressure to indicate that new data cannot yet be presented. In that solution, any new data presented too soon is simply discarded. The solution presented by Doulos is a neat way to manage the back pressure. In any solution using acknowledgement, the handshake requires synchronisation in both directions as it crosses back and forth over the clock domain boundary.
Code
This is my take on their described solution in VHDL.
library ieee;
  use ieee.std_logic_1164.all;

entity toggle_synchroniser is
  generic(
    width_g : positive := 8;
    -- Synchroniser chain length
    len_g   : positive := 2
  );
  port(
    clk_wr   : in  std_logic;
    reset_wr : in  std_logic;
    clk_rd   : in  std_logic;
    reset_rd : in  std_logic;
    data_wr  : in  std_logic_vector(width_g-1 downto 0);
    wr_rdy   : out std_logic;
    wr_tgl   : in  std_logic;
    data_rd  : out std_logic_vector(width_g-1 downto 0) := (others => '0');
    rd_rdy   : out std_logic;
    rd_tgl   : in  std_logic
  );
end entity;

architecture rtl of toggle_synchroniser is

  -- The range is this way round in order to make the specification of constraints specific & easy.
  -- We will want to tell the synthesis tool about a false path or max delay constraint to
  -- *_tgl_sync[0]. If the range is reversed, we have to specify *_tgl_sync[*] as we can't pull out
  -- the highest index value in XDC.
  signal wr_tgl_sync : std_logic_vector(0 to len_g-1) := (others => '0');
  signal rd_tgl_sync : std_logic_vector(0 to len_g-1) := (others => '0');

  attribute ASYNC_REG : boolean;
  attribute ASYNC_REG of wr_tgl_sync : signal is true;
  attribute ASYNC_REG of rd_tgl_sync : signal is true;

begin

  -- Synchronise the read-side acknowledge toggle into the write clock domain
  process(clk_wr)
  begin
    if rising_edge(clk_wr) then
      wr_tgl_sync <= rd_tgl & wr_tgl_sync(0 to len_g-2);
      if reset_wr = '1' then
        wr_tgl_sync <= (others => '0');
      end if;
    end if;
  end process;

  -- Ready for new data once the write toggle has been acknowledged
  wr_rdy <= wr_tgl xnor wr_tgl_sync(len_g-1);

  -- Capture the data and synchronise the write toggle into the read clock domain
  process(clk_rd)
  begin
    if rising_edge(clk_rd) then
      data_rd     <= data_wr;
      rd_tgl_sync <= wr_tgl & rd_tgl_sync(0 to len_g-2);
      if reset_rd = '1' then
        rd_tgl_sync <= (others => '0');
      end if;
    end if;
  end process;

  -- New data is available while the toggles differ
  rd_rdy <= rd_tgl xor rd_tgl_sync(len_g-1);

end architecture;
The same design can be used for transfers from fast to slow or slow to fast clock domains, i.e. without coding anything differently. This is shown in the simulation waveforms below.
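To make the handshake concrete, here is a minimal sketch of how the write side might drive the synchroniser. The entity name toggle_writer and the src_* ports are hypothetical names of my own, not taken from the Doulos material. It assumes data_wr must be held stable from the moment wr_tgl toggles until wr_rdy returns high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical write-side driver for toggle_synchroniser (illustration only).
entity toggle_writer is
  generic(
    width_g : positive := 8
  );
  port(
    clk_wr    : in  std_logic;
    reset_wr  : in  std_logic;
    -- Upstream interface (assumed names)
    src_data  : in  std_logic_vector(width_g-1 downto 0);
    src_valid : in  std_logic;
    src_taken : out std_logic;
    -- Connections to toggle_synchroniser
    wr_rdy    : in  std_logic;
    data_wr   : out std_logic_vector(width_g-1 downto 0) := (others => '0');
    wr_tgl    : out std_logic
  );
end entity;

architecture rtl of toggle_writer is
  signal tgl : std_logic := '0';
begin

  wr_tgl    <= tgl;
  -- A word is only accepted once the previous one has been acknowledged.
  src_taken <= src_valid and wr_rdy;

  process(clk_wr)
  begin
    if rising_edge(clk_wr) then
      if src_valid = '1' and wr_rdy = '1' then
        data_wr <= src_data; -- held stable until the next accepted word
        tgl     <= not tgl;  -- each toggle announces exactly one new word
      end if;
      if reset_wr = '1' then
        tgl <= '0';
      end if;
    end if;
  end process;

end architecture;
```

Because wr_rdy is derived from wr_tgl, it drops the cycle after the toggle and only returns once the reader's toggle has crossed back, which is what provides the back pressure. A mirror-image block on the read side would toggle rd_tgl when it consumes data_rd while rd_rdy is high.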


Constraints
set_false_path -to [get_cells {data_rd_reg[*]}]
set_false_path -from [get_ports {rd_tgl}] -to [get_cells {wr_tgl_sync_reg[0]}]
set_false_path -from [get_ports {wr_tgl}] -to [get_cells {rd_tgl_sync_reg[0]}]
These constraints should be SCOPED_TO_REF in Vivado so they are applied for every instance of the synchroniser. I would suggest that because the constraints applied here are on parallel paths, it would be better to use set_max_delay -datapath_only -from <objects> instead of set_false_path -from <objects> -to <objects>. Maximum delays are preferable because we want to avoid a situation where one of the parallel paths routes around the device multiple times, arriving at the synchronisers slowly by comparison with the other paths. A false path simply turns off timing constraints and could lead to this perverse situation. However, without the contextual logic, it is not possible to use set_max_delay -datapath_only -from <objects> as the source registers feeding data_rd_reg[*] are not available. Instead one might use unmanaged TCL-based constraints to find the fan-in registers to use as objects for the -from option, but only after elaboration of the fuller design.
Results

The logic synthesised above is coloured by clock domain with the synchronising registers marked by red diamonds. Where data is expected to arrive more quickly than the previous data word can traverse the clock domain, back pressure must be used. Here the toggle nature of the 'ready' signal is not exactly AXI compliant, so the changes required to interface with standard bus control signals are left as an exercise for the keen reader.
Doulos' "CDC for Counters" Solution

In realising this code I believe I have found two mistakes in the given description. However, I do have a working example and the result passes CDC verification checks in Xilinx's Vivado synthesis tool. This solution is intended for sequential counters incrementing or decrementing by one. Those are the two conditions under which the gray encoding works, ensuring only one data (or counter) bit crossing the clock domain boundary changes at a time. The use case for this design might be a pair of addresses applied to each side of a FIFO, where counter comparison is needed on each side of the clock domain boundary for (almost) full and empty control signals.
Code
library ieee;
  use ieee.std_logic_1164.all;

entity counter_synchroniser is
  generic(
    width_g : positive := 8;
    -- Synchroniser chain length
    len_g   : positive := 2
  );
  port(
    clk_wr   : in  std_logic;
    reset_wr : in  std_logic;
    clk_rd   : in  std_logic;
    reset_rd : in  std_logic;
    cnt_wr   : in  std_logic_vector(width_g-1 downto 0);
    cnt_rd   : out std_logic_vector(width_g-1 downto 0) := (others => '0')
  );
end entity;

architecture rtl of counter_synchroniser is

  type sync_arr_t is array(0 to len_g-1) of std_logic_vector(width_g-1 downto 0);

  signal gray : std_logic_vector(width_g-1 downto 0) := (others => '0');
  -- The range is this way round in order to make the specification of constraints specific & easy.
  -- We will want to tell the synthesis tool about a false path or max delay constraint to
  -- gray_sync[0]. If the range is reversed, we have to specify gray_sync[*] as we can't pull out
  -- the highest index value in XDC.
  signal gray_sync : sync_arr_t := (others => (others => '0'));

  attribute ASYNC_REG : boolean;
  attribute ASYNC_REG of gray_sync : signal is true;

begin

  -- N.B. The Doulos video has a mistake, need shift right not rotate right (ror).
  -- See https://en.wikipedia.org/wiki/Gray_code#Converting_to_and_from_Gray_code
  -- Therefore do not use:
  --   gray <= cnt_wr XOR (cnt_wr ror 1);
  -- Synchronisers must be fed from a registered value, with no logic before the first flop in the
  -- destination clock domain. The video implies this step can be purely combinatorial, but that's
  -- bad CDC practice according to Xilinx and their Vivado tool.
  -- Reference https://docs.amd.com/r/en-US/ug906-vivado-design-analysis/Combinatorial-Logic
  bin_to_gray : process(clk_wr)
  begin
    if rising_edge(clk_wr) then
      -- VHDL-2008: srl is the shift right logical operator, shorthand for:
      --   '0' & cnt_wr(width_g-1 downto 1)
      gray <= cnt_wr XOR (cnt_wr srl 1);
      if reset_wr = '1' then
        gray <= (others => '0');
      end if;
    end if;
  end process;

  sync_gray_to_bin : process(clk_rd)
    variable bin_v : std_logic_vector(width_g-1 downto 0);
  begin
    if rising_edge(clk_rd) then
      -- Each counter bit gets its own synchroniser
      gray_sync <= gray & gray_sync(0 to len_g-2);
      -- Gray to binary conversion
      bin_v(width_g-1) := gray_sync(len_g-1)(width_g-1);
      for i in width_g-2 downto 0 loop
        bin_v(i) := bin_v(i+1) xor gray_sync(len_g-1)(i);
      end loop;
      cnt_rd <= bin_v;
      if reset_rd = '1' then
        gray_sync <= (others => (others => '0'));
        cnt_rd    <= (others => '0');
      end if;
    end if;
  end process;

end architecture;
The Doulos CDC video explains the operation of this design well, so I won't repeat it here. I will draw attention to the mistake in the implementation of the binary to gray encoder.

When I first used this to implement the encoder, the sequence of values did not change by only one bit each time. According to Wikipedia, the encoder should be using a shift right operation (as per the text) where the least significant bit is dropped, instead of being rotated into the most significant bit on the left for the XOR operation (as per their diagram where bit 0 appears at the MSB). A simple enough mistake to make, but this does emphasise the need to double check your material before making an educational resource! In VHDL-2008 the difference is between an ror and an srl operator, which some might say would be better coded as explicit bit manipulations for readability. (I had to look up the shift operators as I don't routinely use them.)
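The difference between the two operators can be checked in simulation. The following is a minimal self-checking sketch, not the test bench from my repository. The entity name gray_encode_check and the helper bits_changed are hypothetical, and it assumes a VHDL-2008 simulator so that srl and ror are available for std_logic_vector.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical check: compare srl- and ror-based encoders over every increment.
entity gray_encode_check is
end entity;

architecture sim of gray_encode_check is

  constant width_c : positive := 4;
  subtype word_t is std_logic_vector(width_c-1 downto 0);

  -- Count the bit positions in which two words differ.
  function bits_changed(a, b : word_t) return natural is
    variable count_v : natural := 0;
  begin
    for i in word_t'range loop
      if a(i) /= b(i) then
        count_v := count_v + 1;
      end if;
    end loop;
    return count_v;
  end function;

begin

  process
    variable bin_v       : word_t;
    variable nxt_v       : word_t;
    variable srl_v       : natural;
    variable ror_v       : natural;
    variable worst_srl_v : natural := 0;
    variable worst_ror_v : natural := 0;
  begin
    -- Step through every increment, including the wrap from all ones to zero.
    for i in 0 to 2**width_c - 1 loop
      bin_v := std_logic_vector(to_unsigned(i, width_c));
      nxt_v := std_logic_vector(to_unsigned((i + 1) mod 2**width_c, width_c));
      srl_v := bits_changed(bin_v xor (bin_v srl 1), nxt_v xor (nxt_v srl 1));
      ror_v := bits_changed(bin_v xor (bin_v ror 1), nxt_v xor (nxt_v ror 1));
      if srl_v > worst_srl_v then
        worst_srl_v := srl_v;
      end if;
      if ror_v > worst_ror_v then
        worst_ror_v := ror_v;
      end if;
    end loop;

    -- A Gray code changes exactly one bit per increment.
    assert worst_srl_v = 1
      report "srl encoder changed more than one bit" severity failure;
    -- The rotate version breaks that property, e.g. 0000 -> 0001 encodes as 0000 -> 1001.
    assert worst_ror_v > 1
      report "ror encoder unexpectedly changed at most one bit" severity failure;
    report "Worst case bits changed with ror: " & integer'image(worst_ror_v);
    wait;
  end process;

end architecture;
```

The rotate moves bit 0 into the MSB position, so the top output bit becomes the XOR of the MSB and LSB rather than a copy of the MSB, and every increment disturbs it.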
CDC Critical Warnings
gray <= cnt_wr XOR (cnt_wr srl 1);
The VHDL code presented above includes a change from the description given by Doulos, to correct a critical warning in the CDC report. The video is quite clear about the need to insert registers for the later gray to binary conversion in order to aid timing closure. It suggests the same is not necessary for the binary to gray encoder because the logic depth is trivial (a two input XOR gate for each bit). Intuitively one might think there is little need to register the bus before a synchroniser because the bits are asynchronous anyway, so it ought not to matter. However, getting Vivado to check the CDC for issues throws up a critical warning as shown below.
Thinking this through, when the counter changes it alters multiple inputs to the XOR gates, e.g. the transition from "0111" to "1000" changes every XOR input. The gray encoder is designed such that only one XOR gate's output should change, with the others remaining constant. But if those inputs change in a random order, the XOR outputs can glitch through several intermediate values before settling. If those glitches are not cleaned up with a register, then more than one data bit feeding the synchronisers may be changing, negating the use of a gray encoding! That feels very subtle, and perhaps surprising given the gray encoding is supposed to be a mitigation.
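The reasoning can be illustrated with a small simulation model. This is a sketch under a deliberately pessimistic assumption: each counter bit may reach the XOR gates at a different time, so the combinatorial encoder can briefly see any mixture of old and new bits. The entity name gray_glitch_check and its helpers are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of an unregistered encoder during the 0111 -> 1000 transition.
entity gray_glitch_check is
end entity;

architecture sim of gray_glitch_check is

  subtype nibble_t is std_logic_vector(3 downto 0);

  constant start_c : nibble_t := "0111";
  constant end_c   : nibble_t := "1000";

  -- Combinatorial binary to Gray conversion (shift right, then XOR).
  function to_gray(b : nibble_t) return nibble_t is
    variable g_v : nibble_t;
  begin
    g_v := b xor ('0' & b(3 downto 1));
    return g_v;
  end function;

  -- Count the bit positions in which two words differ.
  function bits_changed(a, b : nibble_t) return natural is
    variable count_v : natural := 0;
  begin
    for i in nibble_t'range loop
      if a(i) /= b(i) then
        count_v := count_v + 1;
      end if;
    end loop;
    return count_v;
  end function;

begin

  process
    variable mask_v  : nibble_t;
    variable mid_v   : nibble_t;
    variable diff_v  : natural;
    variable worst_v : natural := 0;
  begin
    -- Each mask selects which of the changing bits have already arrived.
    for m in 0 to 15 loop
      mask_v := std_logic_vector(to_unsigned(m, 4)) and (start_c xor end_c);
      mid_v  := start_c xor mask_v;
      -- Compare the transient encoder output with the last settled Gray value.
      diff_v := bits_changed(to_gray(start_c), to_gray(mid_v));
      if diff_v > worst_v then
        worst_v := diff_v;
      end if;
    end loop;

    -- Without a register, the synchronisers can be presented with multi-bit changes.
    assert worst_v > 1
      report "No multi-bit transient found" severity failure;
    report "Worst case transient bits changed: " & integer'image(worst_v);
    wait;
  end process;

end architecture;
```

Real routing skew will not produce every one of these mixtures, but the model shows why the single-bit-change guarantee only holds at the output of a register and not at the output of the XOR gates.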

Vivado's report_cdc TCL command complains that there is combinational logic feeding the 2+ flip-flop synchroniser with a critical unsafe CDC-10 structure.
This structure is traditionally not recommended due the potential occurrence of glitches on the output of the combinatorial logic, which is captured by the synchronizer and propagated downward to the rest of the design.
Vivado Design Suite User Guide: Design Analysis and Closure Techniques (UG906)
The synthesised logic is shown below.

In my version of the synchroniser I have registered the gray encoded counter values in the source clock domain before feeding the bus to the array of 2+ flip-flop synchronisers per bit. The CDC critical warnings then disappear.
bin_to_gray : process(clk_wr)
begin
  if rising_edge(clk_wr) then
    if reset_wr = '1' then
      gray <= (others => '0');
    else
      gray <= cnt_wr XOR (cnt_wr srl 1);
    end if;
  end if;
end process;
Constraints
set_false_path -from [get_cells {gray_reg[*]}] -to [get_cells {gray_sync_reg[0][*]}]
These constraints should be SCOPED_TO_REF in Vivado so they are applied for every instance of the synchroniser.
Testing

The main purpose of the test code (linked below) is to verify that counter values are encoded and decoded correctly. The test bench simply increments one counter, checks the comparison, then increments the other and re-checks the comparison on each side of the clock domain boundary. The real test of the solution comes from synthesis in Vivado and the clear report_cdc result.

It should be possible to pick out the three steps:
- Binary to Gray conversion
- Array of 2+ flip-flop synchronisers, one per gray counter bit
- Gray to Binary conversion
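The first and last of these steps can also be checked in isolation. The following is a minimal sketch of such a round-trip check, separate from the test bench linked below. The entity name and the functions bin_to_gray and gray_to_bin are hypothetical, with the decoder written to mirror the loop in sync_gray_to_bin. The synchroniser step is deliberately left out, as it only adds latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical round-trip check of the two conversions, without the synchroniser.
entity gray_round_trip_check is
end entity;

architecture sim of gray_round_trip_check is

  constant width_c : positive := 8;
  subtype word_t is std_logic_vector(width_c-1 downto 0);

  -- Binary to Gray: XOR each bit with its more significant neighbour.
  function bin_to_gray(b : word_t) return word_t is
    variable g_v : word_t;
  begin
    g_v := b xor ('0' & b(width_c-1 downto 1));
    return g_v;
  end function;

  -- Gray to binary: ripple the XOR down from the MSB.
  function gray_to_bin(g : word_t) return word_t is
    variable b_v : word_t;
  begin
    b_v(width_c-1) := g(width_c-1);
    for i in width_c-2 downto 0 loop
      b_v(i) := b_v(i+1) xor g(i);
    end loop;
    return b_v;
  end function;

begin

  process
    variable bin_v : word_t;
  begin
    -- Every counter value must survive encoding followed by decoding.
    for i in 0 to 2**width_c - 1 loop
      bin_v := std_logic_vector(to_unsigned(i, width_c));
      assert gray_to_bin(bin_to_gray(bin_v)) = bin_v
        report "Round trip failed for value " & integer'image(i)
        severity failure;
    end loop;
    report "All " & integer'image(2**width_c) & " values round-trip correctly";
    wait;
  end process;

end architecture;
```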
Conclusions
I propose that the diagram given in the video should be amended as follows.

I think this serves to remind us all that getting cross clock domain solutions correct is non-trivial. I am delighted to have another solution in my armoury, one that I had previously only given passing thought to, but now have the code for.
References
- Github Source Code
- Doulos Webinars On Demand: Clock Domain Crossing
- Visualising Clock Domain Crossings in Vivado