Notes on Fixing Hold Time Violations
Having spent my FPGA design life solving timing closure problems by fixing setup time violations, I wondered what I should be doing about hold time violations. What follows is a collection of notes pulled from various Internet sources to explain how a hold time violation occurs, what the full range of possible fixes is, which of them are appropriate for FPGAs, and how to fix the violations in Xilinx's Vivado.
- What is the Hold Time?
- Options for Fixing Hold Time
- Xilinx Vivado Methods
- Worked Example
- Conclusions
- References
What is the Hold Time?

Quantity | Represents |
---|---|
TH | Hold Time |
TSU | Setup Time |
- Definition of hold time
- "Hold time is defined as the minimum amount of time after arrival of clock's active edge so that it can be latched properly. In other words, each flip-flop (or any sequential element, in general) needs data to be stable for some time after arrival of clock edge such that it can reliably capture the data. This amount of time is known as hold time."
Ref: Hold time, VLSI Universe.
- What if setup and/or hold violations occur in a design?
- "As said earlier, setup and hold timings are to be met in order to ensure that data launched from one flop is captured properly at another and in accordance to the state machine designed. In other words, no timing violations means that the data launched by one flip-flop at one clock edge is getting captured by another flip-flop at the desired clock edge. If the setup check is violated, data will not be captured properly at the next clock edge. Similarly, if hold check is violated, data intended to get captured at the next edge will get captured at the same edge. Moreover, setup/hold violations can lead to data getting captured within the setup/hold window which can lead to metastability of the capturing flip-flop (as explained in our post metastability). So, it is very important to have setup and hold requirements met for all the registers in the design and there should not be any setup/hold violations."
Ref: Setup and hold violations, VLSI Universe.
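So I interpret that as two separate risks. First, if the data changes inside the setup/hold window, the capturing register could go metastable and misrepresent the value it should currently hold to downstream logic (including back to itself where there is a feedback path). Second, with a hold violation the destination register could get the new value (data) too soon, i.e. on the same clock edge that launched it rather than on the following edge it was supposed to receive the value on.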
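The stability window described in the quotes above can be sketched as a small check. This is a minimal illustration rather than anything taken from the referenced sources: the function name, the argument names and the example values of TSU and TH are all assumptions chosen for illustration.

```python
def classify_data_change(change_time_ns, clock_edge_ns, t_su_ns, t_h_ns):
    """Classify a data transition relative to one active clock edge.

    The data input must be stable from (edge - TSU) until (edge + TH).
    A transition strictly inside that window is a violation and risks
    metastability in the capturing flip-flop.
    """
    window_start = clock_edge_ns - t_su_ns
    window_end = clock_edge_ns + t_h_ns
    if change_time_ns <= window_start or change_time_ns >= window_end:
        return "ok"
    if change_time_ns < clock_edge_ns:
        # Data changed too late before the edge.
        return "setup violation"
    # Data changed at or too soon after the edge.
    return "hold violation"


# Hypothetical figures: clock edge at 10.0 ns, TSU = 0.2 ns, TH = 0.1 ns.
for t in (9.5, 9.9, 10.05, 10.5):
    print(t, classify_data_change(t, 10.0, 0.2, 0.1))
```

A real timing analyser does not look at individual transitions like this; it bounds the earliest and latest possible arrival times and checks those against the window, which is what the slack figures later in these notes represent.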
Options for Fixing Hold Time
The following strategies can be useful in reducing the magnitude of hold violations and bringing the hold slack towards a positive value. They are general strategies that are perhaps more suited to full custom ASICs, so each one is tabulated below with a comment on its suitability for use with FPGAs.
Method | Description | FPGA Applicability |
---|---|---|
Insert delay elements | This is the simplest thing we can do to decrease the magnitude of a hold time violation. Inserting delay elements (buffers) in the hold violating data-path increases the data-path delay, and hence the slack can be made positive. | No. This would need to be effected in VHDL and does not sound reliable since optimisation steps would remove the logic. |
Reduce the drive strength of data-path logic gates | Replacing a cell with a similar cell of less drive strength will certainly add delay to data-path. However, there is a slight chance of decrease in data-path delay if the cell load is dominated by intrinsic capacitance as we discussed in how delay of a standard cell changes with drive strength. | No. Cannot alter the physics of the FPGA fabric. |
Use data-path cells with higher threshold voltages | If you have multiple flavours of threshold voltages in your design, the cells with higher threshold voltage will certainly have higher delays. So, this must be the first option you must be looking for to resolve hold violations. | No. Cannot alter the physics of the FPGA fabric. You are stuck with the primitives you are given. |
Improve hold time of capturing flip-flop | Using a capturing flip-flop with higher drive strength and/or lower threshold voltage will give a lower hold time requirement. Also, improving the transition at flip-flop's clock pin reduces its hold time requirement. | No. Cannot alter the physics of the FPGA fabric. You are stuck with the primitives you are given. |
Detoured routing | Detoured routing can be adopted as an alternative to insertion of delay elements as it will add load to the driving cell as well as provide additional net delay, thereby increasing the data-path delay. | Yes. This is what the place and route tools do for you. |
Play with clock skew | A positive skew degrades hold timing and a negative skew aids hold timing. So, if a data-path is violating, we can either decrease the latency of capturing flip-flop or increase the clock latency of launching flip-flop. However, in doing so, we need to keep in mind the setup and hold slacks of other timing paths starting and/or ending at these flip-flops. | Not really. If this were to be done, it would need to be done by the place and route tools. |
Increase the clk->q delay of launching flip-flop | A launching flip-flop with more clk->q delay will help ease the hold timing of the data-path. For this, either we can decrease the drive strength of the flip-flop or move it to higher threshold voltage. | No. Cannot alter the physics of the FPGA fabric. You are stuck with the primitives you are given. |
Ref: How to fix hold violations, VLSI Universe
So there's only really one option for solving hold time violations in an FPGA, and that's to let the tools perform their task. Where the hold requirement is at the output pins of the FPGA, the physical properties do offer some flexibility; however, there is no way of having a hold check done on an output, so that flexibility is of little consequence.
Xilinx Vivado Methods
The tools will automatically fix "reasonable" hold time violations - you shouldn't need to worry about them.
Let's look at the two cases.
- Between FFs, hold violations are caused by different arrival times of the clocks at the source and destination FF. If the two FFs are on the same clock network, then the skew will be small, and the tool will be able to fix them.
- Even if the two FFs are on different, but related (and properly constructed), clocks, then the skew between them should still be small, and the tool will fix them.
- However, if you have a bad clock design - say one clock coming through one BUFG, and another coming through two BUFGs - then the skew will be large (several nanoseconds) and the tool will likely not be able to fix them. This is not a tool bug, but an error in design - you need to design clocking systems that use the resources of the FPGA properly.
- As for the second case - hold time violations on inputs generally need to be fixed using proper interface design. You either need to use an MMCM/DCM to adjust the phase of the capture clock so that the clock is centred in the data eye, or use IDELAY cells to move the data over the clock. Again, it is really up to the user to design the capture mechanism, using the dedicated clock and IOB resources in the FPGA. The OFFSET IN in these cases merely acts as a mechanism to ensure that you have the system designed properly. You should never rely on the tool to fix hold time violations in interfaces by adding buffers/routing delay...
(There is no way of having a hold check done on an output - the OFFSET OUT format doesn't have a mechanism for specifying a minimum clock to out).
Ref: How to fix hold violation, any general solution?, Xilinx Forum.
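To see why clock skew dominates the first case, here is a minimal sketch of a same-edge hold check between two flip-flops. The model is simplified (it ignores clock pessimism removal and process corners), and every name and figure in it is a hypothetical value chosen for illustration; only the "several nanoseconds" scale of the cascaded-BUFG skew comes from the forum post.

```python
def hold_slack_ns(clk_to_q_ns, data_delay_ns, skew_ns, hold_ns):
    """Hold slack for a register-to-register path checked on the same edge.

    skew_ns is (capture clock delay - launch clock delay): a positive
    skew means the capture clock arrives later, which eats hold slack.
    """
    return round(clk_to_q_ns + data_delay_ns - skew_ns - hold_ns, 3)


def extra_delay_needed_ns(slack_ns):
    """Routing delay the router would have to add to reach zero slack."""
    return round(max(0.0, -slack_ns), 3)


# Hypothetical path: 0.14 ns clock-to-Q, 0.34 ns net, 0.10 ns hold time.
CLK_TO_Q_NS, DATA_DELAY_NS, HOLD_NS = 0.14, 0.34, 0.10

same_network = hold_slack_ns(CLK_TO_Q_NS, DATA_DELAY_NS, 0.10, HOLD_NS)
cascaded_bufg = hold_slack_ns(CLK_TO_Q_NS, DATA_DELAY_NS, 3.00, HOLD_NS)

print("same clock network:", same_network, extra_delay_needed_ns(same_network))
print("cascaded BUFGs:    ", cascaded_bufg, extra_delay_needed_ns(cascaded_bufg))
```

With a small skew the path already has positive slack, whereas a skew of a few nanoseconds asks the router to find a comparable amount of extra routing delay on every affected path, which is why that situation is better fixed in the clocking design.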
Since hold times are only fixed by route_design, it is normal to see some small hold time violations in a synthesized or placed (but not routed) design. As long as the magnitude of these violations is small (like 100ps or so), then you can ignore them - the tools will fix this in route_design.
Ref: How to fix hold violation, any general solution?, Xilinx Forum.
In order to encourage place and route to resolve hold time violations, here are the Vivado commands that can be (TCL) scripted.
    phys_opt_design -help

      [-hold_fix]  Attempt to improve slack of high hold violators

    To perform hold fixing you must specify the -hold_fix option, or the
    -directive ExploreWithHoldFix option.

      -hold_fix - (Optional) Performs optimizations to insert data path delay
      to fix hold time violations.

      -aggressive_hold_fix - (Optional) Performs optimizations to insert data
      path delay to fix hold time violations. Considers significantly more
      hold violations than the standard hold fix algorithm.

      -directive <arg> - (Optional) Directs the mode of physical optimization
      with specific design objectives. Only one directive can be specified
      for a single phys_opt_design command, and values are case-sensitive.
      Supported values include:

      <snip>

       * ExploreWithHoldFix - Run different algorithms in multiple passes of
         optimization, including hold violation fixing and replication for
         very high fanout nets.

       * ExploreWithAggressiveHoldFix - Run different algorithms in multiple
         passes of optimization, including aggressive hold violation fixing
         and replication for very high fanout nets.

    For example:

     * phys_opt_design -hold_fix
     * phys_opt_design -directive ExploreWithHoldFix
Worked Example
Take the example below of a hold time violation in the arrangement of a PLL feeding a SERDES (SERializer-DESerializer) primitive: after synthesis, the hold time is violated. Scrolling through these reports will show lines that are highlighted for inspection.
    current_design synth_1
    config_timing_corners -corner Fast -delay_type min
    config_timing_corners -corner Slow -delay_type none
    report_timing -from [get_pins {i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C}] -to [get_pins i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST] -delay_type min -max_paths 1 -sort_by group -input_pins -routable_nets
    INFO: [Timing 38-91] UpdateTimingParams: Speed grade: -1, Delay Type: min.
    INFO: [Timing 38-191] Multithreading enabled for timing update using a maximum of 2 CPUs
    INFO: [Timing 38-78] ReportTimingParams: -from_pins -to_pins -max_paths 1 -nworst 1 -delay_type min -sort_by group.
    Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
    ----------------------------------------------------------------------------------------------------
    | Tool Version : Vivado v.2022.1 (win64) Build 3526262 Mon Apr 18 15:48:16 MDT 2022
    | Date         : Sun Nov 27 12:14:18 2022
    | Host         : Rievaulx running 64-bit major release (build 9200)
    | Command      : report_timing -from [get_pins {i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C}] -to [get_pins i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST] -delay_type min -max_paths 1 -sort_by group -input_pins -routable_nets
    | Design       : zybo_count
    | Device       : 7z010-clg400
    | Speed File   : -1 PRODUCTION 1.12 2019-11-22
    ----------------------------------------------------------------------------------------------------

    Timing Report

    Slack (VIOLATED) :        -0.173ns (arrival time - required time)
      Source:                 i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C
                                (rising edge-triggered cell FDPE clocked by clk_out_2_pll {rise@0.000ns fall@20.000ns period=40.000ns})
      Destination:            i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST
                                (rising edge-triggered cell OSERDESE2 clocked by clk_out_2_pll {rise@0.000ns fall@20.000ns period=40.000ns})
      Path Group:             clk_out_2_pll
      Path Type:              Hold (Min at Fast Process Corner)
      Requirement:            0.000ns (clk_out_2_pll rise@0.000ns - clk_out_2_pll rise@0.000ns)
      Data Path Delay:        0.478ns (logic 0.141ns (29.492%) route 0.337ns (70.508%))
      Logic Levels:           0
      Clock Path Skew:        0.145ns (DCD - SCD - CPR)
        Destination Clock Delay (DCD): -0.354ns
        Source Clock Delay (SCD):      -0.692ns
        Clock Pessimism Removal (CPR): 0.193ns

    Location        Delay type                                 Incr(ns) Path(ns)    Netlist Resource(s)
    ------------------------------------------------------------------------------  -------------------
                    (clock clk_out_2_pll rise edge)               0.000    0.000 r
    L16                                                           0.000    0.000 r  clk_port (IN)
                    net (fo=0)                                    0.000    0.000    i_pll/inst/clk_in
    L16                                                                          r  i_pll/inst/clkin1_ibufg/I
    L16             IBUF (Prop_ibuf_I_O)                          0.259    0.259 r  i_pll/inst/clkin1_ibufg/O
                    net (fo=1, unplaced)                          0.114    0.373    i_pll/inst/clk_in_pll
                                                                                 r  i_pll/inst/plle2_adv_inst/CLKIN1
                    PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT1)    -1.638   -1.265 r  i_pll/inst/plle2_adv_inst/CLKOUT1
                    net (fo=1, unplaced)                          0.337   -0.928    i_pll/inst/clk_out_2_pll
                                                                                 r  i_pll/inst/clkout2_buf/I
                    BUFG (Prop_bufg_I_O)                          0.026   -0.902 r  i_pll/inst/clkout2_buf/O
                    net (fo=165, unplaced)                        0.210   -0.692    i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/PixelClk
                    FDPE                                                         r  i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C
    ------------------------------------------------------------------------------  -------------------
                    FDPE (Prop_fdpe_C_Q)                          0.141   -0.551 r  i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/Q
                    net (fo=8, unplaced)                          0.337   -0.214    i_hdmi_encoder/U0/ClockSerializer/aRst
    OLOGIC_X0Y74    OSERDESE2                                                    r  i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST
    ------------------------------------------------------------------------------  -------------------

                    (clock clk_out_2_pll rise edge)               0.000    0.000 r
    L16                                                           0.000    0.000 r  clk_port (IN)
                    net (fo=0)                                    0.000    0.000    i_pll/inst/clk_in
    L16                                                                          r  i_pll/inst/clkin1_ibufg/I
    L16             IBUF (Prop_ibuf_I_O)                          0.447    0.447 r  i_pll/inst/clkin1_ibufg/O
                    net (fo=1, unplaced)                          0.259    0.706    i_pll/inst/clk_in_pll
                                                                                 r  i_pll/inst/plle2_adv_inst/CLKIN1
                    PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT1)    -1.799   -1.093 r  i_pll/inst/plle2_adv_inst/CLKOUT1
                    net (fo=1, unplaced)                          0.355   -0.738    i_pll/inst/clk_out_2_pll
                                                                                 r  i_pll/inst/clkout2_buf/I
                    BUFG (Prop_bufg_I_O)                          0.029   -0.709 r  i_pll/inst/clkout2_buf/O
                    net (fo=165, unplaced)                        0.355   -0.354    i_hdmi_encoder/U0/ClockSerializer/PixelClk
    OLOGIC_X0Y74    OSERDESE2                                                    r  i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/CLKDIV
                    clock pessimism                              -0.193   -0.547
    OLOGIC_X0Y74    OSERDESE2 (Hold_oserdese2_CLKDIV_RST)         0.506   -0.041    i_hdmi_encoder/U0/ClockSerializer/SerializerMaster
    ------------------------------------------------------------------------------
                    required time                                          0.041
                    arrival time                                          -0.214
    ------------------------------------------------------------------------------
                    slack                                                 -0.173
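The synthesis report above can be checked by summing its Incr(ns) column. The following is a minimal sketch of that arithmetic: the increments are copied from the report, while the variable and function names are my own and not anything Vivado provides.

```python
# Incr(ns) values copied from the synthesis report, in report order.
SYNTH_ARRIVAL_INCR = [
    0.259, 0.114, -1.638, 0.337, 0.026, 0.210,  # source clock path to FDPE/C
    0.141,                                      # FDPE clock-to-Q
    0.337,                                      # net aRst to OSERDESE2/RST
]
SYNTH_REQUIRED_INCR = [
    0.447, 0.259, -1.799, 0.355, 0.029, 0.355,  # destination clock path to CLKDIV
    -0.193,                                     # clock pessimism
    0.506,                                      # OSERDESE2 hold (CLKDIV to RST)
]


def path_total_ns(increments):
    """Sum a column of incremental delays, rounded to the report's 1 ps."""
    return round(sum(increments), 3)


def hold_slack_ns(arrival_ns, required_ns):
    """Hold slack = arrival time - required time."""
    return round(arrival_ns - required_ns, 3)


synth_arrival = path_total_ns(SYNTH_ARRIVAL_INCR)
synth_required = path_total_ns(SYNTH_REQUIRED_INCR)
synth_slack = hold_slack_ns(synth_arrival, synth_required)

print(synth_arrival, synth_required, synth_slack)  # -0.214 -0.041 -0.173
```

Note that the report prints "required time 0.041" with the sign flipped, so that adding the two final lines gives the slack; the running Path(ns) total of -0.041 is the required time itself.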
Recall that Hold time slack = Arrival time - Required time. The first half of the calculation above derives the arrival time, and the second half derives the required time. Compare these figures to the report below for the implementation stage, where the hold time slack violation is fixed by selecting nets with sufficient delay to compensate:
    current_design impl_1
    config_timing_corners -corner Fast -delay_type min
    config_timing_corners -corner Slow -delay_type none
    report_timing -from [get_pins {i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C}] -to [get_pins i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST] -delay_type min -max_paths 1 -sort_by group -input_pins -routable_nets
    INFO: [Timing 38-91] UpdateTimingParams: Speed grade: -1, Delay Type: min.
    INFO: [Timing 38-191] Multithreading enabled for timing update using a maximum of 2 CPUs
    INFO: [Timing 38-78] ReportTimingParams: -from_pins -to_pins -max_paths 1 -nworst 1 -delay_type min -sort_by group.
    Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
    ----------------------------------------------------------------------------------------------------
    | Tool Version : Vivado v.2022.1 (win64) Build 3526262 Mon Apr 18 15:48:16 MDT 2022
    | Date         : Sun Nov 27 12:14:57 2022
    | Host         : Rievaulx running 64-bit major release (build 9200)
    | Command      : report_timing -from [get_pins {i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C}] -to [get_pins i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST] -delay_type min -max_paths 1 -sort_by group -input_pins -routable_nets
    | Design       : zybo_count
    | Device       : 7z010-clg400
    | Speed File   : -1 PRODUCTION 1.12 2019-11-22
    ----------------------------------------------------------------------------------------------------

    Timing Report

    Slack (MET) :             0.271ns (arrival time - required time)
      Source:                 i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C
                                (rising edge-triggered cell FDPE clocked by clk_out_2_pll {rise@0.000ns fall@20.000ns period=40.000ns})
      Destination:            i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST
                                (rising edge-triggered cell OSERDESE2 clocked by clk_out_2_pll {rise@0.000ns fall@20.000ns period=40.000ns})
      Path Group:             clk_out_2_pll
      Path Type:              Hold (Min at Fast Process Corner)
      Requirement:            0.000ns (clk_out_2_pll rise@0.000ns - clk_out_2_pll rise@0.000ns)
      Data Path Delay:        0.807ns (logic 0.128ns (15.855%) route 0.679ns (84.145%))
      Logic Levels:           0
      Clock Path Skew:        0.031ns (DCD - SCD - CPR)
        Destination Clock Delay (DCD): -0.199ns
        Source Clock Delay (SCD):      -0.431ns
        Clock Pessimism Removal (CPR): 0.201ns

    Location        Delay type                                 Incr(ns) Path(ns)    Netlist Resource(s)
    ------------------------------------------------------------------------------  -------------------
                    (clock clk_out_2_pll rise edge)               0.000    0.000 r
    L16                                                           0.000    0.000 r  clk_port (IN)
                    net (fo=0)                                    0.000    0.000    i_pll/inst/clk_in
    L16                                                                          r  i_pll/inst/clkin1_ibufg/I
    L16             IBUF (Prop_ibuf_I_O)                          0.259    0.259 r  i_pll/inst/clkin1_ibufg/O
                    net (fo=1, routed)                            0.440    0.699    i_pll/inst/clk_in_pll
    PLLE2_ADV_X0Y1                                                               r  i_pll/inst/plle2_adv_inst/CLKIN1
    PLLE2_ADV_X0Y1  PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT1)    -2.231   -1.531 r  i_pll/inst/plle2_adv_inst/CLKOUT1
                    net (fo=1, routed)                            0.497   -1.034    i_pll/inst/clk_out_2_pll
    BUFGCTRL_X0Y16                                                               r  i_pll/inst/clkout2_buf/I
    BUFGCTRL_X0Y16  BUFG (Prop_bufg_I_O)                          0.026   -1.008 r  i_pll/inst/clkout2_buf/O
                    net (fo=110, routed)                          0.578   -0.431    i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/PixelClk
    SLICE_X43Y77    FDPE                                                         r  i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/C
    ------------------------------------------------------------------------------  -------------------
    SLICE_X43Y77    FDPE (Prop_fdpe_C_Q)                          0.128   -0.303 r  i_hdmi_encoder/U0/LockLostReset/SyncAsyncx/oSyncStages_reg[1]/Q
                    net (fo=8, routed)                            0.679    0.377    i_hdmi_encoder/U0/ClockSerializer/aRst
    OLOGIC_X0Y74    OSERDESE2                                                    r  i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/RST
    ------------------------------------------------------------------------------  -------------------

                    (clock clk_out_2_pll rise edge)               0.000    0.000 r
    L16                                                           0.000    0.000 r  clk_port (IN)
                    net (fo=0)                                    0.000    0.000    i_pll/inst/clk_in
    L16                                                                          r  i_pll/inst/clkin1_ibufg/I
    L16             IBUF (Prop_ibuf_I_O)                          0.447    0.447 r  i_pll/inst/clkin1_ibufg/O
                    net (fo=1, routed)                            0.481    0.928    i_pll/inst/clk_in_pll
    PLLE2_ADV_X0Y1                                                               r  i_pll/inst/plle2_adv_inst/CLKIN1
    PLLE2_ADV_X0Y1  PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT1)    -2.543   -1.615 r  i_pll/inst/plle2_adv_inst/CLKOUT1
                    net (fo=1, routed)                            0.544   -1.071    i_pll/inst/clk_out_2_pll
    BUFGCTRL_X0Y16                                                               r  i_pll/inst/clkout2_buf/I
    BUFGCTRL_X0Y16  BUFG (Prop_bufg_I_O)                          0.029   -1.042 r  i_pll/inst/clkout2_buf/O
                    net (fo=110, routed)                          0.843   -0.199    i_hdmi_encoder/U0/ClockSerializer/PixelClk
    OLOGIC_X0Y74    OSERDESE2                                                    r  i_hdmi_encoder/U0/ClockSerializer/SerializerMaster/CLKDIV
                    clock pessimism                              -0.201   -0.400
    OLOGIC_X0Y74    OSERDESE2 (Hold_oserdese2_CLKDIV_RST)         0.505    0.105    i_hdmi_encoder/U0/ClockSerializer/SerializerMaster
    ------------------------------------------------------------------------------
                    required time                                         -0.105
                    arrival time                                           0.377
    ------------------------------------------------------------------------------
                    slack                                                  0.271
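The pertinent values have been extracted to the table below. Both the arrival and required times have increased because the routed design uses the true net delays, but the arrival time has increased by much more (0.591 ns against 0.146 ns), and that is what resolves the hold violation. Note that the implementation slack in the table is calculated from the rounded arrival and required times (0.377 - 0.105 = 0.272 ns), whereas the report itself prints 0.271 ns.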
Time (ns) | Synthesis | Implementation | Difference |
---|---|---|---|
Arrival | -0.214 | 0.377 | 0.591 |
Required | -0.041 | 0.105 | 0.146 |
Hold Time Slack (Arrival - Required) | -0.173 | 0.272 | 0.445 |
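To see where the improvement comes from, the change in each term of the two reports can be separated out. This is a sketch using figures copied from the reports above; the dictionary keys and helper name are my own labels. Because the reports round every line to 1 ps, the totals can differ from the table by 1 ps.

```python
# Terms (ns) copied from the synthesis and implementation reports.
SYNTH = {"source_clock_delay": -0.692, "clk_to_q": 0.141, "net_delay": 0.337,
         "dest_clock_delay": -0.354, "clock_pessimism": -0.193, "hold": 0.506}
IMPL = {"source_clock_delay": -0.431, "clk_to_q": 0.128, "net_delay": 0.679,
        "dest_clock_delay": -0.199, "clock_pessimism": -0.201, "hold": 0.505}

ARRIVAL_TERMS = ("source_clock_delay", "clk_to_q", "net_delay")
REQUIRED_TERMS = ("dest_clock_delay", "clock_pessimism", "hold")


def deltas(before, after, terms):
    """Change in each named term between two reports, rounded to 1 ps."""
    return {t: round(after[t] - before[t], 3) for t in terms}


arrival_deltas = deltas(SYNTH, IMPL, ARRIVAL_TERMS)
required_deltas = deltas(SYNTH, IMPL, REQUIRED_TERMS)
arrival_change = round(sum(arrival_deltas.values()), 3)
required_change = round(sum(required_deltas.values()), 3)
slack_change = round(arrival_change - required_change, 3)

print(arrival_deltas)
print(required_deltas)
print(arrival_change, required_change, slack_change)
```

The largest single contribution is the aRst net, whose delay rises from 0.337 ns to 0.679 ns once routed. The source clock delay also rises by more than the destination clock delay, which is the reduction in Clock Path Skew from 0.145 ns to 0.031 ns shown in the report headers.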
Conclusions
Hold time violations are fixed by adding delay, but this also reduces the setup time slack in the same path. So if your setup time slack is small, you might just be moving the problem. Fixing hold time is more important than fixing setup time. This is because state held in registers must be maintained over subsequent clock cycles and must not change without reason, and because the hold check on a path like the one above is made against the same clock edge that launched the data (the report's Requirement is 0.000ns), so it does not depend on the clock period. Setup time violations can be mitigated by reducing the clock frequency, and a slower design is better than an unpredictable one.
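The trade-off can be illustrated with a minimal sketch. It is a deliberately simplified model that uses the same delays for both checks (a real analysis uses maximum delays for setup and minimum delays for hold), and all names and figures are hypothetical.

```python
def setup_slack_ns(period_ns, clk_to_q_ns, data_delay_ns, setup_ns, skew_ns):
    """Setup slack: data launched on one edge must arrive before the next."""
    return round(period_ns + skew_ns - clk_to_q_ns - data_delay_ns - setup_ns, 3)


def hold_slack_ns(clk_to_q_ns, data_delay_ns, hold_ns, skew_ns):
    """Hold slack: checked on the launching edge, so no period term."""
    return round(clk_to_q_ns + data_delay_ns - skew_ns - hold_ns, 3)


# Hypothetical path: skew is (capture clock delay - launch clock delay).
CLK_TO_Q, SETUP, HOLD, SKEW = 0.3, 0.3, 0.5, 0.4

# A short data path violates hold but has plenty of setup slack.
before = (setup_slack_ns(2.0, CLK_TO_Q, 0.2, SETUP, SKEW),
          hold_slack_ns(CLK_TO_Q, 0.2, HOLD, SKEW))

# Adding 0.4 ns of routing delay moves slack from setup to hold one-for-one.
after = (setup_slack_ns(2.0, CLK_TO_Q, 0.6, SETUP, SKEW),
         hold_slack_ns(CLK_TO_Q, 0.6, HOLD, SKEW))

# Doubling the clock period helps setup only; the hold slack is unchanged.
slower = (setup_slack_ns(4.0, CLK_TO_Q, 0.2, SETUP, SKEW),
          hold_slack_ns(CLK_TO_Q, 0.2, HOLD, SKEW))

print(before, after, slower)
```

In this model, slowing the clock never rescues a hold violation, while every picosecond of delay added to fix hold is taken directly from the setup slack of the same path.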
References
- Hold time, VLSI Universe
- Setup and hold violations, VLSI Universe
- How to fix hold violations, VLSI Universe
- How to fix hold violation, any general solution?, Xilinx Forum
- Setup and Hold Slack Explained, ICDesignTips
- Vivado Design Suite User Guide, Implementation, UG904 (v2020.2) February 26, 2021