Explaining Minimum Output Delays
I feel frustrated by my lack of understanding about external timing constraints, and clearly this is shared by several others. The questions I am grappling with are:
- How is the value of the minimum output delay derived? What's the drawing that makes min & max output delays feel tangible and real?
- Can these constraints on external pins that specify the delays outside the device affect place & route?
- Timing Diagrams
- Basic Clocking
- PLL Clocking
- Question 1 - How is the value of the minimum output delay derived?
- Literature Search
- Summing up
- Question 2 - Can external constraints affect the internal design?
- Conclusions
Timing Diagrams
For the specific purpose of studying output delays, I'll use this diagram below to illustrate arrival and required times and the slack between them. The figures in the diagram are taken from the basic clocking scheme without a PLL provided in the next section. The main point of this diagram is to show how the arrival and required times fall for hold and then setup on the external flip-flop. As usual, the minimum delay times are used for hold time calculations relative to the same clock, and the maximum delay times are used for the setup time calculation relative to the following clock.

Eli Billauer provides a description of how Vivado uses the values in his blog post, Vivado's timing analysis on set_input_delay and set_output_delay constraints. He explains that, as with all hold checks, the fast process minimum delays are used for the timing analysis, relative to the same active clock edge. I note his example does not seem to include much of a destination clock path. In my second example below, because I use a PLL in the trivial design, I have a destination clock path through the same primitives as for the source clock path, but with different delays. That seemed odd initially until I realised the tool was using the fast process maximum delay values for that path. So that's the timing analysis explained for each of the input and output delays with -min and -max.
Basic Clocking
This design is used as it removes a complication introduced when using PLLs. PLLs introduce a negative delay which makes drawing the diagrams a little more complicated.

set slack 0.2
create_clock -name clk_no_pll -period 10.0 [get_ports {clk_no_pll}]
set_input_delay -clock {clk_no_pll} -max [expr 11.311 - $slack] -rise [get_ports {i_no_pll}]
set_input_delay -clock {clk_no_pll} -min [expr -1 * (-4.375 - $slack)] -rise [get_ports {i_no_pll}]
# Max at Slow Process
set_output_delay -clock {clk_no_pll} -max [expr 2.219 - $slack] -rise [get_ports {o_no_pll}]
# Min at Fast process
set_output_delay -clock {clk_no_pll} -min [expr -1 * (3.079 - $slack)] -rise [get_ports {o_no_pll}]
set_property IOB true [get_cells {o_no_pll_reg d_no_pll_reg}]
Slack (MET) :             0.200ns  (required time - arrival time)
  Source:                 o_no_pll_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_no_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            o_no_pll
                            (output port clocked by clk_no_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             clk_no_pll
  Path Type:              Max at Slow Process Corner
  Requirement:            10.000ns  (clk_no_pll rise@10.000ns - clk_no_pll rise@0.000ns)
  Data Path Delay:        2.861ns  (logic 2.861ns (100.000%)  route 0.000ns (0.000%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           2.019ns
  Clock Path Skew:        -4.885ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.000ns = ( 10.000 - 10.000 )
    Source Clock Delay      (SCD):    4.885ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

  Location       Delay type                    Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                 (clock clk_no_pll rise edge)     0.000     0.000 r
  R22                                             0.000     0.000 r  clk_no_pll (IN)
                 net (fo=0)                       0.000     0.000    clk_no_pll
  R22                                                             r  clk_no_pll_IBUF_inst/I
  R22            IBUF (Prop_ibuf_I_O)             0.839     0.839 r  clk_no_pll_IBUF_inst/O
                 net (fo=1, routed)               2.140     2.979    clk_out_no_pll_OBUF
  BUFGCTRL_X0Y0                                                   r  clk_out_no_pll_OBUF_BUFG_inst/I
  BUFGCTRL_X0Y0  BUFG (Prop_bufg_I_O)             0.120     3.099 r  clk_out_no_pll_OBUF_BUFG_inst/O
                 net (fo=3, routed)               1.786     4.885    clk_out_no_pll_OBUF_BUFG
  OLOGIC_X0Y2    FDRE                                             r  o_no_pll_reg/C
  -------------------------------------------------------------------    -------------------
  OLOGIC_X0Y2    FDRE (Prop_fdre_C_Q)             0.415     5.300 r  o_no_pll_reg/Q
                 net (fo=1, routed)               0.000     5.300    o_no_pll_OBUF
  R18                                                             r  o_no_pll_OBUF_inst/I
  R18            OBUF (Prop_obuf_I_O)             2.446     7.746 r  o_no_pll_OBUF_inst/O
                 net (fo=0)                       0.000     7.746    o_no_pll
  R18                                                             r  o_no_pll (OUT)
  -------------------------------------------------------------------    -------------------
                 (clock clk_no_pll rise edge)    10.000    10.000 r
                 clock pessimism                  0.000    10.000
                 clock uncertainty               -0.035     9.965
                 output delay                    -2.019     7.946
  -------------------------------------------------------------------
                 required time                              7.946
                 arrival time                              -7.746
  -------------------------------------------------------------------
                 slack                                      0.200
The timing report above can be illustrated with the following blocks of delays. Due to the desire to fit text in boxes, the diagrams are not easily drawn to scale! It shows the output delay being part of the "required time".

Slack (MET) :             0.200ns  (arrival time - required time)
  Source:                 o_no_pll_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_no_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            o_no_pll
                            (output port clocked by clk_no_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             clk_no_pll
  Path Type:              Min at Fast Process Corner
  Requirement:            0.000ns  (clk_no_pll rise@0.000ns - clk_no_pll rise@0.000ns)
  Data Path Delay:        1.467ns  (logic 1.467ns (100.000%)  route 0.000ns (0.000%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           -2.879ns
  Clock Path Skew:        -1.647ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    0.000ns
    Source Clock Delay      (SCD):    1.647ns
    Clock Pessimism Removal (CPR):    -0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

  Location       Delay type                    Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                 (clock clk_no_pll rise edge)     0.000     0.000 r
  R22                                             0.000     0.000 r  clk_no_pll (IN)
                 net (fo=0)                       0.000     0.000    clk_no_pll
  R22                                                             r  clk_no_pll_IBUF_inst/I
  R22            IBUF (Prop_ibuf_I_O)             0.103     0.103 r  clk_no_pll_IBUF_inst/O
                 net (fo=1, routed)               0.863     0.966    clk_out_no_pll_OBUF
  BUFGCTRL_X0Y0                                                   r  clk_out_no_pll_OBUF_BUFG_inst/I
  BUFGCTRL_X0Y0  BUFG (Prop_bufg_I_O)             0.026     0.992 r  clk_out_no_pll_OBUF_BUFG_inst/O
                 net (fo=3, routed)               0.655     1.647    clk_out_no_pll_OBUF_BUFG
  OLOGIC_X0Y2    FDRE                                             r  o_no_pll_reg/C
  -------------------------------------------------------------------    -------------------
  OLOGIC_X0Y2    FDRE (Prop_fdre_C_Q)             0.192     1.839 r  o_no_pll_reg/Q
                 net (fo=1, routed)               0.000     1.839    o_no_pll_OBUF
  R18                                                             r  o_no_pll_OBUF_inst/I
  R18            OBUF (Prop_obuf_I_O)             1.275     3.114 r  o_no_pll_OBUF_inst/O
                 net (fo=0)                       0.000     3.114    o_no_pll
  R18                                                             r  o_no_pll (OUT)
  -------------------------------------------------------------------    -------------------
                 (clock clk_no_pll rise edge)     0.000     0.000 r
                 clock pessimism                  0.000     0.000
                 clock uncertainty                0.035     0.035
                 output delay                     2.879     2.914
  -------------------------------------------------------------------
                 required time                             -2.914
                 arrival time                               3.114
  -------------------------------------------------------------------
                 slack                                      0.200
Again, the timing report above can be illustrated with the following blocks of delays.
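The slack in both reports for the design without a PLL can be cross-checked by hand from their summary fields. What follows is a minimal Python sketch of that arithmetic, not anything Vivado emits: the function and argument names are illustrative, and the figures are copied from the two reports above.

```python
# Rebuild the slack of the two no-PLL reports from their summary fields.
# Function and argument names are illustrative; all values are in ns.

def setup_slack(period, dcd, scd, cpr, uncertainty, output_delay_max, data_path):
    """Setup check: the required time is measured from the *following* clock edge."""
    required = period + dcd + cpr - uncertainty - output_delay_max
    arrival = scd + data_path
    return round(required - arrival, 3)

def hold_slack(dcd, scd, cpr, uncertainty, output_delay_min, data_path):
    """Hold check: the required time is measured from the *same* clock edge."""
    # The hold report labels skew as DCD - SCD - CPR, so CPR is subtracted here.
    required = dcd - cpr + uncertainty - output_delay_min
    arrival = scd + data_path
    return round(arrival - required, 3)

# Max at Slow Process Corner
basic_setup = setup_slack(period=10.0, dcd=0.0, scd=4.885, cpr=0.0,
                          uncertainty=0.035, output_delay_max=2.019,
                          data_path=2.861)

# Min at Fast Process Corner (the -min output delay is negative)
basic_hold = hold_slack(dcd=0.0, scd=1.647, cpr=0.0,
                        uncertainty=0.035, output_delay_min=-2.879,
                        data_path=1.467)

print(basic_setup, basic_hold)
```

Both values come out at the 0.2 ns that the `slack` variable in the constraints was meant to leave.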

PLL Clocking

The external constraints below have been amended from the previous ones in order to provide the same 0.2 ns slack.
set slack 0.2
set clk_for_pll [get_clocks -of_objects [get_ports {clk_for_pll}]]
set_input_delay -clock $clk_for_pll -max [expr 7.071 - $slack] -rise [get_ports {i_pll}]
set_input_delay -clock $clk_for_pll -min [expr -1 * (0.349 - $slack)] -rise [get_ports {i_pll}]
# Max at Slow Process
set_output_delay -max [expr 3.240 - $slack] -rise [get_ports {o_pll}]
# Min at Fast process
# Making this more negative increases the arrival time. Doing so will also cause a hold time violation on the launch register.
# Assume that's because the Q output from the launch flop will need to change sooner.
set_output_delay -min [expr -1 * (2.782 - $slack)] -rise [get_ports {o_pll}]
set_property IOB true [get_cells {o_pll_reg d_pll_reg}]
Slack (MET) :             0.200ns  (required time - arrival time)
  Source:                 o_pll_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_out_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            o_pll
                            (output port clocked by clk_out_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             clk_out_pll
  Path Type:              Max at Slow Process Corner
  Requirement:            10.000ns  (clk_out_pll rise@10.000ns - clk_out_pll rise@0.000ns)
  Data Path Delay:        2.834ns  (logic 2.834ns (100.000%)  route 0.000ns (0.000%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           3.040ns
  Clock Path Skew:        -3.848ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    -5.468ns = ( 4.532 - 10.000 )
    Source Clock Delay      (SCD):    -2.216ns
    Clock Pessimism Removal (CPR):    -0.596ns
  Clock Uncertainty:      0.077ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.138ns
    Phase Error              (PE):    0.000ns

  Location        Delay type                                  Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                  (clock clk_out_pll rise edge)                  0.000     0.000 r
  P23                                                            0.000     0.000 r  clk_for_pll (IN)
                  net (fo=0)                                     0.000     0.000    pll_i/inst/clk_in
  P23                                                                            r  pll_i/inst/clkin1_ibufg/I
  P23             IBUF (Prop_ibuf_I_O)                           0.845     0.845 r  pll_i/inst/clkin1_ibufg/O
                  net (fo=1, routed)                             1.253     2.098    pll_i/inst/clk_in_pll
  PLLE2_ADV_X0Y0                                                                 r  pll_i/inst/plle2_adv_inst/CLKIN1
  PLLE2_ADV_X0Y0  PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT0)     -8.162    -6.064 r  pll_i/inst/plle2_adv_inst/CLKOUT0
                  net (fo=1, routed)                             1.943    -4.121    pll_i/inst/clk_out_pll
  BUFGCTRL_X0Y1                                                                  r  pll_i/inst/clkout1_buf/I
  BUFGCTRL_X0Y1   BUFG (Prop_bufg_I_O)                           0.120    -4.001 r  pll_i/inst/clkout1_buf/O
                  net (fo=3, routed)                             1.785    -2.216    clk_out_pll_OBUF
  OLOGIC_X0Y3     FDRE                                                           r  o_pll_reg/C
  -------------------------------------------------------------------    -------------------
  OLOGIC_X0Y3     FDRE (Prop_fdre_C_Q)                           0.415    -1.801 r  o_pll_reg/Q
                  net (fo=1, routed)                             0.000    -1.801    o_pll_OBUF
  T17                                                                            r  o_pll_OBUF_inst/I
  T17             OBUF (Prop_obuf_I_O)                           2.419     0.618 r  o_pll_OBUF_inst/O
                  net (fo=0)                                     0.000     0.618    o_pll
  T17                                                                            r  o_pll (OUT)
  -------------------------------------------------------------------    -------------------
                  (clock clk_out_pll rise edge)                 10.000    10.000 r
                  clock pessimism                               -0.596     9.404
                  clock uncertainty                             -0.077     9.327
                  output delay                                  -3.040     6.287
  -------------------------------------------------------------------
                  required time                                            0.819
                  arrival time                                            -0.618
  -------------------------------------------------------------------
                  slack                                                    0.200
Slack (MET) :             0.200ns  (arrival time - required time)
  Source:                 o_pll_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_out_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            o_pll
                            (output port clocked by clk_out_pll {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             clk_out_pll
  Path Type:              Min at Fast Process Corner
  Requirement:            0.000ns  (clk_out_pll rise@0.000ns - clk_out_pll rise@0.000ns)
  Data Path Delay:        1.441ns  (logic 1.441ns (100.000%)  route 0.000ns (0.000%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           -2.582ns
  Clock Path Skew:        -1.419ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    -2.242ns
    Source Clock Delay      (SCD):    -0.537ns
    Clock Pessimism Removal (CPR):    -0.286ns
  Clock Uncertainty:      0.077ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.138ns
    Phase Error              (PE):    0.000ns

  Location        Delay type                                  Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                  (clock clk_out_pll rise edge)                  0.000     0.000 r
  P23                                                            0.000     0.000 r  clk_for_pll (IN)
                  net (fo=0)                                     0.000     0.000    pll_i/inst/clk_in
  P23                                                                            r  pll_i/inst/clkin1_ibufg/I
  P23             IBUF (Prop_ibuf_I_O)                           0.109     0.109 r  pll_i/inst/clkin1_ibufg/O
                  net (fo=1, routed)                             0.503     0.612    pll_i/inst/clk_in_pll
  PLLE2_ADV_X0Y0                                                                 r  pll_i/inst/plle2_adv_inst/CLKIN1
  PLLE2_ADV_X0Y0  PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT0)     -2.568    -1.956 r  pll_i/inst/plle2_adv_inst/CLKOUT0
                  net (fo=1, routed)                             0.738    -1.218    pll_i/inst/clk_out_pll
  BUFGCTRL_X0Y1                                                                  r  pll_i/inst/clkout1_buf/I
  BUFGCTRL_X0Y1   BUFG (Prop_bufg_I_O)                           0.026    -1.192 r  pll_i/inst/clkout1_buf/O
                  net (fo=3, routed)                             0.655    -0.537    clk_out_pll_OBUF
  OLOGIC_X0Y3     FDRE                                                           r  o_pll_reg/C
  -------------------------------------------------------------------    -------------------
  OLOGIC_X0Y3     FDRE (Prop_fdre_C_Q)                           0.192    -0.345 r  o_pll_reg/Q
                  net (fo=1, routed)                             0.000    -0.345    o_pll_OBUF
  T17                                                                            r  o_pll_OBUF_inst/I
  T17             OBUF (Prop_obuf_I_O)                           1.249     0.904 r  o_pll_OBUF_inst/O
                  net (fo=0)                                     0.000     0.904    o_pll
  T17                                                                            r  o_pll (OUT)
  -------------------------------------------------------------------    -------------------
                  (clock clk_out_pll rise edge)                  0.000     0.000 r
                  clock pessimism                                0.286     0.286
                  clock uncertainty                              0.077     0.363
                  output delay                                   2.582     2.945
  -------------------------------------------------------------------
                  required time                                           -0.703
                  arrival time                                             0.904
  -------------------------------------------------------------------
                  slack                                                    0.200
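The required and arrival times in the two PLL reports can be rebuilt from their summary fields, provided the Destination Clock Delay (DCD) is included. The running totals listed after the second clock edge (6.287 ns for setup, 2.945 ns for hold) differ from the stated required times (0.819 ns and 0.703 ns) by exactly the DCD values of -5.468 ns and -2.242 ns. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and the figures are copied from the reports above.

```python
# Rebuild required and arrival times for the two PLL reports.
# Function and argument names are illustrative; all values are in ns.

def setup_times(period, dcd, scd, cpr, uncertainty, output_delay_max, data_path):
    # Setup skew is reported as DCD - SCD + CPR, so CPR is added.
    required = period + dcd + cpr - uncertainty - output_delay_max
    arrival = scd + data_path
    return round(required, 3), round(arrival, 3)

def hold_times(dcd, scd, cpr, uncertainty, output_delay_min, data_path):
    # Hold skew is reported as DCD - SCD - CPR, so CPR is subtracted.
    required = dcd - cpr + uncertainty - output_delay_min
    arrival = scd + data_path
    return round(required, 3), round(arrival, 3)

# Max at Slow Process Corner
pll_setup = setup_times(period=10.0, dcd=-5.468, scd=-2.216, cpr=-0.596,
                        uncertainty=0.077, output_delay_max=3.040,
                        data_path=2.834)

# Min at Fast Process Corner
pll_hold = hold_times(dcd=-2.242, scd=-0.537, cpr=-0.286,
                      uncertainty=0.077, output_delay_min=-2.582,
                      data_path=1.441)

print("setup (required, arrival):", pll_setup)
print("hold  (required, arrival):", pll_hold)
```

Both pairs match the required and arrival times printed in the reports. Subtracting them gives 0.201 ns rather than the reported 0.200 ns, which is presumably down to rounding in the printed fields.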
To try and simplify the analysis when the PLL is added, the following table extracts the various delays from the PLL in the timing report. As you can see, all the values are negative, so that both setup and hold times are shifted to the left in time. It is mainly the setup time that is most affected by the left shift in time.
Timing | Hold (-min) (ns) | Setup (-max) (ns) |
---|---|---|
Arrival | DELAY_FAST_MIN_RISE = -2.568 | DELAY_SLOW_MAX_RISE = -8.162 |
Required | DELAY_FAST_MAX_RISE = -3.153 | DELAY_SLOW_MIN_RISE = -7.407 |
Pulling out only the arrival timings for illustration gives the following diagram. You can see the period between hold and setup has diminished. It becomes confusing to include the required times as they overlap, and clearly the required times cannot be met when the required setup time is before the hold arrival time, or the hold required time is after the setup arrival time. This makes a mockery of the intended slack times, but you can still specify the constraints and the tool will work with them even if they are nonsensical. The change time between the arrival times is 0.116 ns.

Question 1 - How is the value of the minimum output delay derived?
Literature Search
Xilinx's Vivado tool offers the following two templates for deriving the output delay values.
# A System Synchronous design interface is a clocking technique in which the same
# active-edge of a system clock is used for both the source and destination device.
#
# dest        __________            __________
# clk    ____|          |__________|
#                                  |
#     (trce_dly_max+tsu) <---------|
#             (trce_dly_min-thd) <-|
#                        __    __
# data   XXXXXXXXXXXXXXXX__DATA__XXXXXXXXXXXXX
#
set destination_clock <clock_name>; # Name of destination clock
set tsu 0.000; # Destination device setup time requirement
set thd 0.000; # Destination device hold time requirement
set trce_dly_max 0.000; # Maximum board trace delay
set trce_dly_min 0.000; # Minimum board trace delay
set output_ports <output_ports>; # List of output ports
# Output Delay Constraint
set_output_delay -clock $destination_clock -max [expr $trce_dly_max + $tsu] [get_ports $output_ports];
set_output_delay -clock $destination_clock -min [expr $trce_dly_min - $thd] [get_ports $output_ports];
# Source synchronous output interfaces can be constrained either by the max data skew
# relative to the generated clock or by the destination device setup/hold requirements.
#
# Setup/Hold Case:
# Setup and hold requirements for the destination device and board trace delays are known.
#
# forwarded         ____                      ___________________
# clock                 |____________________|                   |____________
#                                            |
#                                     tsu    |    thd
#                                <---------->|<--------->
#                                ____________|___________
# data @ destination    XXXXXXXXX________________________XXXXX
#
# Example of creating generated clock at clock output port
# create_generated_clock -name <gen_clock_name> -multiply_by 1 -source [get_pins <source_pin>] [get_ports <output_clock_port>]
# gen_clock_name is the name of forwarded clock here. It should be used below for defining "fwclk".
set fwclk <clock-name>; # forwarded clock name (generated using create_generated_clock at output clock port)
set tsu 0.000; # destination device setup time requirement
set thd 0.000; # destination device hold time requirement
set trce_dly_max 0.000; # maximum board trace delay
set trce_dly_min 0.000; # minimum board trace delay
set output_ports <output_ports>; # list of output ports
# Output Delay Constraints
set_output_delay -clock $fwclk -max [expr $trce_dly_max + $tsu] [get_ports $output_ports];
set_output_delay -clock $fwclk -min [expr $trce_dly_min - $thd] [get_ports $output_ports];
I think it is easily understood that the maximum output delay is the sum of the trace delay and the external register's setup time. These two templates tell us that the minimum output delay to be specified is the minimum trace delay minus the hold time of the destination register. But why? Where is this derived from?
Output_delay with –min delay is a bit more tricky, and explaining it was the main motivation for writing this blog article.
In general, there will be some time of flight. Especially as designs become larger, the signals will have to travel long distances from one block to another.
Because of this delay on the data path, the HOLD requirement will mostly be met on the destination flop. Thus, the min specification is not given that much importance.
Output Delay on Xilinx's support forum.
Not sure the following is actually much help!
The -min value works similarly stating that the external delay could be as short as -1ns. Since our hold relationship between the clocks is 0ns, the data must get across the interface in 0ns. So if the external delay is -1ns, then the FPGA must be at least +1ns to meet timing. In this case, the min value usually matches up to the negative of the external devices hold relationship. So, if your external device had a hold relationship of 1.1ns, and the board trace delay could be as fast as 0.1ns, then the external delay is (-1.1 + 0.1) = -1ns.
set_output_delay explained for dummies, Intel Community
The set_output_delay constraint is not used to adjust a delay in order to make a path pass timing analysis. Delays outside the FPGA that need to be described by set_output_delay are:
- t_pxd = circuit board trace delay for the data
- t_pxc = circuit board trace delay for the clock
- t_sux = setup time for the external register that receives the clock and data
- t_hdx = hold time for the external register that receives the clock and data
These quantities are used in a pair of set_output_delay constraints as follows:
set_output_delay -clock FCLK -max t_max [get_ports data_out]
set_output_delay -clock FCLK -min t_min [get_ports data_out]
where:
t_max = max(t_pxd - t_pxc) + t_sux
t_min = min(t_pxd - t_pxc) - t_hdx
Setting output delay on Xilinx's support forum.
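The quoted t_max and t_min formulas can be tried out with a few numbers. The Python sketch below is only an illustration: the board figures are made-up placeholders, the helper name `external_output_delays` is hypothetical, and it assumes the data and clock trace delays vary independently, so that max(t_pxd - t_pxc) pairs the slowest data trace with the fastest clock trace and min(t_pxd - t_pxc) the reverse.

```python
# Sketch of the quoted t_max / t_min formulas. All board numbers are made-up
# placeholders, and data and clock trace delays are assumed to vary
# independently of each other. Values are in ns.

def external_output_delays(t_pxd, t_pxc, t_sux, t_hdx):
    """t_pxd and t_pxc are (min, max) trace delays for data and clock."""
    pxd_min, pxd_max = t_pxd
    pxc_min, pxc_max = t_pxc
    t_max = (pxd_max - pxc_min) + t_sux   # max(t_pxd - t_pxc) + t_sux
    t_min = (pxd_min - pxc_max) - t_hdx   # min(t_pxd - t_pxc) - t_hdx
    return round(t_max, 3), round(t_min, 3)

t_max, t_min = external_output_delays(t_pxd=(0.4, 0.6), t_pxc=(0.3, 0.5),
                                      t_sux=1.0, t_hdx=0.5)

print(f"set_output_delay -clock FCLK -max {t_max} [get_ports data_out]")
print(f"set_output_delay -clock FCLK -min {t_min} [get_ports data_out]")
```

With a clock trace delay of zero, the two expressions reduce to the trce_dly_max + tsu and trce_dly_min - thd forms used in the Vivado templates.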
The reason that the set_output_delay -min requires the negative value is the direction of time when describing these values. The set_input_delay values are forward propagation delays - moving forward in time; a +2 means 2ns later. For the min and max of the set_input_delay, this is the correct direction.
For the set_output_delays, they are "backward" propagations; they are subtracted off the arrival time of the clock. For them, a positive value is a negative time propagation. Thus if the set_output_delay -max is 2ns, then the data must be ready 2ns earlier than the clock. This direction is consistent with the setup measure, since it, too, is measured moving backward in time (from the clock edge toward earlier times). For the hold time, though, it is moving the other way - a positive hold time is moving toward later times - but since the set_output_delay is moving toward earlier times, the hold time needs to be negated.
Significance of set_output_delay -max/-min negative and positive values?, AMD Support Forum
Defining Output Delays in UltraFast Design Methodology Guide for Xilinx FPGAs and SoCs, UG949 has a promising diagram that explains the paths with clock skew for source and destination registers. It has a timing diagram to show the minimum and maximum output delays, but not quite enough to illustrate all the ingredients required to fully understand the Vivado templates.
Eli Billauer's description of Vivado timing analysis does not explain the often-quoted formula for the value to be used for minimum output delay. Another of Eli's blog posts, I/O timing constraints in SDC syntax, explains that the minimum output delay specifies the negative hold time of the external register. This would be true if the trace delay was 0 ns, and I'm still missing the derivation or picture that explains this result.
An indirectly useful lead came from set_output_delay explained for dummies on the Intel support forum for Altera device users. It referenced a now missing document by Ryan Scoville, which thankfully has been copied elsewhere on the Internet. "TimeQuest User Guide" from December 2010, pages 19-20, is perhaps the best explanation so far. I did not fully appreciate the explanation without dissecting a timing report from Vivado first. It too states the formula for minimum output delay without explanation. An interesting sound bite by way of conclusion is:
One thing to note is that, as the difference between -max and -min values grows, the more difficult it is for the FPGA to meet timing.
TimeQuest User Guide, Ryan Scoville
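That remark can be made concrete by working out the window in which the FPGA output is allowed to change. The Python sketch below is a simplification that ignores clock uncertainty and clock path skew; the function name `output_window` is illustrative, and the first set of figures is taken from the constraints of the design without a PLL.

```python
# Window (in ns) in which the FPGA output may change, measured from the
# launch clock edge. Clock uncertainty and clock path skew are ignored.
# The function name is illustrative.

def output_window(period, output_delay_max, output_delay_min):
    earliest = -output_delay_min          # hold: no change before this time
    latest = period - output_delay_max    # setup: no change after this time
    return round(earliest, 3), round(latest, 3), round(latest - earliest, 3)

# Basic clocking constraints: -max 2.019 ns, -min -2.879 ns, period 10 ns
print(output_window(10.0, 2.019, -2.879))

# Pushing -max and -min 1 ns further apart on each side
print(output_window(10.0, 3.019, -3.879))
```

The width of the window is the clock period minus the difference between the -max and -min values, so each nanosecond added to that difference is a nanosecond taken away from the FPGA.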
Summing up
The trace delays used depend on how the clocks are distributed to the source and destination flops in the timing analysis. As illustrated above, a different destination clock path affects the formulae quoted in the literature. Note that Vivado's XDC template for source synchronous clocks is the same as for system synchronous, so one of them must be wrong! Hence the deviation in the illustrations above.
In all of this, what seems to be missing is the simplest of relatable diagrams to show the relationships being compared and the overall timing. We need to account for the difference between the data path (plus source clock path) and the destination clock path. I conclude that, for minimum output delays, the difference is that the clock trace has extra time over the data trace because the data must be held for the hold time T_H longer. Is that right?
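One way to probe that conclusion is to compare a hold check made at the external register with the hold check the tool makes at the FPGA pin. The Python sketch below does this under two simplifying assumptions that are mine, not the templates': the clock edge reaches both devices at the same instant (no clock trace skew), and clock uncertainty is ignored. The function names are illustrative, and the 1.1 ns hold time and 0.1 ns trace delay are borrowed from the forum example quoted earlier.

```python
# Compare a hold check at the external register with the tool's hold check
# at the FPGA pin when output_delay_min = trce_dly_min - thd.
# Assumes no clock trace skew and ignores clock uncertainty. Values in ns.

def hold_slack_at_external_register(t_out_min, trce_dly_min, thd):
    # Data reaches the external register t_out_min + trce_dly_min after the
    # clock edge, and must not change until thd has elapsed.
    return round((t_out_min + trce_dly_min) - thd, 3)

def hold_slack_at_fpga_pin(t_out_min, output_delay_min):
    # Tool's view: required time = clock edge - output_delay_min.
    required = -output_delay_min
    return round(t_out_min - required, 3)

thd = 1.1            # external register hold time
trce_dly_min = 0.1   # fastest board trace delay
output_delay_min = trce_dly_min - thd

results = []
for t_out_min in (0.5, 1.0, 3.114):   # candidate FPGA clock-to-pin delays
    external = hold_slack_at_external_register(t_out_min, trce_dly_min, thd)
    at_pin = hold_slack_at_fpga_pin(t_out_min, output_delay_min)
    results.append((t_out_min, external, at_pin))
    print(t_out_min, external, at_pin)
```

Under those assumptions the two slack figures agree for every clock-to-pin delay tried, which suggests the -min formula is just the external hold check moved back to the FPGA pin. Any clock trace skew between the two devices would still need to be added on top.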

Question 2 - Can external constraints affect the internal design?
Timing data for the table below can be extracted using the following Tcl commands:
report_timing -through [get_nets {i_no_pll_reg}] -delay_type min_max -sort_by group -input_pins -routable_nets -name {No PLL Input Slack}
report_timing -through [get_nets {o_no_pll_reg}] -delay_type min_max -sort_by group -input_pins -routable_nets -name {No PLL Output Slack}
report_timing -through [get_nets {i_pll_reg}] -delay_type min_max -sort_by group -input_pins -routable_nets -name {PLL Input Slack}
report_timing -through [get_nets {o_pll_reg}] -delay_type min_max -sort_by group -input_pins -routable_nets -name {PLL Output Slack}
The results in this table are explained below.
Port | Delay | IOB packed (ns) | Unpacked (ns) | Unpacked with phys_opt_design (ns) |
---|---|---|---|---|
Input | min | 0.000 | 0.371 | 0.010 |
Input | max | 0.000 | -0.755 | 0.540 |
Output | min | 0.000 | 1.155 | 1.155 |
Output | max | 0.000 | -0.862 | -0.862 |
This simple experiment shows that the external constraints do not force IOB packing of registers. IOB packing is off by default and you must enable it explicitly through attributes or constraints (see below). When the edge registers are not packed into IOBs, there is room for external constraints to affect place and route of combinatorial logic or paths that do not end with an IOB-packable register. Taking the design without IOB packing and running phys_opt_design drastically changes the clock buffer structure, creating a new clock with a greater delay. This succeeds in removing the negative setup slack on the input, but does not eradicate the negative setup slack on the output. This shows that external constraints can affect the internal design and timing.
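The failing paths can be picked out of the table mechanically. The short Python sketch below simply transcribes the slack values from the table above and lists the negative ones; the dictionary layout and the run labels are illustrative.

```python
# Slack values (ns) transcribed from the table above, keyed by
# (port direction, delay type). Run labels are illustrative.
slack = {
    ("input", "min"):  {"packed": 0.000, "unpacked": 0.371,  "unpacked+phys_opt": 0.010},
    ("input", "max"):  {"packed": 0.000, "unpacked": -0.755, "unpacked+phys_opt": 0.540},
    ("output", "min"): {"packed": 0.000, "unpacked": 1.155,  "unpacked+phys_opt": 1.155},
    ("output", "max"): {"packed": 0.000, "unpacked": -0.862, "unpacked+phys_opt": -0.862},
}

# Any negative slack is a timing violation.
violations = sorted(
    (run, direction, delay)
    for (direction, delay), runs in slack.items()
    for run, value in runs.items()
    if value < 0
)

for run, direction, delay in violations:
    print(f"{run}: {direction} {delay} slack is negative")
```

Only the setup (max) paths fail, and only when the registers are left out of the IOBs.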

If the design is slackened off with the following replacement constraints:
set_input_delay -clock $clk_in -max 4.0 [get_ports {i}]
set_output_delay -clock $clk_out -max 2.0 [get_ports {o}]
Then the timing is met without IOB packing. phys_opt_design has no effect, leaving the simple clock structure unaffected.
Specifying IOB packing:
set_property IOB true [get_cells {o_pll_reg d_pll_reg}]
Values:
- TRUE: Place a connected register into the I/O Block.
- FALSE: Do not place the specified register into the I/O Block (default).
Searching online, there appears to be confusion in the discussions on whether external constraints affect internal placement. Some references are made to optimisation steps that only discuss logic on paths fully internal to the device, and hence are not explicitly applicable to paths going to I/O which need to include a delay specification for the external section.
The set_output_delay constraint is not used to adjust a delay in order to make a path pass timing analysis.
Setting output delay on Xilinx's support forum.
OK so to be precise is that if you will do the external interface correctly then it won't affect implementation. By correctly I mean you will use the input/output delays and use IOB resources (IOB flops, IDDR, ODDR, ISERDES, ISERDES) as they have fixed placement and clocking resources. If you won't use those resources then yes, it can affect implementation and Vivado may move things around but that's not recommended way of doing external interfaces.
input and output delay constraints on reddit
From this we can conclude that the tighter external timing constraints do affect the place and route results. Perhaps these paths ought to be examined more carefully and treated for IOB packing for best and consistent I/O timing?
Conclusions
The minimum output delay constraint seems to cause most brain pain, certainly it did for me, but I wasn't the only person looking for an explanation of how it worked. Analysis of Vivado timing reports yields a relatable picture, that does not seem to match other people's attempts to explain output delays.
External constraints can affect the internal placement and routing of a design. However, I suggest that if that's the case, more effort needs to be put into packing registers into IOBs, which is intended to yield the best I/O timing results. Once this is done, the external constraints no longer affect the internal timing, and implementation results should be stable and repeatable.