I see forum posts asking what the maximum clock speed of an FPGA is, or even of a particular device. The replies are suitably polite given the answer is "it depends", not least because the tools stop optimising once they meet the required timing goal. But what if the question is "if I can run on device X at xx MHz, how fast might I go on device Y?" Okay, the real answer is: target the new device with your design and perform timing analysis. But in reality that might not be convenient for a range of reasons, and one might only be looking for a ballpark estimate of which device family to upgrade to. What if there were a way to perform some form of comparison?
Method
Essentially we're going to compare the timing of a simple design across multiple devices. For my candidate design I have created a shift register chain with a single LUT between the registers. I expect this design to avoid the issue of the tools stopping once timing has been met, as there is no optimisation to perform. I want static timing analysis to tell me the path delays from a register's Q pin through a LUT to a destination register's D pin. A concern was that in order to implement the design I would need to anchor the logic at each end to pads, to prevent logic optimisation reducing the design to nothing. What I want to avoid is any pollution of that analysis by the timing of the input and output pads. The internal timing is intended to tell me how much slack I might have in a device, and hence whether I might expect a faster clock speed after implementation in a different device.
One such simple candidate design might be register → inverter → register. If you try this, you will discover that logic optimisation removes all, or all but one, of the inverters. Instead I propose a slightly different 'nonsense' design. I have not even bothered to simulate it as I do not care what it produces; I just want a simple, repeatable structure.

The following VHDL describes the design in a form that survives optimisation.
library ieee;
use ieee.std_logic_1164.all;

entity clock_speed is
  generic (
    clk_wiz_g : natural;
    length_g  : positive := 255
  );
  port (
    clk_ext : in  std_logic;
    input   : in  std_logic;
    output  : out std_logic := '0'
  );
end entity;

architecture rtl of clock_speed is
  signal clk     : std_logic := '0';
  signal input_r : std_logic := '0';
  signal vector  : std_logic_vector(length_g-1 downto 0) := (others => '0');
begin

  -- Artix-7 and Spartan-7 families
  cw : if clk_wiz_g = 1 generate
    -- MMCME2_ADV primitive
    mmcm_i : entity work.clk_wiz_1
      port map (
        clk_in1  => clk_ext,
        reset    => '0',
        clk_out1 => clk
      );
  else generate
    -- MMCME3_ADV primitive
    mmcm_i : entity work.clk_wiz_0
      port map (
        clk_in1  => clk_ext,
        reset    => '0',
        clk_out1 => clk
      );
  end generate;

  process(clk)
  begin
    if rising_edge(clk) then
      input_r          <= input;
      (output, vector) <= (vector & input_r) XOR ('1' & vector);
    end if;
  end process;

end architecture;
During attempts to silence setup and hold time violations on the clock, input and output pads, I decided to include a clock management component, the standard MMCM, in order to multiply up the clock. This threw up an interesting 'feature' of both the devices and the IP Core generator. The first is that the internal clock speed is limited by the clock buffers, e.g. the global clock buffer, BUFG. The limit of this primitive can be extracted from the device data sheets.

The appropriate value from the BUFG row is used by the IP Core generator: it is both displayed to the user as the maximum specifiable clock speed, and checked against the user's requested clock speed in order to show an error message. I have not been able to find a Vivado TCL command that will extract this data from the part library or timing results for automatic tabulation.
The second is that the "Safe Clock Startup" feature must be disabled, as I had specified a (rather racy) clock speed of 400 MHz. The default settings include this feature, and it creates a timing issue in the MMCM core.
"The Safe Clock Startup feature enables a stable and valid clock at the output. Enabling the Sequencing feature provides sequenced output clocks."
Clocking Wizard v6.0 LogiCORE IP Product Guide, Xilinx PG065
"For UltraScale™ or UltraScale+™ devices, when the Safe Clock Startup feature is enabled and the clocks are operating at more than 300 MHz frequency, the design might not meet timing."
Clocking Wizard v6.0 LogiCORE IP Product Guide, Xilinx PG065
Note that Artix-7 and Spartan-7 devices use a different version of the MMCM primitive, MMCME2_ADV instead of MMCME3_ADV. Once the IP is generated for one primitive, one cannot use the TCL command upgrade_ip to make the IP Core compatible with the other. Hence, the need for two MMCM IP Cores, conditionally instantiated in VHDL. The VHDL is passed a generic value so that the MMCM instantiation can be swapped by the TCL script.
I gave up specifying input and output delays on the pads. In order to silence the timing analysis on the pads I had initially chosen impossible values, where the minimum delay was greater than the maximum, but decided a simpler solution was to disable timing with false path constraints. Finally, I specified some pin constraints, but quickly decided that if I left the pins unallocated the tool would do a decent enough job of the allocation for me, and the solution would remain device independent. In particular I was unable to find a suitable non-differential-pair pad for the clock and did not have the patience to locate one.
# Hold on input
set_input_delay -clock [get_clocks {clk_ext}] -min 0.0 [get_ports {input}]
# Setup on input
set_input_delay -clock [get_clocks {clk_ext}] -max 0.0 [get_ports {input}]
# Hold on output
set_output_delay -clock [get_clocks {clk_ext}] -min 0.0 [get_ports {output}]
# Setup on output
set_output_delay -clock [get_clocks {clk_ext}] -max 0.0 [get_ports {output}]
set_false_path -to [get_cells {input_r_reg}]
set_false_path -from [get_cells {output_reg}]
Before using the false path constraints, I had packed the end registers into the I/O buffers (IBUF & OBUF). It turns out that at high clock frequencies you can get a timing violation on pulse width (as opposed to setup or hold).
However, a pulse width problem usually means that the frequency of a clock has exceeded the FMAX/FMIN specifications for a clocking component in your design. Refer to the datasheet for your device and check FMAX/FMIN for MMCMs, PLLs, BUFGs, BUFRs, etc.
What is Pulse Width Slack? How to calculate? How to rectify if negative slack occurs?
This is the first example I have seen of a negative consequence of IOB packing, so the constraint (set_property IOB TRUE <register>) was removed. The static timing analysis results now provide the amount of slack for a 400 MHz clock. To calculate an estimate of fmax, I use the method proposed by Eli Billauer in Vivado: Finding the "maximal frequency" after synthesis. The clock period less the slack gives the minimum clock period, and hence the maximum frequency. Having got a design to provide results for one device, the next step is to automate working through all the available parts in order to build up the bigger picture of relative timing, or highest attainable clock speed.
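Before looking at the TCL, the arithmetic of the slack method can be sanity-checked with a couple of lines of Python. This is only a sketch of the calculation; the slack figure used in the example is illustrative, not a measured result.

```python
def fmax_mhz(requirement_ns: float, worst_setup_slack_ns: float) -> float:
    """Estimate fmax from the worst setup slack reported by STA.

    The achievable minimum period is the constrained period (the timing
    'requirement') minus the worst-case setup slack. With the period in
    nanoseconds, 1000 / period gives the frequency in MHz.
    """
    min_period_ns = requirement_ns - worst_setup_slack_ns
    return 1e3 / min_period_ns

# A 400 MHz constraint is a 2.5 ns requirement. With zero slack, fmax is
# exactly the constrained 400 MHz; with (an illustrative) 1.2 ns of positive
# slack the minimum period is 1.3 ns, i.e. roughly 769 MHz.
print(fmax_mhz(2.5, 0.0))  # 400.0
print(round(fmax_mhz(2.5, 1.2), 1))
```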
set srcdir {path1/speed_test}
# Path and prefix to Vivado project
set destdir {path2/speed_test/speed_test}
# source -notrace ${srcdir}/parts.tcl
set resfilename "${srcdir}/fmax.txt"
set projdir [get_property DIRECTORY [current_project]]
proc fmax {} {
  set tp [get_timing_paths -max_paths 1 -nworst 1 -setup]
  set maxSetup [get_property SLACK $tp]
  set maxClkPeriod [expr {[get_property REQUIREMENT $tp] - $maxSetup}]
  # MHz: divide by ns * MHz => (1e-9 * 1e6) = 1e-3
  return [expr {1e3 / $maxClkPeriod}]
}
# ERROR: [Common 17-577] Internal error: No controller created for part xc7z007sclg400-2. Maximum number of controllers reached!
# WARNING: [Device 21-150] Attempt to create more than 16 speed controllers
#
# Need to have a restartable list as only about 11 devices are tested before Vivado crashes with a part cache error. The remedy
# is to restart Vivado and pick up.
set parts [get_parts -filter {(SPEED == -2) && (TEMPERATURE_GRADE_LETTER == I || TEMPERATURE_GRADE_LETTER == "")}]
set devices {}
set cnt 0
set total [llength $parts]
# Initialise this from the existing results file
set devices_done {}
if {[file exists $resfilename]} {
  set resfile [open $resfilename r]
  set linenum 0
  while {[gets $resfile line] >= 0} {
    if {$linenum > 0} {
      if {[string length $line] > 0} {
        set p [get_parts -quiet [lindex [split $line ","] 0]]
        if {[string length $p] > 0} {
          set key "[get_property DEVICE $p][get_property SPEED $p]"
          lappend devices_done $key
        }
      }
    }
    incr linenum
  }
  close $resfile
  puts "NOTE: devices_done = $devices_done"
  set resfile [open $resfilename a+]
} else {
  puts "NOTE: Checking all devices"
  set resfile [open $resfilename a+]
  puts $resfile "Part,Architecture,Device,Speed,Temperature,Flip-flops,LUTs,DSP,BlockRAMs,Slices,Fmax (MHz)"
  flush $resfile
}
foreach i $parts {
  incr cnt
  set key "[get_property DEVICE $i][get_property SPEED $i]"
  # Only try each sort of device once, the results are similar
  if {[lsearch -exact $devices_done $key] == -1} {
    foreach ip [get_ips] {
      set_property is_enabled true [get_files [get_property IP_FILE $ip]]
    }
    # clk_wiz_1 for Artix-7, Spartan-7 or Kintex-7, clk_wiz_0 otherwise
    # NB. The VHDL file does not automatically switch the instance used.
    if {[get_property ARCHITECTURE_FULL_NAME $i] == "Artix-7" ||
        [get_property ARCHITECTURE_FULL_NAME $i] == "Spartan-7" ||
        [get_property ARCHITECTURE_FULL_NAME $i] == "Kintex-7"} {
      set clkwiz [get_ips clk_wiz_1]
      set_property generic clk_wiz_g=1 [current_fileset]
      set_property is_enabled false [get_files [get_property IP_FILE [get_ips clk_wiz_0]]]
    } else {
      set clkwiz [get_ips clk_wiz_0]
      set_property generic clk_wiz_g=0 [current_fileset]
      set_property is_enabled false [get_files [get_property IP_FILE [get_ips clk_wiz_1]]]
    }
    set_property CUSTOMIZED_DEFAULT_IP_LOCATION "${projdir}/[file tail $projdir].gen/sources_1/ip/${clkwiz}" [current_project]
    set_property -quiet part $i [current_project]
    upgrade_ip \
      -vlnv xilinx.com:ip:clk_wiz:6.0 \
      -log ip_upgrade.log \
      -quiet \
      $clkwiz
    reset_target -quiet all $clkwiz
    # Can't extract the maximum BUFG frequency for CLKOUT1_REQUESTED_OUT_FREQ
    set_property -dict [list \
      CONFIG.PRIM_IN_FREQ {100.000} \
      CONFIG.CLKOUT1_REQUESTED_OUT_FREQ {400.000} \
      CONFIG.USE_LOCKED {false} \
    ] $clkwiz
    validate_ip -save_ip $clkwiz
    generate_target -quiet -force all $clkwiz
    export_ip_user_files \
      -of_objects $clkwiz \
      -no_script \
      -sync \
      -force \
      -quiet
    create_ip_run -force $clkwiz
    reset_run -quiet ${clkwiz}_synth_1
    launch_runs ${clkwiz}_synth_1 -jobs 1
    # NB. Double quotes (not braces) so that ${destdir} is substituted
    export_simulation \
      -of_objects $clkwiz \
      -directory ${destdir}.ip_user_files/sim_scripts \
      -ip_user_files_dir ${destdir}.ip_user_files \
      -ipstatic_source_dir ${destdir}.ip_user_files/ipstatic \
      -lib_map_path [list \
        "modelsim=${destdir}.cache/compile_simlib/modelsim" \
        "questa=${destdir}.cache/compile_simlib/modelsim" \
        "riviera=${destdir}.cache/compile_simlib/riviera" \
        "activehdl=${destdir}.cache/compile_simlib/activehdl" \
      ] \
      -use_ip_compiled_libs \
      -force \
      -quiet
    reset_run -quiet synth_1
    launch_runs impl_1
    # 'wait_on_runs' replaces 'wait_on_run' from Vivado version 2023.2
    wait_on_runs impl_1
    open_run impl_1
    lappend devices_done $key
    puts $resfile "$i,[get_property ARCHITECTURE_FULL_NAME $i],[get_property DEVICE $i],[get_property SPEED $i],[get_property TEMPERATURE_GRADE_LETTER $i],[get_property FLIPFLOPS $i],[get_property LUT_ELEMENTS $i],[get_property DSP $i],[get_property BLOCK_RAMS $i],[get_property SLICES $i],[fmax]"
    flush $resfile
    close_design
    puts "NOTE: $i tested, $cnt of $total completed."
  } else {
    puts "NOTE: $i skipped, $cnt of $total completed."
  }
}
close $resfile
At the start of the script there's a section of code that reads the current results file and initialises a cache of previous results. This is because after about 11 device part changes Vivado will crash. The only fix is to close Vivado completely and restart it. Closing the project and re-opening is insufficient, but would have been easier to script 😒. Clearly Vivado was never intended to be used this way. So after each crash the TCL script is able to pick up where it left off.
There are 1161 available parts in a licensed version of Vivado 2020.1: all speed grades, all packages, all families. I decided to down-select to one speed grade, used in a project I cared about, and only test one package of that speed grade in each of the other families. This reduced the workload considerably, given each test requires both synthesis and implementation. I believe that the libraries containing each primitive's timing for each device are updated over time as confidence increases, therefore I expect later versions of Vivado to provide slightly higher values of fmax. My initial run of the script was performed on a free version of Vivado, version 2023.2, but being free it is limited in the devices available. So to augment the results I filled in some blanks with a licensed version at work, version 2020.1. Yes, that's a little behind, but we're mid project… The results are therefore from a mix of two versions of Vivado, but as we're only talking about estimates at present to test the water, I'll overlook that. Ideally I would install the latest version of Vivado in a licensed environment and pick up all the latest devices like Versal ACAP. Maybe another time…
Results
The full results are presented in a spreadsheet to make it easier to select rows based on your criteria. As a summary here, the minimum and maximum fmax per family are tabulated.
Device Family | Date | Minimum fmax (MHz) | Maximum fmax (MHz) |
---|---|---|---|
Artix-7 | Jun-10 | 643 | 752 |
Kintex-7 | Jun-10 | 898 | 1153 |
Virtex-7 | Jun-10 | 837 | 1096 |
Zynq-7000 | Mar-11 | 719 | 1140 |
Kintex UltraScale | Dec-13 | 891 | 1241 |
Virtex UltraScale | Dec-13 | 693 | 1325 |
Zynq UltraScale+ | Sep-15 | 1179 | 1623 |
Kintex UltraScale+ | Jan-16 | 1178 | 1590 |
Virtex UltraScale+ | Jan-16 | 1395 | 1616 |
Spartan-7 | May-17 | 733 | 752 |
Zynq UltraScale+ RFSOC | Feb-19 | 1209 | 1504 |

From the spreadsheet, I note that the ratio of registers to LUTs is a constant 2 for all devices. However, the ratios of registers per slice and LUTs per slice are often unexpected. Those values have been extracted from the part properties with TCL code. Before UltraScale we expect the {register, LUT} to slice ratios to be {8, 4}, and from UltraScale onwards {16, 8}. These ratios hold for some parts, but just as often they do not. The slice counts must be wrong, and I assume they have never been corrected in the parts library because they are not used in a way that makes any difference to synthesis by Vivado.
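The ratio check can be reproduced from the fmax.txt file the TCL script writes, using the column names from its CSV header. A minimal sketch in Python; the sample row uses invented numbers purely to demonstrate the parsing, not real device data:

```python
import csv
import io

# Header exactly as written by the TCL script
RESULTS_HEADER = ("Part,Architecture,Device,Speed,Temperature,"
                  "Flip-flops,LUTs,DSP,BlockRAMs,Slices,Fmax (MHz)")

def ratios(csv_text: str) -> dict:
    """Return {part: (FF/LUT, FF/slice, LUT/slice)} from the results CSV."""
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ffs, luts, slices = (int(row[k]) for k in ("Flip-flops", "LUTs", "Slices"))
        out[row["Part"]] = (ffs / luts, ffs / slices, luts / slices)
    return out

# Illustrative row (invented figures): 16000 FFs, 8000 LUTs, 2000 slices
# gives FF/LUT = 2.0, FF/slice = 8.0, LUT/slice = 4.0.
sample = RESULTS_HEADER + "\nxc7a12tcpg238-2,Artix-7,xc7a12t,-2,I,16000,8000,40,20,2000,700.0"
print(ratios(sample))
```

In practice the function would be fed the whole of fmax.txt, and parts whose slice ratios deviate from {8, 4} or {16, 8} flagged.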
Conclusions
The chart demonstrates the relationship between Kintex and Virtex devices: their fmax values are similar, reinforcing that they are produced on the same fabrication process. This reminds us that Virtex devices differentiate themselves from Kintex by premium features (and price £$) rather than speed. Clock speed has continued to improve at least up to 2014, even though Intel CPU clock speeds levelled off in about 2004. This might be because Xilinx have been behind Moore's Law over the long term (apart from a brief claim to be ahead - read "catching up"). Moore's Law was holding until at least 2021, doubling transistor count every 2 years, but Xilinx have been doubling their transistor count every 2.2 years, so perhaps it's not just their transistor count that is lagging but also their clock speed. (The analysis backing up these doubling times has been omitted here for brevity.)
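For reference, a doubling time like the 2 or 2.2 years quoted above follows from two (count, year) samples as Δt / log₂(N₂/N₁), assuming exponential growth between them. A sketch with placeholder numbers (not the omitted analysis itself):

```python
from math import log2

def doubling_time_years(n_early: float, year_early: float,
                        n_late: float, year_late: float) -> float:
    """Years per doubling, given counts at two dates and assuming
    exponential growth between them: dt / log2(n_late / n_early)."""
    return (year_late - year_early) / log2(n_late / n_early)

# Placeholder example: a 16x increase over 8 years is 4 doublings,
# i.e. one doubling every 2 years.
print(doubling_time_years(1.0, 2000, 16.0, 2008))  # 2.0
```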
The internal clock frequencies charted here are unobtainable in practice, since the BUFG primitives cannot achieve them. But the aim was to provide a means of estimating the clock speed attainable on a specific device, given a real design on a reference device. For example, given an achievable clock speed of 200 MHz on an xcku060 device, what might we expect from an xczu7ev?
\[ 200 \times \frac{f_{max}(\mbox{xczu7ev})}{f_{max}(\mbox{xcku060})} = 285.7 \; \mbox{MHz} \]

So is this a realistic way of estimating the possible clock frequency attainable? This still needs testing, as the method remains just an idea at present.
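The scaling itself is a one-liner; a sketch with the two measured fmax values passed in as parameters (the figures in the example are placeholders, not the measured results from the table above):

```python
def scale_clock(f_achieved_mhz: float,
                fmax_ref_mhz: float,
                fmax_target_mhz: float) -> float:
    """Estimate the clock achievable on a target device by scaling the
    clock achieved on a reference device by the ratio of the two devices'
    measured fmax values."""
    return f_achieved_mhz * fmax_target_mhz / fmax_ref_mhz

# Placeholder example: 200 MHz achieved on the reference device, with
# measured fmax values of 1000 MHz (reference) and 1500 MHz (target).
print(scale_clock(200.0, 1000.0, 1500.0))  # 300.0
```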