Low Speed Serial I/O
Eli Billauer provides some advice on his blog for source-synchronous inputs and offers three fundamental solutions. This blog covers a practical example of each of "Strategy #2: Using a PLL" and "Strategy #3: Phase shifting". The aim is to put the theory into practice, mainly because where the FPGA meets external I/O, the behaviour can no longer be simulated and a real device is needed to verify correctness. Setting up the input and output delays in the constraints file also requires some thought, and again such constraints are hard to verify as correct.
- Test Setup
- Development Board
- Simple Sample Test Results
- Phase Delay
- Variable Phase Delay
- Conclusions
- References
Test Setup

The basic plan is shown above. A PRBS creates a random serial stream of data. Here I use the Multiple Bit Pseudorandom Binary Sequence code from a previous blog, either to pretend I'm in some way ITU-T standards compliant, or simply because I need a source of varying serial data bits. The data is then transmitted n bits wide along with its source-synchronous clock. On receipt, a PLL is used to clean up the clock with an effective zero delay and a 180° phase shift, intended to put the output clock's rising edge into the eye of the data so that it can be captured cleanly by a register. The captured data is fed into a FIFO to cross the clock domain back to the internal clock, where the data can be compared with a time-delayed copy of the PRBS to check the data integrity.
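The PRBS generator and checker from the previous blog are not reproduced here. As a rough software model of the test loop, the sketch below assumes a short PRBS7 polynomial (x^7 + x^6 + 1) purely for illustration; the hypothetical helpers `prbs7_bits`, `to_words` and `count_errors` stand in for the transmitter, the n-bit wide link and the comparator that drives the LEDs.

```python
# Illustrative model only: PRBS7 (x^7 + x^6 + 1) is an assumed polynomial,
# not necessarily the one used by the multiple-bit PRBS in the real design.

def prbs7_bits(count, seed=0x7F):
    """Return `count` bits of a PRBS7 sequence from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    bits = []
    for _ in range(count):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps for x^7 + x^6 + 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits

def to_words(bits, width):
    """Split a serial stream into `width`-bit words, one bit per data pin."""
    return [tuple(bits[i:i + width]) for i in range(0, len(bits) - width + 1, width)]

def count_errors(received, expected):
    """Return (correct, wrong) word counts, like the two status LEDs."""
    pairs = list(zip(received, expected))
    wrong = sum(1 for rx, ref in pairs if rx != ref)
    return len(pairs) - wrong, wrong

# Ideal loopback: the received words equal the time-aligned local copy.
ref = to_words(prbs7_bits(3 * 100), 3)
print(count_errors(list(ref), ref))  # (100, 0)
```

In hardware the "time-delayed copy" is simply a second generator started later by the latency of the loopback and FIFO; here both copies are already aligned.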
If the sampling point is outside the data eye, each captured bit is effectively random, so it will match the expected bit with a probability of only 0.5. This means both the "correct" and "wrong" LEDs will glimmer. A surer test of correctness is therefore that the "wrong" LED is completely off, hence the additional assurance provided by using both LEDs.
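To see why both LEDs glimmer, the following sketch models a capture outside the eye as a coin toss and compares it against an expected bit. Both bits are drawn from a seeded random generator, an assumption standing in for the PRBS and the mistimed register.

```python
import random

def wrong_fraction(samples, rng):
    """Fraction of mismatches when the captured bit is effectively random
    (sampling point outside the data eye) and compared to the expected bit."""
    wrong = 0
    for _ in range(samples):
        expected = rng.getrandbits(1)  # stands in for the PRBS bit
        captured = rng.getrandbits(1)  # mistimed capture behaves like a coin toss
        wrong += expected != captured
    return wrong / samples

print(wrong_fraction(10_000, random.Random(1)))  # close to 0.5
```

With roughly half the comparisons right and half wrong, only a "wrong" count of exactly zero distinguishes a clean capture from a random one.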
Packing the last registers in the launch path and the first registers in the capture path into the IOBs maximises the timing slack available for the data's flight off and back onto the FPGA. A fuller explanation of the benefits of IOB registers is provided in another Eli Billauer blog.
IOB directs the Vivado tool to place a register that is connected to the specified port into the input or output logic block. Place this attribute on a port, connected to a register that you want to place into the I/O block.
IMPORTANT: With this property set to TRUE, the Vivado placer will only place the register into the IOB. The tool will not move the flop out of the IOB to improve timing since the IOB constraint takes precedence.
Vivado Design Suite Properties Reference Guide, UG912 (v2022.1) June 8, 2022
Development Board

This test has been carried out on a Zybo (legacy) development board using a PMOD connector. There are 8 user-defined pins, giving four transmit/receive pairs when looped back. Of the four pairs, one must be the source-synchronous clock, leaving 3 pairs for testing data. A very rough and ready setup uses unshielded wires to loop back the source-synchronous clock and 3 serial data bits. The initial testing was done using the "Standard PMOD" connector, as its pins are connected to the Zynq PL via 200Ω series resistors. The series resistors prevent short circuits that can occur if the user accidentally drives a signal that is supposed to be used as an input. Subsequently, after checking that no such mistake had been made in my design, I moved to a High-Speed PMOD and retested. This connector has its data signals routed as impedance-matched differential pairs for maximum switching speed, with the traces routed at 80Ω (± 10%) impedance. If used as single-ended signals, the coupled pairs may suffer significant cross-talk. One option to alleviate this would be to ground one signal of each pair and use its partner for the single-ended signal. This was not done, and the expectation is that the source-synchronous clock will cause significant cross-talk on data bit 2, rx(2).
[Place 30-172] Sub-optimal placement for a clock-capable IO pin and PLL pair. If this sub optimal condition is acceptable for this design, you may use the CLOCK_DEDICATED_ROUTE constraint in the .xdc file to demote this message to a WARNING. However, the use of this override is highly discouraged.
The restrictions on which pins could be used meant the clock input placement was not optimal. In order to use a PLL on the received clock, a relaxing constraint had to be applied to accept the non-optimal pin placement and PLL usage:
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets <net>]
This allows the design to complete implementation in Vivado and hence allows the experiment to continue with the following caveat:
The CLOCK_DEDICATED_ROUTE property is enabled (TRUE) by default, and ensures that clock resource placement DRCs are considered error conditions that must be corrected prior to routing or bitstream generation. CLOCK_DEDICATED_ROUTE=FALSE downgrades the placement DRC to a warning and lets the Vivado router use fabric routing to connect from a clock-capable IO (CCIO) to a global clock resource such as an MMCM.
CAUTION! Setting CLOCK_DEDICATED_ROUTE to FALSE can result in sub-optimal clock delays, resulting in potential timing violations and other issues.
Vivado Design Suite Properties Reference Guide, UG912 (v2022.1) June 8, 2022
The TRUE value is used when the IBUF and MMCM/PLL are in the same Clock Region... A CLOCK_DEDICATED_ROUTE =FALSE would mean that the net can be routed on fabric resources which negatively impacts timing and performance.
Adaptive SoC & FPGA Support Article 75692 Clocking - CLOCK_DEDICATED_ROUTE values and usage

The design allows the PRBS sequences to be reset and the checking of each bit to be enabled individually, so that it is possible to tell whether any bit works on its own. The LED to watch is "Errored bits": even a faint glow here suggests a reception problem, as shown above.
Simple Sample Test Results
Simple 180° phase shift for capture in the eye of the data.
| PMOD connector | Highest working clock (MHz) | Lowest failing clock (MHz) |
|---|---|---|
| Low speed (JE) | 34 | 35 |
| High speed (JD) | 64 | 65 |
Phase Delay
Taking further advice from Eli Billauer under the heading of "Strategy #3: Phase shifting", I'll now incorporate the phase delay.

The next step was to vary the phase of the receive PLL's clock at 64 MHz to find the range of reliable capture points, and hence the width of the data eye. The rising edge of the phase-delayed clock captured data correctly on all three data bits for phase shifts between 61° and 202°. This equates to a delay of between Θmin = 2.648 ns and Θmax = 8.767 ns from the rising edge of the received data clock (clk_rx). In the range 203-250°, rx(2) was wrong when rx(1:0) were correct, confirming the cross-talk expected from the source-synchronous clock. This is the first confirmation of what the input and output delays should be set to.
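The conversion from phase shift to delay is just a fraction of the clock period. This small sketch (the function name is hypothetical) reproduces the Θmin and Θmax figures from the measured 61° and 202° at 64 MHz.

```python
def phase_to_delay_ns(phase_deg, freq_mhz):
    """Delay (ns) after the clock edge for a PLL output phase shift in degrees."""
    period_ns = 1000.0 / freq_mhz  # period (ns) = 1000 / frequency (MHz)
    return phase_deg / 360.0 * period_ns

theta_min = phase_to_delay_ns(61, 64)   # about 2.648 ns
theta_max = phase_to_delay_ns(202, 64)  # about 8.767 ns
print(f"{theta_min:.3f} ns to {theta_max:.3f} ns")
```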
[However,] when using dedicated clocking and IOB resources for an interface, all the resources are fixed, and hence cannot be affected by the placer and router (which is a good thing - this leads to the best margins and predictable results). In this case, the set_input_delay and set_output_delay only affect the static timing reporting.
Xilinx Forums: Effects of set_input_delay and set_output_delay
# Affects setup time
set_input_delay -clock [get_clocks clk_rx] -max 8.767 [get_ports {rx[*]}]
# Affects hold time
set_input_delay -clock [get_clocks clk_rx] -min 2.648 [get_ports {rx[*]}]
From this result we can calculate that the simple 180° phase shift was sampling the data at 84% of the distance across the data eye instead of the desired 50%. I am surprised by this, as I would have expected that, as I decreased the clock period, the sampling point would tend towards the left side of the data eye, which remains at a fixed time offset from the clock. I cannot explain this, and do not have the test equipment at home to analyse the problem fully.
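The 84% figure follows from where a 180° edge lands (half a period, 7.8125 ns at 64 MHz) relative to the measured window. A minimal sketch, with the hypothetical helper `eye_fraction`:

```python
THETA_MIN_NS = 2.648  # earliest good capture delay measured at 64 MHz
THETA_MAX_NS = 8.767  # latest good capture delay measured at 64 MHz

def eye_fraction(freq_mhz, phase_deg, theta_min=THETA_MIN_NS, theta_max=THETA_MAX_NS):
    """Position of the sampling edge across the eye: 0.0 = left edge, 1.0 = right edge."""
    sample_ns = phase_deg / 360.0 * 1000.0 / freq_mhz
    return (sample_ns - theta_min) / (theta_max - theta_min)

print(f"{eye_fraction(64, 180):.0%}")  # 84%
```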
The next step was to identify the centre of the eye and test raising the data rate again. In order to keep the receive PLL's clock aligned with the eye of the data, the following Clocking Wizard (clk_wiz) IP configuration, using the PLL primitive, was used:
# Clock speed (MHz) for the Low Speed Serial IO under test
set lssio_freq 77.000
# Place the capture edge a fixed 5.707 ns after clk_rx, the middle of the good
# capture range measured above: (2.648 + 8.767) / 2 = 5.7075 ns.
# Phase (degrees) = delay (ns) / period (ns) * 360, where period (ns) = 1000 / frequency (MHz)
set lssio_phase [expr 5.707 * $lssio_freq * 360 / 1000]
create_ip \
-name clk_wiz \
-vendor xilinx.com \
-library ip \
-version 6.0 \
-module_name $ip_inst \
-dir $ip_dest_f
set_property \
-dict [list \
CONFIG.PRIMITIVE {PLL} \
CONFIG.PRIM_IN_FREQ $lssio_freq \
CONFIG.CLKOUT1_REQUESTED_OUT_FREQ $lssio_freq \
CONFIG.CLKOUT1_REQUESTED_PHASE $lssio_phase \
CONFIG.USE_SAFE_CLOCK_STARTUP {true} \
CONFIG.FEEDBACK_SOURCE {FDBK_AUTO} \
CONFIG.USE_RESET {false} \
CONFIG.USE_LOCKED {true} \
CONFIG.PRIMARY_PORT {clk_in} \
CONFIG.CLK_OUT1_PORT {clk_out} \
] \
[get_ips $ip_inst]
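The phase requested from the Clocking Wizard scales with frequency because the target delay is fixed in nanoseconds. The sketch below mirrors the Tcl `expr` in Python to show the phase requested at a few frequencies. It assumes, as the Tcl does, that the eye stays at a fixed time offset from clk_rx as the frequency changes; the 84% observation above suggests this is only approximately true. The wizard may also round the requested phase to a value the PLL can actually produce.

```python
THETA_MIN_NS = 2.648  # start of the good capture window measured at 64 MHz
THETA_MAX_NS = 8.767  # end of the good capture window measured at 64 MHz

def eye_centre_ns(theta_min=THETA_MIN_NS, theta_max=THETA_MAX_NS):
    """Midpoint of the good capture window, in ns after the clk_rx rising edge."""
    return (theta_min + theta_max) / 2

def centre_phase_deg(freq_mhz, delay_ns=5.707):
    """Python equivalent of: expr 5.707 * $lssio_freq * 360 / 1000
    i.e. phase = delay / period * 360 with period = 1000 / freq_mhz."""
    return delay_ns * freq_mhz * 360 / 1000

for f in (64, 77, 99):
    print(f"{f} MHz -> {centre_phase_deg(f):.1f} deg")
```

This prints roughly 131.5°, 158.2° and 203.4° respectively, so the edge moves from well before 180° to beyond it as the clock speeds up.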
The highest clock frequency achieved with this method was 99 MHz, based on the characterisation of this particular setup. This is a decent improvement on the 64 MHz achieved with the naïve 180° phase shift. With more fiddling and extra time, it should be possible to re-check the extent of the eye and amend the capture centre point again to squeeze out the last bit of performance. But again, it would be a fixed solution that does not adapt to temperature and process variation across different development boards.
Variable Phase Delay
Here I extend the solution of "Strategy #3: Phase shifting", to incorporate an adaptive mechanism to find the optimal timing.
The method above is, of course, very brittle. It is only calibrated for one development board under the test operating conditions. It also treats each data path the same, when the transmission conditions might be slightly different for each serial bit. Xilinx input pads provide IDELAY components that can be used to adjust a delay dynamically, within limits. The IDELAYE2 component in the Zynq-7000 series is limited to 2⁵ = 32 tap settings of 78 ps each. The simulation model indicates a minimum delay of 600 ps, hence the maximum is 600 + 31 × 78 = 3018 ps (3.018 ns), giving an adjustable range of 2.418 ns. You'll note this range is not that big for a low speed serial interface.
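To put the IDELAYE2 range in context, this sketch computes the tap delays from the figures quoted above and compares the tuning span with one bit period. The 78 ps tap figure assumes a 200 MHz IDELAYCTRL reference clock, and `idelay_ps` is a hypothetical helper, not a Xilinx API.

```python
TAP_PS = 78         # per-tap resolution quoted above (assumes a 200 MHz IDELAYCTRL reference)
MIN_DELAY_PS = 600  # zero-tap insertion delay, from the simulation model
NUM_TAPS = 2 ** 5   # 5-bit tap control: values 0..31

def idelay_ps(tap):
    """Approximate total IDELAYE2 delay (ps) for a given tap value."""
    if not 0 <= tap < NUM_TAPS:
        raise ValueError("tap must be in the range 0..31")
    return MIN_DELAY_PS + tap * TAP_PS

span_ps = idelay_ps(NUM_TAPS - 1) - idelay_ps(0)  # 2418 ps
for freq_mhz in (64, 101):
    period_ps = 1_000_000 / freq_mhz
    print(f"{freq_mhz} MHz: tuning range covers {span_ps / period_ps:.0%} of a bit period")
```

At 64 MHz the delay line can only sweep about 15% of the bit period, and about 24% at 101 MHz, which is why the delay has to be combined with a PLL phase offset rather than used on its own.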
We can use it by adding a finite state machine that varies the 5-bit delay control from 0 through to 31 and measures the bit errors at each setting by dwelling for, say, 100 clock cycles. If the sampling position is well off, about 50% of the bits will be wrong on average, giving a 'wrong' count of about 50 for my dwell time. Any 'wrong' count above 0 is treated as a failure. I use the output of the comparator to track the transitions from bad to good and back to bad again, and take the average of those transition points (a simple sum and shift right by one bit) to choose the 'eye' of the data.
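The training sweep can be modelled in software as below. This is a sketch of the idea, not the VHDL FSM itself: `find_eye_centre` and `simulated_wrong` are hypothetical, the simulated eye spanning taps 9 to 22 is invented for illustration, and averaging the first and last good taps is an assumption (averaging with the first bad tap instead would shift the result by at most one tap).

```python
import random
from typing import Callable, Optional

NUM_TAPS = 32  # 5-bit IDELAY tap control
DWELL = 100    # clock cycles compared at each tap setting

def find_eye_centre(wrong_count: Callable[[int], int],
                    num_taps: int = NUM_TAPS) -> Optional[int]:
    """Sweep taps 0..num_taps-1 and return the centre of the first error-free run.

    wrong_count(tap) returns the number of mismatches seen during the dwell
    time at that tap. Returns None if no tap captured cleanly.
    """
    first_good = None
    last_good = None
    for tap in range(num_taps):
        if wrong_count(tap) == 0:      # any error at all marks the tap as bad
            if first_good is None:
                first_good = tap       # bad -> good transition
            last_good = tap
        elif first_good is not None:
            break                      # good -> bad transition: past the eye
    if first_good is None:
        return None
    return (first_good + last_good) >> 1  # sum and shift right by one bit

# Hypothetical stand-in for the hardware comparator: an eye spanning taps 9..22,
# with roughly 50% wrong bits per dwell when sampling outside it.
_rng = random.Random(0)

def simulated_wrong(tap: int) -> int:
    if 9 <= tap <= 22:
        return 0
    return sum(_rng.getrandbits(1) for _ in range(DWELL))

print(find_eye_centre(simulated_wrong))  # 15
```

The integer shift matches what is cheap in hardware; in the FPGA the same comparison and averaging would run once per data bit, so each bit gets its own tap setting.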

So from the point of view of the I/O, we have a 'training' phase to find the eye of the data before a 'checking' phase where the phase offset is held constant in what should be the eye of the data. The FSM here does not then switch to a real data source but just keeps checking the PRBS sequence.
Using this method I was able to raise the transmission clock speed to 101 MHz. This is a small improvement, but perhaps one that compensates for changing environmental conditions.
Conclusions
The test setup was harsh:
- Poor PLL placement for the chosen clock pin
- Adverse transmission path
- Single-ended signal instead of differential pair
- No ground wire for shielding between data bits
This experiment shows that the advice from Eli Billauer's blog translates into a working implementation on a Xilinx device. It also takes advantage of the same FPGA being at both ends of the serial transmission, which is not a realistic scenario. This work also shows that the Xilinx IDELAY components have a limited range of delays: perhaps a little small for low speeds and a little coarse for higher speeds?
References
- GitHub source code
- Source-synchronous inputs, Eli Billauer