01 Signal Sampling
Eli Billauer describes a simple scheme for transferring data into an FPGA in his blog post "Using 01-signal sampling with source-synchronous inputs". I had not encountered this scheme before, having usually relied on Xilinx's SelectIO resources. It is listed as "Strategy #1: 01-signal sampling" on a related I/O blog post, and I wanted to understand why Eli was so keen on this method. The aim here is to put the theory into practice, mainly because where the FPGA meets external I/O the design can no longer be verified by simulation alone, and a real device is needed to confirm correct operation.
Design
In short, the design is very similar to those in the Low Speed Serial I/O blog. I continue to use a Zybo (legacy) FPGA development board and test for successful data transfer off and back onto the same FPGA device using a source-synchronous clock.

The source-synchronous clock is treated as data. Both the clock and the data are sampled by a new clock, about three times faster than the data clock, through a clock domain crossing synchroniser. A pulse generated from the sampled clock writes the data into a FIFO, so that it can be read back out in the original clock domain and the serial data generated from the PRBS can be verified.
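The capture step can be sketched behaviourally in Tcl, the language already used for the PLL script in this post. This is a minimal model, assuming the synchroniser outputs are available as lists of samples, one per sampling-clock cycle. The proc name `sample01` is hypothetical, and the sketch writes the data sampled in the same cycle that first sees the clock high, which may differ by a cycle from the author's VHDL. It runs in plain tclsh, no Vivado needed.

```tcl
# Hypothetical behavioural sketch of 01-signal sampling, not the author's VHDL.
# clk_samples:  the data clock as seen on successive sampling-clock cycles,
#               i.e. after the synchroniser (one 0/1 value per cycle).
# data_samples: the data bus sampled on the same sampling-clock cycles.
# Returns the list of words that would be written into the FIFO.
proc sample01 {clk_samples data_samples} {
    set fifo {}
    # Assume the clock was high before the first sample, so that a clock
    # already high at start-up is not mistaken for a rising edge.
    set prev_clk 1
    foreach clk $clk_samples data $data_samples {
        # wr_en: the sampled clock was 0 last cycle and is 1 now
        if {$prev_clk == 0 && $clk == 1} {
            lappend fifo $data
        }
        set prev_clk $clk
    }
    return $fifo
}

# Data launched on the falling edge of the data clock, 4 samples per period
puts [sample01 {0 0 1 1 0 0 1 1 0 0 1 1} {a a a a b b b b c c c c}]
# prints: a b c
```

Each word is held across its data clock rising edge, so whichever sample first sees the clock high also sees a stable word.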
Initially I launched the transmit data on the rising edge as it left the FPGA, and synchronised the received data on the rising edge too. Once I re-read "The timing requirements" section of Eli's blog post I realised my mistake. By launching on the falling edge and (still) synchronising on the rising edge, the maximum data clock frequency improved by 30-45%, as measured over an initial test range of 100-200 MHz sampling clock frequency.
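To see why the launch edge matters, here is an idealised model, an assumption for illustration rather than anything taken from Eli's post or the author's VHDL: a 50% duty-cycle data clock, no jitter or metastability, and a fixed skew of the data relative to the clock at the sampling flip-flops. The hypothetical proc `capture_consistent` detects each 0→1 transition of the sampled clock and records which data word was sampled at that instant, relative to the clock edge. With rising-edge launch the data changes at the same moment as the clock edge; with falling-edge launch each word straddles its rising edge.

```tcl
# Idealised model (assumption): ideal 50% duty data clock, no jitter,
# no metastability. Returns 1 if every detected 0->1 edge of the sampled
# clock captures the same relative data word, else 0.
#   Td, Ts : data and sampling clock periods (ns)
#   phase  : time of the first sample (ns)
#   skew   : delay of the data relative to the clock at the flops (ns)
#   launch : "rise" or "fall", the edge the transmitter launches data on
#   n      : number of sampling-clock cycles to simulate
proc capture_consistent {Td Ts phase skew launch n} {
    set prev_clk 1
    set offsets {}
    for {set i 0} {$i < $n} {incr i} {
        # Time of sampling-clock edge i
        set t [expr {$phase + $i * $Ts}]
        # Sampled data clock: high during the first half of each period
        set clk [expr {fmod($t, $Td) < $Td / 2.0}]
        if {$prev_clk == 0 && $clk == 1} {
            # Index of the data clock rising edge just detected
            set edge [expr {int(floor($t / $Td))}]
            # Sample position on the (skewed) data's time axis, in periods
            set pos [expr {($t - $skew) / $Td}]
            if {$launch eq "fall"} {
                # Word k is driven from (k-0.5)*Td to (k+0.5)*Td
                set word [expr {int(floor($pos + 0.5))}]
            } else {
                # Word k is driven from k*Td to (k+1)*Td
                set word [expr {int(floor($pos))}]
            }
            lappend offsets [expr {$word - $edge}]
        }
        set prev_clk $clk
    }
    # Consistent if every detected edge captured the same relative word
    return [expr {[llength [lsort -unique $offsets]] <= 1}]
}

puts "rising-edge launch:  [capture_consistent 10.0 3.7 0.1 0.5 rise 200]"
puts "falling-edge launch: [capture_consistent 10.0 3.7 0.1 0.5 fall 200]"
```

With a 10 ns data clock, a 3.7 ns sampling period and 0.5 ns of data skew, the rising-edge case reports 0 (inconsistent) and the falling-edge case 1. With zero skew both pass, so it is the clock-to-data skew that exposes rising-edge launch. In this model a sufficient condition for the falling-edge case is Ts + |skew| < Td/2, which is one way to see why the sampling clock must be comfortably more than twice the data clock.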
Sampling Asynchronous Inputs
I remain a little perturbed by the use of a double-flop synchroniser on the received data bus. Usually, passing a data bus between clock domains requires either:
- A synchroniser applied to a 'data valid' control signal, and re-sampling of the bus in the new clock domain (with appropriate maximum delay constraints), or
- A FIFO
The former can't be used here as there is no 'data valid' or 'toggle' control signal. The latter can't be used as the clock is being treated as data at this point and is itself being sampled. This means we are using a naïve single-bit synchroniser for a whole data bus, with the consequence of "functional jitter" on the retimed result, i.e. some bits might arrive one cycle later than others. Even if we had a qualifying 'data valid' signal, the former solution would fail as the data potentially changes too fast. I do not have a better suggestion for the synchroniser to use here. I note that, by design, the re-sampling of the data bus occurs on the second of the two clock cycles, by which time the functional jitter has resolved.

To illustrate the point in the waveform above, rx_r captures some of the functional jitter as a spurious value '5', which is resolved on the subsequent clock cycle where the FIFO's wr_en control signal captures the correct value. As unexpected as it might appear, appropriate risk mitigations have therefore been taken to guard against both metastability and the functional jitter of the naïve synchroniser used here.
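The spurious '5' can be reproduced with a small model of per-bit synchronisers in which some bits resolve a cycle late. The proc `skewed_bus` and the 4 to 1 transition are hypothetical, chosen only to produce a transient 5; they are illustrative values, not read from the waveform.

```tcl
# Hypothetical sketch of "functional jitter": each bit of a bus passes through
# its own synchroniser and may resolve a cycle later than its neighbours.
#   old/new: bus values before and after a transition
#   delays:  extra sampling cycles taken by each bit to resolve, bit 0 first
# Returns the bus value seen on each cycle until every bit has resolved.
proc skewed_bus {old new delays} {
    set max_delay 0
    foreach d $delays {
        if {$d > $max_delay} { set max_delay $d }
    }
    set seen {}
    for {set c 0} {$c <= $max_delay} {incr c} {
        set v 0
        for {set b 0} {$b < [llength $delays]} {incr b} {
            # A bit shows its new value once its delay has elapsed
            set src [expr {[lindex $delays $b] <= $c ? $new : $old}]
            set v [expr {$v | ($src & (1 << $b))}]
        }
        lappend seen $v
    }
    return $seen
}

# Transition 4 (0100) -> 1 (0001) with bit 2 resolving one cycle late
puts [skewed_bus 4 1 {0 0 1 0}]
# prints: 5 1
```

As long as no bit is more than one cycle late, the value on the second cycle is the settled one, which is why re-sampling on the second cycle sidesteps the jitter.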
Limits on PLL settings
When creating two clock outputs from the PLL, you may not get the clock frequencies you request as shown below.

This explains the results below, where the 'requested' and 'actual' columns differ slightly. The cause is the choice of multipliers and divisors needed to satisfy two different outputs, as the following Vivado Tcl script illustrates by reading back the settings chosen for the 'pll' IP.
```tcl
# Read back the settings chosen for the PLL IP (named 'pll' in this project)
set pll_inst [get_ips pll]
set fin    [get_property CONFIG.PRIM_IN_FREQ          $pll_inst]
set mult   [get_property CONFIG.MMCM_CLKFBOUT_MULT_F  $pll_inst]
set divclk [get_property CONFIG.MMCM_DIVCLK_DIVIDE    $pll_inst]
set div0   [get_property CONFIG.MMCM_CLKOUT0_DIVIDE_F $pll_inst]
set div1   [get_property CONFIG.MMCM_CLKOUT1_DIVIDE   $pll_inst]

# Actual output frequency = f_in * M / D / O; double() avoids integer division
set clk_set1 [expr {double($fin) * $mult / $divclk / $div0}]
set clk_set2 [expr {double($fin) * $mult / $divclk / $div1}]

# Requested output 1 (data clock) is MMCM CLKOUT0, output 2 (sampling clock) CLKOUT1
puts [list Data Clock Frequency \
    [get_property CONFIG.CLKOUT1_REQUESTED_OUT_FREQ $pll_inst] ~= \
    $fin * $mult / $divclk / $div0 = $clk_set1 MHz]
puts [list Sampling Clock Frequency \
    [get_property CONFIG.CLKOUT2_REQUESTED_OUT_FREQ $pll_inst] ~= \
    $fin * $mult / $divclk / $div1 = $clk_set2 MHz]
puts "Ratio of sampling to data clock frequencies is [expr {$clk_set2 / $clk_set1}]."
```
When one output clock frequency is an integer multiple of the other, it is simple to get both the required values. There are a few system constraints, such as the operating range of the PLL's Voltage Controlled Oscillator (VCO), which might for example be constrained to the range 600-1300/1600 MHz, and the size of the integer multipliers and divisors. Both outputs share the same VCO frequency (input frequency × CLKFBOUT_MULT_F / DIVCLK_DIVIDE), so only the CONFIG.MMCM_CLKOUTn_DIVIDE properties could differ between them. In the example below this made the output clocks 'snap' to settings that gave a frequency ratio of exactly 3.
```
Data Clock Frequency 135.000 ~= 125.000 * 13 / 2 / 6 = 135.41666666666666 MHz
Sampling Clock Frequency 400.000 ~= 125.000 * 13 / 2 / 2 = 406.25 MHz
Ratio of sampling to data clock frequencies is 3.0.
```
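How a shared VCO forces the two outputs onto related frequencies can be explored with a brute-force search over integer settings. This is a hypothetical sketch: `mmcm_freq` and `mmcm_search` are illustrative procs, the limits (D ≤ 10, M ≤ 64, O ≤ 128) and the 600-1200 MHz VCO window are assumptions to be checked against the device data sheet, and it ignores fractional dividers and phase-frequency detector limits. It runs in plain tclsh 8.5 or later.

```tcl
# Hypothetical sketch: two outputs sharing one VCO,
#   f_vco = f_in * M / D,   f_out = f_vco / O
# Integer M, D and O only (CLKFBOUT_MULT_F and CLKOUT0_DIVIDE_F also allow
# fractional steps). Divider and VCO ranges are assumptions.
proc mmcm_freq {fin m d o} {
    return [expr {double($fin) * $m / $d / $o}]
}

proc mmcm_search {fin f0 f1 {vco_min 600.0} {vco_max 1200.0}} {
    set best {}
    set best_err 1e300
    for {set d 1} {$d <= 10} {incr d} {
        for {set m 2} {$m <= 64} {incr m} {
            set vco [mmcm_freq $fin $m $d 1]
            if {$vco < $vco_min || $vco > $vco_max} continue
            # With the VCO fixed, pick the nearest integer divider per output
            set o0 [expr {max(1, min(128, round($vco / $f0)))}]
            set o1 [expr {max(1, min(128, round($vco / $f1)))}]
            set a0 [mmcm_freq $fin $m $d $o0]
            set a1 [mmcm_freq $fin $m $d $o1]
            # Score a candidate by the worse of the two relative errors
            set err [expr {max(abs($a0 - $f0) / $f0, abs($a1 - $f1) / $f1)}]
            if {$err < $best_err} {
                set best_err $err
                set best [list $m $d $o0 $o1 $a0 $a1]
            }
        }
    }
    return $best
}

# The settings reported above: 125 MHz * 13 / 2, divided by 6 and by 2
puts [format "%.4f %.4f" [mmcm_freq 125.0 13 2 6] [mmcm_freq 125.0 13 2 2]]
# prints: 135.4167 406.2500
# Search for a 135 MHz data clock and a 400 MHz sampling clock
puts [mmcm_search 125.0 135.0 400.0]
```

The search only minimises the worse relative error and does not model any other criteria the Clocking Wizard may weigh, so it need not return the wizard's settings. Within a 600-1200 MHz window the best candidate it finds still uses output dividers in a 3:1 ratio, because the sampling output divider can only be 1, 2 or 3.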
This could be solved by using a second, independent PLL, since both clocks are assumed to be asynchronous anyway. However, when I use one external clock to drive two separate PLLs, I get these error messages:
```
CRITICAL WARNING: [Shape Builder 18-119] Failed to create I/OLOGIC Route Through shape for instance pll_samp_i/inst/clkin1_ibufg. Found overlapping instances within the shape: pll_i/inst/clkin1_ibufg and pll_samp_i/inst/clkin1_ibufg.
CRITICAL WARNING: [Vivado 12-1411] Cannot set LOC property of ports, Cannot set PACKAGE_PIN property of ports, port clk_port can not be placed on PACKAGE_PIN K17 because the PACKAGE_PIN is occupied by port clk_port. Please note that for projects targeting board parts, user LOC constraints cannot override constraints provided with the board. [<directory path>/constraints/Zybo-Master.xdc:23]
ERROR: [Place 30-602] IO port 'clk_port' is driving multiple buffers. This will lead to unplaceable/unroutable situation. The buffers connected are:
    pll_samp_i/inst/clkin1_ibufg {IBUF}
    pll_i/inst/clkin1_ibufg {IBUF}
```
So I can only use a single PLL for this example design, a restriction that may depend on the device type, and I therefore have to work within the PLL frequency setting limits.
Results
| Sample Clock Requested (MHz) | Sample Clock Actual (MHz) | Max Serial Data Clock Requested (MHz) | Max Serial Data Clock Actual (MHz) |
|---|---|---|---|
| 100 | 99.61 | 42 | 41.94 |
| 125 | 125.00 | 50 | 50.00 |
| 150 | 148.81 | 58 | 57.87 |
| 175 | 175.93 | 66 | 65.97 |
| 200 | 200.00 | 73 | 72.73 |
| 225 | 227.68 | 76 | 75.89 |
| 250 | 250.00 | 85 | 85.00 |
| 300 | 308.33 | 103 | 102.78 |
| 400 | 406.25 | 135 | 135.42 |
The results largely follow a linear relationship. The 'kink' in the line at ~225 MHz sampling frequency is not explained by the requested vs actual frequency discrepancy, as the plot below uses only the actual values. Without test equipment it is not possible to investigate the cause further.
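The 'actual' columns of the results table can be checked numerically. The following sketch (hypothetical proc `linear_fit`, plain tclsh 8.5 or later) prints the sampling-to-data clock ratio for each row and fits an ordinary least-squares line through the points.

```tcl
# Sample clock vs maximum serial data clock, from the 'Actual' columns (MHz)
set results {
     99.61  41.94
    125.00  50.00
    148.81  57.87
    175.93  65.97
    200.00  72.73
    227.68  75.89
    250.00  85.00
    308.33 102.78
    406.25 135.42
}

# Ordinary least-squares fit y = slope * x + intercept over {x y ...} pairs
proc linear_fit {pairs} {
    set n 0
    set sx 0.0; set sy 0.0; set sxx 0.0; set sxy 0.0
    foreach {x y} $pairs {
        incr n
        set sx  [expr {$sx + $x}]
        set sy  [expr {$sy + $y}]
        set sxx [expr {$sxx + $x * $x}]
        set sxy [expr {$sxy + $x * $y}]
    }
    set slope [expr {($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx)}]
    set intercept [expr {($sy - $slope * $sx) / $n}]
    return [list $slope $intercept]
}

# Ratio of sampling clock to maximum data clock for each measurement
foreach {x y} $results {
    puts [format "%7.2f / %6.2f = %.2f" $x $y [expr {$x / $y}]]
}
lassign [linear_fit $results] slope intercept
puts [format "max data clock ~= %.3f * sample clock + %.1f MHz" $slope $intercept]
```

The per-row ratios climb from about 2.4 at 100 MHz to about 3.0 at 225, 300 and 400 MHz, so the fitted line has a non-zero intercept rather than passing through the origin.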

400 MHz seems to be the limit of this technique on this device and setup. At 450 MHz the PLL achieved the requested clock frequencies for a 135 MHz data clock, but the design failed to transfer the data correctly. My results are consistent with Eli's rule of thumb:
If @stable_clk is three times as fast as @data_clk, that is often enough.
Using 01-signal sampling with source-synchronous inputs - Eli Billauer
Conclusions
In a previous I/O experiment blog I showed that, on the same FPGA development board with a similar setup, I was able to reach a little more than 100 MHz data clock for reliable data transfer. Using this method I can reach 135 MHz with much less effort. Previously I had to trim the phase of a PLL for optimum reception, even with an automatic calibration scheme for the IDELAY component, since the phase had to be within 'striking distance' of a ±1.2 ns variable delay. As the experimental results show, this scheme gives better data transfer performance for less work.
References
- Github Source Code
- Using 01-signal sampling with source-synchronous inputs, Eli Billauer
- Source-synchronous inputs, Eli Billauer