The initial lead was a YouTube video by Adaptive Design. The video explains the mathematics, so it will not be repeated here. It also mentions some pipelined HDL code, but as that code only appears in the video it is not available, so I have recreated and generalised it here.
Source Material
In short, the bit position of the most significant bit set to '1' directly gives the integer part of the Log2() result. The bits immediately below it provide a fractional approximation, one fractional bit for each bit taken. A lookup table can optionally be used to correct that approximation by mapping the extracted bits to a more accurate fraction. Watch the video for the detail. Note the code below does not employ the 4-bit OR gates used in the video.
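As a worked illustration (the input x = 100 and 4 fractional bits are arbitrary choices, not taken from the video), write the input as x = 2^n(1 + m), where n is the index of the leading '1' and m is the fraction formed by the bits below it:

$$
\begin{aligned}
x &= 2^{n}(1+m),\ 0 \le m < 1 \;\Rightarrow\; \log_2 x = n + \log_2(1+m) \approx n + m,\\
100 &= 1100100_2 \;\Rightarrow\; n = 6,\ \text{next four bits } 1001_2 = 9,\ m = \tfrac{9}{16},\\
\log_2 100 &\approx 6 + \tfrac{9}{16} = 6.5625 \quad \text{(no LUT)},\\
\log_2 100 &\approx 6 + \tfrac{1}{16}\operatorname{round}\!\bigl(16\log_2(1 + \tfrac{9}{16})\bigr) = 6 + \tfrac{10}{16} = 6.625 \quad \text{(LUT)},\\
\log_2 100 &= 6.6439\ldots \quad \text{(exact)}.
\end{aligned}
$$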
VHDL Code
```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_pkg.all;
library local; -- user library providing math_pkg.ceil_log() (not part of IEEE)

entity quick_log2 is
  generic(
    data_width_g : positive := 16;
    frac_width_g : positive := 4;
    use_lut_g    : boolean  := true -- true more accurate, false better timing
  );
  port (
    clk    : in  std_logic;
    linear : in  std_logic_vector(data_width_g-1 downto 0);
    log    : out ufixed(local.math_pkg.ceil_log(data_width_g, 2)-1 downto -frac_width_g)
  );
end entity;

library ieee;
use ieee.numeric_std_unsigned.all;

architecture rtl of quick_log2 is

  type lut_type is array (0 to 2**frac_width_g-1) of std_logic_vector(frac_width_g-1 downto 0);

  -- Elaboration-time LUT: entry i = round(2**f * log2(1 + i/2**f)), f = frac_width_g
  function log2_lut return lut_type is
    use ieee.math_real.log2;
    use ieee.math_real.round;
    variable ret : lut_type;
  begin
    for i in lut_type'range loop
      ret(i) := to_slv(
        integer(round(real(2**frac_width_g) * log2(1.0 + real(i)/real(2**frac_width_g)))),
        ret(i)'length
      );
    end loop;
    return ret;
  end function;

  constant log2_frac_lut_c : lut_type := log2_lut;

  -- Index of the most significant '1', scanning from the left of a downto vector.
  -- Returns 0 for an all-zero input (log2(0) is undefined).
  function get_msb_pos(val : std_logic_vector) return integer is
  begin
    for i in val'range loop
      if val(i) = '1' then
        return i;
      end if;
    end loop;
    return 0;
  end function;

begin

  process(clk)
    variable msb_idx_v  : integer range 0 to data_width_g-1;
    variable mantissa_v : std_logic_vector(frac_width_g-1 downto 0);
    variable fraction_v : std_logic_vector(frac_width_g-1 downto 0);
  begin
    if rising_edge(clk) then
      -- Find MSB position (integer part)
      msb_idx_v := get_msb_pos(linear);
      -- Extract the fractional bits directly following the leading '1'.
      -- Handles boundary cases where fewer than frac_width_g bits remain below it.
      mantissa_v := (others => '0');
      for j in 0 to frac_width_g-1 loop
        if (msb_idx_v - 1 - j) >= 0 then
          mantissa_v(frac_width_g - 1 - j) := linear(msb_idx_v - 1 - j);
        else
          mantissa_v(frac_width_g - 1 - j) := '0';
        end if;
      end loop;
      -- Look up fractional bits
      if use_lut_g then
        -- Maximum error = 0.1074 for 4 fractional bits
        fraction_v := log2_frac_lut_c(to_integer(mantissa_v));
      else
        -- Really simple: log2_frac_lut_c is nearly output = input
        -- Maximum error = 0.1485 for 4 fractional bits
        fraction_v := mantissa_v;
      end if;
      -- Result in unsigned fixed point Qm.f format, where:
      -- * m = log'high + 1 = ceil_log(data_width_g, 2) integer bits
      -- * f = frac_width_g
      -- (8 bits in total for the default generics)
      log <= to_ufixed(to_slv(msb_idx_v, log'high + 1) & fraction_v, log);
    end if;
  end process;

end architecture;
```
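The registered output can be mirrored in software, for example to generate expected values for a testbench. The sketch below is a bit-accurate Python model written under the assumption that it should match the process above exactly, including returning 0 for an all-zero input. The function name `quick_log2_model` and its parameters are hypothetical and not part of the original code. It uses `floor(x + 0.5)` because, for positive values, that matches the half-away-from-zero rounding of `ieee.math_real.round`, whereas Python's `round()` rounds halves to even.

```python
import math


def quick_log2_model(linear: int, data_width: int = 16, frac_bits: int = 4,
                     use_lut: bool = True) -> int:
    """Raw Qm.f code registered for `linear` (hypothetical model; divide by 2**frac_bits for the value)."""
    if not 0 <= linear < (1 << data_width):
        raise ValueError("linear must fit in data_width bits")
    # Integer part: index of the most significant '1' (0 for an all-zero input, as in get_msb_pos)
    msb = max(linear.bit_length() - 1, 0)
    # Fractional bits directly below the leading '1'. Shifting left by frac_bits first
    # supplies the zero padding used when fewer than frac_bits bits remain below the MSB.
    mantissa = ((linear << frac_bits) >> msb) & ((1 << frac_bits) - 1)
    if use_lut:
        scale = 1 << frac_bits
        # floor(x + 0.5) rounds halves away from zero for x >= 0, like math_real.round
        fraction = math.floor(scale * math.log2(1 + mantissa / scale) + 0.5)
    else:
        fraction = mantissa
    return (msb << frac_bits) | fraction


if __name__ == "__main__":
    for x in (1, 3, 100, 65535):
        code = quick_log2_model(x)
        print(x, code, code / 16, math.log2(x))
```

For example, `quick_log2_model(100)` returns 106, i.e. 106/16 = 6.625, against an exact log2(100) of about 6.644.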
If you need a simpler implementation that works with older tools, the LUT can be hard-coded. For 4 fractional bits the LUT is as follows:
```vhdl
-- log2 LUT for a 4-bit mantissa: entry i = round(16 * log2(1 + i/16))
constant log2_frac_lut_c : lut_type := (
  X"0", X"1", X"3", X"4", X"5", X"6", X"7", X"8",
  X"9", X"A", X"B", X"C", X"D", X"E", X"F", X"F"
);
```
The lookup table above shows that each output value is very nearly equal to its input index, never differing by more than 1. This is because log2(1 + m) ≈ m for 0 ≤ m < 1, with equality at m = 0 and in the limit m → 1. For a coarser result the LUT can therefore be omitted, hence the use_lut_g generic in the VHDL code.
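The LUT contents for any fractional width can be checked outside the HDL tools. This Python sketch, with a hypothetical function name, mirrors the `log2_lut` function in the VHDL code (again using `floor(x + 0.5)` to match `math_real.round` for positive values) and reports the largest gap between an entry and its index.

```python
import math


def log2_frac_lut(frac_bits: int) -> list[int]:
    """Entry i = round(2**f * log2(1 + i / 2**f)), as in the VHDL log2_lut function."""
    scale = 1 << frac_bits
    # floor(x + 0.5) rounds halves away from zero for x >= 0, like math_real.round
    return [math.floor(scale * math.log2(1 + i / scale) + 0.5) for i in range(scale)]


if __name__ == "__main__":
    lut = log2_frac_lut(4)
    print(", ".join(f'X"{v:X}"' for v in lut))
    # Largest amount by which an output exceeds its input index
    print(max(v - i for i, v in enumerate(lut)))
```

For 4 fractional bits this prints the same sixteen values as the hard-coded table, followed by a maximum difference of 1.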
Results
All results use a 16-bit linear input (data_width_g = 16) and vary only the number of fractional bits. The table below is for use_lut_g = true.
The results show a favourable clock speed and accuracy without any pipelining. The Xilinx device used was the xc7z020clg484-1 with Vivado 2023.2, chosen because it is one of the devices usable in the free tier.

| Fractional bits | Maximum Log2() Error | Fmax (MHz) |
|---|---|---|
| 1 | 0.5849 | 412.9 |
| 2 | 0.3349 | 412.9 |
| 3 | 0.1969 | 296.8 |
| 4 | 0.1074 | 213.1 |
| 5 | 0.0562 | 277.2 |
| 6 | 0.0287 | 279.4 |
| 7 | 0.0145 | 248.6 |
| 8 | 0.0073 | 206.2 |
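
The method used to measure the maximum error is not stated. One way to obtain figures of this kind is an exhaustive sweep of every non-zero 16-bit input through a bit-accurate model, as in the sketch below (the function name is hypothetical, and zero is skipped because log2(0) is undefined). Under that assumption it reproduces the 4-fractional-bit figures quoted in the VHDL comments: 0.1074 with the LUT and 0.1485 without.

```python
import math


def max_log2_error(frac_bits: int, use_lut: bool, data_width: int = 16) -> float:
    """Largest |log2(x) - approximation| over all non-zero data_width-bit inputs."""
    scale = 1 << frac_bits
    mask = scale - 1
    # Without the LUT the fraction is the mantissa itself, i.e. an identity table
    if use_lut:
        lut = [math.floor(scale * math.log2(1 + i / scale) + 0.5) for i in range(scale)]
    else:
        lut = list(range(scale))
    worst = 0.0
    for x in range(1, 1 << data_width):  # skip 0: log2(0) is undefined
        msb = x.bit_length() - 1
        mantissa = ((x << frac_bits) >> msb) & mask
        approx = msb + lut[mantissa] / scale
        worst = max(worst, abs(math.log2(x) - approx))
    return worst


if __name__ == "__main__":
    print("bits  with LUT  without LUT")
    for f in range(1, 9):
        print(f"{f:4}  {max_log2_error(f, True):8.4f}  {max_log2_error(f, False):11.4f}")
```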

Each additional fractional bit might be expected to halve the output error, in line with the doubled resolution of the fraction added to the output. More precisely, a linear regression of log2(maximum error) against the number of fractional bits gives a slope of about -0.907, so each extra bit multiplies the error by about 0.533 rather than 0.5.
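The 0.533 figure can be reproduced from the maximum-error column of the results table. This sketch (function name hypothetical) fits a least-squares line to log2 of the error against the number of fractional bits and converts the slope back into a per-bit factor; exact halving would give 0.5.

```python
import math

# Maximum Log2() error (with LUT) for 1..8 fractional bits, copied from the results table
FRAC_BITS = [1, 2, 3, 4, 5, 6, 7, 8]
MAX_ERROR = [0.5849, 0.3349, 0.1969, 0.1074, 0.0562, 0.0287, 0.0145, 0.0073]


def error_factor_per_bit(bits: list[int], errors: list[float]) -> float:
    """Fit log2(error) = a + b * bits by least squares and return 2**b."""
    ys = [math.log2(e) for e in errors]
    n = len(bits)
    mean_x = sum(bits) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bits, ys))
    sxx = sum((x - mean_x) ** 2 for x in bits)
    slope = sxy / sxx  # about -0.907 for the data above
    return 2 ** slope


print(round(error_factor_per_bit(FRAC_BITS, MAX_ERROR), 3))  # prints 0.533
```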

The reason 4 fractional bits achieve a lower clock speed than 5 is illustrated above. The lookup table has been realised as logic rather than a ROM, and the logic depth to the fractional output bits is greater with 4 fractional bits than with 5. This looks like bad luck: the logic mapping simply happened to be more favourable with 5 fractional bits.
Optional Lookup Table
Taking away the lookup table gives synthesis results of a constant 412.9 MHz for every fractional width in the table below (1 to 8 bits). The error is then larger than with the LUT, and the gap widens as fractional bits are added. However, in some applications, e.g. plotting a real-time spectrogram purely for consumption by the human eye, the extra error may go unnoticed and the clock speed improvement may be more important. Errors below are the maximum absolute Log2() error expressed as a percentage of one integer step, so 58.5% means 0.585.

| Fractional bits | Max error with LUT | Max error without LUT | Reduction in error with LUT |
|---|---|---|---|
| 1 | 58.5% | 58.5% | 0.0% |
| 2 | 33.5% | 33.5% | 0.0% |
| 3 | 19.7% | 21.0% | 1.3% |
| 4 | 10.7% | 14.9% | 4.1% |
| 5 | 5.6% | 11.7% | 6.1% |
| 6 | 2.9% | 10.2% | 7.3% |
| 7 | 1.5% | 9.4% | 7.9% |
| 8 | 0.7% | 9.0% | 8.3% |
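
The without-LUT column levels off rather than falling towards zero. Splitting the error into the linear approximation log2(1 + m) ≈ m and the truncation of m to f fractional bits (written t) shows why. The first term is never negative because log2(1 + m) is concave and equals m at m = 0 and m = 1:

$$
\begin{aligned}
e &= \log_2(1+m) - t = \underbrace{\bigl[\log_2(1+m) - m\bigr]}_{\text{linear approximation}} + \underbrace{(m - t)}_{\text{truncation}}, \qquad 0 \le m - t < 2^{-f},\\
\frac{d}{dm}\bigl[\log_2(1+m) - m\bigr] &= \frac{1}{(1+m)\ln 2} - 1 = 0 \;\Rightarrow\; m^{*} = \frac{1}{\ln 2} - 1 \approx 0.4427,\\
\max_{0 \le m < 1}\bigl[\log_2(1+m) - m\bigr] &= \log_2\frac{1}{\ln 2} - \frac{1}{\ln 2} + 1 \approx 0.0861,\\
0 \le e &< 0.0861 + 2^{-f}.
\end{aligned}
$$

For f = 4 to 8 this bound evaluates to 0.149, 0.117, 0.102, 0.094 and 0.090, close to the without-LUT percentages in the table above, and as f grows it approaches 0.086 rather than zero. The LUT corrects most of the first term, which is why its benefit grows with the number of fractional bits.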

Conclusions
Without the lookup table the error is larger, and the gap widens with more fractional bits, but in applications such as a real-time spectrogram the error may remain undetected, and the clock speed improvement may be more important. Even with the lookup table there is the opportunity to pipeline the calculation. A sensible pipeline would be an AXI-Stream shift register, which is the next job.
References
- GitHub Source Code
- DSP Trick: Quick-and-Dirty Logarithms by Ray Andraka
- FPGA Quick-and-Dirty Logarithms YouTube explanation by Adaptive Design
- Fixed point logarithm to base 2

