I recently attended an online workshop called "Using AMD High Level Synthesis to Supercharge your Design Performance", sponsored by AMD, core-vision and Technically Speaking. The design tool being promoted was AMD's Vitis HLS. Afterwards I tried the tool out on a very simple example. As an HDL-based FPGA designer, I was sceptical about the benefits High Level Synthesis (HLS) might offer me, but I had no hands-on experience on which to base that view.
- My Simple Example
  - C Code
  - C Simulation
  - Initial C Synthesis and C/RTL Simulation
  - Amended C Synthesis
  - C/RTL Simulation
- Using Pragmas
  - C Code
  - C Synthesis
- Conclusions
- References
My Simple Example
C Code
The first step is to create the algorithm in software.
int arr_sum (int a[], unsigned int len) {
  int b = 0;
  for(int i=0; i<len; i++) {
    b += a[i];
  }
  return b;
}
int arr_sum4 (int a[]) {
  return arr_sum(a, 4);
}
The second step is to create a test bench for the code that can be re-applied after synthesis by Vitis HLS.
#include <cstdio>   // printf
#include "arr_sum.h"
#define ARRAY_LEN 4
int main() {
  int ret = 0;  // returned to the tool: 0 = all checks passed, 1 = at least one failed
  int res = 0;
  res = arr_sum4((int[]) {1, 2, -3, 4});
  printf("res = %d\r\n", res);
  if (res == 4) {
    printf ("Pass\n");
  } else {
    printf ("Fail\n");
    ret = 1;
  }
  res = arr_sum4((int[]) {10, 20, 30, 40});
  printf("res = %d\r\n", res);
  if (res == 100) {
    printf ("Pass\n");
  } else {
    printf ("Fail\n");
    ret = 1;
  }
  res = arr_sum4((int[]) {5, 24, 61, -6});
  printf("res = %d\r\n", res);
  if (res == 84) {
    printf ("Pass\n");
  } else {
    printf ("Fail\n");
    ret = 1;
  }
  return ret;
}
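The three checks above repeat the same pattern, so the test bench could be collapsed into a table of input vectors and expected sums. The following is only a minimal sketch and is not the code used for the results in this post. `TestCase` and `run_checks` are names assumed here for illustration, and the local `arr_sum4` is a stand-in so that the sketch is self-contained (a real test bench would include `arr_sum.h` instead).

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for the design under test so that this sketch is self-contained.
// In the real project this function comes from arr_sum.h / arr_sum.cpp.
int arr_sum4(int a[4]) {
    int b = 0;
    for (int i = 0; i < 4; i++) {
        b += a[i];
    }
    return b;
}

// One test vector: four inputs and the expected sum (hypothetical helper type).
struct TestCase {
    int in[4];
    int expected;
};

// Runs every vector in the table and returns the number of failures,
// so 0 means that all checks passed.
int run_checks(TestCase *cases, int n_cases) {
    assert(cases != nullptr && n_cases > 0);  // reject an empty table
    int failures = 0;
    for (int t = 0; t < n_cases; t++) {
        int res = arr_sum4(cases[t].in);
        std::printf("res = %d\n", res);
        if (res == cases[t].expected) {
            std::printf("Pass\n");
        } else {
            std::printf("Fail\n");
            failures++;
        }
    }
    return failures;
}
```

A test bench `main` would build the table, call `run_checks`, and return a non-zero value if any failures are reported, mirroring the `ret` flag in the original test bench.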
C Simulation
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
vitis-run.bat --mode hls --csim --config %PATH%\hls_test\hls_config.cfg --work_dir hls_test
****** vitis-run v2025.2.1 (64-bit)
**** SW Build 6397637 on 2026-03-13-19:08:23
**** Start of session at: Fri Apr 24 13:36:33 2026
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
** Copyright 2022-2026 Advanced Micro Devices, Inc. All Rights Reserved.
**** HLS Build v2025.2.1 6397637
INFO: [HLS 200-2005] Using work_dir %PATH%/hls_test/hls_test
INFO: [HLS 200-2176] Writing Vitis IDE component file %PATH%/hls_test/hls_test/vitis-comp.json
INFO: [HLS 200-10] Creating and opening component '%PATH%/hls_test/hls_test'.
INFO: [HLS 200-1505] Using default flow_target 'vivado'
Resolution: For help on HLS 200-1505 see docs.amd.com/access/sources/dita/topic?Doc_Version=2025.2%20English&url=ug1448-hls-guidance&resourceid=200-1505.html
INFO: [HLS 200-2174] Applying component config ini file hls_config.cfg
INFO: [HLS 200-1465] Applying config ini 'syn.file=arr_sum.cpp' from hls_config.cfg(11)
INFO: [HLS 200-10] Adding design file '%PATH%/hls_test/arr_sum.cpp' to the project
INFO: [HLS 200-1465] Applying config ini 'syn.file=arr_sum.h' from hls_config.cfg(12)
INFO: [HLS 200-10] Adding design file '%PATH%/hls_test/arr_sum.h' to the project
INFO: [HLS 200-1465] Applying config ini 'tb.file=test_arr_sum.cpp' from hls_config.cfg(13)
INFO: [HLS 200-10] Adding test bench file '%PATH%/hls_test/test_arr_sum.cpp' to the project
INFO: [HLS 200-1465] Applying config ini 'syn.top=arr_sum4' from hls_config.cfg(9)
INFO: [HLS 200-1465] Applying config ini 'flow_target=vivado' from hls_config.cfg(4)
INFO: [HLS 200-1505] Using flow_target 'vivado'
Resolution: For help on HLS 200-1505 see docs.amd.com/access/sources/dita/topic?Doc_Version=2025.2%20English&url=ug1448-hls-guidance&resourceid=200-1505.html
INFO: [HLS 200-1465] Applying config ini 'part=xc7z007sclg400-1' from hls_config.cfg(1)
INFO: [HLS 200-1611] Setting target device to 'xc7z007s-clg400-1'
INFO: [HLS 200-1465] Applying config ini 'clock=5ns' from hls_config.cfg(7)
INFO: [SYN 201-201] Setting up clock 'default' with a period of 5ns.
INFO: [HLS 200-1465] Applying config ini 'clock_uncertainty=25%' from hls_config.cfg(8)
INFO: [SYN 201-201] Setting up clock 'default' with an uncertainty of 1.25ns.
INFO: [HLS 200-1465] Applying config ini 'cosim.rtl=vhdl' from hls_config.cfg(10)
INFO: [HLS 200-1465] Applying config ini 'package.output.format=ip_catalog' from hls_config.cfg(5)
INFO: [HLS 200-2176] Writing Vitis IDE component file %PATH%/hls_test/hls_test/vitis-comp.json
INFO: [SIM 211-2] *************** CSIM start ***************
INFO: [HLS 200-2191] C-Simulation will use clang-16 as the compiler
INFO: [HLS 200-2036] Building debug C Simulation binaries
Compiling ../../../../arr_sum.cpp in debug mode
Generating csim.exe
res = 4
Pass
res = 100
Pass
res = 84
Pass
INFO: [SIM 211-1] CSim done with 0 errors.
INFO: [SIM 211-3] *************** CSIM finish ***************
INFO: [HLS 200-112] Total CPU user time: 1 seconds. Total CPU system time: 1 seconds. Total elapsed time: 2.976 seconds; peak allocated memory: 168.957 MB.
INFO: [vitis-run 60-791] Total elapsed time: 0h 0m 6s
C-simulation finished successfully
Initial C Synthesis and C/RTL Simulation
Skipping the details of the initial synthesis for a moment: after synthesis it is wise to run a C/RTL co-simulation to check that nothing has been broken in translation. Although the design had passed C simulation, the co-simulation failed.
INFO: [COSIM 212-302] Starting C TB testing ...
res = -1034531568
Fail
res = -1034531559
Fail
res = -1034531564
Fail
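The VHDL code produced did not make sense; it could never have worked. After puzzling for a while, I amended the C code as follows to put an array size into the function prototype. Presumably, without a declared size the tool cannot tell how many elements the array interface must address.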
int arr_sum4 (int a[4]) {
  return arr_sum(a, 4);
}
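For an ordinary C++ compiler the two prototypes are the same thing, which is consistent with C simulation passing before the change: an array parameter is adjusted to a pointer and the bound is discarded. The sketch below demonstrates this with standard type traits. The function names are hypothetical, and how Vitis HLS itself uses the bound is not shown here.

```cpp
#include <type_traits>

// Two hypothetical prototypes that differ only in the declared array bound.
int sum_unsized(int a[]);
int sum_sized(int a[4]);

// Both parameters are adjusted to 'int *', so the two function types are
// identical as far as a software compiler is concerned.
static_assert(std::is_same<decltype(sum_unsized), decltype(sum_sized)>::value,
              "array parameters are adjusted to pointers");

// Inside the callee the parameter really is a pointer: the '4' is not part
// of its type, so the compiled software never sees the bound.
bool param_is_pointer(int a[4]) {
    return std::is_same<decltype(a), int *>::value;
}
```

The bound is therefore only useful to a tool that reads the declaration itself, rather than the compiled type.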
Amended C Synthesis
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
v++.bat -c --mode hls --config %PATH%\hls_test\hls_config.cfg --work_dir hls_test
****** v++ v2025.2.1 (64-bit)
**** SW Build 6397637 on 2026-03-13-19:08:23
**** Start of session at: Fri Apr 24 13:47:03 2026
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
** Copyright 2022-2026 Advanced Micro Devices, Inc. All Rights Reserved.
**** HLS Build v2025.2.1 6397637
INFO: [HLS 200-2005] Using work_dir %PATH%/hls_test/hls_test
INFO: [HLS 200-2176] Writing Vitis IDE component file %PATH%/hls_test/hls_test/vitis-comp.json
INFO: [HLS 200-10] Creating and opening component '%PATH%/hls_test/hls_test'.
INFO: [HLS 200-1505] Using default flow_target 'vivado'
Resolution: For help on HLS 200-1505 see docs.amd.com/access/sources/dita/topic?Doc_Version=2025.2%20English&url=ug1448-hls-guidance&resourceid=200-1505.html
INFO: [HLS 200-2174] Applying component config ini file hls_config.cfg
INFO: [HLS 200-1465] Applying config ini 'syn.file=arr_sum.cpp' from hls_config.cfg(11)
INFO: [HLS 200-10] Adding design file '%PATH%/hls_test/arr_sum.cpp' to the project
INFO: [HLS 200-1465] Applying config ini 'syn.file=arr_sum.h' from hls_config.cfg(12)
INFO: [HLS 200-10] Adding design file '%PATH%/hls_test/arr_sum.h' to the project
INFO: [HLS 200-1465] Applying config ini 'tb.file=test_arr_sum.cpp' from hls_config.cfg(13)
INFO: [HLS 200-10] Adding test bench file '%PATH%/hls_test/test_arr_sum.cpp' to the project
INFO: [HLS 200-1465] Applying config ini 'syn.top=arr_sum4' from hls_config.cfg(9)
INFO: [HLS 200-1465] Applying config ini 'flow_target=vivado' from hls_config.cfg(4)
INFO: [HLS 200-1505] Using flow_target 'vivado'
Resolution: For help on HLS 200-1505 see docs.amd.com/access/sources/dita/topic?Doc_Version=2025.2%20English&url=ug1448-hls-guidance&resourceid=200-1505.html
INFO: [HLS 200-1465] Applying config ini 'part=xc7z007sclg400-1' from hls_config.cfg(1)
INFO: [HLS 200-1611] Setting target device to 'xc7z007s-clg400-1'
INFO: [HLS 200-1465] Applying config ini 'clock=5ns' from hls_config.cfg(7)
INFO: [SYN 201-201] Setting up clock 'default' with a period of 5ns.
INFO: [HLS 200-1465] Applying config ini 'clock_uncertainty=25%' from hls_config.cfg(8)
INFO: [SYN 201-201] Setting up clock 'default' with an uncertainty of 1.25ns.
INFO: [HLS 200-1465] Applying config ini 'cosim.rtl=vhdl' from hls_config.cfg(10)
INFO: [HLS 200-1465] Applying config ini 'package.output.format=ip_catalog' from hls_config.cfg(5)
INFO: [HLS 200-2176] Writing Vitis IDE component file %PATH%/hls_test/hls_test/vitis-comp.json
INFO: [HLS 200-111] Finished File checks and directory preparation: CPU user time: 1 seconds. CPU system time: 1 seconds. Elapsed time: 1.731 seconds; current allocated memory: 168.418 MB.
INFO: [HLS 200-2191] C-Synthesis will use clang-16 as the compiler
INFO: [HLS 200-10] Analyzing design file 'arr_sum.cpp' ...
INFO: [HLS 200-111] Finished Source Code Analysis and Preprocessing: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.728 seconds; current allocated memory: 171.070 MB.
INFO: [HLS 200-777] Using interface defaults for 'Vivado' flow target.
INFO: [HLS 200-1995] There were 17 instructions in the design after the 'Compile/Link' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 16 instructions in the design after the 'Unroll/Inline (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 13 instructions in the design after the 'Unroll/Inline (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Unroll/Inline (step 3)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Unroll/Inline (step 4)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 3)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 4)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 11 instructions in the design after the 'Array/Struct (step 5)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 11 instructions in the design after the 'Performance (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Performance (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Performance (step 3)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Performance (step 4)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 12 instructions in the design after the 'HW Transforms (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 17 instructions in the design after the 'HW Transforms (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 214-178] Inlining function 'arr_sum(int*, unsigned int)' into 'arr_sum4(int*)' (arr_sum.cpp:13:0)
INFO: [HLS 214-376] Pipelining loop< VITIS_LOOP_6_1> at arr_sum.cpp:6:21 due to pipeline_loops threshold
INFO: [HLS 200-111] Finished Compiling Optimization and Transform: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 7.596 seconds; current allocated memory: 172.891 MB.
INFO: [HLS 200-111] Finished Checking Pragmas: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.003 seconds; current allocated memory: 172.891 MB.
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-111] Finished Standard Transforms: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.009 seconds; current allocated memory: 176.871 MB.
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.009 seconds; current allocated memory: 178.062 MB.
INFO: [HLS 200-111] Finished Loop, function and other optimizations: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.024 seconds; current allocated memory: 198.754 MB.
INFO: [HLS 200-111] Finished Architecture Synthesis: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.015 seconds; current allocated memory: 198.758 MB.
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'arr_sum4' ...
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'arr_sum4'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'VITIS_LOOP_6_1'.
WARNING: [HLS 200-880] The II Violation in module 'arr_sum4' (loop 'VITIS_LOOP_6_1'): Unable to enforce a carried dependence constraint (II = 1, distance = 1, offset = 1) between 'store' operation ('b_write_ln2', arr_sum.cpp:2->arr_sum.cpp:14) of variable 'b', arr_sum.cpp:7->arr_sum.cpp:14 32 bit on local variable 'b', arr_sum.cpp:2->arr_sum.cpp:14 and 'add' operation 32 bit ('b', arr_sum.cpp:7->arr_sum.cpp:14).
Resolution: For help on HLS 200-880 see docs.amd.com/access/sources/dita/topic?Doc_Version=2025.2%20English&url=ug1448-hls-guidance&resourceid=200-880.html
INFO: [HLS 200-1470] Pipelining result : Target II = NA, Final II = 2, Depth = 3, loop 'VITIS_LOOP_6_1'
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111] Finished Scheduling: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.125 seconds; current allocated memory: 198.758 MB.
INFO: [HLS 200-2250] Rewind delay = 1 for the pipelined loop 'VITIS_LOOP_6_1' due to a read-after-write dependence on variable 'i (arr_sum.cpp:6)'.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111] Finished Binding: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.013 seconds; current allocated memory: 198.758 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'arr_sum4'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'arr_sum4/a' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on function 'arr_sum4' to 'ap_ctrl_hs'.
INFO: [HLS 200-1030] Apply Unified Pipeline Control on module 'arr_sum4' pipeline 'VITIS_LOOP_6_1' pipeline type 'loop pipeline'
INFO: [RTGEN 206-100] Finished creating RTL model for 'arr_sum4'.
INFO: [HLS 200-111] Finished Creating RTL model: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.023 seconds; current allocated memory: 199.582 MB.
INFO: [HLS 200-111] Finished Generating all RTL models: CPU user time: 1 seconds. CPU system time: 0 seconds. Elapsed time: 0.174 seconds; current allocated memory: 203.148 MB.
INFO: [HLS 200-2225] Wrote inferred directives to file %PATH%/hls_test/hls_test/hls/syn/inferred_directives.ini
INFO: [HLS 200-111] Finished Updating report files: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 1.611 seconds; current allocated memory: 203.867 MB.
INFO: [VHDL 208-304] Generating VHDL RTL for arr_sum4.
INFO: [VLOG 209-307] Generating Verilog RTL for arr_sum4.
INFO: [HLS 200-790] **** Loop Constraint Status: All loop constraints were NOT satisfied.
INFO: [HLS 200-789] **** Estimated Fmax: 317.16 MHz
INFO: [HLS 200-112] Total CPU user time: 2 seconds. Total CPU system time: 1 seconds. Total elapsed time: 12.076 seconds; peak allocated memory: 203.949 MB.
INFO: [v++ 60-791] Total elapsed time: 0h 0m 15s
Synthesis finished successfully, open report.
From this report it is worth pulling out two lines as shown below:
INFO: [RTGEN 206-500] Setting interface mode on port 'arr_sum4/a' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on function 'arr_sum4' to 'ap_ctrl_hs'.
The input array a is mapped to a RAM, and the interface ports are set up accordingly. The generated VHDL entity is shown below, with a_address0 the address to the RAM and a_q0 the returned data. The implementation therefore follows the sequential loop counter in the software, reading one element at a time.
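As a rough software picture of what such a memory interface implies, the sketch below models a single-port RAM whose read data appears one clock after the address, feeding an accumulator that consumes one word per read. It is not cycle-accurate to the generated RTL. `RamModel` and `sum_via_ram` are hypothetical names, and the two-clocks-per-element cadence is an assumption taken from the "Final II = 2" line in the synthesis log.

```cpp
// Hypothetical model of a synchronous single-port RAM: the word addressed in
// one clock cycle is available on q in the following cycle.
struct RamModel {
    int mem[4];
    int q;  // registered read data, playing the role of a_q0

    void clock(unsigned address, bool ce) {
        if (ce) {
            q = mem[address & 3u];  // 2-bit address, like a_address0
        }
    }
};

// Sums the RAM contents one word at a time and counts the clocks used.
int sum_via_ram(RamModel &ram, unsigned &cycles) {
    int b = 0;
    cycles = 0;
    for (unsigned i = 0; i < 4; i++) {
        ram.clock(i, true);  // first clock: present address i with chip enable
        cycles++;
        b += ram.q;          // second clock: add the word returned on q
        cycles++;
    }
    return b;
}
```

In this model the four elements can only be reached one after another through the single address port, which is why the adder cannot be fed in parallel.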
-- ==============================================================
-- Generated by Vitis HLS v2025.2.1
-- Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
-- Copyright 2022-2026 Advanced Micro Devices, Inc. All Rights Reserved.
-- ==============================================================
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity arr_sum4 is
port (
ap_clk : IN STD_LOGIC;
ap_rst : IN STD_LOGIC;
ap_start : IN STD_LOGIC;
ap_done : OUT STD_LOGIC;
ap_idle : OUT STD_LOGIC;
ap_ready : OUT STD_LOGIC;
a_address0 : OUT STD_LOGIC_VECTOR (1 downto 0);
a_ce0 : OUT STD_LOGIC;
a_q0 : IN STD_LOGIC_VECTOR (31 downto 0);
ap_return : OUT STD_LOGIC_VECTOR (31 downto 0) );
end;
architecture behav of arr_sum4 is
attribute DowngradeIPIdentifiedWarnings : STRING;
attribute DowngradeIPIdentifiedWarnings of behav : architecture is "yes";
attribute CORE_GENERATION_INFO : STRING;
attribute CORE_GENERATION_INFO of behav : architecture is
"arr_sum4_arr_sum4,hls_ip_2025_2_1,{HLS_INPUT_TYPE=cxx,HLS_INPUT_FLOAT=0,HLS_INPUT_FIXED=0,HLS_INPUT_PART=xc7z007s-clg400-1,HLS_INPUT_CLOCK=5.000000,HLS_INPUT_ARCH=others,HLS_SYN_CLOCK=3.153000,HLS_SYN_LAT=10,HLS_SYN_TPT=none,HLS_SYN_MEM=0,HLS_SYN_DSP=0,HLS_SYN_FF=75,HLS_SYN_LUT=138,HLS_VERSION=2025_2_1}";
constant ap_const_logic_1 : STD_LOGIC := '1';
constant ap_const_logic_0 : STD_LOGIC := '0';
constant ap_ST_fsm_pp0_stage0 : STD_LOGIC_VECTOR (1 downto 0) := "01";
constant ap_ST_fsm_pp0_stage1 : STD_LOGIC_VECTOR (1 downto 0) := "10";
constant ap_const_lv32_0 : STD_LOGIC_VECTOR (31 downto 0) := "00000000000000000000000000000000";
constant ap_const_boolean_1 : BOOLEAN := true;
constant ap_const_lv32_1 : STD_LOGIC_VECTOR (31 downto 0) := "00000000000000000000000000000001";
constant ap_const_boolean_0 : BOOLEAN := false;
constant ap_const_lv1_1 : STD_LOGIC_VECTOR (0 downto 0) := "1";
constant ap_const_lv2_0 : STD_LOGIC_VECTOR (1 downto 0) := "00";
constant ap_const_lv2_3 : STD_LOGIC_VECTOR (1 downto 0) := "11";
constant ap_const_lv2_1 : STD_LOGIC_VECTOR (1 downto 0) := "01";
signal ap_CS_fsm : STD_LOGIC_VECTOR (1 downto 0) := "01";
attribute fsm_encoding : string;
attribute fsm_encoding of ap_CS_fsm : signal is "none";
signal ap_CS_fsm_pp0_stage0 : STD_LOGIC;
attribute fsm_encoding of ap_CS_fsm_pp0_stage0 : signal is "none";
signal ap_enable_reg_pp0_iter0 : STD_LOGIC;
signal ap_enable_reg_pp0_iter1 : STD_LOGIC := '0';
signal ap_idle_pp0 : STD_LOGIC;
signal ap_CS_fsm_pp0_stage1 : STD_LOGIC;
attribute fsm_encoding of ap_CS_fsm_pp0_stage1 : signal is "none";
signal ap_block_pp0_stage1_subdone : BOOLEAN;
signal ap_enable_reg_pp0_iter0_reg : STD_LOGIC := '0';
signal icmp_ln6_reg_130 : STD_LOGIC_VECTOR (0 downto 0);
signal ap_condition_exit_pp0_iter0_stage1 : STD_LOGIC;
signal ap_loop_exit_ready : STD_LOGIC;
signal ap_ready_int : STD_LOGIC;
signal i_1_reg_120 : STD_LOGIC_VECTOR (1 downto 0);
signal ap_block_pp0_stage0_11001 : BOOLEAN;
signal icmp_ln6_fu_77_p2 : STD_LOGIC_VECTOR (0 downto 0);
signal b_3_fu_86_p2 : STD_LOGIC_VECTOR (31 downto 0);
signal b_3_reg_134 : STD_LOGIC_VECTOR (31 downto 0);
signal ap_block_pp0_stage1_11001 : BOOLEAN;
signal ap_block_pp0_stage0_subdone : BOOLEAN;
signal zext_ln6_fu_72_p1 : STD_LOGIC_VECTOR (63 downto 0);
signal ap_block_pp0_stage0 : BOOLEAN;
signal i_fu_38 : STD_LOGIC_VECTOR (1 downto 0) := "00";
signal i_3_fu_92_p2 : STD_LOGIC_VECTOR (1 downto 0);
signal ap_loop_init : STD_LOGIC;
signal ap_sig_allocacmp_i_1 : STD_LOGIC_VECTOR (1 downto 0);
signal b_fu_42 : STD_LOGIC_VECTOR (31 downto 0) := "00000000000000000000000000000000";
signal ap_block_pp0_stage1 : BOOLEAN;
signal a_ce0_local : STD_LOGIC;
signal ap_loop_exit_ready_pp0_iter1_reg : STD_LOGIC;
signal ap_condition_exit_pp0_iter1_stage0 : STD_LOGIC;
signal ap_idle_pp0_0to0 : STD_LOGIC;
signal ap_done_reg : STD_LOGIC := '0';
signal ap_continue_int : STD_LOGIC;
signal ap_done_int : STD_LOGIC;
signal ap_NS_fsm : STD_LOGIC_VECTOR (1 downto 0);
signal ap_enable_pp0 : STD_LOGIC;
signal ap_start_int : STD_LOGIC;
signal ap_ready_sig : STD_LOGIC;
signal ap_done_sig : STD_LOGIC;
signal ap_ce_reg : STD_LOGIC;
component arr_sum4_flow_control_loop_pipe IS
port (
ap_clk : IN STD_LOGIC;
ap_rst : IN STD_LOGIC;
ap_start : IN STD_LOGIC;
ap_ready : OUT STD_LOGIC;
ap_done : OUT STD_LOGIC;
ap_start_int : OUT STD_LOGIC;
ap_loop_init : OUT STD_LOGIC;
ap_ready_int : IN STD_LOGIC;
ap_loop_exit_ready : IN STD_LOGIC;
ap_loop_exit_done : IN STD_LOGIC;
ap_continue_int : OUT STD_LOGIC;
ap_done_int : IN STD_LOGIC;
ap_continue : IN STD_LOGIC );
end component;
begin
flow_control_loop_pipe_U : component arr_sum4_flow_control_loop_pipe
port map (
ap_clk => ap_clk,
ap_rst => ap_rst,
ap_start => ap_start,
ap_ready => ap_ready_sig,
ap_done => ap_done_sig,
ap_start_int => ap_start_int,
ap_loop_init => ap_loop_init,
ap_ready_int => ap_ready_int,
ap_loop_exit_ready => ap_condition_exit_pp0_iter0_stage1,
ap_loop_exit_done => ap_done_int,
ap_continue_int => ap_continue_int,
ap_done_int => ap_done_int,
ap_continue => ap_const_logic_1);
ap_CS_fsm_assign_proc : process(ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (ap_rst = '1') then
ap_CS_fsm <= ap_ST_fsm_pp0_stage0;
else
ap_CS_fsm <= ap_NS_fsm;
end if;
end if;
end process;
ap_done_reg_assign_proc : process(ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (ap_rst = '1') then
ap_done_reg <= ap_const_logic_0;
else
if ((ap_continue_int = ap_const_logic_1)) then
ap_done_reg <= ap_const_logic_0;
elsif (((ap_const_boolean_0 = ap_block_pp0_stage0_subdone) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0) and (ap_loop_exit_ready_pp0_iter1_reg = ap_const_logic_1))) then
ap_done_reg <= ap_const_logic_1;
end if;
end if;
end if;
end process;
ap_enable_reg_pp0_iter0_reg_assign_proc : process(ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (ap_rst = '1') then
ap_enable_reg_pp0_iter0_reg <= ap_const_logic_0;
else
if ((ap_const_logic_1 = ap_CS_fsm_pp0_stage0)) then
ap_enable_reg_pp0_iter0_reg <= ap_start_int;
end if;
end if;
end if;
end process;
ap_enable_reg_pp0_iter1_assign_proc : process(ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (ap_rst = '1') then
ap_enable_reg_pp0_iter1 <= ap_const_logic_0;
else
if (((ap_const_boolean_0 = ap_block_pp0_stage0_subdone) and (ap_enable_reg_pp0_iter1 = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0))) then
ap_enable_reg_pp0_iter1 <= ap_const_logic_0;
elsif (((ap_const_boolean_0 = ap_block_pp0_stage1_subdone) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage1))) then
ap_enable_reg_pp0_iter1 <= ap_enable_reg_pp0_iter0;
end if;
end if;
end if;
end process;
ap_loop_exit_ready_pp0_iter1_reg_assign_proc : process (ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if ((((ap_loop_exit_ready = ap_const_logic_0) and (ap_const_boolean_0 = ap_block_pp0_stage0_subdone) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0)) or ((ap_const_logic_1 = ap_condition_exit_pp0_iter1_stage0) and (ap_idle_pp0_0to0 = ap_const_logic_1)))) then
ap_loop_exit_ready_pp0_iter1_reg <= ap_const_logic_0;
elsif (((ap_const_boolean_0 = ap_block_pp0_stage1_11001) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage1))) then
ap_loop_exit_ready_pp0_iter1_reg <= ap_loop_exit_ready;
end if;
end if;
end process;
b_fu_42_assign_proc : process (ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (((ap_const_boolean_0 = ap_block_pp0_stage0_11001) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0))) then
if (((ap_loop_init = ap_const_logic_1) and (ap_enable_reg_pp0_iter0 = ap_const_logic_1))) then
b_fu_42 <= ap_const_lv32_0;
elsif ((ap_enable_reg_pp0_iter1 = ap_const_logic_1)) then
b_fu_42 <= b_3_reg_134;
end if;
end if;
end if;
end process;
i_fu_38_assign_proc : process (ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (((ap_loop_init = ap_const_logic_1) and (ap_const_boolean_0 = ap_block_pp0_stage0_11001) and (ap_enable_reg_pp0_iter0 = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0))) then
i_fu_38 <= ap_const_lv2_0;
elsif (((ap_enable_reg_pp0_iter0_reg = ap_const_logic_1) and (ap_const_boolean_0 = ap_block_pp0_stage1_11001) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage1))) then
i_fu_38 <= i_3_fu_92_p2;
end if;
end if;
end process;
process (ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (((ap_enable_reg_pp0_iter0_reg = ap_const_logic_1) and (ap_const_boolean_0 = ap_block_pp0_stage1_11001) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage1))) then
b_3_reg_134 <= b_3_fu_86_p2;
end if;
end if;
end process;
process (ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (((ap_const_boolean_0 = ap_block_pp0_stage0_11001) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0))) then
i_1_reg_120 <= ap_sig_allocacmp_i_1;
icmp_ln6_reg_130 <= icmp_ln6_fu_77_p2;
end if;
end if;
end process;
ap_NS_fsm_assign_proc : process (ap_CS_fsm, ap_idle_pp0, ap_block_pp0_stage1_subdone, ap_block_pp0_stage0_subdone, ap_condition_exit_pp0_iter1_stage0, ap_idle_pp0_0to0)
begin
case ap_CS_fsm is
when ap_ST_fsm_pp0_stage0 =>
if (((ap_const_logic_1 = ap_condition_exit_pp0_iter1_stage0) and (ap_idle_pp0_0to0 = ap_const_logic_1))) then
ap_NS_fsm <= ap_ST_fsm_pp0_stage0;
elsif (((ap_const_boolean_0 = ap_block_pp0_stage0_subdone) and (ap_idle_pp0 = ap_const_logic_0))) then
ap_NS_fsm <= ap_ST_fsm_pp0_stage1;
else
ap_NS_fsm <= ap_ST_fsm_pp0_stage0;
end if;
when ap_ST_fsm_pp0_stage1 =>
if ((ap_const_boolean_0 = ap_block_pp0_stage1_subdone)) then
ap_NS_fsm <= ap_ST_fsm_pp0_stage0;
else
ap_NS_fsm <= ap_ST_fsm_pp0_stage1;
end if;
when others =>
ap_NS_fsm <= "XX";
end case;
end process;
a_address0 <= zext_ln6_fu_72_p1(2 - 1 downto 0);
a_ce0 <= a_ce0_local;
a_ce0_local_assign_proc : process(ap_CS_fsm_pp0_stage0, ap_enable_reg_pp0_iter0, ap_block_pp0_stage0_11001)
begin
if (((ap_const_boolean_0 = ap_block_pp0_stage0_11001) and (ap_enable_reg_pp0_iter0 = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0))) then
a_ce0_local <= ap_const_logic_1;
else
a_ce0_local <= ap_const_logic_0;
end if;
end process;
ap_CS_fsm_pp0_stage0 <= ap_CS_fsm(0);
ap_CS_fsm_pp0_stage1 <= ap_CS_fsm(1);
ap_block_pp0_stage0 <= not((ap_const_boolean_1 = ap_const_boolean_1));
ap_block_pp0_stage0_11001 <= not((ap_const_boolean_1 = ap_const_boolean_1));
ap_block_pp0_stage0_subdone <= not((ap_const_boolean_1 = ap_const_boolean_1));
ap_block_pp0_stage1 <= not((ap_const_boolean_1 = ap_const_boolean_1));
ap_block_pp0_stage1_11001 <= not((ap_const_boolean_1 = ap_const_boolean_1));
ap_block_pp0_stage1_subdone <= not((ap_const_boolean_1 = ap_const_boolean_1));
ap_condition_exit_pp0_iter0_stage1_assign_proc : process(ap_CS_fsm_pp0_stage1, ap_block_pp0_stage1_subdone, ap_enable_reg_pp0_iter0_reg, icmp_ln6_reg_130)
begin
if (((icmp_ln6_reg_130 = ap_const_lv1_1) and (ap_enable_reg_pp0_iter0_reg = ap_const_logic_1) and (ap_const_boolean_0 = ap_block_pp0_stage1_subdone) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage1))) then
ap_condition_exit_pp0_iter0_stage1 <= ap_const_logic_1;
else
ap_condition_exit_pp0_iter0_stage1 <= ap_const_logic_0;
end if;
end process;
ap_condition_exit_pp0_iter1_stage0_assign_proc : process(ap_CS_fsm_pp0_stage0, ap_enable_reg_pp0_iter1, icmp_ln6_reg_130, ap_block_pp0_stage0_subdone)
begin
if (((icmp_ln6_reg_130 = ap_const_lv1_1) and (ap_const_boolean_0 = ap_block_pp0_stage0_subdone) and (ap_enable_reg_pp0_iter1 = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0))) then
ap_condition_exit_pp0_iter1_stage0 <= ap_const_logic_1;
else
ap_condition_exit_pp0_iter1_stage0 <= ap_const_logic_0;
end if;
end process;
ap_done <= ap_done_sig;
ap_done_int_assign_proc : process(ap_CS_fsm_pp0_stage0, ap_block_pp0_stage0_subdone, ap_loop_exit_ready_pp0_iter1_reg, ap_done_reg)
begin
if (((ap_const_boolean_0 = ap_block_pp0_stage0_subdone) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0) and (ap_loop_exit_ready_pp0_iter1_reg = ap_const_logic_1))) then
ap_done_int <= ap_const_logic_1;
else
ap_done_int <= ap_done_reg;
end if;
end process;
ap_enable_pp0 <= (ap_idle_pp0 xor ap_const_logic_1);
ap_enable_reg_pp0_iter0_assign_proc : process(ap_CS_fsm_pp0_stage0, ap_enable_reg_pp0_iter0_reg, ap_start_int)
begin
if ((ap_const_logic_1 = ap_CS_fsm_pp0_stage0)) then
ap_enable_reg_pp0_iter0 <= ap_start_int;
else
ap_enable_reg_pp0_iter0 <= ap_enable_reg_pp0_iter0_reg;
end if;
end process;
ap_idle_assign_proc : process(ap_CS_fsm_pp0_stage0, ap_idle_pp0, ap_start_int)
begin
if (((ap_idle_pp0 = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0) and (ap_start_int = ap_const_logic_0))) then
ap_idle <= ap_const_logic_1;
else
ap_idle <= ap_const_logic_0;
end if;
end process;
ap_idle_pp0_assign_proc : process(ap_enable_reg_pp0_iter0, ap_enable_reg_pp0_iter1)
begin
if (((ap_enable_reg_pp0_iter1 = ap_const_logic_0) and (ap_enable_reg_pp0_iter0 = ap_const_logic_0))) then
ap_idle_pp0 <= ap_const_logic_1;
else
ap_idle_pp0 <= ap_const_logic_0;
end if;
end process;
ap_idle_pp0_0to0_assign_proc : process(ap_enable_reg_pp0_iter0)
begin
if ((ap_enable_reg_pp0_iter0 = ap_const_logic_0)) then
ap_idle_pp0_0to0 <= ap_const_logic_1;
else
ap_idle_pp0_0to0 <= ap_const_logic_0;
end if;
end process;
ap_loop_exit_ready <= ap_condition_exit_pp0_iter0_stage1;
ap_ready <= ap_ready_sig;
ap_ready_int_assign_proc : process(ap_CS_fsm_pp0_stage1, ap_block_pp0_stage1_subdone, ap_enable_reg_pp0_iter0_reg)
begin
if (((ap_enable_reg_pp0_iter0_reg = ap_const_logic_1) and (ap_const_boolean_0 = ap_block_pp0_stage1_subdone) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage1))) then
ap_ready_int <= ap_const_logic_1;
else
ap_ready_int <= ap_const_logic_0;
end if;
end process;
ap_return <= b_3_reg_134;
ap_sig_allocacmp_i_1_assign_proc : process(ap_CS_fsm_pp0_stage0, ap_enable_reg_pp0_iter0, ap_block_pp0_stage0, i_fu_38, ap_loop_init)
begin
if (((ap_loop_init = ap_const_logic_1) and (ap_const_boolean_0 = ap_block_pp0_stage0) and (ap_enable_reg_pp0_iter0 = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_pp0_stage0))) then
ap_sig_allocacmp_i_1 <= ap_const_lv2_0;
else
ap_sig_allocacmp_i_1 <= i_fu_38;
end if;
end process;
b_3_fu_86_p2 <= std_logic_vector(unsigned(a_q0) + unsigned(b_fu_42));
i_3_fu_92_p2 <= std_logic_vector(unsigned(i_1_reg_120) + unsigned(ap_const_lv2_1));
icmp_ln6_fu_77_p2 <= "1" when (ap_sig_allocacmp_i_1 = ap_const_lv2_3) else "0";
zext_ln6_fu_72_p1 <= std_logic_vector(IEEE.numeric_std.resize(unsigned(ap_sig_allocacmp_i_1),64));
end behav;
The architecture includes two adders, one for the accumulated sum (pink) and the other for the address sequence (green), plus what feels like a lot of control logic (e.g. the FSM) to manage the presentation of arguments and to flag a completed result.
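To make the two adders concrete, here is a minimal software sketch of the sequential datapath as I read it from the VHDL: a 2-bit address counter compared against 3, and an accumulator fed from the memory read port. The names `SeqSumModel`, `step` and `seq_sum4` are my own and hypothetical. This is not tool output and it is not cycle accurate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the sequential datapath; not generated by Vitis HLS.
struct SeqSumModel {
    uint8_t i = 0;   // address counter (2 bits wide in the generated RTL)
    int32_t b = 0;   // running sum register
    int reads = 0;   // number of memory reads issued so far

    // One loop iteration: read a[i], accumulate, advance the address.
    // Returns true once the last element (index 3) has been consumed.
    bool step(const int32_t a[4]) {
        const bool last = (i == 3);                // mirrors the compare against 3
        b += a[i];                                 // adder 1: accumulated sum
        i = static_cast<uint8_t>((i + 1) & 0x3);   // adder 2: address sequence
        ++reads;
        return last;
    }
};

// Runs the model to completion and optionally reports how many reads were needed.
int32_t seq_sum4(const int32_t a[4], int *reads_out = nullptr) {
    SeqSumModel m;
    while (!m.step(a)) {
    }
    if (reads_out != nullptr) {
        *reads_out = m.reads;
    }
    return m.b;
}
```

The point of the model is that one memory port means one element per step, so four inputs need four reads however simple the arithmetic is.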

C/RTL Simulation
INFO: [COSIM 212-316] Starting C post checking ...
res = 4
Pass
res = 100
Pass
res = 84
Pass
The amended software code now passes co-simulation. I have my initial design, but in my mind the efficient implementation would have a tree of 3 adders for 4 inputs. I would have coded a fully parallel addition, perhaps with some pipelining up the tree.
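As a sketch of the structure I had in mind, the tree of 3 adders can be written out explicitly in C++. The function name `arr_sum4_tree` is hypothetical and this code was not synthesised for this article. It only illustrates the shape of the arithmetic: two additions that can happen side by side, then a third that combines them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hand-written equivalent of the desired structure: (a0 + a1) + (a2 + a3).
int32_t arr_sum4_tree(const int32_t a[4]) {
    const int32_t s01 = a[0] + a[1];  // first level, adder 1
    const int32_t s23 = a[2] + a[3];  // first level, adder 2
    return s01 + s23;                 // second level, adder 3
}
```

In hardware, a pipeline register could sit between the two levels. In software the grouping only documents the intent.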
Using Pragmas
C Code
The designer can control how Vitis HLS realises the digital logic by telling it how to behave with 'directives'. The VSCode-based IDE allows the designer to select an element of the code in the "HLS Directives" pane on the right. Each element shows a "+" sign when hovered over, and clicking it opens a dialogue box with a drop-down of the pragmas that can be applied. These are the two examples I chose: ARRAY_PARTITION, intended to realise the array as registers instead of memory so that all elements can be accessed at the same time, and PIPELINE, which when applied to the function also implies completely unrolling the for loop.


The resulting code looks like this:
int arr_sum (int a[], unsigned int len) {
int b = 0;
#pragma HLS ARRAY_PARTITION variable=a complete
#pragma HLS PIPELINE
for(int i=0; i<len; i++) {
b += a[i];
}
return b;
}
int arr_sum4 (int a[4]) {
return arr_sum(a, 4);
}
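For comparison, here is a variant sketch that states the intent more explicitly, using a fixed-size array, a constant loop bound and an UNROLL pragma inside the loop body. The function name `arr_sum4_explicit` is hypothetical. I have not synthesised this version, so I make no claim that it produces the same RTL as the code above. An ordinary C++ compiler ignores the HLS pragmas, so it still runs in C simulation.

```cpp
#include <cassert>

// Sketch only: not synthesised for this article, so no results are claimed.
int arr_sum4_explicit(int a[4]) {
// Ask for every element of 'a' to become its own register or port.
#pragma HLS ARRAY_PARTITION variable=a type=complete dim=1
    int b = 0;
    for (int i = 0; i < 4; i++) {
// Ask for the loop body to be replicated once per iteration.
#pragma HLS UNROLL
        b += a[i];
    }
    return b;
}
```

Where a pragma is placed determines what it applies to: inside the function body it targets the function, inside the loop body it targets that loop.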
C Synthesis
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
v++.bat -c --mode hls --config %PATH%\hls_test\hls_config.cfg --work_dir hls_test
****** v++ v2025.2.1 (64-bit)
**** SW Build 6397637 on 2026-03-13-19:08:23
**** Start of session at: Fri Apr 24 14:40:19 2026
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
** Copyright 2022-2026 Advanced Micro Devices, Inc. All Rights Reserved.
**** HLS Build v2025.2.1 6397637
INFO: [HLS 200-2005] Using work_dir %PATH%/hls_test/hls_test
INFO: [HLS 200-2176] Writing Vitis IDE component file %PATH%/hls_test/hls_test/vitis-comp.json
INFO: [HLS 200-10] Creating and opening component '%PATH%/hls_test/hls_test'.
INFO: [HLS 200-1505] Using default flow_target 'vivado'
Resolution: For help on HLS 200-1505 see docs.amd.com/access/sources/dita/topic?Doc_Version=2025.2%20English&url=ug1448-hls-guidance&resourceid=200-1505.html
INFO: [HLS 200-2174] Applying component config ini file hls_config.cfg
INFO: [HLS 200-1465] Applying config ini 'syn.file=arr_sum.cpp' from hls_config.cfg(11)
INFO: [HLS 200-10] Adding design file '%PATH%/hls_test/arr_sum.cpp' to the project
INFO: [HLS 200-1465] Applying config ini 'syn.file=arr_sum.h' from hls_config.cfg(12)
INFO: [HLS 200-10] Adding design file '%PATH%/hls_test/arr_sum.h' to the project
INFO: [HLS 200-1465] Applying config ini 'tb.file=test_arr_sum.cpp' from hls_config.cfg(13)
INFO: [HLS 200-10] Adding test bench file '%PATH%/hls_test/test_arr_sum.cpp' to the project
INFO: [HLS 200-1465] Applying config ini 'syn.top=arr_sum4' from hls_config.cfg(9)
INFO: [HLS 200-1465] Applying config ini 'flow_target=vivado' from hls_config.cfg(4)
INFO: [HLS 200-1505] Using flow_target 'vivado'
Resolution: For help on HLS 200-1505 see docs.amd.com/access/sources/dita/topic?Doc_Version=2025.2%20English&url=ug1448-hls-guidance&resourceid=200-1505.html
INFO: [HLS 200-1465] Applying config ini 'part=xc7z007sclg400-1' from hls_config.cfg(1)
INFO: [HLS 200-1611] Setting target device to 'xc7z007s-clg400-1'
INFO: [HLS 200-1465] Applying config ini 'clock=5ns' from hls_config.cfg(7)
INFO: [SYN 201-201] Setting up clock 'default' with a period of 5ns.
INFO: [HLS 200-1465] Applying config ini 'clock_uncertainty=25%' from hls_config.cfg(8)
INFO: [SYN 201-201] Setting up clock 'default' with an uncertainty of 1.25ns.
INFO: [HLS 200-1465] Applying config ini 'cosim.rtl=vhdl' from hls_config.cfg(10)
INFO: [HLS 200-1465] Applying config ini 'package.output.format=ip_catalog' from hls_config.cfg(5)
INFO: [HLS 200-2176] Writing Vitis IDE component file %PATH%/hls_test/hls_test/vitis-comp.json
INFO: [HLS 200-111] Finished File checks and directory preparation: CPU user time: 1 seconds. CPU system time: 1 seconds. Elapsed time: 1.808 seconds; current allocated memory: 169.617 MB.
INFO: [HLS 200-2191] C-Synthesis will use clang-16 as the compiler
INFO: [HLS 200-10] Analyzing design file 'arr_sum.cpp' ...
INFO: [HLS 200-111] Finished Source Code Analysis and Preprocessing: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.833 seconds; current allocated memory: 171.832 MB.
INFO: [HLS 200-777] Using interface defaults for 'Vivado' flow target.
INFO: [HLS 200-1995] There were 18 instructions in the design after the 'Compile/Link' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 16 instructions in the design after the 'Unroll/Inline (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 13 instructions in the design after the 'Unroll/Inline (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 13 instructions in the design after the 'Unroll/Inline (step 3)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 13 instructions in the design after the 'Unroll/Inline (step 4)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 3)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 4)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Array/Struct (step 5)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Performance (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Performance (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Performance (step 3)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 10 instructions in the design after the 'Performance (step 4)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 18 instructions in the design after the 'HW Transforms (step 1)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 200-1995] There were 20 instructions in the design after the 'HW Transforms (step 2)' phase of compilation. See the Design Size Report for more details (%PATH%/hls_test/hls_test/hls/syn/report/csynth_design_size.rpt:2)
INFO: [HLS 214-291] Loop 'VITIS_LOOP_6_1' is marked as complete unroll implied by the pipeline pragma (arr_sum.cpp:6:21)
INFO: [HLS 214-186] Unrolling loop 'VITIS_LOOP_6_1' (arr_sum.cpp:6:21) in function 'arr_sum' completely with a factor of 4 (arr_sum.cpp:1:0)
INFO: [HLS 214-248] Applying array_partition to 'a': Complete partitioning on dimension 1. (arr_sum.cpp:13:0)
INFO: [HLS 200-111] Finished Compiling Optimization and Transform: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 7.462 seconds; current allocated memory: 173.871 MB.
INFO: [HLS 200-111] Finished Checking Pragmas: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.002 seconds; current allocated memory: 173.871 MB.
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-111] Finished Standard Transforms: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.009 seconds; current allocated memory: 177.438 MB.
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.004 seconds; current allocated memory: 178.723 MB.
INFO: [XFORM 203-11] Balancing expressions in function 'arr_sum' (arr_sum.cpp:5:5)...3 expression(s) balanced.
INFO: [HLS 200-111] Finished Loop, function and other optimizations: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.024 seconds; current allocated memory: 198.664 MB.
INFO: [HLS 200-111] Finished Architecture Synthesis: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.03 seconds; current allocated memory: 200.992 MB.
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'arr_sum4' ...
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'arr_sum'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining function 'arr_sum'.
INFO: [HLS 200-1470] Pipelining result : Target II = NA, Final II = 1, Depth = 2, function 'arr_sum'
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111] Finished Scheduling: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.161 seconds; current allocated memory: 204.645 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111] Finished Binding: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.011 seconds; current allocated memory: 205.664 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'arr_sum4'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111] Finished Scheduling: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.014 seconds; current allocated memory: 205.777 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111] Finished Binding: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.011 seconds; current allocated memory: 205.848 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'arr_sum'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-1030] Apply Unified Pipeline Control on module 'arr_sum' pipeline 'arr_sum' pipeline type 'function pipeline'
INFO: [RTGEN 206-100] Finished creating RTL model for 'arr_sum'.
INFO: [HLS 200-111] Finished Creating RTL model: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.019 seconds; current allocated memory: 207.254 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'arr_sum4'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'arr_sum4/a_0' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'arr_sum4/a_1' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'arr_sum4/a_2' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'arr_sum4/a_3' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on function 'arr_sum4' to 'ap_ctrl_hs'.
INFO: [RTGEN 206-100] Finished creating RTL model for 'arr_sum4'.
INFO: [HLS 200-111] Finished Creating RTL model: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 1.727 seconds; current allocated memory: 208.070 MB.
INFO: [HLS 200-111] Finished Generating all RTL models: CPU user time: 1 seconds. CPU system time: 0 seconds. Elapsed time: 0.227 seconds; current allocated memory: 210.551 MB.
INFO: [HLS 200-2225] Wrote inferred directives to file %PATH%/hls_test/hls_test/hls/syn/inferred_directives.ini
INFO: [HLS 200-111] Finished Updating report files: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0.474 seconds; current allocated memory: 212.066 MB.
INFO: [VHDL 208-304] Generating VHDL RTL for arr_sum4.
INFO: [VLOG 209-307] Generating Verilog RTL for arr_sum4.
INFO: [HLS 200-789] **** Estimated Fmax: 391.85 MHz
INFO: [HLS 200-112] Total CPU user time: 2 seconds. Total CPU system time: 1 seconds. Total elapsed time: 12.834 seconds; peak allocated memory: 212.145 MB.
INFO: [v++ 60-791] Total elapsed time: 0h 0m 16s
Synthesis finished successfully, open report.
From the report above (scroll to nearly the bottom), the input array a now forms four separate inputs (a_0 to a_3), each with an interface mode of ap_none. The block-level control protocol remains ap_ctrl_hs for passing values between blocks, which feels something like AXI-Stream in nature (with a few more signals). The newly generated code shows the separated array values on the entity and the removal of the memory interface.
-- ==============================================================
-- Generated by Vitis HLS v2025.2.1
-- Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
-- Copyright 2022-2026 Advanced Micro Devices, Inc. All Rights Reserved.
-- ==============================================================
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity arr_sum4_unrolled is
port (
ap_clk : IN STD_LOGIC;
ap_rst : IN STD_LOGIC;
ap_start : IN STD_LOGIC;
ap_done : OUT STD_LOGIC;
ap_idle : OUT STD_LOGIC;
ap_ready : OUT STD_LOGIC;
a_0 : IN STD_LOGIC_VECTOR (31 downto 0);
a_1 : IN STD_LOGIC_VECTOR (31 downto 0);
a_2 : IN STD_LOGIC_VECTOR (31 downto 0);
a_3 : IN STD_LOGIC_VECTOR (31 downto 0);
ap_return : OUT STD_LOGIC_VECTOR (31 downto 0) );
end;
architecture behav of arr_sum4_unrolled is
attribute DowngradeIPIdentifiedWarnings : STRING;
attribute DowngradeIPIdentifiedWarnings of behav : architecture is "yes";
attribute CORE_GENERATION_INFO : STRING;
attribute CORE_GENERATION_INFO of behav : architecture is
"arr_sum4_arr_sum4,hls_ip_2025_2_1,{HLS_INPUT_TYPE=cxx,HLS_INPUT_FLOAT=0,HLS_INPUT_FIXED=0,HLS_INPUT_PART=xc7z007s-clg400-1,HLS_INPUT_CLOCK=5.000000,HLS_INPUT_ARCH=others,HLS_SYN_CLOCK=2.552000,HLS_SYN_LAT=1,HLS_SYN_TPT=none,HLS_SYN_MEM=0,HLS_SYN_DSP=0,HLS_SYN_FF=68,HLS_SYN_LUT=133,HLS_VERSION=2025_2_1}";
constant ap_const_logic_1 : STD_LOGIC := '1';
constant ap_const_logic_0 : STD_LOGIC := '0';
constant ap_ST_fsm_state1 : STD_LOGIC_VECTOR (1 downto 0) := "01";
constant ap_ST_fsm_state2 : STD_LOGIC_VECTOR (1 downto 0) := "10";
constant ap_const_lv32_0 : STD_LOGIC_VECTOR (31 downto 0) := "00000000000000000000000000000000";
constant ap_const_boolean_1 : BOOLEAN := true;
constant ap_const_lv32_1 : STD_LOGIC_VECTOR (31 downto 0) := "00000000000000000000000000000001";
signal ap_CS_fsm : STD_LOGIC_VECTOR (1 downto 0) := "01";
attribute fsm_encoding : string;
attribute fsm_encoding of ap_CS_fsm : signal is "none";
signal ap_CS_fsm_state1 : STD_LOGIC;
attribute fsm_encoding of ap_CS_fsm_state1 : signal is "none";
signal grp_arr_sum_fu_52_ap_start : STD_LOGIC;
signal grp_arr_sum_fu_52_ap_done : STD_LOGIC;
signal grp_arr_sum_fu_52_ap_idle : STD_LOGIC;
signal grp_arr_sum_fu_52_ap_ready : STD_LOGIC;
signal grp_arr_sum_fu_52_ap_return : STD_LOGIC_VECTOR (31 downto 0);
signal ap_CS_fsm_state2 : STD_LOGIC;
attribute fsm_encoding of ap_CS_fsm_state2 : signal is "none";
signal ap_NS_fsm : STD_LOGIC_VECTOR (1 downto 0);
signal ap_ST_fsm_state1_blk : STD_LOGIC;
signal ap_ST_fsm_state2_blk : STD_LOGIC;
signal ap_ce_reg : STD_LOGIC;
component arr_sum4_arr_sum IS
port (
ap_clk : IN STD_LOGIC;
ap_rst : IN STD_LOGIC;
ap_start : IN STD_LOGIC;
ap_done : OUT STD_LOGIC;
ap_idle : OUT STD_LOGIC;
ap_ready : OUT STD_LOGIC;
a_0_val : IN STD_LOGIC_VECTOR (31 downto 0);
a_1_val : IN STD_LOGIC_VECTOR (31 downto 0);
a_2_val : IN STD_LOGIC_VECTOR (31 downto 0);
a_3_val : IN STD_LOGIC_VECTOR (31 downto 0);
ap_return : OUT STD_LOGIC_VECTOR (31 downto 0) );
end component;
begin
grp_arr_sum_fu_52 : component arr_sum4_arr_sum
port map (
ap_clk => ap_clk,
ap_rst => ap_rst,
ap_start => grp_arr_sum_fu_52_ap_start,
ap_done => grp_arr_sum_fu_52_ap_done,
ap_idle => grp_arr_sum_fu_52_ap_idle,
ap_ready => grp_arr_sum_fu_52_ap_ready,
a_0_val => a_0,
a_1_val => a_1,
a_2_val => a_2,
a_3_val => a_3,
ap_return => grp_arr_sum_fu_52_ap_return);
ap_CS_fsm_assign_proc : process(ap_clk)
begin
if (ap_clk'event and ap_clk = '1') then
if (ap_rst = '1') then
ap_CS_fsm <= ap_ST_fsm_state1;
else
ap_CS_fsm <= ap_NS_fsm;
end if;
end if;
end process;
ap_NS_fsm_assign_proc : process (ap_start, ap_CS_fsm, ap_CS_fsm_state1)
begin
case ap_CS_fsm is
when ap_ST_fsm_state1 =>
if (((ap_start = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_state1))) then
ap_NS_fsm <= ap_ST_fsm_state2;
else
ap_NS_fsm <= ap_ST_fsm_state1;
end if;
when ap_ST_fsm_state2 =>
ap_NS_fsm <= ap_ST_fsm_state1;
when others =>
ap_NS_fsm <= "XX";
end case;
end process;
ap_CS_fsm_state1 <= ap_CS_fsm(0);
ap_CS_fsm_state2 <= ap_CS_fsm(1);
ap_ST_fsm_state1_blk_assign_proc : process(ap_start)
begin
if ((ap_start = ap_const_logic_0)) then
ap_ST_fsm_state1_blk <= ap_const_logic_1;
else
ap_ST_fsm_state1_blk <= ap_const_logic_0;
end if;
end process;
ap_ST_fsm_state2_blk <= ap_const_logic_0;
ap_done_assign_proc : process(ap_CS_fsm_state2)
begin
if ((ap_const_logic_1 = ap_CS_fsm_state2)) then
ap_done <= ap_const_logic_1;
else
ap_done <= ap_const_logic_0;
end if;
end process;
ap_idle_assign_proc : process(ap_start, ap_CS_fsm_state1)
begin
if (((ap_start = ap_const_logic_0) and (ap_const_logic_1 = ap_CS_fsm_state1))) then
ap_idle <= ap_const_logic_1;
else
ap_idle <= ap_const_logic_0;
end if;
end process;
ap_ready_assign_proc : process(ap_CS_fsm_state2)
begin
if ((ap_const_logic_1 = ap_CS_fsm_state2)) then
ap_ready <= ap_const_logic_1;
else
ap_ready <= ap_const_logic_0;
end if;
end process;
ap_return <= grp_arr_sum_fu_52_ap_return;
grp_arr_sum_fu_52_ap_start_assign_proc : process(ap_start, ap_CS_fsm_state1)
begin
if (((ap_start = ap_const_logic_1) and (ap_const_logic_1 = ap_CS_fsm_state1))) then
grp_arr_sum_fu_52_ap_start <= ap_const_logic_1;
else
grp_arr_sum_fu_52_ap_start <= ap_const_logic_0;
end if;
end process;
end behav;
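The two-state control FSM in this wrapper can be mirrored in a few lines of C++ to show how ap_start, ap_idle, ap_ready and ap_done relate to each other. This is a hypothetical model I wrote by reading the VHDL above, and the struct name `ApCtrlHsModel` is my own. How the result is registered is an assumption: the sub-block arr_sum4_arr_sum is not shown, so I simply capture the sum on the clock edge that accepts the start.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cycle-level model of the top-level control FSM shown above.
struct ApCtrlHsModel {
    enum State { STATE1, STATE2 };
    State cs = STATE1;     // ap_CS_fsm, reset value is state1
    int32_t result = 0;    // assumed: sum registered when the start is accepted

    // Combinational outputs, derived from the current state and ap_start.
    bool ap_idle(bool ap_start) const { return !ap_start && cs == STATE1; }
    bool ap_done() const { return cs == STATE2; }
    bool ap_ready() const { return cs == STATE2; }

    // One rising clock edge.
    void clock(bool ap_start, const int32_t a[4]) {
        if (cs == STATE1 && ap_start) {
            result = (a[0] + a[1]) + (a[2] + a[3]);  // stand-in for the sub-block
            cs = STATE2;                             // start accepted
        } else {
            cs = STATE1;                             // wait, or return after done
        }
    }
};
```

In this model, asserting ap_start for one clock moves the block to state 2, where ap_done and ap_ready are both high for a single cycle before it returns to idle.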
When elaborated in Vivado, you can see the tree of 3 adders with one stage of pipelining inserted.

The course instructor went on to demonstrate how the packaged IP could be included in a Vivado block diagram of a Zynq design.
Conclusions
Have I been converted to using high level synthesis? As a hardened FPGA designer using VHDL for many years, that's going to take some persuading. The significant argument is that it is very much quicker (thousands of times faster) to simulate in software than in HDL, and that it is then easy to explore the algorithm's solution space without significant recoding. I'm not convinced the time savings are that credible when compared to the human thinking time required to hone the original algorithm. The directives then need to be applied to achieve the desired outcome, and that effort needs to be compared against thinking up front about how data moves through a design or algorithm and coding the HDL to achieve it directly.
FPGA designers might naturally code much simpler interfaces or result-passing mechanisms with far fewer handshake signals, e.g. just a "data valid" might suffice. HLS does not feel like a natural way to specify the data flow, even if software might be a simpler way to code and test an algorithm implementation. FPGA designers also need to consider the wider design, including chip I/O (whether pins or hard Ethernet macros) and I/O constraints, because in the real world not everyone uses pre-defined development boards.
But then software developers now rely on compilers rather than hand-assembling machine code, so perhaps that day will come for the majority of chip designers too?