The plan here is to simulate your design with synchronous counters, which show an easy-to-read incrementing integer, then swap each synchronous counter for a faster equivalent Linear Feedback Shift Register (LFSR). This is perhaps a moot point with modern FPGAs, which are so capable of arithmetic with DSP slices that carry chain propagation delays go unnoticed, but it remains relevant in the ASIC world.
When I first started working on VLSI, I discovered that my office had previously employed a summer student to write a C program that spat out VHDL for LFSR counters built to specification. That's how important they were to the local ASIC designs. No thought had been given to the fact that the solution could be realised entirely in VHDL. There is absolutely no need for software using the ncurses library, and no need to return to that software to request a new LFSR just because you mis-specified the counter's maximal value by one. Instead you can just change a generic value.
This is the basic architecture, very similar to polynomial division modulo 2 as it uses the Galois LFSR "one-to-many" format. The logic in the bottom half of the diagram is a verbose way of performing an equality test (I only had 2-input AND gates). There are authoritative articles across the Internet explaining the mathematical theory behind LFSRs, for example why they don't immediately provide a full-range 0..(2^n-1) counter for n bits of state unless additional action is taken to enter and leave the missing state (all '0's). So the theory will be omitted here; the Wikipedia link is a good start for references, or skip to Tutorial: Linear Feedback Shift Registers (LFSRs) by Max Maxfield. Here I'll just provide the code that allows a synchronous counter to be safely swapped for an LFSR.
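To make the "one-to-many" idea concrete, here is a minimal simulation-only sketch, not the code from this article. It assumes a 4-bit register and the polynomial x^4 + x^3 + 1; the names galois_lfsr_demo, state_t, taps and galois_step are hypothetical. The bit shifted out of the register is XORed back into several positions at once, and the process checks that the register visits 15 states and never the all-'0's one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Simulation-only demonstration, so the entity has no ports.
entity galois_lfsr_demo is
end entity galois_lfsr_demo;

architecture sim of galois_lfsr_demo is
  subtype state_t is std_logic_vector(3 downto 0);

  -- Assumed polynomial x^4 + x^3 + 1: tap k sets bit k-1 of the mask.
  constant taps : state_t := "1100";

  -- One Galois step: shift right, and if the bit shifted out was '1',
  -- XOR the tap mask into the shifted value ("one-to-many" feedback).
  function galois_step(s : state_t) return state_t is
    variable nxt : state_t;
  begin
    nxt := '0' & s(3 downto 1);
    if s(0) = '1' then
      nxt := nxt xor taps;
    end if;
    return nxt;
  end function galois_step;
begin
  check : process
    variable s      : state_t := "0001";
    variable period : natural := 0;
  begin
    -- Step until the seed value comes round again, counting the steps.
    loop
      s      := galois_step(s);
      period := period + 1;
      assert s /= "0000"
        report "reached the all-zeros lock-up state" severity failure;
      exit when s = "0001";
    end loop;

    -- A maximal-length 4-bit LFSR has 2^4 - 1 = 15 states.
    assert period = 15 report "unexpected period" severity failure;
    report "4-bit Galois LFSR period = " & integer'image(period);
    wait;
  end process check;
end architecture sim;
```

Note that galois_step maps "0000" to itself, which is why that state is both unreachable and inescapable without extra logic.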
The first aim is to provide two entities that are identical (or similar enough) that swapping between the two architectures is easy. I find using multiple architectures for a single entity cumbersome, though you would be right to point out that this is an excellent opportunity to "just use a configuration" to swap them over. However, I think it would be better in this case to use a generic and then generate the required code inside the architecture. The first step, though, is to work out how to make the LFSR count, given that it requires a different set of taps (polynomial) for each register length.
Common Interface
Synchronous Counter
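As an illustration only, a synchronous counter with this kind of interface might look like the sketch below. The entity name, the port names (clk, reset, enable, finished) and the exact meaning given to the 'max' generic are assumptions made for the example, not the interface used in this article.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_counter is
  generic (
    -- Assumed meaning: number of enabled clock cycles in one full period.
    max : positive := 50
  );
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;  -- synchronous, active high (assumption)
    enable   : in  std_logic;
    finished : out std_logic   -- high while the count sits at its terminal value
  );
end entity sync_counter;

architecture rtl of sync_counter is
  -- An integer subtype keeps the count easy to read in a waveform viewer.
  signal count : natural range 0 to max - 1 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count <= 0;
      elsif enable = '1' then
        if count = max - 1 then
          count <= 0;          -- wrap after the terminal value
        else
          count <= count + 1;  -- the carry chain lives in this adder
        end if;
      end if;
    end if;
  end process;

  finished <= '1' when count = max - 1 else '0';
end architecture rtl;
```

Only the terminal-value flag leaves the entity, which is what makes an LFSR a candidate replacement: nothing outside depends on the internal state being a binary number.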
LFSR Counter
I've separated the complication of the supporting functions out into a package to hide complexity and allow re-use of the functions over multiple LFSR counters.
Under the hood, the taps_array has been taken from the table listed in Tutorial: Linear Feedback Shift Registers (LFSRs) by Max Maxfield, who with others has also lifted them from the book "Bebop to the Boolean Boogie (An Unconventional Guide to Electronics)". VHDL constrains all elements of an array to be the same size, so they are all the size of the maximum polynomial, and are then bus ripped down to size on use. I have chosen to avoid trying to reach the normally unreachable state on the grounds that, for maximum count values of 2^n, it costs little to add one more bit to the internal counter register. Note that the LFSR can count to values beyond those you can specify with a VHDL integer generic of type positive, so the 'swapability' of this solution breaks down for count values greater than or equal to 2^31-1.
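The "same size, then rip down" idea can be sketched as follows. This is a hedged illustration, not the package from this article: the names lfsr_taps_pkg, max_width, get_taps and bits_needed are hypothetical, the table stops at 8 bits to stay short, and the tap values are commonly published maximal-length polynomials written as Galois masks rather than a copy of Max Maxfield's table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package lfsr_taps_pkg is
  -- Every table entry must have the same width, so use the widest one.
  constant max_width : positive := 8;
  subtype taps_t is std_logic_vector(max_width - 1 downto 0);
  type taps_array_t is array (2 to max_width) of taps_t;

  -- Tap k of the polynomial sets bit k-1 of the mask.
  constant taps_array : taps_array_t := (
    2 => "00000011",  -- x^2 + x + 1
    3 => "00000110",  -- x^3 + x^2 + 1
    4 => "00001100",  -- x^4 + x^3 + 1
    5 => "00010100",  -- x^5 + x^3 + 1
    6 => "00110000",  -- x^6 + x^5 + 1
    7 => "01100000",  -- x^7 + x^6 + 1
    8 => "10111000"   -- x^8 + x^6 + x^5 + x^4 + 1
  );

  -- Taps for a register of the given width, sliced down to that width.
  function get_taps(width : positive) return std_logic_vector;

  -- Smallest register width whose 2^n - 1 usable states cover 'max' states.
  function bits_needed(max : positive) return positive;
end package lfsr_taps_pkg;

package body lfsr_taps_pkg is
  function get_taps(width : positive) return std_logic_vector is
  begin
    assert width >= 2 and width <= max_width
      report "no taps stored for this register width" severity failure;
    return taps_array(width)(width - 1 downto 0);
  end function get_taps;

  function bits_needed(max : positive) return positive is
    variable n : positive := 2;
  begin
    -- The all-zeros state is never visited, hence 2^n - 1 rather than 2^n.
    while 2**n - 1 < max loop
      n := n + 1;
    end loop;
    return n;
  end function bits_needed;
end package body lfsr_taps_pkg;
```

With these hypothetical helpers, a counter for max = 16 would call bits_needed(16), get 5 rather than 4, and size its register from get_taps(5): the extra bit stands in for the logic that would otherwise be needed to enter and leave the all-'0's state.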
Performance
A very rough and ready check on clock speeds after place and route for a Xilinx xc7z007sclg225-2 part using Vivado version 2019.1.1.
| 'max' Generic | Synchronous, post Place (MHz) | Synchronous, post Route (MHz) | LFSR, post Place (MHz) | LFSR, post Route (MHz) |
|---|---|---|---|---|
| 50 | 671.1 | 677.0 | 678.0 | 685.9 |
| 300 | 509.2 | 509.2 | 612.7 | 610.9 |
| 4000 | 430.1 | 430.7 | 661.4 | 657.0 |
| 50000 | 430.3 | 410.2 | 534.5 | 561.5 |
As mentioned earlier, FPGAs do well with synchronous counters, which typically achieve clock speeds in the range 410-670 MHz here. Swapping them for LFSRs just to raise the speed of the larger counters from 410 to 560 MHz feels like a pointless exercise. As previously said, the situation is different for ASICs.
Be warned when using Vivado with this code that large enough maximum count values produce an error message:
This is just one of several irritating behaviours Vivado exhibits when a constant is assigned using a function to calculate its value. For the LFSR counter to know its terminal count value (assigned to a constant), it must iterate a function a "few" times, as the value can't be predicted. Intel's Quartus Prime does not error out in this way.
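The pattern in question looks roughly like the sketch below, which is an assumption-laden illustration rather than the code from this article. It fixes the register at 8 bits with the polynomial x^8 + x^6 + x^5 + x^4 + 1 so that it stands alone, and the names lfsr_counter, lfsr_step, terminal_state, seed and the ports are all hypothetical. The point to notice is that the constant 'terminal' is initialised by a loop whose iteration count grows with the 'max' generic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity lfsr_counter is
  generic (
    max : positive := 50  -- enabled clock cycles per period; at most 255 here
  );
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;  -- synchronous, active high (assumption)
    enable   : in  std_logic;
    finished : out std_logic   -- high while the register holds the terminal state
  );
end entity lfsr_counter;

architecture rtl of lfsr_counter is
  constant width : positive := 8;
  subtype state_t is std_logic_vector(width - 1 downto 0);

  constant taps : state_t := "10111000";  -- x^8 + x^6 + x^5 + x^4 + 1
  constant seed : state_t := (0 => '1', others => '0');

  -- One Galois step: shift right, XOR in the taps if a '1' was shifted out.
  function lfsr_step(s : state_t) return state_t is
    variable nxt : state_t;
  begin
    nxt := '0' & s(width - 1 downto 1);
    if s(0) = '1' then
      nxt := nxt xor taps;
    end if;
    return nxt;
  end function lfsr_step;

  -- Walk the sequence at elaboration time to find the state after 'steps' steps.
  function terminal_state(steps : natural) return state_t is
    variable s : state_t := seed;
  begin
    for i in 1 to steps loop
      s := lfsr_step(s);
    end loop;
    return s;
  end function terminal_state;

  -- This constant is what forces the tool to iterate the function.
  constant terminal : state_t := terminal_state(max - 1);

  signal reg : state_t := seed;
begin
  assert max <= 2**width - 1
    report "max exceeds the number of states of the 8-bit LFSR"
    severity failure;

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        reg <= seed;
      elsif enable = '1' then
        if reg = terminal then
          reg <= seed;            -- restart the sequence after 'max' states
        else
          reg <= lfsr_step(reg);
        end if;
      end if;
    end if;
  end process;

  finished <= '1' when reg = terminal else '0';
end architecture rtl;
```

The seed counts as the first state, so the terminal state is reached after max - 1 steps and the period is 'max' enabled clock cycles. None of the loop is built as hardware; only the resulting constant is compared against the register.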
I leave it as an exercise for the reader to combine the code for each architecture into a single one with a generic to flip between the two types of counter, e.g. based on whether you are simulating or synthesising.
It is also worth noting an online tool for generating LFSR code, provided by Guy Eschemann at New Generator: Linear-Feedback Shift Registers. Some may consider this more convenient, and easier for teaching purposes.