Deriving AXI Crossbar Address Maps
AXI crossbars connect multiple master devices, which send out requests, to multiple slave devices, which serve those requests. Because AXI allows multiple master devices, the crossbar hierarchy cannot always be treated as a simple tree structure. This blog aims to illustrate how address maps are derived, along with a couple of tricks to be aware of: they are perfectly legal, but might make you look twice at what has been described. Whilst the first example may be obvious, it serves to confirm the notation used. Subsequent examples add the tricks worth knowing and show how to derive the address maps when those tricks are being played. All the examples are contrived for this exercise.
- Standard Tree Example
- Aperture Into A Wider Crossbar
- Splicing Address Space
- Practical Realities
- Generalising the Concepts
Standard Tree Example
[Diagram: masters M1 and M2 feed crossbar C1 (/32). Labelled edges:]
C1 -- 0x00000000/22 --> C2 (/22)
C1 -- 0x00700000/14 --> S1
C1 -- 0x80000000/31 --> C3 (/31)
C3 -- 0x00240000/18 --> S2
C3 -- 0x30000000/28 --> S3
C3 -- 0x40000000/29 --> S4
C3 -- 0x60000000/29 --> S5
Address space is divided up on boundaries that are powers of 2. Since there is a constraint on where address space can be sliced, I adopt a / notation borrowed from IP subnets, with one difference: here the number after the slash is the count of address bits the range spans, not the length of the prefix. Hence a /8 range is 256 individual addresses wide, covering 0x00-0xFF. That address range is then offset by a base address, e.g. 0x100/8 means the range 0x100-0x1FF.
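To make the notation concrete, here is a minimal Python sketch that expands a base/width entry into its first and last address. The function name `parse_range` is my own invention, and the alignment check is an assumption on my part (every entry in the examples below happens to be aligned to its own size).

```python
def parse_range(entry: str) -> tuple[int, int]:
    """Expand 'base/width' into an inclusive (first, last) address pair."""
    base_text, width_text = entry.split("/")
    base = int(base_text, 16)
    size = 1 << int(width_text)      # a /n range spans 2**n addresses
    if base % size:
        # Assumption: a range must start on a multiple of its own size.
        raise ValueError(f"{entry}: base is not aligned to its size")
    return base, base + size - 1


first, last = parse_range("0x100/8")
print(f"{first:#x}-{last:#x}")       # 0x100-0x1ff
```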
In the simple tree above, the first crossbar has two "master devices" (M1 & M2) on the "slave interface" of the crossbar and three "slave devices" (C2, S1 and C3) on the "master interface" of the crossbar. I found this nomenclature confusing initially, with NUM_SI in the Xilinx documentation referring to the number of master devices. The number of masters also needs to be known in order to calculate the correct ID_WIDTH, which is the number of bits required to provide a master source address for slave devices to use when responding to requests, but let's curtail that line of enquiry here.
An AXI crossbar passes requests on behalf of master devices via a table lookup. The table identifies which AXI slave port to send the request out on via an address lookup. In the above example crossbar C1 has a table with three entries, one each for C2, S1 and C3, and the address of the request is matched against the address ranges for each slave device to select the output port. (Remember these are slave devices on the "master interface" when you read Xilinx documentation.)
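The lookup can be sketched in a few lines of Python. This is only a model of the idea, not how any particular crossbar IP implements it; the tuple layout and the name `select_port` are assumptions made for illustration.

```python
# Hypothetical model of C1's table in the first example: (port, base, width).
C1_TABLE = [
    ("C2", 0x00000000, 22),
    ("S1", 0x00700000, 14),
    ("C3", 0x80000000, 31),
]


def select_port(table, address):
    """Return the first port whose range contains the address, else None."""
    for port, base, width in table:
        if base <= address < base + (1 << width):
            return port
    return None  # no table entry decodes this address


print(select_port(C1_TABLE, 0x00700010))  # S1
print(select_port(C1_TABLE, 0xB0000000))  # C3
```

Addresses that fall in the gaps between entries, such as 0x00400000, match nothing in this model.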
Moving down the hierarchy to crossbar C3 with slaves 2-5 (S2, S3, S4, S5), each Local Table Entry in C3 is offset by the base address of the C1 output port that leads to C3. C3 starts at address 0x80000000 and spans the upper half of the global address space (31 bits). Any request from M1 or M2 addressed to 0x80000000 or above will traverse through to C3 for onward routing. Therefore the global base address of each slave behind C3 is 0x80000000 plus its local offset, e.g. 0x30000000 for S3. The final global address for S3 is therefore 0xB0000000, as shown in the fully worked address table below.
Slave | C1
---|---
Local Table Entry | 0x00000000/32
Global Address Range | 0x00000000 - 0xFFFFFFFF

Slave | C2 | S1 | C3
---|---|---|---
Local Table Entry | 0x00000000/22 | 0x00700000/14 | 0x80000000/31
Global Address Range | 0x00000000 - 0x003FFFFF | 0x00700000 - 0x00703FFF | 0x80000000 - 0xFFFFFFFF

Slave | S2 | S3 | S4 | S5
---|---|---|---|---
Local Table Entry | 0x00240000/18 | 0x30000000/28 | 0x40000000/29 | 0x60000000/29
Global Address Range | 0x80240000 - 0x8027FFFF | 0xB0000000 - 0xBFFFFFFF | 0xC0000000 - 0xDFFFFFFF | 0xE0000000 - 0xFFFFFFFF
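The arithmetic behind the table can be sketched as a sum of base addresses along the path from the top crossbar to the slave. Note the assumption: this additive shortcut only holds for the simple tree, where each crossbar is exactly as wide as the range that leads to it. The function name `global_range` is hypothetical.

```python
def global_range(path):
    """path lists (base, width) entries from the top crossbar down to the slave."""
    offset = 0
    for base, _width in path[:-1]:
        offset += base               # each crossbar hop contributes its base
    base, width = path[-1]
    first = offset + base
    return first, first + (1 << width) - 1


# S3: C1 routes 0x80000000/31 to C3, and C3 routes 0x30000000/28 to S3.
first, last = global_range([(0x80000000, 31), (0x30000000, 28)])
print(f"{first:#010x} - {last:#010x}")   # 0xb0000000 - 0xbfffffff
```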
Aperture Into A Wider Crossbar
[Diagram: crossbar C1 (/32) leads to C2 (/20), which leads to C3 (/32); master M3 attaches directly to C3. Labelled edges:]
C1 -- 0x80000000/20 --> C2 (/20)
C2 -- 0x00000000/18 --> S2
C2 -- 0x00080000/19 --> C3 (/32)
M3 --> C3
C3 -- 0x00000000/30 --> S3
C3 -- 0x80000000/30 --> S4
The passage from C2 to C3 above sees a path widening from /20 to /32. My understanding is that M1 and M2 will only ever see the /19 window of C3's address space that C2 forwards out of its own /20 range, whilst M3 will see the full /32 range. This introduces the concept of "aperture". A 32-bit address from M1 in the global range 0x80080000 - 0x800FFFFF (C2's local entry 0x00080000/19) will reach C3 and pass to the port with S4. In theory, only that limited part of S4's address space will be visible. By contrast, M3 will see all the address space for both S3 and S4. This means that the visible address map depends on your starting point (master device), or vantage point. Separate address maps need to be drawn up for each vantage point. Note that master devices M1 and M2 share the same vantage point.
Slave | C1
---|---
Local Table Entry | 0x00000000/32
Global Address Range | 0x00000000 - 0xFFFFFFFF

Slave | S1 | C2
---|---|---
Local Table Entry | 0x00000000/12 | 0x80000000/20
Global Address Range | 0x00000000 - 0x00000FFF | 0x80000000 - 0x800FFFFF

Slave | S2 | C3
---|---|---
Local Table Entry | 0x00000000/18 | M3: 0x00000000/32; C2: 0x00080000/19
Global Address Range | 0x80000000 - 0x8003FFFF | M3: 0x00000000 - 0xFFFFFFFF; C2: 0x80080000 - 0x800FFFFF

Slave | S3 | S4
---|---|---
Local Table Entry | 0x00000000/30 | 0x80000000/30
Global Address Range | 0x00000000 - 0x3FFFFFFF | M3: 0x80000000 - 0xBFFFFFFF; C2: 0x80080000 - 0x800FFFFF
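The two vantage points can be explored with a small routing model. It is only a sketch under one assumption of mine: the full 32-bit address travels down unchanged, and each crossbar decodes only the low bits that fit its own width. The `CROSSBARS` layout and the `route` function are hypothetical names.

```python
# Hypothetical encoding of the aperture example:
# crossbar -> (width in bits, [(target, base, width), ...]).
CROSSBARS = {
    "C1": (32, [("S1", 0x00000000, 12), ("C2", 0x80000000, 20)]),
    "C2": (20, [("S2", 0x00000000, 18), ("C3", 0x00080000, 19)]),
    "C3": (32, [("S3", 0x00000000, 30), ("S4", 0x80000000, 30)]),
}


def route(start, address):
    """Follow an address from crossbar `start` to a slave; None if undecoded."""
    node = start
    while node in CROSSBARS:
        width, table = CROSSBARS[node]
        local = address & ((1 << width) - 1)   # assumed: only low bits decoded
        for target, base, span in table:
            if base <= local < base + (1 << span):
                node = target
                break
        else:
            return None                        # nothing in this table matched
    return node


print(route("C1", 0x80080000))  # S4, seen from M1 or M2
print(route("C3", 0x00001000))  # S3, seen from M3 only
```

Starting at C1 models the vantage point shared by M1 and M2; starting at C3 models M3.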
The example above is the regular case, where the wider crossbar reverts to the full address width. What happens if your design opens up to a /30 instead of a /32 address range, i.e. not the entire range the address can express? In the /30 case, how do 32-bit addresses map? The crossbar would cover a quarter of the available space, so which quarter are the master addresses mapped to? I don't know if this is a realistic proposition or a toy problem, but let's go with it.
Common sense says the least significant end must be anchored, so the two most significant bits of the address will be ignored. Those two bits are still passed down to the slave port; they are just not used for routing. So I favour taking the bottom 30 bits of the incoming 32-bit address and performing the routing table lookup with those. The top two bits will then be included in any subsequent /32 crossbar routing decisions on that slave. In effect, as the top two bits are ignored here, the /30 could be considered to be replicated four times, once in each quarter of the /32 address space.
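The replication can be shown with a mask. This sketch simply encodes the reasoning above (the low 30 bits drive the lookup); it is my assumption of the behaviour, not something taken from a crossbar specification, and `decoded_address` is a made-up name.

```python
WIDTH = 30
MASK = (1 << WIDTH) - 1


def decoded_address(address: int) -> int:
    """Low 30 bits a /30 crossbar would use for its lookup (assumed behaviour)."""
    return address & MASK


# The same offset placed in each quarter of the /32 space.
aliases = [(quarter << WIDTH) | 0x1234 for quarter in range(4)]
for address in aliases:
    print(f"{address:#010x} -> {decoded_address(address):#010x}")
```

All four addresses decode to the same 30-bit value, which is the sense in which the /30 appears once in every quarter.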
Splicing Address Space
Now that we've looked at what happens when a narrow crossbar meets a wider one, there's an interesting trick to play with this understanding. Here, though, we're not widening the crossbar, just not narrowing it: we go from /32 to /32 through a constrained aperture of /28 or /31.
[Diagram: crossbar C1 (/32) reaches crossbar C2 (/32) through two separate ports; master M2 attaches directly to C2. Labelled edges:]
C1 -- 0x30000000/28 --> C2 (/32)
C1 -- 0x80000000/31 --> C2
M2 --> C2
C2 -- 0x30000000/18 --> S2
C2 -- 0x80000000/30 --> S3
C2 -- 0xC0000000/30 --> S4
Consider that M1 needs to address slave S1, but wants more than half of the remaining address space to be allocated to another crossbar, C2. As crossbars may have multiple masters, there is nothing stopping two of C2's masters being the same crossbar, C1, connected through two different slave ports, each with its own address range.
Slave | C1
---|---
Local Table Entry | 0x00000000/32
Global Address Range | 0x00000000 - 0xFFFFFFFF

Slave | S1 | C2 | C2
---|---|---|---
Local Table Entry | 0x10000000/14 | 0x30000000/28 | 0x80000000/31
Global Address Range | 0x10000000 - 0x10003FFF | 0x30000000 - 0x3FFFFFFF | 0x80000000 - 0xFFFFFFFF

Slave | S2 | S3 | S4
---|---|---|---
Local Table Entry | 0x30000000/18 | 0x80000000/30 | 0xC0000000/30
Global Address Range | 0x30000000 - 0x3003FFFF | 0x80000000 - 0xBFFFFFFF | 0xC0000000 - 0xFFFFFFFF
The concept of "aperture" works here too. If the request address is in the range 0x30000000/28, it exits C1 via a different port than when the address is in the range 0x80000000/31. Both paths reach the same slave, C2, where address routing continues. Another reason to separate the address ranges like this could be to partition the bandwidth usage, keeping M2's traffic local to C2 and keeping C1's bandwidth for S1 alone.
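A short trace makes the two routes visible. The port names `port0` to `port2` are hypothetical labels of my own, and the model assumes the address passes from C1 to C2 unchanged, since both crossbars are /32.

```python
# Hypothetical tables for the splicing example.
C1_TABLE = [
    ("port0", "S1", 0x10000000, 14),
    ("port1", "C2", 0x30000000, 28),
    ("port2", "C2", 0x80000000, 31),
]
C2_TABLE = [
    ("S2", 0x30000000, 18),
    ("S3", 0x80000000, 30),
    ("S4", 0xC0000000, 30),
]


def in_range(address, base, width):
    return base <= address < base + (1 << width)


def trace(address):
    """Return (C1 exit port, final slave) for a request issued by M1."""
    for port, target, base, width in C1_TABLE:
        if in_range(address, base, width):
            if target != "C2":
                return port, target
            for slave, s_base, s_width in C2_TABLE:
                if in_range(address, s_base, s_width):
                    return port, slave
            return port, None    # reached C2, but nothing there decodes it
    return None, None            # C1 has no matching entry


print(trace(0x30000100))  # ('port1', 'S2')
print(trace(0x90000000))  # ('port2', 'S3')
```

An address such as 0x38000000 passes through the /28 aperture but lands on nothing in C2, which in this model shows up as `('port1', None)`.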
Practical Realities
Having shown how to derive the address space seen from different points of view, there is also the reality of practical implementation to consider. Just because address space should not be viewable through various apertures does not mean it isn't. It is possible for a master to send requests to a slave address that should not be addressable and to get a response. Somehow the request leaks through the various apertures, I assume only when nothing else correctly decodes the address first. There seems to be no "security" of the address space. This can lead to a reliance on addressing slave devices known to be in the address map of one master from a different master, which should not be able to see the slave device. As the transaction works, this mistaken understanding of what is supposed to be visible is not detected until the address map is reconfigured. The AMBA AXI and ACE Protocol Specification (H.c) does talk about protected and privileged access, but only in a referential and sterile way, with no real explanation of the 'what' and 'why' that you might hope to find in an accompanying tutorial.
Generalising the Concepts
The "aperture" can be considered to be the visible address space from any port. Note that a port's aperture might be wider than the range the slave device responds to, in which case the aperture is restricted to the smaller of the two ranges. At each crossbar the aperture is typically divided further, but we have also seen the address space broaden. The crossbar might serve a broader address space to other local master devices, but the aperture from the master currently being considered does not broaden. The aperture for address routing can only ever stay the same or get smaller until the request reaches a slave device that services the request and responds. Splicing in additional address map sections does not increase the size of an aperture; it adds a new aperture for additional routing.
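One way to picture the "never gets bigger" rule is as a running intersection of ranges along the path. The sketch below uses the global ranges from the aperture example; the function `narrow` is a hypothetical helper, and expressing every hop as a global range is a simplification of mine.

```python
from functools import reduce


def narrow(aperture, window):
    """Intersect two inclusive (first, last) ranges; None once nothing is visible."""
    if aperture is None:
        return None
    first = max(aperture[0], window[0])
    last = min(aperture[1], window[1])
    return (first, last) if first <= last else None


# Global ranges met by a request from M1 on its way to S4.
path_m1_to_s4 = [
    (0x00000000, 0xFFFFFFFF),   # C1 sees the full /32
    (0x80000000, 0x800FFFFF),   # C1 -> C2 entry, 0x80000000/20
    (0x80080000, 0x800FFFFF),   # C2 -> C3 entry, 0x00080000/19 seen globally
    (0x80000000, 0xBFFFFFFF),   # S4's own range inside C3
]
visible = reduce(narrow, path_m1_to_s4)
print(tuple(f"{a:#010x}" for a in visible))  # ('0x80080000', '0x800fffff')
```

Because each step takes a maximum of the lower bounds and a minimum of the upper bounds, the result can only stay the same or shrink, matching the C2 entry for S4 in the table.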
Each master device can potentially have its own address map. There is no guarantee that there is only one single true address map unless a simple tree-like structure is constructed with all masters at the apex. Once request-issuing master devices get sprinkled around the hierarchy, it is best to group them by vantage point, or to derive an individual map for each of them even if some turn out similar or identical.
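Grouping by vantage point can be as simple as grouping masters by the crossbar they plug into. That equivalence is an assumption on my part (masters on the same crossbar share a map), and the attachment list below is a hypothetical encoding of the aperture example.

```python
from collections import defaultdict

# Hypothetical attachment list: master -> crossbar it plugs into.
ATTACHED_TO = {"M1": "C1", "M2": "C1", "M3": "C3"}


def group_by_vantage(attached_to):
    """Collect masters that enter the hierarchy at the same crossbar."""
    groups = defaultdict(list)
    for master, crossbar in sorted(attached_to.items()):
        groups[crossbar].append(master)
    return dict(groups)


print(group_by_vantage(ATTACHED_TO))  # {'C1': ['M1', 'M2'], 'C3': ['M3']}
```

Each group then needs one address map, derived from its entry crossbar downwards.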