Contiguous store of bytes from multiple strided vectors (immediate index)
Contiguous store of bytes from the elements of two or four strided vector registers to memory. The memory address is generated from a 64-bit scalar base and an immediate index; the index is multiplied by the vector's in-memory size, irrespective of predication, and then added to the base address.
Inactive elements are not written to memory.
This instruction has encodings from two classes: Two registers and Four registers.
Two registers
Bits | 31-20 | 19-16 | 15 | 14 | 13 | 12-10 | 9-5 | 4 | 3 | 2-0 |
Value | 101000010110 | imm4 | 0 | 0 | 0 | PNg | Rn | T | 0 | Zt |
Field | | | | msz<1> | msz<0> | | | | N | |
if !HaveSME2() then UNDEFINED;
integer n = UInt(Rn);
integer g = UInt('1':PNg);
constant integer nreg = 2;
integer tstride = 8;
integer t = UInt(T:'0':Zt);
constant integer esize = 8;
integer offset = SInt(imm4);
Four registers
Bits | 31-20 | 19-16 | 15 | 14 | 13 | 12-10 | 9-5 | 4 | 3 | 2 | 1-0 |
Value | 101000010110 | imm4 | 1 | 0 | 0 | PNg | Rn | T | 0 | 0 | Zt |
Field | | | | msz<1> | msz<0> | | | | N | | |
if !HaveSME2() then UNDEFINED;
integer n = UInt(Rn);
integer g = UInt('1':PNg);
constant integer nreg = 4;
integer tstride = 4;
integer t = UInt(T:'00':Zt);
constant integer esize = 8;
integer offset = SInt(imm4);
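The two decode blocks above can be mirrored in a few lines of Python. The following is a minimal sketch, not an authoritative decoder: the function names (`sign_extend`, `decode_st1b_strided`) and the returned dictionary layout are hypothetical, the bit positions are read off the encoding diagrams by counting field widths, and the `HaveSME2()` check is not modelled.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_st1b_strided(word):
    """Decode a 32-bit word as ST1B (immediate index, strided registers).

    Hypothetical helper for illustration only. Field positions assumed
    from the diagrams: imm4 = [19:16], PNg = [12:10], Rn = [9:5], T = [4].
    """
    # Fixed bits: [31:20] = 101000010110, msz = [14:13] = 00, N = [3] = 0.
    if (word >> 20) & 0xFFF != 0b101000010110:
        raise ValueError("fixed bits [31:20] do not match")
    if (word >> 13) & 0b11 != 0 or (word >> 3) & 1 != 0:
        raise ValueError("msz or N bits do not match ST1B")

    four_regs = (word >> 15) & 1           # bit 15 selects the encoding class
    offset = sign_extend((word >> 16) & 0xF, 4)   # SInt(imm4)
    g = 0b1000 | ((word >> 10) & 0b111)    # UInt('1':PNg), so P8-P15
    n = (word >> 5) & 0x1F                 # UInt(Rn); 31 means SP
    t_bit = (word >> 4) & 1                # T

    if four_regs:
        if (word >> 2) & 1 != 0:
            raise ValueError("bit 2 must be 0 in the four-register class")
        nreg, tstride = 4, 4
        t = (t_bit << 4) | (word & 0b11)   # UInt(T:'00':Zt)
    else:
        nreg, tstride = 2, 8
        t = (t_bit << 4) | (word & 0b111)  # UInt(T:'0':Zt)

    # The operation adds tstride to t after each register is stored.
    zregs = [t + r * tstride for r in range(nreg)]
    return {"nreg": nreg, "tstride": tstride, "offset": offset,
            "g": g, "n": n, "t": t, "zregs": zregs}
```

For instance, with T = 1 and Zt = 5 in the two-register class, the sketch yields registers Z21 and Z29; with T = 0 and Zt = 2 in the four-register class it yields Z2, Z6, Z10 and Z14.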
<Zt3> | Is the name of the third scalable vector register Z8-Z11 or Z24-Z27 to be transferred, encoded as "T:'10':Zt". |
<Zt4> | Is the name of the fourth scalable vector register Z12-Z15 or Z28-Z31 to be transferred, encoded as "T:'11':Zt". |
<PNg> | Is the name of the governing scalable predicate register P8-P15, with predicate-as-counter encoding, encoded in the "PNg" field. |
<Xn|SP> | Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field. |
CheckStreamingSVEEnabled();
constant integer VL = CurrentVL;
constant integer PL = VL DIV 8;
constant integer elements = VL DIV esize;
constant integer mbytes = esize DIV 8;
bits(64) base;
bits(VL) src;
bits(PL) pred = P[g, PL];
bits(PL * nreg) mask = CounterToPredicate(pred<15:0>, PL * nreg);
boolean contiguous = TRUE;
boolean nontemporal = FALSE;
boolean tagchecked = n != 31;
AccessDescriptor accdesc = CreateAccDescSVE(MemOp_STORE, nontemporal, contiguous, tagchecked);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n, 64];

for r = 0 to nreg-1
    src = Z[t, VL];
    for e = 0 to elements-1
        if ElemP[mask, r * elements + e, esize] == '1' then
            bits(64) addr = base + (offset * nreg * elements + r * elements + e) * mbytes;
            Mem[addr, mbytes, accdesc] = Elem[src, e, esize];
    t = t + tstride;
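The store loop and its address arithmetic can be illustrated with a small Python model. This is a sketch under stated assumptions, not the architectural behaviour in full: the function name `st1b_strided_store` and its parameters are hypothetical, memory is modelled as a dictionary from address to byte, and `mask` is assumed to be an already expanded list with one boolean per element across all registers (the `CounterToPredicate` expansion, the stack pointer alignment checks, tag checking and faults are not modelled).

```python
MASK64 = (1 << 64) - 1


def st1b_strided_store(memory, base, offset, zregs, t, nreg, tstride,
                       mask, vl_bits):
    """Model the per-element store loop of ST1B (strided registers).

    Hypothetical helper for illustration only.
    memory : dict mapping byte address -> byte value (updated in place)
    zregs  : dict mapping Z register number -> list of byte elements
    mask   : list of nreg * elements booleans (assumed pre-expanded predicate)
    """
    esize = 8                      # byte elements
    elements = vl_bits // esize    # elements per vector
    mbytes = esize // 8            # bytes written per element (1)

    for r in range(nreg):
        src = zregs[t]
        for e in range(elements):
            # Inactive elements are not written to memory.
            if mask[r * elements + e]:
                index = offset * nreg * elements + r * elements + e
                addr = (base + index * mbytes) & MASK64  # bits(64) wraparound
                memory[addr] = src[e] & 0xFF
        t += tstride               # step to the next strided register
    return memory
```

With a vector length of 128 bits, two registers and an offset of 1, the first element of the first register lands at base + 32, because the immediate is scaled by nreg * elements * mbytes = 2 * 16 * 1 bytes. The second register's elements follow contiguously from base + 48.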
Internal version only: isa v33.53, AdvSIMD v29.11, pseudocode v2022-09_rel, sve v2022-09_rel ; Build timestamp: 2022-09-30T16:37
Copyright © 2010-2022 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.