A faster way to fill memory than LDIR (don't use it for just a few bytes, because the setup isn't worth it):
hl = start address
a = fill value
bc = length / 2 (so don't use this routine to fill an odd number of bytes; bc must not be 0 either)
Code:
        di              ; NO INTERRUPTS ALLOWED INSIDE THIS CODE: SP will point into the fill area
        ld [STACK],SP   ; save the Stack Pointer (STACK = 2-byte variable in RAM)
        add hl,bc       ; hl = start + length/2
        add hl,bc       ; hl = start + length (skip both adds if hl already holds this address)
        ld SP,hl        ; SP -> first byte after the fill area (PUSH decrements SP before writing)
        ld d,a          ; d = fill value
        ld e,a          ; de = fill value in both bytes
@@loop:
        push de         ; SP -= 2, two bytes filled, working down towards the start address
        cpi             ; used as a counter: dec bc, P/V set while bc <> 0 (the compare and inc hl are unused)
        jp PE,@@loop    ; repeat until bc = 0
        ld SP,[STACK]   ; restore the Stack Pointer (don't forget it!)
        ei              ; enable interrupts again, even if they were off on entry (don't forget it either!)
        ret             ; return
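To check what the loop actually writes, here is a small Python model of it. It is a sketch, not Z80 code: memory is a 64 KB `bytearray`, registers are plain integers, and `push_fill` is a hypothetical name. It mirrors each instruction, including two details that are easy to miss: PUSH decrements SP before storing, so the first PUSH lands on the last two bytes of the area, and CPI is only there for its `dec bc` / P/V side effect.

```python
def push_fill(mem: bytearray, hl: int, a: int, bc: int) -> int:
    """Model of the PUSH-based fill: returns how many PUSH DE were executed."""
    hl = (hl + 2 * bc) & 0xFFFF     # add hl,bc ; add hl,bc (16-bit wraparound)
    sp = hl                         # ld SP,hl
    d = e = a & 0xFF                # ld d,a ; ld e,a
    pushes = 0
    while True:
        sp = (sp - 1) & 0xFFFF      # push de: SP is decremented first,
        mem[sp] = d                 # d goes to the higher address,
        sp = (sp - 1) & 0xFFFF
        mem[sp] = e                 # e to the lower one
        pushes += 1
        hl = (hl + 1) & 0xFFFF      # cpi: inc hl (value never used) ...
        bc = (bc - 1) & 0xFFFF      # ... dec bc, P/V = 1 while bc != 0
        if bc == 0:                 # jp PE,@@loop falls through when P/V = 0
            return pushes


mem = bytearray(0x10000)                    # 64 KB address space, all zeros
print(push_fill(mem, 0xC000, 0xAA, 5))      # bc = 10 bytes / 2, prints 5
print(mem[0xBFFF:0xC00B].hex())             # 00, then ten aa bytes, then 00
```

Calling it with bc = 0 returns 65536: CPI wraps bc to 0FFFFh on the first pass, so the real routine would overwrite the whole 64 KB, including its own code and the saved stack pointer.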
Timings per loop iteration, in T-states (clock cycles):
PUSH DE = 11
CPI = 16
JP PE = 10
Total = 37 T-states per 2 bytes
Using LDIR = 21 T-states per byte (16 on the final byte) -> 42 T-states per 2 bytes
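That's not much (5 T-states saved every 2 bytes, about 12%), but of course you can unroll the loop, putting several PUSH DE before each CPI / JP PE so those two don't run after every PUSH, and gain a lot more. bc then counts loop iterations instead of byte pairs.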
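The arithmetic above, extended to an unrolled loop, as a small Python sketch. It assumes the unrolled body is n copies of `push de` followed by a single `cpi` / `jp PE,@@loop`, so each iteration fills 2n bytes and bc must be loaded with length / (2n), never 0. The timings are the plain Z80 figures listed above; setup and teardown (DI, saving and restoring SP, EI) are not counted, which is why the method only pays off on larger blocks. `tstates_per_byte` and `bc_for` are hypothetical helper names.

```python
# Loop-body costs in T-states, as listed above (plain Z80 timings).
PUSH = 11     # push de
CPI = 16      # cpi, used only as the bc counter
JP_CC = 10    # jp PE,@@loop
LDIR = 21     # ldir, per byte while it is still repeating


def tstates_per_byte(unroll: int) -> float:
    """Loop cost per filled byte with `unroll` PUSH DE before each CPI / JP PE."""
    return (unroll * PUSH + CPI + JP_CC) / (2 * unroll)


def bc_for(length: int, unroll: int) -> int:
    """Value to load into bc: CPI runs once per iteration, which fills 2 * unroll bytes."""
    bc, rest = divmod(length, 2 * unroll)
    # Reject leftovers, bc = 0 (CPI would wrap to 0FFFFh) and bc > 0FFFFh.
    if rest or not 0 < bc <= 0xFFFF:
        raise ValueError("length must be a multiple of 2 * unroll giving 1 <= bc <= 0FFFFh")
    return bc


for n in (1, 2, 4, 8):
    print(f"{n} PUSH per iteration: {tstates_per_byte(n)} T-states/byte "
          f"(LDIR: {LDIR}), bc = {bc_for(256, n)} for a 256-byte fill")
```

With 8 PUSHes per iteration the loop costs 114 T-states per 16 bytes, about 7.1 per byte against 21 for LDIR. More unrolling keeps helping, but each step gains less: the cost tends toward 11 / 2 = 5.5 T-states per byte, the cost of the PUSH alone.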