Implementing aref Operator in SBCL Common Lisp for a Custom Vector Type

If we would like to derive a custom array type, in Common Lisp, we can not inherit from some of the built-in classes. The standard forbids it: "Attempting to use defclass to define subclasses of a built-in-class signals an error. Calling slot-value on a generalized instance of a built-in class signals an error. Redefining a built-in class or using change-class to change the class of an object to or from a built-in class signals an error."

It appears that they did not intend built-in classes to be extended directly but rather to be used via composition. At least I understand it so, correct me if I am wrong. A problem arises when we create our own type which wraps a built-in vector type, but we wish to use our type as a built-in type. Consider implementing another Lisp dialect in Common Lisp. In this dialect, a string data type, which is a subtype of vector type, can have some extra properties. So I have a definition like this:

(defstruct string
 (data :type simple-string)
 (properties :type interval-tree))

One problem is I can't use the built-in element access operator to access the elements directly on an instance of my string type. Instead, I have to access individual characters by peeling out the data via (string-data instance) accessor. Something like this:

(defparameter *s1* (make-string :data "hello"))
(aref (string-data *s1*) 0)

This is doable for a newly written code, but if that particular Lisp dialect is already in use, and there is lots of code that uses the aref operator directly on its strings, then it won't work without pre-processing all files, which I certainly don't want to do. What we want is to be able to use our string type directly with the aref operator as if it were a built-in type:

(aref *s1* 0)

In other words, it is important to make the custom string type look like a built-in type. It is easily done by implementing our own operator. We have to shadow the built-in aref:

(defstruct myarray
 (data #() :type (simple-array fixnum (*))))

(defun aref (array &rest indices)
 (if (myarray-p array)
     (apply #'cl:aref (myarray-data array) indices)
     (apply #'cl:aref array indices)))

Aref is Common Lisp's array subscript operator. If we were to compare it to something more familiar, as array access in C: (aref arr 0) is C's arr[0] or *(arr + 0), whichever notation is your cup of tea. Here I use a custom vector type, instead of string, but the reasoning is the same. A problem with this is, of course, that we are doing branching on each element access. What we want is direct access to elements for performance reasons. If we branch on each element access, we will both do unnecessary checks at runtime and, probably even worse, trash the cache. Below are small tests for the illustration:

(defparameter *a1* (make-myarray
                   :data (cl:make-array 10000000 :element-type 'fixnum)))
(defparameter *a2* (cl:make-array 10000000 :element-type 'fixnum))

(defun t1 ()
 (declare (optimize (speed 3) (safety 0)))
 (let ((a (cl:make-array 10000000 :element-type 'fixnum)))
   (dotimes (i 10000000) (setf (aref a i) i))
   (aref a 0)))

(defun t2 ()
 (declare (optimize (speed 3) (safety 0)))
 (aref *a1* 0))

(defun t3 ()
 (declare (optimize (speed 3) (safety 0)))
 (cl:aref (myarray-data *a1*) 0))

Those tests are just so I have something to compile and disassemble. If we disassemble each of those functions, t2 will be much longer, compared to t1 and t3. I am not going to post it here, because of the length, but check it yourself in the repl, 116 bytes versus 39 bytes on my computer. What we want is for SBCL to emit the same assembly for t2 as for t3:

MYA> (disassemble 't3)
; disassembly for T3
; Size: 39 bytes. Origin: #x1000C17C9B                        ; T3
; 9B:       4184442469       TEST AL, [R12+105]               ; safepoint
; A0:       488B05C9FFFFFF   MOV RAX, [RIP-55]                ; '*A1*
; A7:       8B48F5           MOV ECX, [RAX-11]
; AA:       4A8B0C29         MOV RCX, [RCX+R13]
; AE:       4883F9FF         CMP RCX, -1
; B2:       480F444801       CMOVE RCX, [RAX+1]
; B7:       488B4105         MOV RAX, [RCX+5]
; BB:       488B5001         MOV RDX, [RAX+1]
; BF:       C9               LEAVE
; C0:       F8               CLC
; C1:       C3               RET

One way to do this is to install a compiler macro:

(define-compiler-macro aref (&whole form array &rest indices)
 (declare (ignore form))
 `(if (myarray-p ,array)
      (cl:aref (myarray-data ,array) ,@indices)
      (cl:aref ,array ,@indices)))

so that SBCL can rewrite our aref form before it reaches the rest of the compilation pipeline. Also, we have to tell SBCL our \*a1* is of the myarray type:

(defun t2 ()
 (declare (optimize (speed 3) (safety 0))
          (type myarray *a1*))
 (aref *a1* 0))

If we check the assembly output now:

MYA> (disassemble 't2)
; disassembly for T2
; Size: 39 byte
s. Origin: #x1000D0FB8B                        ; T2
; 8B:       4184442469       TEST AL, [R12+105]               ; safepoint
; 90:       488B05C9FFFFFF   MOV RAX, [RIP-55]                ; '*A1*
; 97:       8B48F5           MOV ECX, [RAX-11]
; 9A:       4A8B0C29         MOV RCX, [RCX+R13]
; 9E:       4883F9FF         CMP RCX, -1
; A2:       480F444801       CMOVE RCX, [RAX+1]
; A7:       488B4105         MOV RAX, [RCX+5]
; AB:       488B5001         MOV RDX, [RAX+1]
; AF:       C9               LEAVE
; B0:       F8               CLC
; B1:       C3               RET

We can also confirm with a simple benchmark:

(defun b1 ()
 (declare (optimize (speed 3) (safety 0)))
 (let ((a (cl:make-array 10000000 :element-type 'fixnum))
       last)
   (dotimes (i 10000000)
     (setf (aref a i) i
           last (aref a i)))
   last))

(defun b2 ()
 (declare (optimize (speed 3) (safety 0)))
 (let (last)
   (dotimes (i 10000000)
     (setf (aref *a1* i) i
           last (aref *a1* i)))
   last))

(defun b3 ()
 (declare (optimize (speed 3) (safety 0))
          (type myarray *a1*))
 (let (last)
   (dotimes (i 10000000)
     (setf (aref *a1* i) i
           last (aref *a1* i)))
   last))

(defun b4 ()
 (declare (optimize (speed 3) (safety 0)))
 (let (last)
   (dotimes (i 10000000)
     (setf (cl:aref (myarray-data *a1*) i) i
           last (cl:aref (myarray-data *a1*) i)))
   last))

While the difference is not big, b2 seems ~0.05 seconds slower:

MYA> (time (b1))
Evaluation took:
 0.046 seconds of real time
 0.046875 seconds of total run time (0.015625 user, 0.031250 system)
 [ Real times consist of 0.022 seconds GC time, and 0.024 seconds non-GC time. ]
 [ Run times consist of 0.015 seconds GC time, and 0.032 seconds non-GC time. ]
 102.17% CPU
 93,611,580 processor cycles
 80,000,016 bytes consed
 
9999999
MYA> (time (b2))
Evaluation took:
 0.018 seconds of real time
 0.015625 seconds of total run time (0.015625 user, 0.000000 system)
 88.89% CPU
 35,847,700 processor cycles
 0 bytes consed
 
9999999
MYA> (time (b3))
Evaluation took:
 0.012 seconds of real time
 0.015625 seconds of total run time (0.000000 user, 0.015625 system)
 133.33% CPU
 25,066,180 processor cycles
 0 bytes consed
 
9999999
MYA> (time (b4))
Evaluation took:
 0.011 seconds of real time
 0.015625 seconds of total run time (0.000000 user, 0.015625 system)
 145.45% CPU
 23,716,520 processor cycles
 0 bytes consed
 
9999999
MYA> (time (b2))
Evaluation took:
 0.017 seconds of real time
 0.031250 seconds of total run time (0.031250 user, 0.000000 system)
 182.35% CPU
 35,233,600 processor cycles
 0 bytes consed
 
9999999

B1 and t1 use lexically defined array, and I am surprised they are more costly than using the special variable. Anyhow, the point here was that the notation used in t2 and b3 gets rewritten into the same notation as in t3 and b4 due to the compiler macro kicking in and SBCL can than access elements of the array directly, i.e. no runtime if-check.

In order to actually use custom vector or string classes, we have to treat other operators involved in array operations the same way as aref. It means to write compiler macros for (setf aref), length, elt, etc. Some automation would be nice to have (a macro), but I haven't gotten that far yet.

I have to admit, while I was originally learning and testing this, I made a mistake and tested directly from the repl. I had the understanding that a compiler macro was a tool for this job, but I couldn't get it to work. Since then I have learned that SBCL does not seem to treat defun defined in the repl, the same way it does when written in a file and compiled. I think I solved this a long time ago, I just didn't know about it :-).

Where I learned it from is actually embarrassing: I learned it from an AI-prompt (Gemini). I have never used AI to code something before, so I thought I would test it today. I taught it to write those compiler macros and asked why they don't kick in. I thought I would need to install ir1 transformations for aref operator, in order for SBCL to optimize away the runtime if-test, however, AI actually gave me four tips about caveats why the compiler macro didn't kick in. One was to test from the repl, and another one was not to use inline on the shadowed aref operator, which I also had. That actually was a correct guess by AI. When I wrote those small tests and benchmarks and put them into the file, it worked.

I love Common Lisp and repl-based development, but there are caveats. This is probably some common knowledge; it is just me not being familiar with it enough.

Anyway, back to the topic, I don't know if other Common Lisp compilers would also respect the compiler macros as SBCL does. I don't see a reason why they wouldn't, but I didn't test with other implementations. I also haven't tested with other operators than aref either, but the difficult part here was to find the mechanism. The rest is the boring parts of rinse and repeat, at least I hope so :).