Instruction encoding

Detailed bytecode structure for all instructions (currently - from Cairo). Proving of program considerations.

In our processor architecture instructions have variable length: they may take one or two words. Two-word instructions are called immediate and can be distinguished by the value of bits 50-52 of the first word being 000 - it means that next word within instruction will be used as immediate value. One-word instructions are called relative in our terminology. Instruction size will be referred later as insn_size and is equal to 1 for one-word instructions and 2 for two-word instructions.

DST_OUT is a 1-bit field, determining if res should be written to *dst

0: res would not be written
1: res would be written, *dst <- res

AP_MODE is a 2-bit field, representing ruleset for updating AP register during instruction execution:

00: Same, AP <- AP.
01: Increment AP, AP <- AP + 1.
10: Increment AP by res, AP <- AP + res.

PC_MODE is a 3-bit field, representing ruleset for updating PC register during instruction execution:

000: PC moving to the next instruction, PC <- PC + insn_size.
001: Absolute jump, PC <- res.
010: Relative jump, PC <- PC + res.
100: Conditional relative jump, PC <- PC + insn_size if *dst == 0, PC <- PC + *op1 otherwise

OPCODE is a 4-bit field, representing primary operation of instruction and, therefore, main constraint it generates:

0000: ADD, res = *op0 + *op1. ADD and SUB instructions will use this opcode, as constraint is working both ways.
0001: MUL, res = *op0 * *op1. MUL and DIV instructions will use this opcode, as constraint is working both ways.
0010: CALL, *ap = SP; *(AP + 1) = PC; AP <- AP + 2; SP <- AP + 2. CALL, CALL ABS, CALL REL instructions will use this opcode, each with corresponding PC_MODE
0100: RET, SP <- *dst
1000: MOV, res = *op

OP0_MODE is a 1-bit field, representing addressing mode for op0:

0: use SP as the offset base.
1: use AP as the offset base.

OP1_MODE is a 3-bit field, representing addressing mode for op1:

000: use PC as the offset base. This instruction is immediate and therefore is 2 words long.
001: use SP as the offset base.
010: use AP as the offset base.
100: user *op0 as the offset base.

DST_MODE is a 1-bit field, representing addressing mode for dst:

0: use SP as the offset base.
1: use AP as the offset base.

DUMMY_BIT is a 1-bit field that shows if the current instruction is a dummy instruction.

0: Normal instruction.
1: Dummy instruction. For dummy instructions all bits except LSB and DUMMY_BIT should be zero. Normally, compiler should not explicitly use dummy instructions, which are needed for public memory mechanism and trace padding.

DST is a 16-bit field, containing biased-representation of a memory address offset for the destination operand. Address meaning depends on the value of DST_MODE, but technically any value is permitted here and should not cause Undefined Behavior.

OP1 is a 16-bit field, containing biased-representation of a memory address offset for the second operand. Address meaning depends on the value of OP1_MODE, but technically any value is permitted here and should not cause Undefined Behavior.

OP0 is a 16-bit field, containing biased-representation of a memory address offset for the first operand. Address meaning depends on the value of OP0_MODE, but technically any value is permitted here and should not cause Undefined Behavior.

Any field value not listed before this point should cause Undefined Behavior.

The first word of every instruction have following field structure with LE encoding (i.e. bits 0-15 means the 16 LSB of the word):

Bits 00-15: OP0
Bits 16-31: OP1
Bits 32-47: DST
Bits 48-48: DUMMY_BIT
Bits 49-49: DST_MODE
Bits 50-52: OP1_MODE
Bits 53-53: OP0_MODE
Bits 54-57: OPCODE
Bits 58-60: PC_MODE
Bits 61-62: AP_MODE
Bits 63-63: DST_OUT

Note that DUMMY_BIT is placed in a very particular position to make instruction always fit the modulus of Goldilock field.

We will need two dummy instructions: Dummy Access Instruction and Dummy Padding Instruction. Neither of these instructions are included in the bytecode. These instructions don’t have to be “unpacked”, as they are not a part of the program itself. They are added by prover and more like auxiliary rows added to the trace.

Dummy Access Instruction is a one-word instruction having only DUMMY_BIT and LSB set as 1, other fields must be 0. It does nothing and enforces no state transitions other than:

addresses in PC, OP0, OP1, and DST are zero.
values at PC, OP0, OP1, and DST are zero.
if the previous instruction is dummy, res <- res, else res <- pc.

This allows us to add dummy (0, 0) access for the public memory trace (i.e. (addr, value) pairs to be shared with verifier) to the un-sorted memory trace while still including the actual (addr, value) pairs in sorted memory trace.

Dummy Padding Instruction is a one-word instruction having only DUMMY_BIT set as 1, other fields must be 0. It does absolutely nothing and enforces that all registers values stay exactly the same as after the previous instruction. It is used to pad the trace so the number of rows is a power of two.

PreviousInstruction set NextHigh-level frontend

Last updated 1 year ago