关于AES加解密中CBC模式的IV初始化向量的安全性问题

前段时间,在研究HLS的AES加密,由于一个地方电视台的HLS流有AES加密,在查看了相关的加解密方案后发现使用的是简单的AES的CBC模式,在CBC的模式下,会设置一个IV,初始化向量。但是我在解密的时候,使用了一个由于理解错误而产生的一个错误IV居然也能解密视频并进行播放,于是就有了这篇张文章。

AES五种加密模式(CBC、ECB、CTR、OCF、CFB)

虽然有五种加密,但是常用的还是CBC,CBC的全称Cipher Block Chaining ,有点类似于区块链哈,我们先来看下加密方式

CBC加密

上面的图片从左往右看,初始化IV只有在第一个块加密的时候才会用到,而第N个块的加密IV则是用的N-1(N>1)个加密后的二进制数组。

CBC解密

CBC的解密则也是从左往右看,但是加密时IV在解密时候,只会用于对第一个块进行解密,其他块的解密则是使用上一块的加密二进制作为IV进行解密操作。

加密时,用iv和key去加密第一个块,然后用第一个块的加密数据作为下一个块的iv,依次迭代。解密时,用n-1个块的加密数据作为iv和key去解密第n个块(n>1),只有第一个块才会用加密时的iv去解密第一个块。按照这样的逻辑来看,那么如果解密时,使用了iv错误,出问题的数据应该只有第一个块。

通过上面的分析就能知道,加密的时候,iv会影响所有数据的加密结果,而解密时,iv只会影响第一个加密块的解密结果,其他块的解密可以直接通过分隔加密数据获取正确是N-1块的IV。

这也就印证了为什么ff能播放我用错误的iv解密出来的视频流数据,因为第一块的大小存储大概是id3 的数据,ff直接丢掉id3的数据,直接解码后面的视频数据 ,ff 应该是识别h264编码头开始解码视频。

那么从这点可以看出,在使用了key的情况下,IV对整个数据的保密性没有太大的作用。

再来说说我是怎么发现这个问题,因为错误的IV只会影响第一个块的数据,我在第一次尝试解密后,直接用ff进行播放,ff能够解码并且播放成功,但是相对于其他未加密的HLS流来说,播放我解密后的数据会有3-5s的延时,这让我很是头疼,后来我就通过ffplay -v trace 打印播放解密的日志,发现视频数据的id3信息丢失了,那么出问题出在第一个块。然后再次研究m3u8加解密IV的赋值方法,后发来经过多次试验,正确赋值IV,解密出来的数据能够及时播放。再后来,就出现了这篇文章的来由。

由此带来的延展,针对iv在CBC模式下的弱用,AES提供了更多的模式可供选择,详细的可以到wiki上了解。

https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation

From: https://www.jianshu.com/p/45848dd484a9

AS directives: .frame, .mask and .fmask

   high memory   +------------------------+
                 |          ....          |
                 |          ....          |
virtual frame    |       argument n       |
pointer $(fp) ->/+------------------------+\
               / |                        | \
  frame offset   |   local & temporaries  |  \
               \ |                        |   \
                \+------------------------+    \
                 |     saved registers    |      frame size
                 |  (including returnreg) |    /
                 +------------------------+   /
                 |          ....          |  /
   stack         |     argument build     | /
pointer $(sp) -> +------------------------+/
 (framereg)      |          ....          |
                 |          ....          |
                 |                        |
                 |                        |
                 |                        |
    low memory   +------------------------+

.frame framereg, framesize, returnreg
The virtual frame pointer is a frame pointer as used in other compiler systems but has no register allocated for it. It consists of the framereg ($sp, in most cases) added to the framesize. The returnreg specifies the register containing the return address (usually $ra).

.mask bitmask, frameoffset
The .mask directive specifies the registers to be stored and where they are stored. A bit should be on in bitmask for each register saved (for example, if register $31 is saved, bit 31 should be ‘1’ in bitmask. Bits are set in bitmask in little-endian order, even if the machine configuration is big-endian).The frameoffset is the offset from the virtual frame pointer (this number is usually negative).

.fmask bitmask, frameoffset
Notice that saving floating-point registers is identical to saving general registers except we use the .fmask pseudo-op instead of .mask, and the stores are of floating-point singles or doubles.

        .text
        .cfi_sections   .debug_frame
        .align  2
        .globl  main
        .cfi_startproc
        .ent    main
        .type   main, @function
main:
        .frame  $fp,32,$31 # vars= 16, regs= 1/0, args= 0, gp= 0
        .mask   0x40000000,-8
        .fmask  0x00000000,0
        .set    noreorder
        .set    nomacro
        daddiu  $sp,$sp,-32
        .cfi_def_cfa_offset 32
        sd      $fp,24($sp)
        .cfi_offset 30, -8
        move    $fp,$sp
        .cfi_def_cfa_register 30
        move    $2,$4
        sd      $5,8($fp)
        sll     $2,$2,0
        sw      $2,0($fp)
        .loc 1 6 12
        move    $2,$0
        .loc 1 7 1
        move    $sp,$fp
        .cfi_def_cfa_register 29
        ld      $fp,24($sp)
        daddiu  $sp,$sp,32
        .cfi_restore 30
        .cfi_def_cfa_offset 0
        jr      $31
        nop
        .set    macro
        .set    reorder
        .end    main
        .cfi_endproc
        .size   main, .-main

References
From: http://www.cs.unibo.it/~solmi/teaching/arch_2002-2003/AssemblyLanguageProgDoc.pdf

CFI directives in assembly files

(This post uses x86-64 for illustration throughout. The fundamentals are similar for other platforms but will need some translation that I don’t cover here.)

Despite compilers getting better over time, it’s still the case that hand-written assembly can be worthwhile for certain hot-spots. Sometimes there are special CPU instructions for the thing that you’re trying to do, sometimes you need detailed control of the resulting code and, to some extent, it remains possible for some people to out-optimise a compiler.

But hand-written assembly doesn’t automatically get some of the things that the compiler generates for normal code, such as debugging information. Perhaps your assembly code never crashes (although any function that takes a pointer can suffer from bugs in other code) but you probably still care about accurate profiling information. In order for debuggers to walk up the stack in a core file, or for profilers to correctly account for CPU time, they need be able to unwind call frames.

Unwinding used to be easy as every function would have a standard prologue:

push rbp
mov rbp, rsp

This would make the stack look like this (remember that stacks grow downwards in memory):

Caller’s stackRSP value before CALLRSP at function entryCaller’s RBPCallee’s local variablesSaved RIP(pushed by CALL)RBP always points here

So, upon entry to a function, the CALL instruction that jumped to the function in question will have pushed the previous program counter (from the RIP register) onto the stack. Then the function prologue saves the current value of RBP on the stack and copies the current value of the stack pointer into RBP. From this point until the function is complete, RBP won’t be touched.

This makes stack unwinding easy because RBP always points to the call frame for the current function. That gets you the saved address of the parent call and the saved value of its RBP and so on.

The problems with this scheme are that a) the function prologue can be excessive for small functions and b) we would like to be able to use RBP as a general purpose register to avoid spills. Which is why the GCC documentation says that “-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging”. This means that you can’t depend on being able to unwind stacks like this. A process can be comprised of various shared libraries, any of which might be compiled with optimisations.

To be able to unwind the stack without depending on this convention, additional debugging tables are needed. The compiler will generate these automatically (when asked) for code that it generates, but it’s something that we need to worry about when writing assembly functions ourselves if we want profilers and debuggers to work.

The reference for the assembly directives that we’ll need is here, but they are very lightly documented. You can understand more by reading the DWARF spec, which documents the data that is being generated. Specifically see sections 6.4 and D.6. But I’ll try to tie the two together in this post.

The tables that we need the assembler to emit for us are called Call Frame Information (CFI). (Not to be confused with Control Flow Integrity, which is very different.) Based on that name, all the assembler directives begin with .cfi_.

Next we need to define the Canonical Frame Address (CFA). This is the value of the stack pointer just before the CALL instruction in the parent function. In the diagram above, it’s the value indicated by “RSP value before CALL”. Our first task will be to define data that allows the CFA to be calculated for any given instruction.

The CFI tables allow the CFA to be expressed as a register value plus an offset. For example, immediately upon function entry the CFA is RSP + 8. (The eight byte offset is because the CALL instruction will have pushed the previous RIP on the stack.)

As the function executes, however, the expression will probably change. If nothing else, after pushing a value onto the stack we would need to increase the offset.

So one design for the CFI table would be to store a (register, offset) pair for every instruction. Conceptually that’s what we do but, to save space, only changes from instruction to instruction are stored.

It’s time for an example, so here’s a trivial assembly function that includes CFI directives and a running commentary.

  .globl  square
  .type   square,@function
  .hidden square
square:

This is a standard preamble for a function that’s unrelated to CFI. Your assembly code should already be full of this.

  .cfi_startproc

Our first CFI directive. This is needed at the start of every annotated function. It causes a new CFI table for this function to be initialised.

  .cfi_def_cfa rsp, 8

This is defining the CFA expression as a register plus offset. One of the things that you’ll see compilers do is express the registers as numbers rather than names. But, at least with GAS, you can write names. (I’ve included a table of DWARF register numbers and names below in case you need it.)

Getting back to the directive, this is just specifying what I discussed above: on entry to a function, the CFA is at RSP + 8.

  push    rbp
  .cfi_def_cfa rsp, 16

After pushing something to the stack, the value of RSP will have changed so we need to update the CFA expression. It’s now RSP + 16, to account for the eight bytes we pushed.

  mov     rbp, rsp
  .cfi_def_cfa rbp, 16

This function happens to have a standard prologue, so we’ll save the frame pointer in RBP, following the old convention. Thus, for the rest of the function we can define the CFA as RBP + 16 and manipulate the stack without having to worry about it again.

  mov     DWORD PTR [rbp-4], edi
  mov     eax, DWORD PTR [rbp-4]
  imul    eax, DWORD PTR [rbp-4]
  pop     rbp
  .cfi_def_cfa rsp, 8

We’re getting ready to return from this function and, after restoring RBP from the stack, the old CFA expression is invalid because the value of RBP has changed. So we define it as RSP + 8 again.

  ret
  .cfi_endproc

At the end of the function we need to trigger the CFI table to be emitted. (It’s an error if a CFI table is left open at the end of the file.)

The CFI tables for an object file can be dumped with objdump -W and, if you do that for the example above, you’ll see two tables: something called a CIE and something called an FDE.

The CIE (Common Information Entry) table contains information common to all functions and it’s worth taking a look at it:

… CIE
  Version:         1
  Augmentation:    "zR"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data:     1b

  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8

You can ignore everything until the DW_CFA_… lines at the end. They define CFI directives that are common to all functions (that reference this CIE). The first is saying that the CFA is at RSP + 8, which is what we had already defined at function entry. This means that you don’t need a CFI directive at the beginning of the function. Basically RSP + 8 is already the default.

The second directive is something that we’ll get to when we discuss saving registers.

If we look at the FDE (Frame Description Entry) for the example function that we defined, we see that it reflects the CFI directives from the assembly:

… FDE cie=…
  DW_CFA_advance_loc: 1 to 0000000000000001
  DW_CFA_def_cfa: r7 (rsp) ofs 16
  DW_CFA_advance_loc: 3 to 0000000000000004
  DW_CFA_def_cfa: r6 (rbp) ofs 16
  DW_CFA_advance_loc: 11 to 000000000000000f
  DW_CFA_def_cfa: r7 (rsp) ofs 8

The FDE describes the range of instructions that it’s valid for and is a series of operations to either update the CFA expression, or to skip over the next n bytes of instructions. Fairly obvious.

Optimisations for CFA directives

There are some shortcuts when writing CFA directives:

Firstly, you can update just the offset, or just the register, with cfi_def_cfa_offset and cfi_def_cfa_register respectively. This not only saves typing in the source file, it saves bytes in the table too.

Secondly, you can update the offset with a relative value using cfi_adjust_cfa_offset. This is useful when pushing lots of values to the stack as the offset will increase by eight each time.

Here’s the example from above, but using these directives and omitting the first directive that we don’t need because of the CIE:

  .globl  square
  .type   square,@function
  .hidden square
square:
  .cfi_startproc
  push    rbp
  .cfi_adjust_cfa_offset 8
  mov     rbp, rsp
  .cfi_def_cfa_register rbp
  mov     DWORD PTR [rbp-4], edi
  mov     eax, DWORD PTR [rbp-4]
  imul    eax, DWORD PTR [rbp-4]
  pop     rbp
  .cfi_def_cfa rsp, 8
  ret
  .cfi_endproc

Saving registers

Consider a profiler that is unwinding the stack after a profiling signal. It calculates the CFA of the active function and, from that, finds the parent function. Now it needs to calculate the parent function’s CFA and, from the CFI tables, discovers that it’s related to RBX. Since RBX is a callee-saved register, that’s reasonable, but the active function might have stomped RBX. So, in order for the unwinding to proceed it needs a way to find where the active function saved the old value of RBX. So there are more CFI directives that let you document where registers have been saved.

Registers can either be saved at an offset from the CFA (i.e. on the stack), or in another register. Most of the time they’ll be saved on the stack though because, if you had a caller-saved register to spare, you would be using it first.

To indicate that a register is saved on the stack, use cfi_offset. In the same example as above (see the stack diagram at the top) the caller’s RBP is saved at CFA – 16 bytes. So, with saved registers annotated too, it would start like this:

square:
  .cfi_startproc
  push    rbp
  .cfi_adjust_cfa_offset 8
  .cfi_offset rbp, -16

If you need to save a register in another register for some reason, see the documentation for cfi_register.

If you get all of that correct then your debugger should be able to unwind crashes correctly, and your profiler should be able to avoid recording lots of detached functions. However, I’m afraid that I don’t know of a better way to test this than to zero RBP, add a crash in the assembly code, and check whether GBD can go up correctly.

(None of this works for Windows. But Per Vognsen, via Twitter, notes that there are similar directives in MASM.)

CFI expressions

New in version three of the DWARF standard are CFI Expressions. These define a stack machine for calculating the CFA value and can be useful when your stack frame is non-standard (which is fairly common in assembly code). However, there’s no assembler support for them that I’ve been able to find, so one has to use cfi_escape and provide the raw DWARF data in a .s file. As an example, see this kernel patch.

Since there’s no assembler support, you’ll need to read section 2.5 of the standard, then search for DW_CFA_def_cfa_expression and, perhaps, search for cfi_directive in OpenSSL’s perlasm script for x86-64 and the places in OpenSSL where that is used. Good luck.

(I suggest testing by adding some instructions that write to NULL in the assembly code and checking that gdb can correctly step up the stack and that info reg shows the correct values for callee-saved registers in the parent frame.)

CFI register numbers

In case you need to use or read the raw register numbers, here they are for a few architectures:

(may be EBP on MacOS) (may be ESP on MacOS)

Register number x86-64 x86 ARM
0 RAX EAX r0
1 RDX ECX r1
2 RCX EDX r2
3 RBX EBX r3
4 RSI ESP r4
5 RDI EBP r5
6 RBP ESI r6
7 RSP EDI r7
8 R8 r8
9 R9 r9
10 R10 r10
11 R11 r11
12 R12 r12
13 R13 r13
14 R14 r14
15 R15 r15
16 RIP

(x86 values taken from page 25 of this doc. x86-64 values from page 57 of this doc. ARM taken from page 7 of this doc.)

References
From: https://www.imperialviolet.org/2017/01/18/cfi.html

CFI – Call Frame Information

.cfi_sections section_list
.cfi_sections may be used to specify whether CFI directives should emit .eh_frame section and/or .debug_frame section. If section_list is .eh_frame, .eh_frame is emitted, if section_list is .debug_frame, .debug_frame is emitted. To emit both use .eh_frame, .debug_frame. The default if this directive is not used is .cfi_sections .eh_frame.

On targets that support compact unwinding tables these can be generated by specifying .eh_frame_entry instead of .eh_frame.

Some targets may support an additional name, such as .c6xabi.exidx which is used by the target.

The .cfi_sections directive can be repeated, with the same or different arguments, provided that CFI generation has not yet started. Once CFI generation has started however the section list is fixed and any attempts to redefine it will result in an error.

.cfi_startproc [simple]
.cfi_startproc is used at the beginning of each function that should have an entry in .eh_frame. It initializes some internal data structures. Don’t forget to close the function by .cfi_endproc.

Unless .cfi_startproc is used along with parameter simple it also emits some architecture dependent initial CFI instructions.

.cfi_endproc
.cfi_endproc is used at the end of a function where it closes its unwind entry previously opened by .cfi_startproc, and emits it to .eh_frame.

.cfi_personality encoding [, exp]
.cfi_personality defines personality routine and its encoding. encoding must be a constant determining how the personality should be encoded. If it is 255 (DW_EH_PE_omit), second argument is not present, otherwise second argument should be a constant or a symbol name. When using indirect encodings, the symbol provided should be the location where personality can be loaded from, not the personality routine itself. The default after .cfi_startproc is .cfi_personality 0xff, no personality routine.

.cfi_personality_id id
.cfi_personality_id defines a personality routine by its index as defined in a compact unwinding format. Only valid when generating compact EH frames (i.e. with .cfi_sections eh_frame_entry.

.cfi_fde_data [opcode1 [, …]]
.cfi_fde_data is used to describe the compact unwind opcodes to be used for the current function. These are emitted inline in the .eh_frame_entry section if small enough and there is no LSDA, or in the .gnu.extab section otherwise. Only valid when generating compact EH frames (i.e. with .cfi_sections eh_frame_entry.

.cfi_lsda encoding [, exp]
.cfi_lsda defines LSDA and its encoding. encoding must be a constant determining how the LSDA should be encoded. If it is 255 (DW_EH_PE_omit), the second argument is not present, otherwise the second argument should be a constant or a symbol name. The default after .cfi_startproc is .cfi_lsda 0xff, meaning that no LSDA is present.

.cfi_inline_lsda [align]
.cfi_inline_lsda marks the start of a LSDA data section and switches to the corresponding .gnu.extab section. Must be preceded by a CFI block containing a .cfi_lsda directive. Only valid when generating compact EH frames (i.e. with .cfi_sections eh_frame_entry.

The table header and unwinding opcodes will be generated at this point, so that they are immediately followed by the LSDA data. The symbol referenced by the .cfi_lsda directive should still be defined in case a fallback FDE based encoding is used. The LSDA data is terminated by a section directive.

The optional align argument specifies the alignment required. The alignment is specified as a power of two, as with the .p2align directive.

.cfi_def_cfa register, offset
.cfi_def_cfa defines a rule for computing CFA as: take address from register and add offset to it.

.cfi_def_cfa_register register
.cfi_def_cfa_register modifies a rule for computing CFA. From now on register will be used instead of the old one. Offset remains the same.

.cfi_def_cfa_offset offset
.cfi_def_cfa_offset modifies a rule for computing CFA. Register remains the same, but offset is new. Note that it is the absolute offset that will be added to a defined register to compute CFA address.

.cfi_adjust_cfa_offset offset
Same as .cfi_def_cfa_offset but offset is a relative value that is added/subtracted from the previous offset.

.cfi_offset register, offset
Previous value of register is saved at offset offset from CFA.

.cfi_val_offset register, offset
Previous value of register is CFA + offset.

.cfi_rel_offset register, offset
Previous value of register is saved at offset offset from the current CFA register. This is transformed to .cfi_offset using the known displacement of the CFA register from the CFA. This is often easier to use, because the number will match the code it’s annotating.

.cfi_register register1, register2
Previous value of register1 is saved in register register2.

.cfi_restore register
.cfi_restore says that the rule for register is now the same as it was at the beginning of the function, after all initial instruction added by .cfi_startproc were executed.

.cfi_undefined register
From now on the previous value of register can’t be restored anymore.

.cfi_same_value register
Current value of register is the same like in the previous frame, i.e. no restoration needed.

.cfi_remember_state and .cfi_restore_state
.cfi_remember_state pushes the set of rules for every register onto an implicit stack, while .cfi_restore_state pops them off the stack and places them in the current row. This is useful for situations where you have multiple .cfi_* directives that need to be undone due to the control flow of the program. For example, we could have something like this (assuming the CFA is the value of rbp):

        je label
        popq %rbx
        .cfi_restore %rbx
        popq %r12
        .cfi_restore %r12
        popq %rbp
        .cfi_restore %rbp
        .cfi_def_cfa %rsp, 8
        ret
label:
        /* Do something else */

Here, we want the .cfi directives to affect only the rows corresponding to the instructions before label. This means we’d have to add multiple .cfi directives after label to recreate the original save locations of the registers, as well as setting the CFA back to the value of rbp. This would be clumsy, and result in a larger binary size. Instead, we can write:

        je label
        popq %rbx
        .cfi_remember_state
        .cfi_restore %rbx
        popq %r12
        .cfi_restore %r12
        popq %rbp
        .cfi_restore %rbp
        .cfi_def_cfa %rsp, 8
        ret
label:
        .cfi_restore_state
        /* Do something else */

That way, the rules for the instructions after label will be the same as before the first .cfi_restore without having to use multiple .cfi directives.

.cfi_return_column register
Change return column register, i.e. the return address is either directly in register or can be accessed by rules for register.

.cfi_signal_frame
Mark current function as signal trampoline.

.cfi_window_save
SPARC register window has been saved.

.cfi_escape expression[, …]
Allows the user to add arbitrary bytes to the unwind info. One might use this to add OS-specific CFI opcodes, or generic CFI opcodes that GAS does not yet support.

.cfi_val_encoded_addr register, encoding, label
The current value of register is label. The value of label will be encoded in the output file according to encoding; see the description of .cfi_personality for details on this encoding.

The usefulness of equating a register to a fixed label is probably limited to the return address register. Here, it can be useful to mark a code segment that has only one return address which is reached by a direct branch and no copy of the return address exists in memory or another register.

Referneces
From: https://sourceware.org/binutils/docs-2.35/as/