The venerable fastbin dup attack is one of several ways to abuse the standard glibc malloc allocator, and it has been around for pretty much forever. By utilising a double free, an attacker can coerce the allocator into returning a (mostly) arbitrary chunk. After years of updates, it has not changed much.

Recently I was looking at a CTF problem and found that the fastbin attack has an interesting subtlety due to interactions with the tcache, which allows for a marginally better primitive.

Mostly, I just thought it was funny!

The Fastbin Dup

Recall the malloc_chunk structure:

struct malloc_chunk {
  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

Source: https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/malloc.c#L1130

In principle, the fastbin attack relies on a double free to insert the same chunk into the fastbin freelist twice:

void *A = malloc(8);
free(A); // Fastbin: A -> 0
free(A); // Fastbin: A -> A -> ...

Since the fastbin is a singly-linked list, the fd pointer of the A chunk can be overwritten with an allocation, which (again in principle) causes the next allocation to return a controlled address:

// Fastbin: A -> A -> ...
size_t *B = malloc(8);
*B = (size_t)0xdeadbee0;

// Fastbin: 0xdeadbee0 -> ...
malloc(8); // returns chunk at 0xdeadbee0

Since then, a few mitigations have been put in place against such an attack (we name only those relevant to this article):

  • The same chunk cannot be freed into the same fastbin twice in a row, done by comparing with the current top of the fastbin:
static void _int_free(mstate av, mchunkptr p, int have_lock) {
  ...
      /* Check that the top of the bin is not the record we are going to
         add (i.e., double free).  */
      if (__builtin_expect (old == p, 0))
        malloc_printerr ("double free or corruption (fasttop)");
      old2 = old;
      p->fd = PROTECT_PTR (&p->fd, old);
  ...

Source: https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/malloc.c#L4532

This is easily bypassed by inserting a free of a second chunk in-between the two frees of the same chunk.

  • The PROTECT_PTR macro is used to encode the fd pointer with the address of the pointer itself:
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

Source: https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/malloc.c#L328

  • Chunks allocated from the fastbin are checked to be aligned to 0x10 bytes (for 64-bit systems). See later listings for examples.
  • Chunks allocated from the fastbin are checked to have the correct size as the bucket they are linked into. See later listings for examples.

Bypassing the pointer protection usually requires a heap leak in order to determine the address where the pointer is stored.

The Thread-Local Cache

In glibc 2.27, the tcache (thread-local cache) was introduced and is compiled in by default for any glibc with multithreading enabled. Like the fastbins, it is a bucketed singly linked list of chunks, with each chunk's next pointer protected with the PROTECT_PTR macro. Each bucket can store up to 7 chunks (by default), with the number of chunks stored in a counter alongside the bucket.

The tcache sits in front of the main allocation/free code, and since the pointer to the tcache is stored in thread local memory, no locks need to be taken. Since fastbin sizes are also tcache sizes, this typically means that the tcache needs to be emptied/filled if we would like to allocate from/free to the fastbin respectively.

With the addition of the tcache, fastbin behavior has changed slightly. When the first chunk is allocated from the fastbin, chunks are unlinked from the fastbin and linked into the tcache until the tcache is filled or the fastbin is empty:

static void *_int_malloc(mstate av, size_t bytes) {
  ...
    if (victim != NULL)
    {
      // 1: Fastbin alignment check
      if (__glibc_unlikely (misaligned_chunk (victim)))
        malloc_printerr ("malloc(): unaligned fastbin chunk detected 2");

      // 2: Victim chunk removed from fastbin
      if (SINGLE_THREAD_P)
        *fb = REVEAL_PTR (victim->fd);
      else
        REMOVE_FB (fb, pp, victim);

      if (__glibc_likely (victim != NULL))
        {
          // 3: Fastbin size check on victim
          size_t victim_idx = fastbin_index (chunksize (victim));
          if (__builtin_expect (victim_idx != idx, 0))
            malloc_printerr ("malloc(): memory corruption (fast)");
          check_remalloced_chunk (av, victim, nb);

#if USE_TCACHE
          /* While we're here, if we see other chunks of the same size,
             stash them in the tcache.  */
          size_t tc_idx = csize2tidx (nb);
          if (tcache && tc_idx < mp_.tcache_bins)
          {
            mchunkptr tc_victim;

            /* While bin not empty and tcache not full, copy chunks.  */
            while (tcache->counts[tc_idx] < mp_.tcache_count
                   && (tc_victim = *fb) != NULL)
              {
                // 4: Alignment check on chunk to be put into tcache
                if (__glibc_unlikely (misaligned_chunk (tc_victim)))
                  malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
                
                // 5: Unlink chunk from fastbin
                if (SINGLE_THREAD_P)
                  *fb = REVEAL_PTR (tc_victim->fd);
                else
                  {
                    REMOVE_FB (fb, pp, tc_victim);
                    if (__glibc_unlikely (tc_victim == NULL))
                      break;
                  }
                
                // 6: Link chunk into tcache
                tcache_put (tc_victim, tc_idx);
              }
          }
#endif
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }
    }
  ...

Source: https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/malloc.c#L3831

Due to the last-in first-out nature of the tcache, the chunks end up in the tcache in the reverse of their order in the fastbin.

Of note is that glibc has a small quirk where malloc uses the tcache, but calloc does not. This can be seen in many examples online of fastbin dup attacks (e.g. how2heap), where calloc is used after filling the tcache to avoid allocating from it.

When the tcache was first added, a similar tcache dup attack was trivial as the tcache barely had any mitigations (likely for the sake of performance). Since then several mitigations were added, in addition to the aforementioned pointer protection:

/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks.  */
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache_key;

  e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  if (__glibc_unlikely (!aligned_OK (e)))
    malloc_printerr ("malloc(): unaligned tcache chunk detected");
  tcache->entries[tc_idx] = REVEAL_PTR (e->next);
  --(tcache->counts[tc_idx]);
  e->key = 0;
  return (void *) e;
}

Source: https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/malloc.c#L3148

As can be seen above, tcache_put links a chunk into the tcache. In addition to the protected next pointer, it also stores the random tcache_key value in the key field. (At one point I believe this used to be the address of the tcache itself, but presumably they decided not to leak more metadata.) tcache_get retrieves a chunk from the tcache (the caller ensures the count for that bucket is non-zero), verifies the chunk's alignment, and clears the key field.

The value of the key field is normally used to detect double frees, by checking in _int_free if it matches the random tcache_key value:

  ...
    /* Check to see if it's already in the tcache.  */
    tcache_entry *e = (tcache_entry *) chunk2mem (p);

    /* This test succeeds on double free.  However, we don't 100%
       trust it (it also matches random payload data at a 1 in
       2^<size_t> chance), so verify it's not an unlikely
       coincidence before aborting.  */
    if (__glibc_unlikely (e->key == tcache_key))
      {
        tcache_entry *tmp;
        size_t cnt = 0;
        LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
        for (tmp = tcache->entries[tc_idx];
             tmp;
             tmp = REVEAL_PTR (tmp->next), ++cnt)
          {
            if (cnt >= mp_.tcache_count)
              malloc_printerr ("free(): too many chunks detected in tcache");
            if (__glibc_unlikely (!aligned_OK (tmp)))
              malloc_printerr ("free(): unaligned chunk detected in tcache 2");
            if (tmp == e)
              malloc_printerr ("free(): double free detected in tcache 2");
            /* If we get here, it was a coincidence.  We've wasted a
               few cycles, but don't abort.  */
          }
      }
  ...

Source: https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/malloc.c#L4438

The presence of this key generally makes tcache dups less feasible than before without some additional prerequisites.

Fastbin Duped, Or Why 3 = 4

If we only had control of malloc/free but not calloc, perhaps we could fill the tcache, conduct a fastbin dup, then drain the tcache (since we allocate from the tcache before the fastbin) and malloc a few times to receive our overlapping chunks:

#include <stdio.h>
#include <stdlib.h>

int main() {
  void *tc[7];

  size_t *A = malloc(8);
  size_t *B = malloc(8);

  for (int i = 0; i < 7; i++) tc[i] = malloc(8);

  for (int i = 0; i < 7; i++) free(tc[i]);

  free(A);
  free(B);
  free(A);

  for (int i = 0; i < 7; i++) malloc(8);

  printf("%p\n", malloc(8));
  printf("%p\n", malloc(8));
  printf("%p\n", malloc(8));
}

Output:

$ ./a.out
0x55a80b7af2a0
0x55a80b7af2c0
0x55a80b7af2a0

Indeed, as expected, the fastbin dup yet lives. But wait! Since we drained the tcache, the first allocation from the fastbin transplants the remaining fastbin chunks into the tcache, as described above. In fact, what was originally intended to be a fastbin dup turns into something like a 'fastbin dup into tcache dup'!

After the fastbin dup and draining the tcache, the fastbin and tcache state looks like this (ignoring pointer protection for the moment):

fastbin -> A -> B -> A -> B -> ...
tcache  -> NULL (count = 0)

The fastbin has become linked in a circle, while the tcache is empty. The first allocation of A from the fastbin triggers the following sequence of events:

  • First, A is removed from the fastbin head, to eventually be returned to the user:
fastbin -> B -> A -> B -> A ...
tcache  -> NULL (count = 0)
  • The allocator then tries to fill the corresponding tcache with chunks from the fastbin. It removes B from the fastbin...
fastbin -> A -> B -> A -> B ...
tcache  -> NULL (count = 0)

...and then puts it in the tcache:

fastbin -> A -> B -> NULL
tcache  -> B -> NULL (count = 1)

Our infinite lists have disappeared! Because the tcache next field and the fastbin fd field both occupy the first word of a chunk's user data, tcache_put setting B->next to NULL also truncates the fastbin list.

  • A and B are likewise moved into the tcache without incident. Since the key field of B is not checked in this code path, we end up with a tcache dup and a circular list in the tcache:
fastbin -> NULL
tcache  -> B -> A -> B -> A ... (count = 3)
  • Finally, A is returned back to the user as a result of the allocation.

Our three calls to free for the fastbin dup have magically become four chunks (three in the tcache and one in the hand)! If we can control the data of A we can overwrite its next pointer and then allocate three times (from the tcache this time) to return a controlled pointer. Since the count on the tcache bucket is three, we have just enough chunks to do so.

The controlled pointer is basically arbitrary: it only needs to be correctly aligned, with no need to fake any chunk sizes or anything of that sort. This is at least marginally better than the techniques I found described online (granted, my search was not exactly comprehensive).

Completing the POC is left as an exercise :)

In any case, I hope you enjoyed this funny little interaction! Again, do drop me an email (see home page) to let me know of mistakes, typos, or just about anything cool!

~Jarsp