
jemalloc Research Notes

The malloc function jemalloc provides is called je_malloc; the corresponding free function is je_free.

jemalloc Basics

size_class

Each size_class is one of the allocation sizes jemalloc hands out. There are NSIZES (232?) classes in total (a request that falls between two classes is rounded up to the larger one: e.g. a 14-byte request falls between 8 and 16 and is served with 16 bytes). They fall into two major categories:

  • small_class (small allocations) : on 64-bit machines the range is typically [8, 14kb]; common sizes are 8, 16, 32, 48, 64, …, 2kb, 4kb, 8kb. Note that, to reduce fragmentation, the sizes are not all powers of two: without a 48-byte class, a 33-byte request would be served with 64 bytes, wasting roughly 50% to internal fragmentation
  • large_class (large allocations) : on 64-bit machines the range is typically [16kb, 7EiB], starting at 4 * page_size; common sizes are 16kb, 32kb, …, 1mb, 2mb, 4mb
  • size_index : the index of a size within size_class, in the range [0, 231]?; e.g. 8 bytes is 0, 14 bytes (treated as 16) is 1, 4kb is 28. When the size is a small_class, size_index is also called binind

base

struct base_s {
	/* Associated arena's index within the arenas array. */
	unsigned ind;

	/*
	 * User-configurable extent hook functions. Points to an
	 * extent_hooks_t: the function pointers for extent allocation,
	 * reclamation, etc.
	 */
	atomic_p_t extent_hooks;

	/* Protects base_alloc() and base_stats_get() operations. */
	malloc_mutex_t mtx;

	/* Using THP when true (metadata_thp auto mode). */
	bool auto_thp_switched;

	/*
	 * Most recent size class in the series of increasingly large base
	 * extents. Logarithmic spacing between subsequent allocations ensures
	 * that the total number of distinct mappings remains small.
	 */
	pszind_t pind_last;

	/* Serial number generation state: the sn of the next extent. */
	size_t extent_sn_next;

	/* Chain of all blocks associated with base. */
	base_block_t *blocks;

	/*
	 * Heap of extents that track unused trailing space within blocks:
	 * the extent-heap root nodes, one per size class.
	 */
	extent_heap_t avail[NSIZES];

	/* Stats, only maintained if config_stats. */
	size_t allocated;
	size_t resident;
	size_t mapped;
};

/* Embedded at the beginning of every block of base-managed virtual memory. */
typedef struct base_block_s base_block_t;
struct base_block_s {
	/* Total size of block's virtual memory mapping. */
	size_t size;

	/* Next block in list of base's blocks. */
	base_block_t *next;

	/* Tracks unused trailing space: holds the extent's metadata. */
	extent_t extent;
};

The structure used to allocate memory for jemalloc's own metadata.

  • base.blocks.extent : holds the extent metadata for each size_class
  • base.blocks is a linked list; following its next pointers reaches all extent metadata maintained under the current ind

bin

typedef struct bin_s bin_t;
struct bin_s {
/* All operations on bin_t fields require lock ownership. */
malloc_mutex_t lock;
/*
* Current slab being used to service allocations of this bin's size
* class. slabcur is independent of slabs_{nonfull,full}; whenever
* slabcur is reassigned, the previous slab must be deallocated or
* inserted into slabs_{nonfull,full}.
*/
extent_t *slabcur;

/*
* Heap of non-full slabs. This heap is used to assure that new
* allocations come from the non-full slab that is oldest/lowest in
* memory.
*/
extent_heap_t slabs_nonfull;

/* List used to track full slabs. */
extent_list_t slabs_full;

/* Bin statistics. */
bin_stats_t stats;
};

Manages the set of in-use slabs (i.e. extents used for small allocations); there is one bin per size_class.

  • bin.slabcur : the slab currently being used
  • bin.slabs_nonfull : slabs that still have free regions

bin_infos[binind]

typedef struct bin_info_s bin_info_t;
struct bin_info_s {
	/* Size of regions in a slab for this bin's size class. */
	size_t reg_size;

	/* Total size of a slab (i.e. extent) for this bin's size class. */
	size_t slab_size;

	/* Total number of regions in a slab for this bin's size class. */
	uint32_t nregs;

	/*
	 * Metadata used to manipulate bitmaps for slabs associated with this
	 * bin; used together with the bitmap in extent->e_slab_data to track
	 * region usage.
	 */
	bitmap_info_t bitmap_info;
};
typedef struct bitmap_info_s {
/* Logical number of bits in bitmap (stored at bottom level). */
size_t nbits;
#ifdef BITMAP_USE_TREE
/* Number of levels necessary for nbits. */
unsigned nlevels;
/*
* Only the first (nlevels+1) elements are used, and levels are ordered
* bottom to top (e.g. the bottom level is stored in levels[0]).
*/
bitmap_level_t levels[BITMAP_MAX_LEVELS+1];
#else /* BITMAP_USE_TREE */
/* Number of groups necessary for nbits. */
size_t ngroups;
#endif /* BITMAP_USE_TREE */
} bitmap_info_t;

Holds the information shared by all bins with the same binind. Note this is not a list of the bins themselves; it only stores the size-class parameters those bins need, and bitmap_info supplies the region-group layout (or, when BITMAP_USE_TREE is enabled, the tree depth).

extent

The structure that manages a jemalloc memory block (i.e. memory handed out to users). Each block is N * page_size (4kb) bytes (N >= 1), and every extent carries a serial number.

An extent serves a single large_class allocation, but can serve many small_class allocations.

  • extent.e_bits : 8 bytes wide, packing many fields
  • extent.e_addr : start address of the managed memory block
  • extent.e_slab_data : a bitmap. When this extent serves small_class allocations it records their allocation state; each small chunk inside the extent is called a region. It works together with bitmap_info in bin_infos: one bit per region, so one 64-bit bitmap word covers 64 regions. On 64-bit the tree layout is not used; lookup walks the words and finds the first non-zero bit from the low end — that bit marks a free region that can be handed out.

slab

When an extent is used for small_class allocations it is called a slab. One such extent can serve many allocations of the same size_class.

extents

The collection structure that manages extents.

  • extents.heaps[NPSIZES+1] : extents bucketed by size in page (4kb) multiples
  • extents.lru : doubly linked list holding every extent
  • extents.delay_coalesce : whether extent coalescing is deferred

arena

struct arena_s {
/*
* 0: Application allocation.
* 1: Internal metadata allocation.
* Synchronization: atomic.
*/
atomic_u_t nthreads[2];

/*
* When percpu_arena is enabled, to amortize the cost of reading /
* updating the current CPU id, track the most recent thread accessing
* this arena, and only read CPU if there is a mismatch.
*/
tsdn_t *last_thd;

/* Synchronization: internal. */
arena_stats_t stats;

/*
* Lists of tcaches and cache_bin_array_descriptors for extant threads
* associated with this arena. Stats from these are merged
* incrementally, and at exit if opt_stats_print is enabled.
*
* Synchronization: tcache_ql_mtx.
*/
ql_head(tcache_t) tcache_ql;
ql_head(cache_bin_array_descriptor_t) cache_bin_array_descriptor_ql;
malloc_mutex_t tcache_ql_mtx;

/* Synchronization: internal. */
prof_accum_t prof_accum;
uint64_t prof_accumbytes;

/*
* Extent serial number generator state.
*
* Synchronization: atomic.
*/
atomic_zu_t extent_sn_next;

/*
* Extant large allocations.
*
* Synchronization: large_mtx.
*/
extent_list_t large;
/* Synchronizes all large allocation/update/deallocation. */
malloc_mutex_t large_mtx;

/*
* Collections of extents that were previously allocated. These are
* used when allocating extents, in an attempt to re-use address space.
*
* Synchronization: internal.
*/
extents_t extents_dirty;
extents_t extents_muzzy;
extents_t extents_retained;

/*
* Decay-based purging state, responsible for scheduling extent state
* transitions.
*
* Synchronization: internal.
*/
arena_decay_t decay_dirty; /* dirty --> muzzy */
arena_decay_t decay_muzzy; /* muzzy --> retained */

/*
* Next extent size class in a growing series to use when satisfying a
* request via the extent hooks (only if opt_retain). This limits the
* number of disjoint virtual memory ranges so that extent merging can
* be effective even if multiple arenas' extent allocation requests are
* highly interleaved.
*
* retain_grow_limit is the max allowed size ind to expand (unless the
* required size is greater). Default is no limit, and controlled
* through mallctl only.
*
* Synchronization: extent_grow_mtx
*/
pszind_t extent_grow_next;
pszind_t retain_grow_limit;
malloc_mutex_t extent_grow_mtx;

/*
* Available extent structures that were allocated via
* base_alloc_extent().
*/
extent_tree_t extent_avail;
malloc_mutex_t extent_avail_mtx;

/*
* bins is used to store heaps of free regions.
* Synchronization: internal.
*/
bin_t bins[NBINS];

/*
* Base allocator, from which arena metadata are allocated.
*/
base_t *base;
/* Used to determine uptime. Read-only after initialization. */
nstime_t create_time;
};

This is struct arena_s, hung off arenas[ind].

The structure that allocates and reclaims extents. Each user thread is bound to one arena; by default there are 4 arenas per logical CPU to reduce lock contention, and the memory managed by different arenas is independent.

  • arena.extents_dirty : where an extent sits right after being freed
  • arena.extents_muzzy : where extents land after a lazy purge of extents_dirty, dirty -> muzzy
  • arena.extents_retained : where extents land after extents_muzzy is decommitted or force-purged, muzzy -> retained
  • arena.large : the extent list holding large extents
  • arena.extent_avail : a heap of available extent metadata
  • arena.bins[NBINS] : all the bins used for small allocations
  • arena.base : the base that allocates this arena's metadata; base->blocks chains all blocks for the current ind, while the arena itself is reachable via arenas[ind]


rtree

typedef struct rtree_ctx_s rtree_ctx_t;
struct rtree_ctx_s {
	/* Direct mapped cache. */
	rtree_ctx_cache_elm_t cache[RTREE_CTX_NCACHE];
	/* L2 LRU cache. */
	rtree_ctx_cache_elm_t l2_cache[RTREE_CTX_NCACHE_L2];
};
typedef struct rtree_ctx_cache_elm_s rtree_ctx_cache_elm_t;
struct rtree_ctx_cache_elm_s {
	uintptr_t leafkey;
	rtree_leaf_elm_t *leaf;
};

struct rtree_s {
	malloc_mutex_t init_lock;
	rtree_leaf_elm_t root[1U << (RTREE_NSB/RTREE_HEIGHT)];
};

// RTREE_HEIGHT = 2
static const rtree_level_t rtree_levels[] = {
#elif RTREE_HEIGHT == 2
	{RTREE_NSB/2, RTREE_NHIB + RTREE_NSB/2},            // {18, 34}
	{RTREE_NSB/2 + RTREE_NSB%2, RTREE_NHIB + RTREE_NSB} // {18, 52}
}

// leafkey keeps the high 34 bits of the key; it identifies a leaf for the
// rtree_ctx cache and is unrelated to walking the tree itself — do not
// confuse it with rtree_subkey
JEMALLOC_ALWAYS_INLINE uintptr_t
rtree_leafkey(uintptr_t key) {
	unsigned ptrbits = ZU(1) << (LG_SIZEOF_PTR+3);  // = 64
	unsigned cumbits = (rtree_levels[1].cumbits -
	    rtree_levels[1].bits);                       // = 52 - 18 = 34
	unsigned maskbits = ptrbits - cumbits;           // = 64 - 34 = 30
	uintptr_t mask = ~((ZU(1) << maskbits) - 1);     // clears the low 30 bits
	return (key & mask);
}

// The tree depth is 2, so level can only be 0 or 1; when sizing the root
// array, level = 0
--> if (RTREE_HEIGHT > 1) {
	RTREE_GET_CHILD(0)
}

// For level = 0, subkey extracts the middle 18 bits [47-30]
JEMALLOC_ALWAYS_INLINE uintptr_t
rtree_subkey(uintptr_t key, unsigned level) {
	unsigned ptrbits = ZU(1) << (LG_SIZEOF_PTR+3);  // = 64
	unsigned cumbits = rtree_levels[level].cumbits;  // = 34
	unsigned shiftbits = ptrbits - cumbits;          // = 30
	unsigned maskbits = rtree_levels[level].bits;    // = 18
	uintptr_t mask = (ZU(1) << maskbits) - 1;
	return ((key >> shiftbits) & mask);              // (key >> 30) & ((1 << 18) - 1)
}

// When sizing the root's child arrays, level = 1
// subkey then extracts the 18 bits [29-12]
JEMALLOC_ALWAYS_INLINE uintptr_t
rtree_subkey(uintptr_t key, unsigned level) {
	unsigned ptrbits = ZU(1) << (LG_SIZEOF_PTR+3);  // = 64
	unsigned cumbits = rtree_levels[level].cumbits;  // = 52
	unsigned shiftbits = ptrbits - cumbits;          // = 12
	unsigned maskbits = rtree_levels[level].bits;    // = 18
	uintptr_t mask = (ZU(1) << maskbits) - 1;
	return ((key >> shiftbits) & mask);              // (key >> 12) & ((1 << 18) - 1)
}

The globally unique Radix Tree storing the information of every extent, keyed by extent->e_addr as a uintptr_t. With a 64-bit (8-byte) uintptr_t, the rtree height is 2.

jemalloc_tree.drawio

cache_bin


typedef struct cache_bin_s cache_bin_t;
struct cache_bin_s {
/* Min # cached since last GC. */
cache_bin_sz_t low_water;
/* # of cached objects. */
cache_bin_sz_t ncached;
/*
* Stack of available objects.
*
* To make use of adjacent cacheline prefetch, the items in the avail
* stack goes to higher address for newer allocations. avail points
* just above the available space, which means that
* avail[-ncached, ... -1] are available items and the lowest item will
* be allocated first.
*/
void **avail;
};

Each thread's private cache used for small allocations.

  • cache_bin.low_water : number of cached objects left after the last GC
  • cache_bin.ncached : number of objects currently cached in this cache_bin
  • cache_bin.avail : memory ready to hand out directly; items are taken lowest-address-first (note the negative-offset addressing)

tcache

struct tcache_s {
/* Drives incremental GC. */
ticker_t gc_ticker;
/*
* The pointer stacks associated with bins follow as a contiguous array.
* During tcache initialization, the avail pointer in each element of
* tbins is initialized to point to the proper offset within this array.
*/
cache_bin_t bins_small[NBINS];

/*
* This data is less hot; we can be a little less careful with our
* footprint here.
*/
/* Lets us track all the tcaches in an arena. */
ql_elm(tcache_t) link;
/*
* The descriptor lets the arena find our cache bins without seeing the
* tcache definition. This enables arenas to aggregate stats across
* tcaches without having a tcache dependency.
*/
cache_bin_array_descriptor_t cache_bin_array_descriptor;

/* The arena this tcache is associated with. */
arena_t *arena;
/* Next bin to GC. */
szind_t next_gc_bin;
/* For small bins, fill (ncached_max >> lg_fill_div). */
uint8_t lg_fill_div[NBINS];
/*
* We put the cache bins for large size classes at the end of the
* struct, since some of them might not get used. This might end up
* letting us avoid touching an extra page if we don't have to.
*/
cache_bin_t bins_large[NSIZES-NBINS];
};

The per-thread cache (Thread Cache). Most allocation requests can be served straight from the tcache, avoiding locks entirely.

  • tcache.bins_small[NBINS] : cache_bins for small sizes
  • tcache.bins_large[NSIZES - NBINS] : cache_bins for large sizes

tsd

struct tsd_s {
tsd_state_t state;
bool ...tcache_enabled;
...
rtree_ctx_t ...rtree_ctx;
arena_t* ...arena;
tcache_t ...tcache;
}

Thread Specific Data: one per thread, holding the structures tied to that thread.

  • tsd.rtree_ctx : this thread's rtree context, for fast extent lookups
  • tsd.arena : the arena this thread is bound to
  • tsd.tcache : this thread's tcache

jemalloc Initialization and Memory Allocation

Before an allocation request can be served, initialization must run first.

1.1 Initialization parameters

--> 1.1 Initialization parameters
sopts.bump_empty_alloc = true;
sopts.null_out_result_on_error = true;
sopts.set_errno_on_error = true;
sopts.oom_string = "<jemalloc>: Error in malloc(): out of memory\n"; // message returned on allocation failure
dynamic_opts->item_size = 0;
dynamic_opts->alignment = 0; // no alignment requirement
dynamic_opts->zero = false;
dynamic_opts->tcache_ind = TCACHE_IND_AUTOMATIC;
dynamic_opts->arena_ind = ARENA_IND_AUTOMATIC;
dopts.result = &ret;
dopts.num_items = 1; // number of items requested
dopts.item_size = size; // requested size

1.2 Default extent handler functions

--> 1.2 Default extent handler functions; functions not listed correspond to macros that are not enabled
// e.g. extent_hooks->alloc()
struct extent_hooks_s {
	extent_alloc_t *alloc;
	extent_dalloc_t *dalloc;
	extent_destroy_t *destroy;
	extent_commit_t *commit;
	extent_decommit_t *decommit;
	extent_purge_t *purge_lazy; // not supported
	extent_purge_t *purge_forced;
	extent_split_t *split;
	extent_merge_t *merge;
};
const extent_hooks_t extent_hooks_default = {
	extent_alloc_default,
	extent_dalloc_default,
	extent_destroy_default,
	extent_commit_default,
	extent_decommit_default,
	NULL, /* purge_lazy not supported */
	extent_purge_forced_default,
	extent_split_default,
	extent_merge_default
};

1.3 SIZE_CLASSES definition

--> 1.3 Macro expansion
const size_t sz_pind2sz_tab[NPSIZES+1] = {
#define PSZ_yes(lg_grp, ndelta, lg_delta) \
(((ZU(1)<<lg_grp) + (ZU(ndelta)<<lg_delta))),
#define PSZ_no(lg_grp, ndelta, lg_delta)
#define SC(index, lg_grp, lg_delta, ndelta, psz, bin, pgs, lg_delta_lookup) \
PSZ_##psz(lg_grp, ndelta, lg_delta)
SIZE_CLASSES
#undef PSZ_yes
#undef PSZ_no
#undef SC
(LARGE_MAXCLASS + PAGE)
};
// arm64 takes this branch
#if (LG_SIZEOF_PTR == 3 && LG_TINY_MIN == 3 && LG_QUANTUM == 4 && LG_PAGE == 12)
#define SIZE_CLASSES \
...
#define SMALL_MAXCLASS ((((size_t)1) << 13) + (((size_t)3) << 11)) // 14336
#endif
// arm32 takes this branch
#if (LG_SIZEOF_PTR == 2 && LG_TINY_MIN == 3 && LG_QUANTUM == 3 && LG_PAGE == 12)
#define SIZE_CLASSES \
...
#define SMALL_MAXCLASS ((((size_t)1) << 13) + (((size_t)3) << 11))
#endif
// Different configurations select different SIZE_CLASSES tables

1.4 arena_boot expansion

void
arena_boot(void) {
...
#define REGIND_bin_yes(index, reg_size) \
div_init(&arena_binind_div_info[(index)], (reg_size));
#define REGIND_bin_no(index, reg_size)
#define SC(index, lg_grp, lg_delta, ndelta, psz, bin, pgs, \
lg_delta_lookup) \
REGIND_bin_##bin(index, (1U<<lg_grp) + (ndelta << lg_delta))
SIZE_CLASSES
#undef REGIND_bin_yes
#undef REGIND_bin_no
#undef SC
}
// Paste into macro.c, then run:
// gcc -E -P macro.c -o result.c ; in an editor, replace `; div_init` with `; \n div_init`
void arena_boot(void) {
...
div_init(&arena_binind_div_info[(0)], ((1U<<3) + (0 << 3)));
div_init(&arena_binind_div_info[(1)], ((1U<<3) + (1 << 3)));
div_init(&arena_binind_div_info[(2)], ((1U<<4) + (1 << 4)));
div_init(&arena_binind_div_info[(3)], ((1U<<4) + (2 << 4)));
div_init(&arena_binind_div_info[(4)], ((1U<<4) + (3 << 4)));
div_init(&arena_binind_div_info[(5)], ((1U<<6) + (1 << 4)));
div_init(&arena_binind_div_info[(6)], ((1U<<6) + (2 << 4)));
div_init(&arena_binind_div_info[(7)], ((1U<<6) + (3 << 4)));
div_init(&arena_binind_div_info[(8)], ((1U<<6) + (4 << 4)));
div_init(&arena_binind_div_info[(9)], ((1U<<7) + (1 << 5)));
div_init(&arena_binind_div_info[(10)], ((1U<<7) + (2 << 5)));
div_init(&arena_binind_div_info[(11)], ((1U<<7) + (3 << 5)));
div_init(&arena_binind_div_info[(12)], ((1U<<7) + (4 << 5)));
div_init(&arena_binind_div_info[(13)], ((1U<<8) + (1 << 6)));
div_init(&arena_binind_div_info[(14)], ((1U<<8) + (2 << 6)));
div_init(&arena_binind_div_info[(15)], ((1U<<8) + (3 << 6)));
div_init(&arena_binind_div_info[(16)], ((1U<<8) + (4 << 6)));
div_init(&arena_binind_div_info[(17)], ((1U<<9) + (1 << 7)));
div_init(&arena_binind_div_info[(18)], ((1U<<9) + (2 << 7)));
div_init(&arena_binind_div_info[(19)], ((1U<<9) + (3 << 7)));
div_init(&arena_binind_div_info[(20)], ((1U<<9) + (4 << 7)));
div_init(&arena_binind_div_info[(21)], ((1U<<10) + (1 << 8)));
div_init(&arena_binind_div_info[(22)], ((1U<<10) + (2 << 8)));
div_init(&arena_binind_div_info[(23)], ((1U<<10) + (3 << 8)));
div_init(&arena_binind_div_info[(24)], ((1U<<10) + (4 << 8)));
div_init(&arena_binind_div_info[(25)], ((1U<<11) + (1 << 9)));
div_init(&arena_binind_div_info[(26)], ((1U<<11) + (2 << 9)));
div_init(&arena_binind_div_info[(27)], ((1U<<11) + (3 << 9)));
div_init(&arena_binind_div_info[(28)], ((1U<<11) + (4 << 9)));
div_init(&arena_binind_div_info[(29)], ((1U<<12) + (1 << 10)));
div_init(&arena_binind_div_info[(30)], ((1U<<12) + (2 << 10)));
div_init(&arena_binind_div_info[(31)], ((1U<<12) + (3 << 10)));
div_init(&arena_binind_div_info[(32)], ((1U<<12) + (4 << 10)));
div_init(&arena_binind_div_info[(33)], ((1U<<13) + (1 << 11)));
div_init(&arena_binind_div_info[(34)], ((1U<<13) + (2 << 11)));
div_init(&arena_binind_div_info[(35)], ((1U<<13) + (3 << 11)));
}

1.5 Advanced macro expansion

MALLOC_TSD and the tsd struct

/*  O(name,			type,			nullable type) */
#define MALLOC_TSD \
O(tcache_enabled, bool, bool) \
O(arenas_tdata_bypass, bool, bool) \
O(reentrancy_level, int8_t, int8_t) \
O(narenas_tdata, uint32_t, uint32_t) \
O(offset_state, uint64_t, uint64_t) \
O(thread_allocated, uint64_t, uint64_t) \
O(thread_deallocated, uint64_t, uint64_t) \
O(prof_tdata, prof_tdata_t *, prof_tdata_t *) \
O(rtree_ctx, rtree_ctx_t, rtree_ctx_t) \
O(iarena, arena_t *, arena_t *) \
O(arena, arena_t *, arena_t *) \
O(arenas_tdata, arena_tdata_t *, arena_tdata_t *)\
O(tcache, tcache_t, tcache_t) \
O(witness_tsd, witness_tsd_t, witness_tsdn_t) \
MALLOC_TEST_TSD

#define O(n, t, nt) \
JEMALLOC_ALWAYS_INLINE t * \
tsd_##n##p_get_unsafe(tsd_t *tsd) { \
return &tsd->use_a_getter_or_setter_instead_##n; \
}
MALLOC_TSD
#undef O
// Substitute n, t, nt according to the MALLOC_TSD table above; when n is
// tcache, t is tcache_t and nt is tcache_t. The expansion yields:
tcache_t *tsd_tcachep_get_unsafe(tsd_t *tsd) {
	return &tsd->use_a_getter_or_setter_instead_tcache;
}
struct tsd_s {
/*
* The contents should be treated as totally opaque outside the tsd
* module. Access any thread-local state through the getters and
* setters below.
*/
tsd_state_t state;
#define O(n, t, nt) \
t use_a_getter_or_setter_instead_##n;
MALLOC_TSD
#undef O
// Expands to:
tcache_t use_a_getter_or_setter_instead_tcache;
... // other members omitted; every field in the MALLOC_TSD table is expanded the same way
};

Related generated functions

rb_proto(, extent_avail_, extent_tree_t, extent_t)
#define rb_proto(a_attr, a_prefix, a_rbt_type, a_type) \
a_attr void \
a_prefix##new(a_rbt_type *rbtree); \
a_attr bool \
a_prefix##empty(a_rbt_type *rbtree); \
a_attr a_type * \
a_prefix##first(a_rbt_type *rbtree); \
a_attr a_type * \
a_prefix##last(a_rbt_type *rbtree); \
a_attr a_type * \
a_prefix##next(a_rbt_type *rbtree, a_type *node); \
a_attr a_type * \
a_prefix##prev(a_rbt_type *rbtree, a_type *node); \
a_attr a_type * \
a_prefix##search(a_rbt_type *rbtree, const a_type *key); \
a_attr a_type * \
a_prefix##nsearch(a_rbt_type *rbtree, const a_type *key); \
a_attr a_type * \
a_prefix##psearch(a_rbt_type *rbtree, const a_type *key); \
a_attr void \
a_prefix##insert(a_rbt_type *rbtree, a_type *node); \
a_attr void \
a_prefix##remove(a_rbt_type *rbtree, a_type *node); \
a_attr a_type * \
a_prefix##iter(a_rbt_type *rbtree, a_type *start, a_type *(*cb)( \
a_rbt_type *, a_type *, void *), void *arg); \
a_attr a_type * \
a_prefix##reverse_iter(a_rbt_type *rbtree, a_type *start, \
a_type *(*cb)(a_rbt_type *, a_type *, void *), void *arg); \
a_attr void \
a_prefix##destroy(a_rbt_type *rbtree, void (*cb)(a_type *, void *), \
void *arg);
// Function declarations; they can all be dumped with gcc -E
extent_t * extent_avail_first(extent_tree_t *rbtree);
...
// Function implementations: ph.h, extent.c
#define ph_gen(a_attr, a_prefix, a_ph_type, a_type, a_field, a_cmp) \
a_prefix##remove_first(a_ph_type *ph) { \
a_type *ret; \
\
if (ph->ph_root == NULL) { \
return NULL; \
} \
ph_merge_aux(a_type, a_field, ph, a_cmp); \
\
ret = ph->ph_root; \
\
ph_merge_children(a_type, a_field, ph->ph_root, a_cmp, \
ph->ph_root); \
\
return ret; \
}
static inline int
extent_esnead_comp(const extent_t *a, const extent_t *b) {
int ret;

ret = extent_esn_comp(a, b);
if (ret != 0) {
return ret;
}

ret = extent_ead_comp(a, b);
return ret;
}
UNUSED extent_t * extent_avail_remove_first(extent_tree_t *ph) {
extent_t *ret;
if (ph->ph_root == ((void *)0)) {
return ((void *)0);
}
ph_merge_aux(extent_t, ph_link, ph, extent_esnead_comp);
ret = ph->ph_root;
ph_merge_children(extent_t, ph_link, ph->ph_root, extent_esnead_comp, ph->ph_root);
return ret;
}

e_bits layout

EXTENT_BITS_ARENA_WIDTH = 12 ;
EXTENT_BITS_ARENA_SHIFT =0;
EXTENT_BITS_ARENA_MASK =((((1 << (12)) - 1)) << (0)); //4095 111111111111 fff
EXTENT_BITS_SLAB_WIDTH =1;
EXTENT_BITS_SLAB_SHIFT =(12 + 0);
EXTENT_BITS_SLAB_MASK =((((1 << (1)) - 1)) << ((12 + 0))); //4096 1000000000000 1000
EXTENT_BITS_COMMITTED_WIDTH = 1;
EXTENT_BITS_COMMITTED_SHIFT = (1 + (12 + 0));
EXTENT_BITS_COMMITTED_MASK =((((1 << (1)) - 1)) << ((1 + (12 + 0)))); //8192 10000000000000 2000
EXTENT_BITS_DUMPABLE_WIDTH = 1;
EXTENT_BITS_DUMPABLE_SHIFT = (1 + (1 + (12 + 0))) ;
EXTENT_BITS_DUMPABLE_MASK =((((1 << (1)) - 1)) << ((1 + (1 + (12 + 0))))); //16384 100000000000000 4000
EXTENT_BITS_ZEROED_WIDTH =1;
EXTENT_BITS_ZEROED_SHIFT =(1 + (1 + (1 + (12 + 0)))); //15 1111 f
EXTENT_BITS_ZEROED_MASK = ((((1 << (1)) - 1)) << ((1 + (1 + (1 + (12 + 0)))))); //32768 1000000000000000 8000
EXTENT_BITS_STATE_WIDTH = 2;
EXTENT_BITS_STATE_SHIFT = (1 + (1 + (1 + (1 + (12 + 0))))); //16
EXTENT_BITS_STATE_MASK =((((1 << (2)) - 1)) << ((1 + (1 + (1 + (1 + (12 + 0))))))); //196608 110000000000000000 30000
EXTENT_BITS_SZIND_WIDTH =8;
EXTENT_BITS_SZIND_SHIFT =(2 + (1 + (1 + (1 + (1 + (12 + 0)))))); // 18 10010 12
EXTENT_BITS_SZIND_MASK =((((1 << (8)) - 1)) << ((2 + (1 + (1 + (1 + (1 + (12 + 0)))))))); // 66846720 11111111000000000000000000 3FC0000
EXTENT_BITS_NFREE_WIDTH =((12 - 3) + 1); //10 1010 a
EXTENT_BITS_NFREE_SHIFT =(8 + (2 + (1 + (1 + (1 + (1 + (12 + 0))))))); //26 11010 1a
EXTENT_BITS_NFREE_MASK =((((1 << (((12 - 3) + 1))) - 1)) << ((8 + (2 + (1 + (1 + (1 + (1 + (12 + 0))))))))); // 68652367872 111111111100000000000000000000000000 FFC000000
EXTENT_BITS_SN_SHIFT =(((12 - 3) + 1) + (8 + (2 + (1 + (1 + (1 + (1 + (12 + 0)))))))); // 36 100100 24
EXTENT_BITS_SN_MASK =(18446744073709551615 << (((12 - 3) + 1) + (8 + (2 + (1 + (1 + (1 + (1 + (12 + 0))))))))); // FFFFFFFFFFFFFFFF000000000

je_malloc basic flow

-> je_malloc(size_t size)
| - static_opts_init(&sopts)
| - dynamic_opts_init(&dopts)
\ - imalloc(&sopts, &dopts) # "see 1.1 Initialization parameters"
\ - !malloc_initialized() && !malloc_init() # "this normally does not fail; the first call enters malloc_init()"
| \ - malloc_init_state == malloc_init_initialized # "i.e. once malloc_initialized is true, malloc_init() is skipped"
| | - malloc_init_hard()
| \ - malloc_init_hard_needed() # checks whether another thread is already initializing; only one thread needs to do it
*| - malloc_init_hard_a0_locked() # the function that actually initializes
| \ - config_prof -> prof_boot0() ? to be studied // TODO
| | | | - malloc_conf_init() # option init -> on android only cases 0 and 1 are loaded, only opts are overridden
probably injected at ELF build time via je_malloc_conf? to be studied // TODO
| - pages_boot() -> os_overcommits = true; appears to do nothing else special
| \ - init_thp_state() -> JEMALLOC_USE_SYSCALL /sys/kernel/mm/transparent_hugepage/enabled ? not exist
| | \ - opt_thp = init_system_thp_mode = thp_mode_not_supported transparent_hugepage(THP)?
| | | | - base_boot(TSDN_NULL)
| \ - base_new(tsdn, 0, (extent_hooks_t *)&extent_hooks_default) # ind(index) = 0, QUANTUM (2^4) alignment //TODO
| | \ - block = base_block_alloc(tsdn, NULL, extent_hooks, ind, &pind_last, &extent_sn_next, sizeof(base_t), QUANTUM)
| | | \ - alignment = ALIGNMENT_CEILING(alignment, QUANTUM) # align to 16 bytes
| - size_t usize = ALIGNMENT_CEILING(size, alignment); # align to 16 bytes
| - min_block_size = HUGEPAGE_CEILING(sz_psz2u(header_size + gap_size + usize)); # header + block
# HUGEPAGE_CEILING aligns by HUGEPAGE_MASK, which is 2^21 on 64-bit
| - next_block_size = HUGEPAGE_CEILING(sz_pind2sz(pind_next)) # "pind_next is 1 (the ind passed at init is 0, so next is 1);
NPSIZES is the total number of entries in the index->sizeClass table"
| \ - sz_pind2sz_lookup(pind) # maps into the SIZE_CLASSES table (needs macro expansion), see 1.3 //TODO inspect the value with gdb
| - block_size = (min_block_size > next_block_size) ? min_block_size : next_block_size; # at least 2^21
| | | | *| - block = base_map(tsdn, extent_hooks, ind, block_size);
| \ - <extent_hooks == &extent_hooks_default> -
addr = extent_alloc_mmap(NULL, size, alignment, &zero, &commit)
# "size is block_size; alignment = 1 << 21; used on the slow path"
| - pages_map(new_addr, size, ALIGNMENT_CEILING(alignment, 1<< 12), commit)
| \ - mmap(addr, size, PROT_READ | PROT_WRITE, mmap_flags, -1, 0)
| - prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ret, size, "libc_malloc");
| | - extent_boot() # lock-related initialization; can be ignored for now
| \ - rtree_new(&extents_rtree, true)
| | \ - malloc_mutex_init(&rtree->init_lock, "rtree", WITNESS_RANK_RTREE, malloc_mutex_rank_exclusive)
| - mutex_pool_init(&extent_mutex_pool, "extent_mutex_pool", WITNESS_RANK_EXTENT_POOL)
| - ctl_boot()
| \ - malloc_mutex_init(&ctl_mtx, "ctl", WITNESS_RANK_CTL, malloc_mutex_rank_exclusive)
| - config_prof -> prof_boot1()
| | | | - arena_boot()
| \ - arena_dirty_decay_ms_default_set(opt_dirty_decay_ms)
| - arena_muzzy_decay_ms_default_set(opt_muzzy_decay_ms)
| - div_init(&arena_binind_div_info[(index)], (reg_size)) # needs macro expansion
// gcc -E -P macro.c -o result.c
| \ - div_init(&arena_binind_div_info[(0)], ((1U<<3) + (0 << 3))); ... "2^32/8 .. arena division table init,
see 1.4 arena_boot expansion"
| - tcache_boot() # "cache_bin: the cache used for small allocations"
| \ - nhbins = sz_size2index(tcache_maxclass) + 1; # "total number of bins, including small [0-NBINS] and large [NBINS+1 - nhbins]"
| - tcache_bin_info = (cache_bin_info_t *)base_alloc(tsdn, b0get(), nhbins * sizeof(cache_bin_info_t), CACHELINE);
| \ - base_alloc_impl(tsdn, base, size, alignment, NULL)
| \ - extent = extent_heap_remove_first(&base->avail[i]); # "take from base->avail"
| - <extent == NULL?> - base_extent_alloc(tsdn, base, usize, alignment) # "if none is available, allocate one"
| - base_extent_bump_alloc(base, extent, usize, alignment)
| | | | - arena_init(TSDN_NULL, 0, (extent_hooks_t *)&extent_hooks_default)
| \ - arena = arena_init_locked(tsdn, ind, extent_hooks) # "ind=0; create a new arena & insert it into the arenas array
at index ind"
| - arena = arena_get(tsdn, ind, false); # "Another thread may have already initialized arenas[ind] for auto arena"
| - arena = arena_new(tsdn, ind, extent_hooks);
| \ - <ind == 0> base = b0get()
| - <ind != 0> base = base_new(tsdn, ind, extent_hooks)
| - arena = (arena_t *)base_alloc(tsdn, base, sizeof(arena_t), CACHELINE) # "allocate the arena's memory"
| - extent_list_init(&arena->large) # "init the large doubly linked list"
| - extents_init(tsdn, &arena->extents_dirty, extent_state_dirty, true) # "init extents_dirty"
| - extents_init(tsdn, &arena->extents_muzzy, extent_state_muzzy, false)
| - ... # "init the rest of the arena's data structures"
| - <i = 0; i < NBINS; i++> bin_init(&arena->bins[i]) # "init the bins managed by the arena"
| \ - extent_heap_new(&bin->slabs_nonfull)
| - extent_list_init(&bin->slabs_full); # "init the doubly linked list of slabs with no free regions"
| - arena->base = base; # "point the arena at its metadata base"
| - arena_set(ind, arena); # "arenas[ind] = arena"
| - a0 = arena_get(TSDN_NULL, 0, false) # fetch arenas[ind] with ind = 0
| | - malloc_init_state = malloc_init_a0_initialized;
| - tsd = malloc_tsd_boot0()
| \ - tsd_boot0() # create tsd_tsd
| - tsd_t *tsd = tsd = tsd_fetch()
| - return tsd
| - < malloc_init_hard_recursible() true?>
| Y\ - return true
| - malloc_init_narenas()
| - background_thread_boot1(tsd_tsdn(tsd))
| - malloc_init_percpu() # "configure the number of arenas a cpu uses"
| \ - opt_percpu_arena = percpu_arena_as_initialized(opt_percpu_arena);
| - malloc_init_hard_finish()
| | \ - malloc_init_state = malloc_init_initialized # "with this flag set, malloc_init() never runs again" end malloc_init()
| - malloc_tsd_boot1() # create tsd? are there only two tsds? boot0 boot1? //TODO
| *| - tsd_t *tsd = tsd_fetch(); # if already initialized, jump straight to this step
| \ - tsd_fetch_impl(true, false);
| \ - *tsd = tsd_get(true)
| | \ - wrapper = tsd_wrapper_get(init)
| - tsd_wrapper_t *wrapper = (tsd_wrapper_t *)pthread_getspecific(tsd_tsd) # the tsd slot
| - <wrapper==NULL> -wrapper = (tsd_wrapper_t *) malloc_tsd_malloc(sizeof(tsd_wrapper_t))
| - tsd_wrapper_set(wrapper)
| \ - pthread_setspecific(tsd_tsd, (void *)wrapper) # store into tsd after the malloc
| | -<tsd->state != tsd_state_nominal>- tsd_fetch_slow(tsd_t *tsd, false);
| | | \ - <tsd->state == tsd_state_uninitialized> - tsd->state = tsd_state_nominal
| - tsd_slow_update(tsd); # set state to tsd_state_nominal or tsd_state_nominal_slow (initialized but on slow path)
| - tsd_set(tsd);
| - tsd_data_init(tsd);
| | \ - tsd_tcache_data_init(tsd) # Trigger tcache init
| | | *\ - *avail_array = ipallocztm(tsd_tsdn(tsd), size, CACHELINE, true, NULL, true, arena_get(TSDN_NULL, 0, true))
| \ - ret = arena_palloc(tsdn, arena, usize, CACHELINE, zero, tcache); # zero = true
| | | |if\ - <usize <= SMALL_MAXCLASS && (alignment < PAGE ||... > # SMALL_MAXCLASS=14336 CACHELINE=64<1<<12
- arena_malloc(tsdn, arena, usize, sz_size2index(usize), zero, tcache, true)
| - <tcache == NULL> # the tcache passed down from above is NULL?
| *| Y\ - return arena_malloc_hard(tsdn, arena, size, ind, zero)
| \ - arena = arena_choose(tsdn_tsd(tsdn), arena)
| - arena_malloc_small(tsdn, arena, ind, zero) # allocate directly from the arena
| \ - bin = &arena->bins[ind]
| - usize = sz_index2size(binind)
| - < slab = bin->slabcur) != NULL && extent_nfree_get(slab) > 0 ?> # bin->slabcur is not empty
| | Y\ - arena_slab_reg_alloc(slab, &bin_infos[binind])
| \- *slab_data = extent_slab_data_get(slab)
| - bin_info = bin_infos[binind]
| - regind = bitmap_sfu(slab_data->bitmap, &bin_info->bitmap_info)
*| - ret = (void *)((uintptr_t)extent_addr_get(slab) + (bin_info->reg_size * regind)) # "e_addr in the extent is the start address of the regions; regind is the region's index"
| | <== | - return ret
| | N\ - ret = arena_bin_malloc_hard(tsdn, arena, bin, binind)
| \ - bin_info = &bin_infos[binind]
| - slab = arena_bin_nonfull_slab_get(tsdn, arena, bin, binind) # see 1.6.1 "the arena_bin_nonfull_slab_get path"
| - bin->slabcur = slab;
| | <== | - return arena_slab_reg_alloc(slab, bin_info)
*| N\ - return tcache_alloc_small(tsdn_tsd(tsdn), arena, tcache, size, ind, zero, slow_path=true)
| \ - tbin = tcache_small_bin_get(tcache, binind)
| *\ - return &tcache->bins_small[binind] # take the bin from tcache->bins_small[ind]
| - ret = cache_bin_alloc_easy(tbin, &tcache_success);
| \ - ret = *(bin->avail - bin->ncached); bin->ncached--;
| <== | - <bins_small already initialized> return ret
| | - <bins_small not yet initialized>
| - < Y > - arena = arena_choose(tsd, arena) # "when arena is NULL, bind an arena to the tcache"
| - ret = tcache_alloc_small_hard(tsd_tsdn(tsd), arena, tcache,
bin, binind, &tcache_hard_success)
| *\ - arena_tcache_fill_small(tsdn, arena, tcache, tbin, binind,
config_prof ? tcache->prof_accumbytes : 0) # fill bins_small[ind]
| - ret = cache_bin_alloc_easy(tbin, tcache_success) # tbin = bins_small[ind]
| \ - ret = *(bin->avail - bin->ncached); bin->ncached--;
| <== | - return ret
| | | | |if| - < usize > SMALL_MAXCLASS && alignment <= CACHELINE > # "large malloc"
- large_malloc(tsdn, arena, usize, zero) # zero = true
| \ - arena = arena_choose(tsdn_tsd(tsdn), arena) # 64-bit
| - extent = arena_extent_alloc_large(tsdn, arena, usize, alignment, &is_zeroed) # is_zeroed = true
| | | |*f\ - extent_t *extent = extents_alloc(tsdn, arena, &extent_hooks,
&arena->extents_dirty, NULL, usize, sz_large_pad, alignment, false, szind, zero, &commit)
| *\ - extent_recycle(tsdn, arena, r_extent_hooks, extents,
new_addr, size, pad, alignment, slab, szind, zero, commit, false) # "fairly involved; analyzed later", extents
|if\ - <extent "was allocated">
| <== | - return extent
| | |*f| - <extent "not allocated"> - extents_alloc(tsdn, arena, &extent_hooks,
&arena->extents_muzzy, NULL, usize, sz_large_pad, alignment, false, szind, zero, &commit)
|if\ - <extent "was allocated">
| <== | - return extent
| | |*f| - <extent"还没分配出来"> -
- size = usize + sz_large_pad
- extent_alloc_wrapper(tsdn, arena, &extent_hooks, NULL,
usize, sz_large_pad, alignment, false, szind, zero, &commit);
| \ - *extent = extent_alloc_retained(tsdn, arena, r_extent_hooks, new_addr, size, pad,
alignment, slab, szind, zero, commit)
| - <extent == NULL> # 没从"arena->extents_retained"分配出来
| <== | N\ - return extent
| Y\ - extent = extent_alloc_wrapper_hard(tsdn, arena, r_extent_hooks, new_addr, size, pad,
alignment, slab, szind, zero, commit)
| <== | - return extent
| | - *tcache = tsd_tcachep_get_unsafe(tsd) # returns a pointer to the thread-local instance 参考 1.5 宏展开扩展
| | | | - tcache_init(tsd, tcache, avail_array)
| \ - ticker_init(&tcache->gc_ticker, TCACHE_GC_INCR)
| - for <; i < NBINS; i++ >
| \ - tcache->lg_fill_div[i] = 1 #"填充除 /2^lg_fill_div"
| - tcache_small_bin_get(tcache, i)->avail = ((uintptr_t)avail_stack + (uintptr_t)stack_offset)
| \ - &tcache->bins_small[i] = avail_stack + stack_offset
| - for <i = NBINS; i < nhbins; i++> #
| \ - stack_offset += tcache_bin_info[i].ncached_max * sizeof(void *)
| - tcache_large_bin_get(tcache, i)->avail = ((uintptr_t)avail_stack + (uintptr_t)stack_offset)
| \ - &tcache->bins_large[i - NBINS] = avail_stack + stack_offset
| *| - imalloc_body(sopts, dopts, tsd) # "开始分配, 上面主要是填充avail_stack的过程"
| \ - compute_size_with_overflow(sopts->may_overflow, dopts, &size) #计算size
| \ - <may_overflow?>
| N\ - *size = dopts->item_size # "dopts->num_items = 1"
| Y\ - *size = dopts->item_size * dopts->num_items
| | - < dopts->alignment == 0 > # "不用对齐"
| Y\ - ind = sz_size2index(size) #不用对齐时使用ind分配
| N\ - usize = sz_sa2u(size, dopts->alignment) #对齐的情况下按usize分配
| - allocation = imalloc_no_sample(sopts, dopts, tsd, size, usize, ind)
| \ - tcache = tcaches_get(tsd, dopts->tcache_ind)
| \ - &tcaches[ind]->tcache
| - arena = arena_get(tsd_tsdn(tsd), dopts->arena_ind, true)
| \ - ret = (arena_t *)atomic_load_p(&arenas[ind], ATOMIC_ACQUIRE)
| - iallocztm(tsd_tsdn(tsd), size, ind, dopts->zero, tcache, false, arena, sopts->slow)
#"sopts->slow是true的时候, tcache有可能是null" // TODO
|<=| \ - ret = arena_malloc(tsdn, arena, size, ind, zero, tcache, slow_path) #"第二次调用, 这次的tcache一般不是空"
| \ - <tcache!=NULL && size<SMALL_MAXCLASS>
| <== | Y\ - tcache_alloc_small(tsdn_tsd(tsdn), arena, tcache, size, ind, zero, slow_path)
| - <tcache!=NULL && size <= tcache_maxclass> # "tcache不为NULL 且 size <= je_nhbins对应的class_size时"
| <== | Y\ - tcache_alloc_large(tsdn_tsd(tsdn), arena, tcache, size, ind, zero, slow_path)
| \ - bin = tcache_large_bin_get(tcache, binind)
| \ - bin = &tcache->bins_large[binind - NBINS]
| - ret = cache_bin_alloc_easy(bin, &tcache_success)
| \ - ret = *(bin->avail - bin->ncached)
| - <tcache_success?> #"从tcache拿到可用的内存了吗?"
| <== | Y\- return ret
| N\- ret = large_malloc(tsd_tsdn(tsd), arena, sz_s2u(size), zero)
| <== | - return ret
| <== | - arena_malloc_hard(tsdn, arena, size, ind, zero) #"> je_nhbins对应的class_size时不走tcache, 走arena的分配
|| 没有tcache的情况下, 也走arena的分配"
| \ - arena = arena_choose(tsdn_tsd(tsdn), arena);
| <== | - <size <= SMALL_MAXCLASS> - arena_malloc_small(tsdn, arena, ind, zero)
|<=| \ - return ((uintptr_t)extent_addr_get(slab) + (uintptr_t)(bin_info->reg_size * regind)) # "small 要寻bitmap"
| <== | - <size > SMALL_MAXCLASS > - large_malloc(tsdn, arena, sz_index2size(ind), zero)
| \ - extent = arena_extent_alloc_large(tsdn, arena, usize, alignment, &is_zeroed)
|<== | - return extent_addr_get(extent) #"large 只需要找e_addr返回即可"
| | - *dopts->result = allocation;

Small allocations take one of two paths: tcache_alloc_small and arena_malloc_hard.

The difference is that tcache_alloc_small serves the request from memory bound to the calling thread — the cached regions hang off the thread-specific data (tsd) — while arena_malloc_hard allocates straight from the shared arena, with no thread binding.

In the imalloc_body flow, the tcache is fetched via tcache = tcaches_get(tsd, dopts->tcache_ind); during initialization, or on the slow malloc path, tcache is NULL and the request has to be served from arena->bins instead.

  1. With a tcache: fetch the cache_bin from tcache->bins_small[binind], then cache_bin_alloc_easy returns *(bin->avail - bin->ncached) — the next available region. bin->ncached counts the cached regions still available; when it reaches 0, bins_small has to be refilled (see section 1.6, the tsd->tcache->bins_small refill flow), after which cache_bin_alloc_easy is called again to hand out a region.

  2. Without the tcache, locking is required, because the arena is a shared global rather than thread-local. arena_malloc_hard fetches bin = &arena->bins[ind] — this is the bin_t structure introduced earlier, not the tcache's cache_bin — and searches bin->slabcur's bitmap for a free region. If slabcur is NULL, or the current slab (extent) has no free region, arena_bin_nonfull_slab_get takes over; that path overlaps with the refill flow in section 1.6, so it helps to keep the relationship between tcache->bins_small[ind] and arena->bins[ind] clear.

    The refill flow shows that arena->bins[ind] maintains a large global cache, while the tcache's cache_bin is a small per-tsd cache. When the small cache's ncached drops to 0 — no usable regions left — the cache_bin is refilled, preferentially from the large cache (slabcur first; when slabcur is full, pop a slab from slabs_nonfull, take its free regions, and update slabcur). If the large cache is exhausted, an extent is recycled from the extents_dirty (muzzy | retained) heaps; if recycling also fails, a new slab is created via mmap and handed to the small cache (which also updates the large cache and registers the extent in the rtree).

jemalloc.drawio

1.6 tsd->tcache->bins_small refill flow

lg_fill_div is the divisor used when refilling the tcache. When the tcache runs dry, a refill is requested from the arena's small bins — but the refill does not top the tcache up to capacity; it fills the maximum capacity scaled down by 2^lg_fill_div. Like low_water, this value is adjusted dynamically; together the two keep the tcache at a reasonable fill level.

--> arena_tcache_fill_small(tsdn_t *tsdn, arena_t *arena, tcache_t *tcache, cache_bin_t *tbin, szind_t binind, uint64_t prof_accumbytes)
| - bin = &arena->bins[binind];
|for <i = 0, nfill = (tcache_bin_info[binind].ncached_max >> tcache->lg_fill_div[binind]); i < nfill; i+> -
|if\ - < slab = bin->slabcur) != NULL && extent_nfree_get(slab) > 0> - #"当前使用中的 slab 通过 bin->slabcur 分配"
| \ - (bin->slabcur->e_bits & EXTENT_BITS_NFREE_MASK)>>EXTENT_BITS_NFREE_SHIFT > 0
| - ptr = arena_slab_reg_alloc(slab, &bin_infos[binind])
| \ - arena_slab_data_t *slab_data = ent_slab_data_get(slab)
| - regind = bitmap_sfu(slab_data->bitmap, &bin_info->bitmap_info) #"计算可用region的在slab_data bitmap中的index"
| - ret = (void *)((uintptr_t)extent_addr_get(slab) + (uintptr_t)(bin_info->reg_size * regind)); #"返回可用region的地址"
| - extent_nfree_dec(slab) #"extent中可用的region数减去EXTENT_BITS_NFREE_SHIFT 数目, 表示这些region装填给了 tcache"
| - return ret
|el| - < slab = bin->slabcur) != NULL && extent_nfree_get(slab) > 0> #"即bin->slabcur为NULL && bin->slabcur->e_bits标识NFREE时"
| \ - ptr = arena_bin_malloc_hard(tsdn, arena, bin, binind) # Re-fill bin->slabcur, then call arena_slab_reg_alloc()
| \ - <!arena_is_auto(arena) && bin->slabcur != NULL> -
| Y\ - arena_bin_slabs_full_insert(arena, bin, bin->slabcur);
| - bin->slabcur = NULL;
| | - slab = arena_bin_nonfull_slab_get(tsdn, arena, bin, binind) # 1.6.1 arena_bin_nonfull_slab_get过程
| \ - slab = arena_bin_slabs_nonfull_tryget(bin) #"Look for a usable slab."
| \ - extent_t *slab = extent_heap_remove_first(&bin->slabs_nonfull) #nonfull 出堆 "从slabs_nonfull中获取"
| - <slab != NULL>
| |<-| Y\ - return slab
| N\ - bin_info = &bin_infos[binind]
| | | - slab = arena_slab_alloc(tsdn, arena, binind, bin_info) #Allocate a new slab
| \ - szind = sz_size2index(bin_info->reg_size)
| | - *slab = extents_alloc(tsdn, arena, &extent_hooks, #从"extents_dirty"中分配
&arena->extents_dirty, NULL, bin_info->slab_size, 0, PAGE, true, binind, &zero, &commit)
| \ - extent_recycle(tsdn, arena, r_extent_hooks, extents, new_addr, size,
pad, alignment, slab, szind, zero, commit, false) # "后面进行分析"
| - <extent "分配出来了">
| Y\ - extent_dumpable_get(extent) #
| | | <--| - return extent #end "arena_bin_nonfull_slab_get(tsdn, arena, bin, binind)"
| - < slab == NULL >
| | Y\ - slab = extents_alloc(tsdn, arena, &extent_hooks, ##" extents_dirty 中 没分配出来, 从extents_muzzy" 中分配
&arena->extents_muzzy, NULL, bin_info->slab_size, 0, PAGE, true, binind, &zero, &commit)
| - < slab == NULL > -
| N\ - extent_dumpable_get(extent)
| | | |<-| - return extent #end "arena_bin_nonfull_slab_get(tsdn, arena, bin, binind)"
| | | | | Y\ - slab = arena_slab_alloc_hard(tsdn, arena, &extent_hooks, bin_info, szind); #" extents_muzzy 中 没分配出来"
| - slab = extent_alloc_wrapper(tsdn, arena, r_extent_hooks, NULL,
bin_info->slab_size, 0, PAGE, true, szind, &zero, &commit)
| \ - *extent = extent_alloc_retained(tsdn, arena, r_extent_hooks,
new_addr, size, pad, alignment, slab, szind, zero, commit)
| \ - *extent = extent_recycle(tsdn, arena, r_extent_hooks, #从"extents_retained" 中分配
&arena->extents_retained, new_addr, size, pad, alignment, slab, szind, zero, commit, true)
| - < slab == NULL > - #"extent_alloc_retained 没分配出来"
| N\ - extent_dumpable_get(extent)
| | | | <== | - return extent #end "arena_bin_nonfull_slab_get(tsdn, arena, bin, binind)"
| | | | | Y\ - extent_alloc_wrapper_hard(tsdn, arena, r_extent_hooks,
new_addr, size, pad, alignment, slab, szind, zero, commit)
| | | - *extent = extent_alloc(tsdn, arena) # "参考 1.7 extent_alloc流程, 这里只是新建extent base的元数据"
| - <*r_extent_hooks == &extent_hooks_default> -
| Y\ - addr = extent_alloc_default_impl(tsdn, arena, new_addr,
esize, alignment, zero, commit) #"Call directly to propagate tsdn"
| - addr = extent_alloc_default_impl(tsdn, arena, NULL, alloc_size, PAGE, &zeroed, &committed)
| \ - extent_alloc_core(tsdn, arena, new_addr, size, alignment, zero, #(new_addr=NULL)
commit, (dss_prec_t)atomic_load_u(&arena->dss_prec, ATOMIC_RELAXED))
| - addr = extent_alloc_mmap(new_addr, size, alignment, zero, commit)
| \ - pages_map(new_addr, size, ALIGNMENT_CEILING(alignment, PAGE), commit)
| \ - os_pages_map(addr, size, os_page, commit)
| - addr = mmap(addr, size, PROT_READ | PROT_WRITE, mmap_flags, -1, 0)
#"addr 0 系统自动分配地址, 分配实际的地址, 挂到extent base元数据的e_addr下"
| - prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, addr, size, "libc_malloc")
| | | - return addr
| - extent_init(extent, arena, addr, esize, slab, szind, arena_extent_sn_next(arena),
extent_state_active, *zero, *commit, true);
| \ - extent_addr_set(extent, addr); #"这里应该是真正给extent设置region内存地址的地方"
| - extent_sn_set(extent, sn);
| - ...
| | | - extent_register(tsdn, extent)
| \ - *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback)
| \ - *rtree_ctx = &tsdn->tsd->use_a_getter_or_setter_instead_rtree_ctx
| \ - < extent_rtree_leaf_elms_lookup(tsdn, rtree_ctx, extent, false, true, &elm_a, &elm_b) ?>
| Y\ - # "rtree中找到了这个extent"
| - return true# 不用再注册, 因为已经在rtree中了
| N\ - # "rtree中没找到, 需要注册"
| - extent_rtree_write_acquired(tsdn, elm_a, elm_b, extent, szind, slab)
| \ - rtree_leaf_elm_write(tsdn, &extents_rtree, elm_a, extent, szind, slab)
| - rtree_leaf_elm_write(tsdn, &extents_rtree, elm_b, extent, szind, slab)
| - bool slab = extent_slab_get(extent)
| - <slab?>
| Y\ - #小内存 small_class
| - extent_interior_register(tsdn, rtree_ctx, extent, szind)
| \ - for (size_t i = 1; i < (extent_size_get(extent) >> LG_PAGE) - 1; i++)
# "key" = (uintptr_t)extent_base_get(extent) + (uintptr_t)(i << LG_PAGE)
| \ - <"elm" = rtree_leaf_elm_lookup(tsdn, rtree, rtree_ctx, "key", false, true) FOUND?>
| Y\ - rtree_leaf_elm_write(tsdn, rtree, elm, extent, szind, slab);
| - return extent
| | | | - *slab_data = extent_slab_data_get(slab);
| - extent_nfree_set(slab, bin_info->nregs);
| \ - extent->e_bits = (extent->e_bits & ~EXTENT_BITS_NFREE_MASK) | (nfree << EXTENT_BITS_NFREE_SHIFT)
| - bitmap_init(slab_data->bitmap, &bin_info->bitmap_info, false)
| | <-- | - return slab; # end " slab = arena_bin_nonfull_slab_get(tsdn, arena, bin, binind)"
| - #end "slab = arena_bin_nonfull_slab_get(tsdn, arena, bin, binind)"
| - bin->slabcur = slab; #"更新大缓存"
| - *(tbin->avail - nfill + i) = ptr;
| - tbin->ncached = i; #"for end"

First, memory of the requested size_class is taken from tsd->tcache->bins_small[binind]; if available, it is returned to the user directly. If bins_small[binind] is empty, it is refilled from a slab (extent) — several regions at once, to serve subsequent allocations. The refill proceeds as follows, falling through to the next step whenever the current one fails:

  1. Allocate from bin->slabcur
  2. Take a usable extent from bin->slabs_nonfull
  3. Recycle an extent from arena->extents_dirty. The policy is *first-fit*: pick an extent that satisfies the size request with the smallest serial number / lowest address (i.e. the oldest). In arena->extents_dirty->bitmap, find the index i of the first non-empty heap that satisfies the size, then take the first extent from extents->heaps[i]. Since the extent may be larger than needed, it is split (buddy-style) to limit fragmentation, and the unused remainder is put back into extents_dirty
  4. Recycle an extent from arena->extents_muzzy, again *first-fit*: for each non-empty heap in arena->extents_muzzy->bitmap that satisfies the size, look at the first extent in the corresponding extents->heaps, compare them, and pick the oldest; splitting is still needed afterwards
  5. Recycle an extent from arena->extents_retained, the same way as for extents_muzzy
  6. Try to mmap the required extent memory from the kernel, and register the new extent in the rtree
  7. Try bin->slabs_nonfull once more

In short, the fallback chain is: cache_bin -> slab -> slabs_nonfull -> extents_dirty -> extents_muzzy -> extents_retained -> kernel

1.7 extent_alloc flow

--> extent_t * extent_alloc(tsdn_t *tsdn, arena_t *arena)
| \ - *extent = extent_avail_first(&arena->extent_avail) #"参考1.5 宏展开"
| - < extent == NULL >
| N\ - extent_avail_remove(&arena->extent_avail, extent) # "先从extent_avail 中获取"
| Y\ - return base_alloc_extent(tsdn, arena->base)
| - *extent = base_alloc_impl(tsdn, arena->base, sizeof(extent_t)
| \ - usize = ALIGNMENT_CEILING(size, alignment)
| - asize = usize + alignment - QUANTUM;
| - for (szind_t i = sz_size2index(asize); i < NSIZES; i++)
| \ - extent = extent_heap_remove_first(&base->avail[i])
| - < extent != NULL >
<==*| \ - break;
| | - < extent == NULL > # "上面遍历完, 仍没找到有效的extent"
| Y\ - extent = base_extent_alloc(tsdn, base, usize, alignment)
| \ - *block = base_block_alloc(tsdn, base, extent_hooks, base_ind_get(base),
&base->pind_last, &base->extent_sn_next, size, alignment)
| \ - *block = base_map(tsdn, extent_hooks, ind, block_size)
| \ - addr = extent_alloc_mmap(NULL, size, alignment, &zero, &commit)
| - pages_map(new_addr, size, ALIGNMENT_CEILING(alignment, 1<< 12), commit)
| \ - mmap(addr, size, PROT_READ | PROT_WRITE, mmap_flags, -1, 0)
| - prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ret, size, "libc_malloc");
| | - return addr
| | - base_extent_init(extent_sn_next, &block->extent, *(block + header_size), block_size - header_size)
| \ - sn = *extent_sn_next; #"extent 还有sn号"
| - extent_binit(extent, addr, size, sn)
| \ - extent_arena_set(extent, NULL)
| - extent_addr_set(extent, addr);
| - extent_bsize_set(extent, bsize);
| - ... "设置extent e_bits 保存相关的数据结构关系"
| - block->next = base->blocks; base->blocks = block; # "挂入arena->base->blocks链表"

The key function involved in both the small and large allocation paths is extent_recycle.

Let's analyze it in detail.

Analysis of the extent_recycle function

Tries to satisfy the given allocation request by reusing one of the extents in the given extents_t.

static extent_t *
extent_recycle(tsdn_t *tsdn, arena_t *arena, extent_hooks_t **r_extent_hooks,
extents_t *extents, void *new_addr, size_t size, size_t pad,
size_t alignment, bool slab, szind_t szind, bool *zero, bool *commit,
bool growing_retained) {

rtree_ctx_t rtree_ctx_fallback;
// Fetch the tsd struct's use_a_getter_or_setter_instead_rtree_ctx member
tsd_rtree_ctx(tsd_t *tsd) {
return tsd_rtree_ctxp_get(tsd);
}
rtree_ctx_t *
tsdn_rtree_ctx(tsdn_t *tsdn, rtree_ctx_t *fallback) {
...
return tsd_rtree_ctx(tsdn_tsd(tsdn));
}

rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback);
// Recycle an extent from `extents`, which the caller passes in as one of arena->extents_dirty | extents_muzzy | extents_retained
extent_t *extent = extent_recycle_extract(tsdn, arena, r_extent_hooks,
rtree_ctx, extents, new_addr, size, pad, alignment, slab,
growing_retained);
if (extent == NULL) {
return NULL;
}
// Split off the excess and put it back into extents
extent = extent_recycle_split(tsdn, arena, r_extent_hooks, rtree_ctx,
extents, new_addr, size, pad, alignment, slab, szind, extent,
growing_retained);
if (extent == NULL) {
return NULL;
}
// Normally a recycled extent has the committed bit set in its e_bits; if it is not set, the memory has to be committed (mapped) again. The os_overcommits interaction here is confusing
if (*commit && !extent_committed_get(extent)) {
if (extent_commit_impl(tsdn, arena, r_extent_hooks, extent,
0, extent_size_get(extent), growing_retained)) {
extent_record(tsdn, arena, r_extent_hooks, extents,
extent, growing_retained);
return NULL;
}
extent_zeroed_set(extent, true);
}

if (extent_committed_get(extent)) {
*commit = true;
}
if (extent_zeroed_get(extent)) {
*zero = true;
}

if (pad != 0) {
extent_addr_randomize(tsdn, extent, alignment);
}
assert(extent_state_get(extent) == extent_state_active);
if (slab) {
extent_slab_set(extent, slab);
extent_interior_register(tsdn, rtree_ctx, extent, szind);
}
if (*zero) {
void *addr = extent_base_get(extent);
size_t size = extent_size_get(extent);
// Normally the zeroed bit was already set above; if the extent is not marked zeroed, force-purge it — telling the kernel the physical pages may be reclaimed (virtual memory not mapped by the MMU is just address space) — and fall back to memset
if (!extent_zeroed_get(extent)) {
if (pages_purge_forced(addr, size)) {
memset(addr, 0, size);
}
} else if (config_debug) {
size_t *p = (size_t *)(uintptr_t)addr;
for (size_t i = 0; i < size / sizeof(size_t); i++) {
assert(p[i] == 0);
}
}
}
return extent;
}
--> extent_recycle(tsdn_t *tsdn, arena_t *arena, extent_hooks_t **r_extent_hooks,
extents_t *extents, void *new_addr, size_t size, size_t pad,
size_t alignment, bool slab, szind_t szind, bool *zero, bool *commit,
bool growing_retained) # "new_addr = NULL"
| \ - *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback)
| \ - *rtree_ctx = &tsdn->tsd->use_a_getter_or_setter_instead_rtree_ctx
| - *extent = extent_recycle_extract(tsdn, arena, r_extent_hooks, #"to find & remove an extent from extents that can be used..."
rtree_ctx, extents, new_addr, size, pad, alignment, slab, growing_retained)
| \ - <new_addr == NULL> - (new_addr = NULL)
| Y\ - extents_fit_locked(tsdn, arena, extents, esize, alignment)
| \ - max_size = esize + PAGE_CEILING(alignment) - PAGE;
| - *extent = extents_first_fit_locked(tsdn, arena, extents, max_size) #" Do first-fit extent selection,
i.e. select the oldest/lowest extent that is large enough."
| - <alignment > PAGE && extent == NULL> - #"没分配成功"
| Y\ - extent = extents_fit_alignment(extents, esize, max_size, alignment); #"再fit 一遍alignment
Find an extent with size [min_size, max_size) to satisfy the alignment requirement.
For each size, try only the first extent in the heap."
| - return extent
| - extent = extent_recycle_split(tsdn, arena, r_extent_hooks, rtree_ctx,
extents, new_addr, size, pad, alignment, slab, szind, extent, growing_retained)
#"extent_recycle_extract 后如果有多的未用的空间, 再split放回到extents中?"
| \ - extent_split_interior(tsdn, arena, r_extent_hooks, rtree_ctx, &extent,
&lead, &trail, &to_leak, &to_salvage, new_addr, size, pad, alignment, slab, szind, growing_retained)
| - < result == extent_split_interior_ok > #是否split成功了
| Y\ - < lead != NULL > #切到头部了
| Y\ - extent_deactivate(tsdn, arena, extents, lead) #设置成不活跃的
| <-- | - return extent
| N\ - # split失败 "should have picked an extent that was large enough to fulfill our allocation request" ...
| | <-- | - return NULL
| - < *commit && !extent_committed_get(extent) >
| Y\ - extent_commit_impl(tsdn, arena, r_extent_hooks, extent, 0, extent_size_get(extent), growing_retained))
| \ - (*r_extent_hooks)->commit(*r_extent_hooks, extent_base_get(extent), extent_size_get(extent),
offset, length, arena_ind_get(arena)))
| \ - extent_commit_default(...)
| - <extent_commit_impl return true?>
| <-- | Y\ - return NULL (os_overcommits = true时返回NULL /proc/sys/vm/overcommit_memory 值为0或1时)
| - < *zero == true >
| Y\ - < extent_zeroed_get(extent) == false > # "extent->e_bits zero位非0"
| Y\ - pages_purge_forced(addr, size)
| \ - return (madvise(addr, size, MADV_DONTNEED) != 0) # "一般走不到这个地方"

je_free flow

So far only the small allocation path has been analyzed in any depth; the rest deserves further investigation and is left for later study.

First, let's walk through the free flow, which also deepens the understanding of the extent_recycle process.

--> je_free(void *ptr)
| \ - tsd_t *tsd = tsd_fetch_min();
| - return tsd_fetch_impl(true, true) <bool init, bool minimal)>;
| \ - tsd_t *tsd = tsd_get(true) #从前面的分析中,怀疑只有两个tsd
| \ - wrapper = tsd_wrapper_get(init) #wrapper结构体封装了 tsd的初始化的状态 和 tsd结构体实例
| \ - tsd_wrapper_t *wrapper = (tsd_wrapper_t *)pthread_getspecific(tsd_tsd)
| <==| return &wrapper->val;
| | - tcache = tsd_tcachep_get(tsd) #取出tsd结构体实例的tcache成员 参考 tsd的结构
| *| - ifree(tsd, ptr, tcache, false) #slow_path(false) 现在只关注fast的场景
| \ - rtree_ctx_t *rtree_ctx = tsd_rtree_ctx(tsd)
| - rtree_szind_slab_read(tsd_tsdn(tsd), &extents_rtree, rtree_ctx,
(uintptr_t)ptr, true, &alloc_ctx.szind, &alloc_ctx.slab)
| \ - rtree_leaf_elm_t *elm = rtree_read(tsdn, rtree, rtree_ctx, key, dependent) #key为ptr rtree遍历查找ptr,确定其位于哪个叶子节点
| - uintptr_t bits = rtree_leaf_elm_bits_read(tsdn, rtree, elm, dependent); # elm->le_bits
| - *r_szind = rtree_leaf_elm_bits_szind_get(bits); #(bits >> LG_VADDR)(64位为48)可以查出index
| - *r_slab = rtree_leaf_elm_bits_slab_get(bits) #bits & (uintptr_t)0x1 得到其是否是slab(small_class)
| - #查询的结果放到alloc_ctx中(*r_szind *r_slab )
| | - idalloctm(tsd_tsdn(tsd), ptr, tcache, &alloc_ctx, false, false)
| \ - arena_dalloc(tsdn, ptr, tcache, alloc_ctx, slow_path)
| - < tcache==NULL? >
| | Y\ - return arena_dalloc_no_tcache(tsdn, ptr) #"没有tcache的情况,一般走不到这"
| | N\ - szind = alloc_ctx->szind;
| - slab = alloc_ctx->slab;
| - < slab ?>
| | <== | Y\ - return arena_dalloc_small(tsdn, ptr) #"走small的释放"
| \ - extent_t *extent = iealloc(tsdn, ptr)
| \ - rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback)
| - return rtree_extent_read(tsdn, &extents_rtree, rtree_ctx, (uintptr_t)ptr, true)
| \ - rtree_leaf_elm_t *elm = rtree_read(tsdn, rtree, rtree_ctx, key, dependent) #ptr为key, 找出叶子节点
| - return rtree_leaf_elm_extent_read(tsdn, rtree, elm, dependent) #通过叶子节点找到对应的extent指针
| \ - bits = rtree_leaf_elm_bits_read(tsdn, rtree, elm, dependent)
| <== | - return rtree_leaf_elm_bits_extent_get(bits) # 通过elm->le_bits 移位和位运算获得extent的指针
| - arena_t *arena = extent_arena_get(extent)
| \ - arena_ind = (unsigned)((extent->e_bits & EXTENT_BITS_ARENA_MASK) >> EXTENT_BITS_ARENA_SHIFT)
#"entent->e_bits 移位+位运算 得到arena ind"
| - return &arenas[arena_ind] #arenas ind 取出arena
*| - arena_dalloc_bin(tsdn, arena, extent, ptr)
| \ - binind = extent_szind_get(extent)
| \ - szind = (szind_t)((extent->e_bits & EXTENT_BITS_SZIND_MASK) >> EXTENT_BITS_SZIND_SHIFT)
| - bin_t *bin = &arena->bins[binind]; #"arena->bins[ind]取出bin"
*| - arena_dalloc_bin_locked_impl(tsdn, arena, extent, ptr, false)
| \ - *slab_data = extent_slab_data_get(slab)
| - binind = extent_szind_get(slab)
| - *bin_info = &bin_infos[binind]
| - arena_slab_reg_dalloc(slab, slab_data, ptr)
| \ - regind = arena_slab_regind(slab, binind, ptr) #"计算ptr的region id"
| - bitmap_unset(slab_data->bitmap, &bin_info->bitmap_info, regind) #重设该region在slab_data bitmap的使用标记
| - extent_nfree_inc(slab) #"entent 记录free的region数+1"
| - nfree = extent_nfree_get(slab) #"获得free的region数"
*| - < nfree == bin_info->nregs > #"该extent 变成了所有 region全是未使用的状态?"
*| Y\ - arena_dissociate_bin_slab(arena, slab, bin)
| - < slab == bin->slabcur >
| Y\ - bin->slabcur = NULL #该slab = slabcur时, 将slabcur 设成NULL结束
| N\ - bin_info = &bin_infos[binind]
| - < bin_info->nregs == 1 ?> #"判断该slab是处在full列中还是nonfull堆中"
| Y\ - arena_bin_slabs_full_remove(arena, bin, slab) #full中删除
| N\ - arena_bin_slabs_nonfull_remove(bin, slab); #从nonfull中删除
| | - arena_dalloc_bin_slab(tsdn, arena, slab, bin)
| \ - arena_slab_dalloc(tsdn, arena, slab)
| \ - arena_nactive_sub(arena, extent_size_get(slab) >> LG_PAGE)
| \ - &arena->nactive - (extent_size_get(slab) >> LG_PAGE) #"arena active page (就是使用活跃状态的
总page数 - extent占用的page数")
| | | *| - arena_extents_dirty_dalloc(tsdn, arena, &extent_hooks, slab)
#"这个地方的生效条件是 该extent变成了所有 region全是未使用的状态"
| \ - extents_dalloc(tsdn, arena, r_extent_hooks, &arena->extents_dirty, extent); #//TODO
| N\ - < nfree == 1 && slab != bin->slabcur ?> #"extent nfree释放后恰好变成了1, 即从full状态变成了nonfull状态"
| Y\ - arena_bin_slabs_full_remove(arena, bin, slab) #"将该slab从slab_full队中删除"
| \ - extent_list_remove(&bin->slabs_full, slab)
| - arena_bin_lower_slab(tsdn, arena, slab, bin)
| - < bin->slabcur != NULL && extent_snad_comp(bin->slabcur, slab) > 0 > #这个应该是比较哪个slab比较旧
| Y\ - < extent_nfree_get(bin->slabcur) > 0 > #slabcur nfree>0?
| Y\ - arena_bin_slabs_nonfull_insert(bin, bin->slabcur)
##当前slab旧, 轮转slab和slabcur, 即slabcur总是用最旧的, slabcur 入nonfull堆
| N\ - arena_bin_slabs_full_insert(arena, bin, bin->slabcur) # slabcur 入full堆, 因为nfree没了
| - bin->slabcur = slab #切换slabcur为当前的slab
| N\ - arena_bin_slabs_nonfull_insert(bin, slab) #slabcur旧, 不轮转, 将该slab其入nonfull堆, 从full列删除
| N\ - extent_t *extent = iealloc(tsdn, ptr)
| | <== | - return large_dalloc(tsdn, extent) #"走large的释放"

The flow above involves arena_extents_dirty_dalloc — the interaction with the dirty extents heap after memory is freed. Let's focus on that part; the rest is bookkeeping among the bin data structures.

arena_extents_dirty_dalloc: handling an extent once all of its regions are free

# "Input: the extent (the slab passed down from above) plus the arena; triggered when every region of the extent has become free." Locking around the call is omitted here
-->void arena_extents_dirty_dalloc(tsdn_t *tsdn, arena_t *arena, extent_hooks_t **r_extent_hooks, extent_t *extent)
| \ - extents_dalloc(tsdn, arena, r_extent_hooks, &arena->extents_dirty, extent);
| \ - extent_addr_set(extent, extent_base_get(extent)) #"设置成addr page对齐的地址"
| - extent_zeroed_set(extent, false) # 是否是重置的, 传入的是false
| - extent_record(tsdn, arena, r_extent_hooks, extents, extent, false) # 这里用到了 extents, 上文是 dirty的heap
| \ - extent_szind_set(extent, NSIZES);
| - < extent_slab_get(extent) ?> "传入的entent是否是small class的slab " 上文传下来是的
*| Y\ - extent_interior_deregister(tsdn, rtree_ctx, extent); #"反注册, 从全局rtree中删除记录" //TODO
| - extent_slab_set(extent, false) # "消除slab标记, 变成普通的extent"
| - < !extents->delay_coalesce? > # 是否指定了延迟合并, 未指定则立即合并
| Y\ - extent = extent_try_coalesce(tsdn, arena, r_extent_hooks, rtree_ctx, extents, extent, NULL, growing_retained);
| N\ - < extent_size_get(extent) >= LARGE_MINCLASS > # "extent size >= (1<<14) Always coalesce large extents eagerly "
| Y\ - <for ; coalesced && extent_size_get(extent) >= prev_size + LARGE_MINCLASS; >
#"如果返回的coalesced为true && extent的size >= (合并前的size + 1<<14), 就会一直触发合并, 直到合并不下去"
| \ - prev_size = extent_size_get(extent) #"记住合并前的大小"
| - extent = extent_try_coalesce(tsdn, arena, r_extent_hooks, rtree_ctx, extents, extent, &coalesced,
growing_retained) #"进行合并, 返回是否合并成功了"
*| - extent_deactivate_locked(tsdn, arena, extents, extent)
| \ - extent_state_set(extent, extents_state_get(extents) #上文传入的是dirty堆, 所以这里将extent标记为了dirty
| - extents_insert_locked(tsdn, extents, extent) #将extent插入到 dirty堆中

Summary of small-memory free

To summarize the small-memory free flow above (considering only the fast path):

  1. From the calling context, find the tsd bound to the current thread

  2. From the tsd, get its tcache member

  3. Fast path: using the global rtree with the ptr address as key, look up its size index and whether it belongs to a slab (i.e. whether the address is a small allocation)

  4. Without a tcache, arena_dalloc_no_tcache(tsdn, ptr) is taken; normally a tcache exists and the flow goes through arena_dalloc_small(tsdn, ptr)

  5. iealloc(tsdn, ptr) looks up the extent structure in the global rtree by ptr; extent_arena_get(extent) yields the owning arena, extent_szind_get(extent) yields binind, so &arena->bins[binind] yields the bin; arena_slab_regind(slab, binind, ptr) computes ptr's region id within the extent. Clear that region's in-use bit in the slab bitmap, and extent_nfree_inc(slab) bumps the free-region count in the extent's metadata

  6. If after the free the extent becomes entirely free (every region free), subtract the extent's pages from the arena's active page count, and finally insert the extent — or the extent produced by coalescing — into the dirty heap. If coalescing is enabled (the have_dss macro), the extents before and after it are examined and merged in a loop until no further merge is possible

    6.1 If the slab containing ptr is bin->slabcur, set slabcur to NULL

    6.2 If it is not slabcur: when bin_info->nregs == 1 (each slab of this bin holds a single region, so before the free the slab was full), the slab becomes unused and must be removed from the slabs_full list; when nregs > 1 (it cannot be 0), remove it from the nonfull heap

    bin_info->nregs is the number of regions that each slab of this size index holds

  7. If after the free nfree becomes exactly 1 — the slab just went from full to nonfull — remove it from slabs_full, and switch arena->bins[binind]->slabcur to this extent, provided this extent is "older" (smaller serial number / lower address); the replaced extent is moved into arena->bins[binind]->slabs_nonfull

extent_try_coalesce: coalescing extents

static extent_t *
extent_try_coalesce(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, rtree_ctx_t *rtree_ctx, extents_t *extents,
extent_t *extent, bool *coalesced, bool growing_retained) {
/*
* Continue attempting to coalesce until failure, to protect against
* races with other threads that are thwarted by this one.
*/
	// `again` guards against races: if another thread interferes, the while loop exits
bool again;
do {
again = false;

		/* Try to coalesce forward. */
extent_t *next = extent_lock_from_addr(tsdn, rtree_ctx,
extent_past_get(extent));
if (next != NULL) {
/*
* extents->mtx only protects against races for
* like-state extents, so call extent_can_coalesce()
* before releasing next's pool lock.
*/
bool can_coalesce = extent_can_coalesce(arena, extents,
extent, next);

extent_unlock(tsdn, next);

if (can_coalesce && !extent_coalesce(tsdn, arena,
r_extent_hooks, extents, extent, next, true,
growing_retained)) {
if (extents->delay_coalesce) {
/* Do minimal coalescing. */
*coalesced = true;
return extent;
}
again = true;
}
}

		/* Try to coalesce backward. */
extent_t *prev = extent_lock_from_addr(tsdn, rtree_ctx,
extent_before_get(extent));
if (prev != NULL) {
bool can_coalesce = extent_can_coalesce(arena, extents,
extent, prev);
extent_unlock(tsdn, prev);

if (can_coalesce && !extent_coalesce(tsdn, arena,
r_extent_hooks, extents, extent, prev, false,
growing_retained)) {
extent = prev;
if (extents->delay_coalesce) {
/* Do minimal coalescing. */
*coalesced = true;
return extent;
}
again = true;
}
}
} while (again);

if (extents->delay_coalesce) {
*coalesced = false;
}
return extent;
}

Stripping out the multithreading logic, the rough flow is:

-->extent_t * extent_try_coalesce(tsdn_t *tsdn, arena_t *arena, extent_hooks_t **r_extent_hooks, rtree_ctx_t *rtree_ctx, 
extents_t *extents, extent_t *extent, bool *coalesced, bool growing_retained)
| \ - while(again)
| \ - extent_t *next = extent_lock_from_addr(tsdn, rtree_ctx, extent_past_get(extent)) #"查找后一个extent"
| - bool can_coalesce = extent_can_coalesce(arena, extents, extent, next) #"查看是否可以合并"
| - < can_coalesce? && !extent_coalesce(tsdn, arena, r_extent_hooks, extents, extent, next, true, growing_retained) >
|<-| N\ - again = false
| - extent_lock_from_addr(tsdn, rtree_ctx, extent_before_get(extent)) #"查找前一个extent"
| - bool can_coalesce = extent_can_coalesce(arena, extents, extent, prev)
| - < can_coalesce? && !extent_coalesce(tsdn, arena, r_extent_hooks, extents, extent, next, false, growing_retained) >
|<-| N\ - again = false

extent_coalesce: merging two extents

With the default hooks this looks effectively unimplemented — merging is not configured. When it is configured (have_dss / JEMALLOC_DSS), the check goes through extent_dss_mergeable(addr_a, addr_b).

--> bool extent_coalesce(tsdn_t *tsdn, arena_t *arena, extent_hooks_t **r_extent_hooks,
extents_t *extents, extent_t *inner, extent_t *outer, bool forward, bool growing_retained)
| \ - extent_merge_impl(tsdn, arena, r_extent_hooks, forward ? inner : outer, forward ? outer : inner, growing_retained)
#"根据forward 标记倒换 inner outer次序, 根据上文看第二次后向合并的过程, forward是false, 则extent 是outer, back extent是 inner"
| \ - < *r_extent_hooks == &extent_hooks_default? >
| Y\ - err = extent_merge_default_impl(extent_base_get(a), extent_base_get(b))
| \ - err = extent_merge_default_impl(extent_base_get(a), extent_base_get(b)) #"空实现?" ??//TODO
| \ - <have_dss? JEMALLOC_DSS? > # "这个宏好像没开"
| \ - return !extent_dss_mergeable(addr_a, addr_b)
| N\ - err = (*r_extent_hooks)->merge(*r_extent_hooks, extent_base_get(a), extent_size_get(a), extent_base_get(b),
extent_size_get(b), extent_committed_get(a), arena_ind_get(arena))
| - extent_merge_default(...)
| \ - extent_merge_default_impl(void *addr_a, void *addr_b)