
Android FUSE File System Study

1. Introduction

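This article walks through the FUSE-related flow from the perspective of the storage mounting system. Android adds a FUSE mount on top of both internal and external storage. For internal storage in a multi-user setup, each user can access only their own internal storage directory (under /mnt/runtime/read|write/emulated). In addition, to support runtime permissions, unprivileged apps reach storage through FUSE. Unprivileged here means apps without the platform signature: they cannot obtain the install-time permission android.permission.WRITE_MEDIA_STORAGE and can only request the runtime permission WRITE_EXTERNAL_STORAGE.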

2. Starting from the sdcard volume mount

/system/bin/sdcard is the user-space daemon for FUSE. After an SD card volume is mounted, the following command is executed:

/system/bin/sdcard -u 1023 -g 1023 -U userid -w /mnt/media_rw/XXXX-XXXX XXXX-XXXX
if (execl(kFusePath, kFusePath,
"-u", "1023", // AID_MEDIA_RW
"-g", "1023", // AID_MEDIA_RW
"-U", std::to_string(getMountUserId()).c_str(),
"-w",
mRawPath.c_str(),
stableName.c_str(),
NULL)) {
PLOG(ERROR) << "Failed to exec";
}

Inside the sdcard process the arguments map as follows: uid is 1023, gid is 1023, userid is the current user ID, -w sets full_write, mRawPath becomes source_path, and stableName becomes label.

When sdcardfs is not configured:

// For an SD card the multi_user flag is not passed, so it is false here. For internal storage multi_user is true.
run(source_path, label, uid, gid, userid, multi_user, full_write);
/* Physical storage is readable by all users on device, but
* the Android directories are masked off to a single user
* deep inside attr_from_stat(). */
// full_write is true
fuse_setup(&fuse_default, AID_SDCARD_RW, 0006)
|| fuse_setup(&fuse_read, AID_EVERYBODY, full_write ? 0027 : 0022)
|| fuse_setup(&fuse_write, AID_EVERYBODY, full_write ? 0007 : 0022))

The fuse_default, fuse_read and fuse_write structures are mounted at /mnt/runtime/default, /mnt/runtime/read and /mnt/runtime/write respectively.
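To make the three masks concrete, the sketch below computes the base mode each view would expose, using the `0775 & ~mask` step that attr_from_stat applies later in this article. The helper name `visible_mode_for` and the `k...Mask` constants are hypothetical names introduced for illustration, and the per-node adjustments (PERM_PRE_ROOT, under_android) are left out.

```cpp
#include <cstdint>

// Hypothetical helper: the base mode a view exposes before any per-node
// adjustment, mirroring `0775 & ~fuse->mask` in attr_from_stat.
constexpr std::uint32_t visible_mode_for(std::uint32_t mask) {
    return 0775u & ~mask;
}

// Masks passed to fuse_setup when full_write is true (names are illustrative).
constexpr std::uint32_t kDefaultMask = 0006;
constexpr std::uint32_t kReadMask = 0027;
constexpr std::uint32_t kWriteMask = 0007;
```

Under these assumptions the default view keeps execute for other (0771), the read view drops group write and all other bits (0750), and the write view drops only the other bits (0770).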

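Most of the arguments above are stored in the struct fuse_global global. The root node is then initialized. Like a kernel file system, sdcard maintains a node structure for every directory and file.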

2.1. Data structures involved

/* Global data for all FUSE mounts */
struct fuse_global {
pthread_mutex_t lock;

uid_t uid;
gid_t gid;
bool multi_user;

char source_path[PATH_MAX];
char obb_path[PATH_MAX];

AppIdMap* package_to_appid;

__u64 next_generation;

struct node root;
__u32 inode_ctr;

struct fuse* fuse_default;
struct fuse* fuse_read;
struct fuse* fuse_write;
};

The fuse_setup initialization mentioned earlier involves the following data structures:

/* Single FUSE mount */
struct fuse {
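// fuse_default, fuse_read and fuse_write are instances of struct fuse; each holds a pointer to fuse_global
struct fuse_global* global;
// Mount path
char dest_path[PATH_MAX];
// File descriptor of the opened /dev/fuse device
int fd;
// The gid differs among default, read and write, so it is not kept in the shared global
gid_t gid;
// Same as above: the mount mask differs per view.
mode_t mask;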
};

struct node {
__u32 refcount;
__u64 nid;
__u64 gen;
/*
* The inode number for this FUSE node. Note that this isn't stable across
* multiple invocations of the FUSE daemon.
*/
__u32 ino;

/* State derived based on current position in hierarchy. */
perm_t perm;
// The user_id that was passed in
userid_t userid;
// The assigned uid
uid_t uid;
bool under_android;

struct node *next; /* per-dir sibling list */
struct node *child; /* first contained file by this dir */
struct node *parent; /* containing directory */

size_t namelen;
char *name;
/* If non-null, this is the real name of the file in the underlying storage.
* This may differ from the field "name" only by case.
* strlen(actual_name) will always equal strlen(name), so it is safe to use
* namelen for both fields.
*/
char *actual_name;

/* If non-null, an exact underlying path that should be grafted into this
* position. Used to support things like OBB. */
char* graft_path;
size_t graft_pathlen;

bool deleted;
};

/* Private data used by a single FUSE handler */
struct fuse_handler {
struct fuse* fuse;
int token;

/* To save memory, we never use the contents of the request buffer and the read
* buffer at the same time. This allows us to share the underlying storage. */
union {
__u8 request_buffer[MAX_REQUEST_SIZE];
__u8 read_buffer[MAX_READ + PAGE_SIZE];
};
};

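struct fuse holds a pointer to fuse_global. fuse_global is common state shared by fuse_default, fuse_read and fuse_write.

node represents the directory hierarchy. It stores next, child and parent pointers, which make it easy to reach a node's siblings, its parent directory and its children.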
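The linkage can be sketched in isolation. The code below is a simplified stand-in, not the daemon's struct node: `Node`, `add_to_parent`, `find_child` and `node_path` are hypothetical names, only the three pointers and a name are kept, and the child lookup uses an exact string match for brevity.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for the daemon's struct node: only links and a name.
struct Node {
    explicit Node(std::string n) : name(std::move(n)) {}
    std::string name;
    Node* parent = nullptr;
    Node* child = nullptr;  // first child of this directory
    Node* next = nullptr;   // next sibling in the parent's list
};

// Push `node` onto the front of `parent`'s child list.
void add_to_parent(Node& node, Node& parent) {
    assert(node.parent == nullptr && "node is already linked");
    node.parent = &parent;
    node.next = parent.child;
    parent.child = &node;
}

// Walk the sibling list looking for an exact name match.
Node* find_child(const Node& parent, const std::string& name) {
    for (Node* n = parent.child; n != nullptr; n = n->next) {
        if (n->name == name) return n;
    }
    return nullptr;
}

// Rebuild a path by recursing up to the root.
std::string node_path(const Node& node) {
    if (node.parent == nullptr) return node.name;
    return node_path(*node.parent) + "/" + node.name;
}
```

Because new children are pushed at the head of the list, the most recently created entry is found first, and a path is recovered purely from parent pointers without storing it on each node.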

[Figure: sdcard fuse]

2.2. Initialization

The run function first initializes fuse_global, fuse_default, fuse_read and fuse_write.

global.package_to_appid = new AppIdMap;
global.uid = uid;
global.gid = gid;
global.multi_user = multi_user;
global.next_generation = 0;
global.inode_ctr = 1;
// Set up the global root node, shared by the default, read and write views
memset(&global.root, 0, sizeof(global.root));
global.root.nid = FUSE_ROOT_ID; /* 1 */
global.root.refcount = 2;
global.root.namelen = strlen(source_path);
// The mount root is /mnt/media_rw/XXXX-XXXX
global.root.name = strdup(source_path);
global.root.userid = userid;
global.root.uid = AID_ROOT; // uid is root; the gid is set per view in struct fuse.
global.root.under_android = false;
strcpy(global.source_path, source_path);
if (multi_user) {
// Internal storage
global.root.perm = PERM_PRE_ROOT;
snprintf(global.obb_path, sizeof(global.obb_path), "%s/obb", source_path);
} else {
// External storage
global.root.perm = PERM_ROOT;
snprintf(global.obb_path, sizeof(global.obb_path), "%s/Android/obb", source_path);
}

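fuse_handler holds the per-thread state used to process requests.

fuse_setup opens the /dev/fuse device, saves the fd in the corresponding structure, and mounts the FUSE file system on that descriptor.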
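A FUSE mount is tied to an open /dev/fuse descriptor through the option string handed to mount(2). The sketch below only builds such a string. `fd=`, `rootmode=`, `user_id=`, `group_id=`, `default_permissions` and `allow_other` are standard Linux FUSE mount options, but the exact set the sdcard daemon passes is an assumption here, and `fuse_mount_options` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>

// Build the option string for mounting FUSE on an already opened /dev/fuse
// descriptor. Which options the daemon really uses is an assumption.
std::string fuse_mount_options(int fd, uid_t uid, gid_t gid) {
    assert(fd >= 0);
    return "fd=" + std::to_string(fd) +
           ",rootmode=40000,default_permissions,allow_other" +
           ",user_id=" + std::to_string(uid) +
           ",group_id=" + std::to_string(gid);
}
```

The resulting string would be passed as the data argument of mount(2) with file system type "fuse"; the actual mount call needs root privileges and is left out so the sketch stays side-effect free.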

2.3. Handling incoming requests

Once setup completes, three threads are created to receive and process data for the default, read and write views respectively.

if (pthread_create(&thread_default, NULL, start_handler, &handler_default)
|| pthread_create(&thread_read, NULL, start_handler, &handler_read)
|| pthread_create(&thread_write, NULL, start_handler, &handler_write)) {
LOG(FATAL) << "failed to pthread_create";
}
// The thread entry point is start_handler; its data lives in the handler
static void* start_handler(void* data) {
struct fuse_handler* handler = static_cast<fuse_handler*>(data);
// Processing is done by handle_fuse_requests
handle_fuse_requests(handler);
return NULL;
}

void handle_fuse_requests(struct fuse_handler* handler)
{
struct fuse* fuse = handler->fuse;
for (;;) {
// Loop reading from the fd bound to this view (default, read or write); the data lands in request_buffer.
ssize_t len = TEMP_FAILURE_RETRY(read(fuse->fd,
handler->request_buffer, sizeof(handler->request_buffer)));
if (len == -1) {
// The read failed and errno is ENODEV (No such device)
if (errno == ENODEV) {
LOG(ERROR) << "[" << handler->token << "] someone stole our marbles!";
exit(2);
}
PLOG(ERROR) << "[" << handler->token << "] handle_fuse_requests";
continue;
}

if (static_cast<size_t>(len) < sizeof(struct fuse_in_header)) {
LOG(ERROR) << "[" << handler->token << "] request too short: len=" << len;
continue;
}
// request_buffer contains a fuse_in_header followed by the payload.
const struct fuse_in_header* hdr =
reinterpret_cast<const struct fuse_in_header*>(handler->request_buffer);
if (hdr->len != static_cast<size_t>(len)) {
LOG(ERROR) << "[" << handler->token << "] malformed header: len=" << len
<< ", hdr->len=" << hdr->len;
continue;
}
// Locate the payload and compute its size
const void *data = handler->request_buffer + sizeof(struct fuse_in_header);
size_t data_len = len - sizeof(struct fuse_in_header);
// Extract the unique number from the fuse_in_header
__u64 unique = hdr->unique;
// The request itself is handled by handle_fuse_request
int res = handle_fuse_request(fuse, handler, hdr, data, data_len);

/* We do not access the request again after this point because the underlying
* buffer storage may have been reused while processing the request. */

if (res != NO_STATUS) {
if (res) {
DLOG(INFO) << "[" << handler->token << "] ERROR " << res;
}
// A status reply still has to be sent.
fuse_status(fuse, unique, res);
}
}
}

// handle_fuse_request dispatches to the specific handler based on the opcode in fuse_in_header
static int handle_fuse_request(struct fuse *fuse, struct fuse_handler* handler,
const struct fuse_in_header *hdr, const void *data, size_t data_len)
{
switch (hdr->opcode) {
// Look up a child node by name
case FUSE_LOOKUP: { /* bytez[] -> entry_out */
const char *name = static_cast<const char*>(data);
return handle_lookup(fuse, handler, hdr, name);
}

case FUSE_FORGET: {
const struct fuse_forget_in *req = static_cast<const struct fuse_forget_in*>(data);
return handle_forget(fuse, handler, hdr, req);
}
// Read permissions and file attributes
case FUSE_GETATTR: { /* getattr_in -> attr_out */
const struct fuse_getattr_in *req = static_cast<const struct fuse_getattr_in*>(data);
return handle_getattr(fuse, handler, hdr, req);
}
// Set permissions, file attributes, etc.
case FUSE_SETATTR: { /* setattr_in -> attr_out */
const struct fuse_setattr_in *req = static_cast<const struct fuse_setattr_in*>(data);
return handle_setattr(fuse, handler, hdr, req);
}

// case FUSE_READLINK:
// case FUSE_SYMLINK:
case FUSE_MKNOD: { /* mknod_in, bytez[] -> entry_out */
const struct fuse_mknod_in *req = static_cast<const struct fuse_mknod_in*>(data);
const char *name = ((const char*) data) + sizeof(*req);
return handle_mknod(fuse, handler, hdr, req, name);
}

case FUSE_MKDIR: { /* mkdir_in, bytez[] -> entry_out */
const struct fuse_mkdir_in *req = static_cast<const struct fuse_mkdir_in*>(data);
const char *name = ((const char*) data) + sizeof(*req);
return handle_mkdir(fuse, handler, hdr, req, name);
}
// Remove a file (unlink)
case FUSE_UNLINK: { /* bytez[] -> */
const char *name = static_cast<const char*>(data);
return handle_unlink(fuse, handler, hdr, name);
}

case FUSE_RMDIR: { /* bytez[] -> */
const char *name = static_cast<const char*>(data);
return handle_rmdir(fuse, handler, hdr, name);
}

case FUSE_RENAME: { /* rename_in, oldname, newname -> */
const struct fuse_rename_in *req = static_cast<const struct fuse_rename_in*>(data);
const char *old_name = ((const char*) data) + sizeof(*req);
const char *new_name = old_name + strlen(old_name) + 1;
return handle_rename(fuse, handler, hdr, req, old_name, new_name);
}

// case FUSE_LINK:
case FUSE_OPEN: { /* open_in -> open_out */
const struct fuse_open_in *req = static_cast<const struct fuse_open_in*>(data);
return handle_open(fuse, handler, hdr, req);
}

case FUSE_READ: { /* read_in -> byte[] */
const struct fuse_read_in *req = static_cast<const struct fuse_read_in*>(data);
return handle_read(fuse, handler, hdr, req);
}

case FUSE_WRITE: { /* write_in, byte[write_in.size] -> write_out */
const struct fuse_write_in *req = static_cast<const struct fuse_write_in*>(data);
const void* buffer = (const __u8*)data + sizeof(*req);
return handle_write(fuse, handler, hdr, req, buffer);
}
// Corresponds to statfs(), which reports file system statistics
case FUSE_STATFS: { /* -> statfs_out */
return handle_statfs(fuse, handler, hdr);
}
// Close a file
case FUSE_RELEASE: { /* release_in -> */
const struct fuse_release_in *req = static_cast<const struct fuse_release_in*>(data);
return handle_release(fuse, handler, hdr, req);
}

case FUSE_FSYNC:
case FUSE_FSYNCDIR: {
const struct fuse_fsync_in *req = static_cast<const struct fuse_fsync_in*>(data);
return handle_fsync(fuse, handler, hdr, req);
}

// case FUSE_SETXATTR:
// case FUSE_GETXATTR:
// case FUSE_LISTXATTR:
// case FUSE_REMOVEXATTR:

case FUSE_FLUSH: {
return handle_flush(fuse, handler, hdr);
}
// opendir
case FUSE_OPENDIR: { /* open_in -> open_out */
const struct fuse_open_in *req = static_cast<const struct fuse_open_in*>(data);
return handle_opendir(fuse, handler, hdr, req);
}
// readdir
case FUSE_READDIR: {
const struct fuse_read_in *req = static_cast<const struct fuse_read_in*>(data);
return handle_readdir(fuse, handler, hdr, req);
}

case FUSE_RELEASEDIR: { /* release_in -> */
const struct fuse_release_in *req = static_cast<const struct fuse_release_in*>(data);
return handle_releasedir(fuse, handler, hdr, req);
}

case FUSE_INIT: { /* init_in -> init_out */
const struct fuse_init_in *req = static_cast<const struct fuse_init_in*>(data);
return handle_init(fuse, handler, hdr, req);
}

case FUSE_CANONICAL_PATH: { /* nodeid -> bytez[] */
return handle_canonical_path(fuse, handler, hdr);
}

default: {
DLOG(INFO) << "[" << handler->token << "] NOTIMPL op=" << hdr->opcode
<< "uniq=" << std::hex << hdr->unique << "nid=" << hdr->nodeid << std::dec;
return -ENOSYS;
}
}
}

The operations dispatched from handle_fuse_request contain the permission and directory management logic.
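Before any opcode is dispatched, the read loop applies two sanity checks to the raw buffer: it must be at least one header long, and the header's len field must equal the number of bytes read. The sketch below isolates that step. `InHeader` is a local mirror of the kernel's 40-byte fuse_in_header layout, while `ParseResult` and `parse_request` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Local mirror of the kernel's struct fuse_in_header (40 bytes).
struct InHeader {
    std::uint32_t len;
    std::uint32_t opcode;
    std::uint64_t unique;
    std::uint64_t nodeid;
    std::uint32_t uid;
    std::uint32_t gid;
    std::uint32_t pid;
    std::uint32_t padding;
};
static_assert(sizeof(InHeader) == 40, "unexpected header layout");

enum class ParseResult { kOk, kTooShort, kLengthMismatch };

// Copy the header out of `buf` and apply the same two checks as the read
// loop. On success `payload_len` is the size of the data part.
ParseResult parse_request(const std::uint8_t* buf, std::size_t len,
                          InHeader* hdr, std::size_t* payload_len) {
    assert(buf && hdr && payload_len);
    if (len < sizeof(InHeader)) return ParseResult::kTooShort;
    std::memcpy(hdr, buf, sizeof(InHeader));
    if (hdr->len != len) return ParseResult::kLengthMismatch;
    *payload_len = len - sizeof(InHeader);
    return ParseResult::kOk;
}
```

The sketch copies the header with memcpy rather than casting the buffer pointer, which sidesteps alignment and aliasing concerns; the daemon itself uses a reinterpret_cast on its own buffer.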

2.4. Walking through the flow with a FUSE_LOOKUP request

Start with FUSE_LOOKUP. The data part of a lookup request is the name to look up.

static int handle_lookup(struct fuse* fuse, struct fuse_handler* handler,
const struct fuse_in_header *hdr, const char* name)
{
struct node* parent_node;
char parent_path[PATH_MAX];
char child_path[PATH_MAX];
const char* actual_name;

pthread_mutex_lock(&fuse->global->lock);
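// The global lock protects the node tree, which is shared by all handler threads.
// 1. lookup_node_and_path_by_id_locked is the important call: from the nodeid in fuse_in_header it finds the parent node and fills in parent_path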
parent_node = lookup_node_and_path_by_id_locked(fuse, hdr->nodeid,
parent_path, sizeof(parent_path));
DLOG(INFO) << "[" << handler->token << "] LOOKUP " << name << " @ " << hdr->nodeid
<< " (" << (parent_node ? parent_node->name : "?") << ")";
pthread_mutex_unlock(&fuse->global->lock);
// 2. Check whether name exists under parent_path, and resolve the actual name and child_path
if (!parent_node || !(actual_name = find_file_within(parent_path, name,
child_path, sizeof(child_path), 1))) {
return -ENOENT;
}
// 3. Check whether the caller may access the node
if (!check_caller_access_to_name(fuse, hdr, parent_node, name, R_OK)) {
return -EACCES;
}
// 4. Finally write the result back to /dev/fuse
return fuse_reply_entry(fuse, hdr->unique, parent_node, name, actual_name, child_path);
}

// 1. From the nodeid in fuse_in_header, find the parent node and its parent_path.
// buf receives parent_path (the absolute path), nid is the nodeid from fuse_in_header, and the parent node is returned
static struct node* lookup_node_and_path_by_id_locked(struct fuse* fuse, __u64 nid,
char* buf, size_t bufsize)
{
struct node* node = lookup_node_by_id_locked(fuse, nid);
if (node && get_node_path_locked(node, buf, bufsize) < 0) {
node = NULL;
}
return node;
}

static struct node *lookup_node_by_id_locked(struct fuse *fuse, __u64 nid)
{
if (nid == FUSE_ROOT_ID) {
return &fuse->global->root;
} else {
// For non-root nodes the node is derived from nid: nid is the address of the node itself.
// nodeid identifies the file system node this operation refers to
return static_cast<struct node*>(id_to_ptr(nid));
}
}


// 1.2 Build the absolute path for a node.
// This function is recursive: it walks up through the parents and fills in the absolute path.
static ssize_t get_node_path_locked(struct node* node, char* buf, size_t bufsize) {
const char* name;
size_t namelen;
// graft_path is used for the OBB path
if (node->graft_path) {
name = node->graft_path;
namelen = node->graft_pathlen;
// actual_name is what find_file_within resolved (covered later); it is initially NULL
} else if (node->actual_name) {
name = node->actual_name;
namelen = node->namelen;
} else {
name = node->name;
namelen = node->namelen;
}

if (bufsize < namelen + 1) {
return -1;
}

ssize_t pathlen = 0;
if (node->parent && node->graft_path == NULL) {
// Recurse to fill in the absolute path
pathlen = get_node_path_locked(node->parent, buf, bufsize - namelen - 1);
if (pathlen < 0) {
return -1;
}
buf[pathlen++] = '/';
}

memcpy(buf + pathlen, name, namelen + 1); /* include trailing \0 */
return pathlen + namelen;
}

// 2. Check whether name exists under parent_path, and resolve the actual name and child_path (the file's absolute path).

/* Finds the absolute path of a file within a given directory.
* Performs a case-insensitive search for the file and sets the buffer to the path
* of the first matching file. If 'search' is zero or if no match is found, sets
* the buffer to the path that the file would have, assuming the name were case-sensitive.
*
* Populates 'buf' with the path and returns the actual name (within 'buf') on success,
* or returns NULL if the path is too long for the provided buffer.
*/
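// path is the absolute path of the parent directory found above; name is the node's name (for handle_lookup, the payload that follows fuse_in_header).
// buf receives the child's absolute path. If the exact path does not exist, the directory is scanned case-insensitively; on a match actual is overwritten with the on-disk name, otherwise name is returned unchanged.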
static char* find_file_within(const char* path, const char* name,
char* buf, size_t bufsize, int search)
{
size_t pathlen = strlen(path);
size_t namelen = strlen(name);
size_t childlen = pathlen + namelen + 1;
char* actual;

if (bufsize <= childlen) {
return NULL;
}
// Copy path into buf
memcpy(buf, path, pathlen);
buf[pathlen] = '/';
actual = buf + pathlen + 1;
// Now buf = path/name
memcpy(actual, name, namelen + 1);
// search is 1 and the exact-case path does not exist (access() returned non-zero), so scan the directory
if (search && access(buf, F_OK)) {
struct dirent* entry;
// Open parent_path.
DIR* dir = opendir(path);
// The directory cannot be opened: return actual, which still equals name
if (!dir) {
PLOG(ERROR) << "opendir(" << path << ") failed";
return actual;
}
while ((entry = readdir(dir))) {
// Compare each entry case-insensitively against name; the first match is copied into actual.
if (!strcasecmp(entry->d_name, name)) {
/* we have a match - replace the name, don't need to copy the null again */
memcpy(actual, entry->d_name, namelen);
break;
}
}
closedir(dir);
}
return actual;
}

// 3. Check whether the caller may access the node; the kernel has already enforced the uid restrictions
check_caller_access_to_name(fuse, hdr, parent_node, name, R_OK)

/* Kernel has already enforced everything we returned through
* derive_permissions_locked(), so this is used to lock down access
* even further, such as enforcing that apps hold sdcard_rw. */
static bool check_caller_access_to_name(struct fuse* fuse,
const struct fuse_in_header *hdr, const struct node* parent_node,
const char* name, int mode) {
/* Always block security-sensitive files at root */
if (parent_node && parent_node->perm == PERM_ROOT) {
// At the root, block access to the three names below
if (!strcasecmp(name, "autorun.inf")
|| !strcasecmp(name, ".android_secure")
|| !strcasecmp(name, "android_secure")) {
return false;
}
}

/* Root always has access; access for any other UIDs should always
* be controlled through packages.list. */
// uid 0 means the caller is root, so access is not restricted.
if (hdr->uid == 0) {
return true;
}

/* No extra permissions to enforce */
return true;
}

// 4. Finally write the result back to /dev/fuse
return fuse_reply_entry(fuse, hdr->unique, parent_node, name, actual_name, child_path);


static int fuse_reply_entry(struct fuse* fuse, __u64 unique,
struct node* parent, const char* name, const char* actual_name,
const char* path)
{
struct node* node;
struct fuse_entry_out out;
struct stat s;

if (lstat(path, &s) == -1) {
return -errno;
}

pthread_mutex_lock(&fuse->global->lock);
// 4.1 Acquire the node, or create it from parent, name and actual_name if needed
node = acquire_or_create_child_locked(fuse, parent, name, actual_name);
if (!node) {
pthread_mutex_unlock(&fuse->global->lock);
return -ENOMEM;
}
memset(&out, 0, sizeof(out));
// 4.2 Fill in permissions, gid and other attributes
attr_from_stat(fuse, &out.attr, &s, node);
out.attr_valid = 10;
out.entry_valid = 10;
out.nodeid = node->nid;
out.generation = node->gen;
pthread_mutex_unlock(&fuse->global->lock);
// 4.3 Reply to /dev/fuse with out, the result for this opcode.
fuse_reply(fuse, unique, &out, sizeof(out));
return NO_STATUS;
}

// 4.1 Acquire the node, or create it from parent, name and actual_name if needed
node = acquire_or_create_child_locked(fuse, parent, name, actual_name);

static struct node* acquire_or_create_child_locked(
struct fuse* fuse, struct node* parent,
const char* name, const char* actual_name)
{
// First check whether the parent node already has this child.
// Traversal starts at parent->child and follows the next pointers, i.e.:
// for (node = node->child; node; node = node->next)
struct node* child = lookup_child_by_name_locked(parent, name);
if (child) {
// Found: return it and increment its reference count
acquire_node_locked(child);
} else {
// 4.1.1 Not found. On the first lookup the node normally does not exist yet, so it is created. This function is important
child = create_node_locked(fuse, parent, name, actual_name);
}
return child;
}

// 4.1.1 Create the node for the file
child = create_node_locked(fuse, parent, name, actual_name);

struct node *create_node_locked(struct fuse* fuse,
struct node *parent, const char *name, const char* actual_name)
{
struct node *node;
size_t namelen = strlen(name);

// Detect overflows in the inode counter. "4 billion nodes should be enough
// for everybody".
// inode_ctr starts at 1 and is incremented on every create_node_locked call. It is a __u32, so once it passes 2^32 - 1 the increment wraps it to 0
if (fuse->global->inode_ctr == 0) {
LOG(ERROR) << "No more inode numbers available";
return NULL;
}

node = static_cast<struct node*>(calloc(1, sizeof(struct node)));
if (!node) {
return NULL;
}
node->name = static_cast<char*>(malloc(namelen + 1));
if (!node->name) {
free(node);
return NULL;
}
memcpy(node->name, name, namelen + 1);
// When name and actual_name differ, fill in the actual_name field as well
if (strcmp(name, actual_name)) {
node->actual_name = static_cast<char*>(malloc(namelen + 1));
if (!node->actual_name) {
free(node->name);
free(node);
return NULL;
}
memcpy(node->actual_name, actual_name, namelen + 1);
}
node->namelen = namelen;
// nid stores the node's pointer value
node->nid = ptr_to_id(node);
// Assign the current inode_ctr, then increment it
node->ino = fuse->global->inode_ctr++;
node->gen = fuse->global->next_generation++;

node->deleted = false;
// 4.1.1.1 Set the perm tag and assign userid, uid, etc.
derive_permissions_locked(fuse, parent, node);
// Reference count + 1
acquire_node_locked(node);
// 4.1.1.2 Link the newly created node to its parent.
add_node_to_parent_locked(node, parent);
return node;
}

// 4.1.1.1 Set the perm tag and assign userid, uid, etc. as appropriate.
// The multi-user restrictions come mainly from this function.
derive_permissions_locked(fuse, parent, node);
static void derive_permissions_locked(struct fuse* fuse, struct node *parent,
struct node *node) {
appid_t appid;

/* By default, each node inherits from its parent */
// The node inherits userid and uid from its parent.
node->perm = PERM_INHERIT;
node->userid = parent->userid;
node->uid = parent->uid;
node->under_android = parent->under_android;

/* Derive custom permissions based on parent and current node */
// Switch on the parent node's perm
switch (parent->perm) {
// Inherited: nothing more to do
case PERM_INHERIT:
/* Already inherited above */
break;
// PERM_PRE_ROOT: the parent is the top of the multi-user layout, so this node is a per-user directory.
case PERM_PRE_ROOT:
/* Legacy internal layout places users at top level */
// Mark it PERM_ROOT and set its userid from the directory name.
// For example, the emulated directory under /mnt/runtime/read contains per-user directories 0, 1, 2, ..., giving userid = 0, 1, 2, ...
node->perm = PERM_ROOT;
node->userid = strtoul(node->name, NULL, 10);
break;
case PERM_ROOT:
/* Assume masked off by default. */
if (!strcasecmp(node->name, "Android")) {
/* App-specific directories inside; let anyone traverse */
node->perm = PERM_ANDROID;
node->under_android = true;
}
break;
case PERM_ANDROID:
if (!strcasecmp(node->name, "data")) {
/* App-specific directories inside; let anyone traverse */
node->perm = PERM_ANDROID_DATA;
} else if (!strcasecmp(node->name, "obb")) {
/* App-specific directories inside; let anyone traverse */
node->perm = PERM_ANDROID_OBB;
/* Single OBB directory is always shared */
node->graft_path = fuse->global->obb_path;
node->graft_pathlen = strlen(fuse->global->obb_path);
} else if (!strcasecmp(node->name, "media")) {
/* App-specific directories inside; let anyone traverse */
node->perm = PERM_ANDROID_MEDIA;
}
break;
case PERM_ANDROID_DATA:
case PERM_ANDROID_OBB:
case PERM_ANDROID_MEDIA:
// Under emulated/<userid>/Android/data|obb|media, the uid is computed by multiuser_get_uid from userid and appid: (user_id * 100000) + (app_id % 100000)
// package_to_appid is a std::map<std::string, appid_t, CaseInsensitiveCompare>, ordered by key; it stores the <name, uid> pairs scanned from /data/system/packages.list, where name is the package name.
// e.g. for a directory such as /emulated/0/Android/data|media|obb/<package_name>, the uid is set to
// (user_id * 100000) + (app_id % 100000)
const auto& iter = fuse->global->package_to_appid->find(node->name);
if (iter != fuse->global->package_to_appid->end()) {
appid = iter->second;
node->uid = multiuser_get_uid(parent->userid, appid);
}
break;
}
}

// 4.1.1.2 Link the newly created node to its parent.
// As described above for traversal: each newly created node becomes the new parent->child.
add_node_to_parent_locked(node, parent);
static void add_node_to_parent_locked(struct node *node, struct node *parent) {
node->parent = parent;
// First point node->next at the parent's previous child
node->next = parent->child;
// Then make this node the parent's child.
parent->child = node;
acquire_node_locked(parent);
}

// 4.2 Fill in permissions, gid and other attributes, stored in out.attr
attr_from_stat(fuse, &out.attr, &s, node);
static void attr_from_stat(struct fuse* fuse, struct fuse_attr *attr,
const struct stat *s, const struct node* node) {
// Fill in the attributes; most come from s, the result of lstat(path, &s).
attr->ino = node->ino;
attr->size = s->st_size;
attr->blocks = s->st_blocks;
attr->atime = s->st_atim.tv_sec;
attr->mtime = s->st_mtim.tv_sec;
attr->ctime = s->st_ctim.tv_sec;
attr->atimensec = s->st_atim.tv_nsec;
attr->mtimensec = s->st_mtim.tv_nsec;
attr->ctimensec = s->st_ctim.tv_nsec;
attr->mode = s->st_mode;
attr->nlink = s->st_nlink;

attr->uid = node->uid;
// For the /mnt/runtime/default view the gid was set to sdcard_rw at initialization.
// The default view is not per-user; it is the storage view for root, some native processes, and apps holding WRITE_MEDIA_STORAGE.
if (fuse->gid == AID_SDCARD_RW) {
attr->gid = AID_SDCARD_RW;
} else {
// In the multi-user case the gid is derived from the userid set earlier:
// userid * 100000 + gid % 100000 (gid is the initial value AID_EVERYBODY, 9997)
// Apps receive the matching gid at process start; see startProcessLocked:
// gids[2] = UserHandle.getUserGid(UserHandle.getUserId(uid));
attr->gid = multiuser_get_uid(node->userid, fuse->gid);
}
// mask was passed in through fuse_setup: 0006 for the sdcard_rw (default) view, 0007 for the write view when full_write is set, and so on
int visible_mode = 0775 & ~fuse->mask;
if (node->perm == PERM_PRE_ROOT) {
/* Top of multi-user view should always be visible to ensure
* secondary users can traverse inside. */
visible_mode = 0711;
// Directories under emulated/<userid>/Android block access by other, while the default view keeps +x for other
} else if (node->under_android) {
/* Block "other" access to Android directories, since only apps
* belonging to a specific user should be in there; we still
* leave +x open for the default view. */
if (fuse->gid == AID_SDCARD_RW) {
visible_mode = visible_mode & ~0006;
} else {
visible_mode = visible_mode & ~0007;
}
}
// Extract owner_mode
int owner_mode = s->st_mode & 0700;
// filtered_mode is owner_mode replicated into the group and other bits, ANDed with visible_mode
int filtered_mode = visible_mode & (owner_mode | (owner_mode >> 3) | (owner_mode >> 6));
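// S_IFMT is the file-type bit mask (the mode occupies the low 16 bits on Linux, see Figure 1); the mode is written back with filtered_mode as its permission bits.
// Example with mask 0006: visible_mode starts as 0775 & ~0006 = 0771. owner_mode comes from stat; for a file with mode 0644 it is 0600, so (owner_mode | (owner_mode >> 3) | (owner_mode >> 6)) is 0666 and filtered_mode is 0771 & 0666 = 0660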
attr->mode = (attr->mode & S_IFMT) | filtered_mode;
}

// 4.3 Reply to /dev/fuse with out, the result for this opcode.
fuse_reply(fuse, unique, &out, sizeof(out));
out.attr_valid = 10;
out.entry_valid = 10;
out.nodeid = node->nid;
out.generation = node->gen;

static void fuse_reply(struct fuse *fuse, __u64 unique, void *data, int len)
{
struct fuse_out_header hdr;
hdr.len = len + sizeof(hdr);
hdr.error = 0;
hdr.unique = unique;

struct iovec vec[2];
// The fuse_out_header goes into vec[0]
vec[0].iov_base = &hdr;
vec[0].iov_len = sizeof(hdr);
// data, the out structure passed in above, goes into vec[1]
vec[1].iov_base = data;
vec[1].iov_len = len;
// Written back to the /dev/fuse device
ssize_t ret = TEMP_FAILURE_RETRY(writev(fuse->fd, vec, 2));
}

Figure 1
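The mode arithmetic in attr_from_stat is easy to get wrong by hand, so here is a small worked sketch. `filtered_mode` is a hypothetical helper that covers only the plain case: it applies `0775 & ~mask` and the owner-bit replication, and it ignores the PERM_PRE_ROOT and under_android branches.

```cpp
#include <cstdint>

// Replicate the owner bits of `st_mode` into group and other, then clip the
// result by the view's visible mode (0775 & ~mask). Plain case only.
constexpr std::uint32_t filtered_mode(std::uint32_t st_mode, std::uint32_t mask) {
    return (0775u & ~mask) &
           ((st_mode & 0700u) | ((st_mode & 0700u) >> 3) | ((st_mode & 0700u) >> 6));
}
```

For a file stored as 0644 and a view mask of 0006, the owner bits 0600 expand to 0666 and are clipped by 0771, giving 0660. The group and other bits of the on-disk mode never influence the result; only the owner bits and the view mask do.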

2.4.1. handle_lookup summary

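To summarize the handle_lookup flow above:

  1. From the nodeid in fuse_in_header, find the parent node and its parent_path.
    buf receives parent_path (the absolute path), nid is the nodeid from fuse_in_header, and the parent node is returned.

  2. Check whether name (carried in the data part of the lookup request) exists under parent_path, and resolve the actual name and child_path, the file's absolute path.

  3. Check whether the caller may access the node; the kernel has already enforced the uid restrictions.

  4. Finally, write the result back to /dev/fuse:

    4.1 Acquire the node, or create it from parent, name and actual_name if needed.

    4.2 Fill in mode, gid and other stat information, stored in out.attr.

    4.3 Reply to /dev/fuse with out, the result for this opcode.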
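Step 2 is the part that makes the emulated storage behave case-insensitively. The sketch below reproduces that resolution rule against an in-memory listing instead of a real directory, so it can be run anywhere. `resolve_actual_name` is a hypothetical name; the real find_file_within works on paths and uses access() and readdir().

```cpp
#include <cassert>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)
#include <vector>

// Resolve `name` against the entries of a directory listing. An exact match
// wins; otherwise the first case-insensitive match is returned; if nothing
// matches, `name` comes back unchanged.
std::string resolve_actual_name(const std::vector<std::string>& entries,
                                const std::string& name) {
    assert(!name.empty());
    for (const std::string& entry : entries) {
        if (entry == name) return name;
    }
    for (const std::string& entry : entries) {
        if (strcasecmp(entry.c_str(), name.c_str()) == 0) return entry;
    }
    return name;
}
```

Returning the requested name unchanged when nothing matches mirrors the original behaviour: the caller then fails at the later lstat on the child path rather than inside the name resolution.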

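Several key functions appear in this example:

  • lookup_node_and_path_by_id_locked

    From the nodeid in fuse_in_header, finds the parent node and its parent_path (the absolute path).
    nid is the nodeid from fuse_in_header; the parent node is returned.

  • find_file_within

    Checks whether name exists in the parent directory and resolves actual_name with a case-insensitive search. It also produces the absolute path for name.

  • fuse_reply_entry

    Writes back the answer to the request. It finds the node for the requested name or creates it when missing, bumps the reference count, links it into the parent and next lists, fills in the node's information (including permission handling), and finally writes to the /dev/fuse device.

    • acquire_or_create_child_locked: finds the node, or creates it and links it to parent and next
    • derive_permissions_locked: sets the perm tag and fills in uid, userid, etc.
    • attr_from_stat: sets the gid and the access mode, which plays the same role as st_mode
    • fuse_reply: fills in fuse_out_header and data from the earlier results and writes them to /dev/fuse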
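The perm tagging in derive_permissions_locked is effectively a small state machine driven by the parent's tag and the child's name. The sketch below captures only that transition plus the uid formula quoted earlier. `Perm`, `derive_perm` and `multiuser_uid` are hypothetical names; the userid parsing, OBB grafting and package lookup of the real function are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

enum class Perm { kInherit, kPreRoot, kRoot, kAndroid, kAndroidData, kAndroidObb, kAndroidMedia };

// Perm tag of a child named `name` whose parent carries the tag `parent`.
Perm derive_perm(Perm parent, const std::string& name) {
    auto is = [&name](const char* s) { return strcasecmp(name.c_str(), s) == 0; };
    switch (parent) {
        case Perm::kPreRoot:
            return Perm::kRoot;  // per-user directory such as "0" or "10"
        case Perm::kRoot:
            return is("Android") ? Perm::kAndroid : Perm::kInherit;
        case Perm::kAndroid:
            if (is("data")) return Perm::kAndroidData;
            if (is("obb")) return Perm::kAndroidObb;
            if (is("media")) return Perm::kAndroidMedia;
            return Perm::kInherit;
        default:
            return Perm::kInherit;  // deeper levels simply inherit
    }
}

// uid of an app within a user: (user_id * 100000) + (app_id % 100000).
constexpr std::uint32_t multiuser_uid(std::uint32_t user_id, std::uint32_t app_id) {
    return user_id * 100000u + app_id % 100000u;
}
```

With these definitions, an app with appid 10057 running as user 10 gets uid 1010057, and the same formula applied to AID_EVERYBODY (9997) for user 0 yields 9997.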

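3. FUSE interaction and data transfer

Consider a single write system call on a file whose real file system is ext4, with FUSE in between for permission control.
One system call crosses between user space and kernel space 6 times. If the call involves intermediate messages, for example a stat query before the write, it takes 10 crossings.
The FUSE path is therefore considerably slower than direct access.

Newer versions of the sdcard service replace the FUSE file system with sdcardfs. sdcardfs has no user-space daemon, so it interacts with the VFS directly, and one system call needs only 2 crossings between user space and kernel space.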

@startuml
autoactivate on
box "User space" #LightGreen
participant app #yellow
participant sdcard #pink
end box

box "kernel space" #LightBlue
participant VFS
participant FUSE #dark
participant EXT4 #dark
end box


app->VFS: write file
VFS->FUSE: write
FUSE->>sdcard: handle lookup
sdcard->VFS: stat
VFS -> EXT4: stat
EXT4-->VFS: return stat result
VFS-->sdcard: return stat result
sdcard ->o FUSE: reply lookup result

FUSE->>sdcard: handle write
sdcard->VFS: write
VFS->EXT4: write
EXT4-->VFS:return write status
VFS-->sdcard: return write status
sdcard->o FUSE: reply write status
FUSE-->VFS: return write
VFS-->app: return write
@enduml

[Figure: fuse]