Investigating Dynamic Partitions on Android Q

1. Investigating Dynamic Partitions on Android Q

1.1. Related documents

[Android Bootcamp 2019 - Dynamic Partitions in Q (go_android-dynamic-partitions-slides).pdf](../../../google_document/Android Bootcamp 2019 - Dynamic Partitions in Q (go_android-dynamic-partitions-slides).pdf)

06.Dynamic_Partitions-_LPC_Android_MC_v2.pdf

1.2. Code paths

bootable/recovery/updater/ dynamic_partitions.cpp

system/core/fs_mgr/ liblp

system/core/fastboot/device fastbootd

build/core/Makefile

1.3. Build configuration macros

  • PRODUCT_USE_DYNAMIC_PARTITIONS := true
  • BOARD_SUPER_PARTITION_SIZE :=
  • BOARD_SUPER_PARTITION_GROUPS := group_oem
  • BOARD_GROUP_OEM_SIZE :=
  • BOARD_GROUP_OEM_PARTITION_LIST := system vendor odm product
  • BOARD_SUPER_PARTITION_$(device)_DEVICE_SIZE

BOARD_SUPER_PARTITION_GROUPS defines a list of "updatable groups". Each updatable group is a group of partitions that share the same pool of free space. For each group in BOARD_SUPER_PARTITION_GROUPS, a BOARD_{GROUP}_SIZE and a BOARD_{GROUP}_PARTITION_LIST may be defined.

  • BOARD_{GROUP}_SIZE: The maximum sum of sizes of all partitions in the group. Must not be empty.

  • BOARD_{GROUP}_PARTITION_LIST: The list of partitions that belong to this group. If empty, no partitions belong to this group, and the sum of sizes is effectively 0.

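1.4. Flashing

For the image specific to dynamic partitions, super.img can be flashed directly with fastboot. super.img combines the images of all partitions configured as dynamic.

Alternatively, reboot into the userspace fastbootd and flash individual dynamic partitions: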

  • fastboot reboot fastboot

1.4.1. fastbootd commands

| Command | Description | Available when device is OEM locked |
| --- | --- | --- |
| getvar is-userspace | Return yes | Yes |
| getvar is-logical:<partition> | Return yes if the given partition is a logical partition, no otherwise | Yes |
| getvar super-partition-name | MUST return super for a device launching with logical partitions | Yes |
| create-logical-partition <partition> <size> | Create a logical partition with the given name and size | No |
| delete-logical-partition <partition> | Delete the given logical partition | No |
| resize-logical-partition <partition> <size> | Resize the logical partition to the new size without changing its contents | No |
| update-super <partition> | Similar to flash super, except rather than flashing raw data to the super partition, this will ensure that all partitions within the downloaded image are created | No |
| getvar max-download-size | Return the maximum size of an image that can be downloaded in bytes in hex | Yes |
| getvar partition-type <partition> | Return file system type: ext4, f2fs, raw | Yes |
| flash <partition> [ <filename> ] | Flash the partition through a series of download and flash fastboot protocol commands | No |
| reboot bootloader | Reboot into bootloader mode | Yes |
| reboot fastboot | Reboot back into fastbootd mode | Yes |

1.5. Layout of the super partition

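1.5.1. Overview

  • All dynamically sized partitions are packed into a single fixed-size super image, which is ultimately flashed to the physical super partition
  • The super image holds the contents of these resizable sub-partitions plus some descriptive information about them
  • In the actual physical partition table, the resizable partitions are replaced by super
  • Partitions that are not read-only cannot be placed in super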

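1.5.2. Implementation

  • The super image reserves 1 MiB for the dynamic partition table:
    • dynamic partition names plus block indices
    • a backup of this descriptive information (metadata)
    • on A/B systems, separate metadata for slot A and slot B
  • The metadata is managed exclusively by liblp
  • liblp is integrated into init, update_engine, the OTA updater and fastbootd
  • liblp lives in user space; the kernel and the bootloader have no access to it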

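1.5.3. Metadata configuration on A/B and non-A/B systems

  • A/B systems keep one set of metadata per slot
    • the running system reads the metadata of the other, non-running slot
    • this guarantees a rollback to the old version if upgrading to the target version fails
  • Non-A/B systems keep one backup copy of the metadata, which protects against metadata corruption caused by an unexpected power loss during a metadata update.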

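1.6. Boot changes

  • The boot image includes a ramdisk again
  • The boot ramdisk contains the first-stage init and the vendor fstab
    • During first stage, init uses liblp to parse the metadata in the super partition and creates a dm device for each resizable sub-partition contained in super
    • Of these sub-partitions, the ones flagged first_stage_mount in fstab are mounted first
  • Once system is mounted, the boot ramdisk is discarded and control passes to the second-stage init (the ramdisk of system_root)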

1.7. Tracing the code that resizes a dynamic partition

Taking a partition size increase as the example, the call path is traced below:

update_script

resize system

PerformOpResize

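A builder must be constructed first. The partition is then looked up in the builder, and its size is changed through the builder's ResizePartition interface.

// 1. Look up the partition in the builder.
auto partition = params.builder->FindPartition(partition_name);
// 2. The corresponding dm device must be destroyed before the partition is resized.
UnmapPartitionOnDeviceMapper(partition_name);
// 3. Resize the partition.
params.builder->ResizePartition(partition, size.value());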

1.7.1. Constructing the builder (MetadataBuilder)

// Block device node that backs super
auto super_device = GetSuperDevice();
auto builder = MetadataBuilder::New(PartitionOpener(), super_device, 0);

Before a MetadataBuilder instance can be created, the metadata in super (which describes how the sub-partitions are laid out inside super) has to be read:

std::unique_ptr<LpMetadata> metadata = ReadMetadata(opener, super_partition, slot_number);

1.7.1.1. Structures involved in parsing the metadata

// Geometry: how the metadata is configured inside super
typedef struct LpMetadataGeometry {
/* 0: Magic signature (LP_METADATA_GEOMETRY_MAGIC). */
uint32_t magic;
/* 4: Size of the LpMetadataGeometry struct. */
uint32_t struct_size;
/* 8: SHA256 checksum of this struct, with this field set to 0. */
uint8_t checksum[32];
/* 40: Maximum amount of space a single copy of the metadata can use. This
* must be a multiple of LP_SECTOR_SIZE.
*/
uint32_t metadata_max_size;
/* 44: Number of copies of the metadata to keep. For A/B devices, this
* will be 2. For an A/B/C device, it would be 3, et cetera. For Non-A/B
* it will be 1. A backup copy of each slot is kept, so if this is "2",
* there will be four copies total.
*/
uint32_t metadata_slot_count;
/* 48: Logical block size. This is the minimal alignment for partition and
* extent sizes, and it must be a multiple of LP_SECTOR_SIZE. Note that
* this must be equal across all LUNs that comprise the super partition,
* and thus this field is stored in the geometry, not per-device.
*/
uint32_t logical_block_size;
} __attribute__((packed)) LpMetadataGeometry;

/* The logical partition metadata has a number of tables; they are described
* in the header via the following structure
*/
// Descriptor used in LpMetadataHeader for the partition/extent/group tables; its offset locates the data segment from which the corresponding structs are filled.
typedef struct LpMetadataTableDescriptor {
/* 0: Location of the table, relative to end of the metadata header. */
uint32_t offset;
// entry_size * num_entries = size of the table
/* 4: Number of entries in the table. */
uint32_t num_entries;
/* 8: Size of each entry in the table, in bytes. */
uint32_t entry_size;
} __attribute__((packed)) LpMetadataTableDescriptor;

// Header of the metadata stored on the physical partition
typedef struct LpMetadataHeader {
/* 0: Four bytes equal to LP_METADATA_HEADER_MAGIC. */
uint32_t magic;

/* 4: Version number required to read this metadata. If the version is not
* equal to the library version, the metadata should be considered
* incompatible.
*/
uint16_t major_version;

/* 6: Minor version. A library supporting newer features should be able to
* read metadata with an older minor version. However, an older library
* should not support reading metadata if its minor version is higher.
*/
uint16_t minor_version;

/* 8: The size of this header struct. */
uint32_t header_size;

/* 12: SHA256 checksum of the header, up to |header_size| bytes, computed as
* if this field were set to 0.
*/
uint8_t header_checksum[32];

/* 44: The total size of all tables. This size is contiguous; tables may not
* have gaps in between, and they immediately follow the header.
*/
uint32_t tables_size;

/* 48: SHA256 checksum of all table contents. */
uint8_t tables_checksum[32];

/* 80: Partition table descriptor. */
// Descriptor of the partitions table
LpMetadataTableDescriptor partitions;
/* 92: Extent table descriptor. */
// Descriptor of the extents table (the block ranges that make up the partitions)
LpMetadataTableDescriptor extents;
/* 104: Updateable group descriptor. */
// Descriptor of the partition groups table
LpMetadataTableDescriptor groups;
/* 116: Block device table. */
// Descriptor of the block_devices table. The first entry must be super; if super comprises system/vendor and so on, the second entry is system
LpMetadataTableDescriptor block_devices;
} __attribute__((packed)) LpMetadataHeader;

// Description of each dynamic partition
typedef struct LpMetadataPartition {
char name[36];
/* 36: Attributes for the partition (see LP_PARTITION_ATTR_* flags above). */
uint32_t attributes;
/* 40: Index of the first extent owned by this partition. The extent will
* start at logical sector 0. Gaps between extents are not allowed.
*/
uint32_t first_extent_index;
/* 44: Number of extents in the partition. Every partition must have at
* least one extent.
*/
uint32_t num_extents;
/* 48: Group this partition belongs to. */
uint32_t group_index;
} __attribute__((packed)) LpMetadataPartition;

/* This struct defines an extent entry in the extent table block. */
// Description of each extent (block range) of a dynamic partition
typedef struct LpMetadataExtent {
/* 0: Length of this extent, in 512-byte sectors. */
uint64_t num_sectors;
/* 8: Target type for device-mapper (see LP_TARGET_TYPE_* values). */
uint32_t target_type;

/* 12: Contents depends on target_type.
*
* LINEAR: The sector on the physical partition that this extent maps onto.
* ZERO: This field must be 0.
*/
uint64_t target_data;

/* 20: Contents depends on target_type.
*
* LINEAR: Must be an index into the block devices table.
* ZERO: This field must be 0.
*/
uint32_t target_source;
} __attribute__((packed)) LpMetadataExtent;

typedef struct LpMetadataPartitionGroup {
/* 0: Name of this group. Any unused characters must be 0. */
char name[36];

/* 36: Flags (see LP_GROUP_*). */
uint32_t flags;

/* 40: Maximum size in bytes. If 0, the group has no maximum size. */
uint64_t maximum_size;
} __attribute__((packed)) LpMetadataPartitionGroup;

/* This struct defines an entry in the block_devices table. There must be at
* least one device, and the first device must represent the partition holding
* the super metadata.
*/
typedef struct LpMetadataBlockDevice {
/* 0: First usable sector for allocating logical partitions. this will be
* the first sector after the initial geometry blocks, followed by the
* space consumed by metadata_max_size*metadata_slot_count*2.
*/
uint64_t first_logical_sector;

/* 8: Alignment for defining partitions or partition extents. For example,
* an alignment of 1MiB will require that all partitions have a size evenly
* divisible by 1MiB, and that the smallest unit the partition can grow by
* is 1MiB.
*
* Alignment is normally determined at runtime when growing or adding
* partitions. If for some reason the alignment cannot be determined, then
* this predefined alignment in the geometry is used instead. By default
* it is set to 1MiB.
*/
uint32_t alignment;

/* 12: Alignment offset for "stacked" devices. For example, if the "super"
* partition itself is not aligned within the parent block device's
* partition table, then we adjust for this in deciding where to place
* |first_logical_sector|.
*
* Similar to |alignment|, this will be derived from the operating system.
* If it cannot be determined, it is assumed to be 0.
*/
uint32_t alignment_offset;

/* 16: Block device size, as specified when the metadata was created. This
* can be used to verify the geometry against a target device.
*/
uint64_t size;

/* 24: Partition name in the GPT. Any unused characters must be 0. */
char partition_name[36];

/* 60: Flags (see LP_BLOCK_DEVICE_* flags below). */
uint32_t flags;
} __attribute__((packed)) LpMetadataBlockDevice;

// In-memory metadata structure
struct LpMetadata {
LpMetadataGeometry geometry;
LpMetadataHeader header;
std::vector<LpMetadataPartition> partitions;
std::vector<LpMetadataExtent> extents;
std::vector<LpMetadataPartitionGroup> groups;
std::vector<LpMetadataBlockDevice> block_devices;
};
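
The flattened tables are tied together by index: a partition owns the num_extents entries of the extent table that start at first_extent_index. The following is a minimal sketch of that lookup, assuming 512-byte sectors as the struct comments state. Extent, Partition and PartitionSizeBytes are hypothetical, trimmed-down stand-ins written for this note, not liblp API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for LpMetadataExtent / LpMetadataPartition,
// reduced to the fields this sketch needs.
struct Extent {
    uint64_t num_sectors;  // length in 512-byte sectors
};
struct Partition {
    uint32_t first_extent_index;
    uint32_t num_extents;
};

constexpr uint64_t kSectorSize = 512;

// Sum the extents owned by a partition to get its size in bytes.
// Returns 0 if the extent range falls outside the table.
uint64_t PartitionSizeBytes(const Partition& p, const std::vector<Extent>& extents) {
    if (uint64_t{p.first_extent_index} + p.num_extents > extents.size()) {
        return 0;
    }
    uint64_t sectors = 0;
    for (uint32_t i = 0; i < p.num_extents; i++) {
        sectors += extents[p.first_extent_index + i].num_sectors;
    }
    return sectors * kSectorSize;
}
```

Because extents of one partition are stored back to back, no per-extent owner field is needed; the partition entry alone identifies its slice of the table.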

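1.7.1.2. Reading the metadata configuration

Taking a non-A/B system as the example, the slot information stored in the geometry (metadata_slot_count) is 1.

Skip the first 4 KiB of super, read the second 4 KiB (the primary geometry) and use it to fill the LpMetadataGeometry struct.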

// Read and validate geometry information from a block device that holds
// logical partitions. If the information is corrupted, this will attempt
// to read it from a secondary backup location.
bool ReadLogicalPartitionGeometry(int fd, LpMetadataGeometry* geometry) {
    if (ReadPrimaryGeometry(fd, geometry)) {
        return true;
    }
    return ReadBackupGeometry(fd, geometry);
}

bool ReadPrimaryGeometry(int fd, LpMetadataGeometry* geometry) {
    std::unique_ptr<uint8_t[]> buffer = std::make_unique<uint8_t[]>(LP_METADATA_GEOMETRY_SIZE);
    if (SeekFile64(fd, GetPrimaryGeometryOffset(), SEEK_SET) < 0) {
        return false;
    }
    if (!android::base::ReadFully(fd, buffer.get(), LP_METADATA_GEOMETRY_SIZE)) {
        return false;
    }
    return ParseGeometry(buffer.get(), geometry);
}

// ParseGeometry() copies the raw buffer into the struct:
// memcpy(geometry, buffer, sizeof(*geometry));

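The third 4 KiB holds the backup geometry (backup_geometry), which is read when the primary copy cannot be used.

Starting at the fourth 4 KiB, read as many bytes as the metadata size configured in the geometry and fill the LpMetadata struct. This is the primary metadata (primary_metadata).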
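
The offsets described above can be sketched as plain arithmetic. This is only an illustration under stated assumptions: the two 4096-byte constants come from the "4 KiB" figures in the text, the Geometry struct is a hypothetical subset of LpMetadataGeometry, and the placement of the backup metadata after all primary copies is an assumption based on the first_logical_sector comment (metadata_max_size * metadata_slot_count * 2).

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants, matching the 4 KiB figures quoted in the text.
constexpr int64_t kReservedBytes = 4096;  // skipped area at the start of super
constexpr int64_t kGeometrySize = 4096;   // size of one geometry block

// Hypothetical subset of LpMetadataGeometry.
struct Geometry {
    uint32_t metadata_max_size;
    uint32_t metadata_slot_count;
};

int64_t PrimaryGeometryOffset() { return kReservedBytes; }
int64_t BackupGeometryOffset() { return kReservedBytes + kGeometrySize; }

// Primary metadata copies start right after the two geometry blocks, one per slot.
int64_t PrimaryMetadataOffset(const Geometry& g, uint32_t slot) {
    return kReservedBytes + 2 * kGeometrySize + int64_t{g.metadata_max_size} * slot;
}

// Assumption: the backup copies follow all primary copies, again one per slot.
int64_t BackupMetadataOffset(const Geometry& g, uint32_t slot) {
    int64_t start = kReservedBytes + 2 * kGeometrySize +
                    int64_t{g.metadata_max_size} * g.metadata_slot_count;
    return start + int64_t{g.metadata_max_size} * slot;
}
```

For a non-A/B device with a 64 KiB metadata_max_size, this places the primary metadata at byte 12288 and the backup at byte 77824.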

// Parse and validate all metadata at the current position in the given file
// descriptor.
static std::unique_ptr<LpMetadata> ParseMetadata(const LpMetadataGeometry& geometry,
Reader* reader) {
// Fill LpMetadataHeader, which carries the metadata version and so on
std::unique_ptr<LpMetadata> metadata = std::make_unique<LpMetadata>();
if (!reader->ReadFully(&metadata->header, sizeof(metadata->header))) {
return nullptr;
}
if (!ValidateMetadataHeader(metadata->header)) {
return nullptr;
}
// Fill LpMetadataGeometry
metadata->geometry = geometry;
LpMetadataHeader& header = metadata->header;

// Read the metadata payload. Allocation is fallible in case the metadata is
// corrupt and has some huge value.
// Read the rest of LpMetadata after LpMetadataHeader, compute its checksum and compare it with the checksum in the header; the two must match
std::unique_ptr<uint8_t[]> buffer(new (std::nothrow) uint8_t[header.tables_size]);
if (!reader->ReadFully(buffer.get(), header.tables_size)) {
return nullptr;
}
uint8_t checksum[32];
SHA256(buffer.get(), header.tables_size, checksum);
if (memcmp(checksum, header.tables_checksum, sizeof(checksum)) != 0) {
LERROR << "Logical partition metadata has invalid table checksum.";
return nullptr;
}

// ValidateTableSize ensured that |cursor| is valid for the number of
// entries in the table.
// The header records the partition table: its offset, the number of partitions and the size of each partition entry
uint8_t* cursor = buffer.get() + header.partitions.offset;
for (size_t i = 0; i < header.partitions.num_entries; i++) {
LpMetadataPartition partition;
memcpy(&partition, cursor, sizeof(partition));
// There are num_entries partitions; walk the partition table of the metadata, with cursor pointing at the start of each entry
cursor += header.partitions.entry_size;

// Validate the partition
if (partition.attributes & ~LP_PARTITION_ATTRIBUTE_MASK) {
LERROR << "Logical partition has invalid attribute set.";
return nullptr;
}
...
// Append to the partitions vector of LpMetadata
metadata->partitions.push_back(partition);
}

cursor = buffer.get() + header.extents.offset;
for (size_t i = 0; i < header.extents.num_entries; i++) {
LpMetadataExtent extent;
memcpy(&extent, cursor, sizeof(extent));
cursor += header.extents.entry_size;

if (extent.target_type == LP_TARGET_TYPE_LINEAR &&
extent.target_source >= header.block_devices.num_entries) {
LERROR << "Logical partition extent has invalid block device.";
return nullptr;
}

metadata->extents.push_back(extent);
}

cursor = buffer.get() + header.groups.offset;
for (size_t i = 0; i < header.groups.num_entries; i++) {
LpMetadataPartitionGroup group = {};
memcpy(&group, cursor, sizeof(group));
cursor += header.groups.entry_size;

metadata->groups.push_back(group);
}

cursor = buffer.get() + header.block_devices.offset;
for (size_t i = 0; i < header.block_devices.num_entries; i++) {
LpMetadataBlockDevice device = {};
memcpy(&device, cursor, sizeof(device));
cursor += header.block_devices.entry_size;

metadata->block_devices.push_back(device);
}

const LpMetadataBlockDevice* super_device = GetMetadataSuperBlockDevice(*metadata.get());
if (!super_device) {
LERROR << "Metadata does not specify a super device.";
return nullptr;
}
return metadata;
}

1.7.1.3. Initializing the builder

std::unique_ptr<MetadataBuilder> MetadataBuilder::New(const LpMetadata& metadata,
const IPartitionOpener* opener) {
// 1. Builder constructor
std::unique_ptr<MetadataBuilder> builder(new MetadataBuilder());
// 2. Builder Init function
if (!builder->Init(metadata)) {
return nullptr;
}
...
return builder;
}

MetadataBuilder::MetadataBuilder() : auto_slot_suffixing_(false), ignore_slot_suffixing_(false) {
// Initialize geometry_ and header_
memset(&geometry_, 0, sizeof(geometry_));
geometry_.magic = LP_METADATA_GEOMETRY_MAGIC;
geometry_.struct_size = sizeof(geometry_);

memset(&header_, 0, sizeof(header_));
header_.magic = LP_METADATA_HEADER_MAGIC;
header_.major_version = LP_METADATA_MAJOR_VERSION;
header_.minor_version = LP_METADATA_MINOR_VERSION;
header_.header_size = sizeof(header_);
header_.partitions.entry_size = sizeof(LpMetadataPartition);
header_.extents.entry_size = sizeof(LpMetadataExtent);
header_.groups.entry_size = sizeof(LpMetadataPartitionGroup);
header_.block_devices.entry_size = sizeof(LpMetadataBlockDevice);
}
// Initialize the builder's member variables from the metadata parsed earlier.
// Runs AddGroup/AddPartition/ImportExtents to initialize the block_devices_, groups_, partitions_ and extents_ members
bool MetadataBuilder::Init(const LpMetadata& metadata) {
geometry_ = metadata.geometry;
block_devices_ = metadata.block_devices;

for (const auto& group : metadata.groups) {
std::string group_name = GetPartitionGroupName(group);
if (!AddGroup(group_name, group.maximum_size)) {
return false;
}
}

for (const auto& partition : metadata.partitions) {
std::string group_name = GetPartitionGroupName(metadata.groups[partition.group_index]);
Partition* builder =
AddPartition(GetPartitionName(partition), group_name, partition.attributes);
if (!builder) {
return false;
}
// For a given partition, this checks whether extents are adjacent and merges them if they are
ImportExtents(builder, metadata, partition);
}
return true;
}

1.7.2. builder->ResizePartition

builder->ResizePartition(partition, size.value());

bool MetadataBuilder::ResizePartition(Partition* partition, uint64_t requested_size) {
    // Align the space needed up to the nearest sector.
    uint64_t aligned_size = AlignTo(requested_size, geometry_.logical_block_size);
    uint64_t old_size = partition->size();
    // 1. Check whether the new size exceeds the range the partition may be resized to
    if (!ValidatePartitionSizeChange(partition, old_size, aligned_size, false)) {
        return false;
    }
    // 2. The partition grows
    if (aligned_size > old_size) {
        if (!GrowPartition(partition, aligned_size)) {
            return false;
        }
    // The partition shrinks
    } else if (aligned_size < partition->size()) {
        ShrinkPartition(partition, aligned_size);
    }
    return true;
}
  1. How is a new partition size judged to be valid?

bool MetadataBuilder::ValidatePartitionSizeChange(Partition* partition, uint64_t old_size,
                                                  uint64_t new_size, bool force_check) {
    // Find the group the partition belongs to
    PartitionGroup* group = FindGroup(partition->group_name());
    CHECK(group);

    if (!force_check && new_size <= old_size) {
        return true;
    }

    // Figure out how much we need to allocate, and whether our group has
    // enough space remaining.
    uint64_t space_needed = new_size - old_size;
    if (group->maximum_size() > 0) {
        // Space already used by the group
        uint64_t group_size = TotalSizeOfGroup(group);
        // Compare against the maximum size of the group
        if (group_size >= group->maximum_size() ||
            group->maximum_size() - group_size < space_needed) {
            LERROR << "Partition " << partition->name() << " is part of group " << group->name()
                   << " which does not have enough space free (" << space_needed << " requested, "
                   << group_size << " used out of " << group->maximum_size() << ")";
            return false;
        }
    }
    return true;
}

As the code shows, the size a partition can be resized to is bounded by the maximum size of the group it belongs to.
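
The group check can be reduced to a few lines of integer arithmetic. The sketch below restates it in isolation; CanResize is a hypothetical helper written for this note, and a group_max of 0 is taken to mean "no limit", as the maximum_size comment in LpMetadataPartitionGroup states.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a partition may change from old_size to new_size.
// group_used is the current total size of all partitions in the group.
// group_max == 0 means the group has no maximum size.
bool CanResize(uint64_t old_size, uint64_t new_size, uint64_t group_used, uint64_t group_max) {
    if (new_size <= old_size) {
        return true;  // shrinking (or no change) always fits
    }
    if (group_max == 0) {
        return true;
    }
    uint64_t needed = new_size - old_size;
    if (group_used >= group_max) {
        return false;  // group is already full
    }
    // Compare by subtraction so that group_used + needed cannot overflow.
    return group_max - group_used >= needed;
}
```

Comparing `group_max - group_used` against the requested growth, rather than adding the growth to the used size, keeps the check safe for very large requested sizes.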

1.7.2.1. Growing a partition

Find free extents and add them to the extents_ of the partition being resized.

bool MetadataBuilder::GrowPartition(Partition* partition, uint64_t aligned_size) {
uint64_t space_needed = aligned_size - partition->size();
uint64_t sectors_needed = space_needed / LP_SECTOR_SIZE;
// Free regions are the valid gaps between neighbouring existing extents:
// The new interval represents the free space starting at the end of
// the previous interval, and ending at the start of the next interval.
// free_regions->emplace_back(current.device_index, aligned, current.start);
std::vector<Interval> free_regions = GetFreeRegions();

const uint64_t sectors_per_block = geometry_.logical_block_size / LP_SECTOR_SIZE;
std::vector<std::unique_ptr<LinearExtent>> new_extents;

// If the last extent in the partition has a size < alignment, then the
// difference is unallocatable due to being misaligned. We peek for that
// case here to avoid wasting space.
if (auto extent = ExtendFinalExtent(partition, free_regions, sectors_needed)) {
sectors_needed -= extent->num_sectors();
new_extents.emplace_back(std::move(extent));
}
// Look in free_regions for free extents to give to the partition being resized
for (auto& region : free_regions) {
// Stop once enough has been allocated
if (!sectors_needed) {
break;
}
...
uint64_t sectors = std::min(sectors_needed, region.length());
auto extent = std::make_unique<LinearExtent>(sectors, region.device_index, region.start);
new_extents.push_back(std::move(extent));
sectors_needed -= sectors;
}
// Everything succeeded, so commit the new extents.
// Add the new extents to the extents of the partition
for (auto& extent : new_extents) {
partition->AddExtent(std::move(extent));
}
return true;
}
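
The two ideas in GrowPartition, "free space is the gaps between allocated extents" and "take from the free regions until the request is met", can be sketched independently of liblp. Interval, FreeRegions and Allocate below are hypothetical names for this note; the sketch works on a single device and deliberately ignores alignment, which the real code handles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Interval {  // half-open sector range [start, end)
    uint64_t start;
    uint64_t end;
};

// Free space = gaps between allocated extents inside [first_sector, end_sector).
std::vector<Interval> FreeRegions(std::vector<Interval> used, uint64_t first_sector,
                                  uint64_t end_sector) {
    std::sort(used.begin(), used.end(),
              [](const Interval& a, const Interval& b) { return a.start < b.start; });
    std::vector<Interval> free_regions;
    uint64_t cursor = first_sector;
    for (const Interval& u : used) {
        if (u.start > cursor) {
            free_regions.push_back({cursor, u.start});
        }
        cursor = std::max(cursor, u.end);
    }
    if (cursor < end_sector) {
        free_regions.push_back({cursor, end_sector});
    }
    return free_regions;
}

// First-fit: take sectors from the free regions until the request is met.
// Returns the new extents, or an empty vector if there is not enough space.
std::vector<Interval> Allocate(const std::vector<Interval>& free_regions,
                               uint64_t sectors_needed) {
    std::vector<Interval> out;
    for (const Interval& r : free_regions) {
        if (sectors_needed == 0) {
            break;
        }
        uint64_t take = std::min(sectors_needed, r.end - r.start);
        out.push_back({r.start, r.start + take});
        sectors_needed -= take;
    }
    if (sectors_needed != 0) {
        out.clear();  // nothing is committed unless the whole request fits
    }
    return out;
}
```

A request larger than the first gap is split across several regions, which is how a resized partition ends up with more than one extent.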

1.7.3. Exporting the new metadata and writing it to disk

After a partition has changed, a complete copy of the updated metadata has to be generated in memory.

auto metadata = builder->Export();
std::unique_ptr<LpMetadata> MetadataBuilder::Export() {
// Verify that the total size of all partitions in each group stays within the group's maximum size
if (!ValidatePartitionGroups()) {
return nullptr;
}
// Build a new metadata object
std::unique_ptr<LpMetadata> metadata = std::make_unique<LpMetadata>();
metadata->header = header_;
metadata->geometry = geometry_;

// Assign this early so the extent table can read it.
for (const auto& block_device : block_devices_) {
metadata->block_devices.emplace_back(block_device);
if (auto_slot_suffixing_) {
metadata->block_devices.back().flags |= LP_BLOCK_DEVICE_SLOT_SUFFIXED;
}
}

std::map<std::string, size_t> group_indices;
for (const auto& group : groups_) {
LpMetadataPartitionGroup out = {};
strncpy(out.name, group->name().c_str(), sizeof(out.name));
out.maximum_size = group->maximum_size();

group_indices[group->name()] = metadata->groups.size();
metadata->groups.push_back(out);
}

// Flatten the partition and extent structures into an LpMetadata, which
// makes it very easy to validate, serialize, or pass on to device-mapper.
for (const auto& partition : partitions_) {
LpMetadataPartition part;
memset(&part, 0, sizeof(part));
...
strncpy(part.name, partition->name().c_str(), sizeof(part.name));
part.first_extent_index = static_cast<uint32_t>(metadata->extents.size());
part.num_extents = static_cast<uint32_t>(partition->extents().size());
part.attributes = partition->attributes();
if (auto_slot_suffixing_) {
part.attributes |= LP_PARTITION_ATTR_SLOT_SUFFIXED;
}

auto iter = group_indices.find(partition->group_name());
part.group_index = iter->second;

for (const auto& extent : partition->extents()) {
// Store into the extents field of the metadata
if (!extent->AddTo(metadata.get())) {
return nullptr;
}
}
metadata->partitions.push_back(part);
}

metadata->header.partitions.num_entries = static_cast<uint32_t>(metadata->partitions.size());
metadata->header.extents.num_entries = static_cast<uint32_t>(metadata->extents.size());
metadata->header.groups.num_entries = static_cast<uint32_t>(metadata->groups.size());
metadata->header.block_devices.num_entries =
static_cast<uint32_t>(metadata->block_devices.size());
return metadata;
}

The new partition information is then written to the device node that backs super.

UpdatePartitionTable(super_device, *metadata, 0) {
    android::base::unique_fd fd = opener.Open(super_partition, O_RDWR | O_SYNC);
    std::string blob;
    // Validate and serialize the metadata, e.g. check that the extents/partitions/block_devices fields are valid.
    if (!ValidateAndSerializeMetadata(opener, metadata, slot_suffix, &blob)) {
        return false;
    }
    // Check that the old and new geometry agree: metadata_max_size / metadata_slot_count / logical_block_size
    if (!CompareGeometry(geometry, old_geometry)) {
        LERROR << "Incompatible geometry in new logical partition metadata";
        return false;
    }
    ...
    // Write the primary and backup metadata to disk
    if (!WriteMetadata(fd, metadata, slot_number, blob, writer)) {
        return false;
    }
}

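1.7.4. Summary

The main point is to understand the data structures involved.

For a non-A/B system (slot_count = 1), the structures can be summarized briefly.

First comes the geometry, in a primary and a backup copy of 4 KiB each, which describes how the metadata is configured.

Next comes the metadata, also in a primary and a backup copy, whose size is configured in the geometry. It describes the partitions, the partition groups and the extents (block ranges) of the partitions.

1.8. Mapping partitions

If the partition being modified is in use when it is resized, it must be unmounted and unmapped, then mapped and mounted again. For system partitions, unmounting and unmapping would disrupt the running phone, so the phone has to reboot before the partition can be remapped and mounted.

A resized dynamic partition only takes effect after it is remapped. The mapping flow is described below.

The flow follows the device-mapper usage conventions closely; user space merely adds a DeviceMapper instance that manages the creation, querying and destruction of dm devices.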

1.8.1. device-mapper

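Figure 1: Kernel architecture of Device Mapper

Device-mapper is a framework in the Linux kernel for mapping block devices. It provides a mechanism for mapping logical (virtual) devices onto physical devices, with which users can readily define storage management policies that fit their needs.

Popular logical volume managers on Linux, such as LVM2 (Linux Volume Manager 2), EVMS (Enterprise Volume Management System) and dmraid (Device Mapper Raid Tool), are all built on this mechanism.

Device Mapper works at the block level, not at the file level. It has been part of the Linux kernel since 2.6.9, so every distribution based on kernel 2.6.9 or later ships with it.

Device mapper is registered in the kernel as a block device driver and involves three important object concepts: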

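  • mapped device

    A mapped device is a logical abstraction (dm-*). It can be understood as the logical device the kernel exposes, and it is tied to a target device (super) through the mapping described by the mapping table.

  • mapping table

    Created in user space and passed to kernel space. It holds the start address and range of the mapped device's logical space, the address offset on the physical device that hosts the target device, the target type and so on. (Note: these addresses and offsets are all in units of 512-byte disk sectors, so a value of 128 actually means 128*512 = 64 KiB.)

  • target device

    Either a real physical device or a virtual dm device. In device mapper these three objects, together with the target driver plugins, form an iterable device tree.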

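The mapping drivers are plugins in kernel space. Device Mapper filters or redirects I/O requests through modular target driver plugins; the plugins implemented so far include software RAID, encryption, multipath, mirroring, snapshots and linear mapping. Policy and mechanism are thus kept separate.

Device mapper handles all block read/write I/O requests that the generic_make_request and submit_bio interfaces direct to a mapped device. Requests are processed top-down through the device mapper's device tree by forwarding. In essence, device mapper forwards each I/O request from the logical mapped device to the corresponding target device, according to the mapping and the I/O handling rules described by the target driver.

  • Forwarding downwards: when a bio request is forwarded from a mapped device to the layer below in the device tree, one or more clones of the bio are created and sent to the lower-level target device. The same process repeats at every level of the tree; in theory the forwarding can go on indefinitely as long as the tree is deep enough.
  • Reporting events upwards: when a target driver at some level of the tree completes a bio request, it reports the completion event to the mapped device above it. This repeats level by level until the event reaches the root mapped device, at which point device mapper completes the original bio request on the root mapped device and the whole I/O request ends.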

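1.8.1.1. User-space workflow

The device mapper library wraps ioctl and the operations user space needs to create and delete device mapper logical devices. dmsetup is a command-line tool that lets users create and delete device mapper devices directly. User space is mainly responsible for the following:

  1. discover the target devices associated with each mapped device;
  2. create the mapping table from the configuration;
  3. pass the mapping table built in user space into the kernel, so that the kernel builds the dm_table structure for the mapped device;
  4. save the current mapping information so that it can be rebuilt later.

Example dm table:
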
0    1024 linear /dev/sda 204
1024 512 linear /dev/sdb 766
1536 128 linear /dev/sdc 0
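
To make the table semantics concrete, the sketch below resolves a logical sector of the mapped device to a backing device and physical sector, using the three lines above as data. It models only the linear target; LinearTarget, Location and Resolve are hypothetical names for this note, not kernel or libdm API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct LinearTarget {    // one line of a dm-linear table
    uint64_t start;      // first logical sector covered
    uint64_t length;     // number of sectors
    std::string device;  // backing device
    uint64_t offset;     // first physical sector on that device
};

struct Location {
    std::string device;
    uint64_t sector;
};

// Translate a logical sector into (device, physical sector).
// Returns false if no target covers the sector.
bool Resolve(const std::vector<LinearTarget>& table, uint64_t logical, Location* out) {
    for (const LinearTarget& t : table) {
        if (logical >= t.start && logical - t.start < t.length) {
            out->device = t.device;
            out->sector = t.offset + (logical - t.start);
            return true;
        }
    }
    return false;
}
```

With the example table, logical sector 1100 falls in the second line (which starts at 1024), so it resolves to /dev/sdb at sector 766 + 76 = 842.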

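1.8.1.2. Mapping dynamic partitions on Q

  • The dynamic partition mechanism uses the dm-linear driver to map logical sectors onto the physical sectors inside super that hold read-only partitions such as system and vendor.
  • After resizing, a dynamic partition may become fragmented.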

/dev/block/by-name/super super.img

/dev/block/by-name/dm-0 (system):
dm-linear super <block range 1>
/dev/block/by-name/dm-1 (vendor):
dm-linear super <block range 2>

After resizing the partitions:

/dev/block/by-name/dm-0 (system):
dm-linear super <block range 1>
dm-linear super <block range 3>
/dev/block/by-name/dm-1 (vendor):
dm-linear super <block range 2>
dm-linear super <block range 4>
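
A fragmented partition simply gets one dm-linear line per extent, with the logical start of each line equal to the sum of the lengths before it. The following sketch shows that bookkeeping; Extent and BuildLinearTable are hypothetical names for this note, and the output uses the textual "start length linear device offset" form shown in the dm table example earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: one physical block range inside super.
struct Extent {
    uint64_t physical_sector;
    uint64_t num_sectors;
};

// Emit one "<start> <length> linear <device> <offset>" line per extent.
// The logical start of each line is the running total of the previous lengths.
std::vector<std::string> BuildLinearTable(const std::vector<Extent>& extents,
                                          const std::string& device) {
    std::vector<std::string> lines;
    uint64_t logical = 0;
    for (const Extent& e : extents) {
        lines.push_back(std::to_string(logical) + " " + std::to_string(e.num_sectors) +
                        " linear " + device + " " + std::to_string(e.physical_sector));
        logical += e.num_sectors;
    }
    return lines;
}
```

The mapped device therefore always looks contiguous to the filesystem, no matter how scattered its extents are inside super.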

1.8.1.3. Creating devices with the DeviceMapper library (user space)

auto state = DeviceMapper::Instance().GetState(partition_name);
if (state == DmDeviceState::INVALID) {
return CreateLogicalPartition(GetSuperDevice(), 0 /* metadata slot */, partition_name,
true /* force writable */, kMapTimeout, path);
}
static bool CreateLogicalPartition(const LpMetadata& metadata, const LpMetadataPartition& partition,
bool force_writable, const std::chrono::milliseconds& timeout_ms,
const std::string& super_device, std::string* path) {
DeviceMapper& dm = DeviceMapper::Instance();

DmTable table;
// ---> 1. Create the dm mapping table
if (!CreateDmTable(metadata, partition, super_device, &table)) {
return false;
}
if (force_writable) {
table.set_readonly(false);
}
// ---> 2. Create the mapped_device from the mapping table and the target_device
std::string name = GetPartitionName(partition);
if (!dm.CreateDevice(name, table)) {
return false;
}
if (!dm.GetDmDevicePathByName(name, path)) {
return false;
}
if (timeout_ms > std::chrono::milliseconds::zero()) {
// ---> 3. Wait for the mapped_device to appear; destroy it on timeout
if (!fs_mgr_wait_for_file(*path, timeout_ms, FileWaitMode::Exists)) {
DestroyLogicalPartition(name, {});
return false;
}
}
return true;
}
// ----> 1. Create the mapping table
static bool CreateDmTable(const LpMetadata& metadata, const LpMetadataPartition& partition,
const std::string& super_device, DmTable* table) {
uint64_t sector = 0;
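// The extents of a sub-partition are laid out contiguously in the extents table of the metadata: they start at the partition's first_extent_index, and the following num_extents entries all belong to this sub-partition. This layout is produced by ImportExtents during the builder's Init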
for (size_t i = 0; i < partition.num_extents; i++) {
const auto& extent = metadata.extents[partition.first_extent_index + i];
std::unique_ptr<DmTarget> target;
switch (extent.target_type) {
...
case LP_TARGET_TYPE_LINEAR: {
const auto& block_device = metadata.block_devices[extent.target_source];
std::string path;
GetPhysicalPartitionDevicePath(metadata, block_device, super_device, &path);
// For each extent, build a target entry (starting logical sector, number of sectors, block device name, starting physical sector it maps to)
target = std::make_unique<DmTargetLinear>(sector, extent.num_sectors, path,
extent.target_data);
break;
}
...
}
// The target entries built from the extents are chained together under the table's targets. The target_data values of the entries run from small to large
if (!table->AddTarget(std::move(target))) {
return false;
}
sector += extent.num_sectors;
}
if (partition.attributes & LP_PARTITION_ATTR_READONLY) {
table->set_readonly(true);
}
return true;
}

// ---> 2. Create the mapped_device from the mapping table and the target_device
dm.CreateDevice(name, table)
{
struct dm_ioctl io;
InitIo(&io, name);
// Create the dm device through ioctl
if (ioctl(fd_, DM_DEV_CREATE, &io)) {
return false;
}

LoadTableAndActivate(name, table) {
std::string ioctl_buffer(sizeof(struct dm_ioctl), 0);
// ==========> 2.1 Call the table's serialization function to write the targets_ built above into ioctl_buffer
ioctl_buffer += table.Serialize();

struct dm_ioctl* io = reinterpret_cast<struct dm_ioctl*>(&ioctl_buffer[0]);
InitIo(io, name);
io->data_size = ioctl_buffer.size();
io->data_start = sizeof(struct dm_ioctl);
io->target_count = static_cast<uint32_t>(table.num_targets());
if (table.readonly()) {
io->flags |= DM_READONLY_FLAG;
}
//Load a table into the 'inactive' slot for the device
if (ioctl(fd_, DM_TABLE_LOAD, io)) {
return false;
}

InitIo(io, name);
// The device ends up suspended or resumed depending on the flags of the io passed in. DM_SUSPEND_FLAG is not set here, so the device is resumed
/*If a table is present in the 'inactive'
* slot, it will be moved to the active slot, then the old table from the active slot will be _destroyed_. Finally the device is resumed.
*/
if (ioctl(fd_, DM_DEV_SUSPEND, io)) {
return false;
}
}
// Query the state of the dm device through ioctl
if (ioctl(fd_, DM_DEV_STATUS, &io) < 0) {
return false;
}
uint32_t dev_num = minor(io.dev);
// dm-<minor> is the mapped_device bound to io->name (target_device)
*path = "/dev/block/dm-" + std::to_string(dev_num);
}
// ---> 3. Wait for the mapped_device dm-<minor> to appear; destroy it on timeout

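In the process above of creating the mapping table and initializing the dm device, the most important work is done by the serialization of the DmTarget objects.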

// ==========> 2.1 Call the table's serialization function to write the targets_ built above into ioctl_buffer
ioctl_buffer += table.Serialize();
std::string DmTable::Serialize() const {
std::string table;
for (const auto& target : targets_) {
table += target->Serialize();
}
return table;
}
std::string DmTarget::Serialize() const {
// Create a string containing a dm_target_spec, parameter data, and an
// explicit null terminator.
std::string data(sizeof(dm_target_spec), '\0');
// Append the parameter string; for a linear target it is block_device_ + " " + std::to_string(physical_sector_), i.e. the target_data
data += GetParameterString();
data.push_back('\0');

// The kernel expects each target to be 8-byte aligned.
size_t padding = DM_ALIGN(data.size()) - data.size();
for (size_t i = 0; i < padding; i++) {
data.push_back('\0');
}

// Finally fill in the dm_target_spec.
struct dm_target_spec* spec = reinterpret_cast<struct dm_target_spec*>(&data[0]);
// Starting sector
spec->sector_start = start();
// Number of sectors covered
spec->length = size();
// name here is the dm-linear target name, i.e. "linear"; the device-mapper driver dispatches to the matching plugin by target type
snprintf(spec->target_type, sizeof(spec->target_type), "%s", name().c_str());
// The size of this segment is the offset to the next dm_target_spec, so multiple dm_target_spec entries are laid out back to back
spec->next = (uint32_t)data.size();
return data;
}
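
The byte layout produced by the serialization can be checked with a small stand-alone sketch. TargetSpec below is a local mirror of the kernel's dm_target_spec (with target_type sized 16, which is assumed to match DM_MAX_TYPE_NAME), redefined so the example builds without the Linux header; SerializeTarget is a hypothetical helper for this note, not the libdm function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Local mirror of dm_target_spec; 16 is assumed to equal DM_MAX_TYPE_NAME.
struct TargetSpec {
    uint64_t sector_start;
    uint64_t length;
    int32_t status;
    uint32_t next;
    char target_type[16];
};

// Serialize one target: spec header, parameter string, NUL terminator,
// then zero padding up to the next 8-byte boundary.
std::string SerializeTarget(uint64_t start, uint64_t length, const std::string& type,
                            const std::string& params) {
    std::string data(sizeof(TargetSpec), '\0');
    data += params;
    data.push_back('\0');
    data.resize((data.size() + 7) & ~size_t{7}, '\0');

    TargetSpec spec = {};
    spec.sector_start = start;
    spec.length = length;
    std::snprintf(spec.target_type, sizeof(spec.target_type), "%s", type.c_str());
    // Offset from this spec to the next one: the padded size of this record.
    spec.next = static_cast<uint32_t>(data.size());
    // Copy the header in instead of casting, to avoid unaligned access.
    std::memcpy(&data[0], &spec, sizeof(spec));
    return data;
}
```

For a linear target with the parameter string "/dev/sda 204", the record is 40 bytes of header plus 12 bytes of parameters plus a terminator, padded from 53 to 56 bytes, and next is set to 56.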

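Steps for using the ioctl from user space:

  • First create a dm_ioctl object and a dm_target_spec object

  • Set their parameters

  • Follow the dm_target_spec with the parameters specific to the device in question (special param)

    Combine the three in the dm_ioctl buffer; issuing the ioctl with it then loads a dm device into device mapper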

struct dm_target_spec {
    __u64 sector_start;
    __u64 length;
    __s32 status; /* used when reading from kernel only */

    /*
     * Location of the next dm_target_spec.
     * - When specifying targets on a DM_TABLE_LOAD command, this value is
     *   the number of bytes from the start of the "current" dm_target_spec
     *   to the start of the "next" dm_target_spec.
     * - When retrieving targets on a DM_TABLE_STATUS command, this value
     *   is the number of bytes from the start of the first dm_target_spec
     *   (that follows the dm_ioctl struct) to the start of the "next"
     *   dm_target_spec.
     */
    __u32 next;

    char target_type[DM_MAX_TYPE_NAME];
};

The related ioctl source code is in kernel/msm-4.19/drivers/md/dm-ioctl.c.