Linux KFENCE: Usage and Implementation Principles

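0 Background

To better detect memory errors in the Linux kernel, such as out-of-bounds accesses, memory corruption, use-after-free, and invalid frees, we investigated KFENCE, a feature introduced in Linux kernel 5.12. The goal is to help developers analyze and locate this class of memory errors more effectively.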

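I. Introduction to KFENCE

1.1 What is KFENCE

KFENCE is a tool in the Linux kernel for detecting memory errors such as out-of-bounds accesses, memory corruption, use-after-free, and invalid frees. It helps find memory errors in a project as early as possible and lets developers locate and analyze them quickly.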

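1.2 Differences between KFENCE and KASAN

KFENCE

Detection scope: slab allocations no larger than one page (4 KB).

Detection mechanism:

1) Fence pages and a canary pattern detect out-of-bounds accesses.

2) Freed data pages are marked as freed and made inaccessible, which detects use-after-free.

Memory impact: KFENCE trades memory for low performance interference, so it uses a comparatively large amount of memory per guarded object. num_objects can, however, be set to an arbitrarily small value to save memory. Other configurations (full mode and dynamic enabling) consume memory on the order of gigabytes.

Performance impact: small in sampling mode; large in full mode.

Applicable scenarios: because sampling mode has low overhead, it can be used in mass production.

KASAN

Detection scope: memory throughout the kernel, including all slab, page, stack, and global memory.

Detection mechanism: shadow memory.

Performance impact: high overhead.

Applicable scenarios: because of the high overhead, it is generally used during development.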

II. How to Use KFENCE

KFENCE was only introduced in Linux kernel 5.12. To use it on an older kernel, the feature must first be ported (see Section IV).

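2.1 Enabling KFENCE

# Enable KFENCE
CONFIG_KFENCE=y
# Sample interval in milliseconds: roughly one guarded allocation every 500 ms
CONFIG_KFENCE_SAMPLE_INTERVAL=500
# Number of guarded objects, which determines the size of the KFENCE pool
CONFIG_KFENCE_NUM_OBJECTS=63

These configuration options can be adjusted to suit your needs.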

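2.2 Debugging

Build-time configuration is not flexible enough and is inconvenient for debugging. The kernel therefore exposes several nodes to user space so that the configuration can be adjusted at run time:

/sys/module/kfence/parameters/check_on_panic
Y: check all canary bytes when the kernel panics, which yields more debug information
N: skip that check, reducing the extra work done during a crash in production

/sys/module/kfence/parameters/deferrable
Y: the sampling timer is deferrable, so it does not wake an idle CPU, at the cost of a less predictable sample interval
N: the sampling timer is not deferrable and fires on schedule

/sys/kernel/debug/kfence/stats
KFENCE counters (allocations, frees, bugs, skipped allocations)

/sys/kernel/debug/kfence/objects
Information about the objects managed by KFENCE

# Adjust the sample interval at run time. 0 disables KFENCE. -1 puts every allocation that passes
# the slab-type filter under KFENCE monitoring (full mode); this value comes from vendor kernels
# such as Alibaba Cloud Linux and may be rejected by upstream kernels.
echo -1 > /sys/module/kfence/parameters/sample_interval
# Pool utilization threshold in percent. Once pool usage exceeds it, KFENCE skips allocations
# whose allocation stack is already covered by an object in the pool.
echo 100 > /sys/module/kfence/parameters/skip_covered_thresh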

2.3 Querying the logs

When KFENCE catches a memory error, the total bugs counter shown by cat /sys/kernel/debug/kfence/stats increases.

The report is printed to the kernel log; use dmesg | grep -i kfence to find KFENCE error reports.

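2.4 Collecting the error reports separately

To collect the reports independently, add a hook near the point where KFENCE prints the report to dmesg and capture the log there. See Section 3.2.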

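III. How KFENCE Works

3.1 Detection principles

3.1.1 slub/slab hooks

Hooks into the KFENCE module must be added to the allocation and free paths of slub/slab. Only then are allocations and frees routed through KFENCE's own alloc and free paths, which is what makes monitoring for memory errors possible.

1) KFENCE alloc path

During initialization, KFENCE creates its own dedicated detection pool, kfence_pool; see Section 3.3.

kmem_cache_alloc--->__kmem_cache_alloc_lru---> slab_alloc--->slab_alloc_node--->kfence_alloc. For the implementation of KFENCE alloc, see Section 3.4.

2) KFENCE free path

__kmem_cache_free--->__do_kmem_cache_free--->__cache_free--->__kfence_free. For the implementation of KFENCE free, see Section 3.5.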

3.1.2 use-after-free

After an object is freed, its data page is also made inaccessible. Any later access to it immediately triggers a fault.

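3.1.3 out-of-bounds and memory corruption

Out-of-bounds accesses fall into two classes: accesses that leave the data page (out-of-bounds) and accesses that overrun the object but stay inside the data page (memory corruption).

Out-of-bounds access beyond the data page:

Every object allocated from kfence_pool occupies a whole data page, regardless of its actual size. Each data page is flanked on both sides by fence pages, which act as an electric fence: KFENCE uses the MMU to make the fence pages inaccessible. If an access to the data page crosses the page boundary and touches a fence page, it faults immediately. This is an out-of-bounds access beyond the data page.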

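Out-of-bounds access within the data page:

In most cases the object is smaller than a page, and the rest of the data page is filled with a canary pattern. The pattern exists to detect accesses that overflow the object but remain inside the data page. This is an out-of-bounds access within the data page.

An in-page overflow does not fault when it happens. It is only detected when the object is freed, by noticing that the canary pattern has been damaged. This kind of error is also called memory corruption.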

3.1.4 invalid-free

When an object is freed, KFENCE checks the metadata recorded at allocation time to decide whether the free is invalid, for example a double free.

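3.2 How errors are triggered and reported

1) use-after-free: memory errors of type KFENCE_ERROR_UAF

When code in some module triggers a use-after-free, the access goes through the kernel's normal page-fault path, which calls KFENCE's kfence_handle_page_fault function to collect and print the error report.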

// kernel/arch/arm/mm/fault.c 

/*
 * Oops.  The kernel tried to access some page that wasn't present.
 */
static void
__do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
                  struct pt_regs *regs)
{
        const char *msg;
        /*
         * Are we prepared to handle this kernel fault?
         */
        if (fixup_exception(regs))
                return;

        /*
         * No handler, we'll have to terminate things with extreme prejudice.
         */
        if (addr < PAGE_SIZE) {
                msg = "NULL pointer dereference";
        } else {
                if (is_translation_fault(fsr) &&
                    kfence_handle_page_fault(addr, is_write_fault(fsr), regs))
                        return;

                msg = "paging request";
        }

        die_kernel_fault(msg, mm, addr, fsr, regs);
}

kfence_handle_page_fault determines whether the error is of type KFENCE_ERROR_OOB or KFENCE_ERROR_UAF, then calls kfence_report_error to print the report to dmesg.

bool kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs)
{
        const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE;
        struct kfence_metadata *to_report = NULL;
        enum kfence_error_type error_type;
        unsigned long flags;

        if (!is_kfence_address((void *)addr))
                return false;

        if (!READ_ONCE(kfence_enabled)) /* If disabled at runtime ... */
                return kfence_unprotect(addr); /* ... unprotect and proceed. */

        atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
        // Decide whether this is KFENCE_ERROR_OOB (access beyond the data page) or KFENCE_ERROR_UAF (use-after-free)
        // An odd page_index means a fence page was accessed: KFENCE_ERROR_OOB
        // An even page_index means a data page was accessed after being freed: KFENCE_ERROR_UAF
        if (page_index % 2) {
                /* This is a redzone, report a buffer overflow. */
                struct kfence_metadata *meta;
                int distance = 0;

                meta = addr_to_metadata(addr - PAGE_SIZE);
                if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) {
                        to_report = meta;
                        /* Data race ok; distance calculation approximate. */
                        distance = addr - data_race(meta->addr + meta->size);
                }

                meta = addr_to_metadata(addr + PAGE_SIZE);
                if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) {
                        /* Data race ok; distance calculation approximate. */
                        if (!to_report || distance > data_race(meta->addr) - addr)
                                to_report = meta;
                }

                if (!to_report)
                        goto out;

                raw_spin_lock_irqsave(&to_report->lock, flags);
                to_report->unprotected_page = addr;
                error_type = KFENCE_ERROR_OOB;

                /*
                 * If the object was freed before we took the look we can still
                 * report this as an OOB -- the report will simply show the
                 * stacktrace of the free as well.
                 */
        } else {
                to_report = addr_to_metadata(addr);
                if (!to_report)
                        goto out;

                raw_spin_lock_irqsave(&to_report->lock, flags);
                error_type = KFENCE_ERROR_UAF;
                /*
                 * We may race with __kfence_alloc(), and it is possible that a
                 * freed object may be reallocated. We simply report this as a
                 * use-after-free, with the stack trace showing the place where
                 * the object was re-allocated.
                 */
        }

out:
        if (to_report) {
                kfence_report_error(addr, is_write, regs, to_report, error_type);
                raw_spin_unlock_irqrestore(&to_report->lock, flags);
        } else {
                /* This may be a UAF or OOB access, but we can't be sure. */
                // the type of memory error cannot be determined
                kfence_report_error(addr, is_write, regs, NULL, KFENCE_ERROR_INVALID);
        }

        return kfence_unprotect(addr); /* Unprotect and let access proceed. */
}

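2) out-of-bounds (beyond the data page): memory errors of type KFENCE_ERROR_OOB

Same as above.

3) out-of-bounds (within the data page): memory errors of type KFENCE_ERROR_CORRUPTION

The canary region is initialized in the KFENCE alloc stage (see Section 3.4). The KFENCE free stage checks whether the canary region has been overwritten; if it has, kfence_report_error is called with KFENCE_ERROR_CORRUPTION to print the error report.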

static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool zombie)
{
        ......
        
        /* Check canary bytes for memory corruption. */
        for_each_canary(meta, check_canary_byte);
        
        ......
}

/* __always_inline this to ensure we won't do an indirect call to fn. */
static __always_inline void for_each_canary(const struct kfence_metadata *meta, bool (*fn)(u8 *))
{
        // pageaddr is the start address of this data page
        const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
        unsigned long addr;

        /*
         * We'll iterate over each canary byte per-side until fn() returns
         * false. However, we'll still iterate over the canary bytes to the
         * right of the object even if there was an error in the canary bytes to
         * the left of the object. Specifically, if check_canary_byte()
         * generates an error, showing both sides might give more clues as to
         * what the error is about when displaying which bytes were corrupted.
         */

        /* Apply to left of object. */
        // check the canary region to the left of the object
        for (addr = pageaddr; addr < meta->addr; addr++) {
                if (!fn((u8 *)addr))
                        break;
        }

        /* Apply to right of object. */
        // check the canary region to the right of the object
        for (addr = meta->addr + meta->size; addr < pageaddr + PAGE_SIZE; addr++) {
                if (!fn((u8 *)addr))
                        break;
        }
}

/* Check canary byte at @addr. */
static inline bool check_canary_byte(u8 *addr)
{
        struct kfence_metadata *meta;
        unsigned long flags;
        // If this canary byte is intact, return immediately; otherwise call kfence_report_error to print the error report
        if (likely(*addr == KFENCE_CANARY_PATTERN(addr)))
                return true;

        atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
        // look up the metadata object for this address
        meta = addr_to_metadata((unsigned long)addr);
        raw_spin_lock_irqsave(&meta->lock, flags);
        // call kfence_report_error with KFENCE_ERROR_CORRUPTION to print the error report
        kfence_report_error((unsigned long)addr, false, NULL, meta, KFENCE_ERROR_CORRUPTION);
        raw_spin_unlock_irqrestore(&meta->lock, flags);

        return false;
}
/*
 * Get the canary byte pattern for @addr. Use a pattern that varies based on the
 * lower 3 bits of the address, to detect memory corruptions with higher
 * probability, where similar constants are used.
 */
#define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))

4) invalid-free: memory errors of type KFENCE_ERROR_INVALID_FREE

The KFENCE free stage checks whether the free is invalid. If it is, kfence_report_error is called with KFENCE_ERROR_INVALID_FREE to print the error report.

static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool zombie)
{
        ......
        // If the object is freed while not allocated (which covers double free), or the freed address differs from the allocated one, the free is invalid
        if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) {
                /* Invalid or double-free, bail out. */
                atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
                // call kfence_report_error with KFENCE_ERROR_INVALID_FREE to print the error report
                kfence_report_error((unsigned long)addr, false, NULL, meta,
                                    KFENCE_ERROR_INVALID_FREE);
                raw_spin_unlock_irqrestore(&meta->lock, flags);
                return;
        }

        ......
}

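Next, let's look at how the error report is printed. kfence_report_error prints the report to dmesg.

/* Simplified: pr_err() ultimately prints through printk() at the KERN_ERR level. */
#define pr_err printk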


void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *regs,
                         const struct kfence_metadata *meta, enum kfence_error_type type)
{
       ......
       
        /* Print report header. */
        switch (type) {
        // print the report for an out-of-bounds access beyond the data page
        case KFENCE_ERROR_OOB: {
                const bool left_of_object = address < meta->addr;

                pr_err("BUG: KFENCE: out-of-bounds %s in %pS\n\n", get_access_type(is_write),
                       (void *)stack_entries[skipnr]);
                pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%td):\n",
                       get_access_type(is_write), (void *)address,
                       left_of_object ? meta->addr - address : address - meta->addr,
                       left_of_object ? "left" : "right", object_index);
                break;
        }
        // print the report for a use-after-free
        case KFENCE_ERROR_UAF:
                pr_err("BUG: KFENCE: use-after-free %s in %pS\n\n", get_access_type(is_write),
                       (void *)stack_entries[skipnr]);
                pr_err("Use-after-free %s at 0x%p (in kfence-#%td):\n",
                       get_access_type(is_write), (void *)address, object_index);
                break;
        // print the report for an out-of-bounds access within the data page (canary region corrupted)
        case KFENCE_ERROR_CORRUPTION:
                pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]);
                pr_err("Corrupted memory at 0x%p ", (void *)address);
                print_diff_canary(address, 16, meta);
                pr_cont(" (in kfence-#%td):\n", object_index);
                break;
        case KFENCE_ERROR_INVALID:
                pr_err("BUG: KFENCE: invalid %s in %pS\n\n", get_access_type(is_write),
                       (void *)stack_entries[skipnr]);
                pr_err("Invalid %s at 0x%p:\n", get_access_type(is_write),
                       (void *)address);
                break;
        // print the report for an invalid free
        case KFENCE_ERROR_INVALID_FREE:
                pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]);
                pr_err("Invalid free of 0x%p (in kfence-#%td):\n", (void *)address,
                       object_index);
                break;
        }

      ......
}

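3.3 kfence init

KFENCE initialization does the following:

1) Checks whether the sample interval kfence_sample_interval is 0; if it is, KFENCE is disabled.

2) Sets up the kfence pool. With the default of 255 objects, the pool has (255+1)*2 = 512 pages: 255 data pages, 256 fence pages, and 1 unusable data page (placed first and referred to as page 0).

3) Initializes the metadata array, which records the state of each data page.

4) Initializes the freelist, which records which data pages are available for allocation.

5) Makes all fence pages and page 0 inaccessible.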

// mm/kfence/core.c

void __init kfence_init(void)
{
        stack_hash_seed = get_random_u32();

        /* Setting kfence_sample_interval to 0 on boot disables KFENCE. */
        // 1. a sample interval of 0 disables KFENCE
        if (!kfence_sample_interval)
                return;
        // 2. initialize the kfence pool
        if (!kfence_init_pool_early()) {
                pr_err("%s failed\n", __func__);
                return;
        }
        kfence_init_enable();
}
static bool __init kfence_init_pool_early(void)
{
        unsigned long addr;

        if (!__kfence_pool)
                return false;

        addr = kfence_init_pool();

        ......
}

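#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)    // 256*2 pages by default
static struct list_head kfence_freelist = LIST_HEAD_INIT(kfence_freelist);    // freelist of objects available for allocation
struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];    // metadata array, one entry per data page (a static array in older kernels; the function below is from a version that fills kfence_metadata_init and then publishes it)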
/*
 * Initialization of the KFENCE pool after its allocation.
 * Returns 0 on success; otherwise returns the address up to
 * which partial initialization succeeded.
 */
static unsigned long kfence_init_pool(void)
{
        unsigned long addr;
        struct page *pages;
        int i;

        if (!arch_kfence_init_pool())
                return (unsigned long)__kfence_pool;

        addr = (unsigned long)__kfence_pool;
        // get the struct page that backs the start of the pool
        pages = virt_to_page(__kfence_pool);

        /*
         * Set up object pages: they must have PG_slab set, to avoid freeing
         * these as real pages.
         *
         * We also want to avoid inserting kfence_free() in the kfree()
         * fast-path in SLUB, and therefore need to ensure kfree() correctly
         * enters __slab_free() slow-path.
         */
        // 512 pages by default
        for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
                struct slab *slab = page_slab(nth_page(pages, i));

                if (!i || (i % 2))
                        continue;

                __folio_set_slab(slab_folio(slab));
#ifdef CONFIG_MEMCG
                slab->memcg_data = (unsigned long)&kfence_metadata[i / 2 - 1].objcg |
                                   MEMCG_DATA_OBJCGS;
#endif
        }

        /*
         * Protect the first 2 pages. The first page is mostly unnecessary, and
         * merely serves as an extended guard page. However, adding one
         * additional page in the beginning gives us an even number of pages,
         * which simplifies the mapping of address to metadata index.
         */
        for (i = 0; i < 2; i++) {
                if (unlikely(!kfence_protect(addr)))
                        return addr;

                addr += PAGE_SIZE;
        }

        for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
                struct kfence_metadata *meta = &kfence_metadata_init[i];

                /* Initialize metadata. */
                INIT_LIST_HEAD(&meta->list);
                raw_spin_lock_init(&meta->lock);
                // mark the object as unused
                meta->state = KFENCE_OBJECT_UNUSED;
                // record the address of its data page
                meta->addr = addr; /* Initialize for validation in metadata_to_pageaddr(). */
                // add it to the freelist
                list_add_tail(&meta->list, &kfence_freelist);

                /* Protect the right redzone. */
                // make the fence page inaccessible
                if (unlikely(!kfence_protect(addr + PAGE_SIZE)))
                        goto reset_slab;
                // start address of the next data page
                addr += 2 * PAGE_SIZE;    // data pages are 8 KB apart because a fence page sits between them
        }

        /*
         * Make kfence_metadata visible only when initialization is successful.
         * Otherwise, if the initialization fails and kfence_metadata is freed,
         * it may cause UAF in kfence_shutdown_cache().
         */
        smp_store_release(&kfence_metadata, kfence_metadata_init);
        return 0;

reset_slab:
        for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
                struct slab *slab = page_slab(nth_page(pages, i));

                if (!i || (i % 2))
                        continue;
#ifdef CONFIG_MEMCG
                slab->memcg_data = 0;
#endif
                __folio_clear_slab(slab_folio(slab));
        }

        return addr;
}

3.4 kfence alloc

KFENCE alloc mainly does the following:

1) Finds a free object (data page) in the kfence pool.

2) Writes a fixed pattern into the canary region of the data page so that it can be checked at free time.

void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
{
        unsigned long stack_entries[KFENCE_STACK_DEPTH];
        size_t num_stack_entries;
        u32 alloc_stack_hash;

        /*
         * Perform size check before switching kfence_allocation_gate, so that
         * we don't disable KFENCE without making an allocation.
         */
        // if the requested size exceeds one page (4 KB), return NULL immediately
        if (size > PAGE_SIZE) {
                atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]);
                return NULL;
        }

        /*
         * Skip allocations from non-default zones, including DMA. We cannot
         * guarantee that pages in the KFENCE pool will have the requested
         * properties (e.g. reside in DMAable memory).
         */
        if ((flags & GFP_ZONEMASK) ||
            (s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) {
                atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]);
                return NULL;
        }

        /*
         * Skip allocations for this slab, if KFENCE has been disabled for
         * this slab.
         */
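         // If SLAB_SKIP_KFENCE is set, KFENCE has been disabled for this slab cache, so return NULL
         /*
         Other slab flags include:
         SLAB_RECLAIM_ACCOUNT: objects in the cache are reclaimable and are accounted as such.
         SLAB_PANIC: panic if creating the cache fails.
         SLAB_CONSISTENCY_CHECKS: enable consistency checks to detect corruption and similar problems.
         SLAB_RED_ZONE: add red zones around each object to detect out-of-bounds writes.
         SLAB_STORE_USER: record the last allocation and free callers of each object for debugging.
         SLAB_DEBUG_OBJECTS: enable additional object debugging.
        */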
        if (s->flags & SLAB_SKIP_KFENCE)
            return NULL;
        // a result > 1 means an allocation was already sampled in this interval; the next sampling point has not arrived yet
        if (atomic_inc_return(&kfence_allocation_gate) > 1)
                return NULL;
#ifdef CONFIG_KFENCE_STATIC_KEYS
        /*
         * waitqueue_active() is fully ordered after the update of
         * kfence_allocation_gate per atomic_inc_return().
         */
        if (waitqueue_active(&allocation_wait)) {
                /*
                 * Calling wake_up() here may deadlock when allocations happen
                 * from within timer code. Use an irq_work to defer it.
                 */
                irq_work_queue(&wake_up_kfence_timer_work);
        }
#endif

        if (!READ_ONCE(kfence_enabled))
                return NULL;

        num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 0);

        /*
         * Do expensive check for coverage of allocation in slow-path after
         * allocation_gate has already become non-zero, even though it might
         * mean not making any allocation within a given sample interval.
         *
         * This ensures reasonable allocation coverage when the pool is almost
         * full, including avoiding long-lived allocations of the same source
         * filling up the pool (e.g. pagecache allocations).
         */
        alloc_stack_hash = get_alloc_stack_hash(stack_entries, num_stack_entries);
        if (should_skip_covered() && alloc_covered_contains(alloc_stack_hash)) {
                atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_COVERED]);
                return NULL;
        }

        return kfence_guarded_alloc(s, size, flags, stack_entries, num_stack_entries,
                                    alloc_stack_hash);
}

static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp,
                                  unsigned long *stack_entries, size_t num_stack_entries,
                                  u32 alloc_stack_hash)
{
        // the metadata is managed through struct kfence_metadata
        struct kfence_metadata *meta = NULL;
        unsigned long flags;
        struct slab *slab;
        void *addr;
        const bool random_right_allocate = prandom_u32_max(2);
        const bool random_fault = CONFIG_KFENCE_STRESS_TEST_FAULTS &&
                                  !prandom_u32_max(CONFIG_KFENCE_STRESS_TEST_FAULTS);

        /* Try to obtain a free object. */
        // take a free object from the KFENCE freelist
        raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
        if (!list_empty(&kfence_freelist)) {
                meta = list_entry(kfence_freelist.next, struct kfence_metadata, list);
                list_del_init(&meta->list);
        }
        
        ......

        meta->addr = metadata_to_pageaddr(meta);
        /* Unprotect if we're reusing this page. */
        // if this data page was previously freed (and therefore protected), make it accessible again
        if (meta->state == KFENCE_OBJECT_FREED)
                kfence_unprotect(meta->addr);

        /*
         * Note: for allocations made before RNG initialization, will always
         * return zero. We still benefit from enabling KFENCE as early as
         * possible, even when the RNG is not yet available, as this will allow
         * KFENCE to detect bugs due to earlier allocations. The only downside
         * is that the out-of-bounds accesses detected are deterministic for
         * such allocations.
         */
        if (random_right_allocate) {
                /* Allocate on the "right" side, re-calculate address. */
                meta->addr += PAGE_SIZE - size;
                meta->addr = ALIGN_DOWN(meta->addr, cache->align);
        }

        addr = (void *)meta->addr;

        /* Update remaining metadata. */
        metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED, stack_entries, num_stack_entries);
        /* Pairs with READ_ONCE() in kfence_shutdown_cache(). */
        WRITE_ONCE(meta->cache, cache);
        meta->size = size;
        meta->alloc_stack_hash = alloc_stack_hash;
        raw_spin_unlock_irqrestore(&meta->lock, flags);

        alloc_covered_add(alloc_stack_hash, 1);

        /* Set required slab fields. */
        slab = virt_to_slab((void *)meta->addr);
        slab->slab_cache = cache;
#if defined(CONFIG_SLUB)
        slab->objects = 1;
#elif defined(CONFIG_SLAB)
        slab->s_mem = addr;
#endif

        /* Memory initialization. */
        // initialize the canary region
        for_each_canary(meta, set_canary_byte);

        /*
         * We check slab_want_init_on_alloc() ourselves, rather than letting
         * SL*B do the initialization, as otherwise we might overwrite KFENCE's
         * redzone.
         */
        if (unlikely(slab_want_init_on_alloc(gfp, cache)))
                memzero_explicit(addr, size);
        if (cache->ctor)
                cache->ctor(addr);

        if (random_fault)
                kfence_protect(meta->addr); /* Random "faults" by protecting the object. */

        atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCATED]);
        atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCS]);

        return addr;
}

Next, let's look at how the fixed pattern is written into the canary region of the data page:

/* Write canary byte to @addr. */
static inline bool set_canary_byte(u8 *addr)
{
        *addr = KFENCE_CANARY_PATTERN(addr);
        return true;
}

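3.5 kfence free

KFENCE free mainly does the following:

1) After the data page is freed, makes it inaccessible.

2) Checks whether the canary region of the data page has been damaged.

3) Returns the freed object to the kfence pool by adding it to the freelist.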

void __kfence_free(void *addr)
{
        // Convert the address into a pointer to its struct kfence_metadata, meta.
        // struct kfence_metadata is the per-allocation metadata used to track allocation and free information.
        struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);

#ifdef CONFIG_MEMCG
        KFENCE_WARN_ON(meta->objcg);
#endif
        /*
         * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing
         * the object, as the object page may be recycled for other-typed
         * objects once it has been freed. meta->cache may be NULL if the cache
         * was destroyed.
         */
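        // Check whether the cache that meta belongs to still exists and has SLAB_TYPESAFE_BY_RCU set.
        // If so, defer the free with call_rcu. The page of such an object could otherwise be recycled
        // for an object of another type right after the free, so RCU is used to make the free safe.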
        if (unlikely(meta->cache && (meta->cache->flags & SLAB_TYPESAFE_BY_RCU)))
                call_rcu(&meta->rcu_head, rcu_guarded_free);
        else
                // otherwise, free the object immediately
                kfence_guarded_free(addr, meta, false);
}

static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool zombie)
{
        struct kcsan_scoped_access assert_page_exclusive;
        unsigned long flags;
        bool init;

        raw_spin_lock_irqsave(&meta->lock, flags);

        if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) {
                /* Invalid or double-free, bail out. */
                atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
                kfence_report_error((unsigned long)addr, false, NULL, meta,
                                    KFENCE_ERROR_INVALID_FREE);
                raw_spin_unlock_irqrestore(&meta->lock, flags);
                return;
        }

        /* Detect racy use-after-free, or incorrect reallocation of this page by KFENCE. */
        kcsan_begin_scoped_access((void *)ALIGN_DOWN((unsigned long)addr, PAGE_SIZE), PAGE_SIZE,
                                  KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT,
                                  &assert_page_exclusive);

        if (CONFIG_KFENCE_STRESS_TEST_FAULTS)
                kfence_unprotect((unsigned long)addr); /* To check canary bytes. */

        /* Restore page protection if there was an OOB access. */
        if (meta->unprotected_page) {
                memzero_explicit((void *)ALIGN_DOWN(meta->unprotected_page, PAGE_SIZE), PAGE_SIZE);
                kfence_protect(meta->unprotected_page);
                meta->unprotected_page = 0;
        }

        /* Mark the object as freed. */
        // mark the object as freed; the data page itself is made inaccessible by kfence_protect() below, after which any access triggers a use-after-free report
        metadata_update_state(meta, KFENCE_OBJECT_FREED, NULL, 0);
        init = slab_want_init_on_free(meta->cache);
        raw_spin_unlock_irqrestore(&meta->lock, flags);

        alloc_covered_add(meta->alloc_stack_hash, -1);

        /* Check canary bytes for memory corruption. */
        // check whether the canary region of the data page was overwritten
        for_each_canary(meta, check_canary_byte);

        /*
         * Clear memory if init-on-free is set. While we protect the page, the
         * data is still there, and after a use-after-free is detected, we
         * unprotect the page, so the data is still accessible.
         */
        if (!zombie && unlikely(init))
                memzero_explicit(addr, meta->size);

        /* Protect to detect use-after-frees. */
        kfence_protect((unsigned long)addr);

        kcsan_end_scoped_access(&assert_page_exclusive);
        
        // unless this is a zombie object (see kfence_shutdown_cache()), put the freed object back on the freelist for reuse
        if (!zombie) {
                /* Add it to the tail of the freelist for reuse. */
                raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
                KFENCE_WARN_ON(!list_empty(&meta->list));
                list_add_tail(&meta->list, &kfence_freelist);
                raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);

                atomic_long_dec(&counters[KFENCE_COUNTER_ALLOCATED]);
                atomic_long_inc(&counters[KFENCE_COUNTER_FREES]);
        } else {
                /* See kfence_shutdown_cache(). */
                atomic_long_inc(&counters[KFENCE_COUNTER_ZOMBIES]);
        }
}

3.6 metadata

The metadata records the state of each object.

3.7 Core data structures

/* Alloc/free tracking information. */
// tracks information about an allocation or a free
struct kfence_track {
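        pid_t pid;    // ID of the process that allocated or freed the object
        int cpu;    // CPU on which the operation ran
        u64 ts_nsec;    // timestamp of the allocation or free
        int num_stack_entries;    // number of stack trace entries
        unsigned long stack_entries[KFENCE_STACK_DEPTH];    // stack trace entries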
};

/* KFENCE error types for report generation. */
// error type definitions
enum kfence_error_type {
        KFENCE_ERROR_OOB,                /* Detected a out-of-bounds access. */
        KFENCE_ERROR_UAF,                /* Detected a use-after-free access. */
        KFENCE_ERROR_CORRUPTION,        /* Detected a memory corruption on free. */
        KFENCE_ERROR_INVALID,                /* Invalid access of unknown type. */
        KFENCE_ERROR_INVALID_FREE,        /* Invalid free. */
};

/* KFENCE object states. */
// states of a metadata object
enum kfence_object_state {
        KFENCE_OBJECT_UNUSED,                /* Object is unused. */
        KFENCE_OBJECT_ALLOCATED,        /* Object is currently allocated. */
        KFENCE_OBJECT_FREED,                /* Object was allocated, and then freed. */
};

/* KFENCE metadata per guarded allocation. */
// records information about a data page
struct kfence_metadata {
        struct list_head list;              /* Freelist node; access under kfence_freelist_lock. */
        struct rcu_head rcu_head;        /* For delayed freeing. */

        /*
         * Lock protecting below data; to ensure consistency of the below data,
         * since the following may execute concurrently: __kfence_alloc(),
         * __kfence_free(), kfence_handle_page_fault(). However, note that we
         * cannot grab the same metadata off the freelist twice, and multiple
         * __kfence_alloc() cannot run concurrently on the same metadata.
         */
        raw_spinlock_t lock;

        /* The current state of the object; see above. */
        enum kfence_object_state state;    // state of the object

        /*
         * Allocated object address; cannot be calculated from size, because of
         * alignment requirements.
         *
         * Invariant: ALIGN_DOWN(addr, PAGE_SIZE) is constant.
         */
        unsigned long addr;    // address of the object within its data page

        /*
         * The size of the original allocation.
         */
        size_t size;    // size of the original allocation

        /*
         * The kmem_cache cache of the last allocation; NULL if never allocated
         * or the cache has already been destroyed.
         */
        struct kmem_cache *cache;    // the kmem_cache this object was last allocated from

        /*
         * In case of an invalid access, the page that was unprotected; we
         * optimistically only store one address.
         */
        unsigned long unprotected_page;

        /* Allocation and free stack information. */
        struct kfence_track alloc_track;    // records the allocation (stack trace etc.)
        struct kfence_track free_track;    // records the free (stack trace etc.)
        /* For updating alloc_covered on frees. */
        u32 alloc_stack_hash;    // hash of the allocation stack trace; used on free to decrement the matching alloc_covered counter
#ifdef CONFIG_MEMCG
        struct obj_cgroup *objcg;
#endif
};
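
To make the role of the `state`, `addr` and `size` fields more concrete, here is a minimal user-space sketch of how a free path could use them to tell an invalid free from a memory corruption. It is only an illustration, not the kernel's code: every `model_*` / `MODEL_*` name is hypothetical, the canary is assumed to be a fixed byte, and the object is always placed at the end of the page, whereas the real kfence implementation differs in these details.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_CANARY    0xAAu   /* assumption: a fixed fill byte */

enum model_state  { MODEL_UNUSED, MODEL_ALLOCATED, MODEL_FREED };
enum model_result { MODEL_OK, MODEL_INVALID_FREE, MODEL_CORRUPTION };

/* Hypothetical, much simplified stand-in for struct kfence_metadata. */
struct model_meta {
        enum model_state state;
        size_t offset;                        /* object start within the page */
        size_t size;                          /* originally requested size */
        unsigned char page[MODEL_PAGE_SIZE];  /* stands in for the data page */
};

/* Fill the page with the canary and place the object at the end of the page. */
static void *model_alloc(struct model_meta *m, size_t size)
{
        assert(m != NULL);
        if (size == 0 || size > MODEL_PAGE_SIZE || m->state == MODEL_ALLOCATED)
                return NULL;
        memset(m->page, MODEL_CANARY, sizeof(m->page));
        m->offset = MODEL_PAGE_SIZE - size;
        m->size = size;
        m->state = MODEL_ALLOCATED;
        return m->page + m->offset;
}

/* Classify a free using only the metadata and the canary bytes. */
static enum model_result model_free(struct model_meta *m, void *ptr)
{
        size_t i;

        assert(m != NULL);
        /* Double free, or a pointer that is not the start of the object. */
        if (m->state != MODEL_ALLOCATED ||
            (unsigned char *)ptr != m->page + m->offset)
                return MODEL_INVALID_FREE;

        m->state = MODEL_FREED;

        /* Any modified byte outside [offset, offset + size) is a corruption. */
        for (i = 0; i < MODEL_PAGE_SIZE; i++) {
                if (i >= m->offset && i < m->offset + m->size)
                        continue;
                if (m->page[i] != MODEL_CANARY)
                        return MODEL_CORRUPTION;
        }
        return MODEL_OK;
}
```

In this model, freeing `p + 1` or freeing the same pointer twice yields `MODEL_INVALID_FREE`, and writing to `p[-1]` before the free yields `MODEL_CORRUPTION`. These correspond in spirit to `KFENCE_ERROR_INVALID_FREE` and `KFENCE_ERROR_CORRUPTION` above.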

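4. How to port kfence

kfence was introduced in Linux kernel 5.12, so using it on an older kernel requires backporting the feature. For example, Alibaba Cloud Linux 3 supports kfence on kernel version 5.10.134-16.

The port breaks down into three parts:

1) Port the framework code

This is the core kfence implementation. The main files are: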

include/linux/kfence.h

init/main.c

lib/Kconfig.debug

lib/Kconfig.kfence

mm/Makefile

mm/kfence/Makefile

mm/kfence/core.c

mm/kfence/kfence.h

mm/kfence/report.c

2) Port the ARM (arm64) platform code

This is the kfence hook code for the arm64 platform. The main files are:

arch/arm64/Kconfig

arch/arm64/include/asm/kfence.h

arch/arm64/mm/fault.c

arch/arm64/mm/mmu.c

3) Port the hook code in the slub module

This is the kfence hook code in the slub memory allocator. The main files are:

include/linux/slub_def.h

mm/kfence/core.c

mm/slub.c
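
After a port, one simple sanity check is to read the kfence stats node and confirm that the "total bugs" counter can be parsed and goes up when a test error is triggered. The helper below is a minimal sketch of that check. The function names are hypothetical, and it assumes the counter appears in the stats text as a line of the form `total bugs: N`, which should be verified against the output on the target kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the number after "total bugs:" in the stats text, or -1 if absent. */
static long parse_total_bugs(const char *stats_text)
{
        const char *key = "total bugs:";   /* assumed line prefix */
        const char *pos;
        long value;

        assert(stats_text != NULL);
        pos = strstr(stats_text, key);
        if (pos == NULL)
                return -1;
        if (sscanf(pos + strlen(key), "%ld", &value) != 1)
                return -1;
        return value;
}

/* Read the stats file at path and return its "total bugs" value, or -1. */
static long read_total_bugs(const char *path)
{
        char buf[1024];
        size_t n;
        FILE *f = fopen(path, "r");

        if (f == NULL)
                return -1;
        n = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
        buf[n] = '\0';
        return parse_total_bugs(buf);
}
```

Calling `read_total_bugs("/sys/kernel/debug/kfence/stats")` before and after a deliberately triggered error (with debugfs mounted and sufficient privileges) would show whether the ported hooks are actually reporting.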
