Lego Profile Points

The Lego profile points facility traces specific functions, or even a small piece of code. It is meant to help find performance bottlenecks and to cut down on repetitive instrumentation code.

Example

To trace the cost of a TLB shootdown:

DEFINE_PROFILE_POINT(flush_tlb_others)

void flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm,
                      unsigned long start, unsigned long end)
{       
        struct flush_tlb_info info;
        PROFILE_POINT_TIME(flush_tlb_others)

        if (end == 0)
                end = start + PAGE_SIZE;
        info.flush_mm = mm;
        info.flush_start = start;
        info.flush_end = end;

        profile_point_start(flush_tlb_others);
        smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
        profile_point_leave(flush_tlb_others);
}

Explanation: DEFINE_PROFILE_POINT() defines a structure that holds the profile point's name, the number of invocations, and the total execution time. PROFILE_POINT_TIME() defines a stack-local variable that holds the start time. profile_point_start() saves the current time in nanoseconds, while profile_point_leave() computes the execution time of this run and updates the global counters defined by DEFINE_PROFILE_POINT().
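
To make the explanation concrete, here is a minimal userspace sketch of how such macros could be built. It is not Lego's actual implementation: the struct layout, the pp_ name prefixes, now_ns(), and the busy_loop() example are assumptions made for illustration, and clock_gettime() stands in for whatever time source the kernel uses. The counters are updated without locking or atomics, so the sketch is only valid single-threaded.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Assumed layout; the real struct profile_point may differ. */
struct profile_point {
        const char *name;       /* profile point name */
        uint64_t nr;            /* number of invocations */
        uint64_t time_ns;       /* total execution time, in nanoseconds */
};

/* Hypothetical name mangling for the counter and the start-time variable. */
#define _PP_NAME(name)  pp_##name
#define _PP_TIME(name)  pp_time_##name

/* The trailing semicolons live in the macros, so call sites need none. */
#define DEFINE_PROFILE_POINT(name) \
        struct profile_point _PP_NAME(name) = { #name, 0, 0 };

#define PROFILE_POINT_TIME(name) \
        uint64_t _PP_TIME(name) = 0;

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Record the start time in the stack-local variable. */
#define profile_point_start(name) \
        do { _PP_TIME(name) = now_ns(); } while (0)

/* Add this run's elapsed time to the counters. */
#define profile_point_leave(name) \
        do { \
                _PP_NAME(name).nr++; \
                _PP_NAME(name).time_ns += now_ns() - _PP_TIME(name); \
        } while (0)

/* Example: profile only the loop, not the whole function. */
DEFINE_PROFILE_POINT(busy_loop)

unsigned long busy_loop(unsigned long n)
{
        volatile unsigned long sum = 0;
        unsigned long i;
        PROFILE_POINT_TIME(busy_loop)

        profile_point_start(busy_loop);
        for (i = 0; i < n; i++)
                sum += i;
        profile_point_leave(busy_loop);

        return sum;
}
```

After two calls to busy_loop(), pp_busy_loop.nr is 2 and pp_busy_loop.time_ns holds the summed time of both loops. A kernel version would need to make the counter updates safe across CPUs, which this sketch leaves out.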

System-wide profile points will be printed together if you invoke print_profile_points():

[ 1017.422911] Kernel Profile Points
[ 1017.426594]  status                  name             total                nr            avg.ns
[ 1017.436292] -------  --------------------  ----------------  ----------------  ----------------
[ 1017.445988]     off      flush_tlb_others       0.000153470                55              2791
[ 1017.455685]     off     pcache_cache_miss      16.147020152            274698             58781
[ 1017.465381] -------  --------------------  ----------------  ----------------  ----------------

Mechanism

Once again, the profile points are aggregated by the linker script. Each profile point is placed in a special section, .profile.point. The linker merges them into one output section, and the script exports that section's start and end addresses as __sprofilepoint and __eprofilepoint.

Part I. Annotate.

#define __profile_point         __section(.profile.point)

#define DEFINE_PROFILE_POINT(name)                                                      \
        struct profile_point _PP_NAME(name) __profile_point = {
        ...
        ...
        };

Part II. Linker script merge.

. = ALIGN(L1_CACHE_BYTES);
.profile.point : AT(ADDR(.profile.point) - LOAD_OFFSET) {
    __sprofilepoint = .;
    *(.profile.point)
    __eprofilepoint = .;
}

Part III. Walk through.

void print_profile_points(void)
{
        struct profile_point *pp;

        /* Walk every profile point the linker placed in .profile.point. */
        for (pp = __sprofilepoint; pp < __eprofilepoint; pp++) {
                print_profile_point(pp);
                ...
        }
}
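
The three parts above can be tried out in userspace. The sketch below assumes GCC or Clang targeting ELF with GNU ld or lld: when a section name is a valid C identifier, these linkers define __start_<section> and __stop_<section> automatically, so the section is named profile_point (no dots) and no custom linker script is needed. That naming, the struct layout, and count_profile_points() are assumptions for illustration. Lego itself uses .profile.point, with __sprofilepoint and __eprofilepoint exported by its linker script.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout; the real struct profile_point may differ. */
struct profile_point {
        const char *name;
        uint64_t nr;
        uint64_t time_ns;
};

/*
 * Part I: annotate. "used" tells the compiler to emit the object even
 * if nothing references it by name. The explicit "aligned" keeps the
 * compiler from raising the alignment of individual objects, which
 * could leave padding gaps between entries and break the pp++ walk.
 */
#define __profile_point \
        __attribute__((section("profile_point"), used, \
                       aligned(__alignof__(struct profile_point))))

#define DEFINE_PROFILE_POINT(name) \
        struct profile_point pp_##name __profile_point = { #name, 0, 0 };

/* Part II: section bounds, defined by the linker rather than a script. */
extern struct profile_point __start_profile_point[];
extern struct profile_point __stop_profile_point[];

DEFINE_PROFILE_POINT(flush_tlb_others)
DEFINE_PROFILE_POINT(pcache_cache_miss)

/* Part III: walk through. */
int count_profile_points(void)
{
        return (int)(__stop_profile_point - __start_profile_point);
}

void print_profile_points(void)
{
        struct profile_point *pp;

        for (pp = __start_profile_point; pp < __stop_profile_point; pp++)
                printf("%20s  total_ns=%" PRIu64 "  nr=%" PRIu64 "\n",
                       pp->name, pp->time_ns, pp->nr);
}
```

Adding another DEFINE_PROFILE_POINT() in any source file makes it show up in print_profile_points() with no registration call, which is the point of letting the linker do the aggregation.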

I really love the linker script. ;-)


Yizhou Shan
Created: April 06, 2018
Last Updated: April 06, 2018