Linux » JasonLe's TechBlog

Archive for the ‘Linux’ category

Physical Memory Management: An Analysis of the Buddy System Data Structures

March 4th, 2015

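The buddy system plays an important role in physical memory management.

Physical memory is divided into three zones: DMA, NORMAL, and HIGHMEM. Each zone has an instance of struct zone, and these instances are in turn managed by struct pglist_data. For details see http://www.lizhaozhong.info/archives/1184

page frame <==> struct page

struct zone is a very large structure. Broadly it falls into three parts, separated by markers such as ZONE_PADDING(_pad1_). Starting from the unsigned long zone_start_pfn field of struct zone we can locate a specific page. But that alone tells us nothing about how the free page frames are laid out, so it is of little help when allocating page frames.

The definition of struct zone: http://lxr.free-electrons.com/source/include/linux/mmzone.h#L327

To allocate contiguous page frames wherever possible and to avoid external fragmentation, the kernel uses the buddy system.
Every struct zone instance has a free_area array of size 11, that is, one list of free page frame blocks for each order from 0 to 10. For example, in the block list belonging to free_area[3], every node represents 8 contiguous page frames (2 to the power of 3).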
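The relationship between an order and its block size can be written down directly. The following is a minimal user-space sketch, not kernel code: order_pages() and buddy_pfn() are hypothetical helper names, and the XOR-based buddy lookup is the usual buddy-allocator technique rather than something taken from the listing in this post. It assumes that the start frame number of a block is aligned to the block size.

```c
#include <assert.h>

#define TOY_MAX_ORDER 11   /* mirrors MAX_ORDER: orders 0..10 */

/* Number of contiguous page frames in one block of the given order. */
unsigned long order_pages(unsigned int order)
{
    assert(order < TOY_MAX_ORDER);
    return 1UL << order;
}

/*
 * Start frame number of the "buddy" of the block that starts at pfn.
 * Assumption: pfn is aligned to the block size, as in a classic
 * buddy allocator, so the two buddies differ in exactly one bit.
 */
unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    assert(pfn % order_pages(order) == 0);
    return pfn ^ order_pages(order);
}
```

With order 3, the blocks starting at frames 0 and 8 are buddies; if both are free they can be merged into one order-4 block of 16 frames.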

#ifndef CONFIG_FORCE_MAX_ZONEORDER
#define MAX_ORDER 11
#else
#define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
#endif
struct zone {
        ........
        ZONE_PADDING(_pad1_)

        /* Write-intensive fields used from the page allocator */
        spinlock_t              lock;

        /* free areas of different sizes */
        struct free_area        free_area[MAX_ORDER];

        /* zone flags, see below */
        unsigned long           flags;

        ZONE_PADDING(_pad2_)
        ......
};

struct free_area is defined as follows: http://lxr.free-electrons.com/source/include/linux/mmzone.h#L92

struct free_area {
        struct list_head        free_list[MIGRATE_TYPES];
        unsigned long           nr_free;
};

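The declaration shows that free_list is an array of lists. nr_free is the number of free blocks currently held in this free_area. For example, if nr_free in free_area[3] is 5, there are 5 blocks of 8 page frames each, or 40 page frames in total.

In short, the buddy system structure is this: each zone holds a free_area array indexed by order, and each entry holds lists of free blocks of that size.

In /proc we can see the number of free blocks at each order. (Note: this machine is an AMD64 system. On AMD64 there is no ZONE_HIGHMEM; ZONE_DMA covers the first 16 MiB and ZONE_DMA32 covers 0-4 GiB. On 32-bit machines ZONE_DMA32 is empty.)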

[root@localhost cgroup_unified]# cat /proc/buddyinfo
Node 0, zone      DMA      8      7      6      5      4      2      1      2      2      3      0
Node 0, zone    DMA32   1059   1278   9282   1868   1611    260     52     20      7      2      1
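Each column of /proc/buddyinfo is the nr_free value of one order, so the number of free page frames in a zone is the sum of nr_free << order over all orders. The following is a minimal sketch of that arithmetic; parse_buddyinfo_line() and buddy_total_pages() are hypothetical helper names, and the parser assumes the line layout shown above ("Node N, zone NAME" followed by the counts).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the per-order counts from one /proc/buddyinfo line.
 * Returns the number of counts stored (at most max), or -1 if the
 * line has no "zone" field.
 */
int parse_buddyinfo_line(const char *line, unsigned long counts[], int max)
{
    const char *p;
    int n = 0;

    assert(line != NULL && counts != NULL);
    p = strstr(line, "zone");
    if (!p)
        return -1;
    p += 4;                                    /* skip the word "zone" */
    while (*p && isspace((unsigned char)*p))   /* blanks before the name */
        p++;
    while (*p && !isspace((unsigned char)*p))  /* the name, e.g. DMA32 */
        p++;

    while (n < max) {
        char *end;
        unsigned long v = strtoul(p, &end, 10);

        if (end == p)                          /* no more numbers */
            break;
        counts[n++] = v;
        p = end;
    }
    return n;
}

/* Total free page frames: a block of order k holds 2^k frames. */
unsigned long buddy_total_pages(const unsigned long counts[], int n)
{
    unsigned long total = 0;

    assert(counts != NULL);
    for (int order = 0; order < n; order++)
        total += counts[order] << order;
    return total;
}
```

For the DMA row above this gives 2582 free page frames, and the earlier free_area[3] example (5 blocks of order 3) gives 40.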

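In struct list_head free_list[MIGRATE_TYPES], the free list of each order is further indexed by migrate type. In this way the free pages of each order are subdivided more finely; the types include unmovable, movable, reserve, and so on.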

#define MIGRATE_UNMOVABLE     0
#define MIGRATE_RECLAIMABLE   1
#define MIGRATE_MOVABLE       2
#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
#define MIGRATE_RESERVE       3
#define MIGRATE_ISOLATE       4 /* can't allocate from here */
#define MIGRATE_TYPES         5

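The goal is to satisfy requests for physically contiguous page frames as often as possible. If a MIGRATE_UNMOVABLE page frame is allocated while the page frames on both sides of it are movable, it blocks the allocation of a large contiguous range there and creates external fragmentation.
With migrate types, the immovability of an unmovable page affects only its own category, and an unmovable page no longer ends up with movable pages on both sides. This is the purpose for which migrate types were introduced.

Migrate types restrict where pages are allocated and avoid fragmentation that way, instead of relying solely on merging when pages are freed [1].

This policy is also visible in /proc: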

[root@localhost cgroup_unified]# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      4      4      1      1      1      1      1      0      0      0
Node    0, zone      DMA, type  Reclaimable      5      1      1      0      0      0      0      1      1      1      0
Node    0, zone      DMA, type      Movable      2      0      2      4      3      1      0      0      1      1      0
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      1      0      8      1     17     15      5      0      0      0      0
Node    0, zone    DMA32, type  Reclaimable   1041    162      1      0      1      1      1      1      1      0      0
Node    0, zone    DMA32, type      Movable    698     78      7   1221   1194    232     45     20      6      0      0
Node    0, zone    DMA32, type      Reserve      0      0      0      0      0      0      0      0      0      1      1
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
Node 0, zone      DMA            1            2            4            1            0
Node 0, zone    DMA32           89           93         1216            2            0

References:

[1] http://blog.csdn.net/dog250/article/details/6108028

Posted in Kernel Analysis, Linux, Memory Management

Tags: Management Memory

Introduction to cgroup (2)

March 2nd, 2015

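In my first article on cgroup I used cgroup to limit and isolate resources: http://www.lizhaozhong.info/archives/1211

Hierarchy-based cgroup has one drawback, however: it is inflexible. The depth of the tree is potentially unbounded, which makes management very tedious in practice.

For this reason, kernel 3.16 officially added the unified hierarchy feature. The feature is still under development, so it has to be enabled explicitly:
mount -t cgroup -o __DEVEL__sane_behavior cgroup $MOUNT_POINT

As the name __DEVEL__sane_behavior suggests, the feature is still being developed.

In the earlier cgroup design, a hierarchy can have a single subsystem attached, or all 12 subsystems at once.

For example, suppose hierarchy A has cpuset attached and hierarchy B has memory attached. A task that needs both subsystems then has to live in two hierarchies that are orthogonal to each other, which is very inconvenient.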

hierarchy may be collapsed from leaf towards root when viewed from specific
controllers.  For example, a given configuration might not care about
how memory is distributed beyond a certain level while still wanting
to control how CPU cycles are distributed.

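With __DEVEL__sane_behavior enabled, cgroup.controllers lists the available subsystems. In the unified hierarchy all subsystems are attached to the root; only leaf nodes may contain tasks, while non-leaf nodes only perform resource control.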

# mount -t cgroup -o __DEVEL__sane_behavior cgroup /sys/fs/cgroup
# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu cpuacct memory devices freezer net_cls blkio perf_event net_prio hugetlb

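Now we create parent and child under the root cgroup. The root's cgroup.subtree_control determines the contents of parent's cgroup.controllers.

The same holds at every level: a cgroup's cgroup.subtree_control determines the cgroup.controllers of its children. In other words, enabling a subsystem is not transitive.

In the example below, the root's cgroup.subtree_control enables the memory and cpu subsystems, so parent can control memory and cpu. child, for which no subsystem has been enabled, does not control memory or cpu.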

# mkdir /sys/fs/cgroup/parent
# mkdir /sys/fs/cgroup/parent/child
# echo "+memory +cpu" > /sys/fs/cgroup/cgroup.subtree_control
# cat /sys/fs/cgroup/parent/cgroup.controllers
cpu memory
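The propagation rule can be captured in a few lines. The following is a toy model under stated assumptions, not the kernel's implementation: struct toy_cgroup, toy_controllers(), and toy_enable() are hypothetical names, and the model covers only the rule that a child's cgroup.controllers equals its parent's cgroup.subtree_control.

```c
#include <assert.h>
#include <stddef.h>

enum {
    CTRL_CPU    = 1u << 0,
    CTRL_MEMORY = 1u << 1,
    CTRL_BLKIO  = 1u << 2
};

struct toy_cgroup {
    const struct toy_cgroup *parent;  /* NULL for the root */
    unsigned int root_controllers;    /* used only by the root */
    unsigned int subtree_control;     /* controllers enabled for children */
};

/* What "cat cgroup.controllers" would show in this model. */
unsigned int toy_controllers(const struct toy_cgroup *cg)
{
    assert(cg != NULL);
    if (cg->parent == NULL)
        return cg->root_controllers;
    return cg->parent->subtree_control;
}

/*
 * Model of: echo "+x +y" > cgroup.subtree_control
 * Only controllers available in cg itself can be passed down.
 * Returns 0 on success, -1 if a requested controller is unavailable.
 */
int toy_enable(struct toy_cgroup *cg, unsigned int mask)
{
    if ((mask & toy_controllers(cg)) != mask)
        return -1;
    cg->subtree_control |= mask;
    return 0;
}
```

In this model, enabling memory and cpu at the root makes them appear in parent, but child sees nothing until parent enables them in its own subtree_control.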

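An example:

A(b,m) - B(b,m) - C (b)
              \ - D (b) - E

Here b stands for blkio and m for memory, and A is the root. In this structure A, C, and D all contain processes. C, for example, is limited for blkio but not for memory, where it shares B's limits. E is a special case: since no subsystem has been enabled for it, its blkio is controlled by D and its memory by B. The commands for setting this up are the ones shown above for parent and child.

cgroup.subtree_control can be used to change the controller settings of a cgroup only while that cgroup itself contains no processes.

Intermediate levels must have the subsystem enabled. If we try to make E limited by blkio, the system rejects the operation.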

Unified hierarchy implements an interface file “cgroup.populated” which can be used to monitor whether the cgroup’s subhierarchy has tasks in it or not. Its value is 0 if there is no task in the cgroup and its descendants; otherwise, 1. poll and [id]notify events are triggered when the value changes.

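The other changes in the unified hierarchy are described clearly in the documentation, so I will not repeat them here.

They include the removal of tasks and cgroup.clone_children, with cgroup.procs taking over the role of tasks. Once the design of this hierarchy is settled, the old cgroup mechanism will be replaced by the unified hierarchy.

References: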

http://lwn.net/Articles/601840/

http://events.linuxfoundation.org/sites/events/files/slides/2014-KLF.pdf

https://www.kernel.org/doc/Documentation/cgroups/unified-hierarchy.txt

http://d.hatena.ne.jp/defiant/mobile?date=20140826

Posted in Linux, Linux Containers

Tags: Linux Containers

The CFS Scheduling Algorithm

February 2nd, 2015

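As mentioned before, CFS is one of the scheduling policies in the kernel. The core idea of the algorithm is that all tasks should receive a fair share of the processor. To achieve this, CFS uses vruntime to decide which process deserves to run next.

The previous post gave a first description of how CFS is implemented, but did not explain how vruntime is computed.

Previous post: http://www.lizhaozhong.info/archives/1206

vruntime is a virtual quantity maintained by CFS. It plays down the direct role of priority in scheduling; instead, the struct sched_entity instances are organized into a red-black tree keyed by vruntime.

By the ordering property of the red-black tree, smaller values are on the left and larger values on the right. While a process runs, every timer interrupt calls the policy's task_tick() method, which updates vruntime for CFS to use when scheduling.

The leftmost node of the red-black tree has the smallest vruntime, and the run queue tracks it in min_vruntime, which must increase monotonically. update_curr() makes two calls at this point: schedstat_set(curr->statistics.exec_max, max(delta_exec, curr->statistics.exec_max)); records the longest single run as a statistic, and update_min_vruntime(cfs_rq); advances min_vruntime.

As a result, min_vruntime is updated only once the leftmost node has moved past it.

There is one special case: while a process sleeps, its vruntime stays unchanged while min_vruntime keeps growing, so the process ends up further to the left in the tree.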
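The monotonic behaviour of min_vruntime can be sketched as follows. This is a simplified user-space model loosely based on update_min_vruntime(); the function names are hypothetical, and the real function reads its inputs from the cfs_rq instead of taking them as arguments. The signed comparison on the difference is an assumption made so that the comparison still works if the 64-bit counter wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "later of the two" for 64-bit virtual runtimes. */
static uint64_t later_vruntime(uint64_t a, uint64_t b)
{
    return (int64_t)(b - a) > 0 ? b : a;
}

/* Wrap-safe "earlier of the two". */
static uint64_t earlier_vruntime(uint64_t a, uint64_t b)
{
    return (int64_t)(b - a) < 0 ? b : a;
}

/*
 * New min_vruntime given the running entity and the leftmost node
 * of the tree. The value can only stay the same or move forward.
 */
uint64_t toy_update_min_vruntime(uint64_t min_vruntime,
                                 uint64_t curr_vruntime,
                                 uint64_t leftmost_vruntime)
{
    uint64_t candidate = earlier_vruntime(curr_vruntime, leftmost_vruntime);
    uint64_t updated = later_vruntime(min_vruntime, candidate);

    assert((int64_t)(updated - min_vruntime) >= 0);  /* never backwards */
    return updated;
}
```

A task that slept with vruntime 90 while min_vruntime advanced to 120 does not pull min_vruntime back; it simply sorts to the left of the other entities.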

The call path is:

void scheduler_tick(void)
{... curr->sched_class->task_tick(rq, curr, 0); ...}
-> Through this function pointer the policy-specific function is called. In CFS it is task_tick_fair(), which calls entity_tick() to update
the vruntime in the sched_entity of the task currently running on the cfs_rq that the scheduling entity belongs to.
->
static void
entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
{
        /*
         * Update run-time statistics of the 'current'.
         */
        update_curr(cfs_rq);

        ......
}
-> static void update_curr(struct cfs_rq *cfs_rq)

update_curr(), called from entity_tick(), is the function in CFS that actually updates vruntime:

static void update_curr(struct cfs_rq *cfs_rq)
{
        ....
        u64 now = rq_clock_task(rq_of(cfs_rq));
        ....
        delta_exec = now - curr->exec_start;
        if (unlikely((s64)delta_exec <= 0))
                return;

        curr->exec_start = now;

        schedstat_set(curr->statistics.exec_max,
                      max(delta_exec, curr->statistics.exec_max));

        curr->sum_exec_runtime += delta_exec;
        schedstat_add(cfs_rq, exec_clock, delta_exec);

        curr->vruntime += calc_delta_fair(delta_exec, curr);
        update_min_vruntime(cfs_rq);

        if (entity_is_task(curr)) {
                struct task_struct *curtask = task_of(curr);

                trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
                cpuacct_charge(curtask, delta_exec);
                account_group_exec_runtime(curtask, delta_exec);
        }

        account_cfs_rq_runtime(cfs_rq, delta_exec);
}

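The function first reads the current time of the rq and computes delta_exec, the real time the current process has just been running. It then sets exec_start to now so that the next invocation starts from there.

delta_exec is added to sum_exec_runtime. For vruntime, it first has to be processed by calc_delta_fair(delta_exec, curr).

The table below shows that a nice value of 0 corresponds to a weight of 1024. Note also that the nice range [-20, +19] maps to priorities 100-139 in the system as a whole, while 0-99 are reserved for real-time processes. Each increase of the nice value by one reduces the CPU time a process receives by about 10%, and each decrease adds about 10%. The higher the nice value, the smaller the weight.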

static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

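calc_delta_fair() compares the weight of the entity with the weight for nice 0 (NICE_0_LOAD). If they are equal, delta is returned unchanged; otherwise delta has to be scaled by the weight.
struct sched_entity *se holds the weight of the current process, which is one of the numbers in the array above.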

static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
{
        if (unlikely(se->load.weight != NICE_0_LOAD))
                delta = __calc_delta(delta, NICE_0_LOAD, &se->load);

        return delta;
}

If the nice value of the current process is not 0, the following function is called:

static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)
{
        u64 fact = scale_load_down(weight);
        int shift = WMULT_SHIFT;

        __update_inv_weight(lw);

        if (unlikely(fact >> 32)) {
                while (fact >> 32) {
                        fact >>= 1;
                        shift--;
                }
        }

        /* hint to use a 32x32->64 mul */
        fact = (u64)(u32)fact * lw->inv_weight;

        while (fact >> 32) {
                fact >>= 1;
                shift--;
        }

        return mul_u64_u32_shr(delta_exec, fact, shift);
}

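This function is somewhat involved. My current understanding is that the weighting formula is delta_exec = delta_exec * (weight / lw.weight).

We can plot the weighted vruntime against the real delta_exec for different nice values. Comparing with the array above, the higher the nice value, the smaller the weight. What matters here is the value of 1024/lw.weight: the smaller the weight, the larger the quotient, and the faster vruntime grows. In CFS, the smaller the vruntime, the sooner the process is scheduled.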
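The effect of the weight on vruntime is easy to check numerically. The sketch below uses plain integer division, not the kernel's fixed-point path, so results may differ from the kernel's by a small rounding error; toy_weight_for_nice() and toy_calc_delta_fair() are hypothetical names, and the table is copied from the listing above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NICE_0_LOAD 1024u

/* Copy of prio_to_weight[] from the listing above (nice -20 .. +19). */
static const uint32_t toy_prio_to_weight[40] = {
    88761, 71755, 56483, 46273, 36291,
    29154, 23254, 18705, 14949, 11916,
     9548,  7620,  6100,  4904,  3906,
     3121,  2501,  1991,  1586,  1277,
     1024,   820,   655,   526,   423,
      335,   272,   215,   172,   137,
      110,    87,    70,    56,    45,
       36,    29,    23,    18,    15,
};

uint32_t toy_weight_for_nice(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return toy_prio_to_weight[nice + 20];
}

/* vruntime increment for delta_exec_ns of real run time. */
uint64_t toy_calc_delta_fair(uint64_t delta_exec_ns, int nice)
{
    uint32_t weight = toy_weight_for_nice(nice);

    if (weight == TOY_NICE_0_LOAD)
        return delta_exec_ns;          /* nice 0: no scaling needed */
    return delta_exec_ns * TOY_NICE_0_LOAD / weight;
}
```

For 1 ms (1000000 ns) of real run time, a nice 0 task is charged 1000000 ns of vruntime, a nice 5 task 3056716 ns, and a nice -5 task only 328099 ns.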

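mul_u64_u32_shr(a, mul, shift) multiplies a 64-bit value by a 32-bit value and shifts the product right by shift bits, that is, it computes (a * mul) >> shift.
The comment above __calc_delta() explains the formula: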

/*
 * delta_exec * weight / lw.weight
 *   OR
 * (delta_exec * (weight * lw->inv_weight)) >> WMULT_SHIFT
 *
 * Either weight := NICE_0_LOAD and lw \e prio_to_wmult[], in which case
 * we're guaranteed shift stays positive because inv_weight is guaranteed to
 * fit 32 bits, and NICE_0_LOAD gives another 10 bits; therefore shift >= 22.
 *
 * Or, weight =< lw.weight (because lw.weight is the runqueue weight), thus
 * weight/lw.weight <= 1, and therefore our shift will also be positive.
 */
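The second form in the comment replaces the division by a multiplication with a precomputed inverse. The following is a user-space sketch of that idea, assuming the inverse is computed as 0xffffffff / lw.weight and that the shift never drops below 0; all toy_ names are hypothetical, and the code follows the structure of __calc_delta() shown above without being a copy of the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WMULT_CONST 0xffffffffu
#define TOY_WMULT_SHIFT 32

/*
 * (a * mul) >> shift for shift <= 32, computed from 32-bit halves so
 * the intermediate product does not need more than 64 bits per step.
 * The final result must itself fit in 64 bits.
 */
static uint64_t toy_mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
    uint32_t lo = (uint32_t)a;
    uint32_t hi = (uint32_t)(a >> 32);
    uint64_t ret;

    assert(shift <= 32);
    ret = ((uint64_t)lo * mul) >> shift;
    if (hi)
        ret += ((uint64_t)hi * mul) << (32 - shift);
    return ret;
}

/* Approximates delta_exec * weight / lw_weight without dividing by lw_weight
 * at scheduling time (the inverse would normally be cached). */
uint64_t toy_calc_delta(uint64_t delta_exec, uint32_t weight, uint32_t lw_weight)
{
    uint64_t fact = weight;
    int shift = TOY_WMULT_SHIFT;
    uint32_t inv_weight;

    assert(lw_weight != 0);
    inv_weight = TOY_WMULT_CONST / lw_weight;  /* roughly 2^32 / lw_weight */

    fact *= inv_weight;                        /* weight * 2^32 / lw_weight */
    while (fact >> 32) {                       /* keep the factor in 32 bits */
        fact >>= 1;
        shift--;                               /* and shift less at the end */
    }
    return toy_mul_u64_u32_shr(delta_exec, (uint32_t)fact, (unsigned int)shift);
}
```

With weight 1024 and lw_weight 335 (nice 5), 1000000 ns of real time comes out very close to the 3056716 ns that exact division gives; the small difference is the rounding error of the fixed-point inverse.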

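CFS summary:

1) It no longer distinguishes between process types and does not judge priority by the nice value directly; instead it uses vruntime to measure how much a process deserves the CPU.

2) I/O-bound processes still receive a fair time slice even though they spend much of their time sleeping.

3) Processes with higher priority receive more CPU time.

References: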
http://lxr.free-electrons.com/source/kernel/sched/fair.c#L214
http://lxr.free-electrons.com/source/kernel/sched/sched.h#L1046

Posted in Kernel Analysis, Linux

Tags: Algorithm CFS schedual

An Analysis of __schedule()

January 22nd, 2015

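Main implementation: http://lxr.free-electrons.com/source/kernel/sched/core.c#L2765

Because there are many scheduling policies, the kernel separates mechanism from policy: task switching is implemented in __schedule(), while the concrete policy is applied in pick_next_task().

The kernel schedules processes in two ways. The first is the periodic scheduler, which runs at a fixed frequency. The second is the main scheduler, which is called directly when a process goes to sleep or gives up the CPU voluntarily for some other reason.

The main scheduler is __schedule(), and the periodic scheduler is void scheduler_tick(void). The latter is driven by the timer and is also responsible for balancing the run queues so that every CPU has tasks to run. http://lxr.free-electrons.com/source/kernel/sched/core.c#L2524

__schedule() is the core scheduling function. Its main job is to select a process from the run queue (rq). Besides switching context, it calls pick_next_task() to choose the next process; which scheduling policy is used is recorded in struct sched_class.

The scheduling algorithm the kernel currently uses for normal tasks is CFS. Let us first look at pick_next_task().
The concrete policy is defined in fair_sched_class. The kernel uses a C struct of function pointers to emulate a C++ class, and fair.c defines the many functions those pointers refer to; they act as hook functions. I will not go into the CFS policy here and will analyze the CFS algorithm in a separate post.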

static inline struct task_struct *
pick_next_task(struct rq *rq, struct task_struct *prev)
{
        const struct sched_class *class = &fair_sched_class;
        struct task_struct *p;

        /*
         * Optimization: we know that if all tasks are in
         * the fair class we can call that function directly:
         */
        if (likely(prev->sched_class == class &&
                   rq->nr_running == rq->cfs.h_nr_running)) {
                p = fair_sched_class.pick_next_task(rq, prev);
                if (unlikely(p == RETRY_TASK))
                        goto again;

                /* assumes fair_sched_class->next == idle_sched_class */
                if (unlikely(!p))
                        p = idle_sched_class.pick_next_task(rq, prev);

                return p;
        }

again:
        for_each_class(class) {
                p = class->pick_next_task(rq, prev);
                if (p) {
                        if (unlikely(p == RETRY_TASK))
                                goto again;
                        return p;
                }
        }

        BUG(); /* the idle class will always have a runnable task */
}
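The "struct of function pointers" pattern and the class walk can be reduced to a small user-space model. This is only a sketch under stated assumptions: the toy_ types and the three classes (rt, fair, idle) are hypothetical stand-ins, and the fast path and RETRY_TASK handling of the real function are left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task { const char *name; };

struct toy_rq {
    struct toy_task *rt_task;    /* NULL if no real-time task is runnable */
    struct toy_task *fair_task;  /* NULL if no fair task is runnable */
    struct toy_task idle;        /* the idle task always exists */
};

/* One scheduling class: a link to the next class plus its hook. */
struct toy_sched_class {
    const struct toy_sched_class *next;
    struct toy_task *(*pick_next_task)(struct toy_rq *rq);
};

static struct toy_task *pick_rt(struct toy_rq *rq)   { return rq->rt_task; }
static struct toy_task *pick_fair(struct toy_rq *rq) { return rq->fair_task; }
static struct toy_task *pick_idle(struct toy_rq *rq) { return &rq->idle; }

/* Classes chained from highest to lowest priority. */
static const struct toy_sched_class toy_idle_class = { NULL, pick_idle };
static const struct toy_sched_class toy_fair_class = { &toy_idle_class, pick_fair };
static const struct toy_sched_class toy_rt_class   = { &toy_fair_class, pick_rt };

/* Ask each class in turn; the first one with a runnable task wins. */
struct toy_task *toy_pick_next_task(struct toy_rq *rq)
{
    const struct toy_sched_class *class;

    assert(rq != NULL);
    for (class = &toy_rt_class; class; class = class->next) {
        struct toy_task *p = class->pick_next_task(rq);

        if (p)
            return p;
    }
    assert(!"the idle class always has a runnable task");
    return NULL;
}
```

The mechanism (the loop) knows nothing about how any class chooses its task; adding a policy means adding another struct with its own hook.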

const struct sched_class fair_sched_class (kernel/sched/fair.c)

In the CFS definition, two entries are of particular interest:

7944 #ifdef CONFIG_SMP
7945 .select_task_rq = select_task_rq_fair,
7946 .migrate_task_rq = migrate_task_rq_fair,

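With multiple CPUs, processes inevitably run in parallel. The hook on line 7945, select_task_rq_fair, chooses the run queue (and therefore the CPU) on which a given task should run; the hook on line 7946, migrate_task_rq_fair, handles the migration of a task between run queues. Every CPU has its own rq. Despite the name it is not necessarily a queue; for CFS it is a red-black tree. The placement decision is made in

select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)

which is where the completely fair scheduler decides on which CPU a task will run.

__schedule() is the main scheduler. Let us now analyze its implementation.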

2765 static void __sched __schedule(void)
2766 {
2767         struct task_struct *prev, *next;
2768         unsigned long *switch_count;
2769         struct rq *rq;
2770         int cpu;
2771 
2772 need_resched:
2773         preempt_disable();
2774         cpu = smp_processor_id();
2775         rq = cpu_rq(cpu);
2776         rcu_note_context_switch(cpu);
2777         prev = rq->curr;
2778 
2779         schedule_debug(prev);
2780 
2781         if (sched_feat(HRTICK))
2782                 hrtick_clear(rq);
2783 
2784         /*
2785          * Make sure that signal_pending_state()->signal_pending() below
2786          * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
2787          * done by the caller to avoid the race with signal_wake_up().
2788          */
2789         smp_mb__before_spinlock();
2790         raw_spin_lock_irq(&rq->lock);
2791 
2792         switch_count = &prev->nivcsw;
2793         if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
2794                 if (unlikely(signal_pending_state(prev->state, prev))) {
2795                         prev->state = TASK_RUNNING;
2796                 } else {
2797                         deactivate_task(rq, prev, DEQUEUE_SLEEP);
2798                         prev->on_rq = 0;
2799 
2800                         /*
2801                          * If a worker went to sleep, notify and ask workqueue
2802                          * whether it wants to wake up a task to maintain
2803                          * concurrency.
2804                          */
2805                         if (prev->flags & PF_WQ_WORKER) {
2806                                 struct task_struct *to_wakeup;
2807 
2808                                 to_wakeup = wq_worker_sleeping(prev, cpu);
2809                                 if (to_wakeup)
2810                                         try_to_wake_up_local(to_wakeup);
2811                         }
2812                 }
2813                 switch_count = &prev->nvcsw;
2814         }
2815 
2816         if (task_on_rq_queued(prev) || rq->skip_clock_update < 0)
2817                 update_rq_clock(rq);
2818 
2819         next = pick_next_task(rq, prev);
2820         clear_tsk_need_resched(prev);
2821         clear_preempt_need_resched();
2822         rq->skip_clock_update = 0;
2823 
2824         if (likely(prev != next)) {
2825                 rq->nr_switches++;
2826                 rq->curr = next;
2827                 ++*switch_count;
2828 
2829                 context_switch(rq, prev, next); /* unlocks the rq */
2830                 /*
2831                  * The context switch have flipped the stack from under us
2832                  * and restored the local variables which were saved when
2833                  * this task called schedule() in the past. prev == current
2834                  * is still correct, but it can be moved to another cpu/rq.
2835                  */
2836                 cpu = smp_processor_id();
2837                 rq = cpu_rq(cpu);
2838         } else
2839                 raw_spin_unlock_irq(&rq->lock);
2840 
2841         post_schedule(rq);
2842 
2843         sched_preempt_enable_no_resched();
2844         if (need_resched())
2845                 goto need_resched;
2846 }

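Line 2773 disables preemption. Lines 2774-2777 get the id of the current CPU and that CPU's rq, note the context switch for RCU, and fetch the task currently running on the rq, storing it in prev.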

#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2

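TASK_RUNNING has the value 0. At line 2793 this means that if the current process is still runnable, it is left on the run queue and only the clock of the rq is updated.
Conversely, if the task occupying the CPU is in the TASK_INTERRUPTIBLE state but has received a signal that wakes it, its state is set back to TASK_RUNNING and it waits to be scheduled again. Otherwise deactivate_task() removes the current process prev from the run queue.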
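The three outcomes of the check at line 2793 can be summarized as a small decision function. This is a simplified model with hypothetical names: the signal_pending argument stands for the result of signal_pending_state(), and the workqueue handling is omitted.

```c
#include <assert.h>

enum toy_prev_action {
    TOY_KEEP_ON_RQ,    /* prev is still runnable, nothing to do */
    TOY_MARK_RUNNING,  /* a signal arrived: make prev runnable again */
    TOY_DEACTIVATE     /* prev really sleeps: remove it from the rq */
};

/*
 * state:          prev->state (0 means TASK_RUNNING)
 * preempt_active: non-zero if prev is being preempted
 * signal_pending: non-zero if a pending signal should wake prev
 */
enum toy_prev_action toy_prev_action(long state, int preempt_active,
                                     int signal_pending)
{
    assert(state >= 0);
    if (state && !preempt_active) {
        if (signal_pending)
            return TOY_MARK_RUNNING;
        return TOY_DEACTIVATE;
    }
    return TOY_KEEP_ON_RQ;
}
```

Note that a task with a non-zero state that is being preempted stays on the run queue, so that it can finish going to sleep when it runs again.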

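Line 2819 then calls pick_next_task() to obtain the next process from the current rq, after which the need-resched flags of prev are cleared.
Lines 2824-2839 check whether the selected process differs from the previous one; if they are the same, no context switch is needed.

Once scheduling is complete, preemption is enabled again and the system can preempt.

References: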
http://www.makelinux.net/books/lkd2/ch09lev1sec9

Posted in Kernel Analysis, Linux, Process Management

Tags: Process schedual

Using Coccinelle

January 20th, 2015

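Coccinelle is a program matching and transformation engine. It provides the language SmPL (Semantic Patch Language) for specifying the desired matches and transformations in C code. Coccinelle was originally created to help Linux evolve by supporting changes to library APIs, such as renaming a function, adding a function argument whose value depends on the context, or reorganizing a data structure. Beyond that, Coccinelle is also used to find and fix bugs in systems code.

Project page: https://github.com/coccinelle/coccinelle

I will not go into installation in detail. Note that the Python devel package has to be installed, otherwise the program will not run.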

$ git clone https://github.com/coccinelle/coccinelle
$ cd coccinelle
$ git tag
$ git checkout -b build coccinelle-1.0.0-rc21
$ apt-get install python2.6-dev libpycaml-ocaml-dev libmenhir-ocaml-dev menhir ocaml-native-compilers \
ocamlduce camlp4-extra ocaml-findlib pkg-config texlive-fonts-extra
$ ./configure --with-python --with-menhir
$ make all
$ apt-get remove coccinelle    # remove the packaged version to prevent conflicts
$ make install

Once it is installed, we can write a script:

@search@
identifier fn,call;
statement s1,s2;
expression E1,E2;
int fd;
position p;
constant C;
@@

<+...
* fd=open@p(...);
//  ...when != fn(<+...fd...+>);
  ...when !=fd=C
* if (fd<0||...){...}
...+>   

@script:python@
p << search.p;
@@

print "%s equal expression" % (p[0].line)

We can then run the script to find matches in the code quickly.

$spatch -sp_file demos/simple.cocci demos/simple.c -o /tmp/new_simple.c
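For orientation, the rule above looks for an assignment fd = open(...) that is followed, without fd being set to a constant in between, by an error check of the form if (fd < 0 || ...). The C function below is a hypothetical input written to have that shape; it is not the contents of demos/simple.c, and whether spatch reports it depends on the Coccinelle version and the isomorphisms in use.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if path can be opened read-only, -1 otherwise. */
int toy_can_open(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);  /* the assignment the rule tags with @p */
    if (fd < 0) {               /* the error check the rule looks for */
        return -1;
    }
    close(fd);
    return 0;
}
```

The fd is assigned in a separate statement on purpose, since the pattern is written as an assignment expression and not as a declaration with an initializer.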

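The project's current weakness is that its documentation is incomplete; I look forward to its further development. What makes the tool attractive is that it can intelligently match equivalent forms such as i++ <=> i=i+1.

For more of these, see /usr/local/share/coccinelle/standard.iso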

Posted in C/C++, Linux, C Programming on Linux

Tags: Code Miscellany
