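Red Hat Kernel Testing Is Hiring (Season 2)

The kernel testing team at Red Hat Beijing is hiring, with 3 internship positions and 5 full-time positions open. Applications are welcome, and students graduating next year are welcome to apply for the internships.

All positions are located in Zhongguancun, Beijing. If you're interested, send your English résumé directly to me: xiyou dot wangcong at gmail dot com, and please state in the email which position you're applying for. I'm not on the testing team myself, but I'll forward your résumé. :-)

Details follow: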

Intern 1:

Job Description:

The Quality Engineering team at Red Hat is looking for an intern to test
the Linux kernel. Responsibilities include:

  • Testing kernel bugs in areas such as networking (protocols, NIC drivers,
    bonding, VLAN, bridge, tunnel), file systems (ext4, xfs, btrfs, nfs,
    autofs), time/clock handling, the scheduler, InfiniBand, Linux containers,
    etc.
  • Writing and executing test cases and analyzing results.
  • Debugging software problems.
  • Investigating kernel features.

Requirements:

  • Knowledge of general Linux usage.
  • Decent debugging, troubleshooting, and analytical skills.
  • Intermediate to advanced scripting skills (Bash, Python, or an equivalent language).
  • Passion for testing and for examining how things work internally.
  • Knowledge of networking, file systems, or the kernel's timekeeping mechanisms is a strong plus.
  • Availability for at least 5 months, at least 3 days per week.

Keywords: kernel; testing

Intern 2:

Job Description:

The Quality Engineering team at Red Hat is looking for an intern to test
Linux file systems. Responsibilities include:

  • Testing file system bugs, in both local and network file systems.
  • Writing and executing test cases and checking results.
  • Debugging software problems.
  • Investigating file system features.

Requirements:

  • Knowledge of general Linux usage.
  • Decent debugging, troubleshooting, and analytical skills.
  • Some Bash scripting skills.
  • Decent knowledge of one or more file systems (ext2/3/4, xfs, btrfs, nfs, cifs, autofs).
  • Passion for testing and for examining how things work internally.
  • Availability for at least 5 months, at least 3 days per week.

Keywords: kernel; file system; testing

Intern 3:

Job Description:

The Quality Engineering team at Red Hat is looking for an intern to test
Linux networking. Responsibilities include:

  • Testing network bugs, including NIC drivers and protocols.
  • Writing and executing test cases and checking results.
  • Debugging software problems.

Job Requirements:

  • Knowledge of general Linux usage.
  • Decent debugging, troubleshooting, and analytical skills.
  • Some Bash scripting skills.
  • Familiarity with Linux networking concepts and configuration.
  • Familiarity with NIC drivers is a strong plus.
  • Familiarity with network protocols (TCP, UDP, IGMP, etc.).
  • Passion for testing and for examining how things work internally.
  • Availability for at least 5 months, at least 3 days per week.

Keywords: kernel; network; testing

Regular Job 1:
Job Description:
The Quality Engineering team at Red Hat is looking for an engineer to test
Linux kernel networking, including network protocols, NIC drivers, bonding,
VLAN, bridge, etc. You will search for, analyze, report, and track kernel
network defects and verify bug fixes. You should be self-motivated
and passionate about finding bugs and defects in Linux networking.

Responsibilities include:

  • Review and test bugs
  • Investigate the network implementation and new features; write or update test plans
  • Write test cases according to test plans, or automate bug reproducers
  • Execute test cases and analyze results
  • Communicate with developers and other stakeholders about testing gaps, and cover them

Required Skills:

  • Intermediate or higher level of Linux skills and background.
  • Knowledge of the network implementation (protocols, NIC drivers, bonding,
    VLAN, bridge), and familiarity with network-related concepts and
    operations.
  • Must be a flexible, self-motivated person who is willing to take on responsibility.
  • Passion for testing and for examining how things work internally.
  • Willingness to coordinate with others.

Regular Job 2:
Job Description:
The Quality Engineering team at Red Hat is looking for a QE lead for Linux
kernel testing. You will test the kernel and communicate and coordinate with
developers and other QEs, including assigning tasks. We are looking for
an experienced QA engineer with strong technical and coordination
skills. You must be a flexible, self-motivated person who can work
under pressure and deliver on tight schedules.

Responsibilities include:

  • Review kernel bugs, do the initial analysis, and assign them to the proper QE owners
  • Be responsible for certain kernel areas: create and maintain test plans, write test cases, test bugs, and write or automate bug reproducers
  • Communicate with various teams and stakeholders on technical and coordination issues
  • Communicate with developers and other stakeholders about testing gaps,
    and cover them

Required Skills:

  • A minimum of 2 years of professional experience
  • Strong skills and background in Linux
  • Strong debugging, troubleshooting, and analytical skills
  • Wide-ranging knowledge of the Linux kernel
  • Must be a flexible, self-motivated person who is willing to take on
    responsibility.
  • Willingness to communicate and coordinate with others, and the ability
    to work collaboratively with multiple teams
  • Able to adapt to flexible working hours

Regular Job 3:
Job Description:
The Quality Engineering team at Red Hat is looking for an engineer to test
Linux file systems, including btrfs, xfs, ext4, etc. You will search
for, analyze, report, and track kernel file system defects and verify bug
fixes. You should be self-motivated and passionate about finding
bugs and defects in Linux file systems.

Responsibilities include:

  • Investigate file system implementations and new features; write or update test plans
  • Write test cases according to test plans
  • Execute test cases and analyze results
  • Review and test bugs
  • Communicate with developers and other stakeholders about testing gaps,
    and cover them

Required Skills:

  • Intermediate or higher level of Linux skills and background.
  • Knowledge of file system implementations (ext2, ext3, ext4, xfs, btrfs), and familiarity with file-system-related concepts and operations.
  • Must be a flexible, self-motivated person who is willing to take on responsibility.
  • Passion for testing and for examining how things work internally.
  • Willingness to coordinate with others.

Regular Job 4:
Job Description:
The Quality Engineering team at Red Hat is looking for engineers to
search for, analyze, report, track defects and verify bug fixes in the
Linux kernel. We are looking for an experienced QA Engineer with strong
technical skills. You must be a flexible self-starter who can come up to
speed quickly with new technologies and can adapt to a growing and
evolving team.

Responsibilities include:

  • Review bugs and develop or automate bug reproducers and regression test cases based on the patch(es)
  • Run existing test cases and analyze results
  • Find kernel testing gaps and investigate and create test plans for kernel functions
  • Investigate new features
  • Debug software problems

Required Skills:

  • Strong skills and background in Linux
  • Strong debugging, troubleshooting, and analytical skills
  • Adequate knowledge of the Linux kernel
  • Familiarity with C and shell programming
  • Strong passion for testing and for examining how things work
    internally
  • Willingness to coordinate with others

Regular Job 5:
Job Description:
The Quality Engineering team at Red Hat is looking for an engineer to test
Linux kernel networking, including network protocols, NIC drivers, bonding,
VLAN, bridge, etc. You will search for, analyze, report, and track kernel
network defects and verify bug fixes. You should be self-motivated
and passionate about finding bugs and defects in Linux networking.

Responsibilities include:

  • Review and test network bugs
  • Write test cases according to network test plans and automate bug reproducers
  • Execute test cases and analyze results
  • Investigate network features and create test plans
  • Debug software problems and create tools if needed

Required Skills:

  • Intermediate or higher level of Linux skills and background.
  • Familiarity with network concepts and configuration (protocols, NIC drivers, bonding, VLAN, bridge); knowledge of the network implementation is a plus.
  • Must be a diligent, self-motivated person who is patient with routine work.
  • Willingness to coordinate with others.

A poem about division

From: Hacker’s Delight Author: Henry S. Warren

I think that I shall never envision
An op unlovely as division.

An op whose answer must be guessed
And then, through multiply, assessed;

An op for which we dearly pay,
In cycles wasted every day.

Division code is often hairy;
Long division’s downright scary.

The proofs can overtax your brain,
The ceiling and floor may drive you insane.

Good code to divide takes a Knuthian hero,
But even God can’t divide by zero!

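Why bool Is Only a Macro

You may be as surprised as I was: in the C99/C11 standards, bool is defined as a macro, not as a typedef!

Section 7.18 of C11 states it explicitly: The macro “bool” expands to _Bool.

false and true are macros as well. The standard does go on to explain, though:

Notwithstanding the provisions of 7.1.3, a program may undefine and perhaps then
redefine the macros bool, true, and false.

So they were made macros precisely so that users can redefine them; presumably quite a few C programs written before C99 defined their own bool, false, and true. Section 7.31.9 also says that this ability will eventually go away:

The ability to undefine and perhaps then redefine the macros bool, true, and false is
an obsolescent feature.

If this bothers you, you can do the following: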
[c]
#undef bool
#define bool bool
typedef _Bool bool;
[/c]
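Why would an old program's own bool clash with the standard one? Here is a minimal sketch of one difference. It assumes an 8-bit `unsigned char`, and `legacy_bool`, `to_std_bool()`, and `to_legacy_bool()` are hypothetical names made up for illustration. Converting a value to `_Bool` yields 1 for anything nonzero, while a typedef'd integer type simply truncates the value.

```c
#include <stdbool.h>

/* Hypothetical pre-C99 style definition, as an old program might have had. */
typedef unsigned char legacy_bool;

/* Conversion to _Bool: any nonzero value becomes 1. */
static int to_std_bool(int v)
{
    bool b = v;
    return b;
}

/* Conversion to an integer typedef: the value is reduced modulo 256
 * (assuming CHAR_BIT == 8), so 256 becomes 0. */
static int to_legacy_bool(int v)
{
    legacy_bool b = (legacy_bool)v;
    return b;
}
```

With these definitions, `to_std_bool(256)` is 1 but `to_legacy_bool(256)` is 0, so code written against one definition cannot silently be switched to the other.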

SEASIDE CANON

From: http://www.3quarksdaily.com/3quarksdaily/2011/06/a-crab-canon-for-douglas-hofstadter.html

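Anyone who has read Gödel, Escher, Bach (《集异璧》) will remember the canons in it. Here someone has written a poem in canon form, which I personally think is even better. Reposted here for your enjoyment.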

SEASIDE CANON, for Douglas Hofstadter

by Julia Galef

~
The ocean was still.
In an empty sky, two gulls turned lazy arcs, and
their keening cries echoed
off the cliff and disappeared into the sea.
When the child, scrambling up the rocks, slipped
out of her parents’ reach,
they called to her. She was already
so high, but those distant peaks beyond —
they called to her. She was already
out of her parents’ reach
when the child, scrambling up the rocks, slipped
off the cliff and disappeared into the sea.
Their keening cries echoed
in an empty sky. Two gulls turned lazy arcs, and
the ocean was still.
~

A Programmer's Poem

From: http://edweissman.com/53640595
Author: Ed Weissman

My friends all think
that I’m a neb
Cause I spend much time
here on the web

The good things here
I do not abuse
Except lots of time
on hacker news

I don’t read reddit
I will not digg
I’m not on facebook
My work’s too big

I do not text
I do not tweet
I just work on
Things that are neat

I check email
throughout the day
But there are no games
that I will play

My phone’s on vibrate
I do not chat
My work is really
Where it’s at

Knuth and Turing
are my big heroes
I love to move
Ones and zeros

My head is down
I’m in the mode
Don’t bother me
I have to code

Those who need me
leave voicemail
I’m much too busy
trying not to fail

I learn on-line
and from my schools
But I must avoid
all sorts of trolls

I can’t believe
I wrote this ode
When I have so much
I have to code

I’m not an addict
I have no drug
I’ve got to go
To fix a bug

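A Few Jokes I Made Up

I often see funny jokes on Weibo, and now and then I imitate a few myself. Here are some for your amusement.

As a Linux programmer, I chose this hostel named ELF without a moment's hesitation!

- Counselor, what matters most when writing a program?
- Staying calm!
- Counselor, what matters most when debugging a program?
- Luck!

Three years ago I started imitating the artsy youth: drinking Starbucks, wearing plaid, smoking and drinking, growing my hair long, acting wild and unrestrained, reading literary magazines every day, and carrying a guitar whenever I traveled. Now, apart from singing off-key and still only being able to play "Two Tigers" on the guitar, I look every bit the artsy youth.

Taxi driver# He asked, "You're a programmer, right?" I said yes. He asked, "You write code, you're not in testing, right?" I said yes. He asked, "Low-level systems or upper-layer applications?" I said I had done both and now do low-level work. He asked, "Not much assembly in OS work these days, right?" I answered in a trembling voice, "Right, it's C now," and couldn't help asking, "Do you study this stuff?" He said, expressionless, "I've built operating systems for all the big IT companies." I fell into a long silence. Finally he asked, "Do you have enough for the fare?"

Xiaosong style# In our line of work we sell our bodies, our lives, and our youth, offering up overtime and all-nighters to get projects done. We have never plundered anyone or preyed on our neighbors, nor done anything corrupt. When things go well, we thank our colleagues and the project manager; when they go badly, we work overtime and lose sleep. We carry a hollow reputation for three or five years and earn a few strings of coins for retirement. In the end the hair falls out and the spine aches. The manager will always have new favorites and will not remember the old ones. Seeing as we once brought everyone a moment of joy, can that earn back a little human warmth?

Zhen Huan style# “Just now I came across a piece of code in the kernel whose technique is superb. I privately thought that if this code were yours to write, it would surely deepen your understanding of the kernel, and it would be most excellent for your programming skills. Though it would cost some effort, it would not betray the favor shown you.” “Speak plainly!” “There's a bug in the kernel, and I don't know where...”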

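The Chinese Nation Has Reached Its Most Screwed-Up Hour

Milk: can't drink it anymore!

Milk powder: can't eat it anymore!

Steamed buns: can't eat them anymore!

Cooking oil: can't use it anymore!

Capsules: can't take them anymore!

Jelly and yogurt: can't eat them anymore!

Candied fruit: can't eat it anymore!

Sachima: can't eat it anymore!

Beef noodles: can't eat them anymore!

Clay-pot congee: can't eat it anymore!

Cola: can't drink it anymore!

Purified water: can't drink it anymore!

Pulpy orange juice: can't drink it anymore!

Sprite: can't drink it anymore!

Green tea: can't drink it anymore!

……

Arise! Ye who refuse to be slaves! Let us forge our food into our new biochemical weapons! The Chinese nation has reached its most screwed-up hour! Everyone is forced to let out one last roar! Arise! Arise! Arise! We, the sick men of East Asia, eating the most poisonous food, march on! Eating the most poisonous food, march on! March on! March on! On!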

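perfbook Reading Notes: ACCESS_ONCE()

If you have read the RCU implementation in the Linux kernel, you have probably noticed a macro called ACCESS_ONCE(), yet few people really understand what it means. Some explanations on the web are even wrong, so I am writing this post to set things straight.

I learned what ACCESS_ONCE() means long before reading perfbook (by asking Paul, the expert himself), but the book happens not to cover this macro in much detail, so treat this post as my reading notes on it.

Definition

Its definition is simple; it sits at the bottom of include/linux/compiler.h:

[c]
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
[/c]

Syntactically this looks pointless: take the address of x, then read the value back through the pointer. It is not pointless, because of the added volatile keyword: it forces the compiler to fetch x from memory every time x is used.

Why
From the definition alone it is hard to see why such a thing was introduced. A few examples (all from Paul, with small modifications of mine) make it clearer.

1. A global variable that must be re-read on every loop iteration:

[c]
static int should_continue;
static void do_something(void);

while (should_continue)
    do_something();
[/c]

Suppose do_something() does not modify should_continue. Then the compiler is perfectly entitled to optimize the loop into:

[c]
if (should_continue)
    for (;;)
        do_something();
[/c]

Easy to understand, isn't it? For a single-threaded program this is fine, but with multiple threads a problem appears: if another thread changes should_continue while this thread is executing do_something(), the optimization above is flat-out wrong! Worse, the compiler has no way of knowing whether this code runs concurrently, so it cannot decide whether the optimization is correct.

There are two ways out: 1) protect should_continue with a lock, since taking a lock is the natural thing to do when multiple threads read and modify a global variable; 2) forbid the compiler from performing this optimization. Locking is overkill here: should_continue is just a boolean, and even reading a value that is not the latest may not matter, at worst costing a few extra iterations. So forbidding the optimization is the simpler and easier fix. We access should_continue through ACCESS_ONCE():

[c]
while (ACCESS_ONCE(should_continue))
    do_something();
[/c]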
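The loop can be tried out in user space with a small sketch. This is not kernel code: the macro is copied by hand and relies on the GCC/Clang `__typeof__` extension, and `budget` and `run_loop()` are hypothetical names; `budget` stands in for the other thread that would clear the flag in a real program.

```c
/* User-space copy of the kernel macro; __typeof__ is a GCC/Clang extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int should_continue;   /* in real code, cleared by another thread */
static int budget;            /* hypothetical stand-in for that other thread */

static void do_something(void)
{
    /* Clear the flag once the budget is used up. */
    if (--budget == 0)
        ACCESS_ONCE(should_continue) = 0;
}

/* Runs the loop from example 1 and returns the number of iterations.
 * n must be at least 1. */
static int run_loop(int n)
{
    int iterations = 0;

    budget = n;
    should_continue = 1;
    while (ACCESS_ONCE(should_continue)) {   /* re-read on every pass */
        do_something();
        iterations++;
    }
    return iterations;
}
```

Because the flag is read through a volatile-qualified lvalue, the compiler has to emit one load per iteration and cannot hoist the test out of the loop.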

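2. A pointer that is read once but dereferenced several times:

[c]
p = global_ptr;
if (p && p->s && p->s->func)
    p->s->func();
[/c]

The compiler may also compile this as:

[c]
if (global_ptr && global_ptr->s && global_ptr->s->func)
    global_ptr->s->func();
[/c]

You may blame the compiler for being a bit dumb, but the C standard does allow this. In that case, if another thread executes global_ptr = NULL;, the second version can segfault while the first one is fine. As before, ACCESS_ONCE() is needed here as well:

[c]
p = ACCESS_ONCE(global_ptr);
if (p && p->s && p->s->func)
    p->s->func();
[/c]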
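Here is a self-contained sketch of the same snapshot pattern. The types `struct ops` and `struct obj` and the functions `answer()` and `call_func_once()` are hypothetical, invented only to give `global_ptr` something to point at; the macro is again a hand copy that relies on the GCC/Clang `__typeof__` extension.

```c
#include <stddef.h>

#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical types standing in for whatever global_ptr points to. */
struct ops { int (*func)(void); };
struct obj { struct ops *s; };

static struct obj *global_ptr;   /* may be set to NULL by another thread */

static int answer(void) { return 42; }

/* Loads global_ptr exactly once, then works only on the local copy.
 * Returns the callback's result, or -1 if any link in the chain is NULL. */
static int call_func_once(void)
{
    struct obj *p = ACCESS_ONCE(global_ptr);

    if (p && p->s && p->s->func)
        return p->s->func();
    return -1;
}
```

All later checks use the local `p`, so a concurrent `global_ptr = NULL` cannot slip in between the NULL test and the call. Note that this says nothing about the lifetime of the object `p` points to; that needs a separate mechanism such as RCU.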

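3. A variable used by a watchdog:

[c]
for (;;) {
    still_working = 1;
    do_something();
}
[/c]

Suppose the definition of do_something() is visible and it does not modify still_working. The compiler may then optimize this into:

[c]
still_working = 1;
for (;;) {
    do_something();
}
[/c]

If another thread is running this at the same time:

[c]
for (;;) {
    still_working = 0;
    sleep(10);
    if (!still_working)
        panic();
}
[/c]

It uses the still_working variable to detect whether the watched loop has stopped: if, after waiting 10 seconds, the loop really has stopped, panic()! After the compiler optimization, however, it will panic even though the loop has not stopped! So ACCESS_ONCE() should be added here too:

[c]
for (;;) {
    ACCESS_ONCE(still_working) = 1;
    do_something();
}
[/c]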
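The two sides of the watchdog can be sketched as a pair of functions. This is a restructured, hypothetical variant for illustration: instead of sleeping, `watchdog_check()` reports whether `worker_tick()` ran since the previous check and then re-arms the flag. Both names are made up, and the macro is a hand copy using the GCC/Clang `__typeof__` extension.

```c
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int still_working;

/* Worker side: called once per loop iteration, before do_something().
 * The volatile store cannot be hoisted out of the caller's loop. */
static void worker_tick(void)
{
    ACCESS_ONCE(still_working) = 1;
}

/* Watchdog side: returns 1 if the worker ticked since the last check,
 * 0 otherwise, and clears the flag for the next round. */
static int watchdog_check(void)
{
    int alive = ACCESS_ONCE(still_working);

    ACCESS_ONCE(still_working) = 0;
    return alive;
}
```

A real watchdog would call `watchdog_check()` periodically and panic when it returns 0.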

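In summary, ACCESS_ONCE() is needed when both of these conditions hold:

1. a global variable is accessed without holding a lock;
2. the compiler might merge the accesses to that variable into one (examples 1 and 3 above) or split one access into several (example 2 above).

Examples

Another example, given by Linus in an email:

The compiler might turn the following code:

[c]
if (a > MEMORY) {
    do1;
    do2;
    do3;
} else {
    do2;
}
[/c]

into:

[c]
if (a > MEMORY)
    do1;
do2;
if (a > MEMORY)
    do3;
[/c]

This matches exactly the two conditions I summarized above, so ACCESS_ONCE() should be used here as well. As Linus says, the point is not that the compiler will necessarily optimize this way, but that you cannot prove it won't.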

So the rule is: if you access unlocked values, you use ACCESS_ONCE(). You
don’t say “but it can’t matter”. Because you simply don’t know.
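Applied to the `a > MEMORY` example, the rule means taking one snapshot of `a` and branching only on the snapshot. The sketch below is hypothetical: the value of `MEMORY`, the `DO1`/`DO2`/`DO3` flags, and `run_steps()` are invented so the result can be observed, and the macro is a hand copy using the GCC/Clang `__typeof__` extension.

```c
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
#define MEMORY 100UL          /* hypothetical threshold */

static unsigned long a;       /* shared, accessed without a lock */

enum { DO1 = 1, DO2 = 2, DO3 = 4 };

/* Returns a bitmask of the steps that ran. */
static int run_steps(void)
{
    unsigned long snapshot = ACCESS_ONCE(a);   /* the only read of a */
    int done = 0;

    if (snapshot > MEMORY) {
        done |= DO1;
        done |= DO2;
        done |= DO3;
    } else {
        done |= DO2;
    }
    return done;
}
```

Even if the compiler splits the branch in two, both halves test the same local value, so do1 and do3 either both run or both do not.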

Now a real-world example:

commit 0ad92ad03aa444b312bd318b0341011a8be09d13
Author: Eric Dumazet
Date:   Tue Nov 1 12:56:59 2011 +0000

    udp: fix a race in encap_rcv handling

    udp_queue_rcv_skb() has a possible race in encap_rcv handling, since
    this pointer can be changed anytime.

    We should use ACCESS_ONCE() to close the race.

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 131d8a7..ab0966d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1397,6 +1397,8 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
     nf_reset(skb);

     if (up->encap_type) {
+        int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
+
         /*
          * This is an encapsulation socket so pass the skb to
          * the socket's udp_encap_rcv() hook. Otherwise, just
@@ -1409,11 +1411,11 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
          */

         /* if we're overly short, let UDP handle it */
-        if (skb->len > sizeof(struct udphdr) &&
-            up->encap_rcv != NULL) {
+        encap_rcv = ACCESS_ONCE(up->encap_rcv);
+        if (skb->len > sizeof(struct udphdr) && encap_rcv != NULL) {
             int ret;

-            ret = (*up->encap_rcv)(sk, skb);
+            ret = encap_rcv(sk, skb);
             if (ret <= 0) {
                 UDP_INC_STATS_BH(sock_net(sk),
                          UDP_MIB_INDATAGRAMS,

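More

The above may leave you with the impression that volatile solves synchronization problems. It does not; it addresses only one aspect of them. All the examples above share one property: every write is a simple assignment (as opposed to, say, assigning a struct wider than the CPU word). Simple assignments of naturally aligned, word-sized values are atomic on every platform Linux supports, but an addition, for instance, is not guaranteed to be atomic, to say nothing of the cases that need a memory barrier. So do not abuse volatile.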
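To make the limit concrete: `counter++` on a volatile int is still a separate load, add, and store, so two threads incrementing it concurrently can lose updates. The sketch below uses C11 `<stdatomic.h>` as a user-space analogue; kernel code would use `atomic_t` with `atomic_inc()` instead. The names `hits` and `record_hits()` are hypothetical.

```c
#include <stdatomic.h>

static atomic_int hits;

/* Resets the counter, adds to it n times, and returns the final value.
 * Each atomic_fetch_add() is a single indivisible read-modify-write,
 * so concurrent callers of atomic_fetch_add() cannot lose an update. */
static int record_hits(int n)
{
    atomic_store(&hits, 0);
    for (int i = 0; i < n; i++)
        atomic_fetch_add(&hits, 1);
    return atomic_load(&hits);
}
```

The default C11 atomic operations are sequentially consistent, so they also provide the ordering that a plain volatile access does not.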

perfbook Reading Notes: SLAB_DESTROY_BY_RCU

Section 8.3.3.6 of perfbook discusses type safety and mentions SLAB_DESTROY_BY_RCU in the Linux kernel, but the book does not describe this feature in any more detail. Here is a fuller explanation.

To understand SLAB_DESTROY_BY_RCU, the most important thing is to look at kmem_cache_destroy(). Taking slub as an example:
[c]
void kmem_cache_destroy(struct kmem_cache *s)
{
    down_write(&slub_lock);
    s->refcount--;
    if (!s->refcount) {
        list_del(&s->list);
        up_write(&slub_lock);
        if (kmem_cache_close(s)) {
            printk(KERN_ERR "SLUB %s: %s called for cache that "
                "still has objects.\n", s->name, __func__);
            dump_stack();
        }
        if (s->flags & SLAB_DESTROY_BY_RCU)
            rcu_barrier();
        sysfs_slab_remove(s);
    } else
        up_write(&slub_lock);
}
[/c]

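The two lines that test SLAB_DESTROY_BY_RCU are all you need to look at. rcu_barrier() waits for all call_rcu() callbacks to finish, so in the SLAB_DESTROY_BY_RCU case kmem_cache_destroy() is evidently waiting for all the work deferred by kmem_cache_free() to complete.

Compared with ordinary objects allocated by kmem_cache_alloc(), this kind of allocation gives a weaker guarantee. An ordinary allocation can guarantee that the object is not released back to the cache, whereas this one only guarantees that the memory is not freed for good; it does not guarantee that the object won't be put back into the cache and reused. In other words, the type of the object stays the same, which is what type safety means here. In fact, during that time the object may very well have been returned to the cache and reused already.

Precisely because the guarantee is weaker, code inside an rcu_read_lock() read-side critical section needs one extra check: is this still the same object as before? Since the memory is type-safe, performing a check that assumes the same type is perfectly legal. A good example is __lock_task_sighand():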

[c]
struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
                                           unsigned long *flags)
{
    struct sighand_struct *sighand;

    for (;;) {
        local_irq_save(*flags);
        rcu_read_lock();
        sighand = rcu_dereference(tsk->sighand);
        if (unlikely(sighand == NULL)) {
            rcu_read_unlock();
            local_irq_restore(*flags);
            break;
        }

        spin_lock(&sighand->siglock);
        if (likely(sighand == tsk->sighand)) {
            rcu_read_unlock();
            break;
        }
        spin_unlock(&sighand->siglock);
        rcu_read_unlock();
        local_irq_restore(*flags);
    }

    return sighand;
}
[/c]

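Note that ->siglock is initialized in ->ctor(), so the ->siglock of a freshly allocated sighand is already initialized.

Building on this, the Linux kernel also grew a new kind of hash list, hlist_nulls; see Documentation/RCU/rculist_nulls.txt for details.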