Project:Kernel/Maintenance

From Gentoo Wiki
Jump to:navigation Jump to:search

This guide aims to document the processes used in the maintenance of Gentoo's core kernel package, the gentoo-sources. It is written in hope that it will help future contributors get involved with these maintenance efforts.

At the core of gentoo-sources is genpatches: a generic kernel patchset, designed to be minimal, Gentoo-independent, and accessible to other distributions and projects. Please ensure you have a good understanding of genpatches before continuing: read through the genpatches homepage.

This document was written by Daniel Drake, a previous kernel project lead. References to "me" in this document refer to Daniel, but please do not feel that this is a one-man project. Other members of the team are happy to help you as well.

Getting started

Mailing lists

Please subscribe to the gentoo-kernel mailing list. Subscription instructions can be found here, and archives can be found here. This list is low-traffic. I will announce modifications to this document on that list.

You should also subscribe to the linux-kernel mailing list (LKML). This list is extremely high traffic. You shouldn't even attempt to read it all, but glance through the subject lines every day or two, and keep an eye out for topics that interest you. It's also interesting to read the release announcements from Linus and the -stable crew, and Andrew Morton's -mm release announcements to keep up with what's going on in the kernel community.

Keep the LKML archived in its own folder in a good mail client. It becomes very useful later on when you are researching bugs reported by users on Gentoo's bugzilla. I try to keep at least the most recent 30,000 emails around.

To subscribe to the LKML, send an email to majordomo@vger.kernel.org with the following in the body of the email:

CODE Message body
subscribe linux-kernel YOUR-EMAIL-ADDRESS

You may also be interested in reading the LKML FAQ. It contains lots of information about how the mailing list is used and the kernel community in general. I don't expect you to have any need to write to the LKML initially, but do be sure to read this FAQ before you send your first email there.

IRC

Please be active on our IRC channel, #gentoo-kernel (webchat) on Libera.Chat. The Gentoo website has more information on IRC channels.

While learning, please ask us questions! This is the best way to learn. If we don't respond, ask again another time. We do want to help you, since hopefully that will result in you helping us in the future!

The primary kernel maintainers are Mike Pagano (IRC username mpagano) and Alice Ferrazzi (IRC username alicef).

Gentoo Bugzilla

The bulk of Gentoo kernel maintenance happens on bugzilla - bug reports, keyword requests, feature requests, etc. Make sure you have your own personal account.

You should now configure bugzilla to send you copies of all bugmails related to kernel maintenance. To do this, go to bugzilla email preferences and add kernel@gentoo.org to your user watch list. I suggest you set up a mail filter to move these mails to their own folder.

Another useful feature is bugzilla saved searches. You can perform a bugzilla query, and save it with a user-defined name for quick access later. Links to these saved searches are present on the bottom of every bugzilla screen. For each of the following queries, please scroll to the bottom of the resultant buglist and use the "Remember search" functionality to save each one:

  • Gentoo kernel bugs: This query lists all open bugs assigned to kernel@gentoo.org except ebuild submissions, tracker bugs, and 2.4 bugs. This is the main buglist you will be working from.
  • Kernel regressions: This query lists all open regression bugs. A regression is a bug which was introduced in a certain kernel release, whereas the same functionality worked in releases beforehand. In many ways, regressions are the most critical form of bugs.

Kernel Bugzilla

We report the majority of bugs upstream to the kernel's Bugzilla. You should create an account here, and again, put kernel@gentoo.org on your email user watch list.

Git

You'll probably find it handy to keep a git checkout of genpatches locally.

Checking out genpatches from anonymous GIT:

user $git clone git://anongit.gentoo.org/proj/linux-patches.git linux-patches

You can also browse it online.

https://gitweb.gentoo.org/proj/linux-patches.git/

https://github.com/gentoo/linux-patches

Websites

Here's a list of useful websites which you'll see me referring to.

Maintenance style

General objectives

The kernel is a huge codebase, and is one of the packages which is present on almost every users system and is used all the time. Our maintenance style may seem strange at first, but please remember that kernel maintenance is overall a huge task, and as such, one of our primary objectives is minimizing the amount of work that we have to do (and overall make things more manageable).

One way we reduce our workload is to attempt to deviate away from the upstream kernel releases as little as possible: we aim to make genpatches as small as possible. The less patches we have to handle/backport/forwardport, the less work we have on our hands.

Another approach we take is quick adoption of new kernel releases. Each new kernel release fixes many bugs and adds more hardware support, which invariantly affects proportions of our userbase. Instead of going to the pains of backporting these code changes, we reduce our workload by encouraging quick adoption of new kernel releases where this new code originates. Further in line with these goals, we tend to stop supporting older kernels fairly soon after a new release has been cut.

Even more time consuming than patch handling is bug fixing. Being such a huge codebase, we get a steady flow of kernel bug reports on our bugzilla. Some of them take considerable time/effort to get resolved. We work to streamline our bug handling processes here.

Even after 3 years of bug handling, I personally am familiar with only a very small proportion of the kernel codebase. It's important that we raise these bugs to the attention of the upstream kernel developers who do know what they are doing, and it's important that we do this quickly. As you will learn, we don't actually do much of the fixing downstream in Gentoo.

With respect to the above forwarding of bugs to upstream developers, it's important that we maintain good relationships with upstream developers. In general, upstream developers are wary of distro-reported kernel bugs - after all, most distributions patch their kernel releases so heavily that they aren't easily supportable by people outside of those distributions. I recently had to dig a patch out of the Mandriva patchset, and just out of curiousity, I ran diffstat against their patches. The data set was so great that it revealed an obscure bug in diffstat itself!

One way we attempt to get respect from upstream developers is by our patching policy: other than in exceptional circumstances, we don't add any patches until they have been merged into the official development tree upstream. Additionally, this helps us to respect upstream decisions, and helps us ensure quality (and upstream support) of our patchset.

Bug handling

Basic strategy

The bugfixing process certainly varies from bug to bug, but many have things in common. During the process I usually aim to do the following:

  • Verify that it is not a user or configuration error
  • Verify that it is not a Gentoo-specific distro bug
  • Gather detailed information to classify the problem
  • Send the bug upstream

With that in mind, bugs often follow a process such as the following:

  1. Initial evaluation
  2. Information gathering (including kernel test requests)
  3. Research
  4. Filing upstream

Initial process

A user reports a bug to the Gentoo bugzilla. A bug wrangler reassigns the bug to kernel@gentoo.org.

You are notified of this new bug by email. You first run a few sanity checks over it: is it an obvious user error? Is it a duplicate of another bug that has already been reported?

Information gathering

The initial bug report is usually lacking in information, by no fault of the reporter. Examples of common questions you might ask in response to the intial report include:

  • Which kernel version are you using? This is important information which is often omitted from the initial report.
  • Which other kernel versions have you tried? Sometimes the user knows that this is a long-term bug, and sometimes this information can be useful to you.
  • Are there any known previous working kernels? Here you're basically asking "is this a regression?"
  • Can you reproduce this? Sometimes people post crash dumps with no indication of whether this was a one-off after years of stability, or if it reliably occurs soon after boot, or if they know of a way to trigger the crash whenever they feel like it.

There are typically many other such questions that could be asked for most bug reports, but many of these can be answered by requesting output of certain commands or certain files. I find myself commonly requesting the following:

  • dmesg - this dumps the kernel log from the current boot to the console. It contains all sorts of interesting information about the hardware and drivers, as well as information about any kernel-level errors that may have occurred.
  • .config - sometimes it's useful to verify exactly how the user has configured their kernel.
  • lsmod output - when the user has built certain components as modules, this is one way you can verify that they have been loaded.
  • lspci output - this provides a useful overview of the system hardware and can be used to get an idea of which drivers the user should be using.

Version considerations

We generally accept bug reports for 2 kernel versions at any one time:

  1. The most recent kernel with stable keywords
  2. The most recent kernel with testing keywords

For example, at the time of writing, the most recent stable kernel is 2.6.20. 2.6.21 has just been released, and is present in the testing tree, but it needs substantially more testing time before we are confident enough to mark it stable. During this time we will accept bugs for the latest gentoo-sources-2.6.20 release, and the latest gentoo-sources-2.6.21 release.

Even if a user files a bug on a currently supported kernel, we often ask them to test the latest development kernel (e.g. 2.6.22-rc1 in the above example). Why? We're moving to get this bug reported upstream as soon as possible, and as a kernel developer, there simply isn't any point diagnosing issues on a codebase which isn't current.

With that in mind, and the above example of current versions, there are several situations which can arise:

  • A user may report a bug with 2.6.19 or older. These kernels are not supported. You first ask them to reproduce on any supported kernel (give them the option of either 2.6.20 or 2.6.21), and if the bug still exists there, ask them to test the latest development kernel (2.6.22-rc1).
  • The user may report a bug with 2.6.20 (current supported stable kernel). Ask them to test the current testing kernel (2.6.21), and if the bug still exists there, request that they test the latest development kernel (2.6.22-rc1).
  • The user may report the bug with 2.6.21 (current supported testing kernel). Ask them to reproduce with the latest development kernel (e.g. 2.6.22-rc1).

For 2 weeks after the upstream release of a 'production' kernel, there is a period in which there is no development kernel available. For example, 2.6.21 has only just come out and 2.6.22-rc1 will not be released for at least a week. In this case, you can skip the stage where you ask the user to test the latest development kernel -- reproducing the issue on 2.6.21 is enough to request the upstream developers to work on it.

Researching the bug

When you're happy that you have enough information about the problem, you should attempt to find other reports of the same issue. In some cases you can do this without asking the user to test all the new kernels.

It is quite often the case that the bug has already been reported upstream and there is already some discussion happening about how to fix it, or maybe a patch is already available.

This part is really just common sense: stick some search terms into google. If you have a distinguishable error message, search for that. Otherwise, search for "linux 2.6.20 sky2" if you're researching a reliability regression in 2.6.20 with the sky2 driver.

Additionally, query the linux bugzilla with similar search terms, and use your email client to search through your local LKML archives.

If you find something, that's great, reference it on the bug. You may even have found a fix for the problem. It's certainly not uncommon for standard searching not to find any references though.

vanilla-sources

Occasionally we get bug reports from people using vanilla-sources. Whether or not we accept these bugs is a matter of judgement. vanilla-sources is provided to aid testing, but is not supported: if it's broken, we don't patch it.

That said, if someone reports a bug in vanilla-sources-2.6.21.1 (the latest upstream release) it is likely to also exist in the latest gentoo-sources-2.6.21 release, so it probably makes sense to accept the bug and go through the usual diagnosis. However, if you feel that gentoo-sources may have a patch to fix this, don't hesitate to remind them that vanilla-sources is unsupported and they need to reproduce this on gentoo-sources before we'll look into it.

Sometimes, users report -rc kernel regressions, e.g. they report bugs which exist in vanilla-sources-2.6.22-rc1 but not in 2.6.21. In these cases, you can close the bug right away: you should thank the user for reporting the issue but point out that they should report it upstream at the kernel bugzilla directly. gentoo-sources does not do -rc releases.

Binary and out-of-kernel modules

It may become apparent through information gathering that the user is using kernel modules/drivers which are not part of the official kernel sources. One example of such a package would be x11-drivers/nvidia-drivers.

Again, these situations require some judgement. There's simply no way we can support kernels which are modified at runtime in this way, so it is perfectly reasonable to ask the user to reproduce the issue without those external modules loaded (and also request that they have not been loaded since boot, as opposed to them booting, loading and unloading the modules, and then reproducing the issues).

That said, you can sometimes conclude that the bug exists through other means, in which case there is no need to ask them to reproduce without the external modules. For example, you might find an identical report on a non-tainted kernel on LKML.

Regression handling

If you determine that a bug is a regression, there are a few extra things to consider. First, you should tag it as so (read about the Status Whiteboard later in this document). Secondly, you should note that the Linux source code management utility (git) has a useful bisection feature which can identify the exact commit which caused a regression. This really makes the bug a lot easier to solve.

However, performing a bisection is a time consuming process: for identifying a regression between versions 2.6.x and 2.6.x+1 it usually requires that you build and test approximately 13 different kernels. So, unless you sense that the user is particularly keen, you should probably exhaust other options first. You could get the bug forwarded upstream and after a few days if there hasn't been any progress, ask them if they will consider doing a bisection.

I have some pre-written instructions on how to do a bisection, which are easy to follow. If you are going to suggest a bisection, I would suggest you just drop them a link to this article.

If the user doesn't respond

It's unfortunate, but fairly often, users turn unresponsive and don't provide the extra information that you requested. While we aren't paid to fix these issues and have plenty more to be working on, it's best just to discard these bugs.

After 14 days, if the user has not responded to my most recent request, I close the bug as NEEDINFO with a comment asking that they reopen the bug when the information has been provided. Quite often this spurs the user into providing what was requested.

Reporting bugs upstream

Overview

You've established that this appears to be a kernel bug, you've gathered information which agrees, and it's been reproduced on the latest development kernel. The next step is to get the bug reported upstream.

There are 2 ways you can do this: by asking the user to report the bug at the upstream kernel bugzilla, and by yourself sending an email to the relevent lists and maintainers.

It's very important that we make it easy for the upstream developers here. When reporting bugs, state clearly which versions are affected, mentioning which ones are unmodified (usually we can say "it's present in 2.6.19-gentoo but is also reproducible on unmodified 2.6.20-rc5").

While the procedure does involve linking to the downstream bug, make sure all information is presented upstream - don't expect them to click and read through the downstream bug, which probably contains some degree of noise anyway.

The procedure outlined below includes modifying the status whiteboard field, which is explained further later in this document.

Reporting bugs on the kernel bugzilla

The kernel bugzilla is the primary mechanism we use to report bugs upstream. Actually, the way we do it is by getting the user to file the bug report, and then us tracking it. Here is the process I follow:

  1. Mark the Gentoo bug as UPSTREAM, and add "linux-bugzilla-pending" to the status whiteboard. While doing so, add a comment asking the user to file the bug upstream. Give them the kernel bugzilla URL, ask them to post the new bug URL on the Gentoo bug when done, and provide some basic guidance for what they should include in their bug report (example here).
  2. When the user responds with the upstream bug URL, place that URL in the URL field of the Gentoo bug. Remove "linux-bugzilla-pending" from the status whiteboard, and add "watch-linux-bugzilla". Add a quick comment stating that we'll keep an eye on the upstream bug.
  3. Go to the upstream bug report. Read through it. Did the user miss any important information from their description? If so, add a comment with that information. If there are useful attachments on the Gentoo bug which the user didn't attach to the upstream bug, attach them yourself with brief descriptions.
  4. Add kernel@gentoo.org to the CC list. If the user didn't mention the URL to the Gentoo bug on the upstream bug report, add a comment with the URL.

If the user doesn't respond to the request to file the bug upstream, discard the bug like you would for other unresponsive cases. Since you closed it as UPSTREAM, it will drop off the buglist anyway -- so do remember to continue the process when they do respond with the upstream bug URL. The linux-bugzilla-pending keyword allows us to measure how often bugs get lost at this point, and allows us to follow up.

When the bug gets updated upstream, kernel@gentoo.org will get the bugmail, and since you have that address on your watch list, you'll see it too. When the bug gets fixed, reopen then downstream bug with a quick summary of the solution, removing the "watch-linux-bugzilla" tag.

If someone closes the upstream bug without a fix (for example, the user didn't respond to an information request, or the bug got marked as invalid), reflect the changes downstream: mark the equivalent Gentoo bug as CLOSED and remove the "watch-linux-bugzilla" tag.

Sending email bug reports upstream

To be written later. This is basically used as a fallback if the upstream bug isn't getting the attention it needs.

Other bugzilla stuff

Access privileges

This document includes references to manipulating bugs on the Gentoo bugzilla (closing/reopening/reassigning/etc). Unless you're a Gentoo developer or you're the bug reporter, you don't actually have access to do this though.

If a certain action should be taken on a bug which you don't have permission for, you should ask in #gentoo-kernel on IRC for someone to perform the operation. If there is no response, leave a simple comment on the bug requesting the change, e.g. "please close this bug as NEEDINFO"

I realise this is a pain, but it's only a temporary one. Once you've made several contributions to bugs in this form, I will increase your bugzilla access permissions so that you can perform these tasks on your own.

The other aspect of this problem is manipulating bugs on the upstream kernel bugzilla. This needs to be done much less than bugs downstream. I personally have developer access to the upstream bugzilla, so if anything needs doing (bugs closing/reassigning/etc), prod me on IRC. In future we could look into getting more people developer access here.

Status whiteboard

Bugzilla has a single-line status whiteboard text field for each bug, which you are free to put anything in. I use it for certain kinds of tagging.

If a bug is present in 2.6.20, but fixed in 2.6.21 and the fix is not yet known, I place "linux-2.6.21" in the status whiteboard field. This is effectively tagging the bug saying "this problem goes away when 2.6.21 gets released".

If the bug is a 2.6.21 regression, I place "linux-2.6.21-regression" in the status whiteboard. This is used for one of the saved searches I linked to earlier. If the bug is a regression but the exact version which introduced the bug is unknown (e.g. 2.6.16 worked, 2.6.20 is broken), I place "linux-2.6.??-regression" in the whiteboard. For regressions I also prefix the summary field with (e.g.) [2.6.21 regression], so that things are immediately viewable from the search results.

If the bug has been reported upstream, I place the upstream bug URL in the URL field, and "watch-linux-bugzilla" in the status whiteboard field. This is used by one of the saved searches.

Bug resolution

When a bug has been solved but is not fixed in portage (i.e. is pending a patch being added to genpatches, or pending the next genpatches/gentoo-sources release), we add the InGIT bugzilla keyword.

The bug is then closed with an appropriate message when a new kernel release (incorporating the fix) is added to the portage tree.

genpatches and linux-stable

Submission criteria

  • This section is aimed at documenting the process for Gentoo kernel developers only (rather than just prospective ones), but if you're really keen, feel free to ask on IRC if you'd like to submit a patch.
  • The patch must conform to the stable kernel rules, see Documentation/stable_kernel_rules.txt
  • The patch must have been shipped in at least one gentoo-sources release. The reasoning here is that some of the time, patches committed to genpatches are untested -- full testing is carried out before release.

Submission process

Prepare an email in the following way:

  • Addressed to stable@kernel.org
  • CC kernel@gentoo.org
  • CC the patch author, who's contact details can be found in the patch
  • Use the same subject line as the one in the patch, prepended with [PATCH] if it is not there already.
  • In the message body, include a very brief description with a reference to the Gentoo bug report, for example: This patch fixes a bootup crash in the snd-intel-hda driver as detailed at bug #12345
  • Attach the patch to the email, directly from genpatches. The patch should have come from gitweb or similar and already have the authorship details, commit ID, etc.

What to do now?

Look at some bug reports from the saved searches you configured earlier. If you can contribute straight off, go right ahead. However, I expect you'll probably be unsure what should happen next. So, pick a bug, and question us about it, preferably on IRC in #gentoo-kernel (webchat). Alternatively, you could write to the gentoo-kernel mailing list.

It may seem that many of the open bugs are already in good hands, waiting on the user, or something like that. However, please stick around, because we get a steady flow of new bugs!

While this document has introduced you to the processes, you may not feel capable of jumping into problems and solving bugs. That's fine! Because you have put kernel@gentoo.org on your bugzilla watchlist, you will see bugs as they come in and the way that we fix them. Observe how we handle and solve the issues. Sooner or later, you will be able to mimick our activities on other bugs, and you'll be away!

Thank you for your interest in helping us!