Project:Bug-cleaners/Mass-cleaning

Note
This plan is still a work-in-progress - until it is finalised, no changes should be made to Bugzilla.

At the time of writing, there are an unmanageable 4251 open bugs assigned to maintainer-wanted@gentoo.org. The vast majority of these bugs are many years old and have not been modified for several years. To address this, we need a plan to automate the process of identifying, querying and, if no suitable response is given, closing bugs requesting packages that are no longer relevant.

Rationale and numbers

The purpose of this page is to define the conditions under which a bug should be considered closable, and to describe the actual process, from start to finish, to be used (including the scripts to be run, and from where). The process for addressing these bugs should accommodate the following requirements:

Only automatically close bugs that have not been modified for X years
A prompt should be made on each bug to ask if it's still wanted
- Perhaps a whiteboard entry of bug-cleaner-YYYY-MM-DD indicating either the time of run or the time of close can be used
A timeout (30 days?) should apply before bugs are closed automatically
This will need to be done in batches to avoid overloading infra
- Also in the interests of reducing load, any changes should be able to be done in a single step - we don't want to generate more than one mail for each step for each bug.
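The first three requirements could be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up for this page, X and the timeout are still undecided, and the whiteboard entry here records the date of the run rather than the date of the close.

```python
from datetime import date, timedelta


def is_stale(last_modified: date, today: date, min_years: int) -> bool:
    """Return True if the bug has gone unmodified for at least min_years years."""
    try:
        cutoff = today.replace(year=today.year - min_years)
    except ValueError:
        # today is 29 February and the target year is not a leap year
        cutoff = today.replace(year=today.year - min_years, day=28)
    return last_modified <= cutoff


def whiteboard_tag(run_date: date) -> str:
    """Build the proposed whiteboard entry, e.g. bug-cleaner-2020-01-15."""
    return f"bug-cleaner-{run_date.isoformat()}"


def close_date(run_date: date, timeout_days: int = 30) -> date:
    """Earliest date on which an unanswered bug may be closed (assumed 30-day timeout)."""
    return run_date + timedelta(days=timeout_days)
```

Keeping the threshold and the timeout as parameters means the same code works whatever values are eventually agreed on.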

A breakdown of the current maintainer-wanted bugs (by whatever seems suitable - years since modification, status, whatever) should be given here. Perhaps also link bugz exports hosted on devspace? And/or a table?
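One possible way to produce a breakdown by year of last modification from a CSV export is sketched below. The column names (bug_id, bug_status, changeddate) and the date format are assumptions about what the export contains and should be checked against a real one; the sample rows are made up and are not real bugs.

```python
import csv
import io
from collections import Counter


def bugs_per_year(csv_text: str) -> Counter:
    """Count bugs by the year of their last modification."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed format "YYYY-MM-DD HH:MM:SS": the first four characters are the year.
        counts[int(row["changeddate"][:4])] += 1
    return counts


# Made-up rows for illustration only; these are not real bug numbers.
SAMPLE = """bug_id,bug_status,changeddate
1,CONFIRMED,2010-03-14 09:00:00
2,UNCONFIRMED,2010-11-02 17:30:00
3,CONFIRMED,2016-06-21 12:15:00
"""

for year, count in sorted(bugs_per_year(SAMPLE).items()):
    print(year, count)
```

The per-year counts could then be pasted into a table here, and would also give a rough idea of how large each batch would be.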

Process

This should describe the steps in managing these bugs, including the scripts and commands used to implement them.

Notification

Given the scope of this project, this should probably be announced on the dev mailing list (project? others) linking to this wiki (once it's completed) and seeking comment, suggestions or flames (though preferably not flames - I don't like heat).

Pinging bugs

The first step is to ask on each bug whether the package is still wanted. The message should state that it is an automated process, that we are trying to close old unwanted bugs, and that if there is no response or a negative response within 30 days (or whatever timeout is agreed on) then the bug will be automatically closed.

A template message that could be entered:

This is an automated bug check-up process.

While we appreciate any contributions that have been made, this bug has remained unmodified for a number of years. Because of this, we would like to determine if the package(s) in question are still wanted within the Gentoo Portage repository and, if not, close this bug.

If this is still a valid package request, please add a comment saying so, and we will leave this bug open and begin addressing it as soon as possible. Please note that this may take a while as we expect a large number of packages to require attention.

If this package is no longer relevant or wanted then no action is required and this bug will be automatically closed in 30 days.

Thank you!

The following points should also be addressed here:

Criteria for identifying bugs (as in what search criteria is used)
- Should there be different handling based on STATUS?
How is the bug list provided to the script that is doing the cleaning
- Are they csv exports?
- Does it use dev-python/pybugz for searching?
Is a separate Bugzilla account needed?
- Infra would need to create one if so.
Are any flags set for later identification (eg. WHITEBOARD)?
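If a whiteboard flag is used, the comment and the flag could be planned as one change per bug, so that each bug generates only one mail for this step. The sketch below only builds a description of that change; PlannedUpdate and plan_ping are hypothetical names, and how the change is actually sent to Bugzilla (dev-python/pybugz or otherwise) is deliberately left open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlannedUpdate:
    """One change to one bug: a comment and a whiteboard entry applied together."""
    bug_id: int
    comment: str
    whiteboard: str


def plan_ping(bug_id: int, current_whiteboard: str, message: str, run_date: date) -> PlannedUpdate:
    """Describe the ping for one bug without touching Bugzilla."""
    tag = f"bug-cleaner-{run_date.isoformat()}"
    # Keep whatever is already on the whiteboard and append our tag.
    whiteboard = f"{current_whiteboard} {tag}".strip()
    return PlannedUpdate(bug_id=bug_id, comment=message, whiteboard=whiteboard)
```

Separating the planning from the sending would also make it easy to review the full list of intended changes before anything is modified.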

Note
As previously noted, this should be done in batches in order to not overload infra with ~3000-~4000 bug updates/emails all at once. Possibly include infra early in the formulation of the plan, too.
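Splitting a bug list into batches is straightforward; a minimal sketch is given below. The batch size and any pause between batches are assumptions that would need to be agreed with infra.

```python
from typing import Iterator, List, Sequence


def batches(bug_ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of bug_ids, each at most size long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(bug_ids), size):
        yield list(bug_ids[start:start + size])
```

The caller would process one batch, wait for whatever interval infra is comfortable with (for example using time.sleep), and then move on to the next.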

As an example, this could show running a script for a given batch as:

user $ prompt-bugs.sh --only-year=2010 --really-do-it

some output here
another line here

The script or whatever used should also be included either inline in the wiki or linked to a devspace.

Triaging responses

As users or developers respond on bugs indicating that they are still wanted and should not be closed, they will need to be handled appropriately. Whatever flags or criteria were set in step 1 should be cleared so that the automated close process does not also close these bugs.
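Assuming the flag is a whiteboard entry of the form bug-cleaner-YYYY-MM-DD, detecting a response and clearing the flag might look like the sketch below. The shape of the comment data (a time and an author per comment) and the idea of a dedicated cleaner account are assumptions, not decisions.

```python
import re
from datetime import datetime
from typing import Iterable, Tuple

TAG_PATTERN = re.compile(r"\bbug-cleaner-\d{4}-\d{2}-\d{2}\b")


def has_response(ping_time: datetime,
                 comments: Iterable[Tuple[datetime, str]],
                 cleaner_account: str) -> bool:
    """True if anyone other than the cleaner account commented after the ping."""
    return any(when > ping_time and author != cleaner_account
               for when, author in comments)


def clear_tag(whiteboard: str) -> str:
    """Remove the bug-cleaner entry and tidy up leftover whitespace."""
    return " ".join(TAG_PATTERN.sub("", whiteboard).split())
```

Note that this cannot tell a "still wanted" reply from a "no longer wanted" one, so somebody would still have to read the responses before the flag is cleared.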

Note
Perhaps, as part of the notification in Step 1, we could encourage developers who wish to claim m-w packages to unset the relevant flags themselves so as to reduce load when possible. But only if they intend to follow through with the bug to see it closed.

This will also need to address the issue of finding a maintainer (either a project, individual developer, or proxy-maintainer).

Closing bugs

This should be mostly similar to Step 1, though using either a second script or the first with another option. Perhaps include an "are you sure" prompt or flag, as well as a --pretend option to check that it will do what we want. This should naturally be run in batches mirroring Step 1, which should address the infra load concern.

Again, this should list what search criteria are being used to identify the bugs, whether they're being searched live by the script or if it's fed an export or list; and how the changes are actually made to Bugzilla (again, dev-python/pybugz?).
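A sketch of how the closing step could pick its bugs, again assuming a bug-cleaner-YYYY-MM-DD whiteboard entry and a 30-day timeout. The close callback stands in for whatever actually updates Bugzilla, and pretend defaults to on so that nothing is changed unless explicitly requested; all names here are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Callable, Dict, List, Optional

TAG_PATTERN = re.compile(r"\bbug-cleaner-(\d{4}-\d{2}-\d{2})\b")


def ping_date(whiteboard: str) -> Optional[date]:
    """Extract the date from the bug-cleaner whiteboard entry, if there is one."""
    match = TAG_PATTERN.search(whiteboard)
    return date.fromisoformat(match.group(1)) if match else None


def close_expired(whiteboards: Dict[int, str], today: date,
                  close: Callable[[int], None],
                  timeout_days: int = 30, pretend: bool = True) -> List[int]:
    """Return the bug ids whose timeout has passed; call close() on each unless pretending."""
    expired = []
    for bug_id, whiteboard in sorted(whiteboards.items()):
        pinged = ping_date(whiteboard)
        if pinged is not None and today >= pinged + timedelta(days=timeout_days):
            expired.append(bug_id)
            if not pretend:
                close(bug_id)
    return expired
```

Bugs whose flag was cleared during triage carry no entry and are therefore skipped, which is what keeps responded-to bugs open.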

Again, the script should be included or linked, and a usage example given such as:

user $ bug-cleaner.sh --close-them --year=1234 --really-do-it

oh look, we have some more example output here!
blah blah blah blah :D

Back-out strategy (reversal)

Given the quantity of changes, should there be a reversal process? Accidents happen, and the ability to undo changes made automatically is a reasonable necessity. This would also be linked to step 1 and 4 in that however the script(s) are fed could also be used. Perhaps, alternatively, a different WHITEBOARD flag could be added at closure (or even just leave an existing one in place) so as to be able to search them again.

Also, logging for whatever scripted process?
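One option for logging that would also support a back-out is a simple journal with one JSON line per change, recording the field values before and after. The sketch below is an assumption about how that might look; the field names in the entries are whatever the script chooses to record.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def log_change(log: TextIO, bug_id: int,
               before: Dict[str, str], after: Dict[str, str]) -> None:
    """Append one JSON line recording the fields as they were and as they now are."""
    log.write(json.dumps({"bug_id": bug_id, "before": before, "after": after}) + "\n")


def reversal_plan(log: TextIO) -> Iterator[Dict]:
    """Yield, newest first, the field values needed to restore each logged bug."""
    entries = [json.loads(line) for line in log if line.strip()]
    for entry in reversed(entries):
        yield {"bug_id": entry["bug_id"], "restore": entry["before"]}
```

Replaying the journal in reverse order gives the list of bugs to restore and the values to restore them to, independently of any whiteboard search.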

Repeatability or ongoing automation

Are there any plans to have this as a repeatable process, or to have some bot or cronjob running that will automatically close bugs after a timeout or something? Other distributions run similar processes on their systems (see Launchpad and Fedora/Red Hat Bugzilla), so I don't think it's unreasonable to do the same here.

Other concerns or issues

Anything else?