Tinderbox log collection and analysis
This wiki page documents the progress of the "Log collection and analysis for the tinderbox" Google Summer of Code 2013 project. The following are excerpts from the original project proposal, to be updated.
- 1 Motivation and description of features
- 2 Deliverables
- 3 Timeline
Motivation and description of features
Tinderboxing has been greatly beneficial to both Gentoo developers and users by continuously building and testing packages, reporting bugs and issues, thus improving their quality. However, some aspects of the tinderbox workflow could be improved. This project would specifically target log collection and analysis.
Currently, tinderbox instances submit build logs from Portage to a collector&analyser, using a combination of shell scripting, tar, netcat and Ruby, which are then processed, stored and made available through a web interface. The solution depends on external (and most importantly, non-free, in both senses of the term) Amazon SimpleDB storage, and also lacks some useful features.
I would implement a log collector that:
- supports collection from multiple hosts and uses IPv6
- is able to group log files: this would be used to group logs belonging to the same package build
- has an analyser which preprocesses the logs, checking for and marking error and warning messages, before storing them
- uses regular expressions in the analyser to detect issues
- uses a standalone storage backend instead of Amazon's services for the log files, and another database containing metadata records for the files
- has a web-based frontend, with authentication, which displays a list of logs that show errors or issues, with optional filtering capabilities
- has Bugzilla integration in the frontend, so that a list of open bugs for a certain package can be displayed, new bugs can be filed from the collected logs, and the logs themselves (or compressed variants) can be attached to bugs
- is integrated with Portage, meaning that it would perform the log submission, and also supply information on why a build failed
- is written in Python and, at least on the client (log provider) side, low on dependencies, as it has to be used in a tinderbox environment, where packages can often break
- is modular: it should be possible to adapt the system for various types of log providers, analysers and frontend models by writing modules/plugins, but implementing them for anything other than Portage and Bugzilla is not within the scope of this project
A nice corollary of the grouping feature and Portage integration would be an extra feature for the package manager that would help bug reporters. I would implement a command that produces a tarball of logs for the last package that failed to build. This would include the output of "emerge --info", amongst others, and thus reduce the amount of work a reporter has to do to supply information about a bug.
The work would also serve as a less complex alternative to systems such as Apache Flume, Logstash or Scribe, which could not be used in the tinderbox scenario as they either are written in Java or have many dependencies.
- a log collection system with the features described above
- documentation for the code, guides for setting up and usage
- optionally, a guide on writing plugins for the collector
- a Portage feature to produce a tarball of logs
17th June - 30th June
build a skeleton of the functionality: a simple client that submits files, a daemon that receives them, collects trivial information (like hostname), passes them through a no-op analyser, and stores in the filesystem; then add a proper analyser
1st July - 14th July
add support for log groups, add the storage backend and database, and a simple frontend which only displays logs and information about them
15th July - 28th July
integration with Portage: integrate the submission client, then add support for consolidating multiple logs; finally, provide information why build failed
29th July - 2nd August
3rd August - 18th August
write the log tarball feature and implement Bugzilla integration for the frontend
19th August - 1st September
work on optional bits: filtering capabilities for the frontend and any extra features that might become desirable while doing the project
2nd September - 15th September
testing and documentation
16th September - 23rd September
pencils down: finish up everything, this also gives some buffer time in case of unexpected difficulties
24th September - 27th September