Project:Gentoostats

The Gentoostats project is tasked with the deployment, maintenance and continued development of "gentoostats", software that collects various statistics from Gentoo machines.

About Gentoostats
Gentoostats was written by Vikraman Choudhury as a Google Summer of Code 2011 project. It is written in Python and implements a client-server model. The server component is a WSGI web application built with the web.py framework. The client component uses the Portage API to collect various statistics from a Gentoo machine, encodes them as JSON and submits them to the server. Users can configure which information is transmitted, according to their privacy needs.
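As a rough illustration of the privacy-aware submission described above, the sketch below filters a dictionary of collected statistics against a user configuration and encodes the result as JSON. The field names, the configuration layout and the build_payload helper are hypothetical and are not taken from the gentoostats source; the real client gathers its values through the Portage API.

```python
import json

# Hypothetical statistics as the client might gather them; the real
# client obtains these values through the Portage API.
collected = {
    "ARCH": "amd64",
    "CFLAGS": "-O2 -pipe",
    "LASTSYNC": "Tue, 03 Jan 2017 01:00:00 +0000",
    "PACKAGES": ["sys-apps/portage", "dev-lang/python"],
}

# Hypothetical privacy configuration: True means "may be transmitted".
privacy = {"ARCH": True, "CFLAGS": True, "LASTSYNC": False, "PACKAGES": True}


def build_payload(stats, allowed):
    """Return a JSON document containing only the fields the user allows."""
    # Fields missing from the configuration are treated as private.
    filtered = {k: v for k, v in stats.items() if allowed.get(k, False)}
    return json.dumps(filtered, sort_keys=True)


payload = build_payload(collected, privacy)
print(payload)
```

Treating unlisted fields as private is a design choice of this sketch: a newly added statistic is not sent until the user opts in.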

Upload package build time statistics (veremit)
Instead of reporting build times in absolute terms, look into using a relative measure such as the SBU (standard build unit), see: https://en.wikipedia.org/wiki/Linux_From_Scratch#Standard_build_unit.
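A minimal sketch of such a relative measure, assuming the client has measured how long a fixed reference package takes to build on the same machine. The to_sbu helper and the sample timings are hypothetical; which package would serve as the reference on Gentoo is left open.

```python
def to_sbu(build_seconds, reference_seconds):
    """Express a build time as a multiple of the reference build time."""
    if reference_seconds <= 0:
        raise ValueError("reference build time must be positive")
    # float() keeps the division exact on Python 2 as well.
    return float(build_seconds) / reference_seconds


# Hypothetical timings: the reference package built in 120 s and the
# package of interest in 300 s on the same machine.
relative_time = to_sbu(300, 120)
print(relative_time)  # 2.5
```

Reporting the ratio rather than the raw seconds makes submissions from fast and slow machines roughly comparable.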

Distributed collection servers
Make multiple servers exchange and sync stats with each other. This is similar to the PGP keyservers; the goal is to distribute the load and mitigate DDoS attacks. The major problem is the collision of host UUIDs across multiple independent servers. The second problem is trust between servers, for which solutions exist. Since the database will grow large, some form of delta-sync will be necessary.
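One possible shape for the delta-sync is sketched below, assuming every record carries a modification timestamp and each server remembers when it last synced with a peer. The record layout and the delta_since and merge helpers are hypothetical. The sketch keeps the newer record for each UUID and does not address the UUID collision or trust problems mentioned above.

```python
def delta_since(store, since):
    """Return the records modified strictly after the given timestamp."""
    return {uuid: rec for uuid, rec in store.items() if rec["updated"] > since}


def merge(store, delta):
    """Apply a delta, keeping the newer record for each host UUID."""
    for uuid, rec in delta.items():
        current = store.get(uuid)
        if current is None or rec["updated"] > current["updated"]:
            store[uuid] = rec


# Hypothetical state of two servers, keyed by host UUID.
server_a = {
    "host-1": {"updated": 100, "arch": "amd64"},
    "host-2": {"updated": 250, "arch": "arm64"},
}
server_b = {"host-1": {"updated": 100, "arch": "amd64"}}

# Server B last synced at timestamp 200, so only host-2 is transferred.
delta = delta_since(server_a, since=200)
merge(server_b, delta)
print(sorted(server_b))
```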

Validity of the submitted samples
There's no mechanism to stop a malicious user from flooding the server with a large set of synthetic statistics reports, and there's no way to prove that submitted statistics come from an actual installation. This can be exploited to cause a denial of service or to skew the statistics. Some form of rate limiting and snapshotting may be useful.
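A minimal sketch of per-client rate limiting, assuming submissions can be keyed by something like the source address or the host UUID. The RateLimiter class is hypothetical and not part of gentoostats; it limits flooding from a single key but does nothing to prove that a report comes from a real installation.

```python
import time


class RateLimiter(object):
    """Allow at most `limit` submissions per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.history = {}   # key -> timestamps of accepted submissions

    def allow(self, key):
        now = self.clock()
        # Keep only the accepted submissions that are still inside the window.
        recent = [t for t in self.history.get(key, []) if now - t < self.window]
        accepted = len(recent) < self.limit
        if accepted:
            recent.append(now)
        self.history[key] = recent
        return accepted


# Two submissions per hour for each source address (illustrative values).
limiter = RateLimiter(limit=2, window=3600)
results = [limiter.allow("203.0.113.7") for _ in range(3)]
print(results)  # [True, True, False]
```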

TODO

 * Make an initial release of gentoostats server and add it to the tree
 * Work with infra about gentoostats deployment
 * Update Gentoostats, add a section for deploying private instances, improve the usage text

Gentoostats 2011

 * Code: https://gitweb.gentoo.org/proj/gentoostats.git/

Progress Reports:
 * Progress Report #1: https://archives.gentoo.org/gentoo-soc/message/2f9044ad5390b53a338fc9bca4bebda5
 * Progress Report #2: https://archives.gentoo.org/gentoo-soc/message/b345988ca5df929abb4f0f5b9aceb00c
 * Progress Report #3: https://archives.gentoo.org/gentoo-soc/message/845d373851b7b06f4ab7ce3662b15b4b
 * Progress Report #4: https://archives.gentoo.org/gentoo-soc/message/76a0eb1b38e9101ca44d5da7723dcf60
 * Progress Report #5: https://archives.gentoo.org/gentoo-soc/message/0179caaa96f8df9f4619a38d630c8cdb
 * Midterm Report: https://archives.gentoo.org/gentoo-soc/message/a982111423d18fb7a714526bf9052708
 * Progress Report #6: https://archives.gentoo.org/gentoo-soc/message/635ee0e2c9e3d599be5e9c05cd905f9c
 * Progress Report #7: https://archives.gentoo.org/gentoo-soc/message/606094a198354a2938b8b8a10f7b0cb5
 * Final Report: https://archives.gentoo.org/gentoo-soc/message/c90536fdd571898e6a15c6c7d9fa0c75

Gentoostats 2012
Apparently, there's another gentoostats project, based on Django and written as part of GSoC 2012:


 * Server: https://github.com/gg7/gentoostats_server
 * Client: https://github.com/gg7/gentoostats
 * Playground (??): https://github.com/vikraman/gentoostats-playground
 * Deployment bug: https://bugs.gentoo.org/show_bug.cgi?id=425056

Progress Reports:
 * Progress Report #1: https://archives.gentoo.org/gentoo-soc/message/a85db0776186d6e4fa032377af2c8634
 * Progress Report #2: https://archives.gentoo.org/gentoo-soc/message/b0be0d2f6a5c43457ef6cebd3f8e9b7b
 * Progress Report #3: https://archives.gentoo.org/gentoo-soc/message/1b45015692cecc31211f93de4bb701d0
 * Progress Report #4: https://archives.gentoo.org/gentoo-soc/message/1e1a675494bca49352097a0b25dd58f9
 * Progress Report #5: https://archives.gentoo.org/gentoo-soc/message/a8a0f843bd2b755f834b3f9eacdbf97b
 * Progress Report #6: https://archives.gentoo.org/gentoo-soc/message/e8a9ef1386d0bf29d922a86ee5332ea8
 * Progress Report #7: https://archives.gentoo.org/gentoo-soc/message/8e9fcbd3ab67cdc7c66c9aab87eea62f
 * Final Report: https://archives.gentoo.org/gentoo-soc/message/760cbd58a309b56f31d3697d90f44601

Find out why the code isn't being hosted on infra. Evaluate the functionality. Determine which version is to be deployed and maintained.

Attempting to deploy Gentoostats 2012
This is an ongoing effort to deploy this version of gentoostats on my local machine:

Package list:
 * dev-python/django-1.8.9
 * dev-python/django-extensions-1.6.1
 * dev-python/django-debug-toolbar-1.3.2
 * dev-python/django-tastypie-0.9.15

Steps:
 * Clone the repo, copy gentoostats/settings.py.example to gentoostats/settings.py and edit accordingly
 * "manage.py check" dies with "ImportError: No module named south"
 * south is hard masked
 * Comment out south from INSTALLED_APPS in settings.py
 * "manage.py check" dies with the following:

     File "/usr/lib64/python2.7/site-packages/tastypie/resources.py", line 2256, in ModelResource
       @transaction.commit_on_success
     AttributeError: 'module' object has no attribute 'commit_on_success'

   As a hackaroo, edit tastypie and replace "@transaction.commit_on_success" with "@transaction.atomic", see: https://github.com/macropin/django-registration/issues/51#issuecomment-100579391
   Do the same in gentoostats/receiver/views.py.
 * Initialize the database with "manage.py syncdb"

     /usr/lib64/python2.7/site-packages/django/core/management/commands/syncdb.py:24: RemovedInDjango19Warning: The syncdb command will be removed in Django 1.9

 * Run the server with "manage.py runserver"

     January 03, 2017 - 01:01:37
     Django version 1.8.9, using settings 'gentoostats.settings'
     Starting development server at http://127.0.0.1:8000/

   Dies with "ImportError: No module named transaction"
 * Comment out 'django.middleware.transaction.TransactionMiddleware' from MIDDLEWARE_CLASSES in settings.py, see: http://stackoverflow.com/a/33102743
 * Try to upload stats, dies with:

     INFO 2017-01-03 01:50:31,928 views 27822 140719396648704 process_submission: Error: Invalid date in LASTSYNC.
     Traceback (most recent call last):
       File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 88, in process_submission
         time.strptime(lastsync, "%a, %d %b %Y %H:%M:%S +0000")
       File "/usr/lib64/python2.7/_strptime.py", line 478, in _strptime_time
         return _strptime(data_string, format)[0]
       File "/usr/lib64/python2.7/_strptime.py", line 332, in _strptime
         (data_string, format))
     ValueError: time data u'Unknown' does not match format '%a, %d %b %Y %H:%M:%S +0000'

   This is due to using a git repo in /usr/portage, which doesn't contain the timestamp file, so the client sends the string "Unknown" after being patched similarly to https://gitweb.gentoo.org/proj/gentoostats.git/commit/?id=963afe1163125b8cbed08c0e8edea9a05a37510e.
   Patch it with

     -    if lastsync:
     +    if lastsync and lastsync != "Unknown":

   and add the following else statement to it:

          else:
              lastsync = None

 * Try to upload stats again, dies with:

     ERROR 2017-01-03 02:02:24,104 views 810 140115441485568 process_submission: 'NoneType' object has no attribute '__getitem__'
     Traceback (most recent call last):
       File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 369, in accept_submission
         return process_submission(request)
       File "/usr/lib64/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
         return view_func(*args, **kwargs)
       File "/usr/lib64/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
         return func(*args, **kwargs)
       File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 159, in process_submission
         country      = GeoIP.country_name(ip_addr),
       File "/usr/lib64/python2.7/site-packages/django/contrib/gis/geoip/base.py", line 190, in country_name
         return self.city(query)['country_name']
     TypeError: 'NoneType' object has no attribute '__getitem__'

   Comment out the call to "GeoIP.country_name(ip_addr)" in gentoostats/receiver/views.py for now.
 * Try to upload stats again, dies with:

     ERROR 2017-01-03 02:26:40,646 views 13386 140066905986816 process_submission: Cannot assign "u''": "Submission.sync" must be a "SyncServer" instance.
     Traceback (most recent call last):
       File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 369, in accept_submission
         return process_submission(request)
       File "/usr/lib64/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
         return view_func(*args, **kwargs)
       File "/usr/lib64/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
         return func(*args, **kwargs)
       File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 186, in process_submission
         sync         = sync,
       File "/usr/lib64/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
         return getattr(self.get_queryset(), name)(*args, **kwargs)
       File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 346, in create
         obj = self.model(**kwargs)
       File "/usr/lib64/python2.7/site-packages/django/db/models/base.py", line 468, in __init__
         setattr(self, field.name, rel_obj)
       File "/usr/lib64/python2.7/site-packages/django/db/models/fields/related.py", line 642, in __set__
         self.field.rel.to._meta.object_name,
     ValueError: Cannot assign "u''": "Submission.sync" must be a "SyncServer" instance.

   This is due to gentoostats expecting the SYNC variable in make.conf, for example: SYNC="rsync://rsync.gentoo.org/gentoo-portage"
   From gentoostats/stats/models.py:

     sync = models.ForeignKey(SyncServer, blank=True, null=True, related_name='+')

   Set sync to None for now:

     @@ -145,6 +147,7 @@ def process_submission(request):
          validate_item(lang)
          sync = data.get('SYNC')
     +    sync = None
          if sync:
              sync, _ = SyncServer.objects.get_or_create(url=sync)
              validate_item(sync)

 * Try to upload stats again, voilà!
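The LASTSYNC workaround from the steps above can be sketched as a standalone helper. This is an illustration only, not the actual code in gentoostats/receiver/views.py: the parse_lastsync name is hypothetical, and only the format string and the "Unknown" placeholder come from the traceback.

```python
import time

# Format string taken from the traceback above.
LASTSYNC_FORMAT = "%a, %d %b %Y %H:%M:%S +0000"


def parse_lastsync(value):
    """Return a struct_time for a valid LASTSYNC string, else None."""
    # A git-based tree has no timestamp file, so the client sends "Unknown".
    if not value or value == "Unknown":
        return None
    try:
        return time.strptime(value, LASTSYNC_FORMAT)
    except ValueError:
        # Any other malformed value is also treated as "no sync date".
        return None


print(parse_lastsync("Unknown"))
print(parse_lastsync("Tue, 03 Jan 2017 01:00:00 +0000").tm_year)
```

Returning None for every unparsable value lets the submission proceed without a sync date instead of failing, at the cost of silently dropping malformed input.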