Project:Gentoostats

Gentoostats
Description: Gentoostats project maintains and develops the "gentoostats" statistics collection software for Gentoo machines
Email: gentoostats@gentoo.org
Lead(s):
Last elected: 2017/01/02
Members (and inherited members):
Subprojects: (none)
Parent Project: Gentoo Linux

The Gentoostats project is tasked with the deployment, maintenance and continued development of "gentoostats", software that collects various statistics from Gentoo machines.

About Gentoostats

Gentoostats was written by Vikraman Choudhury as a Google Summer of Code 2011 project. It is written in Python and implements a client-server model. The server component is a WSGI web application built with the web.py framework. The client component uses the Portage API to collect various statistics from a Gentoo machine, encodes them as JSON and submits them to the server. Users can configure which information is transmitted, according to their privacy needs.
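
As a rough illustration of the client side described above, the following sketch filters collected statistics according to a user's privacy settings and encodes the result as JSON. The build_payload helper, the field names and the sample values are hypothetical and are not taken from the gentoostats source; the real client gathers its values through the Portage API.

```python
import json


def build_payload(collected, enabled):
    """Keep only the statistics the user agreed to share and encode them as JSON.

    `collected` maps a statistic name to its value; `enabled` maps a statistic
    name to True/False. Names missing from `enabled` are withheld by default.
    """
    shared = {name: value for name, value in collected.items()
              if enabled.get(name, False)}
    return json.dumps(shared, sort_keys=True)


# Hypothetical values standing in for what the Portage API would report.
collected = {
    "PLATFORM": "Linux-x86_64",
    "LASTSYNC": "Tue, 03 Jan 2017 00:30:00 +0000",
    "PACKAGES": ["app-editors/vim", "sys-apps/portage"],
}
# Hypothetical privacy settings: the package list is withheld.
enabled = {"PLATFORM": True, "LASTSYNC": True, "PACKAGES": False}

payload = build_payload(collected, enabled)
```

Withholding anything that is not explicitly enabled is a design choice of this sketch, made so that a newly added statistic is never sent without the user opting in.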

Warning
This page is a work in progress by gokturk (talk | contribs). Treat its contents with caution.

Suggested Features

Upload package build time statistics (veremit)

Instead of reporting it in absolute time, look into using a relative measure like SBU (see: https://en.wikipedia.org/wiki/Linux_From_Scratch#Standard_build_unit).
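
A minimal sketch of the idea, assuming the client also times one agreed reference package on the same machine and reports every other build as a multiple of it. The to_sbu helper, the package names and the timings are made up for illustration.

```python
def to_sbu(build_seconds, reference_seconds):
    """Express a build time as a multiple of the reference build time."""
    if reference_seconds <= 0:
        raise ValueError("reference build time must be positive")
    return build_seconds / reference_seconds


# Made-up timings, in seconds, all measured on the same machine.
reference = 120.0
timings = {"app-editors/vim": 90.0, "sys-devel/gcc": 3000.0}

relative = {pkg: round(to_sbu(secs, reference), 2)
            for pkg, secs in timings.items()}
```

Because both numbers come from the same machine, the ratio stays roughly comparable between fast and slow hosts, which absolute times are not.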

Distributed collection servers

Make multiple servers exchange and synchronize statistics with each other. This is similar to PGP keyservers; the goal is to distribute the load and mitigate DDoS attacks. The major problem is the collision of host UUIDs across multiple independent servers. The second problem is trust between servers, for which solutions exist. Since the database will grow large, some form of delta-sync will be necessary.
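
One possible shape for such a delta-sync, sketched under strong assumptions: every record carries a timestamp, each server remembers when it last synced with a peer, and the newer record wins per UUID. The data layout and both helpers are hypothetical. The sketch treats equal UUIDs as the same host, so it does not address the UUID collision or trust problems.

```python
def delta_since(records, last_sync):
    """Return only the records that changed after `last_sync`.

    `records` maps a host UUID to a (timestamp, stats) tuple.
    """
    return {uuid: rec for uuid, rec in records.items() if rec[0] > last_sync}


def merge(local, incoming):
    """Merge a peer's delta into the local store, keeping the newer record per UUID."""
    merged = dict(local)
    for uuid, rec in incoming.items():
        if uuid not in merged or rec[0] > merged[uuid][0]:
            merged[uuid] = rec
    return merged


# Made-up stores on two servers; timestamps are plain integers for brevity.
server_a = {"uuid-1": (100, {"pkgs": 500}), "uuid-2": (250, {"pkgs": 720})}
server_b = {"uuid-1": (300, {"pkgs": 510})}

# Server B last synced with A at time 200, so only uuid-2 is transferred.
delta = delta_since(server_a, last_sync=200)
server_b = merge(server_b, delta)
```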

Open Problems

Validity of the submitted samples

There is no mechanism to stop a malicious user from flooding the server with a large set of synthetic statistics reports, and there is no way to prove that submitted statistics come from an actual installation. This can be exploited to cause a denial of service or to skew the statistics. Some form of rate limiting and snapshotting may be useful.
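
As one possible form of rate limiting, the sketch below allows a fixed number of submissions per client within a sliding time window. The class, the limits and the choice to key on the client IP address are assumptions for illustration. It only slows flooding down; it does not prove that a submission comes from a real installation.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` submissions per `window` seconds for each client key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self._seen = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        stamps = self._seen[key]
        # Forget submissions that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


# Demonstration with a fake clock: two submissions per hour per client.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window=3600, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(3)]
fake_now[0] = 3600.0
later = limiter.allow("203.0.113.7")
```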

TODO

  • Make an initial release of gentoostats server and add it to the tree
  • Work with infra about gentoostats deployment
  • Update Gentoostats, add a section for deploying private instances, improve the usage text

Discussion regarding which version to deploy

Pros of Gentoostats 2011:
  • Simplistic design based on web.py with small number of dependencies
  • With near to no maintenance over 5 years, it was still almost completely functional
  • Submitting stats is much faster compared to Gentoostats 2012
Pros of Gentoostats 2012:
  • Based on django and has potential for richer web interfaces
  • Use of django models provides a good data abstraction and makes the solution independent of a particular SQLDB
Cons of Gentoostats 2011:
  • Archaic web interface
  • Directly deals with SQL queries over web.py, no abstraction
  • No python3 support for web.py yet
Cons of Gentoostats 2012:
  • Maintenance becomes a burden. Need to keep up with django upgrades.
  • Submitting statistics works way more slowly
  • Increased code complexity

Gentoostats 2011

Progress Reports:

Gentoostats 2012

Apparently, there's another gentoostats project based on django written as part of GSoC 2012:

Progress Reports:

Find out why the code isn't being hosted on infra. Evaluate the functionality. Determine which version is to be deployed and maintained.

Attempting to deploy Gentoostats 2012

This is an ongoing effort to deploy this version of gentoostats on my local machine:

Package list:

  • dev-python/django-1.8.9
  • dev-python/django-extensions-1.6.1
  • dev-python/django-debug-toolbar-1.3.2
  • dev-python/django-tastypie-0.9.15

Steps:

  • Clone the repo, copy gentoostats/settings.py.example to gentoostats/settings.py and edit accordingly
  • "manage.py check" dies with "ImportError: No module named south"
    • south is hard masked
    • comment out south from INSTALLED_APPS in settings.py
  • "manage.py check" dies with the following:
File "/usr/lib64/python2.7/site-packages/tastypie/resources.py", line 2256, in ModelResource @transaction.commit_on_success()
AttributeError: 'module' object has no attribute 'commit_on_success'

As a workaround, edit tastypie and replace "@transaction.commit_on_success" with "@transaction.atomic" (see: https://github.com/macropin/django-registration/issues/51#issuecomment-100579391). Do the same in gentoostats/receiver/views.py.

  • Initialize the database with "manage.py syncdb"
 /usr/lib64/python2.7/site-packages/django/core/management/commands/syncdb.py:24: RemovedInDjango19Warning: The syncdb command will be removed in Django 1.9
  • Run the server with "manage.py runserver"
January 03, 2017 - 01:01:37
Django version 1.8.9, using settings 'gentoostats.settings'
Starting development server at http://127.0.0.1:8000/

Dies with "ImportError: No module named transaction"

  • Try to upload stats, dies with:
INFO 2017-01-03 01:50:31,928 views 27822 140719396648704 process_submission(): Error: Invalid date in LASTSYNC.
Traceback (most recent call last):
 File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 88, in process_submission
   time.strptime(lastsync, "%a, %d %b %Y %H:%M:%S +0000")
 File "/usr/lib64/python2.7/_strptime.py", line 478, in _strptime_time
   return _strptime(data_string, format)[0]
 File "/usr/lib64/python2.7/_strptime.py", line 332, in _strptime
   (data_string, format))
ValueError: time data u'Unknown' does not match format '%a, %d %b %Y %H:%M:%S +0000'

This happens when /usr/portage is a git repository, which does not contain the timestamp file, so the client sends the string "Unknown" after being patched similarly to https://gitweb.gentoo.org/proj/gentoostats.git/commit/?id=963afe1163125b8cbed08c0e8edea9a05a37510e. Patch gentoostats/receiver/views.py with

-    if lastsync:
+    if lastsync and lastsync != "Unknown":

and add the following else statement to it:

else:
    lastsync = None
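
Pulled out of the view, the patched check could look roughly like the standalone function below. The format string is the one from the traceback; the parse_lastsync name and the idea of isolating the check in a helper are assumptions, not how the gentoostats code is organized. Other malformed values still raise ValueError, as before the patch.

```python
import time

# Format string taken from the traceback above.
LASTSYNC_FORMAT = "%a, %d %b %Y %H:%M:%S +0000"


def parse_lastsync(lastsync):
    """Return a struct_time for a usable LASTSYNC value, or None if there is none."""
    if lastsync and lastsync != "Unknown":
        return time.strptime(lastsync, LASTSYNC_FORMAT)
    # Missing value or the "Unknown" placeholder sent for git-based trees.
    return None


parsed = parse_lastsync("Tue, 03 Jan 2017 00:30:00 +0000")
missing = parse_lastsync("Unknown")
```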
  • Try to upload stats again, dies with:
ERROR 2017-01-03 02:02:24,104 views 810 140115441485568 process_submission(): 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
 File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 369, in accept_submission
   return process_submission(request)
 File "/usr/lib64/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
   return view_func(*args, **kwargs)
 File "/usr/lib64/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
   return func(*args, **kwargs)
 File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 159, in process_submission
   country       = GeoIP().country_name(ip_addr),
 File "/usr/lib64/python2.7/site-packages/django/contrib/gis/geoip/base.py", line 190, in country_name
   return self.city(query)['country_name']
TypeError: 'NoneType' object has no attribute '__getitem__'

Comment out the call to "GeoIP().country_name(ip_addr)" in gentoostats/receiver/views.py for now.
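
A gentler alternative to commenting the call out might be to wrap it so that an unresolvable address yields no country instead of an exception. This is only a sketch: safe_country_name and stub_lookup are hypothetical, the stub stands in for GeoIP().country_name so the example runs without a GeoIP database, and whether the country field accepts an empty value is an assumption.

```python
def safe_country_name(lookup, ip_addr):
    """Call `lookup(ip_addr)` and return None when no country can be resolved."""
    try:
        return lookup(ip_addr)
    except TypeError:
        # Same failure as in the traceback: the city record was None.
        return None


def stub_lookup(ip_addr):
    """Stand-in for GeoIP().country_name that fails the same way for unknown IPs."""
    cities = {"198.51.100.4": {"country_name": "Exampleland"}}
    city = cities.get(ip_addr)
    return city["country_name"]  # raises TypeError when city is None


local = safe_country_name(stub_lookup, "127.0.0.1")
known = safe_country_name(stub_lookup, "198.51.100.4")
```

In the view, the lookup argument would be GeoIP().country_name, which is the call shown in the traceback.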

  • Try to upload stats again, dies with:
ERROR 2017-01-03 02:26:40,646 views 13386 140066905986816 process_submission(): Cannot assign "u": "Submission.sync" must be a "SyncServer" instance.
Traceback (most recent call last):
 File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 369, in accept_submission
   return process_submission(request)
 File "/usr/lib64/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
   return view_func(*args, **kwargs)
 File "/usr/lib64/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
   return func(*args, **kwargs)
 File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 186, in process_submission
   sync          = sync,
 File "/usr/lib64/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
   return getattr(self.get_queryset(), name)(*args, **kwargs)
 File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 346, in create
   obj = self.model(**kwargs)
 File "/usr/lib64/python2.7/site-packages/django/db/models/base.py", line 468, in __init__
   setattr(self, field.name, rel_obj)
 File "/usr/lib64/python2.7/site-packages/django/db/models/fields/related.py", line 642, in __set__
   self.field.rel.to._meta.object_name,
ValueError: Cannot assign "u": "Submission.sync" must be a "SyncServer" instance.

This is due to gentoostats expecting the SYNC variable in make.conf. From gentoostats/stats/models.py:

# make.conf example: SYNC="rsync://rsync.gentoo.org/gentoo-portage"                                                                 
sync = models.ForeignKey(SyncServer, blank=True, null=True, related_name='+')

Set sync to None for now:

@@ -145,6 +147,7 @@ def process_submission(request):
         validate_item(lang)
 
     sync = data.get('SYNC')
+    sync = None
     if sync:
         sync, _ = SyncServer.objects.get_or_create(url=sync)
         validate_item(sync)
  • Try to upload stats again, voilà!