Graduated and employed

Oh my, it's been so long since I last checked in here. Since then, I've graduated from the HvA and the kind folks at Nikhef have offered me a job. In September I hope to start my master's in computer science at the Vrije Universiteit.

I should also mention I attended HAR, EGEE, and RER2009 last year. And to top it off, the week before Christmas I visited CERN for the all-hands developers meeting. I got the t-shirt and everything. Project Euler, Insecure Programming By Example (with great help from Mishou and some No Starch Press books) and Securitytube are what's keeping me busy besides that.

I helped my friends set up a music-related blog at phonophanatic.nl, which is already doing quite well; if you have an off-beat taste in music, I recommend you check it out. Finally, here's a pic of me and my colleague with Bob Jones, the EGEE project director.


Using an Aladdin eToken to store your SSH keys

This past week I have been trying out an Aladdin eToken on my Ubuntu desktop. At Nikhef they are used to store X.509 certificates holding Grid credentials, and to make proxy certificates from those. (A proxy certificate is a shorter-lived certificate signed by the original.) My ultimate goal was to use the token to store the private keys I use most: my SSH keys.

After peeking around the Nikhef Gridwiki and downloading the eToken PKI Client 4.55 from aladdin.ru, pkcs11-tool confirmed the token was loaded and recognized by the system. I was also able to store my grid certificate on it following the instructions provided, and I loaded /usr/lib/libeToken.so into Firefox as a security device. After restarting Firefox, things took a turn for the worse...
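For reference, that check looked roughly like this (a sketch; pkcs11-tool comes with OpenSC, and the module path is the one installed by the Aladdin package):

aczid@maggie:~$ pkcs11-tool --module /usr/lib/libeToken.so --list-slots
aczid@maggie:~$ pkcs11-tool --module /usr/lib/libeToken.so --list-objects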

It turns out the Aladdin client ships some binaries of its own, which appear to interfere with Firefox (for example by overriding the LD_LIBRARY_PATH environment variable), and it ships an old version of libnss3 that Firefox needs. This crashes Firefox with a "could not initialize XPCOM" message. A colleague helped me locate the offending libraries in /usr/lib/eToken/nss_tools (using strace) and fix them: the shipped files should be removed and replaced by symlinks to the system certutil and modutil binaries. This fixes the problem! (Thanks again, Willem)
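Roughly, the fix looks like this (a sketch; I'm assuming the shipped binaries really are named certutil and modutil, and that the system versions from the libnss3-tools package live in /usr/bin):

aczid@maggie:~$ sudo mv /usr/lib/eToken/nss_tools/certutil /usr/lib/eToken/nss_tools/certutil.orig
aczid@maggie:~$ sudo mv /usr/lib/eToken/nss_tools/modutil /usr/lib/eToken/nss_tools/modutil.orig
aczid@maggie:~$ sudo ln -s /usr/bin/certutil /usr/lib/eToken/nss_tools/certutil
aczid@maggie:~$ sudo ln -s /usr/bin/modutil /usr/lib/eToken/nss_tools/modutil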

After getting past this puzzling error, I was free to play around with the token some more. As for OpenSSH, it appears the version shipped with Ubuntu is not built with smartcard support (a smartcard being what the token pretends to be). There are two ways around this: recompile OpenSSH with the --with-opensc flag, or apply a PKCS#11 patch to OpenSSH. The OpenSC option seemed the easier route. Getting the source packages built on Ubuntu was as simple as:

aczid@maggie:~$ apt-get source openssh; apt-get build-dep openssh

If you get errors about missing public keys, look them up with gpg --keyserver keyserver.ubuntu.com --search-keys <name of maintainer>. Note that importing a key this way implies you trust this person and his or her public key!

aczid@maggie:~$ cd openssh-5.1p1/
aczid@maggie:~/openssh-5.1p1$ vi debian/rules 

Add the line confflags += --with-opensc=/usr/ and build the package with dpkg-buildpackage. Your SSH will now understand OpenSC, which relies on PKCS#15, but the card is still formatted for PKCS#11. Luckily, the token can hold files for both APIs simultaneously. Detailed instructions for accomplishing this with a different kind of token Worked For Me (TM); I suppose once you have the PKCS#11/#15 tools working, they are all more or less the same.
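For the record, the build step itself is just (a sketch; -us -uc skips signing the packages and -b builds only the binary packages):

aczid@maggie:~/openssh-5.1p1$ dpkg-buildpackage -us -uc -b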

Now that you know how to put your certificate on the token using both APIs, we can finally use OpenSSH! Extract the public key from your certificate, place it in a remote server's authorized_keys file, and launch your newly built ssh with:

aczid@maggie:~/openssh-5.1p1$ ./debian/openssh-client/usr/bin/ssh -I0 <some host>

If you would like to use your existing keys (like the ones generated by ssh-keygen), you need to generate a self-signed certificate from them and put that on the token (see the sketch below). Then you can use ssh without even needing a .ssh directory to store your private key! Unfortunately, I was still unable to use the token to generate grid proxy certificates on Ubuntu.
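As for generating that self-signed certificate from an existing key: OpenSSL can do it, and storing the result on the token then follows the same PKCS#11/PKCS#15 import steps as for the grid certificate. A sketch (the output filename and the 365-day validity are arbitrary):

aczid@maggie:~$ openssl req -new -x509 -days 365 -key ~/.ssh/id_rsa -out ssh_key_cert.pem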


Started my internship at Nikhef

About two weeks ago I started my internship at Nikhef, the Dutch institute for subatomic physics. Up until now it has mostly been a dizzying experience. I'm learning to cope with a whole new world, namely that of Grid computing. That wiki page is actually not very specific as to scale, so perhaps the GridPP introduction page can convey it better. Nikhef (together with SARA) has a Tier-1 site, which means they can provide computing power and storage for one tenth of the data generated by the LHC experiments. If nothing else, this map showing the grid sites should impress you.



But you still know nothing of my tiny, (so far) insignificant role in all this. It's to design and develop the EES. It involves the redesign of an existing (sudo-like) pluggable software library, and it has to stay backwards-compatible. And oh yeah, it's in C. I hope you'll understand now why I feel like my head is spinning. I've just about given up on reading all the related articles linked from the LHC Wikipedia entry in my first week. I feel right in my place, but not very useful yet. The Grid infrastructure is so huge that I only have a vague, high-level idea of how it's all supposed to work. I have heard new acronyms every day for the past two weeks, and most of them still lack a real definition for me. In the meantime I've been coding several prototypes (more like examples, or exercises) of how parts of the project should work.

Blogging everything I learn there would be impossible, but you can follow me on Twitter to keep up with my progress. My direct supervisor / boss is also on there. I hope to get a blog post out every week or so again.


Nikto web site scanner

Today I was looking for an automated way to find security-related server misconfigurations on my website, and found a really nice tool called Nikto that does just that. In fact, it was so helpful it showed me I was allowing directory indexing through Apache where I didn't want to.
Here is an example of its use.

aczid@aczid:~$ nikto -host blog.aczid.nl
---------------------------------------------------------------------------
- Nikto 2.02/2.03     -     cirt.net
+ Target IP:       127.0.1.1
+ Target Hostname: blog.aczid.nl
+ Target Port:     80
+ Start Time:      2009-01-25 0:16:00
---------------------------------------------------------------------------
+ Server: nginx/0.6.32
+ OSVDB-0: Retrieved X-Powered-By header: Phusion Passenger (mod_rails/mod_rack) 2.0.6
- /robots.txt - contains 1 'disallow' entry which should be manually viewed. (GET)
+ OSVDB-0: GET /?mod=some_thing&op=browse : Sage 1.0b3 reveals system paths with invalid module names.
+ OSVDB-3092: GET /sitemap.xml : This gives a nice listing of the site content.
+ OSVDB-3092: GET /archives/ : This might be interesting...
+ OSVDB-3092: GET /stats/ : This might be interesting...
+ 2967 items checked: 6 item(s) reported on remote host
+ End Time:        2009-01-25 0:16:00 (23 seconds)
---------------------------------------------------------------------------
+ 1 host(s) tested

And remember, Gort! Klaatu barada nikto!


Setting up AWstats under Apache to parse Nginx log files

After moving most of my Ruby apps onto Apache, I got back to the idea of doing my own log analysis. Although the Google Analytics urchin is nice, I prefer something more unobtrusive. I have used AWstats in the past and was always quite impressed with its feature set, so after installing Apache again I figured I would give it a go. I prefer the packaged AWstats because that way it will (hopefully) update automatically through apt.

This post on AWstats by Sami Dalouche was really helpful in getting my head around the packaged AWstats configuration on Debian. It shows you how to make AWstats virtual hosts for every domain on your site, how to set up a cronjob, and how to prevent logrotate from rotating the logs before AWstats has run.

So, after you have copied a new awstats.<yourvhost>.conf file into /etc/awstats/, set the following variables. (This of course implies you have pointed the access_log to the same location, /var/log/nginx/<yourvhost>-access.log.)

LogFile="/var/log/nginx/<yourvhost>-access.log"
SiteDomain="<yourvhost>"
HostAliases="<yourvhost>"
DirData="/var/lib/awstats/<yourvhost>"

And set the logformat for Nginx:

LogFormat="%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"

Now for the Apache Virtualhost, which is also largely boilerplate.

<VirtualHost *:80>
        ServerName <yourvhost>
        Alias /awstatsclasses "/usr/share/awstats/lib/"
        Alias /awstats-icon/ "/usr/share/awstats/icon/"
        Alias /awstatscss "/usr/share/doc/awstats/examples/css"
        ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
        ScriptAlias /stats /usr/lib/cgi-bin/awstats.pl
        # This is the important bit. It tells AWstats to use the vhost config you defined in the .conf file, rather than try to parse the Apache log.
        SetEnv AWSTATS_FORCE_CONFIG <yourvhost>
</VirtualHost>

If the website you want to monitor is already being served from Apache, and Nginx proxies requests for nonexistent files to Apache, you are probably done. Go check <yourvhost>/stats/! If the site is served by Nginx only (static assets), you will have to add a proxy_pass in your Nginx virtual host for the four paths in the VirtualHost definition that AWstats needs (see the sketch below). I hope this helps people who are struggling to get this set up! It took me about a day in total, I think. Now I have fancy separate stats for all my public Nginx websites!
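For that Nginx-only case, the proxy rules could look something like this (a sketch; it assumes Apache is listening on 127.0.0.1:8080, so adjust the address to your own setup):

location ~ ^/(stats|awstatsclasses|awstats-icon|awstatscss) {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
}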


Keystroke Dynamics Ruby gem

The KSD gem 0.0.1 is out! This is my simple keystroke dynamics library for Ruby GTK widgets. Developers can help out on GitHub.

Here are some screenshots of the included examples.

Enroll with login

or

Enroll with sentences

Try to log in

If you do it right, you will see something like:

Verified user aczid with mean accuracy of: 0.585
Logged in successfully!

Update: Apparently somebody in China found this cool enough to blog about it! (translated)


Reversing 2D-array axes in Ruby

Recently, I was working on a project that imports some CSV data into a dynamic database table and needs to sort an array of floats. Along the way, I caught myself writing something curious:

    rows = @table_class.all
    rows.each do | row |
      key = row.primary_key.to_sym
      @matches[key] = []
      row.instance_variables.each do | column |
        unless ['@id', '@repository','@primary_key','@original_values', '@new_record','@collection', '@updated_at'].include? column
          x = row.instance_variable_get(column)
          y = column.gsub(/@/, '') 
          @matches[key] << {:x => x, :y => y}
        end
      end 
      @matches[key] = @matches[key].sort_by { |match| match[:y] }
    end
    @matches

Sorting in Ruby! This smells bad. I put the data in a database for this?

The solution
The solution was to reverse the axis of the imported data, thereby enabling MySQL to sort the data for us. Instead of doing:

    n=0
    @parsed_file.each do | row |
      hash = row2hash(row)
      unless @table_class.first(:primary_key => hash[:primary_key])
        instance = @table_class.new(hash)
        if instance.save
          n+=1
          GC.start if n%50 == 0
        end
      end
    end

We can parse the file with the axes inverted:

    values = {}
    # The first row of the CSV holds the primary keys; walk it with its index.
    @parsed_file[0].enum_with_index.map do |primary_key, idx|
      if primary_key
        pk = primary_key.to_sym
        # For every row, the first cell names the attribute and the cell at
        # this primary key's index (idx) holds its value.
        @parsed_file.collect do |row|
          if row[0]
            values[pk] = {} unless values[pk].is_a?(Hash)
            values[pk][row[0].to_sym] = row[idx]
          end
        end
      end
    end
    n = 0 
    values.keys.each do |key|
      if values[key]
        unless @table_class.first(:primary_key => key.to_s)
          instance = @table_class.new(values[key])
          instance.primary_key = key
          if instance.save
            n+=1
            GC.start if n%50 == 0
          end
        end
      end
    end

I admit this is totally crazy code, and I don't expect you to follow along. The rest of the class needed a bit of modification too, but the first code example above has been simplified to:

    @matches[primary_key.to_sym] = @table_class.all(:order => [primary_key.to_sym.desc])

Of course, this hasn't hurt performance either.


Dynamic DataMapper objects from imported CSV data

I have been working on a project that required some CSV data to be imported into a database. After I noticed that DataMapper classes can be migrated through a class method, the idea of dynamically creating anonymous DataMapper classes for imports occurred to me. In the code below the column types are known, but the column names are not: here, I know all the columns except the primary key are of type Float. You could extend this example to add some magic for determining the type of the data, if you need it. This is experimental code; your mileage may vary.

class CsvImporter

  attr_accessor :table_class

  require 'fastercsv'

  def initialize(filename)
    # CSV filename
    @filename = filename
    # Table column names array
    @table_columns = ['primary_key']
    puts "Parsing CSV file #{filename}"
    parse_file(@filename)
  end 

  # Returns sanitized name from filename.
  # Replaces dashes with underscores, removes slashes, removes .csv extension and prepends 'csvimport_'
  def self.table_name(filename)
    basename = File.basename(filename.to_s).to_s
    table_name = "csvimport_#{basename.gsub(/\.csv/, '').gsub(/-/, '_').gsub(/\//, '')}"
    table_name
  end 

  # Import CSV data into the database table using the ORM class
  def parse_file(filename)
    @parsed_file = FasterCSV.read(filename)
    analyze_header(@parsed_file.shift)
    create_table(CsvImporter.table_name(filename), @table_columns)
    n = 0 
    @parsed_file.each do | row |
      hash = row2hash(row)
      unless @table_class.first(:primary_key => hash[:primary_key])
        instance = @table_class.new(hash)
        if instance.save
          n+=1
          GC.start if n%50 == 0
        end
      end
    end
  end

  # Converts a row of CSV data to a ruby Hash.
  def row2hash(row)
    hash = {}
    row.size.times do |i|
      unless row[i].nil?
        hash[ @table_columns[i].to_sym ] = row[i]
      end
    end
    hash
  end

  # Analyzes CSV header and adds fields to @table_columns array
  def analyze_header(header)
    header.each do | column |
      # strips digit prefixes from CSV header and adds the result to 
      # table columns
      if column
        #column = "token_#{column.to_s}" unless column.to_s[0].is_a?(Integer)
        @table_columns.push column.to_s.gsub(/^\d+: /,'')
      end
    end
  end

  # Automagically creates an ORM class for the import using @table_columns array
  def create_table(name, columns)
    # creates a new table class with a primary_key property
    @table_class = Class.new do
      include DataMapper::Resource
      property :id, DataMapper::Types::Serial
      property :updated_at, DateTime
      property :primary_key, String
    end

    # set table name
    @table_class.storage_names[:default] = name
    # shift first element off because it is the primary key
    pk = columns.shift
    columns.each do | column |
      # Here, I know all the columns except the primary key are of type Float. You can extend this to add magic for determining the type of data.
      @table_class.property column.to_sym, Float, :precision => 11
    end

    # unshift PK back in place
    columns.unshift(pk)
    # dont destroy tables we already have
    unless @table_class.storage_exists?
      @table_class.auto_migrate!
    end
  end
end

The problem I ran into after this is that the anonymous class cannot be serialized in the traditional way. I decided to circumvent this by implementing a quick and dirty MySQL-specific DESC hack. I readily admit this is unstable, highly experimental code. If you plan to use it for any purpose other than mine, you will probably need to extend it a bit.

class CsvImporter
  def self.load_class(name)
    @table_class = Class.new do
      # Again, these types are known to always be there
      include DataMapper::Resource
      property :id, DataMapper::Types::Serial
      property :updated_at, DateTime
    end 
    @table_class.storage_names[:default] = name
    if @table_class.storage_exists?
      desc = repository(:default).adapter.query("desc #{name}")
      desc.each do |field|
        case field.type
        when /DateTime/i
          klass = DateTime
        when /Float/i
          klass = Float
        else
          klass = String
        end
        klass = DataMapper::Types::Serial if field.id == "id"
        if klass == Float
          @table_class.property field.id.to_sym, klass, :precision => 11
        else
          @table_class.property field.id.to_sym, klass
        end
      puts "Created field with id #{field.id.to_sym}, class: #{klass}"
      end 
    end
    @table_class
  end
end
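
Putting the two halves together, usage could look something like this (a sketch; the database connection string and the CSV filename are placeholders, and it assumes the CsvImporter class above has been loaded):

    require 'rubygems'
    require 'dm-core'

    # Hypothetical database and file; adjust to your own setup
    DataMapper.setup(:default, 'mysql://localhost/csv_imports')

    # First run: parse the CSV, create the table and fill it
    importer = CsvImporter.new('measurements.csv')
    puts importer.table_class.all.size

    # Later runs: rebuild the anonymous class from the existing table
    table_class = CsvImporter.load_class(CsvImporter.table_name('measurements.csv'))
    puts table_class.all(:order => [:primary_key.asc]).size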

And there you have it: the ability to work with your CSV-imported data through a DataMapper class, as if it had always lived in the database. I hope somebody besides myself finds this cool or useful.


New blog design

Putting some of the free time I had during the holidays to creative use, I have finally made a better layout for the blog you are reading now. I hope you enjoy it and that everything renders well for you.


Moving to Phusion Passenger

This week I moved my Ruby websites (which were previously running on Mongrel) to the Phusion Passenger Apache 2 module. I have lived without Apache for about a year, but I am really happy to have switched back to it. I am still using Nginx as a front-end proxy to serve static assets.
I am very pleased with Passenger because it makes deployment a lot easier! Basically, all Capistrano needs to do for a deployment now is move your app into the DocumentRoot and touch a "restart.txt" file. It supposedly works with any Rack-based web framework; I am using it with Merb and Rails.
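
For what it's worth, the Capistrano side of that boils down to the usual Passenger restart task, something along these lines (a sketch, assuming Capistrano 2 and its current_path variable):

    # config/deploy.rb
    namespace :deploy do
      desc "Restart the application by touching Passenger's restart file"
      task :restart, :roles => :app do
        run "touch #{current_path}/tmp/restart.txt"
      end
    end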

I have more memory and CPU cycles available because there are no idle Mongrels running, and availability has increased because new instances of the apps are spawned as needed (with memory shared between multiple instances of an app).

Life is good with passenger!


