Dynamic DataMapper objects from imported CSV data

I have been working on a project that required some CSV data to be imported into a database. After I noticed DataMapper classes can be migrated through a class method, the idea of dynamically creating anonymous instances of DataMapper classes for imports occurred to me. In the code below the column types are known, but the column names are not. Here, I know all the columns except the primary key are of type Float. You could extend this example to add magic for determining the type of data, if you need it. This is experimental code, your mileage may vary

class CsvImporter

  attr_accessor :table_class

  require 'fastercsv'

  def initialize(filename)
    # CSV filename
    @filename = filename
    # Table column names array
    @table_columns = ['primary_key']
    puts "Parsing CSV file #{filename}"
    parse_file(@filename)
  end 

  # Returns sanitized name from filename.
  # Replaces dashes with underscores, removes slashes, removes .csv extension and prepends 'csvimport_'
  def self.table_name(filename)
    basename = File.basename(filename.to_s).to_s
    table_name = "csvimport_#{basename.gsub(/\.csv/, '').gsub(/-/, '_').gsub(/\//, '')}"
    table_name
  end 

  # Import CSV data into the database table using the ORM class
  def parse_file(filename)
    @parsed_file = FasterCSV.read(filename)
    analyze_header(@parsed_file.shift)
    create_table(CsvImporter.table_name(filename), @table_columns)
    n = 0 
    @parsed_file.each do | row |
      hash = row2hash(row)
      unless @table_class.first(:primary_key => hash[:primary_key])
        instance = @table_class.new(hash)
        if instance.save
          n+=1
          GC.start if n%50 == 0
        end
      end
    end
  end

  # Converts a row of CSV data to a ruby Hash.
  def row2hash(row)
    hash = {}
    row.size.times do |i|
      unless row[i].nil?
        hash[ @table_columns[i].to_sym ] = row[i]
      end
    end
    hash
  end

  # Analyzes CSV header and adds fields to @table_columns array
  def analyze_header(header)
    header.each do | column |
      # strips digit prefixes from CSV header and adds the result to 
      # table columns
      if column
        #column = "token_#{column.to_s}" unless column.to_s[0].is_a?(Integer)
        @table_columns.push column.to_s.gsub(/^\d+: /,'')
      end
    end
  end

  # Automagically creates an ORM class for the import using @table_columns array
  def create_table(name, columns)
    # creates a new table class with a primary_key property
    @table_class = Class.new do
      include DataMapper::Resource
      property :id, DataMapper::Types::Serial
      property :updated_at, DateTime
      property :primary_key, String
    end

    # set table name
    @table_class.storage_names[:default] = name
    # shift first element off because it is the primary key
    pk = columns.shift
    columns.each do | column |
      # Here, I know all the columns except the primary key are of type Float. You can extend this to add magic for determining the type of data.
      @table_class.property column.to_sym, Float, :precision => 11
    end

    # unshift PK back in place
    columns.unshift(pk)
    # dont destroy tables we already have
    unless @table_class.storage_exists?
      @table_class.auto_migrate!
    end
  end
end

The problem I had after this is that the anonymous object cannot be serialized in a traditionaly way. I decided to circumvent this by implementing a quick and dirty MySQL-specific DESC hack. I readily admit this is unstable, highly experimental code. If you plan to use it for any other purpose than mine, you will probably need to extend it a bit.

class CsvImporter
  def self.load_class(name)
    @table_class = Class.new do
      # Again, these types are known to always be there
      include DataMapper::Resource
      property :id, DataMapper::Types::Serial
      property :updated_at, DateTime
    end 
    @table_class.storage_names[:default] = name
    if @table_class.storage_exists?
      desc = repository(:default).adapter.query("desc #{name}")
      desc.each do |field|
        case field.type
        when /DateTime/i
          klass = DateTime
        when /Float/i
          klass = Float
        else
          klass = String
        end
        klass = DataMapper::Types::Serial if field.id == "id"
        if klass == Float
          @table_class.property field.id.to_sym, klass, :precision => 11
        else
          @table_class.property field.id.to_sym, klass
        end
      puts "Created field with id #{field.id.to_sym}, class: #{klass}"
      end 
    end
    @table_class
  end
end

And there you have it. The ability to work with you CSV imported data through a DM class, as if it has always lived in the database. I hope somebody besides myself finds this cool/useful.


Trackbacks

Use the following link to trackback from your own site:
http://blog.aczid.nl/trackbacks?article_id=21


Me elsewhere: