Dynamic DataMapper objects from imported CSV data
I have been working on a project that required some CSV data to be imported into a database. After I noticed DataMapper classes can be migrated through a class method, the idea of dynamically creating anonymous instances of DataMapper classes for imports occurred to me. In the code below the column types are known, but the column names are not. Here, I know all the columns except the primary key are of type Float. You could extend this example to add magic for determining the type of data, if you need it. This is experimental code, your mileage may vary
class CsvImporter
attr_accessor :table_class
require 'fastercsv'
def initialize(filename)
# CSV filename
@filename = filename
# Table column names array
@table_columns = ['primary_key']
puts "Parsing CSV file #{filename}"
parse_file(@filename)
end
# Returns sanitized name from filename.
# Replaces dashes with underscores, removes slashes, removes .csv extension and prepends 'csvimport_'
def self.table_name(filename)
basename = File.basename(filename.to_s).to_s
table_name = "csvimport_#{basename.gsub(/\.csv/, '').gsub(/-/, '_').gsub(/\//, '')}"
table_name
end
# Import CSV data into the database table using the ORM class
def parse_file(filename)
@parsed_file = FasterCSV.read(filename)
analyze_header(@parsed_file.shift)
create_table(CsvImporter.table_name(filename), @table_columns)
n = 0
@parsed_file.each do | row |
hash = row2hash(row)
unless @table_class.first(:primary_key => hash[:primary_key])
instance = @table_class.new(hash)
if instance.save
n+=1
GC.start if n%50 == 0
end
end
end
end
# Converts a row of CSV data to a ruby Hash.
def row2hash(row)
hash = {}
row.size.times do |i|
unless row[i].nil?
hash[ @table_columns[i].to_sym ] = row[i]
end
end
hash
end
# Analyzes CSV header and adds fields to @table_columns array
def analyze_header(header)
header.each do | column |
# strips digit prefixes from CSV header and adds the result to
# table columns
if column
#column = "token_#{column.to_s}" unless column.to_s[0].is_a?(Integer)
@table_columns.push column.to_s.gsub(/^\d+: /,'')
end
end
end
# Automagically creates an ORM class for the import using @table_columns array
def create_table(name, columns)
# creates a new table class with a primary_key property
@table_class = Class.new do
include DataMapper::Resource
property :id, DataMapper::Types::Serial
property :updated_at, DateTime
property :primary_key, String
end
# set table name
@table_class.storage_names[:default] = name
# shift first element off because it is the primary key
pk = columns.shift
columns.each do | column |
# Here, I know all the columns except the primary key are of type Float. You can extend this to add magic for determining the type of data.
@table_class.property column.to_sym, Float, :precision => 11
end
# unshift PK back in place
columns.unshift(pk)
# dont destroy tables we already have
unless @table_class.storage_exists?
@table_class.auto_migrate!
end
end
endThe problem I had after this is that the anonymous object cannot be serialized in a traditionaly way. I decided to circumvent this by implementing a quick and dirty MySQL-specific DESC hack. I readily admit this is unstable, highly experimental code. If you plan to use it for any other purpose than mine, you will probably need to extend it a bit.
class CsvImporter
def self.load_class(name)
@table_class = Class.new do
# Again, these types are known to always be there
include DataMapper::Resource
property :id, DataMapper::Types::Serial
property :updated_at, DateTime
end
@table_class.storage_names[:default] = name
if @table_class.storage_exists?
desc = repository(:default).adapter.query("desc #{name}")
desc.each do |field|
case field.type
when /DateTime/i
klass = DateTime
when /Float/i
klass = Float
else
klass = String
end
klass = DataMapper::Types::Serial if field.id == "id"
if klass == Float
@table_class.property field.id.to_sym, klass, :precision => 11
else
@table_class.property field.id.to_sym, klass
end
puts "Created field with id #{field.id.to_sym}, class: #{klass}"
end
end
@table_class
end
endAnd there you have it. The ability to work with you CSV imported data through a DM class, as if it has always lived in the database. I hope somebody besides myself finds this cool/useful.
Trackbacks
Use the following link to trackback from your own site:
http://blog.aczid.nl/trackbacks?article_id=21










