Dynamic DataMapper objects from imported CSV data
I have been working on a project that required some CSV data to be imported into a database. After I noticed DataMapper classes can be migrated through a class method, the idea of dynamically creating anonymous instances of DataMapper classes for imports occurred to me. In the code below the column types are known, but the column names are not. Here, I know all the columns except the primary key are of type Float. You could extend this example to add magic for determining the type of data, if you need it. This is experimental code, your mileage may vary
class CsvImporter
attr_accessor :table_class
require 'fastercsv'
def initialize(filename)
# CSV filename
@filename = filename
# Table column names array
@table_columns = ['primary_key']
puts "Parsing CSV file #{filename}"
parse_file(@filename)
end
# Returns sanitized name from filename.
# Replaces dashes with underscores, removes slashes, removes .csv extension and prepends 'csvimport_'
def self.table_name(filename)
basename = File.basename(filename.to_s).to_s
table_name = "csvimport_#{basename.gsub(/\.csv/, '').gsub(/-/, '_').gsub(/\//, '')}"
table_name
end
# Import CSV data into the database table using the ORM class
def parse_file(filename)
@parsed_file = FasterCSV.read(filename)
analyze_header(@parsed_file.shift)
create_table(CsvImporter.table_name(filename), @table_columns)
n = 0
@parsed_file.each do | row |
hash = row2hash(row)
unless @table_class.first(:primary_key => hash[:primary_key])
instance = @table_class.new(hash)
if instance.save
n+=1
GC.start if n%50 == 0
end
end
end
end
# Converts a row of CSV data to a ruby Hash.
def row2hash(row)
hash = {}
row.size.times do |i|
unless row[i].nil?
hash[ @table_columns[i].to_sym ] = row[i]
end
end
hash
end
# Analyzes CSV header and adds fields to @table_columns array
def analyze_header(header)
header.each do | column |
# strips digit prefixes from CSV header and adds the result to
# table columns
if column
#column = "token_#{column.to_s}" unless column.to_s[0].is_a?(Integer)
@table_columns.push column.to_s.gsub(/^\d+: /,'')
end
end
end
# Automagically creates an ORM class for the import using @table_columns array
def create_table(name, columns)
# creates a new table class with a primary_key property
@table_class = Class.new do
include DataMapper::Resource
property :id, DataMapper::Types::Serial
property :updated_at, DateTime
property :primary_key, String
end
# set table name
@table_class.storage_names[:default] = name
# shift first element off because it is the primary key
pk = columns.shift
columns.each do | column |
# Here, I know all the columns except the primary key are of type Float. You can extend this to add magic for determining the type of data.
@table_class.property column.to_sym, Float, :precision => 11
end
# unshift PK back in place
columns.unshift(pk)
# dont destroy tables we already have
unless @table_class.storage_exists?
@table_class.auto_migrate!
end
end
endThe problem I had after this is that the anonymous object cannot be serialized in a traditionaly way. I decided to circumvent this by implementing a quick and dirty MySQL-specific DESC hack. I readily admit this is unstable, highly experimental code. If you plan to use it for any other purpose than mine, you will probably need to extend it a bit.
class CsvImporter
def self.load_class(name)
@table_class = Class.new do
# Again, these types are known to always be there
include DataMapper::Resource
property :id, DataMapper::Types::Serial
property :updated_at, DateTime
end
@table_class.storage_names[:default] = name
if @table_class.storage_exists?
desc = repository(:default).adapter.query("desc #{name}")
desc.each do |field|
case field.type
when /DateTime/i
klass = DateTime
when /Float/i
klass = Float
else
klass = String
end
klass = DataMapper::Types::Serial if field.id == "id"
if klass == Float
@table_class.property field.id.to_sym, klass, :precision => 11
else
@table_class.property field.id.to_sym, klass
end
puts "Created field with id #{field.id.to_sym}, class: #{klass}"
end
end
@table_class
end
endAnd there you have it. The ability to work with you CSV imported data through a DM class, as if it has always lived in the database. I hope somebody besides myself finds this cool/useful.
Reversing 2-array axis in Ruby
Recently, I was working on a project that imports some CSV data into a dynamic database table. It needs to sort an array of floats. Along the way coding, I found myself doing something curious:
rows = @table_class.all
rows.each do | row |
key = row.primary_key.to_sym
@matches[key] = []
row.instance_variables.each do | column |
unless ['@id', '@repository','@primary_key','@original_values', '@new_record','@collection', '@updated_at'].include? column
x = row.instance_variable_get(column)
y = column.gsub(/@/, '')
@matches[key] << {:x => x, :y => y}
end
end
@matches[key] = @matches[key].sort_by { |match| match[:y] }
end
@matchesSorting in Ruby! This smells bad. I put the data in a database for this?
The solution
The solution was to reverse the axis of the imported data, thereby enabling MySQL to sort the data for us.
Instead of doing:
n=0
@parsed_file.each do | row |
hash = row2hash(row)
unless @table_class.first(:primary_key => hash[:primary_key])
instance = @table_class.new(hash)
if instance.save
n+=1
GC.start if n%50 == 0
end
end
endWe can parse the file with inversed axis by doing:
values = {}
@parsed_file[0].enum_with_index.map do |primary_key, idx|
if primary_key
pk = primary_key.to_sym
@parsed_file.collect do |row|
if row[0]
values[pk] = {} unless values[pk].is_a?(Hash)
values[pk][row[0].to_sym] = row[idx]
end
end
end
end
n = 0
values.keys.each do |key|
if values[key]
unless @table_class.first(:primary_key => key.to_s)
instance = @table_class.new(values[key])
instance.primary_key = key
if instance.save
n+=1
GC.start if n%50 == 0
end
end
end
endI admit this is totally crazy code, and I don’t expect you to follow along. The rest of the class needed a bit of modifying too, but the first code example above has been simplified to:
@matches[primary_key.to_sym] = @table_class.all(:order => [primary_key.to_sym.desc])Ofcourse this hasn’t hurt performance, either










