Skip to content

The Basics

Tilo edited this page Jan 29, 2018 · 11 revisions

Vanilla CSV Files

If you have a vanilla CSV file which is comma separated, and has standard line endings.

$ cat  /tmp/test.csv
"CATEGORY " ," FIRST  NAME" , " AGE "
 "Red","John" , " 34 "

Reading and processing the CSV file is straight-forward, and you get an array of hashes containing the data:

data = SmarterCSV.process( filename )
 => [{:category=>"Red", :first_name=>"John", :age=>"34"}]

You will notice that the sample CSV file had a couple of extra spaces, which were stripped off, and the fields from the header line were converted into Ruby symbols. This is the default behavior, assuming that you want to hand this data to an ORM, but this behavior can be overwritten.

Another Vanilla CSV File

This sample file has a few fields empty, and one row without any values.

$ cat /tmp/pets.csv
first name,last name,dogs,cats,birds,fish
Dan,McAllister,2,,,
,,,,,
Lucy,Laweless,,5,,
Miles,O'Brian,,,,21
Nancy,Homes,2,,1,

$ irb
 > require 'smarter_csv'
 > pets_by_owner = SmarterCSV.process('/tmp/pets.csv')
  => [ {:first_name=>"Dan", :last_name=>"McAllister", :dogs=>"2"}, 
       {:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"}, 
       {:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"}, 
       {:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
     ]
 
 > SmarterCSV.warnings
  => {3=>["No data in line 3"]}
 > SmarterCSV.errors
  => {}

Another default behavior of SmarterCSV is that it will remove any key/value pairs from the result hash if the value is nil or an empty string. The reasoning here is that if you would update a database record containing valid data with and empty string, you would destroy data. SmarterCSV is trying to be safe here, and avoid this scenario by default. But this default behavior can be changed if needed.

You'll also notice that there is a way to get any errors or warnings which may occur during processing. In this case, there was no data in line 3 - all the values were empty.

But my CSV Files are not Comma-Separated

cat /tmp/test2.csv
"CATEGORY";"FIRST--NAME";"AGE"
"Red";"John";"35"

To read this file, we just need to tell SmarterCSV which column-separator col_sep to use.

data = SmarterCSV.process('/tmp/test.csv', {col_sep: ';'})
 => [{:category=>"Red", :first_name=>"John", :age=>"35"}]

Notice how the double-dash becomes an underscore in :first_name.

But I don't want Symbols as Keys

If you don't want symbols as Keys, you can just pass this option in:

data = SmarterCSV.process('/tmp/test.csv', {header_transformations: [:none, :keys_as_strings]})
 => [{"category"=>"Red", "first_name"=>"John", "age"=>"35"}]

Again, the default is to strip whitespaces and downcase the headers, because ORMs in Ruby have lower-case attribute names.

The keyword :none disables any defaults for header_transformations, before we specify :keys_as_strings.

But I want the RAW headers!

data = SmarterCSV.process('/tmp/test.csv', {header_transformations: [:none]})
 => [{"CATEGORY "=>"Red", " FIRST  NAME"=>"John", " AGE "=>"35"}]

Congrats! Now you have to strip those spaces yourself 😛 🎉

Clone this wiki locally