The Basics
Suppose you have a vanilla CSV file that is comma-separated and has standard line endings:
$ cat /tmp/test.csv
"CATEGORY " ," FIRST NAME" , " AGE "
"Red","John" , " 34 "
Reading and processing the CSV file is straightforward, and you get an array of hashes containing the data:
data = SmarterCSV.process('/tmp/test.csv')
=> [{:category=>"Red", :first_name=>"John", :age=>"34"}]
You will notice that the extra spaces in the sample CSV file were stripped off, and that the fields from the header line were downcased, had inner spaces replaced by underscores, and were converted into Ruby symbols. This default behavior assumes that you want to hand the data to an ORM, but it can be overridden.
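To make the default header handling concrete, here is a minimal plain-Ruby sketch that reproduces the result above for this one file. It is not how SmarterCSV is implemented: the helpers `clean` and `header_key` are hypothetical, and the naive `split(',')` assumes there are no commas or newlines inside quoted fields.

```ruby
# Minimal sketch, NOT SmarterCSV's implementation. It splits naively on
# commas, so quoted fields that contain commas or newlines are not handled.

# Hypothetical helper: remove quote characters, then surrounding whitespace.
def clean(field)
  field.delete('"').strip
end

# Hypothetical helper approximating the default header handling:
# clean, downcase, turn inner whitespace into underscores, symbolize.
def header_key(raw)
  clean(raw).downcase.gsub(/\s+/, '_').to_sym
end

csv_text = <<~CSV
  "CATEGORY " ," FIRST NAME" , " AGE "
  "Red","John" , " 34 "
CSV

header_line, *data_lines = csv_text.lines.map(&:chomp)
keys = header_line.split(',').map { |h| header_key(h) }
rows = data_lines.map do |line|
  # pair each header key with the cleaned value in the same column
  keys.zip(line.split(',').map { |v| clean(v) }).to_h
end
p rows
```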
The next sample file has a few empty fields and one row without any values:
$ cat /tmp/pets.csv
first name,last name,dogs,cats,birds,fish
Dan,McAllister,2,,,
,,,,,
Lucy,Laweless,,5,,
Miles,O'Brian,,,,21
Nancy,Homes,2,,1,
$ irb
> require 'smarter_csv'
> pets_by_owner = SmarterCSV.process('/tmp/pets.csv')
=> [ {:first_name=>"Dan", :last_name=>"McAllister", :dogs=>"2"},
{:first_name=>"Lucy", :last_name=>"Laweless", :cats=>"5"},
{:first_name=>"Miles", :last_name=>"O'Brian", :fish=>"21"},
{:first_name=>"Nancy", :last_name=>"Homes", :dogs=>"2", :birds=>"1"}
]
> SmarterCSV.warnings
=> {3=>["No data in line 3"]}
> SmarterCSV.errors
=> {}
Another default behavior of SmarterCSV is that it removes any key/value pair from the result hash if the value is nil or an empty string. The reasoning is that if you updated a database record containing valid data with an empty string, you would destroy data. SmarterCSV avoids this scenario by default, but the behavior can be changed if needed.
You'll also notice that you can retrieve any errors or warnings that occur during processing with SmarterCSV.errors and SmarterCSV.warnings. In this case, there was no data in line 3, because all of its values were empty.
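The two defaults described above, dropping empty values and flagging lines that have no data, can be approximated in plain Ruby like this. It is a hypothetical sketch rather than SmarterCSV's code: it assumes a naive comma split and counts the header as line 1, which happens to match the warning shown above.

```ruby
# Hypothetical sketch, NOT SmarterCSV's implementation: drops empty values
# and records a warning for lines with no data. Uses a naive comma split.
csv_text = <<~CSV
  first name,last name,dogs,cats,birds,fish
  Dan,McAllister,2,,,
  ,,,,,
  Lucy,Laweless,,5,,
  Miles,O'Brian,,,,21
  Nancy,Homes,2,,1,
CSV

header_line, *data_lines = csv_text.lines.map(&:chomp)
keys = header_line.split(',').map { |h| h.strip.downcase.gsub(/\s+/, '_').to_sym }
warnings = {}

rows = data_lines.each_with_index.filter_map do |line, idx|
  line_no = idx + 2 # line 1 is the header line
  values = line.split(',', -1).map(&:strip) # -1 keeps trailing empty fields
  # Drop key/value pairs whose value is nil or an empty string
  row = keys.zip(values).to_h.reject { |_key, value| value.nil? || value.empty? }
  if row.empty?
    (warnings[line_no] ||= []) << "No data in line #{line_no}"
    next # filter_map discards the nil returned here
  end
  row
end

p rows
p warnings
```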
Some files use a different column separator, such as a semicolon:
$ cat /tmp/test2.csv
"CATEGORY";"FIRST--NAME";"AGE"
"Red";"John";"35"
To read this file, we just need to tell SmarterCSV which column separator to use, via the col_sep option:
data = SmarterCSV.process('/tmp/test2.csv', {col_sep: ';'})
=> [{:category=>"Red", :first_name=>"John", :age=>"35"}]
Notice how the double dash becomes an underscore in :first_name.
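Here is a hypothetical `parse_simple` helper that shows why the separator matters and how a header like FIRST--NAME can end up as :first_name. The regex that collapses runs of whitespace and dashes into a single underscore is an assumption that matches this example, not SmarterCSV's documented rule.

```ruby
# Hypothetical helper, NOT SmarterCSV's code. The separator is passed in
# explicitly; quoted separators inside fields are not handled.
def parse_simple(text, col_sep:)
  header_line, *data_lines = text.lines.map(&:chomp)
  keys = header_line.split(col_sep).map do |h|
    # assumption: runs of whitespace or dashes collapse into one underscore
    h.delete('"').strip.downcase.gsub(/[\s-]+/, '_').to_sym
  end
  data_lines.map do |line|
    keys.zip(line.split(col_sep).map { |v| v.delete('"').strip }).to_h
  end
end

csv_text = <<~CSV
  "CATEGORY";"FIRST--NAME";"AGE"
  "Red";"John";"35"
CSV

data = parse_simple(csv_text, col_sep: ';')
p data
```

With col_sep: ',' instead, this sketch would see each line as a single field, since the text contains no commas.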
If you don't want symbols as keys, you can pass this option:
data = SmarterCSV.process('/tmp/test.csv', {header_transformations: [:none, :keys_as_strings]})
=> [{"category"=>"Red", "first_name"=>"John", "age"=>"34"}]
Again, the default is to strip whitespace and downcase the headers, because Ruby ORMs typically use lowercase attribute names.
The keyword :none disables all defaults for header_transformations before we specify :keys_as_strings.
With :none on its own, the headers are left exactly as they appear in the file:
data = SmarterCSV.process('/tmp/test.csv', {header_transformations: [:none]})
=> [{"CATEGORY "=>"Red", " FIRST NAME"=>"John", " AGE "=>"34"}]
Congrats! Now you have to strip those spaces yourself 😛 🎉
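The way header_transformations combine can be modeled as an ordered list of steps. In this hypothetical sketch, the defaults are assumed to run first and :none discards every step listed before it. The names `DEFAULTS`, `TRANSFORMS`, and `transform_headers` are made up for illustration, and SmarterCSV's real internals may differ.

```ruby
# Hypothetical model, NOT SmarterCSV's implementation: defaults are
# assumed to run first, and :none discards every step listed before it.
DEFAULTS = [:keys_as_symbols].freeze

TRANSFORMS = {
  keys_as_symbols: ->(h) { h.to_s.strip.downcase.gsub(/[\s-]+/, '_').to_sym },
  keys_as_strings: ->(h) { h.to_s.strip.downcase.gsub(/[\s-]+/, '_') }
}.freeze

def transform_headers(headers, user_steps = [])
  steps = DEFAULTS + user_steps
  # keep only the steps that come after the last :none, if there is one
  last_none = steps.rindex(:none)
  steps = steps.drop(last_none + 1) if last_none
  headers.map do |header|
    steps.reduce(header) { |key, step| TRANSFORMS.fetch(step).call(key) }
  end
end

raw = ['CATEGORY ', ' FIRST NAME', ' AGE ']
symbols = transform_headers(raw)                              # defaults only
strings = transform_headers(raw, [:none, :keys_as_strings])   # string keys
untouched = transform_headers(raw, [:none])                   # no transformation
p symbols, strings, untouched
```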