Stack Overflow GitHub RSS Feed

The Curious Encoding Caper

Published: September 11, 2019

I came across an interesting thing while working on a project the other day. There was a bug that was logged for the product.

When a customer entered some French into one of our forms(some of the words had special French symbols and accents, i.e.: Contrecœur, sécurité) and submitted the form. The information, as entered, was saved into our database. We could open that record back up in the web application and everything would display as expected. The web application had a button that allows customers to export some data to a csv file and this is where things got interesting. When clicking on the button, the csv file would download and clicking on the file would open it up by default in Excel. The above words were then displayed as: ContrecÅ“ur, sécurité respectively. An internal member of our team noted that they were able to export the same record to csv from SSMS(Sql Server Management Studio) and it would display correctly in Excel.

Looking into the code, the line in question looked as follows: return File(new UTF8Encoding().GetBytes(sb.ToString()), "application/octet-stream", "mycsv.csv");.

Those were the details I had to work with and as all good developers do, I started Bingling. I noticed that if I opened the csv in Notepad++, the characters would display as expected which led me to start thinking there was some issue with Excel. I searched for properly encoding csv’s for Excel and found some interesting and seemingly related posts talking about Excel’s inability to do this and exporting to csv well. This gave me false hope that I was getting close, but throughout it all, I didn’t see anything that seemed to answer the difficulty we were facing.

I changed search tactics a bit and found a bunch of seemingly related Stack Overflow Q&As.

After all of the above, I was able to try a few things and came up with the following change:

Once all was said and done, the fix seemed pretty straight forward and insignificant. I struggle to understand why I couldn’t find any questions/documentation relating to this as once I knew what the fix was, the method in use seems pretty well documented by Microsoft.

Basically, without this Byte Order Mark(BOM) on the file, Excel wasn’t sure what the proper encoding was for the file and was taking an incorrect guess(what it was guessing, I’m not sure :) ).

FWIW, if you are as curious as I am, the BOM in this instance seems to consist of three bytes: 239 187 191 OR 11101111 10111011 10111111 in binary!