PDF to CSV output adds strange Â character when opening in Excel

The cause is that our web service generates CSV files in UTF-8 encoding, but Excel cannot detect UTF-8 in a CSV file unless the file starts with a special Byte Order Mark (BOM). Without the BOM, Excel opens the CSV in the default ANSI encoding. As a result, the £ character, which UTF-8 stores as two bytes, is displayed as two characters (Â£).
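To make the effect concrete, here is a minimal sketch that reproduces the garbling in code. The class and method names are hypothetical, and ISO-8859-1 is used as a stand-in for Excel's ANSI code page (an assumption: for the two bytes in question it should give the same result as Windows-1252).

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes the same bytes as ISO-8859-1,
    // imitating a reader that does not recognise the data as UTF-8.
    // ISO-8859-1 is an assumed stand-in for the ANSI code page.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    // Returns the UTF-8 bytes of the text as a hex string, e.g. "C2-A3".
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }
}
```

Calling MojibakeDemo.Utf8Hex("\u00A3") returns C2-A3, the two bytes UTF-8 uses for £, and MojibakeDemo.MisreadUtf8AsLatin1("\u00A3100") returns Â£100, which matches what appears in Excel.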

This should not be a problem if you import the CSV into a database that detects the encoding automatically, but if you need to work with the files in Excel, you should re-save them with the BOM.

The following C# code does this:

// Namespaces required for File and Encoding
using System.IO;
using System.Text;

// Read downloaded file into a string
string text = File.ReadAllText(@"c:\temp\ReconciledTransactionsReport (2).csv");
// Re-save with the explicit encoding parameter. This will add the required BOM.
File.WriteAllText(@"c:\temp\fixed.csv", text, Encoding.UTF8);
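If you want to confirm that a re-saved file really starts with the BOM, a check along the following lines may help. This is only a sketch: the BomCheck class and its method names are hypothetical, and it relies on Encoding.UTF8.GetPreamble() returning the three BOM bytes EF BB BF.

```csharp
using System.IO;
using System.Text;

public static class BomCheck
{
    // True when the buffer begins with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        if (data.Length < bom.Length)
        {
            return false;
        }
        for (int i = 0; i < bom.Length; i++)
        {
            if (data[i] != bom[i])
            {
                return false;
            }
        }
        return true;
    }

    // Reads only the first three bytes of the file and tests them.
    public static bool FileHasUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        using (FileStream stream = File.OpenRead(path))
        {
            int read = stream.Read(head, 0, head.Length);
            return read == head.Length && StartsWithUtf8Bom(head);
        }
    }
}
```

For example, BomCheck.FileHasUtf8Bom(@"c:\temp\fixed.csv") should return true for a file written with File.WriteAllText and Encoding.UTF8, and false for the file as originally downloaded.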

If you need to avoid temporary files, then in your C# code replace this line

webClient.DownloadFile(resultFileUrl, DestinationFile);

with the following:

byte[] bytes = webClient.DownloadData(resultFileUrl);
string text = Encoding.UTF8.GetString(bytes);
File.WriteAllText(DestinationFile, text, Encoding.UTF8);
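As an alternative, the BOM can be added at the byte level without decoding the text at all. The sketch below is one possible way to do this, not part of the service's own samples; the BomWriter class and EnsureUtf8Bom method are hypothetical names. It also leaves the data untouched if it already begins with a BOM, so the file would not end up with two of them.

```csharp
using System;
using System.Text;

public static class BomWriter
{
    // Returns the data prefixed with the UTF-8 BOM (EF BB BF).
    // If the data already starts with the BOM, it is returned unchanged.
    public static byte[] EnsureUtf8Bom(byte[] data)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();

        bool hasBom = data.Length >= bom.Length;
        for (int i = 0; hasBom && i < bom.Length; i++)
        {
            hasBom = data[i] == bom[i];
        }
        if (hasBom)
        {
            return data;
        }

        // Build a new buffer: BOM first, then the original bytes.
        byte[] result = new byte[bom.Length + data.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(data, 0, result, bom.Length, data.Length);
        return result;
    }
}
```

With this helper, the download step could be written as File.WriteAllBytes(DestinationFile, BomWriter.EnsureUtf8Bom(webClient.DownloadData(resultFileUrl))); which requires the System.IO namespace in addition to those shown.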