I have the following CSV file that is used in my data-driven unit test:
File;expected
Resources.resx;default
Resources.de.resx;de
AttachmentDetail.ascx.it.resx;it
SomeOtherFile.rm-CH.resx;rm-CH
"File" and "expected" are the header. But if I want to get the "File"-column in the code like
TestContext.DataRow["File"].ToString();
I get the error
System.ArgumentException: Column
'File' does not belong to table ..
When I attach the CSV file to an existing test case via the test method's properties, the "File" column appears to have some strange characters before its name, which looks like an encoding problem. But if I open the CSV file in Notepad, Notepad++ or even TextMate (on Mac), I don't see any such characters and I'm not able to get rid of them.
Can someone give me a suggestion about this problem?
If you edit the CSV file in VS2008, you can set how the file will be saved:
"File\Advanced Save Options..."
In the "Encoding:" drop down the default will be UTF-8. Change it to "Wester European (DOS) - Codepage 850". Leave the Line endings selection alone.
What encoding is the file being saved in?
From what I know, when Windows Notepad saves a file as UTF-8 it puts some strange symbols (a byte order mark) at the front of the file to record exactly which encoding was used, even when the content is plain ASCII and contains no characters that actually require UTF-8.
Did you edit the file in Notepad++ and save it? I would try doing that and compare the results.
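Those "strange signs" in front of the column name are almost certainly a UTF-8 byte order mark (the bytes EF BB BF) that gets glued to the first header, so the column is effectively named "\uFEFFFile" instead of "File". As a rough C# sketch (not part of the original answers; the file name is just a placeholder), you could strip the BOM from the CSV before the test reads it:

using System.IO;
using System.Linq;
using System.Text;

class StripBom
{
    static void Main()
    {
        string path = "TestData.csv";             // hypothetical file name
        byte[] bytes = File.ReadAllBytes(path);
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF

        // If the file starts with the UTF-8 BOM, rewrite it without those bytes
        // so the first header really is "File" and not "\uFEFFFile".
        if (bytes.Length >= bom.Length && bytes.Take(bom.Length).SequenceEqual(bom))
        {
            File.WriteAllBytes(path, bytes.Skip(bom.Length).ToArray());
        }
    }
}

Alternatively, saving the file in an ANSI/DOS encoding as suggested above avoids the BOM entirely.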
I have a process in SSIS that outputs SQL table data to CSV format. However, I want the output in CSV (MS-DOS). Is there a way I can convert the normal CSV file to CSV (MS-DOS)? (For example, C# code that would convert the file type.) I tried the option available in Visual Studio for SSIS, and couldn't find a solution. Your help is appreciated.
By default, the output format is CSV (Comma delimited). I want that to be converted to CSV (MS-DOS).
If this article is accurate, https://excelribbon.tips.net/T009508_Comma-Delimited_and_MS-DOS_CSV_Variations.html, then getting a CSV (MS-DOS) output will be fairly straightforward. The two variations only differ if you have certain special characters in text fields; for example, an accented (foreign language) character. If you export as Windows CSV, those fields are encoded using the Windows-1252 code page. DOS encoding usually uses code page 437, which maps characters used in old pre-Windows PCs.
Then you need to define 2 Flat File Connection Managers. The first will use 1252 (ANSI - Latin I) as your code page and point to C:\ssisdata\input\File.csv. The second will use 437 (OEM - United States) and point to C:\ssisdata\input\DOSFile.csv (this way you create a new file instead of clobbering the existing one).
Your Data Flow then becomes a Flat File Source to Flat File Destination.
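If you would rather do the conversion in C# as the question asks, a minimal sketch under the same assumptions (the paths below are the ones from the connection managers above; adjust as needed) is to re-read the Windows-1252 file and re-write it with code page 437:

using System.IO;
using System.Text;

class CsvToMsDos
{
    static void Main()
    {
        string source = @"C:\ssisdata\input\File.csv";     // Windows (1252) CSV
        string target = @"C:\ssisdata\input\DOSFile.csv";  // CSV (MS-DOS), code page 437

        // On .NET Framework both code pages are available out of the box; on
        // .NET Core / .NET 5+ you may need the System.Text.Encoding.CodePages
        // package and Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        string text = File.ReadAllText(source, Encoding.GetEncoding(1252));
        File.WriteAllText(target, text, Encoding.GetEncoding(437));
    }
}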
I am exporting a file via an HTTP GET response, using ASP.NET Web API.
For that, I am returning a FileContentResult object, as in:
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
After being stuck for several minutes with encoding issues, I used Google's Advanced REST Client to perform the GET against the Web API controller's action, and the file downloads just fine.
Well, not exactly. I originally wanted it to be sent/downloaded as a .csv file.
If I set the HTTP request content-type to "text/csv", and the File() call sets the response's content type to "text/csv" as well, Advanced REST Client will show the contents properly, but Excel will open it as gibberish data.
If I simply change the content-type to "text/plain", save it as a .txt file (I have to rename it after saving; I don't know why it is saved as _.text-plain, while as a CSV it is saved with the .csv extension), and finally perform an import in Excel as described here: Excel Import Text Wizard, then Excel opens the file correctly.
Why is the .csv being opened as gibberish, while the .txt is not? For opening a .csv, there is no import wizard like there is for a .txt file (not that I am aware of).
Providing a bit of the source below:
StringBuilder fileContents = new StringBuilder();
//csv header
fileContents.AppendLine(String.Join(CultureInfo.CurrentCulture.TextInfo.ListSeparator, fileData.Select(fileRecord => fileRecord.Name)));
//csv records
foreach (ExportFileField fileField in fileData)
fileContents.AppendLine(fileField.Value);
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
As requested, the binary contents of both files.
The text-plain (.txt) version (the one that will open in Excel using the import wizard):
and the .csv one (the one that Excel will open with junk data):
(The files are the same; the cropping of the screenshots was just not the same.)
I was able to reproduce the issue by saving a file containing Greek characters with BOM. Double clicking attempts to import the file using the system's locale (Greek). When manually importing, Excel detects the codepage and offers to use the 65001 (UTF8) codepage.
This behavior is strange but not a bug. Text files contain no indication that would help detect their codepage, nor is it possible to guess reliably. An ASCII file containing only A-Z characters saved as 1252 is identical to one saved using 1253. That's why Windows uses the system codepage, which is the locale used for all non-Unicode programs and files.
When you double click on a text file, Excel can't ask you for the correct encoding - this could get tedious very quickly. Instead, it opens the file using your regional settings and the system codepage. ASCII files created on your machine are saved using your system's codepage so this behaviour is logical. Files given to you by non-programmers will probably be saved using your country's codepage as well. Programmers typically switch everything to US English and that's how problems start. Your REST client may have saved the text as ASCII using the Latin encoding used by most programmers.
When you import the text file into an empty sheet though, Excel can ask you what to do. It tries to detect the codepage by checking for a BOM or a codepage that may match the file's contents, and presents the guess in the import dialog box, together with a preview. The decimal and column separators are still those provided by your regional settings (it can't guess those). UTF-8 is generally easy to guess - the file starts with a BOM or contains NUL entries.
ASCII codepages are harder though. Saving my Greek file as ASCII results in a Japanese guess. That's English humour for you I guess.
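Conversely, if you control the file being produced, as in the Web API action above, a minimal sketch along these lines (my own illustration, not from the answer; the download file name is made up) is to emit the UTF-8 preamble so Excel's BOM check has something to find when the .csv is double-clicked:

// Hedged sketch: prepend the UTF-8 preamble (BOM) so Excel detects UTF-8
// when the downloaded .csv is opened by double-clicking.
byte[] preamble = Encoding.UTF8.GetPreamble();                  // EF BB BF
byte[] body = Encoding.UTF8.GetBytes(fileContents.ToString());  // fileContents from the question
byte[] payload = preamble.Concat(body).ToArray();               // needs System.Linq

return File(payload, "text/csv; charset=UTF-8", "export.csv");  // "export.csv" is a placeholder name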
To my surprise, performing the request via a browser instead of Google's Advanced REST Client and then clicking on the downloaded file just works! Excel opens it correctly. So the problem must be with ARC.
In any case, since the process is not going to be done using an HTTP client other than a browser... my problem is gone. Again, in ARC's output screen the file is displayed correctly. I do not know why it "gets corrupted" when it is clicked to be opened in Excel.
Strange.
The binary contents of the file show a correctly UTF-8 encoded CSV file with Hebrew characters. If, as you state in the comments, Excel does not allow you to change its guessed file encoding when opening a CSV file, that is rather a misbehavior in Excel itself (call it a bug if you want).
Your options are: use LibreOffice (http://www.libreoffice.org/), whose spreadsheet component does allow you to customize the settings for opening a CSV file.
Another one is to write a small program to explicitly convert your file to the encoding Excel is expecting - if you have a Python 3 interpreter installed, you could for example type:
python -c "open('correct.csv', 'wt', encoding='cp1255').write(open('utf8.csv', encoding='utf8').read())"
However, if your default Windows encoding is not cp1255 for handling Hebrew, as I assume above, that won't help Excel; it will just give you different gibberish :-) In that case, you should resort to programs that can correctly deal with different encodings.
(NB. there is a Python call to return the default system encoding in Windows, but I forgot which it is, and it is not easily googleable)
I have a dbf file which contains text fields filled with Russian text.
How can I find out which particular encoding has been used in this file? Also, how can I tell whether it is dBase III, IV, 5.0, or FoxPro?
Thanks!
If you can open the table, SYS(2029) will give you some clues about its origin. If you can't open it, look with a hex editor at the first byte of the file. The value there will tell you where it comes from. This article shows at least some of the values: http://msdn.microsoft.com/en-US/library/sswxxbea%28v=vs.80%29.aspx
Tamar
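If you prefer to do the check programmatically instead of with a hex editor, a minimal C# sketch (the path is just an example) that reads that first byte might look like this:

using System;
using System.IO;

class DbfVersionByte
{
    static void Main()
    {
        // Hypothetical path; the first byte of a .dbf file identifies the
        // product/version that wrote it (see the MSDN table linked above).
        string path = @"C:\data\table.dbf";
        using (var stream = File.OpenRead(path))
        {
            int versionByte = stream.ReadByte();
            Console.WriteLine("First byte: 0x{0:X2}", versionByte);
        }
    }
}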
I have an XSD file that is encoded in UTF-8, and no text editor I run it through shows any character at the beginning of the file, but when I pull it up in Visual Studio's debugger, I clearly see an empty box in front of the file's content.
I also get the error:
Data at the root level is invalid. Line 1, position 1.
Anyone know what this is?
Update: Edited post to qualify type of file. It's an XSD file created by Microsoft's XSD creator.
It turns out that what I'm seeing is a Byte Order Mark, which is a character that tells whatever is loading the document what encoding it is in. In my case the file is encoded in UTF-8, so the corresponding BOM is EF BB BF. To remove it, I opened the file in Notepad++ and clicked "Encode in UTF-8 without BOM".
To actually see the BOM, I had to open the file in TextPad in binary mode, and then did a Google search for "EF BB BF".
It took me about 8 hours to find out this was what was causing it, so I thought I'd share this with everyone.
Update: If I had read Joel Spolsky's blog post: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), then I might not have had this problem.
Here's how you do it with vim:
# vim file.xml
:set nobomb
:wq
Okay so I have an application which uploads a text file to a web server and all works fine.
However, a line magically appears in the text file when it is downloaded.
example:
textfile contains = Hello World
downloaded textfile contains = //notice the blank line here
Hello World
Normally this wouldn't be a problem, as I would just create a temp file and delete the line.
However, the text file contains encrypted data, and if I create a new temp file to delete the line, it completely messes up the encrypted text and produces "Bad Data" and "length of data to decrypt is invalid" errors.
I'm almost 100% sure it's not my encryption algorithm, as the text files are written out before they are uploaded and everything works fine on the non-uploaded text files.
If you guys could help me that would be awesome. Any work around will do (no matter how horrible / nasty it is).
Do the server and client run the same family of operating system? I'm thinking that this may be due to newline sequence differences, and uploading and downloading in different modes (text/binary).
If the data is encrypted or cryptographically signed, you want to do everything you can to make sure the transfers are done in binary mode.
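As a rough illustration of a binary-mode transfer (my own sketch, with a placeholder URL and paths), uploading and downloading the file as raw bytes avoids any newline translation:

using System.Net;

class BinaryTransfer
{
    static void Main()
    {
        // Hypothetical endpoint and paths.
        string uploadUrl = "http://example.com/upload";
        string downloadUrl = "http://example.com/files/secret.bin";

        using (var client = new WebClient())
        {
            // Upload the raw bytes; no text-mode conversion of line endings.
            byte[] data = System.IO.File.ReadAllBytes(@"C:\temp\secret.bin");
            client.UploadData(uploadUrl, data);

            // Download as raw bytes as well, then write them untouched to disk.
            byte[] downloaded = client.DownloadData(downloadUrl);
            System.IO.File.WriteAllBytes(@"C:\temp\secret_downloaded.bin", downloaded);
        }
    }
}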
What does the download code look like?
Making a wild guess: you are Response.Write()ing the text without a Response.Clear() to clear any "aspx text". Plus you need that code to end with a Response.End() to prevent further additions to the text.
It looks like your encryption algorithm is appending a null-terminated string to your text.
Try loading the text file on your web server into a byte array and see if the last byte is '\0'.
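A quick check along those lines (the path is hypothetical) might look like:

// Hedged sketch: read the uploaded file and check whether the last byte is NUL.
byte[] bytes = System.IO.File.ReadAllBytes(@"C:\inetpub\uploads\data.txt"); // placeholder path
bool endsWithNul = bytes.Length > 0 && bytes[bytes.Length - 1] == 0;
System.Console.WriteLine("Ends with '\\0': " + endsWithNul);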
There are two reasons something like this can happen.
You are making some changes on upload (like parsing the text and doing some data manipulation, where you introduce this line).
You are reading the file and manipulating it before you download it...
Check both parts of the code and post some samples if you are actually manipulating the file. I have uploaded files using C# and it works fine.
You should check Hanselman's blog for a simple upload application... It is straightforward.