I have a process in SSIS that outputs SQL table data to CSV format. However, I want the output in CSV (MS-DOS). Is there a way I can convert a normal CSV file to CSV (MS-DOS), such as C# code that would convert the file's type? I tried the options available in Visual Studio for SSIS and couldn't find a solution. Your help is appreciated.
By default, the output format is CSV (Comma delimited). I want that converted to CSV (MS-DOS).
If this article is accurate, https://excelribbon.tips.net/T009508_Comma-Delimited_and_MS-DOS_CSV_Variations.html then getting a CSV (MS-DOS) output will be fairly straightforward, since the formats only differ if you have certain special characters in text fields; for example, an accented (foreign-language) character. If you export as Windows CSV, those fields are encoded using the Windows-1252 code page. DOS encoding usually uses code page 437, which maps characters used on old pre-Windows PCs.
Then you need to define two Flat File Connection Managers. The first will use 1252 (ANSI - Latin I) as its code page and point to C:\ssisdata\input\File.csv. The second will use 437 (OEM - United States) and point to C:\ssisdata\input\DOSFile.csv (this way you create a new file instead of clobbering the existing one).
Your Data Flow then becomes a Flat File Source to Flat File Destination.
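If you would rather do it in C# (as the question asks), here is a minimal sketch that re-encodes an existing Windows-1252 CSV into code page 437. The paths are just examples, and on .NET Core / .NET 5+ the legacy code pages come from the System.Text.Encoding.CodePages package:

using System.IO;
using System.Text;

class CsvToMsDos
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+ so that code pages 1252 and 437 resolve;
        // on the full .NET Framework this line can be omitted.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string windowsCsv = @"C:\ssisdata\input\File.csv";    // existing CSV (Windows-1252)
        string dosCsv = @"C:\ssisdata\input\DOSFile.csv";     // new CSV (MS-DOS, code page 437)

        // Read with the Windows code page, write back out with the OEM/DOS code page.
        string text = File.ReadAllText(windowsCsv, Encoding.GetEncoding(1252));
        File.WriteAllText(dosCsv, text, Encoding.GetEncoding(437));
    }
}

Characters that exist in 1252 but have no equivalent in 437 will be replaced with a fallback character, which is the same loss you would see going through the two connection managers.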
I am exporting a file via an HTTP GET response, using ASP.NET Web API.
For that, I am returning a FileContentResult object, as in:
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
After being stuck for several minutes with encoding issues, I used Google's Advanced REST Client to perform the GET against the Web API controller's action, and the file downloads just fine.
Well, not exactly. I originally wanted it to be sent/downloaded as a .csv file.
If I set the HTTP request's Content-Type to "text/csv", and the File() call likewise sets the response's content type to "text/csv", Advanced REST Client will show the contents properly, but Excel will open it as gibberish.
If I simply change the content type to "text/plain", save the download as a .txt file (I have to rename it after saving; I don't know why it gets saved as _.text-plain, while as a CSV it is saved with the .csv extension), and finally perform an import in Excel as described in the Excel Import Text Wizard, then Excel opens the file correctly.
Why is the .csv opened as gibberish while the .txt is not? When opening a .csv there is no import wizard like there is for a .txt file (not that I am aware of, anyway).
Providing a bit of the source below:
StringBuilder fileContents = new StringBuilder();
//csv header
fileContents.AppendLine(String.Join(CultureInfo.CurrentCulture.TextInfo.ListSeparator, fileData.Select(fileRecord => fileRecord.Name)));
//csv records
foreach (ExportFileField fileField in fileData)
    fileContents.AppendLine(fileField.Value);
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
As requested, the binary contents of both files: the text/plain (.txt) version (the one that opens fine in Excel via import) and the .csv one (the one Excel opens as junk data). The byte contents of the two files are identical; only the cropping of the screenshots differed.
I was able to reproduce the issue by saving a file containing Greek characters with a BOM. Double-clicking attempts to import the file using the system's locale (Greek). When importing manually, Excel detects the codepage and offers to use the 65001 (UTF-8) codepage.
This behavior is strange, but it is not a bug. Text files contain no indication that would help detect their codepage, nor is it possible to guess it reliably: an ASCII file containing only A-Z characters saved as 1252 is identical to one saved using 1253. That's why Windows uses the system codepage, which is the locale used for all non-Unicode programs and files.
When you double click on a text file, Excel can't ask you for the correct encoding - this could get tedious very quickly. Instead, it opens the file using your regional settings and the system codepage. ASCII files created on your machine are saved using your system's codepage so this behaviour is logical. Files given to you by non-programmers will probably be saved using your country's codepage as well. Programmers typically switch everything to US English and that's how problems start. Your REST client may have saved the text as ASCII using the Latin encoding used by most programmers.
When you import the text file into an empty sheet, though, Excel can ask you what to do. It tries to detect the codepage by checking for a BOM or for a codepage that may match the file's contents, and presents its guess in the import dialog box together with a preview. The decimal and column separators are still those provided by your regional settings (it can't guess those). UTF-8 is generally easy to guess when the file starts with a BOM; UTF-16 gives itself away through its NUL bytes.
ASCII codepages are harder though. Saving my Greek file as ASCII results in a Japanese guess. That's English humour for you I guess.
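Given all that, one workaround (a hedged sketch, not necessarily what the question's code should become) is to prepend a UTF-8 byte order mark to the generated bytes so Excel can detect the encoding even on a plain double click, reusing the question's fileContents and switching to a text/csv content type. The download file name is just an example:

// Requires: using System.Linq; using System.Text;
byte[] preamble = Encoding.UTF8.GetPreamble();                  // the UTF-8 BOM: EF BB BF
byte[] body = Encoding.UTF8.GetBytes(fileContents.ToString());
byte[] payload = preamble.Concat(body).ToArray();

return File(payload, "text/csv; charset=UTF-8", "export.csv");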
To my surprise, when I perform the request via a browser instead of Google's Advanced REST Client, clicking on the downloaded file just works! Excel opens it correctly. So the problem must be with ARC.
In any case, since the process is not going to be run through any HTTP client other than a browser, my problem is gone. Again, ARC's output screen displays the file correctly; I just don't know why it "gets corrupted" when I click it to open it in Excel.
Strange.
The binary contents of the file show a correctly UTF-8 encoded CSV file with Hebrew characters. If, as you state in the comments, Excel does not allow you to change its guessed file encoding when opening a CSV file, that is rather a misbehavior in Excel itself (call it a bug if you want).
One option is to use LibreOffice (http://www.libreoffice.org/), whose spreadsheet component does allow you to customize the settings for opening a CSV file.
Another is to write a small program to explicitly convert your file to the encoding Excel is expecting. If you have a Python 3 interpreter installed, you could, for example, type:
python -c "open('correct.csv', 'wt', encoding='cp1255').write(open('utf8.csv', encoding='utf8').read())"
However, if your default Windows encoding is not cp1255 for handling Hebrew, as I assume above, that won't help Excel; it will just give you different gibberish :-) In that case, you should resort to programs that can correctly deal with different encodings.
(NB: there is a Python call that returns the default system encoding on Windows, but I forget which it is, and it is not easily googleable.)
I have a file named "Connecticut is now 2 °C.txt" whose name contains a Unicode character, but the file contents are just normal characters. Previously, the code checked whether the file name contained Unicode and, if so, wrote the Unicode details into the file header. This implementation leads to a conflict in the output file. Can anyone suggest how to find out whether the file stream has Unicode in it?
Thanks in advance,
Lokesh.
By far the simplest strategy is to decide on an encoding for a particular file, e.g. UTF-8, and use it exclusively, both when you write it and then when you read it. Trying to detect what encoding is in use is decidedly error prone so it's best not to have to do this detection.
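For example (a minimal sketch, with System.IO and System.Text in scope, and path/contents standing in for your own values):

File.WriteAllText(path, contents, Encoding.UTF8);              // always write with UTF-8
string roundTripped = File.ReadAllText(path, Encoding.UTF8);   // always read with UTF-8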
UPDATE
In the comments below you clarify that you wish to write to a file that is created by somebody else with an unknown encoding.
In full generality this is impossible to do with 100% reliability.
If you are lucky, you may find that the file comes with a byte order mark (BOM), in which case you can read the BOM and infer the encoding from it. However, there is no requirement for a text file to contain a BOM, and they frequently don't.
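A hedged sketch of what BOM-based detection looks like in C#: StreamReader can be asked to look for a BOM and exposes whatever it found through CurrentEncoding once it has read something (if there is no BOM, you simply get back the fallback encoding you passed in):

using System.IO;
using System.Text;

static Encoding DetectEncodingFromBom(string path)
{
    // The fallback (UTF-8 without BOM here) is what you get when no BOM is present.
    using (var reader = new StreamReader(path, new UTF8Encoding(false),
                                         detectEncodingFromByteOrderMarks: true))
    {
        reader.Peek();                  // force the reader to inspect the first bytes
        return reader.CurrentEncoding;  // the detected encoding, or the fallback
    }
}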
However, I would urge you to agree an interchange format with whoever is creating these files. Pick a single encoding and always use it.
I think this link would be helpful for you. Pay attention to the IsTextUnicode function.
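If you want to call it from C#, a rough P/Invoke sketch (assuming the usual Advapi32 export; the heuristic is known to misfire on short or ambiguous buffers, so treat the answer as a hint rather than the truth):

using System;
using System.IO;
using System.Runtime.InteropServices;

static class UnicodeSniffer
{
    // Passing IntPtr.Zero for the last parameter tells the function to run all of its tests.
    [DllImport("Advapi32.dll")]
    private static extern bool IsTextUnicode(byte[] buf, int len, IntPtr lpiResult);

    public static bool LooksLikeUtf16(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return IsTextUnicode(bytes, bytes.Length, IntPtr.Zero);
    }
}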
I have the following CSV file that is used in my data-driven unit test:
File;expected
Resources.resx;default
Resources.de.resx;de
AttachmentDetail.ascx.it.resx;it
SomeOtherFile.rm-CH.resx;rm-CH
"File" and "expected" are the header. But if I want to get the "File"-column in the code like
TestContext.DataRow["File"].ToString();
I get the error:
System.ArgumentException: Column 'File' does not belong to table ..
When I add the CSV file to an existing test case via the test method's properties, it seems as if the "File" column has some strange signs before its name, much like an encoding problem. But if I open the CSV file with Notepad, Notepad++ or even TextMate (on Mac), I don't see any such signs and I'm not able to get rid of them.
Can someone give me a suggestion about this problem?
If you edit the CSV file in VS2008 you can set the way the CSV file will be saved:
"File\Advanced Save Options..."
In the "Encoding:" drop-down the default will be UTF-8. Change it to "Western European (DOS) - Codepage 850". Leave the Line endings selection alone.
What encoding is the file being saved in?
From what I know, a file saved as UTF-8 in Windows Notepad gets some strange symbols (a byte order mark) put in front of it so that the encoding can be identified later, even when the contents are plain ASCII and don't contain any characters that actually need UTF-8.
Did you edit the file using Notepad++ and save it? I would try doing that and compare the results.
My program reads a CSV file that contains Hebrew text and then displays the values in a form, but the text is unreadable. What am I doing wrong?
Thanks
James
Possible options for what you're doing wrong:
Reading the file with the wrong encoding
Using a font that doesn't support Hebrew
Using a control that doesn't support right-to-left
How are you reading the file? If you look at the data in the debugger, does it seem correct? Do you know what encoding the file is in to start with?
See my Debugging Unicode Problems article for some suggestions - although they won't help with any right-to-left issues. (I'm afraid I don't know much about bidi displays.)
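On the first point (wrong encoding), here is a minimal sketch of reading the file with an explicit encoding; the path is a placeholder, and the two candidate encodings (UTF-8 and Windows-1255 for Hebrew) are guesses about what the file might actually be:

using System.IO;
using System.Text;

// On .NET Core / .NET 5+, Windows-1255 needs the System.Text.Encoding.CodePages package:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
string[] linesUtf8 = File.ReadAllLines("data.csv", Encoding.UTF8);
string[] lines1255 = File.ReadAllLines("data.csv", Encoding.GetEncoding(1255));

Whichever of the two shows readable Hebrew in the debugger is the encoding the file is actually stored in.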