I have a process in SSIS that outputs SQL table data to CSV format. However, I want the output in CSV (MS-DOS). Is there a way I can convert the normal CSV file to CSV (MS-DOS)? (For example, C# code that would convert the extension/type.) I tried the options available in Visual Studio for SSIS and couldn't find a solution. Your help is appreciated.
By default, the output format is CSV (Comma delimited, highlighted blue). I want that converted to CSV (MS-DOS, highlighted yellow).
If this article is accurate, https://excelribbon.tips.net/T009508_Comma-Delimited_and_MS-DOS_CSV_Variations.html, then getting a CSV (MS-DOS) output will be fairly straightforward.
The difference between the two formats only matters if you have certain special characters in text fields; for example, an accented (foreign-language) character. If you export as Windows CSV, those fields are encoded using the Windows-1252 code page. DOS encoding usually uses code page 437, which maps the characters used on old pre-Windows PCs.
Then you need to define two Flat File Connection Managers. The first will use 1252 (ANSI - Latin I) as its code page and point to C:\ssisdata\input\File.csv. The second will use 437 (OEM - United States) and point to C:\ssisdata\input\DOSFile.csv (this way you create a new file instead of clobbering the existing one).
Your Data Flow then becomes a Flat File Source to Flat File Destination.
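If you would rather do the conversion in code (for example from a Script Task or a small console app) instead of a second connection manager, here is a minimal, hedged sketch of the same 1252-to-437 conversion. The paths are the hypothetical ones from above:

using System.IO;
using System.Text;

class CsvToMsDos
{
    static void Main()
    {
        // Hypothetical paths, mirroring the two connection managers described above.
        var sourcePath = @"C:\ssisdata\input\File.csv";    // written with 1252 (ANSI - Latin I)
        var targetPath = @"C:\ssisdata\input\DOSFile.csv"; // to be written with 437 (OEM - United States)

        // Both code pages are available out of the box on .NET Framework (e.g. in an SSIS Script Task);
        // on .NET Core you would first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        var ansi = Encoding.GetEncoding(1252);
        var oem = Encoding.GetEncoding(437);

        File.WriteAllText(targetPath, File.ReadAllText(sourcePath, ansi), oem);
    }
}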
Related
I have a SQL Server 2008 database from an application which stores office file templates.
The files in the database are stored in hex format (0x504B030414000600...).
With a file signature table (https://www.garykessler.net/library/file_sigs.html), I can find out which file format it is: Microsoft Office Open XML Format Documents (OOXML, like DOCX, PPTX, XLSX ...).
How can I export/convert these hex strings into the original files?
Maybe with C# ...
With the application itself, I can only export 1 file at a time. It would take days to do this with all files (about 1000).
Thank you
Export the column with the files from SQL Server to a single file (it may be very large, but that shouldn't matter). You can, for example, right-click and export the results to a CSV file.
Write a simple C# console application:
use a CSV parser to loop over the data
inside the loop you can use simple if statements to determine the document file format
in every iteration, convert the hex string to binary and save it somewhere on your drive (a sketch follows below)
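A minimal sketch of such a console application, assuming (purely for illustration) that the export produced semicolon-separated lines of the form Id;0x504B0304...; adjust the parsing to whatever your export actually looks like:

using System;
using System.IO;

class HexColumnToFiles
{
    static void Main()
    {
        // Hypothetical export file and output folder.
        var exportPath = @"C:\export\documents.csv";
        var outputFolder = @"C:\export\files";

        foreach (var line in File.ReadLines(exportPath))
        {
            var parts = line.Split(';');
            if (parts.Length < 2 || !parts[1].StartsWith("0x")) continue; // skip header / malformed rows

            // Convert the hex string (without the 0x prefix) to raw bytes.
            var hex = parts[1].Substring(2);
            var bytes = new byte[hex.Length / 2];
            for (int i = 0; i < bytes.Length; i++)
                bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);

            // 50 4B 03 04 is the ZIP signature shared by all OOXML formats (DOCX/XLSX/PPTX),
            // so the .docx guess here is only a placeholder; refine it with the signature table.
            var extension = hex.StartsWith("504B0304", StringComparison.OrdinalIgnoreCase) ? ".docx" : ".bin";
            File.WriteAllBytes(Path.Combine(outputFolder, parts[0] + extension), bytes);
        }
    }
}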
I am exporting a file via a http get response, using ASP.NET Web API.
For that, I am returning a FileContentResult object, as in:
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
After several minutes stuck on encoding issues, I used Google's Advanced REST Client to perform the GET against the Web API controller's action, and the file downloads just fine.
Well, not exactly. I originally wanted it to be sent/downloaded as a .csv file.
If I set the http request content-type to "text/csv" and the File() call sets the response's content type to "text/csv" just as well, Advanced REST Client will show the contents properly, but excel will open it as gibberish data.
If I simply change the content-type to "text/plain", save it as a .txt file (I have to rename it after saving; I don't know why it is being saved as _.text-plain, while as a csv it is being saved with the .csv extension), and finally perform an import in Excel like described here Excel Import Text Wizard, then Excel opens the file correctly.
Why is the .csv being opened as gibberish, while as a .txt it is not? For opening a .csv, there is no import wizard like with a .txt file (not that I am aware of).
Providing a bit of the source below:
StringBuilder fileContents = new StringBuilder();
//csv header
fileContents.AppendLine(String.Join(CultureInfo.CurrentCulture.TextInfo.ListSeparator, fileData.Select(fileRecord => fileRecord.Name)));
//csv records
foreach (ExportFileField fileField in fileData)
fileContents.AppendLine(fileField.Value);
return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
As requested, the binary contents of both files.
The text-plain (.txt) version (the one that will open in excel, using import):
and the .csv one (the one that excel will open with junk data):
(The files are the same; the cropping of the screenshots was not.)
I was able to reproduce the issue by saving a file containing Greek characters with BOM. Double clicking attempts to import the file using the system's locale (Greek). When manually importing, Excel detects the codepage and offers to use the 65001 (UTF8) codepage.
This behavior is strange but not a bug. Text files contain no indication that would help detect their codepage, nor is it possible to guess. An ASCII file containing only A-Z characters saved as 1252 is identical to one saved using 1253. That's why Windows uses the system codepage, which is the locale used for all non-Unicode programs and files.
When you double click on a text file, Excel can't ask you for the correct encoding - this could get tedious very quickly. Instead, it opens the file using your regional settings and the system codepage. ASCII files created on your machine are saved using your system's codepage so this behaviour is logical. Files given to you by non-programmers will probably be saved using your country's codepage as well. Programmers typically switch everything to US English and that's how problems start. Your REST client may have saved the text as ASCII using the Latin encoding used by most programmers.
When you import the text file to an empty sheet though, Excel can ask you what to do. It tries to detect the codepage by checking for a BOM or a codepage that may be matching the file's contents and presents the guess in the import dialog box, together with a preview. The decimal and column separators are still those provided by your regional settings (can't guess those). UTF8 is generally easy to guess - the file starts with a BOM or contains NUL entries.
ASCII codepages are harder though. Saving my Greek file as ASCII results in a Japanese guess. That's English humour for you I guess.
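One workaround consistent with that explanation (a sketch, not something taken from the question) is to write the UTF-8 BOM yourself in front of the data, so that a double-clicked .csv gives Excel something to detect:

// Fragment of the controller action; fileContents is the StringBuilder from the question,
// and "export.csv" is a made-up download name.
byte[] preamble = Encoding.UTF8.GetPreamble();                 // EF BB BF
byte[] body = Encoding.UTF8.GetBytes(fileContents.ToString());
byte[] payload = new byte[preamble.Length + body.Length];
Buffer.BlockCopy(preamble, 0, payload, 0, preamble.Length);
Buffer.BlockCopy(body, 0, payload, preamble.Length, body.Length);
return File(payload, "text/csv; charset=UTF-8", "export.csv");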
To my surprise, trying to perform the request via a browser instead of using Google's Advanced REST Client, clicking on the file that is downloaded just works! Excel opens it correctly. So the problem must be with ARC.
In any case, since the process is not going to be done using an http client other than a browser... my problem is gone. Again, in ARC's output screen the file is displayed correctly. I do not know why upon clicking it to be opened in Excel it "gets corrupted".
Strange.
The binary contents of the file show a correctly UTF-8 encoded CSV file with Hebrew characters. If, as you state in the comments, Excel does not allow you to change its guessed file encoding when opening a CSV file, that is rather a misbehavior in Excel itself (call it a bug if you want).
Your options are: use LibreOffice (http://www.libreoffice.org/), whose spreadsheet component does allow you to customize the settings for opening a CSV file.
Another one is to write a small program to explicitly convert your file to the encoding Excel is expecting - if you have a Python 3 interpreter installed, you could for example type:
python -c "open('correct.csv', 'wt', encoding='cp1255').write(open('utf8.csv', encoding='utf8').read())"
However, if your default Windows encoding is not cp1255 for handling Hebrew, as I suppose above, that won't help Excel; it will just give you different gibberish :-) In that case, you should resort to using programs that can correctly deal with different encodings.
(NB. there is a Python call to return the default system encoding in Windows, but I forgot which it is, and it is not easily googleable)
What is the difference between these two file formats?
I found this from Here:
.txt File:
This is a plain text file which can be opened using Notepad, present on all desktop PCs running any version of MS Windows. You can store any type of text in this file; there is no limitation on the text format whatsoever. Due to the ease of use for end users, many daily data summary providers use .txt files. These files contain data which is properly comma separated.
.csv File: abbreviation of "comma separated values"
This is a special file extension commonly used by MS Excel. Basically this is also a plain text file, but with the limitation of comma-separated values. Normally when you double-click this type of file it will open in MS Excel. If you do not have MS Excel installed on your computer, or you find Notepad easier to use, you can also open this file in Notepad by right-clicking the file, selecting "Open With" from the menu, and then choosing Notepad.
My Question:
What does "comma separated values" mean?
If I'm going to create a .csv file using C#, do I need to write the file using StreamWriter, and do I only need to change the extension to .csv?
If so, do I need to separate the written string with commas?
Thanks...
What does "comma separated values" mean?
Values separated by commas, for example:
Name,Id,3,Address
If I'm going to create a .csv file using C#, do I need to write the file using StreamWriter, and do I only need to change the extension to .csv?
Changing the extension of the file will help you open it in MS Excel; other than that, it can be anything and you can still open it through your code (StreamReader).
If so, do I need to separate the written string with commas?
Yes, separate your values with commas, or you can use any other delimiter you like. It can be a semicolon ; as well, since , is used as the decimal separator in some languages/cultures.
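To make the culture point concrete, here is a small hedged sketch showing how the current culture's list separator differs from a hard-coded comma (the values are made up):

using System;
using System.Globalization;

class ListSeparatorDemo
{
    static void Main()
    {
        // In cultures that use ',' as the decimal separator (e.g. de-DE), the list separator
        // is typically ';', which is why Excel on such systems expects semicolon-delimited files.
        var separator = CultureInfo.CurrentCulture.TextInfo.ListSeparator;
        var line = string.Join(separator, new[] { "TilT", "25", "Germany" });
        Console.WriteLine(line); // "TilT,25,Germany" or "TilT;25;Germany" depending on the culture
    }
}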
CSV is structured like this:
"value","value1,"value2"
A text file can be anything from delimited to free-form, fixed-width, jagged-right, etc.
CSV files can be a pain in the ass if you have commas in your data, and don't properly qualify the values.
I typically create tab delimited or pipe delimited files.
From the perspective of programming, file extensions do not make a difference. In fact, you may write comma separated values inside a .txt file.
Comma separated values indicates that the values are just separated with commas; this is helpful if you want to store some data and share it across multiple systems (on the other hand, XML is a better option).
Assume you need to store name, age and location:
TilT,25,Germany
is comma separated data.
In the scope of C#, you need to add commas between your values, and you may save it as a CSV file or a TXT file; it makes no difference.
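A minimal sketch of that in C#, using StreamWriter (the file name and values are made up); note that a value containing a comma itself has to be wrapped in quotes, otherwise it splits into two columns:

using System.IO;

class CsvWriterSketch
{
    static void Main()
    {
        // The .csv extension is just a hint for Excel; the content is plain text either way.
        using (var writer = new StreamWriter("people.csv"))
        {
            writer.WriteLine("Name,Age,Location");
            writer.WriteLine("TilT,25,Germany");
            writer.WriteLine("\"Doe, John\",40,USA"); // quoted because the value contains a comma
        }
    }
}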
I have a C++ program that sends data via FTP in ASCII mode to an IBM mainframe. I am now doing this in C#.
When it gets there and is viewed, the file looks like garbage.
I cannot see anything in the C++ code that does anything special to encode the file into something like EBCDIC. When the C++ files are sent, they are viewed OK. The only thing I see different is \015 & \012 for line feeds, whereas C# is using \r\n.
Would these characters have an effect, and if so, how can I get my C# app to use \015?
Do I have to do any special encoding to make it appear ok?
It sounds like you should indeed be using an EBCDIC encoding, and then probably transferring the text in binary. I have an EBCDIC encoding class you can use, should you wish.
Note that \015\012 is \r\n - they're characters 13 and 10 in decimal, just different ways of representing them. If you think the C++ code really is producing the same files as C#, compare two files which should be the same in a binary file editor.
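If you don't want a full custom encoding class, .NET also ships code-page based EBCDIC encodings you could experiment with; whether IBM037 (US/Canada EBCDIC) is the right code page for your mainframe is an assumption here:

using System.IO;
using System.Text;

class EbcdicSketch
{
    static void Main()
    {
        // IBM037 is an assumption - confirm the code page your mainframe actually uses.
        // Available out of the box on .NET Framework; on .NET Core register CodePagesEncodingProvider first.
        var ebcdic = Encoding.GetEncoding("IBM037");
        File.WriteAllText("upload.dat", "HELLO MAINFRAME\r\n", ebcdic);
        // The resulting file would then be sent with a binary transfer, as suggested above.
    }
}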
Make sure you have the TYPE TEXT instead of TYPE BINARY command before you transfer the file.
If you are truly sending the files in ASCII mode, then the mainframe itself will convert that to EBCDIC (it's receiver-makes-good).
The fact that you're getting apparent garbage at the mainframe end, and character codes \015 and \012 (which are CR and LF respectively) means that you're not transferring in ASCII mode.
As an aside, the ISPF editor has been able to view ASCII data sets for quite a few versions now. Open up the file and enter the commands source ascii and lf.
The first converts the characters from ASCII to EBCDIC so you can see what they are; the second goes through and pads out "lines" so that linefeed markers are replaced with enough spaces to reach the record length.
Invaluable commands when dealing with mixed-encoding environments, which is where I do a lot of my work.
I have the following CSV file that is used in my data-driven unit test:
File;expected
Resources.resx;default
Resources.de.resx;de
AttachmentDetail.ascx.it.resx;it
SomeOtherFile.rm-CH.resx;rm-CH
"File" and "expected" are the header. But if I want to get the "File"-column in the code like
TestContext.DataRow["File"].ToString();
I get the error
System.ArgumentException: Column 'File' does not belong to table.
When I add the CSV file to an existing test-case over the test-method properties, it seems as if the "File"-column has some strange signs before its name, much like an encoding problem. But if I open the CSV file with Notepad, Notepad++ or even TextMate (on Mac) I don't see any such signs and I'm not able to get rid of them.
Can someone give me a suggestion about this problem?
If you edit the CSV file in VS2008, you can set the way the file will be saved:
"File\Advanced Save Options..."
In the "Encoding:" drop down the default will be UTF-8. Change it to "Wester European (DOS) - Codepage 850". Leave the Line endings selection alone.
What encoding is the file being saved in?
From what I know, UTF-8 saved in Windows Notepad puts some strange symbols (a byte order mark) at the front of the file so the encoding can be identified later, even when the content itself is plain ASCII and contains no characters that actually need UTF-8.
Did you edit the file using Notepad++ and save it? I would try doing that and compare the results.