Importing .csv files to SSIS - c#

For reasons best known to a supplier I am being provided a number of .CSV files in both ANSI and UTF formats and this is confusing my flat file load process. Basically, I cannot rely on the format being the same each time, although the structure is consistent.
So my questions are:
Does the fact that my flat file process has issues with (UTF and ANSI) mean that I have not set up the flat file connection properly?
I believe I have identified two possible solutions, but which of these solutions would be best?
A split by type (ANSI vs UTF) and if so how?
Convert all the .csv files to "excel" as part of a VB/C# script task? For "excel" read any other common format.
Thanks for your help.

In the Flat File Connection Manager, change the code page to 65001 (UTF-8).
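If the files really do arrive in a mix of encodings, one option is to normalise them to UTF-8 in a Script Task before the flat file source reads them, so that a single 65001 connection works for all of them. This is only a rough sketch: the class and method names are made up for illustration, and the check is a heuristic (a UTF-8 BOM, or bytes that decode cleanly as strict UTF-8) rather than a guaranteed detection. The ANSI code page is passed in by the caller because the question does not say which one the supplier uses.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of SSIS: guesses UTF-8 vs ANSI and rewrites as UTF-8.
public static class CsvEncodingSniffer
{
    // True when the bytes start with a UTF-8 BOM or decode as strict UTF-8.
    // Pure ASCII also returns true, which is harmless because ASCII is valid UTF-8.
    public static bool LooksLikeUtf8(byte[] bytes)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return true;
        try
        {
            // throwOnInvalidBytes: true makes invalid sequences raise an exception.
            new UTF8Encoding(false, true).GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Reads the file, decodes it as UTF-8 or the supplied ANSI fallback,
    // and writes it back out as UTF-8 with a BOM.
    public static void NormalizeToUtf8(string inputPath, string outputPath, Encoding ansiFallback)
    {
        byte[] bytes = File.ReadAllBytes(inputPath);
        Encoding source = LooksLikeUtf8(bytes) ? new UTF8Encoding(false) : ansiFallback;
        string text = source.GetString(bytes);

        // GetString keeps a leading BOM as U+FEFF, so drop it before rewriting.
        if (text.Length > 0 && text[0] == '\uFEFF')
            text = text.Substring(1);

        File.WriteAllText(outputPath, text, new UTF8Encoding(true));
    }
}
```

In a Script Task on the full .NET Framework the fallback would typically be something like Encoding.GetEncoding(1252), but that is an assumption about the supplier's files and worth confirming against a sample.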

Related

C# File open / convert and save

I am very new to C# and hoping this is a simple question. Not finding what I need on google
I have a file, C:\test\losses.csv, that I want to open, convert to an xlsx file, and save in a different directory as:
C:\test\Losses.xlsx
The reason for opening the file is the move command does not convert it to xlsx, unfortunately it keeps the same structure as the csv and is unusable in that format.
File.Copy(@"C:\test\losses.csv", @"C:\test1\Losses.xlsx");
The above code works great but still is a csv file (well really a hybrid of one). That is another SAP story.
Any help will be greatly appreciated. Thanks
File.Copy only copies the file - similar to copying a file in DOS or in windows file explorer.
You'd need to translate your CSV to an XLSX file. The format should be pretty straight-forward, but you'll need to do more research:
Load the CSV as a data table
Save it through Excel Interop, passing Excel.XlFileFormat.xlOpenXMLWorkbook (an enum value, not a class) as the file format.
A different StackOverflow problem addresses how to use the xlOpenXMLWorkbook:
Exporting to .xlsx using Microsoft.Office.Interop.Excel SaveAs Error
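For the first step, loading the CSV as a data table, something along these lines may be enough to get started. It is a minimal sketch and not a full CSV parser: the CsvTableLoader name is invented here, the first line is assumed to hold unique column headers, and quoted fields that span several lines are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class CsvTableLoader
{
    // Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"') inQuotes = false;
                else current.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Reads the header line into columns and every following non-empty line into a row.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable();
        string line = reader.ReadLine();
        if (line == null) return table;

        foreach (string header in SplitLine(line))
            table.Columns.Add(header);

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;
            List<string> fields = SplitLine(line);
            DataRow row = table.NewRow();
            for (int i = 0; i < table.Columns.Count && i < fields.Count; i++)
                row[i] = fields[i];
            table.Rows.Add(row);
        }
        return table;
    }
}
```

Calling CsvTableLoader.Load(new StreamReader(@"C:\test\losses.csv")) would give a DataTable whose rows can then be written to a worksheet and saved as described in the linked question.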
Hope this helps. Good luck.

Is csv with multi tabs/sheet possible?

I am calling a web service and the data from the web service is in csv format.
If I try to save data in xls/xlsx, then I get multiple sheets in a workbook.
So, how can I save the data in csv with multiple tabs/sheets in C#?
I know csv with multiple tabs is not practical, but is there any damn way or any library to save data in csv with multiple tabs/sheet?
CSV, as a file format, assumes one "table" of data; in Excel terms that's one sheet of a workbook. While it's just plain text, and you can interpret it any way you want, the "standard" CSV format does not support what your supervisor is thinking.
You can fudge what you want a couple of ways:
Use a different file for each sheet, with related but distinct names, like "Book1_Sheet1", "Book1_Sheet2" etc. You can then find groups of related files by the text before the first underscore. This is the easiest to implement, but requires users to schlep around multiple files per logical "workbook", and if one gets lost in the shuffle you've lost that data.
Do the above, and also "zip" the files into a single archive you can move around. You keep the pure CSV advantage of the above option, plus the convenience of having one file to move instead of several, but the downside of having to zip/unzip the archive to get to the actual files. To ease the pain, if you're in .NET 4.5 you have access to a built-in ZipFile implementation, and if you are not you can use the open-source DotNetZip or SharpZipLib, any of which will allow you to programmatically create and consume standard Windows ZIP files. You can also use the nearly universal .tar.gz (aka .tgz) combination, but your users will need either your program or a third-party compression tool like 7Zip or WinRAR to create the archive from a set of exported CSVs.
Implement a quasi-CSV format where a blank line (containing only a newline) acts as a "tab separator", and your parser would expect a new line of column headers followed by data rows in the new configuration. This variant of standard CSV may not be readable by other consumers of CSVs as it doesn't adhere to the expected file format, and as such I would recommend you don't use the ".csv" extension, as it will confuse and frustrate users expecting to be able to open it in other applications like spreadsheets.
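As a rough illustration of the second option (one CSV per sheet, zipped together), here is a sketch using the ZipArchive class from System.IO.Compression that ships with .NET 4.5. The CsvWorkbookZip name, the "Book_Sheet.csv" naming and the in-memory dictionary of sheets are assumptions made for the example, not anything the web service dictates.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: packs several CSV "sheets" into one ZIP archive in memory.
public static class CsvWorkbookZip
{
    // sheets maps a sheet name to its CSV text; entries are named "<book>_<sheet>.csv".
    public static byte[] Pack(string bookName, IDictionary<string, string> sheets)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true so the buffer can still be read after the archive is closed.
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                foreach (KeyValuePair<string, string> sheet in sheets)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(bookName + "_" + sheet.Key + ".csv");
                    using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
                    {
                        writer.Write(sheet.Value);
                    }
                }
            }
            // The archive must be disposed first so the ZIP directory is written.
            return buffer.ToArray();
        }
    }
}
```

The resulting bytes can be written to disk with File.WriteAllBytes, and each entry still opens as an ordinary CSV once extracted.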
If I try to save data in xls/xlsx, then I get multiple sheets in a workbook.
Your answer is in your question, don't use text/csv (which most certainly can not do multiple sheets, it can't even do one sheet; there's no such thing as a sheet in text/csv though there is in how some applications like Excel or Calc choose to import it into a format that does have sheets) but save it as xls, xlsx, ods or another format that does have sheets.
Both XLSX and ODS are much more complicated than text/csv, but are each probably the most straightforward of their respective sets of formats.
I've been using this library for a while now,
https://github.com/SheetJS/js-xlsx
in my projects to import data and structure from formats like xls(x), csv and xml, but you can certainly save in those formats as well (all from the client)!
Hope that helps. Take a look at the online demo,
http://oss.sheetjs.com/js-xlsx/
peek at the source code, or file an issue on GitHub, but I think you will have to do most of the coding on your own.
I think you want to reduce the size of your excel file. If yes then you can do it by saving it as xlsb i.e., Excel Binary Workbook format. Further, you can reduce your file size by deleting all the blank cells.

Creating Excel File

Well, I didn't find any libs to create an Excel file in Windows Phone 7, and the default libs for Excel are not working because they weren't compiled for it.
Do any of you guys know how to do this?
Excel is able to open many different kinds of files beyond the .xls or .xlsx. Most common is CSV; it's dead simple but not very capable, and I would avoid it for all but the simplest applications.
A format I've used successfully is the Symbolic Link (SYLK) format. The .slk files open directly in Excel, and you can include cell formatting and formulas. It's easy to save out a file from Excel itself and use it as a template for creating your own files.
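To give an idea of how little is needed for a basic .slk file, here is a sketch that writes plain cell values only. It is based on my understanding of the common SYLK records (an ID header, C records with Y/X coordinates and a K value, and a closing E), so treat it as an assumption and compare the output against a file saved from Excel itself. The SylkWriter name is made up, formatting and formulas are left out, and string values are assumed not to contain double quotes.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper: builds a minimal SYLK document from rows of values.
public static class SylkWriter
{
    // Doubles are written as numbers; everything else is written as quoted text.
    public static string Build(object[][] rows)
    {
        var sb = new StringBuilder();
        sb.Append("ID;PWXL;N;E\r\n"); // header record
        for (int r = 0; r < rows.Length; r++)
        {
            for (int c = 0; c < rows[r].Length; c++)
            {
                object value = rows[r][c];
                if (value == null) continue; // leave the cell empty

                string k = value is double
                    ? ((double)value).ToString(CultureInfo.InvariantCulture)
                    : "\"" + value.ToString().Replace(";", ";;") + "\""; // assumed: semicolons are doubled

                // SYLK coordinates are 1-based: Y is the row, X is the column.
                sb.Append("C;Y").Append(r + 1).Append(";X").Append(c + 1)
                  .Append(";K").Append(k).Append("\r\n");
            }
        }
        sb.Append("E\r\n"); // end-of-file record
        return sb.ToString();
    }
}
```

The returned string would be saved with an .slk extension. Only StringBuilder and CultureInfo are used, which should keep it within what the phone's class library offers, though I have not checked that on WP7.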
You're going to struggle to find a library to do this simply because WP (as of 7.1) doesn't include the System.IO.Packaging namespace, which most libraries will depend on to read/write docx/xlsx/etc files.

Converting IBM DB2 IXF file to CSV or XML

How do I convert an exported IXF file (using db2 export) to a human-readable format, like CSV or XML? I am comfortable with doing it in Python or .NET C#.
The PC/IXF format is fairly complex, and is practically unknown to programs outside of DB2. Writing your own PC/IXF parser just to convert an IXF file directly to some other format might take a while. A faster alternative is to issue an IMPORT command on the DB2 server and specify CREATE INTO instead of INSERT INTO , which will generate a brand new table that can accommodate the contents of the file being imported. This will allow you to run an EXPORT command on the new table to dump the rows to a delimited format.
In case you still want to write your own PC/IXF parser, you can take a look at this project, which converts an IXF file to JSON.
IXF is an old and well-documented file format. It is possible to read and process it; I did this a couple of years ago. So don't be discouraged. It's hard, but not too hard for developers.
Today you can also find solutions on GitHub, e.g. ixfcvt

Detect file extension c#

There is a virus that my brother got in his computer and what that virus did was to rename almost all files in his computer. It changed the file extensions as well. so a file that might have been named picture.jpg was renamed to kjfks.doc for example.
so what I have done in order to solve this problem is:
remove all file extensions from files. (I use a recursive method to search for all files in a directory and as I go through the files I remove the extension)
now the files do not have an extension. the files now look like:
I think these file names are stored in a local database created by the virus, and if I purchase the anti-virus they will be renamed back to their original names.
Since my brother created a backup, I selected the files that had a creation date later than when he performed the backup, and I have placed those files in a directory.
I am not interested in getting the exact extension as long as I can see the content of the file. For example, I will scan each file and if it has text inside I will give it a .txt extension. Maybe it was originally .html or .css; I will not be able to know that.
I believe that all pdf files should have something in common, and doc files should also have something in common. How can I figure out what the most common types of files (pdf, doc, docx, png, jpg, etc.) have in common?
Edit:
I know it will probably take less time to go over all these 200 files and test each one than to create this program. It is just that I am curious to see if it is possible to get the file extension.
In unix, you can use file to determine the type of file. There is also a port for windows and you can obviously write a script (batch, powershell, etc.) or C# program to automate this.
First, congratulate your brother on doing a backup. Many people don't, and are absolutely wiped out by these problems.
You're going to have to do a lot of research, I'm afraid, but you're on the right track.
Open each file with a TextReader or a BinaryReader and examine the headers. Most of them are detectable.
For instance: Every PDF starts with "%PDF-" and then its version number. Just look at those first 5 characters. If it's "%PDF-", then put a PDF on the filename and move on.
Similarly: "ÿØÿà..JFIF" for JPEG's, "[InternetShortcut]" for URL shortcuts, "L...........À......Fƒ" for regular shortcuts (the "." is a zero/null, BTW)
ZIPs / compressed directories start with {0x50}{0x4B}{0x03}{0x04} (usually followed by {0x14}), and you should be aware that Office 2007/2010 documents are really ZIPs with XML files inside of them.
You'll have to do some digging as you find each type, but you should be able to write something to establish most of the file types.
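A header check along these lines could be a starting point. It is only a sketch: the FileTypeSniffer name is invented, it covers just the PDF, JPEG and ZIP signatures mentioned above plus the PNG signature (which I have added), and a ZIP result may really be a docx or xlsx, so those still need a closer look.

```csharp
using System;
using System.IO;

// Hypothetical helper: guesses an extension from the first bytes of a file.
public static class FileTypeSniffer
{
    // True when data begins with the given signature bytes.
    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
            if (data[i] != signature[i]) return false;
        return true;
    }

    // Returns a likely extension, or null when no known signature matches.
    public static string GuessExtension(byte[] header)
    {
        if (StartsWith(header, 0x25, 0x50, 0x44, 0x46, 0x2D)) return ".pdf"; // "%PDF-"
        if (StartsWith(header, 0xFF, 0xD8, 0xFF)) return ".jpg";
        if (StartsWith(header, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return ".png";
        if (StartsWith(header, 0x50, 0x4B, 0x03, 0x04)) return ".zip";       // also docx/xlsx
        return null;
    }

    // Reads up to 8 bytes from the start of the file and applies the same checks.
    public static string GuessExtension(string path)
    {
        var header = new byte[8];
        using (FileStream stream = File.OpenRead(path))
        {
            int read = stream.Read(header, 0, header.Length);
            Array.Resize(ref header, read); // shorter files yield a shorter header
        }
        return GuessExtension(header);
    }
}
```

Files that come back as null are the ones to open in a hex editor and add new signatures for.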
You'll have to write some recursion to work through directories, but you can eliminate any file with no extension.
BTW - A great tool to help with this is HxD: http://www.mh-nexus.de/ It's what I used to pull this answer together!
Good luck!
The "most common types" each have their own format, and most of them have some magic bytes at a fixed position near the beginning of the file. You can detect most formats quite easily. Even HTML, XML, CSS and similar text files can be detected by analyzing their beginning. But it will take some time to write an application that will guess the format. Some types (such as ODF or JAR, which are built on top of regular ZIPs) can also be detected.
But ... could it be that such an application already exists on the market? I guess you can find something if you search, because the task is not as tricky as it initially seems to be.
