I'm trying to:
1. Delete data from a CSV file using SSIS or C#. If I can't delete data, then
2. At least delete the CSV file using SSIS or C#.
I want to know how to do #1, if it is possible.
If you want to delete all the rows in a CSV file using C#, use the code below:
System.IO.File.WriteAllText(@"D:\filename.csv", string.Empty);
It deletes all the rows and leaves the file in place, empty.
C#
To remove lines from a flat file, you'll need to read the file and write it out to another file excluding any of the lines you don't want.
Read these two articles; they're short:
Reading a Text File One Line at a Time
Writing to a Text File
There may be more clever ways of jumping to a particular location in the file. If it were a fixed-width file, you could calculate where to go and how much to strip, but since it's a CSV (where row lengths vary), you're out of luck on that account.
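Below is a minimal C# sketch of that read-and-rewrite approach. It assumes there is room to write a temporary copy next to the original. It also assumes no quoted field contains an embedded line break, because a plain line-by-line reader would split such a row. CsvLineFilter, RemoveLines and the keep predicate are illustrative names, not part of any library.

```csharp
using System;
using System.IO;

public static class CsvLineFilter
{
    // Copies each line of sourcePath for which keep(line) returns true into a
    // temporary file, then replaces the original with that file.
    // Returns the number of lines that were dropped.
    public static int RemoveLines(string sourcePath, Func<string, bool> keep)
    {
        string tempPath = sourcePath + ".tmp";
        int removed = 0;

        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(tempPath)) // UTF-8 by default
        {
            string line;
            // Read one line at a time so a large file is never loaded whole.
            while ((line = reader.ReadLine()) != null)
            {
                if (keep(line))
                    writer.WriteLine(line);
                else
                    removed++;
            }
        }

        // Swap files only after the copy has been written and closed.
        File.Delete(sourcePath);
        File.Move(tempPath, sourcePath);
        return removed;
    }
}
```

For example, CsvLineFilter.RemoveLines(@"D:\filename.csv", line => !line.StartsWith("TRAILER")) would drop a hypothetical trailer row. Because the kept lines go to a temporary file first, the original is replaced only once the copy has finished. If the source file isn't UTF-8, pass the matching Encoding to the StreamWriter.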
SSIS
I'm assuming the lines you want to remove are a header or trailer, since SSIS before SQL Server 2012 didn't play well with those. In that case, you're going to want to establish two connection managers. One will read each entire line of the file into a single column. The second will describe your well-formatted file output. The workflow is the same as in C#: read every row and write out only the ones you keep.
In a data flow task, add a Flat File Source and configure it to use the 1 column connection manager. Add a Derived Column component or a Script Component to enable you to determine whether this row is a keeper or not. Add a Conditional Split off that and shunt only the good rows to a Flat File Destination which uses your strongly typed definition.
Then, in your next Data Flow, use a Flat File Source connected to the strongly typed definition, and Bob's your uncle.
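using System.IO;
string path = @"c:\temp\a.txt";
// File.Delete does not throw when the file is missing; the Exists check just makes the intent explicit.
if (File.Exists(path))
{
    File.Delete(path);
}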
Related
I'm struggling to figure out how to convert a CSV file into a database. I've tried a few methods here but I can't wrap my head around it. I have a CSV file with thousands of rows and I need to convert that into a SQLite database using C#. Any help is appreciated!
You can leverage MS Excel: open the CSV file and specify your character separator as needed (I believe it will default to tab-delimited). You can then save your thousands of rows as an Excel spreadsheet instead of in the character-separated (CSV) format.
Then you can use the open-source Open XML SDK to open the Excel document and work with it through its object model. From there, you can programmatically create your new database using SQL statements.
Query the spreadsheet headers to use as your column names. Of course, you'll need to ensure that your source CSV provides appropriate headers; if it doesn't, they can easily be added to the top of the file.
E.g.
https://learn.microsoft.com/en-us/office/open-xml/how-to-get-a-column-heading-in-a-spreadsheet
Next, iterate over the rows and construct an SQL statement to insert each one. You can review the Open XML docs, the Microsoft docs, or existing Stack Overflow answers for sample code showing how this is done.
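Here is a rough sketch of the SQL-building step. It assumes every column is stored as TEXT and that you choose the table name yourself. The helpers below turn a header row into a CREATE TABLE statement and a parameterized INSERT. SqliteSqlBuilder and its methods are hypothetical names. You would then execute the generated INSERT once per row, binding the row's values to @p0, @p1 and so on through your SQLite ADO.NET provider (for example Microsoft.Data.Sqlite).

```csharp
using System;
using System.Linq;

public static class SqliteSqlBuilder
{
    // Quotes an identifier for SQLite: wrap it in double quotes and
    // double any double quotes it already contains.
    public static string QuoteIdentifier(string name) =>
        "\"" + name.Replace("\"", "\"\"") + "\"";

    // CREATE TABLE with one TEXT column per header (all-TEXT is an assumption).
    public static string CreateTable(string table, string[] headers) =>
        $"CREATE TABLE IF NOT EXISTS {QuoteIdentifier(table)} (" +
        string.Join(", ", headers.Select(h => QuoteIdentifier(h) + " TEXT")) + ");";

    // INSERT with numbered parameters (@p0, @p1, ...) so that row values are
    // bound by the driver instead of being concatenated into the SQL text.
    public static string Insert(string table, string[] headers) =>
        $"INSERT INTO {QuoteIdentifier(table)} (" +
        string.Join(", ", headers.Select(QuoteIdentifier)) + ") VALUES (" +
        string.Join(", ", headers.Select((_, i) => "@p" + i)) + ");";
}
```

Binding values as parameters avoids quoting problems with commas and apostrophes in the data. Wrapping the thousands of inserts in a single transaction also makes SQLite considerably faster than committing each row separately.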
How to read xslx with open XML SDK based on columns in each row in C#?
What I'm really trying to achieve
We have an Excel dashboard which is built to be used in tandem with an Export from our in-house application. The generation of the data Excel file is done with EPPlus and we were convinced that we could use the Excel Dashboard as the "base" file for the generation, so that the exported file contained the dashboard in one worksheet and the dataset in another worksheet.
So the user would receive everything they needed in one single file.
We sort of got going with this thanks to named ranges which dynamically adapt to the size of the dataset, covered here
However, we've found that, since the Data Model in the Excel dashboard file is an OLAP cube, we can't update the underlying dataset and save it with EPPlus; we receive an error saying "the cache source is not a worksheet" when trying to save the sheet.
So while we worked on that, we sought out a temporary solution... where we've also hit a snag.
The new problem
The temporary solution we thought of is to distribute the Excel dashboard and the data as two separate Excel files. The Dashboard is distributed to those who need it and the data exports can then be generated from our application.
We thought that the only downside of this would be that it would require the user to rename and place the data file side by side with the Excel Dashboard manually.
However, we are running into an issue where Excel is insisting on using the absolute path to the data file instead of a relative path.
This requires the user to point the source to the data export manually, which is apparently done this way:
Now, I'm showing the entire process to also give some insight into the way we've set it up, since I'm not sure I'm using the right words about the technical details - and perhaps I'm grossly mistaken in the approach.
Overall, the way it works
File A contains the dashboard and some sheets with pivot tables used by the dashboard. The pivot tables all work off the aforementioned Data Model in the workbook, which is a cube (we need it to be a cube for some of the functions we use in the pivot tables). The Data Model is based on a named range that includes all the data in one of the worksheets.
File B is the one generated by our application. The two files were placed side by side when designing the dashboard and mapping the data, since I was under the impression that Excel attempted to keep relative file paths when possible. The file consists of a single worksheet containing the exported data.
At this point, the challenge was to automatically pull in data from File B into the worksheet in File A. I did this by going to the Data tab and using the Get Data function to point it to File B and telling Excel to load the data from the specified sheet.
This worked like a charm although the "solution" seems overly complicated.
All was well until we tried to use the workbook from another machine or directory. Then we found that the path to File B was absolute and that the data file could no longer be found.
So, the short question after this long-winded explanation is: How can I have Excel use a relative path to another file when importing data from an external file with the "Get Data" function?
By getting more familiar with Power Query I've been able to come up with a solution to this.
My setup is still as described above.
First, I have a sheet with some "system" values I use various places in the Workbook. I added a field there with the following Excel formula:
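=LEFT(CELL("filename";A1);FIND("[";CELL("filename";A1);1)-1)
This provides me with the absolute path to the folder where my File A resides. The A1 reference ties CELL to the sheet containing the formula. Without it, CELL reports on whichever cell is selected when Excel recalculates, and that cell can belong to another open workbook. CELL("filename") also returns an empty string until the workbook has been saved. The ; argument separator reflects regional settings (en-US Excel uses a comma).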
I use this value, and concatenate the expected filename of File B which holds the data. The result is an absolute path, pointing to where I expect the data file to be located.
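Here is how that formula's string logic works. CELL("filename") returns text such as C:\Reports\[FileA.xlsx]Dashboard. FIND locates the "[" at position 12, and LEFT keeps the first 11 characters, C:\Reports\. The C# below mirrors those steps purely as an illustration. WorkbookPath and the folder and file names are made up, and none of it is part of the workbook.

```csharp
using System;

public static class WorkbookPath
{
    // Mirrors LEFT(text; FIND("["; text; 1) - 1): FIND is 1-based, so the
    // 0-based IndexOf result is exactly the number of characters before "[".
    public static string FolderFromCellFilename(string cellFilename)
    {
        int bracket = cellFilename.IndexOf('[');
        if (bracket < 0)
            throw new ArgumentException("Expected text like C:\\dir\\[Book.xlsx]Sheet", nameof(cellFilename));
        return cellFilename.Substring(0, bracket);
    }

    // Appends the expected data-file name to that folder, as the "system" sheet does.
    public static string ExpectedDataPath(string cellFilename, string dataFileName) =>
        FolderFromCellFilename(cellFilename) + dataFileName;
}
```

For example, ExpectedDataPath(@"C:\Reports\[FileA.xlsx]Dashboard", "FileB.xlsx") gives C:\Reports\FileB.xlsx. The result follows File A wherever it is copied.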
I then added a named range pointing to the exact cell that contains this value.
Next, I added a new Power Query function like this:
= (rangeName) => Excel.CurrentWorkbook(){[Name=rangeName]}[Content]{0}[Column1]
The function takes the name of a named range as a parameter and spits back the value. In my case, I called the function GetValue. This function can now be used in other Power Query scripts.
Finally, I opened the Power Query script that is responsible for loading the data from the other Excel workbook. In that script, I changed the path of the file to this:
Source = Excel.Workbook(File.Contents(GetValue("FilePath")), null, true),
The only thing to notice here is the call to GetValue("FilePath") for the path parameter of File.Contents. FilePath is the name I gave the range pointing to the cell. All it does is load the path from my sheet and use it as the path to the workbook holding the data.
A quite convoluted solution, but it works.
I am calling a web service and the data from the web service is in csv format.
If I try to save data in xls/xlsx, then I get multiple sheets in a workbook.
So, how can I save the data in CSV with multiple tabs/sheets in C#?
I know CSV with multiple tabs is not practical, but is there any damn way, or any library, to save data in CSV with multiple tabs/sheets?
CSV, as a file format, assumes one "table" of data; in Excel terms that's one sheet of a workbook. While it's just plain text, and you can interpret it any way you want, the "standard" CSV format does not support what your supervisor is thinking.
You can fudge what you want a couple of ways:
Use a different file for each sheet, with related but distinct names, like "Book1_Sheet1", "Book1_Sheet2" etc. You can then find groups of related files by the text before the first underscore. This is the easiest to implement, but requires users to schlep around multiple files per logical "workbook", and if one gets lost in the shuffle you've lost that data.
Do the above, and also "zip" the files into a single archive you can move around. You keep the pure-CSV advantage of the above option, plus the convenience of having one file to move instead of several, but with the downside of having to zip/unzip the archive to get to the actual files. To ease the pain, if you're on .NET 4.5 or later you have the built-in ZipFile and ZipArchive classes in System.IO.Compression. If you're not, you can use the open-source DotNetZip or SharpZipLib. Any of these will let you programmatically create and consume standard Windows ZIP files. You can also use the nearly universal .tar.gz (aka .tgz) combination, but your users will need either your program or a third-party compression tool like 7Zip or WinRAR to create the archive from a set of exported CSVs.
Implement a quasi-CSV format where a blank line (containing only a newline) acts as a "tab separator", and your parser would expect a new line of column headers followed by data rows in the new configuration. This variant of standard CSV may not be readable by other consumers of CSVs, as it doesn't adhere to the expected file format. For that reason, I would recommend you don't use the ".csv" extension, as it will confuse and frustrate users expecting to be able to open it in other applications like spreadsheets.
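Below is a minimal sketch of the second option using the built-in ZipArchive class. On .NET Framework 4.5 this needs a reference to the System.IO.Compression assembly. The CsvZipExporter name, the Book1_Sheet1 naming scheme and the in-memory row format are assumptions for illustration. The escaping handles only commas, quotes and line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class CsvZipExporter
{
    // Writes each "sheet" as its own CSV entry ({bookName}_{sheetName}.csv)
    // inside a single ZIP archive at zipPath, replacing any existing file.
    public static void Export(string zipPath, string bookName,
                              IDictionary<string, IEnumerable<string[]>> sheets)
    {
        using (var zipStream = new FileStream(zipPath, FileMode.Create))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
        {
            foreach (var sheet in sheets)
            {
                ZipArchiveEntry entry = archive.CreateEntry($"{bookName}_{sheet.Key}.csv");
                using (var writer = new StreamWriter(entry.Open()))
                {
                    foreach (string[] row in sheet.Value)
                        writer.WriteLine(string.Join(",", row.Select(Escape)));
                }
            }
        }
    }

    // Minimal CSV escaping: quote a field that contains a comma, quote or
    // line break, doubling any embedded quotes.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

Each entry stays a plain CSV, so anyone who unzips the archive can open the individual "sheets" in a spreadsheet as usual.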
If I try to save data in xls/xlsx, then I get multiple sheets in a workbook.
Your answer is in your question: don't use text/csv (which most certainly cannot do multiple sheets; it can't even do one sheet. There's no such thing as a sheet in text/csv, though there is in how some applications like Excel or Calc choose to import it into a format that does have sheets), but save it as xls, xlsx, ods or another format that does have sheets.
Both XLSX and ODS are much more complicated than text/csv, but are each probably the most straightforward of their respective sets of formats.
I've been using this library for a while now in my projects:
https://github.com/SheetJS/js-xlsx
I use it to import data and structure from formats like xls(x), CSV and XML, but you can certainly save in those formats as well (all on the client side)!
Hope that helps. Take a look at the online demo:
http://oss.sheetjs.com/js-xlsx/
Peek at the source code or file an issue on GitHub, but I think you will have to do most of the coding on your own.
I think you want to reduce the size of your excel file. If yes then you can do it by saving it as xlsb i.e., Excel Binary Workbook format. Further, you can reduce your file size by deleting all the blank cells.
I have already written a script to export data from a single CSV file into SQL Server. I do have to provide the path for that CSV.
I'm trying to figure out how to modify the code so that it checks a particular folder for CSV files and then processes them one by one. After processing each file, it should move the original file to a different location. Thanks in advance.
Update:
I have a console application that parses the CSV, connects to the SQL database and inserts the values. But, as I said, I have to give it the file path. I'm looking for a way to provide only a folder name. The application should look for any CSV files in that folder, parse each file, export its data to SQL, move that file to a different folder once done, and then start on the next file.
To migrate data from a CSV file, try BULK INSERT:
http://msdn.microsoft.com/ru-ru/library/ms188365.aspx
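For example:
-- The file path is resolved on the SQL Server machine, not on the client.
BULK INSERT [tableName] FROM 'c:\test.csv'
WITH (
    FIELDTERMINATOR = ',',
    FIRSTROW = 1 -- use FIRSTROW = 2 if the file starts with a header row
);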
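BULK INSERT on its own doesn't cover the folder part of the question. Below is a sketch of that outer loop in C#. It assumes your existing parse-and-insert code can be wrapped in a method that takes one file path, passed in here as the hypothetical processFile delegate. CsvFolderProcessor is likewise an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvFolderProcessor
{
    // Processes every .csv file in inputFolder one at a time, then moves it to
    // processedFolder. processFile is a hypothetical stand-in for your existing
    // parse-and-insert code; it receives the full path of one CSV file.
    public static List<string> ProcessFolder(string inputFolder, string processedFolder,
                                             Action<string> processFile)
    {
        Directory.CreateDirectory(processedFolder); // no-op if the folder already exists

        // The explicit extension check guards against "*.csv" also matching
        // longer extensions such as ".csvx" on some platforms; sorting gives a
        // predictable processing order.
        List<string> files = Directory.GetFiles(inputFolder, "*.csv")
            .Where(f => string.Equals(Path.GetExtension(f), ".csv", StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
            .ToList();

        var moved = new List<string>();
        foreach (string file in files)
        {
            processFile(file); // if this throws, the file stays in inputFolder
            string target = Path.Combine(processedFolder, Path.GetFileName(file));
            File.Move(file, target);
            moved.Add(target);
        }
        return moved;
    }
}
```

A file is moved only after processFile returns, so a file that fails to import stays in the input folder for another attempt. File.Move throws if a file with the same name already exists in the processed folder. On .NET Core 3.0 and later, File.Move(source, dest, overwrite: true) avoids that. If processFile runs BULK INSERT, the folder must also be reachable from the SQL Server machine.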
I have an Excel template file (C:\Report\Template\abc.xls).
I need to write a C# console application to do the following:
Copy the abc.xls file from the Template folder and save the same template with a different name in a different folder, "Data" (C:\Report\Data\new_abc.xls).
Load the new_abc.xls file into memory and write data (coming from the database) to a specific cell (for example, I want to write to cell H17).
Please suggest an approach, or give me a link or code to do this. Thanks!
Copying files is easily done with File.Copy. Simply pass in the source path from which to copy and the destination path to copy to as strings.
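Below is a sketch of the copy step. It assumes the destination folder may not exist yet and that an older new_abc.xls should be overwritten. TemplateCopier and CopyTemplate are illustrative names. Writing to H17 would then go through Microsoft.Office.Interop.Excel: open the copy and set worksheet.Range["H17"].Value2. That requires Excel to be installed on the machine running the console application.

```csharp
using System;
using System.IO;

public static class TemplateCopier
{
    // Copies the template to targetFolder under newFileName, creating the folder
    // if needed, and returns the full path of the copy. overwrite: true replaces
    // a copy left over from a previous run.
    public static string CopyTemplate(string templatePath, string targetFolder, string newFileName)
    {
        Directory.CreateDirectory(targetFolder); // no-op if it already exists
        string targetPath = Path.Combine(targetFolder, newFileName);
        File.Copy(templatePath, targetPath, overwrite: true);
        return targetPath;
    }
}
```

For example, TemplateCopier.CopyTemplate(@"C:\Report\Template\abc.xls", @"C:\Report\Data", "new_abc.xls") returns C:\Report\Data\new_abc.xls. The template itself is never opened for writing.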
Take a look at the Microsoft.Office.Interop.Excel classes for manipulating Excel workbooks. Here's a good code sample to get you started. Also this is another overview.