SSIS - Fixed number of columns in flat file source - c#

I have a number of text files in a directory, each with a fixed number of tab-separated columns (6). I read them into an SSIS package using a 'Flat File Source' block. If a file has more columns than required, or if data is missing from any of the columns, I want to reject that file.
I have done some testing with various sample files. Whenever I add extra columns, the package accepts these files. It throws an error when there are fewer columns, which is good.
But is there a way of specifying that the file must have a certain number of columns and that data must be present in each column?
I don't have much experience with SSIS so I would appreciate any suggestions.
Thanks

I would use a Script Task to do this.
You can use System.IO.StreamReader to open the file and read your header row, and then perform whatever validation you need on the resulting string.
I would also create a Boolean variable in the SSIS package, called something like 'FileIsValid', to which I would write (from the Script Task) True if the conditions are met, and False if they aren't. I would then use this to direct the package flow using precedence constraints.
Something like this:
public void Main()
{
    // Assume the file is invalid until the header check passes
    Dts.Variables["User::FileIsValid"].Value = false;

    string filePath = Dts.Variables["User::Filepath"].Value.ToString();

    // "using" closes and disposes the reader even if an exception is thrown
    using (System.IO.StreamReader reader = new System.IO.StreamReader(filePath))
    {
        string header = reader.ReadLine();

        // ReadLine returns null for an empty file, so guard against that
        if (header != null &&
            header.Trim() == "Column1\tColumn2\tColumn3\tColumn4\tColumn5\tColumn6")
        {
            Dts.Variables["User::FileIsValid"].Value = true;
        }
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
With regard to checking that there is data in all columns, does this need to hold for every row?
You could continue reading the lines with the StreamReader and use regular expressions to check for this.
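A minimal sketch of that row-by-row check, assuming every line (including any header) must have exactly six tab-separated, non-blank fields. Instead of regular expressions it uses String.Split, which keeps empty entries, so a missing value shows up as an empty field. The class name TabFileValidator is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for validating tab-separated files row by row
public static class TabFileValidator
{
    // True only if every line has exactly expectedColumns tab-separated
    // fields and none of those fields is empty or whitespace.
    public static bool AllRowsComplete(IEnumerable<string> lines, int expectedColumns)
    {
        foreach (string line in lines)
        {
            // Split keeps empty entries, so "a\t\tc" yields three fields
            string[] fields = line.Split('\t');
            if (fields.Length != expectedColumns)
                return false;

            foreach (string field in fields)
            {
                if (string.IsNullOrWhiteSpace(field))
                    return false;
            }
        }
        return true;
    }

    // Convenience overload: File.ReadLines streams the file one line at a time
    public static bool AllRowsComplete(string path, int expectedColumns)
    {
        return AllRowsComplete(File.ReadLines(path), expectedColumns);
    }
}
```

In the Script Task this could replace or follow the header check, e.g. setting User::FileIsValid to TabFileValidator.AllRowsComplete(filePath, 6). Note that a blank line anywhere in the file (including a trailing one) counts as an incomplete row here.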

Expanding on Chris Mack:
If the files do not have headers, you can count the fields instead (six columns means six fields after splitting on tabs):
char[] delim = new char[] { '\t' };
if (header.Split(delim).Length == 6)
...

Related

How can I have an optional header row when reading a CSV file?

I need to import a CSV file that may or may not have a header record.
If I read a file that doesn't have a header row it assumes the first data row is the header and doesn't return it.
If I specify HasHeaderRecord = false it will throw an exception when there is a header record.
Is there a way to use the CsvHelper library and have an optional header record?
I can get this to work using this approach, but it seems like there could be a better way:
csvReader.Configuration.HasHeaderRecord = false;
while (csvReader.Read())
{
try
{
var record = csvReader.GetRecord<MyModel>();
myRecordList.Add(record);
}
catch (ReaderException rex)
{
// Check if this is an error with no header record and ignore
if (rex.ReadingContext.CurrentIndex == 1 &&
rex.ReadingContext.RawRecord.ToLower().Contains("myheadercolumnname"))
{
continue;
}
}
}
I'm not sure if this is the best way, but it does bypass throwing an exception.
public static void Main(string[] args)
{
using (var reader = new StreamReader("path\\to\\file.csv"))
using (CsvReader csv = new CsvReader(reader))
{
csv.Read();
csv.ReadHeader();
if (!csv.Context.HeaderRecord.Contains("myHeaderColumnName"))
{
csv.Configuration.HasHeaderRecord = false;
reader.BaseStream.Position = 0;
}
var records = csv.GetRecords<MyModel>().ToList();
}
}
I don't think CsvReader has a built-in way to know this. There are two ways to find out:
The information "Header row yes/no" is provided by the user.
You implement the detection logic yourself by reading the first few lines and checking a few properties, e.g. the content type of a few columns.
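A sketch of the second option, assuming you know the expected column names and that header names contain no commas. It peeks at the first line of a seekable stream and rewinds it to position 0 before any CSV parser is created, so you can decide how to configure HasHeaderRecord up front instead of rewinding the stream underneath a reader that has already buffered data. The class and method names are hypothetical, and how HasHeaderRecord is set depends on the CsvHelper version you use.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: decides whether a CSV stream starts with a header row
public static class CsvHeaderDetector
{
    public static bool HasHeader(Stream stream, string[] expectedNames)
    {
        if (!stream.CanSeek)
            throw new ArgumentException("Stream must be seekable.", "stream");

        string firstLine;
        // leaveOpen: true, so disposing this reader does not close the stream
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, true))
        {
            firstLine = reader.ReadLine();
        }
        // Rewind so the real CSV parser starts from the beginning
        stream.Position = 0;

        if (firstLine == null)
            return false;

        // Naive split: assumes header names contain no embedded commas
        string[] fields = firstLine.Split(',')
            .Select(f => f.Trim().Trim('"'))
            .ToArray();

        // Header if every field matches the expected name (case-insensitive)
        return fields.Length == expectedNames.Length &&
               fields.Zip(expectedNames,
                          (actual, expected) => string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase))
                     .All(match => match);
    }
}
```

The same idea extends to the "content type" check mentioned above, e.g. treating the first line as data if a column that should be numeric parses as a number.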
In my opinion the information should be user provided or the source of the file should meet a standard to always provide a header row or never provide a header row.

How to remove all lines in a file, then rewrite the file in Compact Framework 3.5 c#

In the .NET Framework, using a Windows Forms app, I can purge a file and then write the data that I want back into that file.
Here is the code that I use in Windows Forms:
var openFile = File.OpenText(fullFileName);
var fileEmpty = openFile.ReadLine();
if (fileEmpty != null)
{
var lines = File.ReadAllLines(fullFileName).Skip(4); //Will skip the first 4 then rewrite the file
openFile.Close();//Close the reading of the file
File.WriteAllLines(fullFileName, lines); //Reopen the file to write the lines
openFile.Close();//Close the rewriting of the file
}
openFile.Close();
openFile.Dispose();
I am trying to do the same thing in the Compact Framework. I can keep the lines that I want, and then delete all the lines in the file. However, I am not able to rewrite the file.
Here is my compact framework code:
var sb = new StringBuilder();
using (var sr = new StreamReader(fullFileName))
{
// read the first 4 lines but do nothing with them; basically, skip them
for (int i = 0; i < 4; i++)
sr.ReadLine();
string line1;
while ((line1 = sr.ReadLine()) != null)
{
sb.AppendLine(line1);
}
}
string allines = sb.ToString();
openFile.Close();//Close the reading of the file
openFile.Dispose();
//Reopen the file to write the lines
var writer = new StreamWriter(fullFileName, false); //Don't append!
foreach (char line2 in allines)
{
writer.WriteLine(line2);
}
openFile.Close();//Close the rewriting of the file
}
openFile.Close();
openFile.Dispose();
Your code
foreach (char line2 in allines)
{
writer.WriteLine(line2);
}
is writing out the characters of the original file, each on a separate line.
Remember, allines is a single string that happens to have Environment.NewLine between the original strings of the file.
What you probably intend to do is simply
writer.WriteLine(allines);
UPDATE
You are closing openFile a number of times (you should only do this once), but you are not flushing or closing your writer.
Try
using (var writer = new StreamWriter(fullFileName, false)) //Don't append!
{
writer.WriteLine(allines);
}
to ensure the writer is disposed and therefore flushed.
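Putting both fixes together, a minimal sketch of the whole operation: read everything after the first few lines, close the reader, then overwrite the file inside a using block so the writer is flushed. It sticks to StreamReader, StreamWriter and StringBuilder, which the question's Compact Framework code already uses; the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: drops the first N lines of a text file in place
public static class FileTrimmer
{
    public static void RemoveLeadingLines(string path, int linesToSkip)
    {
        var sb = new StringBuilder();

        using (var reader = new StreamReader(path))
        {
            // Skip the unwanted lines, stopping early if the file is shorter
            for (int i = 0; i < linesToSkip; i++)
            {
                if (reader.ReadLine() == null)
                    break;
            }

            // Keep every remaining line, re-adding the line terminator
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                sb.Append(line);
                sb.Append(Environment.NewLine);
            }
        } // the reader is closed here, before the file is reopened for writing

        // false = overwrite; disposing the writer flushes its buffer to disk
        using (var writer = new StreamWriter(path, false))
        {
            writer.Write(sb.ToString());
        }
    }
}
```

Write is used rather than WriteLine because each kept line already ends with a newline, so WriteLine would add an extra blank line at the end of the file.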
If you plan to do this to implement something like a "rotating" buffer for a log file, consider that most Windows CE devices use flash as their storage medium, and your approach rewrites the whole file (minus 4 lines) every time. If this happens often (every few seconds), it may wear out the flash, reaching its maximum number of erase cycles quickly (quickly may mean a few weeks or months).
An alternative approach would be to rename the old log file when it has reached the maximum size (deleting any existing file with the same name) and create a new one.
This way your logging info would be split across two files, but you'll always append to the existing files, limiting the number of writes you perform. Also, renaming or deleting a file isn't a heavy operation from the point of view of a flash file system.
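A sketch of that rename-and-append scheme, assuming a single backup file named by appending ".old" to the log path (a hypothetical naming choice) and a size limit passed in by the caller. Every write is an append; the only other file operations are one delete and one rename per rotation.

```csharp
using System;
using System.IO;

// Hypothetical append-only log with a single rotated backup
public static class RotatingLog
{
    public static void Append(string path, string message, long maxBytes)
    {
        var info = new FileInfo(path);
        if (info.Exists && info.Length >= maxBytes)
        {
            // Rotate: replace any previous backup with the current log
            string backupPath = path + ".old";
            if (File.Exists(backupPath))
                File.Delete(backupPath);
            File.Move(path, backupPath);
        }

        // true = append; the file is created if it does not exist yet
        using (var writer = new StreamWriter(path, true))
        {
            writer.WriteLine(message);
        }
    }
}
```

The size check happens before each write, so the live file can exceed maxBytes by at most one message before it is rotated.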

Deleting a file exception, deleting a line of text

I am trying to implement some basic config-file read/write/edit functionality in a class.
The config-file is stored in the file [...]\config.txt, which is stored in the 'path'-variable.
Syntax of my configfile is as follows:
param0 value_of_it
param1 value_of_it
something_else you_get_it
Everything left of the first space is the name by which the parameter is to be found, the rest of the line is considered the value of that param.
So much for context.
Now, when I try to delete that file using File.Delete, I get:
"System.IO.IOException: Cannot access file, because it is being used by another process".
Here is the code in question:
internal void alter(string param, bool replace = false, string value = null) {
//here
string buf;
System.IO.StreamReader reader = new System.IO.StreamReader(path);
string bufPath = path.Substring(0, path.Length-4)+"_buf.txt";
System.IO.StreamWriter writer = new StreamWriter(bufPath);
while((buf = reader.ReadLine()) != null) {
if(buf.Substring(0, param.Length) != param)
writer.WriteLine(buf);
else if(replace)
writer.WriteLine(param+" "+value);
}
writer.Dispose();
reader.Dispose();
writer.Close();
reader.Close();
//there
File.Delete(path);
File.Move(bufPath, path);
}
The function copies the original file line by line into the buffer, except for the param specified in the call. That line either gets ignored, thus deleted, or replaced if the call says so.
Then the original file is deleted and the _buf version is "moved" there (renamed).
These steps are executed correctly (the _buf-file gets created and correctly filled) until 'File.Delete(path)' which is where the exception is thrown.
I already tried commenting out the entire part between //here and //there and only deleting the file, which results in the same exception, so the problem has to be of more basic nature.
I tried finding a process having a problematic handle using Sysinternals Process Explorer as suggested here on ServerFault, but couldn't find anything. I only found three handles for that file in my own process.
I also ran the app after killing Windows Explorer, because I've read that it is sometimes the offender; that didn't help either.
I also checked the entire rest of my program and confirmed the following:
This is the first occasion this file is used, so forgotten handles from the past are impossible.
It is a single-threaded application (so far), so deadlocks and race-conditions are ruled out as well.
So here is my question: How can I find what is locking my file?
Alternatively: Is there a simpler way to delete or edit a line in a file?
It would be more convenient to store the configuration in XML format, because that gives you the option to store more complex values later if you need to. You can also rely on XDocument for loading and saving the file, as simply as XDocument.Load(path) and doc.Save(path). For example, your current configuration file would look like the following in XML format:
<?xml version="1.0" encoding="utf-8"?>
<params>
<param name="param0">value_of_it</param>
<param name="param1">value_of_it</param>
<param name="something_else">you_get_it</param>
</params>
And your function to alter an existing param's value or add a new parameter to the configuration file would look like this:
internal void alterXml(string param, bool replace = false, string value = null)
{
//load xml configuration file to XDocument object
var doc = XDocument.Load(path);
//search for <param> having attribute "name" = param
var existingParam = doc.Descendants("param").FirstOrDefault(o => o.Attribute("name").Value == param);
//if such a param element doesn't exist, add new element
if (existingParam == null)
{
var newParam = new XElement("param");
newParam.SetAttributeValue("name", param);
newParam.Value = "" + value;
doc.Root.Add(newParam);
}
//else update element's value
else if (replace) existingParam.Value = "" + value;
//save modified object back to xml file
doc.Save(path);
}
If there are three handles to that file in your process, then you are opening that file multiple times, calling the function multiple times with the handle left open, or you have ghost instances of your process hanging around. You should double-check that you are not opening this file elsewhere in your code. When I run your code in a test, it runs fine (as long as I don't trigger the bug related to the Substring call described below).
Because the file at "path" is opened in read mode, it can still be opened by other readers without problems. However, as soon as other code attempts a write operation (including metadata changes such as File.Delete), you will see this error.
Your code is not exception safe, so an exception thrown by this function (or a similar one) while you are reading from the stream in that loop will leave the handle open, causing a later call that opens the file to fail with the exception you are now seeing at File.Delete. To avoid exception safety issues, use try/finally or using (I'll provide an example below).
One such exception is likely to occur where you call Substring within the loop, because a line can be shorter than the parameter name. There's also another issue: if you pass the parameter without a trailing space, it can match another parameter that has the first as a prefix (e.g. "hello" would match "hello_world").
Here's a slightly fixed version of the code that is exception safe and fixes the Substring issue.
internal void alter( string param, bool replace = false, string value = null )
{
//here
string buf;
string bufPath = path.Substring( 0, path.Length - Path.GetExtension( path ).Length ) + "_buf.txt";
using ( System.IO.StreamReader reader = new System.IO.StreamReader( path ) )
{
string paramWithSpace = param + " ";
using ( System.IO.StreamWriter writer = new StreamWriter( bufPath ) )
{
while ( ( buf = reader.ReadLine() ) != null )
{
if ( !buf.StartsWith( paramWithSpace ) )
writer.WriteLine( buf );
else if ( replace )
writer.WriteLine( paramWithSpace + value );
}
}
}
//there
File.Delete( path );
File.Move( bufPath, path );
}
However, you may wish to consider loading your configuration entirely into memory, altering it in memory multiple times and then writing back to it once in a batch. This will give you greater performance reading/writing configuration (and usually you have to write multiple configuration changes at once) and it will also simplify your code. For a simple implementation, use the generic Dictionary class.
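A minimal sketch of that in-memory approach for the "name value" format from the question: everything before the first space is the name, the rest of the line is the value. The class and member names are hypothetical. File.ReadAllLines and File.WriteAllLines open and close the file themselves, so no handle stays open between calls. Dictionary enumeration order is not guaranteed, so saving may reorder the lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical in-memory store for "name value" configuration files
public class SimpleConfig
{
    // Parameter name -> value, held in memory between Load and Save
    private readonly Dictionary<string, string> values =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public static SimpleConfig Load(string path)
    {
        var config = new SimpleConfig();
        foreach (string line in File.ReadAllLines(path))
        {
            if (line.Length == 0)
                continue;                          // ignore blank lines

            int space = line.IndexOf(' ');
            if (space < 0)
                config.values[line] = "";          // name without a value
            else
                config.values[line.Substring(0, space)] = line.Substring(space + 1);
        }
        return config;
    }

    // Returns null when the parameter is not present
    public string Get(string name)
    {
        string value;
        return values.TryGetValue(name, out value) ? value : null;
    }

    public void Set(string name, string value) { values[name] = value; }

    public bool Remove(string name) { return values.Remove(name); }

    // Writes all parameters back in a single pass
    public void Save(string path)
    {
        var lines = new List<string>();
        foreach (KeyValuePair<string, string> pair in values)
            lines.Add(pair.Key + " " + pair.Value);
        File.WriteAllLines(path, lines);
    }
}
```

Because lookups are by exact key, this also avoids the prefix problem described above ("hello" never matches "hello_world").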

Can I get StreamReader.EndOfStream to return false after the first time I read a file?

---short version:
When I get to the while (!checkReader.EndOfStream) every time after the first, it says EndOfStream = true.
---more detail:
A user will upload a file using an Ajax AsyncFileUpload control. I take that file, ensure it's a very specific format of csv that we use and spit it out into a GridView. This all works great the first time through: I get the file, parse it out, and it displays great.
But, if I call this same code again anytime during the user's session the StreamReader.EndOfStream = true.
For example, a user uploads a file and I spit it out into the GridView. Oops! User realizes there are headers... I have a checkbox available with an event handler that will call the method below to re-read the original file (it's stored in a session variable). User checks the box, event fires, method gets called, but my EndOfStream is now true.
I thought that the using block would reset that flag, and I have tried adding checkReader.DiscardBufferedData just after the while loop below, but neither seems to have any effect.
What am I doing wrong?
private void BuildDataFileGridView(bool hasHeaders)
{
//read import file from the session variable
Stream theStream = SessionImportFileUpload.PostedFile.InputStream;
theStream.Position = 0;
StringBuilder sb = new StringBuilder();
using (StreamReader checkReader = new StreamReader(theStream))
{
while (!checkReader.EndOfStream)
{
string line = checkReader.ReadLine();
while (line.EndsWith(","))
{
line = line.Substring(0, line.Length - 1);
}
sb.AppendLine(line);
}
}
using (TextReader reader = new StringReader(sb.ToString()))
{
//read the file in and shove it out for the client
using (CsvReader csv = new CsvReader(reader, hasHeaders, CsvReader.DefaultDelimiter))
{
sDataInputTable = new DataTable();
try
{
//Load the DataTable with csv values
sDataInputTable.Load(csv);
}
catch
{
DisplayPopupMessage("ERROR: A problem was encountered");
}
//Copy only the first 10 rows into a temp table for display.
DataTable displayDataTable = sDataInputTable.Rows.Cast<System.Data.DataRow>().Take(10).CopyToDataTable();
MyDataGridView.DataSource = displayDataTable;
MyDataGridView.DataBind();
}
}
}
Edit:
SessionImportFileUpload is the actual Ajax AsyncFileUpload control being stored as a session variable (this was already the case as a previous person wrote other stuff in that uses it).
You are storing the posted file stream in Session. This is not correct, because the stream is not the data, but rather the mechanism for reading the data. The file is uploaded only once, during a single POST request, and you won't be able to read from the same stream again later. Usually you cannot even rewind the stream to re-read it.
That's why I suggest reading the posted file stream only once and putting the whole content into Session; this way the content will be reusable, and you'll be able to reprocess it as many times as you need.
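A sketch of that idea: copy the posted stream into a byte array once, while the upload request is being handled, and hand out a fresh MemoryStream every time the data needs to be parsed again. The class name UploadedFile is hypothetical; an instance of it (rather than the upload control or its InputStream) is what would be stored in Session.

```csharp
using System;
using System.IO;

// Hypothetical holder for an uploaded file's contents
[Serializable]
public class UploadedFile
{
    private readonly byte[] content;

    // Reads the posted stream exactly once and keeps the bytes
    public UploadedFile(Stream postedStream)
    {
        using (var buffer = new MemoryStream())
        {
            postedStream.CopyTo(buffer);
            content = buffer.ToArray();
        }
    }

    // Each call returns a new read-only stream positioned at the start,
    // so disposing a StreamReader over it does not affect later reads
    public Stream OpenRead()
    {
        return new MemoryStream(content, false);
    }
}
```

In BuildDataFileGridView the StreamReader would then be created over upload.OpenRead() instead of PostedFile.InputStream, so every call starts from a full, unread copy of the file.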

C# File Caching

I've got a class with a method "GetNewsFeed", that when a page is requested:
Check to see if a file exists & it is less than 30 minutes old
If it does exist, read contents of the file, push contents onto page
If it does not exist, go to a URL and write the contents of that page to a .txt file, push contents onto page
I am not very well versed in C#, so I'm trying to cobble together a few sources. I believe I am close, but I'm unable to get the files to refresh every 30 minutes when needed (I'm not getting any compilation errors or anything). Any help would be appreciated.
public static string GetNewsFeed(string url, string fileName)
{
// Set the path to the cache file
String filePath = HttpContext.Current.Server.MapPath("/cachefeed/" + fileName + ".txt");
string fileContents = "";
// If the file exists & is less than 30 minutes old, read from the file.
if (File.Exists(filePath) && (File.GetLastWriteTime(filePath) > DateTime.Now.AddMinutes(-30)))
{
fileContents = File.ReadAllText(filePath);
}
else
{
try
{
// If the file is older than 30 minutes, go out and download a fresh copy
using (var client = new WebClient())
{
// Delete and write the file again
fileContents = client.DownloadString(url);
File.Delete(filePath);
File.WriteAllText(filePath, fileContents);
}
}
catch (Exception)
{
if (File.Exists(filePath))
{
fileContents = File.ReadAllText(filePath);
}
}
}
return fileContents;
}
Finally, I've got some code elsewhere that will read these text files and manipulate their contents onto the page. I don't have any issues with this.
Odds are, you're catching an exception in the else block and it's only returning the fileContents. Try putting a breakpoint in the exception block to see what is going on.
You'll need to change it to:
catch( Exception e )
in order to get this information.
Also, you don't need this:
File.Delete(filePath);
The WriteAllText method will overwrite the file that is already there. Try removing that line and check your directory permissions.
You may also want to change
(File.GetLastWriteTime(filePath) > DateTime.Now.AddMinutes(-30)))
to the equivalent, easier-to-read
(DateTime.Now - File.GetLastWriteTime(filePath)).TotalMinutes < 30
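A sketch that folds these suggestions together: the age check reads "younger than maxAge", File.WriteAllText overwrites without a separate Delete, and a failed refresh is logged instead of silently swallowed. To keep it self-contained, the WebClient call is replaced by a download delegate supplied by the caller; FileCache, GetCached and the use of Trace.TraceError for logging are assumptions, not the original code.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical file-backed cache with a maximum age
public static class FileCache
{
    public static string GetCached(string filePath, TimeSpan maxAge, Func<string> download)
    {
        bool exists = File.Exists(filePath);

        // Fresh enough: serve the cached copy
        if (exists && DateTime.Now - File.GetLastWriteTime(filePath) < maxAge)
            return File.ReadAllText(filePath);

        try
        {
            string fresh = download();
            File.WriteAllText(filePath, fresh);   // overwrites any existing file
            return fresh;
        }
        catch (Exception ex)
        {
            // Record why the refresh failed (e.g. an invalid URL)
            Trace.TraceError("Refreshing " + filePath + " failed: " + ex.Message);

            // Fall back to the stale copy if there is one; otherwise rethrow
            if (exists)
                return File.ReadAllText(filePath);
            throw;
        }
    }
}
```

In the question's method the delegate would wrap the existing call, e.g. () => client.DownloadString(url), with the WebClient created inside it.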
I added a throw to my catch and, believe it or not, one of the URLs I was passing into my method was invalid. So yes, the culprit in my code was the catch statement.
I fixed this and all is working properly.
Thanks for the tips everyone.
