How to compare two TXT files before sending the data to SQL - C#

I have to handle text (.dat) files coming from an embedded device. My problem is that the device always sends all captured data, but I want to take only the differences between two transmissions and do calculations on them. After the calculation I send the data to SQL using a bulk insert function. I want to extract the data that is different compared to the first file I got from the device. Let's say that the first time, the device sends data like this in some.dat (ASCII) file:
0000199991
0000199321
0000132913
0000232318
0000312898
On the second call to get data, the device returns everything again (previously and newly captured records), something like this:
0000199991
0000199321
0000132913
0000232318
0000312898
9992129990
8782999022
2323423456
But this time I only want to calculate and pass through the data added after the first insert.
I am trying to make a WinForms app using C# and Visual Studio 2008.

You can do this using LINQ:
// Requires: using System.IO; using System.Linq;
string[] addedLines = File.ReadAllLines(secondPath)
.Except(File.ReadAllLines(firstPath))
.ToArray();
Note that this reads both files completely into memory, so it will be slow for large files.
For large files, replace ReadAllLines with the following method (in .NET 4, you can use File.ReadLines instead):
static IEnumerable<string> EnumerateLines(string path) {
using(var reader = File.OpenText(path)) {
string line;
while(null != (line = reader.ReadLine()))
yield return line;
}
}
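A small sketch of the LINQ approach using the sample records from the question. One caveat worth knowing: `Enumerable.Except` has set semantics and returns only distinct elements, so a new record that repeats an earlier record, or appears twice in the new data, is silently dropped. The sketch is written as a C# 9+ top-level program; `NewRecords` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the lines of the new dump that do not occur anywhere in the old dump,
// in the order they appear in the new dump. Except() also collapses duplicates.
static string[] NewRecords(IEnumerable<string> oldLines, IEnumerable<string> newLines)
{
    return newLines.Except(oldLines).ToArray();
}

string[] first =
{
    "0000199991", "0000199321", "0000132913", "0000232318", "0000312898"
};
string[] second =
{
    "0000199991", "0000199321", "0000132913", "0000232318", "0000312898",
    "9992129990", "8782999022", "2323423456"
};

foreach (string line in NewRecords(first, second))
    Console.WriteLine(line);
```

With the question's data this prints `9992129990`, `8782999022` and `2323423456`. If duplicate records are meaningful in your data, a position-based approach is safer than `Except`.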

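Would this work for you? It assumes the second file always starts with exactly the contents of the first one:
string dataAddedAfterFirstInsert = secondString.Substring(firstString.Length);
// Substring(start) returns everything from start to the end; passing secondString.Length as a second (length) argument would throw.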
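The same prefix idea also works per line instead of per character. If the device always resends every earlier record first, unchanged and in the same order, you only need to remember how many lines you processed last time and skip that many. Unlike `Except`, this keeps duplicate records. This is a sketch under that assumption. `RecordsAfter` and the stored line count are hypothetical, and the sketch uses C# 9+ top-level syntax, although `Skip` itself is available in .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumes the new dump begins with exactly the lines of the previous dump,
// so everything after the first `previousLineCount` lines is new data.
static List<string> RecordsAfter(IEnumerable<string> lines, int previousLineCount)
{
    return lines.Skip(previousLineCount).ToList();
}

string[] secondDump =
{
    "0000199991", "0000199321", "0000132913", "0000232318", "0000312898",
    "9992129990", "8782999022", "2323423456"
};

// 5 = number of lines processed from the first dump (you would persist this,
// e.g. next to the data you bulk-inserted).
List<string> added = RecordsAfter(secondDump, 5);
Console.WriteLine(string.Join(", ", added));
```

With `File.ReadLines` (.NET 4+) as the input, the lines are streamed rather than loaded all at once.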

One option would be to remember the file size each time you receive the file. When you get the new file, you can immediately move the file pointer to the position that corresponds to the end of the previous file and read from that point on.
Here is a rough outline of the idea
long lastPosition = GetLastFilePositionFromDatabase();
using (FileStream fs = new FileStream(...))
{
// Seek to the last position, this is zero the first time
fs.Seek(lastPosition, SeekOrigin.Begin);
// Process your file from the current position
ProcessFile(fs);
// Once you reach the end of the file, save this position so
// for use with the next file
SaveLastFilePositionToDatabase(fs.Position);
}
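A runnable version of the outline above, using in-memory streams to stand in for the two dumps. `ReadFrom` is a hypothetical helper. The sketch assumes ASCII records separated by line breaks and uses C# 9+ top-level syntax. One detail to be aware of: `StreamReader` reads ahead in blocks, so the underlying stream's `Position` is not a reliable offset for a particular line. Because this sketch reads the whole remainder, it simply stores the stream length as the next starting offset.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Reads the lines starting at byte offset `lastPosition` and reports where the
// data ended, so the caller can persist that offset for the next dump.
static List<string> ReadFrom(Stream stream, long lastPosition, out long endPosition)
{
    var lines = new List<string>();
    stream.Seek(lastPosition, SeekOrigin.Begin);
    // leaveOpen: true so the caller still owns (and disposes) the stream.
    using (var reader = new StreamReader(stream, Encoding.ASCII, false, 4096, leaveOpen: true))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Skip empty lines, e.g. the fragment left over when the previous
            // dump did not end with a line break.
            if (line.Length > 0)
                lines.Add(line);
        }
    }
    // Everything was read, so the end of the stream is the next start offset.
    endPosition = stream.Length;
    return lines;
}

// Simulated dumps: the second repeats the first and appends one record.
byte[] firstDump = Encoding.ASCII.GetBytes("0000199991\r\n0000199321\r\n");
byte[] secondDump = Encoding.ASCII.GetBytes("0000199991\r\n0000199321\r\n9992129990\r\n");

long position = 0; // nothing processed yet; in practice this would be stored in the database
using (var ms = new MemoryStream(firstDump))
    ReadFrom(ms, position, out position);

List<string> fresh;
using (var ms = new MemoryStream(secondDump))
    fresh = ReadFrom(ms, position, out position);

Console.WriteLine(string.Join(", ", fresh));
```

This prints only `9992129990`. Like the outline, it relies on the device rewriting the earlier records byte for byte identically.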

Related

How to convert huge text file from Unix to Windows quickly in .NET core

In .NET Core, I have huge text files that need to be converted from Unix to Windows line endings.
Since I can't load a file completely into memory (the files are too big), I read the bytes one after another, and whenever I encounter an LF, I output CR+LF. This works, but it takes a long time for huge files. Is there a more efficient way to do this?
I thought about using a StreamReader, but the problem is that we don't know the source file encoding.
Any idea?
Thank you
Without knowing more about the specific files you're trying to process, I'd probably start off with something like the below and see if that gets me the results I want.
Depending on the specifics of your situation you may be able to do something more efficient, but if you're handling truly large datasets with unstructured text then it's usually a matter of throwing more powerful hardware at the problem if speed is still an issue.
You don't have to specify the Encoding to use the StreamReader class: by default it detects the encoding from a byte order mark, if there is one, and otherwise assumes UTF-8. Was there a specific problem you ran into with the reader?
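// Requires using System.IO; must run inside an async method (C# 8+ for "using var").
const string inputFilePath = "";  // path of the Unix (LF) input file
const string outputFilePath = ""; // path of the Windows (CRLF) output file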
using var sr = new StreamReader(inputFilePath);
using var sw = new StreamWriter(outputFilePath);
string line;
// Buffers each line into memory, but not the newline characters.
while ((line = await sr.ReadLineAsync()) != null)
{
// Write the contents of the string out to the "fixed" file (manually
// specifying the line ending you want).
await sw.WriteAsync(line + "\r\n");
}
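If the encoding is unknown, another option is to keep working on raw bytes, as the question does, but in large buffered chunks instead of one byte per call. This sketch assumes an ASCII-compatible encoding (ASCII, UTF-8, Windows-1252, ISO-8859-x). In UTF-8, every byte of a multi-byte character is 0x80 or higher, so it can never be mistaken for LF (0x0A) or CR (0x0D). The sketch is not correct for UTF-16 or UTF-32. It also leaves existing CR LF pairs alone instead of doubling the CR. `ConvertLfToCrLf` is a hypothetical name, and the sketch uses C# 9+ top-level syntax.

```csharp
using System;
using System.IO;
using System.Text;

// Copies input to output, turning each bare LF byte into CR LF.
// Works on raw bytes, so no encoding has to be chosen, but it is only valid for
// ASCII-compatible encodings, not UTF-16/UTF-32.
static void ConvertLfToCrLf(Stream input, Stream output, int bufferSize = 1 << 16)
{
    var inBuf = new byte[bufferSize];
    var outBuf = new byte[bufferSize * 2]; // worst case: every input byte is LF
    byte previous = 0;                     // last byte seen, kept across chunks
    int read;
    while ((read = input.Read(inBuf, 0, inBuf.Length)) > 0)
    {
        int o = 0;
        for (int i = 0; i < read; i++)
        {
            byte b = inBuf[i];
            // Insert CR only if this LF is not already part of a CR LF pair.
            if (b == (byte)'\n' && previous != (byte)'\r')
                outBuf[o++] = (byte)'\r';
            outBuf[o++] = b;
            previous = b;
        }
        output.Write(outBuf, 0, o);
    }
}

// With real files (paths are placeholders):
// using (var src = File.OpenRead(inputFilePath))
// using (var dst = File.Create(outputFilePath))
//     ConvertLfToCrLf(src, dst);

var source = new MemoryStream(Encoding.UTF8.GetBytes("a\nb\r\nc\n"));
var target = new MemoryStream();
ConvertLfToCrLf(source, target);
string converted = Encoding.UTF8.GetString(target.ToArray());
Console.WriteLine(converted == "a\r\nb\r\nc\r\n"); // True
```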

Input from any file in WPF Application is very Slow

I have a WPF application that takes an input file path from the user, then opens the text file in the backend and reads it one character at a time.
fs = File.OpenRead(fileName);
var sr = new StreamReader(fs);
int c;
while ((c = sr.Read()) != -1)
{
Console.Write((char)c); //to check character read from file
try
{
frequencyMap.Add((char)c, 1);
}
catch
{
frequencyMap[(char)c] += 1;
}
}
Here frequencyMap is the dictionary in which each character and its frequency are stored.
This is just one method. No matter what I do, reading from the file is always slow, even if I read the whole text at once. In the Output window I see:
[Screenshot: Output window; the selected area is part of the input from the file.]
Files up to 2 KB are fine, but files of around 20 KB really give me a hard time.
I read that using threads can solve this problem, but I don't know how.
My question is: how can I read data from files quickly? If threads are the solution, how do I implement them?
I am new to this, so kindly help me.
Thanks
Don't read it character by character. Instead, read it line by line, for example, and process each string in a loop. Also, exceptions are not the way to check whether a key exists in a Dictionary.
using (var sr = new StreamReader(fileName))
{
while (!sr.EndOfStream)
{
string s = sr.ReadLine();
Debug.WriteLine(s); //to check string read from file
foreach (char c in s)
{
if (frequencyMap.ContainsKey(c))
frequencyMap[c]++;
else
frequencyMap.Add(c, 1);
}
}
}
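A variant of the same idea that reads large blocks of characters with `TextReader.Read(char[], int, int)` and uses `Dictionary.TryGetValue` instead of exceptions. It differs slightly from the line-based loop above because it also counts the line-break characters. This is a sketch; `CountCharacters` is a hypothetical name, and it uses C# 9+ top-level syntax. No extra threads are needed for files of this size.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Counts character frequencies, reading the text in 8 KB blocks instead of
// one Read() call per character, without exceptions for control flow.
static Dictionary<char, int> CountCharacters(TextReader reader)
{
    var counts = new Dictionary<char, int>();
    var buffer = new char[8192];
    int read;
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            char c = buffer[i];
            counts.TryGetValue(c, out int n); // n is 0 if c has not been seen yet
            counts[c] = n + 1;
        }
    }
    return counts;
}

// With a file: using (var sr = new StreamReader(fileName)) { var map = CountCharacters(sr); }
var freq = CountCharacters(new StringReader("hello"));
Console.WriteLine(freq['l']); // 2
```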
First, I hope the Console.Write is purely test code. Writing to the console for every character will slow down your processing considerably.
Second, it appears from the screenshot you shared that your application is throwing a lot of exceptions. Throwing exceptions in a tight loop is expensive.
Third, I would recommend you profile your application (Visual Studio provides a profiler) to pinpoint exactly where your application is spending its time.

Best approach to replace specific lines in a text file in c#

For the following operation:
Open a text file
Search for a string and replace every occurrence with a new string
I'd like to achieve the above in C#. Here is my code:
using (StreamReader sr = new StreamReader(@"S:\Personal Folders\A\TESTA.txt"))
{
using (StreamWriter sw = new StreamWriter(@"S:\Personal Folders\A\TESTB.txt"))
{
string line;
while ((line = sr.ReadLine())!= null)
{
if (!line.Contains("before"))
{
sw.WriteLine(line);
}
else if (line.Contains("before"))
{
sw.WriteLine(line.Replace("before", "after"));
}
}
}
}
Basically, the code above generates a new file with the desired replacements. As you can see, however, I read each line of the original file and write it to a new file. This achieves my goal, but I'm worried it may cause I/O problems because it reads and writes for each line. I also cannot read all the lines into an array first and then write them, because the file is large: reading it into a string[], replacing, and writing the array back to the file would cause out-of-memory problems.
Is there any way I can locate the specific lines and replace only those, keeping all the rest? Or what is the best way to solve this problem? Thanks
I don't know what I/O issue you are worried about, but your code should work fine. You can write it more concisely as follows:
using (StreamReader sr = new StreamReader(@"S:\Personal Folders\A\TESTA.txt"))
{
using (StreamWriter sw = new StreamWriter(@"S:\Personal Folders\A\TESTB.txt"))
{
string line;
while ((line = sr.ReadLine()) != null)
{
sw.WriteLine(line.Replace("before", "after"));
}
}
}
This will run a bit faster because it searches for "before" only once per line. By default, StreamWriter buffers your writes and does not flush to disk each time you call WriteLine. The operating system also caches file data and writes it in the background, so don't worry too much about I/O.
In general, what you are doing is correct, possibly followed by some renames to replace the original file. If you do want to replace the original file, you should rename the original file to a temporary name, rename the new file to the original name, and then either leave or delete the original file. You must handle conflicts with your temporary name and errors in all renames.
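The rename step described above can be sketched with `File.Replace`, which replaces one file's contents with another file's and can keep a backup of the original. This is a sketch with several assumptions. Both files are in the same directory, so they are on the same volume. UTF-8 output is acceptable, since `StreamWriter` writes UTF-8 by default. The `.tmp`/`.bak` names and the `ReplaceInFile` helper are hypothetical. The sketch uses C# 9+ top-level syntax.

```csharp
using System;
using System.IO;

// Streams `path` line by line into a temporary file next to it, then swaps the
// temporary file in with File.Replace, keeping the original as a .bak backup.
static void ReplaceInFile(string path, string oldValue, string newValue)
{
    string tempPath = path + ".tmp";   // hypothetical naming scheme
    string backupPath = path + ".bak";

    using (var reader = new StreamReader(path))
    using (var writer = new StreamWriter(tempPath)) // overwrites a stale .tmp
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            writer.WriteLine(line.Replace(oldValue, newValue));
    }

    // Remove an old backup first so the swap does not depend on overwrite rules.
    if (File.Exists(backupPath))
        File.Delete(backupPath);

    // tempPath's contents become `path`; the previous `path` becomes backupPath.
    File.Replace(tempPath, path, backupPath);
}

string demoPath = Path.Combine(Path.GetTempPath(), "replace-demo.txt");
File.WriteAllLines(demoPath, new[] { "before one", "keep" });
ReplaceInFile(demoPath, "before", "after");
Console.WriteLine(string.Join(" | ", File.ReadAllLines(demoPath))); // after one | keep
```

Pass `null` as the third argument of `File.Replace` if you don't want a backup.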
Suppose you are replacing a six-character string with a five-character string: if you write back to the original file, what will you do with the extra character? Files are stored on disk as raw bytes; there is no "text" file on disk. And if you replace a string with a longer one, you potentially have to move the entire rest of the file to make room for the longer line.
You can imagine the file on disk as letters written on graph paper in the boxes. The end of each line is noted by a special character (or characters - in Windows, that is CRLF), the characters fill all the boxes horizontally. If you tried to replace words on the graph paper you would have to erase and re-write lots of letters. Writing on a new sheet will be easiest.
Well, your approach is basically fine, but I wouldn't first check whether the line contains the word "before". Replace already performs that search, so the extra check doesn't pay off:
using (StreamReader sr = new StreamReader(@"S:\Personal Folders\A\TESTA.txt"))
{
using (StreamWriter sw = new StreamWriter(@"S:\Personal Folders\A\TESTB.txt"))
{
String line;
while ((line = sr.ReadLine()) != null)
sw.WriteLine(line.Replace("before", "after"));
}
}
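Try the following, which replaces only the first line containing "before" and then copies the rest of the file unchanged: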
else if (line.Contains("before"))
{
sw.WriteLine(line.Replace("before", "after"));
sw.Write(sr.ReadToEnd());
break;
}
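`ReadToEnd` pulls the whole remainder of the file into memory as one string. That preserves its original line endings exactly, but it can be costly for a very large file. If replacing only the first matching line is what you want, here is a sketch that keeps streaming line by line instead. The trade-off is that the remaining lines are rewritten with the writer's `NewLine`. `ReplaceFirstMatchingLine` is a hypothetical name, and the sketch uses C# 9+ top-level syntax.

```csharp
using System;
using System.IO;

// Copies reader to writer, replacing oldValue only in the first line that
// contains it; all later lines are copied unchanged, one line at a time.
static void ReplaceFirstMatchingLine(TextReader reader, TextWriter writer, string oldValue, string newValue)
{
    bool replaced = false;
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (!replaced && line.Contains(oldValue))
        {
            line = line.Replace(oldValue, newValue);
            replaced = true;
        }
        writer.WriteLine(line);
    }
}

var output = new StringWriter { NewLine = "\n" };
ReplaceFirstMatchingLine(new StringReader("a\nbefore 1\nbefore 2\n"), output, "before", "after");
Console.Write(output.ToString()); // a, after 1, before 2 (one per line)
```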

Get all lines after the last print of line with 'keyword' in C#

I am working on a C# project.
I am trying to send a log file via email whenever the application crashes.
However, the log file is rather large.
So I thought I should include only a specific portion of the log file.
For that, I am trying to read all the lines after the last line containing a specified keyword (in my case "Application Started").
Since the application gets restarted many times (due to crashing), 'Application Started' is printed many times in the file. I only want the last line containing 'Application Started' and the lines after it, until the end of the file.
I need help figuring out how to do this.
So far I only have this basic code:
System.IO.StreamReader file = new System.IO.StreamReader("c:\\mylogfile.txt");
string line;
while((line = file.ReadLine()) != null)
{
if ( line.Contains("keyword") )
{
}
}
Read the file, line-by-line, until you find your keyword. Once you find your keyword, start pushing every line after that into a List<string>. If you find another line with your keyword, just Clear your list and start refilling it from that point.
Something like:
List<string> buffer = new List<string>();
using (var sin = new StreamReader("pathtomylogfile"))
{
string line;
bool read = false; // becomes true once the keyword has been seen
while ((line = sin.ReadLine())!=null)
{
if (line.Contains("keyword"))
{
buffer.Clear();
read = true;
}
if (read)
{
buffer.Add(line);
}
}
// now buffer has the last entry
// you could use string.Join to put it back together in a single string
var lastEntry = string.Join("\n",buffer);
}
If the number of lines in each entry is very large, it might be more efficient to scan the file first to find the last entry and then loop again to extract it. If the whole log file isn't that large, it might be more efficient to just ReadToEnd and then use LastIndexOf to find the start of the last entry.
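The two-pass idea can be sketched with `File.ReadLines` (.NET 4+), which streams the file instead of loading it. The first pass records the index of the last line containing the keyword, and the second pass skips straight to it. This is a sketch: `LastEntry` is a hypothetical name, and it assumes the log is not appended to between the two passes. It uses C# 9+ top-level syntax.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Pass 1 finds the index of the last line containing `keyword`;
// pass 2 returns that line and everything after it.
// Only one line at a time is held in memory during the scan.
static IEnumerable<string> LastEntry(string path, string keyword)
{
    int lastIndex = -1;
    int index = 0;
    foreach (string line in File.ReadLines(path))
    {
        if (line.Contains(keyword))
            lastIndex = index;
        index++;
    }
    if (lastIndex < 0)
        return Enumerable.Empty<string>(); // keyword never logged
    return File.ReadLines(path).Skip(lastIndex);
}

string logPath = Path.Combine(Path.GetTempPath(), "demo.log");
File.WriteAllLines(logPath, new[] { "Application Started", "ok", "Application Started", "boom" });
Console.WriteLine(string.Join(" / ", LastEntry(logPath, "Application Started")));
// Application Started / boom
```

For the email body you could use `string.Join(Environment.NewLine, LastEntry(path, "Application Started"))`.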
Read everything from the file and then select the portion you want.
string lines = System.IO.File.ReadAllText("c:\\mylogfile.txt");
int start_index = lines.LastIndexOf("Application Started");
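// LastIndexOf returns -1 if the keyword is missing, and Substring(-1) would throw.
string needed_portion = start_index >= 0 ? lines.Substring(start_index) : lines;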
SendEmail(needed_portion);
I advise you to use a proper logging library, like log4net or NLog.
You can configure it to write to multiple files: one containing complete logs, another containing only errors/exceptions. You can also set a maximum size for log files, and so on, or configure it to send you an email when an exception occurs.
Of course, this does not solve your current problem; the solutions above address that.
But I would try simpler methods first, like Notepad++, which can handle bigger files. The last time I formatted a 30 MB XML document with it, it took about 20 minutes, but it did it! With simple text files, performance should be much better. Or, if you open the file read-only (not for editing), you may get much better performance (in Windows).

StreamReader and TextReader

I am trying to read from a pbx file using StreamReader, edit the contents, and write them to a new file using TextReader in C#.
This is my first development task in C#.
I studied Java at university, and my new job uses C#.
Basically, I have to read through a list of records contained in a pbx file from a phone system. These records, however, consist of a line of good call records, followed by a line with a few dodgy characters, followed by another line of good records.
My task is to read through this file line by line, ignore the lines with dodgy characters, and output the good records into a new file on my C:\ drive, which I've called output.txt.
I can write the while loop to take out the dodgy characters, but I'm unsure of the C# code to read from the pbx file on my C: drive and then write the edited contents to a new file called output.txt, also on my C: drive.
I'm new to C# and have searched Google for hours on this. I just need a little guidance and I'm away...
You didn't mention the file encodings, so I'm sticking with the UTF-8 defaults here.
One option is the 'regular' method of a loop that reads, checks, and conditionally writes, like this:
var inputFilePath = @"C:\temp\input.txt";
var outputFilePath = @"C:\temp\output.txt";
using (var reader = File.OpenText(inputFilePath))
using (var writer = File.CreateText(outputFilePath))
{
string line;
while ((line = reader.ReadLine()) != null)
{
var isValidLine = IsLineValid(line);
if (isValidLine)
{
writer.WriteLine(line);
}
}
}
Since you tagged this VS2008, I'm guessing you're limited to .NET 3.5. On .NET 4.0 or later, however, you can read and write enumerables and take advantage of that (in .NET 3.5 you'd have to read all the lines into memory, filter them, and then write all the lines).
var inputFilePath = @"C:\temp\input.txt";
var outputFilePath = @"C:\temp\output.txt";
var inputLines = File.ReadLines(inputFilePath);
var linesToWrite = inputLines
.Where(line => IsLineValid(line));
File.WriteAllLines(outputFilePath, linesToWrite);
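The answer leaves the line check (`CheckLine` / `IsLineValid`) undefined, because what counts as a "dodgy" character depends on the PBX output. Here is one plausible sketch. It assumes a good record is non-empty and contains only printable ASCII (space through `~`) or tabs; adjust the rule to your data. The sample strings are made up, and the sketch uses C# 9+ top-level syntax.

```csharp
using System;
using System.Linq;

// Hypothetical validity rule: a line is good when it is non-empty and every
// character is printable ASCII (0x20-0x7E) or a tab.
static bool IsLineValid(string line)
{
    return line.Length > 0 && line.All(c => c == '\t' || (c >= ' ' && c <= '~'));
}

Console.WriteLine(IsLineValid("0142 5551234 00:03:12")); // True
Console.WriteLine(IsLineValid("\u0001\u00ff?"));          // False
```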
