I have two files, file A and file B. I need to copy line 30 of file A and paste it over the top of line 30 in file B. Can I do this in C#?
Here's a very simple way, assuming file B is small enough to read into memory:
// line 30 is at index 29 (zero-based)
string lineFromA = File.ReadLines("fileA.txt").Skip(29).First();
string[] linesFromB = File.ReadAllLines("fileB.txt");
linesFromB[29] = lineFromA;
// write to "fileB.txt" instead if you want to overwrite file B in place
File.WriteAllLines("fileC.txt", linesFromB);
This assumes you're using .NET 4, with its lazy File.ReadLines method. If you're not, the simplest approach would be to read both files into memory completely, using File.ReadAllLines twice:
string[] linesFromA = File.ReadAllLines("fileA.txt");
string[] linesFromB = File.ReadAllLines("fileB.txt");
linesFromB[29] = linesFromA[29];
File.WriteAllLines("fileC.txt", linesFromB);
There are definitely more efficient approaches, but I'd go with the above unless I had any reason to need a more efficient one.
If you use a StreamWriter for the writing side, you get a routine that does not use much memory and can also be used for larger files.
string lineFromA = File.ReadLines("fileA.txt").Skip(29).First();
// CreateText overwrites any existing fileC.txt; AppendText would append to it instead
using (var fileC = File.CreateText("fileC.txt"))
{
    int i = 0;
    foreach (var lineFromB in File.ReadLines("fileB.txt"))
    {
        i++;
        fileC.WriteLine(i != 30 ? lineFromB : lineFromA);
    }
}
I compared two ways to search in big files, testing on a 500 MB file.
The first way took 9500 ms and the second way took 11500 ms.
How could that happen? Buffering should be faster than accessing the resource on each iteration, and LINQ should be more powerful than a foreach search.
Is it a problem with memory allocation?
1:
var __file = new System.IO.StreamReader(file);
var line = "";
while ((line = __file.ReadLine()) != null)
{
    var firstOccurrence = line.Contains(contains);
}
__file.Close();
2:
var lines = File.ReadAllLines(_file);
var firstOccurrence = lines.FirstOrDefault(l => l.Contains(contains));
In your first code snippet, you don't stop looping when you find a match. Try something like this:
while ((line = __file.ReadLine()) != null)
{
    var firstOccurrence = line.Contains(contains);
    if (firstOccurrence)
    {
        break;
    }
}
In your second code snippet, you read the entire file into memory, and then start looking through it line-by-line. This is different to your first code snippet, where you read the file off disk one line at a time.
The equivalent method is File.ReadLines -- this reads the file line-by-line:
var firstOccurrence = File.ReadLines(_file).FirstOrDefault(l => l.Contains(contains));
I have a text file into which I am trying to insert a line. Using my linked list I believe I can avoid having to take all the data out, sort it, and then write it into a new text file.
What I came up with is the code below. I set my bools, but it is still not working. Stepping through the debugger, what seems to be going on is that it goes through the entire list (which is about 10,000 lines) without finding anything to be true, so it never inserts my line.
Why, or what is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
using (StreamReader inFile = new StreamReader("Students.txt", true))
{
    string newLastName = "'Constant";
    string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
    string line;
    string lastName;
    string[] values;
    bool insertionPointFound = false;
    for (int i = 0; i < lines.Count && !insertionPointFound; i++)
    {
        line = lines[i];
        if (line.StartsWith("(LIST (LIST "))
        {
            values = line.Split(" ".ToCharArray());
            lastName = values[2];
            if (newLastName.CompareTo(lastName) < 0)
            {
                lines.Insert(i, newRecord);
                insertionPointFound = true;
            }
        }
    }
    if (!insertionPointFound)
    {
        lines.Add(newRecord);
    }
}
You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely re-write the entire file. Files support appending, but they don't support insertions.
You can write to a file the same way that you read from it:
string[] lines;
// instantiate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable<string>, so you can pass a List<string> in there if you want.
One more issue: it appears as though you're reading your file twice, once with ReadAllLines and again with your StreamReader.
There are at least four possible errors:
1) The opening of the StreamReader is not required; you have already read all the lines. (Well, not really an error, but...)
2) The StartsWith check can be fooled if your lines start with blank space, and you will miss the insertion point. (Adding a Trim removes any problem here.)
3) In the CompareTo line you check for < 0, but you should check for == 0. CompareTo returns 0 if the strings are equivalent; however.....
4) To check whether two strings are equal you should avoid CompareTo, as explained in the MSDN documentation, and use string.Equals instead.
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
string line;
string lastName;
string[] values;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
    line = lines[i].Trim();
    if (line.StartsWith("(LIST (LIST "))
    {
        values = line.Split(" ".ToCharArray());
        lastName = values[2];
        if (newLastName.Equals(lastName))
        {
            lines.Insert(i, newRecord);
            insertionPointFound = true;
        }
    }
}
if (!insertionPointFound)
    lines.Add(newRecord);
I don't list the missing write back to the file as an error; I hope you have just omitted that part of the code. Otherwise it is a very simple problem.
(However, I think the way CompareTo is used is probably the main reason for your problem.)
EDIT: Looking at your comment below, it seems that the answer from Sam I Am is the right one for you. Of course, you need to write back the modified list of lines. All the changes are made to an in-memory list, and nothing is written back to a file if you don't have code that writes a file. However, you don't need a new file:
File.WriteAllLines("Students.txt", lines);
I found code on Stack Overflow to delete the first line and the last line of a text file, but I'm not getting how to combine the two so that both lines are deleted from a single file.
What I tried was reading the file with a StreamReader, skipping the first and last lines, and then writing the rest to a new file with a StreamWriter, but I couldn't get the structure right.
To delete the first line:
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
To delete the last line:
var lines = System.IO.File.ReadAllLines("...");
System.IO.File.WriteAllLines("...", lines.Take(lines.Length - 1).ToArray());
You can chain the Skip and Take methods. Remember to subtract the appropriate number of lines in the Take method: the more you skip at the beginning, the fewer lines remain.
var filename = "test.txt";
var lines = System.IO.File.ReadAllLines(filename);
System.IO.File.WriteAllLines(
    filename,
    lines.Skip(1).Take(lines.Length - 2)
);
Whilst probably not a major issue in this case, the existing answers all rely on reading the entire contents of the file into memory first. For small files, that's probably fine, but if you're working with very large files, this could prove prohibitive.
It is reasonably trivial to create a SkipLast equivalent of the existing Skip Linq method:
public static class SkipLastExtension
{
    public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int count)
    {
        var queue = new Queue<T>();
        foreach (var item in source)
        {
            queue.Enqueue(item);
            if (queue.Count > count)
            {
                yield return queue.Dequeue();
            }
        }
    }
}
If we also define a method that allows us to enumerate over each line of a file without pre-buffering the whole file (per: https://stackoverflow.com/a/1271236/381588):
static IEnumerable<string> ReadFrom(string filename)
{
    using (var reader = File.OpenText(filename))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
Then we can use the following one-liner to write a new file that contains all the lines from the original file except the first and last:
File.WriteAllLines("output.txt", ReadFrom("input.txt").Skip(1).SkipLast(1));
This is undoubtedly (considerably) more code than the other answers already posted here, but it should work on files of essentially any size (as well as providing a potentially useful SkipLast extension method).
Here's a different approach that uses ArraySegment<string> instead:
var lines = File.ReadAllLines("test.txt");
File.WriteAllLines("test.txt", new ArraySegment<string>(lines, 1, lines.Length-2));
I need to merge two files while also applying a sort, and it is important that I keep the task light on memory usage. I need to create a console app in C# for this.
Input File 1:
Some Header
A12345334
A00123445
A44566555
B55677
B55683
B66489
record count: 6
Input File 2:
Some Header
A00123465
B99423445
record count: 2
So, I need to make sure that the third file has all the "A" records coming first, then the "B" records, followed by the total record count.
Output File:
Some header
A12345334
A00123445
A44566555
A00123465
B99423445
B55677
B55683
B66489
record count: 8
Record sorting within "A" and "B" is not relevant.
Since your source files appear sorted, you can do this with very low memory usage.
Just open both input files as well as a new file for writing. Then compare the next available line from each input file and write the line that comes first to your output file. Each time you write a line to the output file, get the next line from the input file it came from.
Continue until both input files are finished.
If memory is an issue the easiest way to do this is probably going to be to read the records from both files, store them in a SQLite or SQL Server Compact database, and execute a SELECT query that returns a sorted record set. Make sure you have an index on the field you want to sort on.
That way, you don't have to store the records in memory, and you don't need any sorting algorithms; the database will store the records on disk and do your sorting for you.
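For illustration, a rough sketch of that approach using the Microsoft.Data.Sqlite package (an assumption on my part; System.Data.SQLite or SQL Server Compact would look much the same, and the file names are placeholders):
using Microsoft.Data.Sqlite;
using System.IO;

using (var connection = new SqliteConnection("Data Source=records.db"))
{
    connection.Open();
    var create = connection.CreateCommand();
    create.CommandText = "CREATE TABLE IF NOT EXISTS records (line TEXT); " +
                         "CREATE INDEX IF NOT EXISTS ix_line ON records (line);";
    create.ExecuteNonQuery();

    // load both files into the table (wrap the inserts in a transaction for speed)
    var insert = connection.CreateCommand();
    insert.CommandText = "INSERT INTO records (line) VALUES ($line)";
    var p = insert.Parameters.Add("$line", SqliteType.Text);
    foreach (var file in new[] { "file1.txt", "file2.txt" })
        foreach (var line in File.ReadLines(file))   // header/trailer filtering omitted here
        {
            p.Value = line;
            insert.ExecuteNonQuery();
        }

    // let the database do the sorting; rows stream back one at a time
    var select = connection.CreateCommand();
    select.CommandText = "SELECT line FROM records ORDER BY line";
    using (var reader = select.ExecuteReader())
    using (var output = new StreamWriter("output.txt"))
        while (reader.Read())
            output.WriteLine(reader.GetString(0));
}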
Quick idea, assuming the records are already sorted in the original files:
Start looping through file 2, collecting all A-records
Once you reach the first B-record, start collecting those in a separate collection.
Read all of File 1.
Write out the content of the A-records collection from file 2, then append the contents read from file 1, followed by the B-records from file 2.
Visualized:
<A-data from file 2>
<A-data, followed by B-data from file 1>
<B-data from file 2>
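A minimal sketch of this idea, assuming the layout shown in the question (one header line, a "record count: N" trailer, and A-records grouped before B-records); the file names are placeholders:
// requires System.IO, System.Linq, System.Collections.Generic
var file2A = new List<string>();
var file2B = new List<string>();
foreach (var line in File.ReadLines("file2.txt").Skip(1))      // skip file 2's header
{
    if (line.StartsWith("record count")) break;                 // stop at the trailer
    (line.StartsWith("B") ? file2B : file2A).Add(line);
}

using (var output = new StreamWriter("output.txt"))
{
    int file1Count = 0;
    bool headerWritten = false;
    foreach (var line in File.ReadLines("file1.txt"))           // stream file 1; never fully in memory
    {
        if (!headerWritten)
        {
            output.WriteLine(line);                             // header
            foreach (var a in file2A) output.WriteLine(a);      // <A-data from file 2>
            headerWritten = true;
            continue;
        }
        if (line.StartsWith("record count")) break;
        output.WriteLine(line);                                 // <A-data, then B-data from file 1>
        file1Count++;
    }
    foreach (var b in file2B) output.WriteLine(b);              // <B-data from file 2>
    output.WriteLine($"record count: {file1Count + file2A.Count + file2B.Count}");
}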
If you are concerned about memory, this is a perfect case for an insertion sort, reading one line at a time from each file. If that is not an issue, read the whole thing into a list, just call Sort, then write it out.
If you can't even keep the whole sorted list in memory, then a database or a memory-mapped file is your best bet.
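A sketch of the "memory is not an issue" path (filtering out the header and record-count lines is omitted here, and the file names are placeholders):
// requires System.IO, System.Linq
var lines = File.ReadLines("file1.txt").Concat(File.ReadLines("file2.txt")).ToList();
lines.Sort(StringComparer.Ordinal);   // "A" records sort before "B" records ordinally
File.WriteAllLines("output.txt", lines);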
Assuming your input files are already ordered:
Open Input files 1 and 2 and create the Output file.
Read the first record from file 1. If it starts with A, write it to the output file. Continue reading from input file 1 until you reach a record that starts with B.
Read the first record from file 2. If it starts with A, write it to the output file. Continue reading from input file 2 until you reach a record that starts with B.
Go back to file 1, and write the 'B' record to the output file. Continue reading from input file 1 until you reach the end of the stream.
Go back to file 2, and write the 'B' record to the output file. Continue reading from input file 2 until you reach the end of the stream.
This method will prevent you from ever having to hold more than 2 rows of data in memory at a time.
I would recommend using StreamReader and StreamWriter for this application. Open a file using StreamWriter, then copy all lines using a StreamReader for file #1, then for file #2. These operations are very fast, have integrated buffers, and are very lightweight.
If the input files are already sorted by A and B, you can switch between the source readers to make the output sorted.
Since you have two sorted sequences you just need to merge the two sequences into a single sequence, in much the same way the second half of the MergeSort algorithm works.
Unfortunately, given the interface that IEnumerable provides, it ends up a bit messy and copy-pasty, but it should perform quite well and use a very small memory footprint:
public class Wrapper<T>
{
    public T Value { get; set; }
}
public static IEnumerable<T> Merge<T>(IEnumerable<T> first, IEnumerable<T> second, IComparer<T> comparer = null)
{
    comparer = comparer ?? Comparer<T>.Default;
    using (var secondIterator = second.GetEnumerator())
    {
        // when the wrapper is null there are no more items in the second sequence
        Wrapper<T> secondItem = null;
        if (secondIterator.MoveNext())
            secondItem = new Wrapper<T>() { Value = secondIterator.Current };
        foreach (var firstItem in first)
        {
            // emit items from the second sequence for as long as they come first
            while (secondItem != null && comparer.Compare(firstItem, secondItem.Value) > 0)
            {
                yield return secondItem.Value;
                if (secondIterator.MoveNext())
                    secondItem.Value = secondIterator.Current;
                else
                    secondItem = null;
            }
            yield return firstItem;
        }
        // the first sequence is exhausted; drain whatever remains of the second
        if (secondItem != null)
            yield return secondItem.Value;
        while (secondIterator.MoveNext())
            yield return secondIterator.Current;
    }
}
Once you have a Merge function it's pretty trivial:
File.WriteAllLines("output.txt",
Merge(File.ReadLines("File1.txt"), File.ReadLines("File2.txt")))
File.ReadLines and File.WriteAllLines here each work with IEnumerable<string> and will stream the lines accordingly.
Here's the source code for a more generic/boilerplate solution for merge-sorting two files.
public static void Merge(string inFile1, string inFile2, string outFile)
{
    string line1 = null;
    string line2 = null;
    using (StreamReader sr1 = new StreamReader(inFile1))
    using (StreamReader sr2 = new StreamReader(inFile2))
    using (StreamWriter sw = new StreamWriter(outFile))
    {
        line1 = sr1.ReadLine();
        line2 = sr2.ReadLine();
        while (line1 != null && line2 != null)
        {
            // substitute your own comparison function here,
            // e.g. line1[0] < line2[0] to compare only the leading character
            if (string.Compare(line1, line2, StringComparison.Ordinal) < 0)
            {
                sw.WriteLine(line1);
                line1 = sr1.ReadLine();
            }
            else
            {
                sw.WriteLine(line2);
                line2 = sr2.ReadLine();
            }
        }
        // drain whichever file still has lines remaining
        while (line1 != null)
        {
            sw.WriteLine(line1);
            line1 = sr1.ReadLine();
        }
        while (line2 != null)
        {
            sw.WriteLine(line2);
            line2 = sr2.ReadLine();
        }
    }
}
public void merge_click(Object sender, EventArgs e)
{
    DataTable dt = new DataTable();
    dt.Clear();
    dt.Columns.Add("Name");
    dt.Columns.Add("designation");
    dt.Columns.Add("age");
    dt.Columns.Add("year");
    string[] lines = File.ReadAllLines(@"C:\Users\user1\Desktop\text1.txt", Encoding.UTF8);
    string[] lines1 = File.ReadAllLines(@"C:\Users\user2\Desktop\text1.txt", Encoding.UTF8);
    foreach (string line in lines)
    {
        string[] values = line.Split(',');
        DataRow dr = dt.NewRow();
        dr["Name"] = values[0];
        dr["designation"] = values[1];
        dr["age"] = values[2];
        dr["year"] = values[3];
        dt.Rows.Add(dr);
    }
    foreach (string line in lines1)
    {
        string[] values = line.Split(',');
        DataRow dr = dt.NewRow();
        dr["Name"] = values[0];
        dr["designation"] = values[1];
        dr["age"] = values[2];
        dr["year"] = values[3];
        dt.Rows.Add(dr);
    }
    grdstudents.DataSource = dt;
    grdstudents.DataBind();
}
I'm developing a log parser and I'm reading files of more than 150 MB. This is my approach; is there any way to optimize what is in the while statement? The problem is that it consumes a lot of memory. I also tried a StringBuilder and faced the same memory consumption.
private void ReadLogInThread()
{
    string lineOfLog = string.Empty;
    try
    {
        StreamReader logFile = new StreamReader(myLog.logFileLocation);
        InformationUnit infoUnit = new InformationUnit();
        infoUnit.LogCompleteSize = myLog.logFileSize;
        while ((lineOfLog = logFile.ReadLine()) != null)
        {
            myLog.transformedLog.Add(lineOfLog); // List<string>
            myLog.logNumberLines++;
            infoUnit.CurrentNumberOfLine = myLog.logNumberLines;
            infoUnit.CurrentLine = lineOfLog;
            infoUnit.CurrentSizeRead += lineOfLog.Length;
            if (onLineRead != null)
                onLineRead(infoUnit);
        }
    }
    catch { throw; }
}
Thanks in advance!
EXTRA: I'm saving each line because after reading the log I will need to check some information on every stored line. The language is C#.
Memory economy can be achieved if your log lines can actually be parsed into a data-row representation.
Here is a typical log line I can think of:
Event at: 2019/01/05:0:24:32.435, Reason: Operation, Kind: DataStoreOperation, Operation Status: Success
This line takes about 200 bytes in memory as a string.
At the same time, the following representation takes under 16 bytes:
enum LogReason { Operation, Error, Warning }
enum EventKind : short { DataStoreOperation, DataReadOperation }
enum OperationStatus : short { Success, Failed }

struct LogRow
{
    public DateTime EventTime;       // 8 bytes
    public LogReason Reason;         // 4 bytes
    public EventKind Kind;           // 2 bytes
    public OperationStatus Status;   // 2 bytes
}
Another optimization possibility is parsing each line into an array of string tokens; this way you can make use of string interning.
For example, if the word "DataStoreOperation" takes 36 bytes and has 1,000,000 occurrences in the file, the saving is (18*2 - 4) * 1,000,000 = 32,000,000 bytes.
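A minimal sketch of the interning idea (the token splitting is simplified; real log lines would need a proper parser):
// requires System
var tokens = lineOfLog.Split(' ');
for (int i = 0; i < tokens.Length; i++)
{
    // string.Intern returns the single shared instance of each distinct token,
    // so a million repeats of "DataStoreOperation" are stored only once
    tokens[i] = string.Intern(tokens[i]);
}
Note that interned strings stay alive for the lifetime of the process, so this only pays off for tokens drawn from a small, repetitive vocabulary.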
Try to make your algorithm sequential.
Using an IEnumerable instead of a List helps play nicely with memory, while keeping the same semantics as working with a list, provided you don't need random access to lines by index.
IEnumerable<string> ReadLines()
{
    using (var logFile = new StreamReader(myLog.logFileLocation))
    {
        string lineOfLog;
        while ((lineOfLog = logFile.ReadLine()) != null)
        {
            yield return lineOfLog;
        }
    }
}

//...

foreach (var line in ReadLines())
{
    ProcessLine(line);
}
I am not sure whether it will fit your project, but you can store the result in a StringBuilder instead of a list of strings.
For example, this process takes 250 MB of memory on my machine after loading (the file is 50 MB):
static void Main(string[] args)
{
    using (StreamReader streamReader = File.OpenText("file.txt"))
    {
        var list = new List<string>();
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            list.Add(line);
        }
    }
}
On the other hand, this process takes only 100 MB:
static void Main(string[] args)
{
    var stringBuilder = new StringBuilder();
    using (StreamReader streamReader = File.OpenText("file.txt"))
    {
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            stringBuilder.AppendLine(line);
        }
    }
}
Memory usage keeps going up because you're simply adding the lines to a List<string>, which keeps growing. If you want to use less memory, one thing you can do is write the data to disk rather than keeping it in scope; of course, this will greatly degrade speed.
Another option is to compress the string data as you store it in your list and decompress it coming out, but I don't think this is a good method.
Side Note:
You need to add a using block around your StreamReader:
using (StreamReader logFile = new StreamReader(myLog.logFileLocation))
Consider this implementation (I'm speaking C/C++; substitute C# as needed):
1) Use fseek/ftell to find the size of the file.
2) Use malloc to allocate a chunk of memory the size of the file + 1; set that last byte to '\0' to terminate the string.
3) Use fread to read the entire file into the memory buffer. You now have a char * which holds the contents of the file as a string.
4) Create a vector of const char * to hold pointers to the positions in memory where each line can be found. Initialize the first element of the vector to the first byte of the memory buffer.
5) Find the carriage-control characters (probably \r\n). Replace the \r with \0 to make the line a string, increment past the \n, and push this new pointer location onto the vector.
6) Repeat the above until all of the lines in the file have been NUL-terminated and are pointed to by elements in the vector.
7) Iterate through the vector as needed to investigate the contents of each line, in your business-specific way.
8) When you are done, close the file, free the memory, and continue happily along your way.
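A rough C# analogue of this buffer-plus-offsets idea (my sketch, not the answerer's code; it assumes the log is ASCII/UTF-8 with \n line endings):
// requires System, System.Collections.Generic, System.IO, System.Text
byte[] buffer = File.ReadAllBytes(myLog.logFileLocation); // one contiguous allocation
var lineStarts = new List<int> { 0 };                     // offset of the first byte of each line
for (int i = 0; i < buffer.Length - 1; i++)
{
    if (buffer[i] == (byte)'\n')
        lineStarts.Add(i + 1);
}

// materialize a single line as a string only when it is actually needed
string GetLine(int n)
{
    int start = lineStarts[n];
    int end = n + 1 < lineStarts.Count ? lineStarts[n + 1] : buffer.Length;
    return Encoding.UTF8.GetString(buffer, start, end - start).TrimEnd('\r', '\n');
}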
1) Compress the strings before you store them (see System.IO.Compression and GZipStream; a sketch follows below). This would probably kill the performance of your program, though, since you'd have to decompress to read each line.
2) Remove any extra whitespace characters or common words you can do without. i.e. if you can understand what the log is saying without the words "the, a, of...", remove them. Also, shorten any common words (i.e. change "error" to "err" and "warning" to "wrn"). This would slow down this step of the process but shouldn't affect the performance of the rest.
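For the compression idea in (1), a minimal sketch using GZipStream (note that gzip has per-stream overhead, so compressing each short line individually can actually make it bigger; it pays off for blocks of many lines):
// requires System.IO, System.IO.Compression, System.Text
static byte[] Compress(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    using (var ms = new MemoryStream())
    {
        using (var gz = new GZipStream(ms, CompressionMode.Compress))
            gz.Write(raw, 0, raw.Length);  // the gzip stream must be closed to flush its output
        return ms.ToArray();
    }
}

static string Decompress(byte[] data)
{
    using (var ms = new MemoryStream(data))
    using (var gz = new GZipStream(ms, CompressionMode.Decompress))
    using (var reader = new StreamReader(gz, Encoding.UTF8))
        return reader.ReadToEnd();
}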
What encoding is your original file? If it is ASCII, then the strings alone are going to take over 2x the size of the file just to load into your array: a C# character is 2 bytes, and a C# string adds roughly an extra 20 bytes per string on top of the characters.
In your case, since it is a log file, you can probably exploit the fact that there is a lot of repetition in the messages. You can most likely parse the incoming line into a data structure that reduces the memory overhead. For example, if you have a timestamp in the log file you can convert it to a DateTime value, which is 8 bytes; even a short timestamp of 1/1/10 would add 12 bytes to the size of a string, and a timestamp with time information would be even longer. Other tokens in the log stream might be turned into a code or an enum in a similar manner.
Even if you have to leave the value as a string, if you can break it down into pieces that are used a lot, or remove boilerplate that is not needed at all, you can probably cut down on your memory usage. If there are a lot of common strings you can intern them and only pay for one string no matter how many occurrences you have.
If you must store the raw data, and assuming your logs are mostly ASCII, then you can save some memory by storing UTF8 bytes internally. Strings are UTF16 internally, so you're storing an extra byte for each character. By switching to UTF8 you cut memory use roughly in half (not counting class overhead, which is still significant). Then you can convert back to normal strings as needed.
static void Main(string[] args)
{
    List<byte[]> strings = new List<byte[]>();
    using (TextReader tr = new StreamReader(@"C:\test.log"))
    {
        string s = tr.ReadLine();
        while (s != null)
        {
            strings.Add(Encoding.Convert(Encoding.Unicode, Encoding.UTF8, Encoding.Unicode.GetBytes(s)));
            s = tr.ReadLine();
        }
    }
    // Get strings back
    foreach (var str in strings)
    {
        Console.WriteLine(Encoding.UTF8.GetString(str));
    }
}