Can't find string in input file - c#

I have a text file that I am trying to insert a line of data into. Using a list, I believe I can avoid having to pull all the data out, sort it, and then write it into a new text file.
I came up with the code below. I set my bools, but it is still not working. Stepping through the debugger, what seems to be happening is that it goes through the entire list (which is about 10,000 lines) and never finds anything to be true, so it never inserts my record.
What is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));

using (StreamReader inFile = new StreamReader("Students.txt", true))
{
    string newLastName = "'Constant";
    string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
    string line;
    string lastName;
    string[] values;
    bool insertionPointFound = false;
    for (int i = 0; i < lines.Count && !insertionPointFound; i++)
    {
        line = lines[i];
        if (line.StartsWith("(LIST (LIST "))
        {
            values = line.Split(" ".ToCharArray());
            lastName = values[2];
            if (newLastName.CompareTo(lastName) < 0)
            {
                lines.Insert(i, newRecord);
                insertionPointFound = true;
            }
        }
    }
    if (!insertionPointFound)
    {
        lines.Add(newRecord);
    }
}

You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely re-write the entire file. Files support appending, but they don't support insertions.
You can write to a file the same way that you read from it:
string[] lines;
// instantiate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable<string>, so you can pass a List<string> in there if you want.
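Applied to the code in the question, writing the modified list back would be something along the lines of:
File.WriteAllLines("Students.txt", lines);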
One more issue: it appears as though you're reading your file twice, once with ReadAllLines and again with your StreamReader.

There are at least four possible errors.
The opening of the StreamReader is not required; you have already read all the lines. (Well, not really an error, but...)
The check with StartsWith can be fooled if your lines start with blank space, and you will miss the insertion point. (Adding a Trim will remove any problem here.)
In the CompareTo line you check for < 0, but you should check for == 0. CompareTo returns 0 if the strings are equivalent. However...
To check whether two strings are equal, you should avoid using CompareTo, as explained in the MSDN documentation, and use string.Equals instead:
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
string line;
string lastName;
string[] values;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
    line = lines[i].Trim();
    if (line.StartsWith("(LIST (LIST "))
    {
        values = line.Split(" ".ToCharArray());
        lastName = values[2];
        if (newLastName.Equals(lastName))
        {
            lines.Insert(i, newRecord);
            insertionPointFound = true;
        }
    }
}
if (!insertionPointFound)
    lines.Add(newRecord);
I don't list the missing write back to the file as an error. I hope that you have just omitted that part of the code; otherwise, it is a very simple problem.
(However, I think that the way in which CompareTo is used is probably the main reason for your problem.)
EDIT: Looking at your comment below, it seems that the answer from Sam I Am is the right one for you. Of course you need to write back the modified array of lines. All the changes are made to an in-memory array of lines, and nothing is written back to a file if you don't have code that writes a file. However, you don't need a new file:
File.WriteAllLines("Students.txt", lines);

Related

How to remove rows from IEnumerable

I'm loading CSV files into an IEnumerable.
string[] fileNames = Directory.GetFiles(@"read\", "*.csv");
for (int i = 0; i < fileNames.Length; i++)
{
    string file = @"read\" + Path.GetFileName(fileNames[i]);
    var lines = from rawLine in File.ReadLines(file, Encoding.Default)
                where !string.IsNullOrEmpty(rawLine)
                select rawLine;
}
After that I work with the data, but now there are a couple of files that are pretty much empty and only have ";;;;;;" (the number of semicolons varies) written in them.
How can I drop those rows before working with them, without changing anything in the CSV files?
If the amount of ; characters per line is variable, this is what your "where" condition should look like:
where !string.IsNullOrEmpty(rawLine) && !string.IsNullOrEmpty(rawLine.Trim(';'))
rawLine.Trim(';') will return a copy of the string with all leading and trailing ; characters removed. If this new string is empty, it means this line can be ignored, since it only contained ; characters.
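In context, the query from the question would then look something like this (same loop, just with the extra condition):
var lines = from rawLine in File.ReadLines(file, Encoding.Default)
            where !string.IsNullOrEmpty(rawLine) && !string.IsNullOrEmpty(rawLine.Trim(';'))
            select rawLine;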
You can't remove anything from an IEnumerable (like you can from a List<T>), but you can add a filter:
lines = lines.Where(l => !l.Trim().All(c => c == ';'));
This won't delete anything, but you won't process these lines anymore.
You can't remove rows from an enumerable - https://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx.
Instead, try creating a new array with the filtered data, or filter it in the where clause that you presented, like:
string[] fileNames = Directory.GetFiles(@"read\", "*.csv");
for (int i = 0; i < fileNames.Length; i++)
{
    string file = @"read\" + Path.GetFileName(fileNames[i]);
    var lines = from rawLine in File.ReadLines(file, Encoding.Default)
                where !string.IsNullOrEmpty(rawLine) && rawLine != ";;;;;;"
                select rawLine;
}
There are multiple solutions.
Convert the enumerable to a List, then delete from the List. This is a bit expensive.
Create a function. (You can apply multiple filters if required.)
public IEnumerable<string> GetData(ref IEnumerable<string> data)
{
    return data.Where(c => !String.Equals(c, "<<data that you want to filter>>"));
}
Another option for reading the CSV file is to make use of the TextFieldParser class. It has CommentTokens and Delimiters, which may help here.
Specifying ; as a comment token may help you.
Tutorial
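For example, a rough sketch of that approach (the file path is a placeholder, and TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace, so the Microsoft.VisualBasic assembly has to be referenced):
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualBasic.FileIO;

var rows = new List<string[]>();
using (var parser = new TextFieldParser(@"read\data.csv", Encoding.Default))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");
    parser.CommentTokens = new[] { ";" }; // lines that start with ';' (e.g. ";;;;;;") are skipped as comments
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // already split on ';'
        rows.Add(fields);
    }
}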

Replace character at specific index in List<string>, but indexer is read only [duplicate]

This question already has answers here:
Is there an easy way to change a char in a string in C#?
(8 answers)
Closed 5 years ago.
This is kind of a basic question, but I learned programming in C++ and am just transitioning to C#, so my ignorance of the C# methods is getting in my way.
A client has given me a few fixed-length files, and they want the 484th character of every odd-numbered record, skipping the first one (3, 5, 7, etc.), changed from a space to a 0. In my mind, I should be able to do something like the below:
static void Main(string[] args)
{
    List<string> allLines = System.IO.File.ReadAllLines(@"C:\...").ToList();
    foreach (string line in allLines)
    {
        //odd numbered logic here
        line[483] = '0';
    }
    ...
    //write to new file
}
However, the property or indexer cannot be assigned to because it is read-only. All my reading says that I have not set a setter for the variable, and I have tried what was shown in this SO article, but I am doing something wrong every time. Should what is shown in that article work? Should I do something else?
You cannot modify C# strings directly, because they are immutable. You can convert the string to a char[], modify it, make a string again, and write it to the file:
File.WriteAllLines(
    @"c:\newfile.txt",
    File.ReadAllLines(@"C:\...").Select((s, index) => {
        if (index == 0 || index % 2 != 0) {
            return s; // record 1 and even-numbered records (1-based) do not change
        }
        var chars = s.ToCharArray();
        chars[483] = '0';
        return new string(chars);
    })
);
Since strings are immutable, you can't treat one like a char[] and overwrite a character at a specific index in place. However, you can "modify" it by assigning the variable a new string.
We can use the Substring() method to return any part of the original string. Combining this with some concatenation, we can take the first part of the string (up to the character you want to replace), add the new character, and then add the rest of the original string.
Also, since we can't directly modify the items in a collection being iterated over in a foreach loop, we can switch your loop to a for loop instead. Now we can access each line by index, and can modify them on the fly:
for (int i = 0; i < allLines.Count; i++)
{
    if (allLines[i].Length > 483)
    {
        allLines[i] = allLines[i].Substring(0, 483) + "0" + allLines[i].Substring(484);
    }
}
Depending on how many lines you're processing and how many in-line concatenations you end up doing, there is some chance that using a StringBuilder instead of concatenation will perform better. Here is an alternate way to do this using a StringBuilder. I'll leave the perf measuring to you...
var sb = new StringBuilder();
for (int i = 0; i < allLines.Count; i++)
{
    if (allLines[i].Length > 483)
    {
        sb.Clear();
        sb.Append(allLines[i].Substring(0, 483));
        sb.Append("0");
        sb.Append(allLines[i].Substring(484));
        allLines[i] = sb.ToString();
    }
}
The first item after the foreach (string line in this case) is a local variable that has no scope outside the loop - that’s why you can’t assign a value to it. Try using a regular for loop instead.
The purpose of foreach is to iterate over a container, and the iteration variable is read-only in nature. You should use a regular for loop; it will work.
static void Main(string[] args)
{
    List<string> allLines = System.IO.File.ReadAllLines(@"C:\...").ToList();
    for (int i = 0; i < allLines.Count; ++i)
    {
        if (allLines[i].Length > 483)
        {
            allLines[i] = allLines[i].Substring(0, 483) + "0" + allLines[i].Substring(484);
        }
    }
    ...
    //write to new file
}

read a text file and search for string in memory efficient way (and abort when found)

I'm searching for a string in a text file (also includes XML). This is what I thought first:
using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        if (s.Contains("mySpecialString"))
            return true;
    }
}
return false;
I want to read line by line to minimize the amount of RAM used. When the string has been found, it should abort the operation. The reason why I don't process it as XML is that it would have to be parsed, which would consume more memory than necessary.
Another easy implementation would be
bool found = File.ReadAllText(path).Contains("mySpecialString") ? true : false;
but that would read the complete file into memory, which isn't what I want. On the other hand, it could be faster.
Another one would be this
foreach (string line in File.ReadLines(path))
{
    if (line.Contains("mySpecialString"))
    {
        return true;
    }
}
return false;
But which one of them (or another one from you?) is more memory efficient?
You can use a query with File.ReadLines, so it only reads as many lines as it needs to, in order to satisfy your query. The Any() method will stop when it hits a line containing your string.
return File.ReadLines(fileName).Any(line => line.Contains("mySpecialString"));
I also prefer the accepted answer. Maybe I'm micro-optimizing things here, but you have asked for a memory-efficient approach. Also consider that the text you are searching for could contain new-line characters like '\r', '\n' or "\r\n", and a large file could theoretically consist of a single line, which negates the benefit of ReadLines.
So you could use this method:
public static bool FileContainsString(string path, string str, bool caseSensitive = true)
{
    if (String.IsNullOrEmpty(str))
        return false;

    using (var stream = new StreamReader(path))
        while (!stream.EndOfStream)
        {
            bool stringFound = true;
            for (int i = 0; i < str.Length; i++)
            {
                char strChar = caseSensitive ? str[i] : Char.ToUpperInvariant(str[i]);
                char fileChar = caseSensitive ? (char)stream.Read() : Char.ToUpperInvariant((char)stream.Read());
                if (strChar != fileChar)
                {
                    stringFound = false;
                    break; // break for-loop, start again with first character at next position
                }
            }
            if (stringFound)
                return true;
        }
    return false;
}
bool containsString = FileContainsString(path, "mySpecialString", false); // ignore case if desired
Note that this might be the most efficient approach, and wrapped in a method it is still readable. But it has one drawback: it's not feasible to implement a culture-sensitive comparison, because it looks at single characters and not at substrings.
So you have to keep in mind some edge cases where you can run into issues, like the famous Turkish-i example or surrogate pairs.
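If a culture-aware match is ever needed, a line-based fallback using CompareInfo is possible, but only if the file actually contains line breaks; this is just a sketch, not part of the method above:
using System.Globalization;
using System.IO;

static bool FileContainsStringCultureAware(string path, string str, CultureInfo culture)
{
    CompareInfo compare = culture.CompareInfo;
    foreach (string line in File.ReadLines(path))
    {
        // culture-sensitive, case-insensitive substring search within the line
        if (compare.IndexOf(line, str, CompareOptions.IgnoreCase) >= 0)
            return true;
    }
    return false;
}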
I think both of your solutions are the same. Read at the MSDN: https://msdn.microsoft.com/en-us/library/dd383503%28v=vs.110%29.aspx
There it says: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned"
The same article also suggests that ReadLines should be used in conjunction with LINQ to Objects.

Extremely Large Single-Line File Parse

I am downloading data from a site and the site gives the data to me in very large blocks. Within the very large block, there are "chunks" that I need to parse individually. These "chunks" begin with "(ClinicalData)" and end with "(/ClinicalData)". Therefore, an example string would look something like:
(ClinicalData)(ID="1")(/ClinicalData)(ClinicalData)(ID="2")(/ClinicalData)(ClinicalData)(ID="3")(/ClinicalData)(ClinicalData)(ID="4")(/ClinicalData)(ClinicalData)(ID="5")(/ClinicalData)
Under "ideal" circumstances, the block is meant to be one-single line of data, however sometimes there are erroneous newline characters. Since I want to parse the (ClinicalData) chunks within the block, I want to make my data parse-able line-by-line. Therefore, I take the text file, read it all into a StringBuilder, remove new-lines (just in case), and then insert my own newlines, that way I can read line-by-line.
StringBuilder dataToWrite = new StringBuilder(File.ReadAllText(filepath), Int32.MaxValue);
// Need to clear newline characters just in case they exist.
dataToWrite.Replace("\n", "");
// set my own newline characters so the data becomes parse-able by line
dataToWrite.Replace("<ClinicalData", "\n<ClinicalData");
// set the data back into a file, which is then used in a StreamReader to parse by lines.
File.WriteAllText(filepath, dataToWrite.ToString());
This has been working out great (albeit maybe not efficient, but at least it is friendly to me :)), until I encountered a chunk of data that comes to me as a 280 MB file.
Now I am getting a System.OutOfMemoryException with this block, and I just cannot figure out a way around it. I believe the issue is that StringBuilder cannot handle 280 MB of straight text? Well, I have tried string splits, Regex.Match splits, and various other ways to break it into guaranteed (ClinicalData) chunks, but I continue to get the memory exception. I have also had no luck attempting to read pre-defined chunks (e.g., using .ReadBytes).
Any suggestions on how to handle a 280 MB, potentially-but-not-necessarily single line of text would be great!
That's an extremely inefficient way to read a text file, let alone a large one. If you only need one pass, replacing or adding individual characters, you should use a StreamReader. If you only need one character of lookahead you only need to maintain a single intermediate state, something like:
enum ReadState
{
    Start,
    SawOpen
}

using (var sr = new StreamReader(@"path\to\clinic.txt"))
using (var sw = new StreamWriter(@"path\to\output.txt"))
{
    var rs = ReadState.Start;
    while (true)
    {
        var r = sr.Read();
        if (r < 0)
        {
            if (rs == ReadState.SawOpen)
                sw.Write('<');
            break;
        }
        char c = (char)r;
        if ((c == '\r') || (c == '\n'))
            continue;
        if (rs == ReadState.SawOpen)
        {
            if (c == 'C')
                sw.WriteLine();
            sw.Write('<');
            rs = ReadState.Start;
        }
        if (c == '<')
        {
            rs = ReadState.SawOpen;
            continue;
        }
        sw.Write(c);
    }
}
First off, I don't think you need to put all the text in a StringBuilder, since you aren't even concatenating parts to it. You could just try the following:
File.ReadAllText(filepath).Replace("\n", "").Replace("<ClinicalData", "\n<ClinicalData");
Why not try a StreamReader for this task? You can pick a "chunk" size that you want to read by and then split up those chunks into the (ClinicalData)data(/ClinicalData) parts. Here is some detailed code on how to do this:
char[] buffer = new char[1024];
string remainder = string.Empty;
List<ClientData> list = new List<ClientData>();
using (StreamReader reader = File.OpenText(@"source.txt"))
{
    int charsRead;
    while ((charsRead = reader.Read(buffer, 0, 1024)) > 0)
    {
        // only use the characters actually read; the final read may not fill the buffer
        remainder = Parse(remainder + new string(buffer, 0, charsRead), list);
    }
}
with the following method:
string Parse(string value, List<ClientData> list)
{
    // split on the closing tag of each chunk
    string[] parts = value.Split(new string[1] { "</ClinicalData>" }, StringSplitOptions.None);
    for (int i = 0; i < parts.Length - 1; i++)
        list.Add(new ClientData(parts[i]));
    return parts[parts.Length - 1];
}
and the ClientData class however you have it implemented:
class ClientData
{
    public ClientData(string value)
    {
        // fill in however you are already parsing out ID, and other info
    }
}
There are many ways to implement something like this, but hopefully this can help get you started.
StreamReader's ReadLine() method is only one of the many ways you can read the text from the file. You can read into a buffer with a specified length, and then parse out the ClinicalData tags. I can provide an example if you'd like.
http://msdn.microsoft.com/en-us/library/9kstw824%28v=vs.110%29.aspx
Alternately, if you are reading an XML file, XmlReader is another option.
http://msdn.microsoft.com/en-us/library/system.xml.xmlreader%28v=vs.110%29.aspx
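If the data really is XML apart from the missing single root element, a rough sketch of the XmlReader route could look like the following (ConformanceLevel.Fragment lets the reader accept multiple top-level elements; filepath is the variable from the question):
using System.Xml;

var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create(filepath, settings))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "ClinicalData")
        {
            // ReadOuterXml returns one <ClinicalData>...</ClinicalData> chunk
            // and leaves the reader positioned after it
            string chunk = reader.ReadOuterXml();
            // process the chunk here
        }
        else
        {
            reader.Read();
        }
    }
}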

Most memory efficient way to merge two files

I need to merge two files while also applying a sort. It is important that I keep the task light on memory usage. I need to create a console app in C# for this.
Input File 1:
Some Header
A12345334
A00123445
A44566555
B55677
B55683
B66489
record count: 6
Input File 2:
Some Header
A00123465
B99423445
record count: 2
So, I need to make sure that the output file has all the "A" records coming first and then the "B" records, followed by the total record count.
Output File:
Some header
A12345334
A00123445
A44566555
A00123465
B99423445
B55677
B55683
B66489
record count: 8
Record sorting within "A" and "B" is not relevant.
Since your source files appear sorted, you can do this with very low memory usage.
Just open both input files as well as a new file for writing. Then compare the next available line from each input file and write the line that comes first to your output file. Each time you write a line to the output file, get the next line from the input file it came from.
Continue until both input files are finished.
If memory is an issue the easiest way to do this is probably going to be to read the records from both files, store them in a SQLite or SQL Server Compact database, and execute a SELECT query that returns a sorted record set. Make sure you have an index on the field you want to sort on.
That way, you don't have to store the records in memory, and you don't need any sorting algorithms; the database will store the records on disk and do your sorting for you.
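As a rough illustration of that idea (only a sketch, using the Microsoft.Data.Sqlite package; the file names, header text, and trailer format are assumptions taken from the sample data, and the inserts are not batched in a transaction):
using System.IO;
using System.Linq;
using Microsoft.Data.Sqlite; // NuGet package: Microsoft.Data.Sqlite

using (var connection = new SqliteConnection("Data Source=merge.db"))
{
    connection.Open();

    var ddl = connection.CreateCommand();
    ddl.CommandText = "CREATE TABLE IF NOT EXISTS records (line TEXT)";
    ddl.ExecuteNonQuery();
    ddl.CommandText = "CREATE INDEX IF NOT EXISTS idx_records_line ON records (line)";
    ddl.ExecuteNonQuery();

    // insert the data rows from both files, skipping the header and the trailer line
    foreach (var file in new[] { "file1.txt", "file2.txt" })
    {
        foreach (var line in File.ReadLines(file).Skip(1).Where(l => !l.StartsWith("record count")))
        {
            var insert = connection.CreateCommand();
            insert.CommandText = "INSERT INTO records (line) VALUES ($line)";
            insert.Parameters.AddWithValue("$line", line);
            insert.ExecuteNonQuery();
        }
    }

    // stream the sorted rows back out; 'A...' sorts before 'B...'
    var select = connection.CreateCommand();
    select.CommandText = "SELECT line FROM records ORDER BY line";
    using (var reader = select.ExecuteReader())
    using (var writer = new StreamWriter("merged.txt"))
    {
        writer.WriteLine("Some Header");
        int count = 0;
        while (reader.Read())
        {
            writer.WriteLine(reader.GetString(0));
            count++;
        }
        writer.WriteLine("record count: " + count);
    }
}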
Quick idea, assuming the records are already sorted in the original files:
Start looping through file 2, collecting all A-records
Once you reach the first B-record, start collecting those in a separate collection.
Read all of File 1.
Write out the content of the A-records collection from file 2, then append the contents read from file 1, followed by the B-records from file 2.
Visualized:
<A-data from file 2>
<A-data, followed by B-data from file 1>
<B-data from file 2>
If you are concerned about memory, this is a perfect case for insertion sort, reading one line at a time from each file. If that is not an issue, read the whole thing into a list, just call Sort, then write it out.
If you can't even keep the whole sorted list in memory, then a database or memory-mapped file is your best bet.
Assuming your input files are already ordered:
Open Input files 1 and 2 and create the Output file.
Read the first record from file 1. If it starts with A, write it to the output file. Continue reading from input file 1 until you reach a record that starts with B.
Read the first record from file 2. If it starts with A, write it to the output file. Continue reading from input file 2 until you reach a record that starts with B.
Go back to file 1, and write the 'B' record to the output file. Continue reading from input file 1 until you reach the end of the stream.
Go back to file 2, and write the 'B' record to the output file. Continue reading from input file 2 until you reach the end of the stream.
This method will prevent you from ever having to hold more than 2 rows of data in memory at a time.
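A sketch of those steps (the header and "record count:" handling are my own additions based on the sample files, and the file names are placeholders):
using System.IO;

static void MergeAThenB(string inFile1, string inFile2, string outFile)
{
    using (var sr1 = new StreamReader(inFile1))
    using (var sr2 = new StreamReader(inFile2))
    using (var sw = new StreamWriter(outFile))
    {
        sw.WriteLine(sr1.ReadLine()); // keep the header from file 1
        sr2.ReadLine();               // skip the header from file 2

        int count = 0;
        string pendingB1 = null, pendingB2 = null;
        string line;

        // copy 'A' records from each file, remembering the first 'B' record seen
        while ((line = sr1.ReadLine()) != null && !line.StartsWith("record count"))
        {
            if (line.StartsWith("B")) { pendingB1 = line; break; }
            sw.WriteLine(line); count++;
        }
        while ((line = sr2.ReadLine()) != null && !line.StartsWith("record count"))
        {
            if (line.StartsWith("B")) { pendingB2 = line; break; }
            sw.WriteLine(line); count++;
        }

        // write the remembered 'B' records, then the rest of each file
        if (pendingB1 != null) { sw.WriteLine(pendingB1); count++; }
        while ((line = sr1.ReadLine()) != null && !line.StartsWith("record count"))
        {
            sw.WriteLine(line); count++;
        }
        if (pendingB2 != null) { sw.WriteLine(pendingB2); count++; }
        while ((line = sr2.ReadLine()) != null && !line.StartsWith("record count"))
        {
            sw.WriteLine(line); count++;
        }

        sw.WriteLine("record count: " + count);
    }
}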
I would recommend using StreamReader and StreamWriter for this application. You can open a file using StreamWriter, then copy all lines using a StreamReader for file #1, then for file #2. These operations are very fast, have integrated buffers, and are very lightweight.
If the input files are already sorted by A and B, you can switch between the source readers to make the output sorted.
Since you have two sorted sequences you just need to merge the two sequences into a single sequence, in much the same way the second half of the MergeSort algorithm works.
Unfortunately, given the interface that IEnumerable provides, it ends up a bit messy and copy-pasty, but it should perform quite well and use a very small memory footprint:
public class Wrapper<T>
{
    public T Value { get; set; }
}

public static IEnumerable<T> Merge<T>(IEnumerable<T> first, IEnumerable<T> second, IComparer<T> comparer = null)
{
    comparer = comparer ?? Comparer<T>.Default;
    using (var secondIterator = second.GetEnumerator())
    {
        Wrapper<T> secondItem = null; // when the wrapper is null there are no more items in the second sequence
        if (secondIterator.MoveNext())
            secondItem = new Wrapper<T>() { Value = secondIterator.Current };

        foreach (var firstItem in first)
        {
            // emit items from the second sequence for as long as they come before the current first item
            while (secondItem != null && comparer.Compare(firstItem, secondItem.Value) > 0)
            {
                yield return secondItem.Value;
                if (secondIterator.MoveNext())
                    secondItem.Value = secondIterator.Current;
                else
                    secondItem = null;
            }
            yield return firstItem;
        }

        // the first sequence is exhausted; drain whatever is left of the second
        if (secondItem != null)
        {
            yield return secondItem.Value;
            while (secondIterator.MoveNext())
                yield return secondIterator.Current;
        }
    }
}
Once you have a Merge function it's pretty trivial:
File.WriteAllLines("output.txt",
Merge(File.ReadLines("File1.txt"), File.ReadLines("File2.txt")))
The File ReadLines and WriteAllLines here each utilize IEnumerable and will stream the lines accordingly.
Here's the source code for a more generic/boilerplate solution for merge-sorting two files.
public static void Merge(string inFile1, string inFile2, string outFile)
{
    string line1 = null;
    string line2 = null;
    using (StreamReader sr1 = new StreamReader(inFile1))
    {
        using (StreamReader sr2 = new StreamReader(inFile2))
        {
            using (StreamWriter sw = new StreamWriter(outFile))
            {
                line1 = sr1.ReadLine();
                line2 = sr2.ReadLine();
                while (line1 != null && line2 != null)
                {
                    // your comparison function here
                    // ex: (line1[0] < line2[0])
                    if (line1[0] < line2[0])
                    {
                        sw.WriteLine(line1);
                        line1 = sr1.ReadLine();
                    }
                    else
                    {
                        sw.WriteLine(line2);
                        line2 = sr2.ReadLine();
                    }
                }
                while (line1 != null)
                {
                    sw.WriteLine(line1);
                    line1 = sr1.ReadLine();
                }
                while (line2 != null)
                {
                    sw.WriteLine(line2);
                    line2 = sr2.ReadLine();
                }
            }
        }
    }
}
public void merge_click(Object sender, EventArgs e)
{
    DataTable dt = new DataTable();
    dt.Clear();
    dt.Columns.Add("Name");
    dt.Columns.Add("designation");
    dt.Columns.Add("age");
    dt.Columns.Add("year");
    string[] lines = File.ReadAllLines(@"C:\Users\user1\Desktop\text1.txt", Encoding.UTF8);
    string[] lines1 = File.ReadAllLines(@"C:\Users\user2\Desktop\text1.txt", Encoding.UTF8);
    foreach (string line in lines)
    {
        string[] values = line.Split(',');
        DataRow dr = dt.NewRow();
        dr["Name"] = values[0].ToString();
        dr["designation"] = values[1].ToString();
        dr["age"] = values[2].ToString();
        dr["year"] = values[3].ToString();
        dt.Rows.Add(dr);
    }
    foreach (string line in lines1)
    {
        string[] values = line.Split(',');
        DataRow dr = dt.NewRow();
        dr["Name"] = values[0].ToString();
        dr["designation"] = values[1].ToString();
        dr["age"] = values[2].ToString();
        dr["year"] = values[3].ToString();
        dt.Rows.Add(dr);
    }
    grdstudents.DataSource = dt;
    grdstudents.DataBind();
}
