I have a piece of code that is adding lines of text to a System.IO.StringWriter.
When it gets above a certain size, I want to purge lines from the beginning.
How do I do that? Can it be done?
System.IO.StringWriter log = new System.IO.StringWriter();
log.WriteLine("some text");
log.WriteLine("more text");
// some how remove the first line ????
One possible solution is to use the Queue<T> class. You add your text to the queue, and when it reaches a certain size, you start dequeuing the oldest lines.
For example:
void Main()
{
    int maxQueueSize = 50;
    var lines = File.ReadAllLines(filePath); // filePath is assumed to be defined elsewhere
    Queue<string> q = new Queue<string>(lines);
    // Here you should check for files bigger than your limit
    // ...
    // Trying to add too many elements
    for (int x = 0; x < maxQueueSize * 2; x++)
    {
        // Remove the oldest elements while the queue is full
        // (a while loop also copes with a file that started out over the limit)
        while (q.Count >= maxQueueSize)
            q.Dequeue();
        // As an example, add x converted to a string
        q.Enqueue(x.ToString());
    }
    // Back to disk (WriteAllLines accepts any IEnumerable<string>)
    File.WriteAllLines(filePath, q);
}
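The code above works on a file, but the original question kept the log in memory. A minimal in-memory variant of the same queue idea might look like the sketch below. The BoundedLog class, its WriteLine method and the maxLines limit are hypothetical names, not part of the answer or of System.IO.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory log that keeps only the most recent lines.
public sealed class BoundedLog
{
    private readonly Queue<string> _lines = new Queue<string>();
    private readonly int _maxLines;

    public BoundedLog(int maxLines)
    {
        if (maxLines <= 0) throw new ArgumentOutOfRangeException(nameof(maxLines));
        _maxLines = maxLines;
    }

    public int Count => _lines.Count;

    public void WriteLine(string line)
    {
        // Drop the oldest entries until there is room for the new one.
        while (_lines.Count >= _maxLines)
            _lines.Dequeue();
        _lines.Enqueue(line);
    }

    // Queue<T> enumerates from oldest to newest, so this prints oldest first.
    public override string ToString() => string.Join(Environment.NewLine, _lines);
}
```

Unlike trimming a StringWriter, this never has to search the text for newlines, because each line is stored as its own element.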
System.IO.StringWriter log = new System.IO.StringWriter();
log.WriteLine("some text");
log.WriteLine("more text");
// some how remove the first line ????
var sb = log.GetStringBuilder(); //get the underlying StringBuilder
var newLinePosition = sb.ToString().IndexOf(Environment.NewLine, StringComparison.Ordinal); //find the first newline
if (newLinePosition >= 0) //IndexOf returns -1 if there is no complete line yet
{
sb.Remove(0, newLinePosition + Environment.NewLine.Length); //remove from start to the newline... including the newline itself
}
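If you need to purge more than one line, you can repeat the same GetStringBuilder technique until the buffer is small enough. The sketch below assumes that a character limit (the hypothetical maxChars parameter) is an acceptable measure of "size". It uses the writer's own NewLine string and an ordinal search; StringWriterTrimmer and TrimOldestLines are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class StringWriterTrimmer
{
    // Removes whole lines from the start of the writer's buffer until its
    // length is at most maxChars (or no complete line is left to remove).
    public static void TrimOldestLines(StringWriter writer, int maxChars)
    {
        StringBuilder sb = writer.GetStringBuilder();
        string newLine = writer.NewLine; // the separator WriteLine actually used
        while (sb.Length > maxChars)
        {
            int index = sb.ToString().IndexOf(newLine, StringComparison.Ordinal);
            if (index < 0)
                break; // no complete line left; leave the partial text alone
            sb.Remove(0, index + newLine.Length);
        }
    }
}
```

Each pass calls ToString(), which copies the buffer. For very large logs, a line-based structure such as a queue may be the cheaper choice.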
Instead of writing to a stream, you can write to a different data structure (such as a list) and keep an index that cycles through your lines, overwriting the oldest ones once you hit a certain threshold.
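List<string> log = new List<string>();
int maxMessages = 10; // your max amount of messages
int idx = 0;
//...
if (idx >= maxMessages)
{
    idx = 0; // wrap around and start overwriting the oldest messages
}
if (idx >= log.Count)
{
    log.Add("more Text"); // the list is still filling up
}
else
{
    log[idx] = "more Text"; // overwrite the slot at idx
}
idx++; // move on to the next slot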
Of course, you should wrap this in a class for logging.
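Here is a minimal sketch of such a wrapper, assuming a fixed message capacity. The RingLog class and its members are hypothetical. Besides the wrap-around write, the wrapper also has to return the messages oldest-first, because once the index wraps, index 0 no longer holds the oldest entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ring-buffer logger: overwrites the oldest message once
// capacity is reached, but still reports messages oldest-first.
public sealed class RingLog
{
    private readonly List<string> _entries = new List<string>();
    private readonly int _capacity;
    private int _next; // slot that the next message will occupy

    public RingLog(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public void Add(string message)
    {
        if (_entries.Count < _capacity)
            _entries.Add(message);      // still filling up
        else
            _entries[_next] = message;  // overwrite the oldest entry
        _next = (_next + 1) % _capacity;
    }

    // Returns the messages in the order they were written (oldest first).
    public IEnumerable<string> Entries()
    {
        // Until the buffer is full, the oldest entry is at index 0;
        // afterwards it is the slot that will be overwritten next.
        int start = _entries.Count < _capacity ? 0 : _next;
        for (int i = 0; i < _entries.Count; i++)
            yield return _entries[(start + i) % _entries.Count];
    }
}
```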
string filePath = @"C:\Users\Me\Desktop\Palindromes\palindromes.txt";
List<string> lines = File.ReadAllLines(filePath).ToList();
var meStack = new Stack<string>();
for (int i = 0; i < lines.Count; i++)
{
string pali;
pali = lines.RemoveAt(i);
meStack.Push(pali[i]);
}
Basically, I need to remove each element (the text file has 40 lines) from the list and then push each one onto a stack.
Why even make a List<string>? ReadAllLines returns a string[], and Stack<T> has a constructor that takes an IEnumerable<T> (an array works). So would the code below do the job for you?
string filePath = @"C:\Users\Me\Desktop\Palindromes\palindromes.txt";
var meStack = new Stack<string>(File.ReadAllLines(filePath));
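One thing worth checking with this approach: the Stack<T> constructor pushes the items in the order they are enumerated, so the last line of the file ends up on top. The sketch below shows that popping returns the lines in reverse order. StackOrderDemo is a hypothetical name, and an in-memory array stands in for the file.

```csharp
using System.Collections.Generic;

public static class StackOrderDemo
{
    // Builds a stack from the lines (pushed first to last) and pops them all.
    // Because a stack is LIFO, the result is the lines in reverse order.
    public static List<string> PopAll(IEnumerable<string> lines)
    {
        var stack = new Stack<string>(lines); // same as pushing each line in order
        var popped = new List<string>();
        while (stack.Count > 0)
            popped.Add(stack.Pop());
        return popped;
    }
}
```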
Don't call RemoveAt inside the loop; instead, Clear the lines list (if necessary) at the very end:
for (int i = 0; i < lines.Count; ++i)
meStack.Push(lines[i]);
lines.Clear();
Or even (getting rid of the list altogether):
string filePath = @"C:\Users\Me\Desktop\Palindromes\palindromes.txt";
var meStack = new Stack<string>();
foreach (var item in File.ReadLines(filePath))
meStack.Push(item);
You can simplify it to
lines.ForEach(meStack.Push);
lines.Clear();
Your code with some comments:
string filePath = @"C:\Users\Me\Desktop\Palindromes\palindromes.txt";
List<string> lines = File.ReadAllLines(filePath).ToList();
var meStack = new Stack<string>();
for (int i = 0; i < lines.Count; i++)
{
string pali;
pali = lines.RemoveAt(i); // < RemoveAt returns void, so this line does not compile.
// Even if it returned the line, removing it shifts the list: what was
// line i+1 is now line i, and the next iteration removes (the new) line i+1,
// skipping one line.
meStack.Push(pali[i]); // here you would push one char (the ith) of the string (the line
// you just removed) rather than the whole line, which does not fit a Stack<string>,
// and it throws an IndexOutOfRangeException if i >= pali.Length
}
Since I don't want to repeat the other (great) answers, here is a version where you can actually use RemoveAt:
while (lines.Count > 0) // RemoveAt decreases Count on each iteration
{
    // RemoveAt returns void, so read the first line before removing it.
    // Mind the hardcoded 0: we always push and remove the first item of the list.
    meStack.Push(lines[0]);
    lines.RemoveAt(0);
}
It is not the best solution (RemoveAt(0) shifts every remaining element, so each call is O(n)), just another alternative.
I have a project that reads 100 text files with 5,000 words each.
I insert the words into a list. I have a second list that contains English stop words. I compare the two lists and delete the stop words from the first list.
It takes 1 hour to run the application. I want to parallelize it. How can I do that?
Here's my code:
private void button1_Click(object sender, EventArgs e)
{
List<string> listt1 = new List<string>();
string line;
for (int ii = 1; ii <= 49; ii++)
{
string d = ii.ToString();
using (StreamReader reader = new StreamReader(@"D" + d.ToString() + ".txt"))
while ((line = reader.ReadLine()) != null)
{
string[] words = line.Split(' ');
for (int i = 0; i < words.Length; i++)
{
listt1.Add(words[i].ToString());
}
}
listt1 = listt1.ConvertAll(d1 => d1.ToLower());
StreamReader reader2 = new StreamReader("stopword.txt");
List<string> listt2 = new List<string>();
string line2;
while ((line2 = reader2.ReadLine()) != null)
{
string[] words2 = line2.Split('\n');
for (int i = 0; i < words2.Length; i++)
{
listt2.Add(words2[i]);
}
listt2 = listt2.ConvertAll(d1 => d1.ToLower());
}
for (int i = 0; i < listt1.Count(); i++)
{
for (int j = 0; j < listt2.Count(); j++)
{
listt1.RemoveAll(d1 => d1.Equals(listt2[j]));
}
}
listt1=listt1.Distinct().ToList();
textBox1.Text = listt1.Count().ToString();
}
}
I fixed a number of things in your code. I don't think you need multithreading:
private void RemoveStopWords()
{
HashSet<string> stopWords = new HashSet<string>();
using (var stopWordReader = new StreamReader("stopword.txt"))
{
string line2;
while ((line2 = stopWordReader.ReadLine()) != null)
{
string[] words2 = line2.Split('\n');
for (int i = 0; i < words2.Length; i++)
{
stopWords.Add(words2[i].ToLower());
}
}
}
var fileWords = new HashSet<string>();
for (int fileNumber = 1; fileNumber <= 49; fileNumber++)
{
using (var reader = new StreamReader("D" + fileNumber.ToString() + ".txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
foreach(var word in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)) // skip empty entries from double spaces
{
fileWords.Add(word.ToLower());
}
}
}
}
fileWords.ExceptWith(stopWords);
textBox1.Text = fileWords.Count.ToString();
}
Because of how your code is structured, you read through the stop-word file once per input file, keep adding to the same list, and try to remove the same stop words over and over (RemoveAll runs once per stop word for every index of the word list). Your needs are also a better match for a HashSet than for a List, since it has set-based operations and already guarantees uniqueness.
If you still want to make this parallel, you could read the stop-word list once and pass it to an async method that reads one input file, removes the stop words and returns the resulting set. You would then merge the results once all the asynchronous calls have completed. Measure before deciding you need that, though, because it adds quite a bit of work and complexity compared with this code.
If I understand you correctly, you want to:
Read all words from a file into a List
Remove all "stop words" from the List
Repeat for 99 more files, saving only the unique words
If this is correct, the code is pretty simple:
// The list of words to delete ("stop words")
var stopWords = new List<string> { "remove", "these", "words" };
// The list of files to check - you can get this list in other ways
var filesToCheck = new List<string>
{
@"f:\public\temp\temp1.txt",
@"f:\public\temp\temp2.txt",
@"f:\public\temp\temp3.txt"
};
// This list will contain all the unique words from all
// the files, except the ones in the "stopWords" list
var uniqueFilteredWords = new List<string>();
// Loop through all our files
foreach (var fileToCheck in filesToCheck)
{
// Read all the file text into a variable
var fileText = File.ReadAllText(fileToCheck);
// Split the text into distinct words (a null separator splits on all
// whitespace; the cast picks the char[] overload) and ignore empty entries
var fileWords = fileText.Split((char[])null)
.Where(word => !string.IsNullOrWhiteSpace(word))
.Distinct();
// Add all the words from the file, except the ones in
// your "stop list" and those that are already in the list
uniqueFilteredWords.AddRange(fileWords.Except(stopWords)
.Where(word => !uniqueFilteredWords.Contains(word)));
}
This can be condensed into a single line with no explicit loop:
// This list will contain all the unique words from all
// the files, except the ones in the "stopWords" list
var uniqueFilteredWords = filesToCheck.SelectMany(fileToCheck =>
File.ReadAllText(fileToCheck)
.Split((char[])null)
.Where(word => !string.IsNullOrWhiteSpace(word) &&
!stopWords.Any(stopWord => stopWord.Equals(word,
StringComparison.OrdinalIgnoreCase))))
.Distinct() // applied across all files, not just within each file
.ToList();
This code processed over 100 files with more than 12,000 words each in less than a second (WAY less than a second... 0.0001782 seconds). Keep in mind that LINQ queries such as SelectMany are deferred: no file is read until the result is enumerated (for example by ToList()), so a timing taken before enumeration measures only the construction of the query.
One thing that would improve performance: listt1.ConvertAll() makes a separate O(n) pass over the list (and your code calls it again for every file). You are already looping to add the items to the list, so why not convert them to lower case there? Also, why not store the words in a HashSet, so lookup and insertion are O(1) on average? You could store the stop words in a HashSet too. Then, while reading your text input, check whether each word is a stop word, and if it's not, add it to the output HashSet to show the user.
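Here is a minimal sketch of filtering while reading, as suggested above. StopWordFilter and CollectWords are hypothetical names. The input is any sequence of lines (for example File.ReadLines(path)), and the stop words are assumed to be lower-case already.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordFilter
{
    // Adds each word to the output set only if it is not a stop word, so no
    // separate removal pass (and no ConvertAll over the whole list) is needed.
    public static HashSet<string> CollectWords(IEnumerable<string> lines, HashSet<string> stopWords)
    {
        var result = new HashSet<string>();
        foreach (string line in lines)
        {
            foreach (string raw in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                string word = raw.ToLowerInvariant(); // lower-case once, while reading
                if (!stopWords.Contains(word))        // O(1) average lookup
                    result.Add(word);                 // duplicates are ignored by the set
            }
        }
        return result;
    }
}
```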
I am loading a file using the File.ReadLines method (the files can get very large, so I used it rather than ReadAllLines).
I need to access each line and perform an action on it, so my code looks like this:
IEnumerable<String> lines = File.ReadLines(@"c:\myfile.txt", new UTF8Encoding());
StringBuilder sb = new StringBuilder();
int totalLines = lines.Count(); //used for progress calculation
//use for instead of foreach here - easier to know the line I'm on for progress percent complete calculation
for(int i = 0; i < totalLines; i++){
//for example get the line and do something
sb.Append(lines.ElementAt(i) + "\r\n");
//get the line again using ElementAt(i) and do something else
//...ElementAt(I)...
}
So my bottleneck is each call to ElementAt(i), because it has to enumerate the IEnumerable from the start to reach position i (and File.ReadLines re-opens the file for each new enumeration).
Is there any way to keep using File.ReadLines but improve this somehow?
EDIT: I count the lines at the beginning so I can show the user the percentage complete, which is why I replaced foreach with for.
How about using foreach? It's designed to handle exactly this situation.
IEnumerable<String> lines = File.ReadLines(@"c:\myfile.txt", new UTF8Encoding());
StringBuilder sb = new StringBuilder();
string previousLine = null;
int lineCounter = 0;
int totalLines = lines.Count();
foreach (string line in lines) {
// show progress
lineCounter++;
float done = (float)lineCounter / totalLines; // cast first: int / int would truncate to 0
Debug.WriteLine($"{done*100:0.00}% complete");
//get the line and do something
sb.AppendLine(line);
//do something else, like look at the previous line to compare
if (line == previousLine) {
Debug.WriteLine($"Line {lineCounter} is the same as the previous line.");
}
previousLine = line;
}
Sure, you can use a foreach instead of the for loop, so you don't have to go back and reference the line via its index:
foreach (string line in lines)
{
sb.AppendLine(line);
}
You also no longer need the int totalLines = lines.Count(); line, because you don't use the count for anything (unless you use it somewhere you're not showing).
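To keep the progress display from the question's edit without a separate Count() pass over a very large file, one option (not from either answer) is to estimate progress from the byte position of the underlying stream. The sketch below assumes a byte-based percentage is good enough. ProgressReader and ProcessLines are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class ProgressReader
{
    // Reads a file line by line in a single pass and reports approximate progress
    // (0.0 to 1.0) from the underlying stream's byte position, so no separate
    // Count() pass over the file is needed. Returns the number of lines read.
    public static int ProcessLines(string path, Action<string> handleLine, Action<double> reportProgress)
    {
        int lineCount = 0;
        using (var reader = new StreamReader(path, new UTF8Encoding()))
        {
            long length = reader.BaseStream.Length;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                handleLine(line);
                lineCount++;
                // StreamReader reads ahead in blocks, so Position can run ahead
                // of the line just returned; treat this as an estimate.
                double done = length == 0 ? 1.0 : (double)reader.BaseStream.Position / length;
                reportProgress(done);
            }
        }
        return lineCount;
    }
}
```

Because of the read-ahead buffering, the reported value can jump in steps, but it reaches 1.0 by the time the last line has been read.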
Problem Statement
To run gene annotation software, I need to prepare two types of files, vcard files and coverage tables, and there has to be a one-to-one match of vcard files to coverage tables. Since I'm running 2k samples, it's hard to identify which files lack a match. Both file types carry unique identifier numbers, so if both folders contain files with the same number, I treat them as the "same" file.
I made a program that compares two folders and reports the unique entries in each folder. To do so, I made two lists that contain the file names unique to each directory.
I want to format the report file (a tab-delimited .txt file) so that it looks something like this:
Unique in fdr1 Unique in fdr2
file x file a
file y file b
file z file c
I find this difficult to do because I have to iterate twice (since I have two lists), but there is no way of going back to the previous line in StreamWriter as far as I know. Basically, once I iterate through the first list and fill the first column, how can I fill the second column with the second list?
Can someone help me out with this?
Thanks
If the design of the code has to change (e.g. one list instead of two), please let me know.
As requested by a commenter, this is how I was going to do it (non-working version):
// Write report
using (StreamWriter sw = new StreamWriter(dest_txt.Text + @"\" + "Report.txt"))
{
// Write headers
sw.WriteLine("Unique Entries in Folder1" + "\t" + "Unique Entries in Folder2");
// Write unique entries in fdr1
foreach(string file in fdr1FileList)
{
sw.WriteLine(file + "\t");
}
// Write unique entries in fdr2
foreach (string file in fdr2FileList)
{
sw.WriteLine(file + "\t");
}
sw.Dispose();
}
As requested, here is my code snippet for finding the unique entries:
Dictionary<int, bool> fdr1Dict = new Dictionary<int, bool>();
Dictionary<int, bool> fdr2Dict = new Dictionary<int, bool>();
List<string> fdr1FileList = new List<string>();
List<string> fdr2FileList = new List<string>();
string fdr1Path = folder1_txt.Text;
string fdr2Path = folder2_txt.Text;
// File names in the specified directory; path not included
string[] fdr1FileNames = Directory.GetFiles(fdr1Path).Select(Path.GetFileName).ToArray();
string[] fdr2FileNames = Directory.GetFiles(fdr2Path).Select(Path.GetFileName).ToArray();
// Iterate through the first directory, and add GL number to dictionary
for(int i = 0; i < fdr1FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr1FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if(Int32.TryParse(number, out glNumber))
{
fdr1Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr1FileNames[i]));
}
}
// Iterate through the second directory, and add GL number to dictionary
for (int i = 0; i < fdr2FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr2FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if (Int32.TryParse(number, out glNumber))
{
fdr2Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr2FileNames[i]));
}
}
// Iterate through the first directory, and find files that are unique to it
for (int i = 0; i < fdr1FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr1FileNames[i], @"\d+").Value);
// If the same file is not present in the second folder, add it to the list
// (ContainsKey avoids the KeyNotFoundException the indexer throws for a missing key)
if (!fdr2Dict.ContainsKey(glNumber))
{
fdr1FileList.Add(fdr1FileNames[i]);
}
}
// Iterate through the second directory, and find files that are unique to it
for (int i = 0; i < fdr2FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr2FileNames[i], @"\d+").Value);
// If the same file is not present in the first folder, add it to the list
if (!fdr1Dict.ContainsKey(glNumber))
{
fdr2FileList.Add(fdr2FileNames[i]);
}
}
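I'm quite confident this works, as I've tested it. Note that it writes whatever arrays you pass in, so pass your lists of unique file names rather than the raw directory listings: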
static void Main(string[] args)
{
var firstDir = @"Path1";
var secondDir = @"Path2";
var firstDirFiles = System.IO.Directory.GetFiles(firstDir);
var secondDirFiles = System.IO.Directory.GetFiles(secondDir);
print2Dirs(firstDirFiles, secondDirFiles);
}
private static void print2Dirs(string[] firstDirFile, string[] secondDirFiles)
{
var maxIndex = Math.Max(firstDirFile.Length, secondDirFiles.Length);
using (StreamWriter streamWriter = new StreamWriter("result.txt"))
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < maxIndex; i++)
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}",
firstDirFile.Length > i ? firstDirFile[i] : string.Empty,
secondDirFiles.Length > i ? secondDirFiles[i] : string.Empty));
}
}
}
It's fairly simple code, but if you need help understanding it, just let me know :)
I would build the report one line at a time, something like this:
// fdr1FileList and fdr2FileList are your two lists of unique file names,
// and sw is the StreamWriter you opened for the report
int row = 0;
while (row < fdr1FileList.Count || row < fdr2FileList.Count)
{
    string rowText = "";
    rowText += (row >= fdr1FileList.Count ? "\t" : fdr1FileList[row] + "\t");
    rowText += (row >= fdr2FileList.Count ? "" : fdr2FileList[row]);
    sw.WriteLine(rowText);
    row++;
}
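The same row-by-row idea can be packaged as a small helper that works with any two lists and leaves the writing to File.WriteAllLines. This is a sketch that assumes tab-separated output, as in the question; TwoColumnReport and Rows are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class TwoColumnReport
{
    // Yields one tab-delimited row per index; when one list is shorter,
    // its column is left empty for the remaining rows.
    public static IEnumerable<string> Rows(IReadOnlyList<string> left, IReadOnlyList<string> right)
    {
        int rows = Math.Max(left.Count, right.Count);
        for (int i = 0; i < rows; i++)
        {
            string l = i < left.Count ? left[i] : "";
            string r = i < right.Count ? right[i] : "";
            yield return l + "\t" + r;
        }
    }
}
```

For example, File.WriteAllLines(reportPath, new[] { "Unique in fdr1\tUnique in fdr2" }.Concat(TwoColumnReport.Rows(fdr1FileList, fdr2FileList))); writes the header plus all rows. Here reportPath is a placeholder, and Concat needs System.Linq.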
Try something like this:
static void Main(string[] args)
{
Dictionary<int, string> fdr1Dict = FilesToDictionary(Directory.GetFiles("path1"));
Dictionary<int, string> fdr2Dict = FilesToDictionary(Directory.GetFiles("path2"));
var unique_f1 = fdr1Dict.Where(f1 => !fdr2Dict.ContainsKey(f1.Key)).ToArray();
var unique_f2 = fdr2Dict.Where(f2 => !fdr1Dict.ContainsKey(f2.Key)).ToArray();
int f1_size = unique_f1.Length;
int f2_size = unique_f2.Length;
int list_length = 0;
if (f1_size > f2_size)
{
list_length = f1_size;
Array.Resize(ref unique_f2, list_length);
}
else
{
list_length = f2_size;
Array.Resize(ref unique_f1, list_length);
}
using (StreamWriter writer = new StreamWriter("output.txt"))
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < list_length; i++)
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", unique_f1[i].Value, unique_f2[i].Value));
}
}
}
static Dictionary<int, string> FilesToDictionary(string[] filenames)
{
Dictionary<int, string> dict = new Dictionary<int, string>();
for (int i = 0; i < filenames.Length; i++)
{
int glNumber;
string filename = Path.GetFileName(filenames[i]);
string number = Regex.Match(filename, @"\d+").ToString();
if (int.TryParse(number, out glNumber))
dict.Add(glNumber, filename);
}
return dict;
}
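The unique-entry search from the question can also be written with a HashSet<int> of GL numbers instead of dictionaries whose values are never used. The sketch below assumes, as the question does, that the first run of digits in each file name is the identifier. GlMatcher and UniqueTo are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class GlMatcher
{
    // Extracts the first run of digits from a file name (the "GL number").
    // Throws FormatException if the name contains no digits.
    private static int GlNumber(string fileName) =>
        int.Parse(Regex.Match(Path.GetFileName(fileName), @"\d+").Value);

    // Returns the entries of 'names' whose GL number does not occur in 'other'.
    public static List<string> UniqueTo(IEnumerable<string> names, IEnumerable<string> other)
    {
        var otherNumbers = new HashSet<int>(other.Select(GlNumber));
        return names.Where(n => !otherNumbers.Contains(GlNumber(n))).ToList();
    }
}
```

For example, UniqueTo(Directory.GetFiles(fdr1Path), Directory.GetFiles(fdr2Path)) would return the entries unique to the first folder. Directory.GetFiles returns full paths, so apply Path.GetFileName if the report should show only file names.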
I'm doing this exercise from a lab. The instructions are as follows:
This method should read the product catalog from a text file called “catalog.txt” that you should
create alongside your project. Each product should be on a separate line. Use the instructions in the video to create the file and add it to your project, and to return an
array with the first 200 lines from the file (use the StreamReader class and a while loop to read
from the file). If the file has more than 200 lines, ignore them. If the file has less than 200 lines,
it’s OK if some of the array elements are empty (null).
I don't understand how to stream data into the string array. Any clarification would be greatly appreciated!
static string[] ReadCatalogFromFile()
{
//create instance of the catalog.txt
StreamReader readCatalog = new StreamReader("catalog.txt");
//store the information in this array
string[] storeCatalog = new string[200];
int i = 0;
//test and store the array information
while (storeCatalog != null)
{
//store each string in the elements of the array?
storeCatalog[i] = readCatalog.ReadLine();
i = i + 1;
if (storeCatalog != null)
{
//test to see if its properly stored
Console.WriteLine(storeCatalog[i]);
}
}
readCatalog.Close();
Console.ReadLine();
return storeCatalog;
}
Here are some hints:
int i = 0;
This must be declared outside your loop; if it is inside, it is reset to 0 on every iteration.
In your while() condition, you should check the result of readCatalog.ReadLine() and/or the maximum number of lines to read (i.e. the size of your array).
In other words: if you have reached the end of the file, stop; if your array is full, stop.
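Since the lab asks for a while loop, here is a sketch that follows the hints above. It takes a TextReader so it can be tested with a StringReader; that is an assumption, not part of the assignment. StreamReader derives from TextReader, so you can pass new StreamReader("catalog.txt") directly. CatalogReader and ReadFirstLines are hypothetical names.

```csharp
using System.IO;

public static class CatalogReader
{
    // Reads at most maxLines lines into a fixed-size array with a while loop.
    // Unused slots stay null if the input is shorter than maxLines.
    public static string[] ReadFirstLines(TextReader reader, int maxLines)
    {
        var lines = new string[maxLines];
        int i = 0;   // declared once, outside the loop
        string line;
        // Stop when the array is full or ReadLine returns null (end of input).
        while (i < lines.Length && (line = reader.ReadLine()) != null)
        {
            lines[i] = line;
            i++;
        }
        return lines;
    }
}
```

Checking i < lines.Length before calling ReadLine() means no extra line is read once the array is full. In the lab you would call it as using (var reader = new StreamReader("catalog.txt")) { return CatalogReader.ReadFirstLines(reader, 200); }.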
static string[] ReadCatalogFromFile()
{
var lines = new string[200];
using (var reader = new StreamReader("catalog.txt"))
for (var i = 0; i < 200 && !reader.EndOfStream; i++)
lines[i] = reader.ReadLine();
return lines;
}
A for loop is a natural fit when you know the number of iterations (here, the maximum) beforehand. You can make it iterate at most 200 times, so you never go past the array bounds. At the moment you only check that your array isn't null, which it never will be.
using(var readCatalog = new StreamReader("catalog.txt"))
{
string[] storeCatalog = new string[200];
for(int i = 0; i<200; i++)
{
string temp = readCatalog.ReadLine();
if(temp != null)
storeCatalog[i] = temp;
else
break;
}
return storeCatalog;
}
As soon as there are no more lines in the file, temp will be null and the loop will be stopped by the break.
I suggest you wrap disposable resources (like any stream) in a using statement. When execution leaves the braces, the resource is disposed automatically.