Read text file and access elements of text file separately in C#

I am reading a text file consisting of two columns (containing x and y coordinates):
static void Main(string[] args)
{
    string line;
    using (StreamReader file = new StreamReader(@"C:\Point(x,y).txt"))
    {
        while ((line = file.ReadLine()) != null)
        {
            char[] delimiters = new char[] { '\t' };
            string[] parts = line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < parts.Length; i++)
            {
                Console.WriteLine(parts[i]);
            }
        }
    }
    // Suspend the screen.
    Console.ReadLine();
}
I want to read the data from the text file into two separate variables.
How do I access each value?

This is a case where LINQ can come to the rescue. You can read all the lines and parse them directly with LINQ. Based on the line.Split you are doing, I am assuming you have a tab-delimited text file. You can create an anonymous type to capture each row/line in your text file; you then have access to the fields, as shown when writing out to the console by looping through each item in the list. Make sure you have a using System.Linq directive, which is usually there by default.
var records = File
    .ReadAllLines(@"C:\Users\saira\Documents\Visual Studio 2017\Projects\Point(x,y).txt")
    .Select(record => record.Split('\t'))
    .Select(record => new
    {
        x = record[0],
        y = record[1]
    }).ToList();

foreach (var record in records)
{
    Console.WriteLine($"The x coordinate is: {record.x}, and the y coordinate is: {record.y}");
}
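If you need the coordinates as numbers rather than strings, you can parse inside the projection. A minimal variation of the query above (a sketch; it assumes every line really holds two tab-separated numeric fields, and it uses the shorter path from the question):

var points = File
    .ReadAllLines(@"C:\Point(x,y).txt")
    .Select(line => line.Split('\t'))
    .Select(parts => new
    {
        X = double.Parse(parts[0]), // throws FormatException on a non-numeric field
        Y = double.Parse(parts[1])
    }).ToList();

Console.WriteLine(points[0].X + points[0].Y); // the fields are now doubles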

Related

Read and process 100 text files in C# in parallel

I have a project that reads 100 text files with 5000 words in each.
I insert the words into a list. I have a second list that contains English stop words. I compare the two lists and delete the stop words from the first list.
It takes 1 hour to run the application. I want to parallelize it. How can I do that?
Here's my code:
private void button1_Click(object sender, EventArgs e)
{
    List<string> listt1 = new List<string>();
    string line;
    for (int ii = 1; ii <= 49; ii++)
    {
        string d = ii.ToString();
        using (StreamReader reader = new StreamReader(@"D" + d.ToString() + ".txt"))
            while ((line = reader.ReadLine()) != null)
            {
                string[] words = line.Split(' ');
                for (int i = 0; i < words.Length; i++)
                {
                    listt1.Add(words[i].ToString());
                }
            }
        listt1 = listt1.ConvertAll(d1 => d1.ToLower());
        StreamReader reader2 = new StreamReader("stopword.txt");
        List<string> listt2 = new List<string>();
        string line2;
        while ((line2 = reader2.ReadLine()) != null)
        {
            string[] words2 = line2.Split('\n');
            for (int i = 0; i < words2.Length; i++)
            {
                listt2.Add(words2[i]);
            }
            listt2 = listt2.ConvertAll(d1 => d1.ToLower());
        }
        for (int i = 0; i < listt1.Count(); i++)
        {
            for (int j = 0; j < listt2.Count(); j++)
            {
                listt1.RemoveAll(d1 => d1.Equals(listt2[j]));
            }
        }
        listt1 = listt1.Distinct().ToList();
        textBox1.Text = listt1.Count().ToString();
    }
}
I fixed up many things in your code. I don't think you need multi-threading:
private void RemoveStopWords()
{
    HashSet<string> stopWords = new HashSet<string>();
    using (var stopWordReader = new StreamReader("stopword.txt"))
    {
        string line2;
        while ((line2 = stopWordReader.ReadLine()) != null)
        {
            string[] words2 = line2.Split('\n');
            for (int i = 0; i < words2.Length; i++)
            {
                stopWords.Add(words2[i].ToLower());
            }
        }
    }
    var fileWords = new HashSet<string>();
    for (int fileNumber = 1; fileNumber <= 49; fileNumber++)
    {
        using (var reader = new StreamReader("D" + fileNumber.ToString() + ".txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                foreach (var word in line.Split(' '))
                {
                    fileWords.Add(word.ToLower());
                }
            }
        }
    }
    fileWords.ExceptWith(stopWords);
    textBox1.Text = fileWords.Count().ToString();
}
You are reading through the list of stopwords many times, as well as continually adding to the list and re-attempting to remove the same stopwords over and over, because of the way your code is structured. Your needs are also better matched to a HashSet than to a List, as it has set-based operations and handles uniqueness for you.
If you still wanted to make this parallel, you could do it by reading the stopword list once, then passing it to an async method that reads an input file, removes the stopwords, and returns the resulting list; you would then merge the resulting lists after the asynchronous calls come back. But you had better test before deciding you need that, because it is quite a bit more work and complexity than this code already has.
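For illustration only, here is a minimal parallel sketch of that idea. It assumes the same D1.txt..D49.txt naming from the question, and it uses PLINQ rather than the explicit async methods described above, which gives the same fan-out with less plumbing:

// Read the stop word list once, up front (assumes one word per line).
var stopWords = new HashSet<string>(
    File.ReadLines("stopword.txt").Select(w => w.ToLower()));

// Tokenize the 49 input files in parallel; each file yields its own set.
var perFileSets = Enumerable.Range(1, 49)
    .AsParallel()
    .Select(n => new HashSet<string>(
        File.ReadLines("D" + n + ".txt")
            .SelectMany(line => line.Split(' '))
            .Select(word => word.ToLower())))
    .ToList();

// Merge the per-file sets sequentially, then strip the stop words.
var fileWords = new HashSet<string>();
foreach (var set in perFileSets)
    fileWords.UnionWith(set);
fileWords.ExceptWith(stopWords);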
If I understand you correctly, you want to:
Read all words from a file into a List
Remove all "stop words" from the List
Repeat for 99 more files, saving only the unique words
If this is correct, the code is pretty simple:
// The list of words to delete ("stop words")
var stopWords = new List<string> { "remove", "these", "words" };
// The list of files to check - you can get this list in other ways
var filesToCheck = new List<string>
{
    @"f:\public\temp\temp1.txt",
    @"f:\public\temp\temp2.txt",
    @"f:\public\temp\temp3.txt"
};
// This list will contain all the unique words from all
// the files, except the ones in the "stopWords" list
var uniqueFilteredWords = new List<string>();
// Loop through all our files
foreach (var fileToCheck in filesToCheck)
{
    // Read all the file text into a variable
    var fileText = File.ReadAllText(fileToCheck);
    // Split the text into distinct words (splitting on null
    // splits on all whitespace) and ignore empty entries
    var fileWords = fileText.Split(null)
        .Where(word => !string.IsNullOrWhiteSpace(word))
        .Distinct();
    // Add all the words from the file, except the ones in
    // your "stop list" and those that are already in the list
    uniqueFilteredWords.AddRange(fileWords.Except(stopWords)
        .Where(word => !uniqueFilteredWords.Contains(word)));
}
This can be condensed into a single statement with no explicit loop:
// This list will contain all the unique words from all
// the files, except the ones in the "stopWords" list
var uniqueFilteredWords = filesToCheck
    .SelectMany(fileToCheck => File.ReadAllText(fileToCheck)
        .Split(null)
        .Where(word => !string.IsNullOrWhiteSpace(word) &&
                       !stopWords.Any(stopWord => stopWord.Equals(word,
                           StringComparison.OrdinalIgnoreCase))))
    .Distinct()  // applied after SelectMany so words repeated across files are removed too
    .ToList();
This code processed over 100 files with more than 12,000 words each in less than a second (WAY less than a second: 0.0001782 seconds).
One issue I see here that can help improve performance: listt1.ConvertAll() runs in O(n) on the list. You are already looping to add the items to the list, so why not convert them to lower case there? Also, why not store the words in a hash set, so you can do lookup and insertion in O(1)? You could store the list of stop words in a hash set, and when you are reading your text input, check whether each word is a stop word; if it's not, add it to the hash set you output to the user.
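A rough sketch of that suggestion, lower-casing each word once at read time and using hash sets for both the stop words and the output (file names follow the question's layout; shown for a single input file):

var stopWords = new HashSet<string>(
    File.ReadLines("stopword.txt").Select(w => w.ToLower()));

var output = new HashSet<string>();
foreach (var line in File.ReadLines("D1.txt"))
{
    foreach (var word in line.Split(' '))
    {
        var lower = word.ToLower();      // convert once, while reading
        if (!stopWords.Contains(lower))  // O(1) lookup
            output.Add(lower);           // O(1) insert; duplicates are ignored
    }
}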

StreamWriter C# formatting output

Problem Statement
In order to run gene annotation software, I need to prepare two types of files, vcard files and coverage tables, and there has to be a one-to-one match of vcard to coverage table. Since I'm running 2k samples, it's hard to identify which files lack a one-to-one match. I know that both files have unique identifier numbers, hence, if both folders have files with the same unique number, I treat them as the "same" file.
I made a program that compares two folders and reports the unique entries in each folder. To do so, I made two lists that contain the file names unique to each directory.
I want to format the report file (a tab-delimited .txt file) so that it looks something like this:
Unique in fdr1 Unique in fdr2
file x file a
file y file b
file z file c
I find this difficult to do because I have to iterate twice (since I have two lists), but there is no way of going back to the previous line in StreamWriter as far as I know. Basically, once I iterate through the first list and fill the first column, how can I fill the second column with the second list?
Can someone help me out with this?
Thanks
If the design of the code has to change (i.e. one list instead of two), please let me know.
As requested, this is how I was going to do it (non-working version):
// Write report
using (StreamWriter sw = new StreamWriter(dest_txt.Text + @"\" + "Report.txt"))
{
    // Write headers
    sw.WriteLine("Unique Entries in Folder1" + "\t" + "Unique Entries in Folder2");
    // Write unique entries in fdr1
    foreach (string file in fdr1FileList)
    {
        sw.WriteLine(file + "\t");
    }
    // Write unique entries in fdr2
    foreach (string file in fdr2FileList)
    {
        sw.WriteLine(file + "\t");
    }
    sw.Dispose();
}
As requested, here's my code snippet for finding the unique entries:
Dictionary<int, bool> fdr1Dict = new Dictionary<int, bool>();
Dictionary<int, bool> fdr2Dict = new Dictionary<int, bool>();
List<string> fdr1FileList = new List<string>();
List<string> fdr2FileList = new List<string>();
string fdr1Path = folder1_txt.Text;
string fdr2Path = folder2_txt.Text;
// File names in the specified directory; path not included
string[] fdr1FileNames = Directory.GetFiles(fdr1Path).Select(Path.GetFileName).ToArray();
string[] fdr2FileNames = Directory.GetFiles(fdr2Path).Select(Path.GetFileName).ToArray();
// Iterate through the first directory, and add GL number to dictionary
for (int i = 0; i < fdr1FileNames.Length; i++)
{
    // Grabs only the number from the file name
    string number = Regex.Match(fdr1FileNames[i], @"\d+").ToString();
    int glNumber;
    // Make sure it is a number
    if (Int32.TryParse(number, out glNumber))
    {
        fdr1Dict[glNumber] = true;
    }
    // If number not present, raise exception
    else
    {
        throw new Exception(String.Format("GL Number not found in: {0}", fdr1FileNames[i]));
    }
}
// Iterate through the second directory, and add GL number to dictionary
for (int i = 0; i < fdr2FileNames.Length; i++)
{
    // Grabs only the number from the file name
    string number = Regex.Match(fdr2FileNames[i], @"\d+").ToString();
    int glNumber;
    // Make sure it is a number
    if (Int32.TryParse(number, out glNumber))
    {
        fdr2Dict[glNumber] = true;
    }
    // If number not present, raise exception
    else
    {
        throw new Exception(String.Format("GL Number not found in: {0}", fdr2FileNames[i]));
    }
}
// Iterate through the first directory, and find files that are unique to it
for (int i = 0; i < fdr1FileNames.Length; i++)
{
    int glNumber = Int32.Parse(Regex.Match(fdr1FileNames[i], @"\d+").Value);
    // If the same file is not present in the second folder, add it to the list
    if (!fdr2Dict.ContainsKey(glNumber))
    {
        fdr1FileList.Add(fdr1FileNames[i]);
    }
}
// Iterate through the second directory, and find files that are unique to it
for (int i = 0; i < fdr2FileNames.Length; i++)
{
    int glNumber = Int32.Parse(Regex.Match(fdr2FileNames[i], @"\d+").Value);
    // If the same file is not present in the first folder, add it to the list
    if (!fdr1Dict.ContainsKey(glNumber))
    {
        fdr2FileList.Add(fdr2FileNames[i]);
    }
}
I am quite confident that this will work, as I've tested it:
static void Main(string[] args)
{
    var firstDir = @"Path1";
    var secondDir = @"Path2";
    var firstDirFiles = System.IO.Directory.GetFiles(firstDir);
    var secondDirFiles = System.IO.Directory.GetFiles(secondDir);
    print2Dirs(firstDirFiles, secondDirFiles);
}

private static void print2Dirs(string[] firstDirFiles, string[] secondDirFiles)
{
    var maxIndex = Math.Max(firstDirFiles.Length, secondDirFiles.Length);
    using (StreamWriter streamWriter = new StreamWriter("result.txt"))
    {
        streamWriter.WriteLine(string.Format("{0,-150}{1,-150}", "Unique in fdr1", "Unique in fdr2"));
        for (int i = 0; i < maxIndex; i++)
        {
            streamWriter.WriteLine(string.Format("{0,-150}{1,-150}",
                firstDirFiles.Length > i ? firstDirFiles[i] : string.Empty,
                secondDirFiles.Length > i ? secondDirFiles[i] : string.Empty));
        }
    }
}
It's quite simple code, but if you need help understanding it, just let me know :)
I would construct each line one at a time. Something like this:
int row = 0;
string[] fdr1FileList = new string[0];
string[] fdr2FileList = new string[0];
while (row < fdr1FileList.Length || row < fdr2FileList.Length)
{
    string rowText = "";
    rowText += (row >= fdr1FileList.Length ? "\t" : fdr1FileList[row] + "\t");
    rowText += (row >= fdr2FileList.Length ? "\t" : fdr2FileList[row]);
    row++;
}
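Note that the snippet above only assembles rowText; you would still write each row out. A possible completion (assuming a Report.txt destination, as in the question):

using (var sw = new StreamWriter("Report.txt"))
{
    int row = 0;
    while (row < fdr1FileList.Length || row < fdr2FileList.Length)
    {
        string rowText = "";
        rowText += (row >= fdr1FileList.Length ? "\t" : fdr1FileList[row] + "\t");
        rowText += (row >= fdr2FileList.Length ? "\t" : fdr2FileList[row]);
        sw.WriteLine(rowText); // emit the assembled row
        row++;
    }
}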
Try something like this:
static void Main(string[] args)
{
    Dictionary<int, string> fdr1Dict = FilesToDictionary(Directory.GetFiles("path1"));
    Dictionary<int, string> fdr2Dict = FilesToDictionary(Directory.GetFiles("path2"));
    var unique_f1 = fdr1Dict.Where(f1 => !fdr2Dict.ContainsKey(f1.Key)).ToArray();
    var unique_f2 = fdr2Dict.Where(f2 => !fdr1Dict.ContainsKey(f2.Key)).ToArray();
    int f1_size = unique_f1.Length;
    int f2_size = unique_f2.Length;
    int list_length = 0;
    if (f1_size > f2_size)
    {
        list_length = f1_size;
        Array.Resize(ref unique_f2, list_length);
    }
    else
    {
        list_length = f2_size;
        Array.Resize(ref unique_f1, list_length);
    }
    using (StreamWriter writer = new StreamWriter("output.txt"))
    {
        writer.WriteLine(string.Format("{0,-30}{1,-30}", "Unique in fdr1", "Unique in fdr2"));
        for (int i = 0; i < list_length; i++)
        {
            writer.WriteLine(string.Format("{0,-30}{1,-30}", unique_f1[i].Value, unique_f2[i].Value));
        }
    }
}

static Dictionary<int, string> FilesToDictionary(string[] filenames)
{
    Dictionary<int, string> dict = new Dictionary<int, string>();
    for (int i = 0; i < filenames.Length; i++)
    {
        int glNumber;
        string filename = Path.GetFileName(filenames[i]);
        string number = Regex.Match(filename, @"\d+").ToString();
        if (int.TryParse(number, out glNumber))
            dict.Add(glNumber, filename);
    }
    return dict;
}

How to stream string data from a txt file into an array

I'm doing this exercise from a lab. The instructions are as follows:
This method should read the product catalog from a text file called "catalog.txt" that you should create alongside your project. Each product should be on a separate line. Use the instructions in the video to create the file and add it to your project, and to return an array with the first 200 lines from the file (use the StreamReader class and a while loop to read from the file). If the file has more than 200 lines, ignore them. If the file has fewer than 200 lines, it's OK if some of the array elements are empty (null).
I don't understand how to stream the data into the string array; any clarification would be greatly appreciated!
static string[] ReadCatalogFromFile()
{
    //create instance of the catalog.txt
    StreamReader readCatalog = new StreamReader("catalog.txt");
    //store the information in this array
    string[] storeCatalog = new string[200];
    int i = 0;
    //test and store the array information
    while (storeCatalog != null)
    {
        //store each string in the elements of the array?
        storeCatalog[i] = readCatalog.ReadLine();
        i = i + 1;
        if (storeCatalog != null)
        {
            //test to see if it's properly stored
            Console.WriteLine(storeCatalog[i]);
        }
    }
    readCatalog.Close();
    Console.ReadLine();
    return storeCatalog;
}
Here are some hints:
int i = 0;
This needs to be outside your loop (otherwise it is reset to 0 each time).
In your while () you should check the result of readCatalog.ReadLine() and/or the maximum number of lines to read (i.e. the size of your array).
Thus: if you reached the end of the file -> stop; or if your array is full -> stop.
static string[] ReadCatalogFromFile()
{
    var lines = new string[200];
    using (var reader = new StreamReader("catalog.txt"))
        for (var i = 0; i < 200 && !reader.EndOfStream; i++)
            lines[i] = reader.ReadLine();
    return lines;
}
A for-loop is used when you know the exact number of iterations beforehand, so you can say it should iterate exactly 200 times and you won't cross the index boundaries. At the moment you just check that your array isn't null, which it never will be.
using (var readCatalog = new StreamReader("catalog.txt"))
{
    string[] storeCatalog = new string[200];
    for (int i = 0; i < 200; i++)
    {
        string temp = readCatalog.ReadLine();
        if (temp != null)
            storeCatalog[i] = temp;
        else
            break;
    }
    return storeCatalog;
}
As soon as there are no more lines in the file, temp will be null and the loop will be stopped by the break.
I suggest you wrap your disposable resources (like any stream) in a using statement; after the operations in the braces, the resource automatically gets disposed.

C# string array out of memory

I am trying to take two text files and combine them so that every line in file1 is concatenated with every line in file2.
Example:
file1:
a
b
file2:
c
d
result:
a c
a d
b c
b d
This is the code:
{
//int counter = 0;
string[] lines1 = File.ReadLines("e:\\1.txt").ToArray();
string[] lines2 = File.ReadLines("e:\\2.txt").ToArray();
int len1 = lines1.Length;
int len2 = lines2.Length;
string[] names = new string[len1 * len2];
int i = 0;
int finish = 0;
//Console.WriteLine("Check this");
for (i = 0; i < lines2.Length; i++)
{
for (int j = 0; j < lines1.Length; j++)
{
names[finish] = lines2[i] + ' ' + lines1[j];
finish++;
}
}
using (System.IO.StreamWriter file = new System.IO.StreamWriter(#"E:\text.txt"))
{
foreach (string line in names)
{
// If the line doesn't contain the word 'Second', write the line to the file.
file.WriteLine(line);
}
}
}
I get this exception:
"An unhandled exception of type 'System.OutOfMemoryException' occurred
in ConsoleApplication2.exe" on this line:
string[] names = new string[len1 * len2];
Is there another way to combine these two files without getting an OutOfMemoryException?
Something like this:
using (var output = new StreamWriter(@"E:\text.txt"))
{
    foreach (var line1 in File.ReadLines("e:\\1.txt"))
    {
        foreach (var line2 in File.ReadLines("e:\\2.txt"))
        {
            output.WriteLine("{0} {1}", line1, line2);
        }
    }
}
Unless the lines are very long, this should avoid an OutOfMemoryException.
It looks like you want a Cartesian product rather than concatenation. Instead of loading all lines into memory, use ReadLines with SelectMany; this may not be fast, but it will avoid the exception:
var file1 = File.ReadLines("e:\\1.txt");
var file2 = File.ReadLines("e:\\2.txt");
var lines = file1.SelectMany(x => file2.Select(y => string.Join(" ", x, y)));
File.WriteAllLines("output.txt", lines);
Use a StringBuilder instance instead of concatenating strings. Strings are immutable in .NET, so each change to an instance creates a new one, consuming the available memory.
Make names a StringBuilder[] and use the Append method to construct your result.
If your files are large (which is the reason for the out-of-memory error), you should never load the complete file into memory. The result file in particular (with size = size1 * size2) will get very large.
I'd suggest using a StreamReader to read through the input files line by line and a StreamWriter to write the result file line by line.
With this technique you can process arbitrarily large files (as long as the result fits on your hard disk).
Use a StringBuilder instead of appending via "+".
Use a List<string> and Add your combined lines to the list, so you don't need to allocate the memory yourself.

List sorting by multiple parameters

I have a .csv with the following headers and an example line from the file.
AgentID,Profile,Avatar,In_Time,Out_Time,In_Location,Out_Location,Target_Speed(m/s),Distance_Traveled(m),Congested_Duration(s),Total_Duration(s),LOS_A_Duration(s),LOS_B_Duration(s),LOS_C_Duration(s),LOS_D_Duration(s),LOS_E_Duration(s),LOS_F_Duration(s)
2177,DefaultProfile,DarkGreen_LowPoly,08:00:00,08:00:53,East12SubwayportalActor,EWConcourseportalActor,1.39653,60.2243,5.4,52.8,26.4,23,3.4,0,0,0
I need to sort this .csv by the 4th column (In_Time) by increasing time (08:00:00, 08:00:01, ...) and then by the 6th (In_Location) in alphabetical order of direction (e.g. East, North, etc.).
So far my code looks like this:
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader("JourneyTimes.csv"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        line.Split(',');
        list.Add(line);
    }
}
I read in the .csv and split it on commas (there are no other commas, so this is not a concern). I then add each line to a list. My issue is: how do I sort the list on two parameters, referring to the headers of the .csv?
I have been looking at this all day. I am relatively new to programming, and this is my first program, so I apologize for my lack of knowledge.
You can use LINQ OrderBy/ThenBy:
e.g.
listOfObjects.OrderBy(c => c.LastName).ThenBy(c => c.FirstName)
But first off, you should map each CSV line to an object.
To map a CSV line to an object you can predefine a type or create one dynamically:
from line in File.ReadLines(fileName).Skip(1) //skip the header
let columns = line.Split(',') //really basic CSV parsing; consider removing empty entries and supporting quotes
select new
{
    AgentID = int.Parse(columns[0]),
    Profile = columns[1],
    Avatar = columns[2]
    //other properties
}
And be aware that, like many other LINQ methods, these two use deferred execution.
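To make that concrete: nothing is read from the file until the query is enumerated, and calling ToList() forces it to run once and caches the rows. A small illustration against the question's file (column indexes assumed from the header shown above):

// The query is only a description until enumerated; ToList() reads the file once.
var rows = (from line in File.ReadLines("JourneyTimes.csv").Skip(1)
            let columns = line.Split(',')
            select new { InTime = columns[3], InLocation = columns[5] })
           .ToList();

var sorted = rows.OrderBy(r => r.InTime)
                 .ThenBy(r => r.InLocation)
                 .ToList(); // materialized; safe to enumerate repeatedly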
You are dealing with two distinct problems.
First, ordering by two columns in C# can be achieved with OrderBy/ThenBy:
public class SpreadsheetExample
{
    public DateTime InTime { get; set; }
    public string InLocation { get; set; }

    public SpreadsheetExample(DateTime inTime, string inLocation)
    {
        InTime = inTime;
        InLocation = inLocation;
    }

    public static List<SpreadsheetExample> LoadMockData()
    {
        int maxMock = 10;
        Random random = new Random();
        var result = new List<SpreadsheetExample>();
        for (int mockCount = 0; mockCount < maxMock; mockCount++)
        {
            var genNumber = random.Next(1, maxMock);
            var genDate = DateTime.Now.AddDays(genNumber);
            result.Add(new SpreadsheetExample(genDate, "Location" + mockCount));
        }
        return result;
    }
}

internal class Class1
{
    private static void Main()
    {
        var mockData = SpreadsheetExample.LoadMockData();
        var orderedResult = mockData.OrderBy(m => m.InTime).ThenBy(m => m.InLocation); // OrderBy/ThenBy performs the two-column ordering
        foreach (var item in orderedResult)
        {
            Console.WriteLine("{0} : {1}", item.InTime, item.InLocation);
        }
    }
}
Now you can tackle the second issue of moving the data into a class from Excel. VSTO is what you are looking for; there are lots of examples online. Follow the example I posted above, replacing SpreadsheetExample with your own custom class.
You may use a DataTable:
var lines = File.ReadAllLines("test.csv");
DataTable dt = new DataTable();
var columnNames = lines[0].Split(new char[] { ',' });
for (int i = 0; i < columnNames.Length; i++)
{
    dt.Columns.Add(columnNames[i]);
}
for (int i = 1; i < lines.Length; i++)
{
    dt.Rows.Add(lines[i].Split(new char[] { ',' }));
}
var rows = dt.Rows.Cast<DataRow>();
var result = rows.OrderBy(r => r["In_Time"])
                 .ThenBy(r => r["In_Location"]);
// sum
var sum = rows.Sum(r => Int32.Parse(r["AgentID"].ToString()));
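If you then want the ordered rows back in a DataTable (say, for binding or for writing back out), CopyToDataTable works on the ordered sequence. A small follow-up sketch (the sorted.csv name is just an example; CopyToDataTable requires a reference to System.Data.DataSetExtensions):

// Materialize the ordered rows into a new DataTable.
DataTable sorted = result.CopyToDataTable();

// Write the sorted rows back out as CSV (header row omitted for brevity).
var outLines = sorted.Rows.Cast<DataRow>()
    .Select(r => string.Join(",", r.ItemArray));
File.WriteAllLines("sorted.csv", outLines);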
