So I am creating a list of lines in a text file like this:
var lines = File.ReadAllLines("C:\\FileToSearch.txt")
    .Where(x => !x.EndsWith("999999999999"));
and I am looping through the lines like this:
foreach (var line in lines)
{
    if (lineCounter == 1)
    {
        outputResults.Add(oData.ToCanadianFormatFileHeader());
    }
    else if (lineCounter == 2)
    {
        outputResults.Add(oData.ToCanadianFormatBatchHeader());
    }
    else
    {
        oData.FromUsLineFormat(line);
        outputResults.Add(oData.ToCanadianLineFormat());
    }
    lineCounter = lineCounter + 1;
    textBuilder += (line + "<br>");
}
Similar to how I access the first two rows, I would like to access the last and second-to-last rows individually.
Here you can take advantage of LINQ once again:
var numberOfLinesToTake = 2;
var lastTwoLines = lines
    .Skip(Math.Max(0, lines.Count() - numberOfLinesToTake))
    .Take(numberOfLinesToTake);
var secondToLastLine = lastTwoLines.First();
var lastLine = lastTwoLines.Last();
Or, if you want to retrieve them individually:
var lastLine = lines.Last();
var secondToLastLine =
lines.Skip(Math.Max(0, lines.Count() - 2)).Take(1).First();
I added .First() to the end because .Take(1) returns a sequence containing one item, which we then grab with First(). This can probably be optimized.
Again, you might want to familiarize yourself with LINQ since it's a real time-saver sometimes.
This is one of the problems of using var when it's not appropriate.
ReadAllLines returns an array of strings: string[]
You can get the length of the array and index backwards from the end.
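For example, a minimal sketch of that approach (`lines` here is a small stand-in array rather than the actual file contents):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Stand-in for File.ReadAllLines(...) so the sketch is self-contained
        string[] lines = { "header", "batch", "data1", "data2" };

        // Classic backwards indexing from the end of the array
        string lastLine = lines[lines.Length - 1];
        string secondToLastLine = lines[lines.Length - 2];

        // C# 8+ index-from-end syntax does the same thing
        string lastAgain = lines[^1];
        string secondToLastAgain = lines[^2];

        Console.WriteLine(secondToLastLine); // data1
        Console.WriteLine(lastLine);         // data2
    }
}
```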
So basically I've got two different Lists:
List<string> result = new List<string>();
List<string> outputs = new List<string>();
The outputs list gets filled in a loop, and after the loop ends its contents are added to the result list, so outputs holds different values after each iteration.
For example, after the first loop iteration ends, the outputs list has data like:
exampleData1
exampleData2
exampleData3
and so on, so that the result list would now contain these three example values.
Now the loop runs again and loads different data, like:
exampleData1.1
exampleData2.1
exampleData3.1
Now it should add these three values to the result list, so that the result list looks like this:
exampleData1, exampleData1.1
exampleData2, exampleData2.1
exampleData3, exampleData3.1
And so it should go on: after each loop, once the outputs list has changed, its values should be appended side by side as one string.
I have tried things like this:
foreach (var (item1, item2) in output.Zip(output, (xo, y) => (xo, y)))
    result.Add($" {item1}, {item2}");
but that only zips the existing list with itself, so I get the same value twice side by side.
I hope I have explained it understandably; if not, please let me know.
Zip is a good idea, but take care. From the documentation:
If the sequences do not have the same number of elements, the method merges sequences until it reaches the end of one of them.
Since result starts out empty, the end is reached before Zip even begins. The solution is to append the elements not iterated by Zip:
result = result.Zip(outputs, (r, o) => r + ", " + o)
    // Concat (rather than Union) keeps duplicate strings intact
    .Concat(result.Skip(outputs.Count))  // add elements from result not iterated by Zip
    .Concat(outputs.Skip(result.Count))  // add elements from outputs not iterated by Zip
    .ToList();
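A quick illustration of that merge pattern with hypothetical data, where the lists have different lengths (Concat is used so duplicate strings would not be collapsed, and the query is forced with ToList() before reassignment):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var result = new List<string> { "a", "b", "c" };
        var outputs = new List<string> { "1", "2", "3", "4" };

        result = result.Zip(outputs, (r, o) => r + ", " + o)
            .Concat(result.Skip(outputs.Count))  // leftovers from result (none here)
            .Concat(outputs.Skip(result.Count))  // leftovers from outputs ("4")
            .ToList();

        Console.WriteLine(string.Join(" | ", result));
        // a, 1 | b, 2 | c, 3 | 4
    }
}
```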
Have a look at this: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.zip?view=net-7.0 I think you have a mistake in your code: you're "zipping" outputs with outputs (i.e. itself); you need to zip it with result.
var mergedLists = result.Zip(outputs, (first, second) => first + " " + second);
UPDATED
Here's a full console app solution:
List<string> result = new();
for (var i = 0; i < 3; i++)
{
    List<string> outputs = new();
    for (var j = 0; j < 3; j++)
    {
        outputs.Add($"TestData{i}.{j}");
    }
    if (!result.Any())
    {
        result = outputs.ToList();
    }
    else
    {
        result = result.Zip(outputs, (first, second) => first + " " + second).ToList();
    }
}
Console.WriteLine(string.Join(Environment.NewLine, result));
Console.ReadLine();
I am new to C# and need help writing a parser for the below CSV file of data:
[INFO]
LINE_NAME,MACHINE_SN,MACHINE_NAME,OPERATOR_ID
LineName,ParmiMachineSN,PARMI_AOI_1,engineer
[INFO_END]
[PANEL_INSP_RESULT]
MODEL_NAME,MODEL_CODE,PANEL_SIDE,INDEX,BARCODE,DATE,START_TIME,END_TIME,DEFECT_NAME,DEFECT_CODE,RESULT
E11-03356-0388-A-TOP CNG,,BOTTOM,47,MLT0388A03358CSNSOF1232210200052-0001,20201023,12:46:57,12:47:04,,,OK
[PANEL_INSP_RESULT_END]
[BOARD_INSP_RESULT]
BOARD_NO,BARCODE,DEFECT_NAME,DEFECT_CODE,BADMARK,RESULT
1,MLT0388A03358CSNSOF1232210200052-0001,,,NO,OK
2,MLT0388A03358CSNSOF1232210200052-0004,,,NO,OK
3,MLT0388A03358CSNSOF1232210200052-0003,,,NO,OK
4,MLT0388A03358CSNSOF1232210200052-0002,,,NO,OK
[BOARD_INSP_RESULT_END]
[COMPONENT_INSP_RESULT]
BOARD_NO,LOCATION_NAME,PIN_NUMBER,POS_X,POS_Y,DEFECT_NAME,DEFECT_CODE,RESULT
[COMPONENT_INSP_RESULT_END]
I need to parse the above file
To parse the above CSV file in C#, you can use the following steps:
Read the entire file into a string using the File.ReadAllText method.
string fileText = File.ReadAllText("file.csv");
Split the file into individual sections by looking for the section tags (e.g. "[INFO]" and "[INFO_END]"), then use a loop to process each section.
string[] sections = fileText.Split(
    new string[]
    {
        "[INFO]", "[INFO_END]",
        "[PANEL_INSP_RESULT]", "[PANEL_INSP_RESULT_END]",
        "[BOARD_INSP_RESULT]", "[BOARD_INSP_RESULT_END]",
        "[COMPONENT_INSP_RESULT]", "[COMPONENT_INSP_RESULT_END]"
    },
    StringSplitOptions.RemoveEmptyEntries);
foreach (string section in sections)
{
    // Process each section
}
Within the loop, use the String.Split method to split each section into rows by looking for the newline character.
string[] rows = section.Split('\n');
Use the String.Split method again to split each row into cells by looking for the comma.
foreach (string row in rows)
{
    string[] cells = row.Split(',');
    // Process each cell
}
Now you can process each cell as you need: check the first cell's value to decide which section the row belongs to, then process the cells according to their type and position in the row.
You can use a switch statement to check which section you are currently processing and then apply the appropriate parsing logic.
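A minimal sketch of that idea. The section names come from the file above, but the sample rows and the handling bodies are placeholders; a real parser would read the lines from the file and do real work per section:

```csharp
using System;

class Program
{
    static void Main()
    {
        // A few hypothetical lines standing in for the file contents
        string[] rows =
        {
            "[INFO]",
            "LINE_NAME,MACHINE_SN,MACHINE_NAME,OPERATOR_ID",
            "LineName,ParmiMachineSN,PARMI_AOI_1,engineer",
            "[INFO_END]",
            "[BOARD_INSP_RESULT]",
            "BOARD_NO,BARCODE,DEFECT_NAME,DEFECT_CODE,BADMARK,RESULT",
            "1,BARCODE-0001,,,NO,OK",
            "[BOARD_INSP_RESULT_END]"
        };

        string currentSection = "";
        foreach (string row in rows)
        {
            // Bracketed lines switch the active section;
            // "[..._END]" lines effectively close it
            if (row.StartsWith("[") && row.EndsWith("]"))
            {
                currentSection = row.Trim('[', ']');
                continue;
            }

            string[] cells = row.Split(',');
            switch (currentSection)
            {
                case "INFO":
                    Console.WriteLine($"INFO row with {cells.Length} cells");
                    break;
                case "BOARD_INSP_RESULT":
                    Console.WriteLine($"Board row, result = {cells[^1]}");
                    break;
                // add PANEL_INSP_RESULT / COMPONENT_INSP_RESULT cases as needed
            }
        }
    }
}
```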
Please be aware that this is a simplified example, and you may need to add additional error handling and validation to ensure that the data is properly parsed.
This is one example of how to parse the CSV file, but you might need to handle various edge cases (empty rows, empty cells, etc.) based on your specific use case.
The following reads all lines, creates a list of anonymous objects pairing each line with its index, and then loops through a list of section names. The loop finds each section and, in this case, displays it in a console window.
internal partial class Program
{
    static void Main(string[] args)
    {
        var items = File.ReadAllLines("YourFileNameGoesHere")
            .Select((line, index) => new { Line = line, Index = index })
            .ToList();

        List<string> sections = new List<string>()
        {
            "INFO",
            "PANEL_INSP_RESULT",
            "BOARD_INSP_RESULT",
            "COMPONENT_INSP_RESULT"
        };

        foreach (var section in sections)
        {
            Console.WriteLine($"{section}");
            var startItem = items.FirstOrDefault(x => x.Line == $"[{section}]");
            var endItem = items.FirstOrDefault(x => x.Line == $"[{section}_END]");
            if (startItem is not null && endItem is not null)
            {
                bool header = false;
                for (int index = startItem.Index + 1; index < endItem.Index; index++)
                {
                    if (header == false)
                    {
                        // The first line inside a section is its column header
                        Console.WriteLine($"\t{items[index].Line}");
                        header = true;
                    }
                    else
                    {
                        Console.WriteLine($"\t\t{items[index].Line}");
                    }
                }
            }
            else
            {
                Console.WriteLine("\tFailed to read this section");
            }
        }
    }
}
I have a file with "Name|Number" in each line and I wish to remove the lines with names that contain another name in the list.
For example, if there is "PEDRO|3", "PEDROFILHO|5", "PEDROPHELIS|1" in the file, I wish to remove the lines "PEDROFILHO|5" and "PEDROPHELIS|1".
The list has 1.8 million lines. I did it like this, but it's too slow:
List<string> names = File.ReadAllLines("firstNames.txt").ToList();
List<string> result = File.ReadAllLines("firstNames.txt").ToList();
foreach (string name in names)
{
    string tempName = name.Split('|')[0];
    List<string> temp = names.Where(t => t.Contains(tempName)).ToList();
    foreach (string str in temp)
    {
        if (str.Equals(name))
        {
            continue;
        }
        result.Remove(str);
    }
}
File.WriteAllLines("result.txt",result);
Does anyone know a faster way? Or how to improve the speed?
Since you are looking for matches anywhere in the word, you will end up with an O(n²) algorithm. You can improve the implementation a bit by avoiding string deletion from a list, which is an O(n) operation in itself:
var toDelete = new HashSet<string>();
var names = File.ReadAllLines("firstNames.txt");
foreach (string name in names) {
    var tempName = name.Split('|')[0];
    toDelete.UnionWith(
        // Length constraint removes self-matches
        names.Where(t => t.Length > name.Length && t.Contains(tempName))
    );
}
File.WriteAllLines("result.txt", names.Where(name => !toDelete.Contains(name)));
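For instance, applied to the sample names from the question (used here as an in-memory array so the sketch is self-contained), the approach keeps only the shortest name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Sample data from the question, standing in for the file contents
        var names = new[] { "PEDRO|3", "PEDROFILHO|5", "PEDROPHELIS|1" };

        var toDelete = new HashSet<string>();
        foreach (string name in names)
        {
            var tempName = name.Split('|')[0];
            // Longer lines whose name contains this name are marked for deletion
            toDelete.UnionWith(
                names.Where(t => t.Length > name.Length && t.Contains(tempName)));
        }

        var kept = names.Where(name => !toDelete.Contains(name)).ToList();
        Console.WriteLine(string.Join(", ", kept)); // PEDRO|3
    }
}
```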
This works, but I don't know if it's quicker; I haven't tested it on millions of lines. Remove the ToLower calls if the names are all in the same case.
List<string> names = File.ReadAllLines(@"C:\Users\Rob\Desktop\File.txt").ToList();
var result = names.Where(w => !names.Any(a =>
    w.Split('|')[0].Length > a.Split('|')[0].Length &&
    w.Split('|')[0].ToLower().Contains(a.Split('|')[0].ToLower())));
File.WriteAllLines(@"C:\Users\Rob\Desktop\result.txt", result);
The test file had:
Rob|1
Robbie|2
Bert|3
Robert|4
Jan|5
John|6
Janice|7
Carol|8
Carolyne|9
Geoff|10
Geoffrey|11
The result had:
Rob|1
Bert|3
Jan|5
John|6
Carol|8
Geoff|10
I'm working on an importer that takes tab delimited text files. The first line of each file contains 'columns' like ItemCode, Language, ImportMode etc and there can be varying numbers of columns.
I'm able to get the names of each column, whether there's one or 10 and so on. I use a method to achieve this that returns List<string>:
private List<string> GetColumnNames(string saveLocation, int numColumns)
{
    var data = File.ReadAllLines(saveLocation);
    var columnNames = new List<string>();
    for (int i = 0; i < numColumns; i++)
    {
        var cols = from lines in data
                .Take(1)
                .Where(l => !string.IsNullOrEmpty(l))
                .Select(l => l.Split(delimiter.ToCharArray(), StringSplitOptions.None))
                .Select(value => string.Join(" ", value))
            let split = lines.Split(' ')
            select new
            {
                Temp = split[i].Trim()
            };
        foreach (var x in cols)
        {
            columnNames.Add(x.Temp);
        }
    }
    return columnNames;
}
If I always knew what columns to be expecting, I could just create a new object, but since I don't, I'm wondering is there a way I can dynamically create an object with properties that correspond to whatever GetColumnNames() returns?
Any suggestions?
For what it's worth, here's how I used DataTables to achieve what I wanted.
// saveLocation is the file location
// numColumns comes from another method that gets the number of columns in the file
var columnNames = GetColumnNames(saveLocation, numColumns);
var table = new DataTable();
foreach (var header in columnNames)
{
    table.Columns.Add(header);
}
// itemAttributeData is the file split into lines
foreach (var row in itemAttributeData)
{
    // Split each tab-delimited line into its cell values before adding
    table.Rows.Add(row.Split('\t'));
}
Although there was a bit more work involved to be able to manipulate the data in the way I wanted, Karthik's suggestion got me on the right track.
You could create a dictionary of strings where the first string references the "properties" name and the second string its characteristic.
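For instance, each row could become a dictionary mapping column name to cell value. A minimal sketch with hypothetical column names and a hypothetical tab-delimited data line (System.Dynamic.ExpandoObject is another option if you need property-style access):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical header names and one data line from a tab-delimited file
        var columnNames = new List<string> { "ItemCode", "Language", "ImportMode" };
        string dataLine = "A100\ten-GB\tUpdate";

        string[] cells = dataLine.Split('\t');

        // Pair each column name with the cell in the same position
        Dictionary<string, string> row = columnNames
            .Zip(cells, (name, value) => (name, value))
            .ToDictionary(p => p.name, p => p.value);

        Console.WriteLine(row["Language"]); // en-GB
    }
}
```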
I'm writing a duplicate file detector. To determine if two files are duplicates I calculate a CRC32 checksum. Since this can be an expensive operation, I only want to calculate checksums for files that have another file with matching size. I have sorted my list of files by size, and am looping through to compare each element to the ones above and below it. Unfortunately, there is an issue at the beginning and end since there will be no previous or next file, respectively. I can fix this using if statements, but it feels clunky. Here is my code:
public void GetCRCs(List<DupInfo> dupInfos)
{
    var crc = new Crc32();
    for (int i = 0; i < dupInfos.Count(); i++)
    {
        if (dupInfos[i].Size == dupInfos[i - 1].Size || dupInfos[i].Size == dupInfos[i + 1].Size)
        {
            dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
        }
    }
}
My question is:
How can I compare each entry to its neighbors without the out of bounds error?
Should I be using a loop for this, or is there a better LINQ or other function?
Note: I did not include the rest of my code to avoid clutter. If you want to see it, I can include it.
Compute the CRCs first:
// It is assumed that DupInfo.CheckSum is nullable
public void GetCRCs(List<DupInfo> dupInfos)
{
    var crc = new Crc32();
    dupInfos[0].CheckSum = null;
    for (int i = 1; i < dupInfos.Count; i++)
    {
        dupInfos[i].CheckSum = null;
        if (dupInfos[i].Size == dupInfos[i - 1].Size)
        {
            // Compute the previous file's checksum only if it hasn't been done yet
            if (dupInfos[i - 1].CheckSum == null)
                dupInfos[i - 1].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i - 1].FullName));
            dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
        }
    }
}
After having sorted your files by size and CRC, identify the duplicates:
public void GetDuplicates(List<DupInfo> dupInfos)
{
    // Loop is inverted to allow deletion of list items
    for (int i = dupInfos.Count - 1; i > 0; i--)
    {
        if (dupInfos[i].Size == dupInfos[i - 1].Size &&
            dupInfos[i].CheckSum != null &&
            dupInfos[i].CheckSum == dupInfos[i - 1].CheckSum)
        {
            // i is a duplicate of i - 1
            ... // your code here
            ... // eventually, dupInfos.RemoveAt(i);
        }
    }
}
I have sorted my list of files by size, and am looping through to
compare each element to the ones above and below it.
The next logical step is to actually group your files by size. Comparing consecutive files will not always be sufficient if you have more than two files of the same size. Instead, you will need to compare every file to every other same-sized file.
I suggest taking this approach
Use LINQ's .GroupBy to group the files by size, then .Where to keep only the groups with more than one file.
Within those groups, calculate the CRC32 checksum and add it to a collection of known checksums, comparing against the previously calculated ones. If you need to know which files specifically are duplicates, you could use a dictionary keyed by this checksum (you can achieve this with another GroupBy); otherwise a simple list will suffice to detect any duplicates.
The code might look something like this:
var filesSetsWithPossibleDupes = files.GroupBy(f => f.Length)
.Where(group => group.Count() > 1);
foreach (var grp in filesSetsWithPossibleDupes)
{
    var checksums = new List<CRC32CheckSum>(); // or whatever type
    foreach (var file in grp)
    {
        var currentCheckSum = crc.ComputeChecksum(file);
        if (checksums.Contains(currentCheckSum))
        {
            // Found a duplicate
        }
        else
        {
            checksums.Add(currentCheckSum);
        }
    }
}
Or if you need the specific objects that could be duplicates, the inner foreach loop might look like
var filesSetsWithPossibleDupes = files.GroupBy(f => f.FileSize)
    .Where(grp => grp.Count() > 1);

// A dictionary keyed by the basic duplicate stats,
// whose value is a collection of the possible duplicates
var masterDuplicateDict = new Dictionary<DupStats, IEnumerable<DupInfo>>();

foreach (var grp in filesSetsWithPossibleDupes)
{
    // Same GroupBy logic, but applied to the checksum (instead of file size)
    var likelyDuplicates = grp.GroupBy(dup => dup.Checksum)
        .Where(g => g.Count() > 1);
    foreach (var dupGrp in likelyDuplicates)
    {
        // Create the key for the dictionary (your code is likely different)
        var sample = dupGrp.First();
        var key = new DupStats() { FileSize = sample.FileSize, Checksum = sample.Checksum };
        masterDuplicateDict.Add(key, dupGrp);
    }
}
A demo of this idea.
I think the for loop should be: for (int i = 1; i < dupInfos.Count() - 1; i++)
var grps = dupInfos.GroupBy(d => d.Size);
grps.Where(g => g.Count() > 1).ToList().ForEach(g =>
{
    ...
});
Can you do a union between your two lists? If you have two lists of filenames and do a union, it should result in a list of only the overlapping files. I can write out an example if you want, but this link should give you the general idea.
https://stackoverflow.com/a/13505715/1856992
Edit: Sorry, for some reason I thought you were comparing file names, not sizes.
So here is an actual answer for you.
using System;
using System.Collections.Generic;
using System.Linq;

public class ObjectWithSize
{
    public int Size { get; set; }

    public ObjectWithSize(int size)
    {
        Size = size;
    }
}

public class Program
{
    public static void Main()
    {
        Console.WriteLine("start");
        var list = new List<ObjectWithSize>();
        list.Add(new ObjectWithSize(12));
        list.Add(new ObjectWithSize(13));
        list.Add(new ObjectWithSize(14));
        list.Add(new ObjectWithSize(14));
        list.Add(new ObjectWithSize(18));
        list.Add(new ObjectWithSize(15));
        list.Add(new ObjectWithSize(15));

        var duplicates = list.GroupBy(x => x.Size)
            .Where(g => g.Count() > 1);

        foreach (var dup in duplicates)
            foreach (var objWithSize in dup)
                Console.WriteLine(objWithSize.Size);
    }
}
This will print out
14
14
15
15
Here is a .NET Fiddle for that:
https://dotnetfiddle.net/0ub6Bs
Final note: I actually think your answer looks better and will run faster. This was just an implementation in LINQ.