Sort a Dynamic CSV string C#


I need help sorting the following CSV-formatted string alphabetically.
Input
First, Second,Third,Fourth, Fifth
Beth,Charles,Danielle,Adam,Eric\n
17945,10091,10088,3907,10132\n
2,12,13,48,11
Output (After sorting)
First, Second,Third,Fourth, Fifth
Adam,Beth,Charles,Danielle,Eric\n
3907,17945,10091,10088,10132\n
48,2,12,13,11
This is what I have tried.
First, I converted the CSV into a DataTable:
var rows = csv.Split('\n', StringSplitOptions.RemoveEmptyEntries);
var dtCsv = new DataTable();
for (int i = 0; i < rows.Count(); i++)
{
string[] rowValues = rows[i].Split(','); //split each row with comma to get individual values
{
if (i == 0)
{
for (int j = 0; j < rowValues.Count(); j++)
{
dtCsv.Columns.Add(rowValues[j]); //add headers
}
}
else
{
//DataRow dr = dtCsv.NewRow();
for (int k = 0; k < rowValues.Count(); k++)
{
//dr[k] = rowValues[k].ToString();
dtCsv.Columns.Add(rowValues[k]);
}
// dtCsv.Rows.Add(dr); //add other rows
}
}
}
Then I tried to convert it back to CSV, hoping I could sort the DataTable along the way, but I am stuck.
Thanks in advance.
I would also appreciate a different approach, if possible.

Although the data you presented does not follow the usual row/column model (each line is sorted on its own rather than by a key column), you can sort each line with a snippet like the one below:
static void Main(string[] args)
{
Dictionary<int, List<string>> myDummyDic = ReadDataFromCsv();
foreach (var item in myDummyDic[0])
{
Console.WriteLine(item);
}
}
private static Dictionary<int, List<string>> ReadDataFromCsv()
{
Dictionary<int, List<string>> myDummyDic = new();
var result = File.ReadLines("data.csv")
.Select(line => line.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries).ToList());
int rowC = 0;
foreach (var item in result)
{
List<string> lst = new();
for (int i = 0; i < item.Count; i++)
{
lst.Add(item[i]);
}
lst.Sort();
myDummyDic.Add(rowC++, lst);
}
return myDummyDic;
}
Note that if you want to transpose your CSV file entirely, I suggest using Cinchoo ETL for that purpose.
Consider the following questions and answers:
How to transpose matrix?
An approach without external library
Another approach using Cinchoo ETL to Transpose
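The expected output in the question keeps the header row unchanged and moves whole columns together, ordered by the names in the second row. A minimal sketch of that idea follows. It assumes every row has the same number of fields, that fields contain no quoted commas, and that an ordinal string comparison is acceptable; the class and method names (`CsvColumnSorter`, `SortColumnsByRow`) are made up for illustration and are not from the original posts.

```csharp
using System;
using System.Linq;

public static class CsvColumnSorter
{
    // Reorders the columns of every data row by sorting the fields of one key row.
    // keyRow = 1 is the names row in the question's sample; row 0 (the header) is left as-is.
    // Assumption: all rows have the same number of fields.
    public static string SortColumnsByRow(string csv, int keyRow = 1)
    {
        // Split into rows, then into fields (naive split, no quote handling).
        string[][] rows = csv
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(r => r.TrimEnd('\r').Split(','))
            .ToArray();

        // Column indexes, ordered by the key row's values.
        int[] order = Enumerable.Range(0, rows[keyRow].Length)
            .OrderBy(i => rows[keyRow][i], StringComparer.Ordinal)
            .ToArray();

        // Apply the same permutation to every row except the header.
        var sorted = rows.Select((row, index) =>
            index == 0 ? row : order.Select(i => row[i]).ToArray());

        return string.Join("\n", sorted.Select(r => string.Join(",", r)));
    }
}
```

Calling `CsvColumnSorter.SortColumnsByRow(csv)` on the sample input should give the rows listed under "Output (After sorting)" in the question, because the numeric rows follow the permutation computed from the names instead of being sorted on their own.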

Related

How to merge .txt files in c#? [duplicate]

using (StreamWriter writer = File.CreateText(FinishedFile))
{
int lineNum = 0;
while (lineNum < FilesLineCount.Min())
{
for (int i = 0; i <= FilesToMerge.Count() - 1; i++)
{
if (i != FilesToMerge.Count() - 1)
{
var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
string CurrentLine = string.Join("", CurrentFile);
writer.Write(CurrentLine + ",");
}
else
{
var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
string CurrentLine = string.Join("", CurrentFile);
writer.Write(CurrentLine + "\n");
}
}
lineNum++;
}
}
The current way I am doing this is just too slow. I am merging files that are each 50k+ lines long, with varying amounts of data.
For example:
File 1
1
2
3
4
File 2
4
3
2
1
I need these merged into a third file:
File 3
1,4
2,3
3,2
4,1
P.S. The user can pick as many files as they want, from any location.
Thanks for the help.

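Your approach is slow because of the Skip and Take inside the loops: every call to File.ReadLines(...).Skip(lineNum) reopens the file and reads it again from the start, so the total work grows quadratically with the number of lines.
Instead, you could read each file once and use a dictionary to collect the lines found at each line index: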
string[] allFileLocationsToMerge = { "filepath1", "filepath2", "..." };
var mergedLists = new Dictionary<int, List<string>>();
foreach (string file in allFileLocationsToMerge)
{
string[] allLines = File.ReadAllLines(file);
for (int lineIndex = 0; lineIndex < allLines.Length; lineIndex++)
{
bool indexKnown = mergedLists.TryGetValue(lineIndex, out List<string> allLinesAtIndex);
if (!indexKnown)
allLinesAtIndex = new List<string>();
allLinesAtIndex.Add(allLines[lineIndex]);
mergedLists[lineIndex] = allLinesAtIndex;
}
}
IEnumerable<string> mergeLines = mergedLists.Values.Select(list => string.Join(",", list));
File.WriteAllLines("targetPath", mergeLines);

Here's another approach. This implementation keeps only one line from each file in memory at a time, which reduces memory pressure significantly (if that is an issue).
public static void MergeFiles(string output, params string[] inputs)
{
var files = inputs.Select(File.ReadLines).Select(iter => iter.GetEnumerator()).ToArray();
StringBuilder line = new StringBuilder();
bool any;
using (var outFile = File.CreateText(output))
{
do
{
line.Clear();
any = false;
foreach (var iter in files)
{
if (!iter.MoveNext())
continue;
if (line.Length != 0)
line.Append(", ");
line.Append(iter.Current);
any = true;
}
if (any)
outFile.WriteLine(line.ToString());
}
while (any);
}
foreach (var iter in files)
{
iter.Dispose();
}
}
This also handles files of different lengths.
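The code in the question stops at the shortest file (it loops while `lineNum < FilesLineCount.Min()`), whereas the answers above keep going until the longest file is exhausted. If the original behaviour is what you want, a streaming variant could look like the sketch below. The names `LineMerger` and `MergeLines` are hypothetical, and the comma separator is taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineMerger
{
    // Joins the n-th line of every source into one output line and stops
    // as soon as any source runs out of lines (the shortest source wins).
    public static IEnumerable<string> MergeLines(
        IEnumerable<IEnumerable<string>> sources, string separator = ",")
    {
        var iterators = sources.Select(s => s.GetEnumerator()).ToList();
        try
        {
            // All(...) short-circuits on the first exhausted iterator.
            while (iterators.Count > 0 && iterators.All(it => it.MoveNext()))
            {
                yield return string.Join(separator, iterators.Select(it => it.Current));
            }
        }
        finally
        {
            // Release the underlying file handles, if any.
            foreach (var it in iterators)
            {
                it.Dispose();
            }
        }
    }
}
```

With files, this could be used as `File.WriteAllLines(output, LineMerger.MergeLines(inputs.Select(path => File.ReadLines(path))))`, which reads each file once, line by line.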

Find the maximum length of every column in a csv file

So, I was trying to present a csv document in a console application. However, due to the varying text size in it, the output was not in a presentable format.
To present it, I tried to count the maximum length of text for each column and then append white space to the remaining text in that column so that there's equal length of characters in each column.
I tried to get the character count, but can't seem to figure out how to proceed further.
var file = File.ReadAllLines(@"E:\File.csv");
var lineList = file.Select(x => x.Split(',').ToList()).ToList();
int maxColumn = lineList.Select(x => x.Count).Max(x => x);
List<int> maxElementSize = new List<int>();
for (int i = 0; i < maxColumn; i++)
{
//Some Logic
}
Any help would be highly appreciated.

Here's a sample console application that gets the maximum character length of each column:
static void Main(string[] args)
{
string CSVPath = @"D:\test.csv";
string outputText = "";
using (var reader = File.OpenText(CSVPath))
{
outputText = reader.ReadToEnd();
}
var colSplitter = ',';
var rowSplitter = new char[] { '\n' };
// One entry per row; a second "from col in cols" clause would repeat each row once per column
var rows = (from row in outputText.Split(rowSplitter, StringSplitOptions.RemoveEmptyEntries)
let cols = row.Split(colSplitter)
select new { totalCols = cols.Length, cols = cols }).ToList();
int[] maxColLengths = new int[rows.Max(o => o.totalCols)];
for (int i = 0; i < rows.Count; i++)
{
for (int j = 0; j < rows[i].cols.Count(); j++)
{
int curLength = rows[i].cols[j].Trim().Length;
if (curLength > maxColLengths[j])
maxColLengths[j] = curLength;
}
}
Console.WriteLine(string.Join(", ", maxColLengths));
}
Hope this helped.

Try with a nested for loop:
var inputLines = File.ReadAllLines(@"E:\File.csv");
Dictionary<int,int> dictIndexLenght = new Dictionary<int,int>();
foreach(var line in inputLines)
{
List<string> columList = line.Split(',').ToList();
for (int i = 0; i < columList.Count; i++)
{
int tempVal = 0;
if(dictIndexLenght.TryGetValue(i,out tempVal))
{
if(tempVal<columList[i].Length)
{
dictIndexLenght[i]=columList[i].Length;
}
}
else
dictIndexLenght[i]=columList[i].Length;
}
}
You can check the result with these lines of code:
for(int i=0;i<dictIndexLenght.Count;i++)
{
Console.WriteLine("Column {0} : {1}", i, dictIndexLenght[i]);
}

Here's how I would do it, very similar to un-lucky's answer, only using a List<int> instead of a Dictionary<int, int>. I added dummy data for testing, but you can see the actual call to read the file is left in there, so you can just remove the dummy data and the line that reads it, and it should work ok:
static void Main(string[] args)
{
var fileLines = new List<string>
{
"Lorem, Ipsum, is, simply, dummy, text, of, the, printing, and, typesetting,",
"industry., Lorem, Ipsum, has, been, the, industry's, standard, dummy, text,",
"ever, since, the, 1500s, when, an, ",
"unknown, printer, took, a, galley, of, type, and, scrambled, it, to, make,",
"a, type, specimen, book.,",
"It, has, survived, not, only, five, centuries, but, also, the, leap,",
"into, electronic, typesetting, remaining, essentially, unchanged.,",
"It, was, popularised, in, the, 1960s, with, the, release,",
"of, Letraset, sheets, containing, Lorem, Ipsum, passages, and, more, ",
"recently, with, desktop, publishing,",
"software, like, Aldus, PageMaker, including, versions, of, Lorem, Ipsum."
};
var filePath = @"f:\public\temp\temp.csv";
var fileLinesColumns = File.ReadAllLines(filePath).Select(line => line.Split(','));
var colWidths = new List<int>();
// Remove this line to use file data
fileLinesColumns = fileLines.Select(line => line.Split(','));
// Get the max length of each column and add it to our list
foreach (var fileLineColumns in fileLinesColumns)
{
for (int i = 0; i < fileLineColumns.Length; i++)
{
if (i > colWidths.Count - 1)
{
colWidths.Add(fileLineColumns[i].Length);
}
else if (fileLineColumns[i].Length > colWidths[i])
{
colWidths[i] = fileLineColumns[i].Length;
}
}
}
// Write out our columns, padding each one to match the longest line
foreach (var fileLineColumns in fileLinesColumns)
{
for (int i = 0; i < fileLineColumns.Length; i++)
{
Console.Write(fileLineColumns[i].PadRight(colWidths[i]));
}
Console.WriteLine();
}
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}

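Initialise your list with one zero per column, then loop over your lines, and within each line, loop over its columns:
// One entry per column, all starting at zero
for (int i = 0; i < maxColumn; i++)
{
maxElementSize.Add(0);
}
for (int i = 0; i < lineList.Count; i++)
{
// A row may have fewer than maxColumn fields, so stop at the row's own count
for (int j = 0; j < lineList[i].Count; j++)
{
if (lineList[i][j].Length > maxElementSize[j])
maxElementSize[j] = lineList[i][j].Length;
}
}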

I use the following code to make sure the columns in a database are large enough to take the csv input data...
#!/usr/bin/python3
import array as arr
from csv import reader
import argparse
def csv_getFldLens(in_file, has_header=0, delimiter=','):
    fldMaxLens = arr.array('i')
    headers = []
    # open file in read mode
    with open(in_file, 'r') as read_obj:
        # pass the file object to reader() to get the reader object
        csv_reader = reader(read_obj, delimiter=delimiter)
        # Iterate over each row in the csv using reader object
        rcnt = 0
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            if has_header and rcnt == 0:
                for fld in row:
                    headers.append(fld)
                rcnt += 1
                continue
            j = 0
            for fld in row:
                fldLen = len(fld)
                if j >= len(fldMaxLens):
                    # first time this column index is seen
                    fldMaxLens.append(fldLen)
                else:
                    fldMaxLens[j] = max(fldMaxLens[j], fldLen)
                j = j + 1
            rcnt += 1
    j = 0
    if has_header:
        for f in headers:
            print(f, ": ", fldMaxLens[j])
            j += 1
    else:
        for i in fldMaxLens:
            print("Col[", j + 1, "]: ", fldMaxLens[j])
            j += 1
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Get column lengths of CSV fields.')
    parser.add_argument('--in_file', default='', help='The CSV input file')
    parser.add_argument('--has_header', action='store_true', help='The CSV file has headers')
    parser.add_argument('--delimiter', default=',', help='Sets the delimiter. Default is comma \',\'.')
    args = parser.parse_args()
    csv_getFldLens(in_file=args.in_file, has_header=args.has_header, delimiter=args.delimiter)

StreamWriter C# formatting output

Problem Statement
In order to run gene annotation software, I need to prepare two types of files, vcard files and coverage tables, and there has to be a one-to-one match between vcard files and coverage tables. Since I'm running 2k samples, it's hard to identify which files lack a match. I know that both file types carry unique identifier numbers, so if both folders contain files with the same number, I treat them as the "same" file.
I made a program that compares two folders and reports the entries unique to each folder. To do so, I built two lists, each containing the file names unique to one directory.
I want to format the report file (tab delimited .txt file) such that it looks something like below:
Unique in fdr1    Unique in fdr2
file x            file a
file y            file b
file z            file c
I find this difficult to do because I have to iterate twice (since I have two lists), but there is no way of going back to the previous line in StreamWriter as far as I know. Basically, once I iterate through the first list and fill the first column, how can I fill the second column with the second list?
Can someone help me out with this?
Thanks
If design of the code has to change (i.e. one list instead of two), please let me know
As requested, this is how I was going to do it (not a working version):
// Write report
using (StreamWriter sw = new StreamWriter(dest_txt.Text + @"\" + "Report.txt"))
{
// Write headers
sw.WriteLine("Unique Entries in Folder1" + "\t" + "Unique Entries in Folder2");
// Write unique entries in fdr1
foreach(string file in fdr1FileList)
{
sw.WriteLine(file + "\t");
}
// Write unique entries in fdr2
foreach (string file in fdr2FileList)
{
sw.WriteLine(file + "\t");
}
sw.Dispose();
}
As requested, here is my code for finding the unique entries:
Dictionary<int, bool> fdr1Dict = new Dictionary<int, bool>();
Dictionary<int, bool> fdr2Dict = new Dictionary<int, bool>();
List<string> fdr1FileList = new List<string>();
List<string> fdr2FileList = new List<string>();
string fdr1Path = folder1_txt.Text;
string fdr2Path = folder2_txt.Text;
// File names in the specified directory; path not included
string[] fdr1FileNames = Directory.GetFiles(fdr1Path).Select(Path.GetFileName).ToArray();
string[] fdr2FileNames = Directory.GetFiles(fdr2Path).Select(Path.GetFileName).ToArray();
// Iterate through the first directory, and add GL number to dictionary
for(int i = 0; i < fdr1FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr1FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if(Int32.TryParse(number, out glNumber))
{
fdr1Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr1FileNames[i]));
}
}
// Iterate through the second directory, and add GL number to dictionary
for (int i = 0; i < fdr2FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr2FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if (Int32.TryParse(number, out glNumber))
{
fdr2Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr2FileNames[i]));
}
}
// Iterate through the first directory, and find files that are unique to it
for (int i = 0; i < fdr1FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr1FileNames[i], @"\d+").Value);
// If same file is not present in the second folder add to the list
if(!fdr2Dict[glNumber])
{
fdr1FileList.Add(fdr1FileNames[i]);
}
}
// Iterate through the second directory, and find files that are unique to it
for (int i = 0; i < fdr2FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr2FileNames[i], @"\d+").Value);
// If same file is not present in the first folder add to the list
if (!fdr1Dict[glNumber])
{
fdr2FileList.Add(fdr2FileNames[i]);
}
}

I am quite confident that this will work, as I've tested it:
static void Main(string[] args)
{
var firstDir = @"Path1";
var secondDir = @"Path2";
var firstDirFiles = System.IO.Directory.GetFiles(firstDir);
var secondDirFiles = System.IO.Directory.GetFiles(secondDir);
print2Dirs(firstDirFiles, secondDirFiles);
}
private static void print2Dirs(string[] firstDirFile, string[] secondDirFiles)
{
var maxIndex = Math.Max(firstDirFile.Length, secondDirFiles.Length);
using (StreamWriter streamWriter = new StreamWriter("result.txt"))
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < maxIndex; i++)
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}",
firstDirFile.Length > i ? firstDirFile[i] : string.Empty,
secondDirFiles.Length > i ? secondDirFiles[i] : string.Empty));
}
}
}
The code is quite simple, but if you need help understanding it, just let me know :)

I would construct one line at a time, taking an entry from each list if it still has one. Something like this:
// fdr1FileList, fdr2FileList and sw (the StreamWriter) are the ones from the question;
// this loop goes inside the using block, after the header line
int row = 0;
while (row < fdr1FileList.Count || row < fdr2FileList.Count)
{
string rowText = "";
rowText += (row >= fdr1FileList.Count ? "\t" : fdr1FileList[row] + "\t");
rowText += (row >= fdr2FileList.Count ? "" : fdr2FileList[row]);
sw.WriteLine(rowText);
row++;
}

Try something like this:
static void Main(string[] args)
{
Dictionary<int, string> fdr1Dict = FilesToDictionary(Directory.GetFiles("path1"));
Dictionary<int, string> fdr2Dict = FilesToDictionary(Directory.GetFiles("path2"));
var unique_f1 = fdr1Dict.Where(f1 => !fdr2Dict.ContainsKey(f1.Key)).ToArray();
var unique_f2 = fdr2Dict.Where(f2 => !fdr1Dict.ContainsKey(f2.Key)).ToArray();
int f1_size = unique_f1.Length;
int f2_size = unique_f2.Length;
int list_length = 0;
if (f1_size > f2_size)
{
list_length = f1_size;
Array.Resize(ref unique_f2, list_length);
}
else
{
list_length = f2_size;
Array.Resize(ref unique_f1, list_length);
}
using (StreamWriter writer = new StreamWriter("output.txt"))
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < list_length; i++)
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", unique_f1[i].Value, unique_f2[i].Value));
}
}
}
static Dictionary<int, string> FilesToDictionary(string[] filenames)
{
Dictionary<int, string> dict = new Dictionary<int, string>();
for (int i = 0; i < filenames.Length; i++)
{
int glNumber;
string filename = Path.GetFileName(filenames[i]);
string number = Regex.Match(filename, @"\d+").ToString();
if (int.TryParse(number, out glNumber))
dict.Add(glNumber, filename);
}
return dict;
}

List sorting by multiple parameters

I have a .csv with the following headers and an example line from the file.
AgentID,Profile,Avatar,In_Time,Out_Time,In_Location,Out_Location,Target_Speed(m/s),Distance_Traveled(m),Congested_Duration(s),Total_Duration(s),LOS_A_Duration(s),LOS_B_Duration(s),LOS_C_Duration(s),LOS_D_Duration(s),LOS_E_Duration(s),LOS_F_Duration(s)
2177,DefaultProfile,DarkGreen_LowPoly,08:00:00,08:00:53,East12SubwayportalActor,EWConcourseportalActor,1.39653,60.2243,5.4,52.8,26.4,23,3.4,0,0,0
I need to sort this .csv by the 4th column (In_Time) in order of increasing time (08:00:00, 08:00:01) and then by the 6th column (In_Location) alphabetically (e.g. East, North, etc.).
So far my code looks like this:
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader("JourneyTimes.csv"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
line.Split(',');
list.Add(line);
}
I read in the .csv and split it using a comma (there are no other commas so this is not a concern). I then add each line to a list. My issue is how do I sort the list on two parameters and by the headers of the .csv.
I have been looking all day at this, I am relatively new to programming, this is my first program so I apologize for my lack of knowledge.

You can use LINQ's OrderBy/ThenBy, e.g.:
listOfObjects.OrderBy(c => c.LastName).ThenBy(c => c.FirstName)
But first, you should map each CSV line to an object. To do that, you can either predefine a type or create one on the fly with an anonymous type:
var rows =
from line in File.ReadLines(fileName).Skip(1) // skip the header
let columns = line.Split(',') // really basic CSV parsing; consider removing empty entries and supporting quotes
select new
{
AgentID = int.Parse(columns[0]),
Profile = columns[1],
Avatar = columns[2],
InTime = TimeSpan.Parse(columns[3]),
InLocation = columns[5]
// other properties
};
Be aware that, like many other LINQ methods, OrderBy and ThenBy use deferred execution: the sort does not run until the result is enumerated (for example with foreach or ToList()).
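Putting the pieces together for the file in the question, a sketch could sort the raw lines directly by the 4th and 6th columns and keep the header on top. The column positions come from the header shown in the question; the names `JourneySorter` and `SortByTimeThenLocation` are invented for this example, and the naive `Split(',')` relies on the asker's statement that the data contains no other commas.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class JourneySorter
{
    // Zero-based positions of In_Time and In_Location in the question's header.
    private const int InTimeColumn = 3;
    private const int InLocationColumn = 5;

    // Returns the header line followed by the data lines,
    // ordered by In_Time and then by In_Location.
    public static List<string> SortByTimeThenLocation(IEnumerable<string> lines)
    {
        var all = lines.ToList();
        var header = all.Take(1);
        var body = all.Skip(1)
            .Select(line => new { Line = line, Fields = line.Split(',') })
            .OrderBy(x => TimeSpan.Parse(x.Fields[InTimeColumn], CultureInfo.InvariantCulture))
            .ThenBy(x => x.Fields[InLocationColumn], StringComparer.Ordinal)
            .Select(x => x.Line);

        // ToList() forces the deferred query to run here.
        return header.Concat(body).ToList();
    }
}
```

A possible call is `File.WriteAllLines("Sorted.csv", JourneySorter.SortByTimeThenLocation(File.ReadLines("JourneyTimes.csv")))`, where the output file name is only a placeholder. Parsing In_Time as a TimeSpan avoids relying on string order, although zero-padded hh:mm:ss values would also sort correctly as text.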

You are dealing with two distinct problems.
First, ordering two columns in C# can be achieved with OrderBy, ThenBy
public class SpreadsheetExample
{
public DateTime InTime { get; set; }
public string InLocation { get; set; }
public SpreadsheetExample(DateTime inTime, string inLocation)
{
InTime = inTime;
InLocation = inLocation;
}
public static List<SpreadsheetExample> LoadMockData()
{
int maxMock = 10;
Random random = new Random();
var result = new List<SpreadsheetExample>();
for (int mockCount = 0; mockCount < maxMock; mockCount++)
{
var genNumber = random.Next(1, maxMock);
var genDate = DateTime.Now.AddDays(genNumber);
result.Add(new SpreadsheetExample(genDate, "Location" + mockCount));
}
return result;
}
}
internal class Class1
{
private static void Main()
{
var mockData = SpreadsheetExample.LoadMockData();
var orderedResult = mockData.OrderBy(m => m.InTime).ThenBy(m => m.InLocation);//Order, ThenBy can be used to perform ordering of two columns
foreach (var item in orderedResult)
{
Console.WriteLine("{0} : {1}", item.InTime, item.InLocation);
}
}
}
Now you can tackle the second issue: moving the data from Excel into a class. VSTO is what you are looking for, and there are lots of examples online. Follow the example I posted above, substituting your own class for SpreadsheetExample.

You may use a DataTable:
var lines = File.ReadAllLines("test.csv");
DataTable dt = new DataTable();
var columNames = lines[0].Split(new char[] { ',' });
for (int i = 0; i < columNames.Length; i++)
{
dt.Columns.Add(columNames[i]);
}
for (int i = 1; i < lines.Length; i++)
{
dt.Rows.Add(lines[i].Split(new char[] { ',' }));
}
var rows = dt.Rows.Cast<DataRow>();
var result = rows.OrderBy(i => i["In_Time"])
.ThenBy(i => i["In_Location"]);
// sum
var sum = rows.Sum(i => Int32.Parse(i["AgentID"].ToString()));

How to implement C# code for Order id separated by commas and range separated by hyphens, and display all info of order

Ex: 1,4-90, 292,123
It needs to display the whole order information of
1
4,5,6....90
292
123.
What is a good approach to solve this?
It is similar to tracking in UPS or FedEx when multiple orders are entered in the search box.
I mean that if I enter the string 1,4-90, 292,123 in a search box, the result should be a grid showing the data for each of the corresponding order IDs. I want to know how to parse the string into a collection, send it to the database, and show the information in the grid for...
1
4,5,6....90
292
123.
each as a separate row, from which I can also generate reports (as an alternative).

Please try this:
static ArrayList list;
static void Main(string[] args)
{
string str = "1,4-90,292,123";
string[] arr = str.Split(',');
list = new ArrayList();
for (int i = 0; i < arr.Length; i++)
{
string tmp = arr[i];
if (tmp.IndexOf('-') != -1)
{
Range(tmp);
}
else list.Add(int.Parse(tmp));
}
list.Sort();
object[] intResult = list.ToArray();
//print the final result
for (int i = 0; i < intResult.Length; i++)
{
Console.WriteLine(intResult[i].ToString());
}
Console.Read();
}
static void Range(string range)
{
string[] tmpArr = range.Split('-');
int stInt = int.Parse(tmpArr[0]);
int edInt = int.Parse(tmpArr[1]);
// Add every id in the inclusive range to the shared list
for (int id = stInt; id <= edInt; id++)
{
list.Add(id);
}
}
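The same parsing can be written with a typed List<int> instead of ArrayList, which avoids boxing and the shared static field. This is only a sketch: `OrderIdParser` and `Expand` are illustrative names, IDs are assumed to be non-negative integers, and the result keeps the order in which the IDs were typed rather than sorting them. Sending the IDs to the database and binding the grid are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderIdParser
{
    // Expands input such as "1,4-90, 292,123" into the individual order ids.
    // Assumption: ids are non-negative, so '-' only ever separates a range.
    public static List<int> Expand(string input)
    {
        var ids = new List<int>();
        foreach (string part in input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] bounds = part.Split('-');
            int first = int.Parse(bounds[0].Trim());
            // A single id is treated as a range of length one.
            int last = bounds.Length > 1 ? int.Parse(bounds[1].Trim()) : first;
            if (last < first)
            {
                throw new FormatException("Range end is smaller than range start: " + part);
            }
            ids.AddRange(Enumerable.Range(first, last - first + 1));
        }
        return ids;
    }
}
```

The resulting list could then be passed to a parameterised query, one parameter per ID or via whatever bulk mechanism the database offers; that part depends on the data access layer in use.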
