Grouping CSV lines into one - c#

I have a CSV file that has rows, where data for some columns only appear in one of those rows, with other columns repeating their value:
Heading1, Heading2, Heading3, Heading4
1 , 2 , , 4
1 , , 3 , 4
How can I end up with:
Heading1, Heading2, Heading3, Heading4
1 , 2 , 3 , 4
I want to group on Heading1 and Heading4, as they are unique to the repeated rows, and get the first non-blank value for all other columns, ending up with a single string[].
I've got as far as grouping on new { Header1, Header4 } to get a group of rows, but I'm having a hard time turning that into something where I can select the first non null value for each column, then turning it back into a single row (string[]).
Ideally I'd like a function that works with any number of columns as in the actual file there are a large number.

It can be done with LINQ using Aggregate. Create a function that compares the running totals with the current row, setting the total for a column to the current value for that column if the total is empty and the current value is not empty.
[TestMethod]
public void MergeArrays() {
    string[] Input = new[] {
        "H1, H2, H3, H4",
        "1,2,,4",
        "1,,3,4"
    };
    var header = Input.ElementAt(0);
    var aggregation = string.Join(",", Input.Skip(1).Select(ln => ln.Split(',')).Aggregate(new[] { "", "", "", "" }, Agg));
    var result = new string[] { header, aggregation };
    Assert.AreEqual("H1, H2, H3, H4", header);
    Assert.AreEqual("1,2,3,4", aggregation);
}
private static string[] Agg(string[] aggregation, string[] input) {
    for (var idx = 0; idx < aggregation.GetLength(0); idx++) {
        if (aggregation[idx] == string.Empty && input[idx] != string.Empty) {
            aggregation[idx] = input[idx];
        }
    }
    return aggregation;
}

Make an array of length 4 (one slot per column) with values initialized to empty strings (or zeros as appropriate).
For each non-header row, loop over the fields, storing the value in the corresponding array position whenever the field value is not blank.
Write out the values in the array to the new CSV file.
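The steps above can be generalized to any number of columns and to several groups of rows. The following is only a sketch under assumptions: the class name CsvRowMerger, the method MergeRows, and the keyColumns parameter are hypothetical, and the naive Split(',') does not handle quoted fields that contain commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvRowMerger
{
    // Merges data rows (header excluded) that share the same values in the
    // key columns, keeping the first non-blank value seen for every column.
    // keyColumns holds zero-based column indexes (hypothetical parameter).
    public static List<string[]> MergeRows(IEnumerable<string> dataLines, params int[] keyColumns)
    {
        var merged = new Dictionary<string, string[]>();
        var order = new List<string>(); // remembers the first-seen order of the groups

        foreach (string line in dataLines)
        {
            // Naive split: assumes no quoted fields containing commas.
            string[] fields = line.Split(',').Select(f => f.Trim()).ToArray();

            // Build a group key from the key columns, joined by a separator
            // that is unlikely to appear in the data.
            string key = string.Join("\u001F", keyColumns.Select(k => fields[k]));

            if (!merged.TryGetValue(key, out string[] target))
            {
                // First row of this group becomes the running result.
                merged[key] = fields;
                order.Add(key);
                continue;
            }

            // Fill only the slots that are still blank.
            for (int i = 0; i < target.Length && i < fields.Length; i++)
            {
                if (target[i].Length == 0 && fields[i].Length > 0)
                {
                    target[i] = fields[i];
                }
            }
        }

        return order.Select(k => merged[k]).ToList();
    }
}
```

For the sample in the question, calling CsvRowMerger.MergeRows(lines.Skip(1), 0, 3) groups on Heading1 and Heading4 and yields a single string[] per group.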

Related

Reading a txt file and sending it to an Array C#

I have a text file that looks like this
Words Words
Words Words
Words Words
1 34.4e+1
2 34.3e+1
3 34.2e+1
4 34.1e+1.... // and so on
I need to get the number string, convert it to decimal/double, and then store it in an array that I can use outside of the loop to get the average via Enumerable.Chunk.
decimal[] raw = new decimal[] { };
decimal[] rawAvgList = new decimal[] { };
decimal RawAvg = 0m;
try
{
string bPath = aPath + "\\" + fileName.Name + "\\textfilename.txt";
string[] readText = File.ReadAllLines(bPath);
readText = readText.Skip(3).ToArray();
foreach (var line in readText)
{
raw = new decimal[] { Decimal.Parse(line.Substring(9).ToString(), style1) };
for (int i = 0; i < raw.Length; i++)
{
Console.WriteLine("{0} \t {1}", raw[i], i++);
}
}
rawAvgList = raw.Chunk(20).Select(chunk => chunk.Average()).ToArray();
RawAvg = rawAvgList.Average();
}
So when I use the array outside of the loop, it only holds the last number in the text file. Am I reading the information wrong? I swear I have tried all the different ways to read the numbers from the text file and I just keep running into errors. The errors range from it not liking me using Skip and Substring at the same time, to an enumerable error where it returned the error and not the number. Anything helps, thanks!
You are assigning the variable raw to a new value on each loop iteration, wiping out any value that was stored previously. The end result is that after the loop terminates, it will only contain the value from the last line in the file as you are seeing.
You can declare raw as a List<decimal> instead, then within the loop, you would do
raw.Add(Decimal.Parse(line.Substring(9).ToString(), style1));
This way, once the loop finishes, you'll have all the numbers and not just the last one.
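Put together, the list-based version might look like the sketch below. It rests on assumptions that are not in the question: the class and method names are hypothetical, the value is taken as the second whitespace-separated field rather than via Substring(9), NumberStyles.Float stands in for the unknown style1, and Enumerable.Chunk requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class RawValueReader
{
    // Parses the second whitespace-separated field of every data line.
    // headerLines is the number of leading text lines to skip (3 in the question).
    public static List<decimal> ParseValues(IEnumerable<string> lines, int headerLines)
    {
        var raw = new List<decimal>();
        foreach (string line in lines.Skip(headerLines))
        {
            // Passing null separators splits on any whitespace.
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 2)
            {
                continue; // not a data line
            }

            // NumberStyles.Float allows exponent notation such as "34.4e+1".
            raw.Add(decimal.Parse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture));
        }
        return raw;
    }

    // Averages each block of chunkSize values, then averages those block averages.
    public static decimal AverageOfChunkAverages(IEnumerable<decimal> values, int chunkSize)
    {
        return values.Chunk(chunkSize).Select(chunk => chunk.Average()).Average();
    }
}
```

Because the list is created once, before the loop, every parsed value is still available after the loop ends, for example via RawValueReader.AverageOfChunkAverages(values, 20).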

Effective way to loop without duplicate loop

I have a task to generate a file. This is an example of the data (from Excel):
A Enr
B Cds
C Cdr
D Der
A Enr
B Cds
What I want is this: once a value has already been matched, it should not be searched for again. For example, the first row is A Enr and the fifth row is also A Enr. If the loop at i = 0 has already found the matching fifth row, then when i = 4 I don't want it to search for A Enr again, because that search was already done at i = 0.
How can I write the loop efficiently? With a plain for statement, row 5 (A) is checked again for matching data, which I don't want, because row 1 (A) has already been processed and found its match in row 5.
Code Example
for (int i = 0; i < row; i++)
{
for(int k = 0 ; k < row ;k++)
{
if (fulldatadetail[i][0] == fulldatadetail[k][0])
{
if (!File.Exists(path))
{
using (StreamWriter sw = File.CreateText(path))
{
sw.WriteLine(fulldatadetail[i][0]);
}
}
else if (File.Exists(path))
{
using (StreamWriter sw = File.AppendText(path))
{
sw.WriteLine(fulldatadetail[i][0]);
}
}
}
}
}
P.S: sry for bad grammar, im trying to improve my english ..
If LINQ, GroupBy, and tuples sound familiar to you, why not try GroupBy?
// assume index is A,B,C,D
// assume value is Enr,Cds
var result = datas.GroupBy(x=> (x.index, x.value));
Then use foreach to print each key. The key of each IGrouping is now a tuple:
Item1 is index, Item2 is value.
foreach(var item in result.ToList())
{
something.Write(item.Key.Item1);
something.WriteLine(item.Key.Item2);
}
Try declaring an int/bool array sized according to your distinct elements (probably 22, as far as I can see in this example).
Then, for every row, run the inner loop only if the cell corresponding to the letter you see is still 0, and afterwards increase that cell (e.g. a[x-'A']++).
So basically, just mark each character when you first see it, so that the next time you see it you know that you have already handled it.
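The marking idea can be sketched as follows. This is an illustration under assumptions, not the asker's actual data model: the class and method names are hypothetical, and each row's first field is assumed to start with a single uppercase letter from 'A' to 'Z', so a 26-slot array is used.

```csharp
using System;
using System.Collections.Generic;

public static class FirstOccurrenceFilter
{
    // Returns only the rows whose leading letter has not been seen before.
    // Assumes row[0] starts with an uppercase letter 'A'..'Z' (hypothetical constraint).
    public static List<string[]> KeepFirstByLetter(IEnumerable<string[]> rows)
    {
        var seen = new bool[26]; // one flag per letter, all false initially
        var result = new List<string[]>();

        foreach (string[] row in rows)
        {
            int slot = row[0][0] - 'A';
            if (slot < 0 || slot >= seen.Length)
            {
                throw new ArgumentException("Key must start with a letter from A to Z.");
            }

            if (seen[slot])
            {
                continue; // this letter was already handled by an earlier row
            }

            seen[slot] = true; // mark the letter so later rows skip it
            result.Add(row);
        }

        return result;
    }
}
```

Each row is visited once, so no inner loop is repeated for a key that an earlier row already covered.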
Provided you're trying to omit duplicate rows of data, and the only check you need is that the lines are completely equal (and not just based on the first character), this code may help you. It keeps track of every found value in a list and skips outputting it if it has already been output before.
string path = @"C:\Some\Text\File.txt";
List<string> outputValues = new List<string>
{
"A Enr",
"B Cds",
"C Cdr",
"D Der",
"A Enr",
"B Cds"
};
List<string> foundValues = new List<string>();
foreach (string outputValue in outputValues)
{
if (foundValues.Contains(outputValue))
continue; // Doesn't output this output value twice
foundValues.Add(outputValue);
using (StreamWriter sw = File.Exists(path) ? File.AppendText(path) : File.CreateText(path))
{
sw.WriteLine(outputValue);
}
}

Add two list data into array while using if condition

I have two lists: one holds some records (the exact number is not known, but never more than 13) and the second holds only empty values. I apply an if condition to these two lists and want to add both of them to one array. I am using this code:
for (int i=0; i>12; i++)
{
List<string> excList = new List<string>();
//added column from table, which can varies
excList.Add((string)column.ColumnName);
string[] excelList = new string[] { };
List<string> stack = Enumerable.Range(excList.Count, 13)
.Select(z => string.Empty)
.ToList<string>();
if (excList.Count > i)
{
excelList = excList.ToArray();
}
if (excList.Count <= i)
{
excelList = stack.ToArray();
}
eCol0 = excelList[0].ToString();
//show first value, after adding two list in excelList
response.Write(eCol0);
}
With this code, once the second condition applies, excelList shows only the data of the second list (stack) instead of both.
I want to insert these two lists (excList and stack) into one array of 13 elements, but which list an element comes from must depend on the if condition, as in the code above.
Well, you never add anything to your string array excelList; you always assign a new array to it.
An array is also not the best choice for adding values, since you need to know its size beforehand.
If you really want an array in the end with both results, you should do something like this:
List<string> excList = new List<string>();
... fill your excList here and initialize the stack list with whatever you need ...
excList.AddRange(stack);
string[] excelList = excList.ToArray();
Edit: as the comments mention, your question is a little confusing: you are using one big loop for no clear reason, and adding empty values makes no sense either. So I tried to get at the essence of what you wanted to know.
Edit 2:
Wait a second, I think what you want in the end is an array of strings of size 13, where every element is at least string.Empty:
List<string> excList = new List<string>();
//added column from table, which can varies
excList.Add((string)column.ColumnName);
string[] excelList = new string[13];
for (int i = 0; i < excList.Count; i++)
{
excelList[i] = excList[i];
}
for (int i = excList.Count; i < 13; i++)
{
excelList[i] = string.Empty;
}
No outer loop is necessary.
You've written a huge amount of confusing code that could be considerably more compact.
Through the comments I was able to understand that you have a list of N strings, where N could be between 1 and 13, and you want to turn it into an array of 13 strings with all your list items at the start and empty strings at the end.
So a list of:
"a", "b", "c"
Becomes an array of:
"a", "b", "c", "", "", "", "", "", "", "", "", "", ""
If you want a one-liner to generate an array of 13 strings from a List x of up to 13 strings (List<T>.AddRange returns void and cannot be chained, so this uses Concat from System.Linq):
string[] arrayOf13 = x.Concat(Enumerable.Repeat("", 13 - x.Count)).ToArray();
If your list x may hold an unknown number of items, possibly more than 13:
string[] arrayOf13 = x.Concat(Enumerable.Repeat("", 13)).Take(13).ToArray();
Or without LINQ, using either a for or a while loop:
// for loop version
for (; x.Count < 13; x.Add("")) { }
string[] arrayOf13 = x.ToArray();
// while loop version
while (x.Count < 13)
    x.Add("");
string[] arrayOf13 = x.ToArray();
If you're willing to have the strings be null rather than empty, you can just declare an array of size 13 (all null) and then use Array.CopyTo():
string[] arrayOf13 = new string[13];
x.ToArray().CopyTo(arrayOf13, 0);
It seems your goal is to get an array of 13 strings (excelList), where each element is either string.Empty by default or the corresponding (same index) element from some source list (excList).
So a short solution would be to first create a 13-element array initialized with string.Empty and then copy the source list's elements over, limited to a maximum of 13 elements:
var excelList = Enumerable.Repeat(string.Empty, 13).ToArray();
excList.CopyTo(0, excelList, 0, Math.Min(13, excList.Count));

split integer multiple values in one field into rows in ssis

Please help me split column's field values into multiple rows.
Table
ID Name Location DeptNo
1 Jack Florida 101,102,103
I'm looking for output like this
ID Name Location DeptNo
1 Jack Florida 101
1 Jack Florida 102
1 Jack Florida 103
I've figured out the configuration in SSIS using a Script Component, but I'm not sure about my code.
Please check:
public class ScriptMain : UserComponent
{
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
int[] Edpt = Row.DeptNo.ToInt().Split(new int[] { ',' }, IntSplitOptions.None);
int i = 0;
while (i < DeptNo.Length)
{
Output0Buffer.AddRow();
Output0Buffer.ID = Row.ID;
Output0Buffer.Name = Row.Name;
Output0Buffer.Location = Row.Location;
Output0Buffer.DeptNo = DeptNo[i];
i++;
}
}
}
99% of the way there.
Given a source like
SELECT
1 AS ID
, 'Jack' AS Name
, 'Florida' AS Location
, '101,102,103' AS DeptNo;
Your Script Component becomes asynchronous, as there will not be a 1:1 mapping from input rows to output rows. I made 3 changes to your script.
The first was in the creation of the Edpt array. There might be a way to split the string and convert the result directly to a nullable integer array, but it didn't come to mind.
string[] Edpt = Row.DeptNo.Split(new char[] { ',' });
The second change was to your while loop. while (i < DeptNo.Length) is going to look at each character in the source DeptNo string, so you'd have something like 11 output rows created. That would then fail when it attempts to put the comma into an integer (unless it treats it as a char data type and uses the ASCII value). At any rate, to heck with while loops unless you need them. The foreach helps eliminate the dreaded off-by-one mistakes. So, I enumerate through my collection (Edpt), and each value I find is assigned to a loop-scoped variable called item.
foreach (var item in Edpt)
The final change is to the assignment in my output buffer. Output0Buffer.DeptNo = DeptNo[i]; again would only access a single character of the original string ('1', '0', '1', ',', '1', '0', '2', ',', etc.). Instead, you want to operate on the split array, like Output0Buffer.DeptNo = Edpt[i]; But since we don't need any of that ordinal access, we just reference item.
Output0Buffer.DeptNo = Int32.Parse(item);
The final code looks like
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Create an array of the department numbers as strings
    string[] Edpt = Row.DeptNo.Split(new char[] { ',' });
    // foreach avoids off-by-one errors; the index variable i is no longer needed
    foreach (var item in Edpt)
    {
        Output0Buffer.AddRow();
        Output0Buffer.ID = Row.ID;
        Output0Buffer.Name = Row.Name;
        Output0Buffer.Location = Row.Location;
        // use the loop variable directly
        Output0Buffer.DeptNo = Int32.Parse(item);
    }
}
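As for splitting the string and converting the result to integers in one step, a LINQ projection is one possibility. The sketch below is standalone C# rather than SSIS-generated code; the class name DeptNoParser is hypothetical, it returns plain (non-nullable) integers, and it assumes that System.Linq is available to the script project.

```csharp
using System;
using System.Linq;

public static class DeptNoParser
{
    // Splits a comma-separated string such as "101,102,103" and converts
    // every part to an int. Empty parts (e.g. from a trailing comma) are skipped.
    public static int[] Parse(string deptNos)
    {
        return deptNos
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part => int.Parse(part.Trim()))
            .ToArray();
    }
}
```

int.Parse throws a FormatException on non-numeric input; int.TryParse would be the choice if bad values should be skipped rather than fail the row.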

Sorting points list based on a min coordinate criteria

I hope you can help me out on this one. I have to do some "special sorting" of a list of points (x,y,z coordinates) as a part of a larger code.
This list (by definition) will:
i) Always have a column with all the values equal zero.
ii) The first and the last point will always be the same.
The sorting of the list will depend on which column is equal to zero. I can identify this column myself with no problem (see code), but I am struggling with the sorting part. First I have to find the point that meets some specific criteria, then reorganise the list based on it. I have explained the sorting criteria as comments in the code (see sections 2 and 3).
If you could provide assistance on 2) and 3) that would be great (the first part is ok I think).
Many thanks
The code:
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
[STAThread]
public static void Main(string[] args)
{
//Populating the list
var ListPoints = new List<double[]>();
double[] P0 = { 10, 10, 0 };
double[] P1 = { 10, 0, 0 };
double[] P2 = { 0, 0, 0 };
double[] P3 = { 0, 10, 0 };
ListPoints.Add(P0);
ListPoints.Add(P1);
ListPoints.Add(P2);
ListPoints.Add(P3);
ListPoints.Add(P0);
///This list (by definition) will:
/// i) Always have a column with all the values equal zero
/// ii) The first and the last point will always be the same.
///We need to detect the column with all values = zero, because the list sorting will depend on this.
/// 1) Detect which columns has all values equal to zero using the columnZero variable
var counterX = new List<int>();
var counterY = new List<int>();
var counterZ = new List<int>();
for (int i = 0; i < ListPoints.Count - 1; i++)
{
//First column with all values equal zero
if (ListPoints[i][0] == 0 && ListPoints[i][0] == ListPoints[i + 1][0]) { counterX.Add(1); }
//Second column with all values equal zero
if (ListPoints[i][1] == 0 && ListPoints[i][1] == ListPoints[i + 1][1]) { counterY.Add(1); }
//Third column with all values equal zero
if (ListPoints[i][2] == 0 && ListPoints[i][2] == ListPoints[i + 1][2]) { counterZ.Add(1); }
}
if (counterX.Count == ListPoints.Count - 1)
{ Console.WriteLine("all points of the 1st column are zero");}
if (counterY.Count == ListPoints.Count - 1)
{ Console.WriteLine("all points of the 2nd column are zero");}
if (counterZ.Count == ListPoints.Count - 1)
{ Console.WriteLine("all points of the 3rd column are zero");}
/// 2) Now a point "Q" must be found in the list according to this:
/// 2.1/ If the first column has all values = zero:
/// Find the row index of the smallest value in the second column.
/// If there are several rows in the second column with the same minimum value go and find between those which one has the smallest value in the third column.
/// If there is only one value in the second column keep that one.
/// 2.2/ If the second column has all values = zero:
/// Find the row index of the smallest value in the first column.
/// If there are several rows in the first column with the same minimum value go and find between those which one has the smallest value in the third column.
/// If there is only one value in the first column keep that one.
/// 2.3/ If the third column has all values = zero:
/// Find the row index of the smallest value in the first column.
/// If there are several rows in the first column with the same minimum value go and find between those which one has the smallest value in the second column.
/// If there is only one value in the first column keep that one.
///
/// 3) Once this value has been found we have to put the list starting by this point "Q", then copy the previous values at the end of the list and finally add again "Q".
/// Example:The column with all values = 0 is column 3 and the generic point "Q" is the point "P2" in my list. The input is P0-P1-P2-P3-P0 but I want to have it P2-P3-P0-P1-P2.
}
}
Given your criteria, the first two points of your sorting algorithm degenerate into one much simpler algorithm.
That is: No matter which column is full of 0's, you are always sorting by first column, then by second column, then by third column ascending values. This will leave your target item at the beginning of your list, and from there you just grab the correct order from the original list.
// Declared static so that it can be called from the static Main method.
private static List<double[]> SpecialSort(List<double[]> list)
{
    // Make an editable duplicate of your original list and remove the duplicated array at the start (or end) of the list.
    List<double[]> orig = new List<double[]>(list);
    orig.RemoveAt(0);
    // Copy and sort the list by 1st column, 2nd column, 3rd column.
    List<double[]> copy = orig.OrderBy(p => p[0]).ThenBy(p => p[1]).ThenBy(p => p[2]).ToList();
    // The first item in the copy-sorted list is your point "Q".
    // Find its index in the original list.
    int index = orig.IndexOf(copy[0]);
    List<double[]> sorted = new List<double[]>();
    // For the list count + 1 (adding point Q twice) add the original list
    // objects in the correct order, starting at point "Q".
    for (int i = 0; i <= orig.Count; i++)
    {
        sorted.Add(orig[(index + i) % orig.Count]);
    }
    return sorted;
}
Then just call it as
ListPoints = SpecialSort(ListPoints);
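Although the sort above does not need to know which column is zero, step 1 of the question can be written more compactly than with three counter lists. This is a minimal sketch; the class name ZeroColumnFinder and the -1 return value for "no such column" are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZeroColumnFinder
{
    // Returns the zero-based index of the first column (0 = x, 1 = y, 2 = z)
    // whose values are all zero, or -1 when no such column exists.
    public static int FindZeroColumn(IReadOnlyList<double[]> points)
    {
        for (int column = 0; column < 3; column++)
        {
            // All() stops at the first non-zero value in this column.
            if (points.All(p => p[column] == 0))
            {
                return column;
            }
        }
        return -1;
    }
}
```

With the list from the question, ZeroColumnFinder.FindZeroColumn(ListPoints) identifies the third column (index 2).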
