I am using a Hashtable to read data from a file and group it into clusters.
Say the data in the file is:
umair,i,umair
sajid,mark,i , k , i
The output is like:
[{umair,umair},i]
[sajid,mark,i,i,k]
But my code does not work. Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Collections;
namespace readstringfromfile
{
class Program
{
static void Main()
{
/* int i = 0;
foreach (string line in File.ReadAllLines("newfile.txt"))
{
string[] parts = line.Split(',');
foreach (string part in parts)
{
Console.WriteLine("{0}:{1}", i,part);
}
i++; // For demo only
}*/
Hashtable hashtable = new Hashtable();
using (StreamReader r = new StreamReader("newfile.txt"))
{
string line;
while ((line = r.ReadLine()) != null)
{
string[] records = line.Split(',');
foreach (string record in records)
{
if (hashtable[records] == null)
hashtable[records] = (int)0;
hashtable[records] = (int)hashtable[records] + 1;
Console.WriteLine(hashtable.Keys);
}
/////this portion is not working/////////////////////////////////////
foreach (DictionaryEntry entry in hashtable)
{
for (int i = 0; i < (int)hashtable[records]; i++)
{
Console.WriteLine(entry);
}
}
}
}
}
}
}
You're using the records array as the key when inserting into the hashtable (and when reading from it) instead of the foreach variable record. Also, in the final loop, you iterate based on records instead of the current entry.Key. You're also declaring the hashtable in too wide a scope, so all rows are inserted into the same hashtable instead of one per row.
public static void Main() {
var lines = new[] { "umair,i,umair", "sajid,mark,i,k,i" };
foreach (var line in lines) {
var hashtable = new Hashtable();
var records = line.Split(',');
foreach (var record in records) {
if (hashtable[record] == null)
hashtable[record] = 0;
hashtable[record] = (Int32)hashtable[record] + 1;
}
var str = "";
foreach (DictionaryEntry entry in hashtable) {
var count = (Int32)hashtable[entry.Key];
for (var i = 0; i < count; i++) {
str += entry.Key;
if (i < count - 1)
str += ",";
}
str += ",";
}
// Remove last comma.
str = str.TrimEnd(',');
Console.WriteLine(str);
}
Console.ReadLine();
}
However, you should consider using the generic Dictionary<TKey,TValue> class, and use a StringBuilder if you're building a lot of strings.
public static void Main() {
var lines = new[] { "umair,i,umair", "sajid,mark,i,k,i" };
foreach (var line in lines) {
var dictionary = new Dictionary<String, Int32>();
var records = line.Split(',');
foreach (var record in records) {
if (!dictionary.ContainsKey(record))
dictionary.Add(record, 1);
else
dictionary[record]++;
}
var str = "";
foreach (var entry in dictionary) {
for (var i = 0; i < entry.Value; i++) {
str += entry.Key;
if (i < entry.Value - 1)
str += ",";
}
str += ",";
}
// Remove last comma.
str = str.TrimEnd(',');
Console.WriteLine(str);
}
Console.ReadLine();
}
You're attempting to group elements of a sequence. LINQ has a built-in operator for that; it's used as group ... by ... into ... or the equivalent method .GroupBy(...)
That means you can write your code (excluding File I/O etc.) as:
var lines = new[] { "umair,i,umair", "sajid,mark,i,k,i" };
foreach (var line in lines) {
var groupedRecords =
from record in line.Split(',')
group record by record into recordgroup
from record in recordgroup
select record;
Console.WriteLine(
string.Join(
",", groupedRecords
)
);
}
If you prefer shorter code, the loop can equivalently be written as:
foreach (var line in lines)
Console.WriteLine(string.Join(",",
line.Split(',').GroupBy(rec=>rec).SelectMany(grp=>grp)));
Both versions will output:
umair,umair,i
sajid,mark,i,i,k
Note that you really shouldn't be using a Hashtable - that's just a type-unsafe slow version of Dictionary for almost all purposes. Also, the output example you mention includes [] and {} characters - but you didn't specify how or whether they're supposed to be included, so I left those out.
A LINQ group is nothing more than a sequence of elements (here, identical strings) with a Key (here a string). Calling GroupBy thus transforms the sequence of records into a sequence of groups. However, you want to simply concatenate those groups. SelectMany is such a concatenation: from a sequence of items, it concatenates the "contents" of each item into one large sequence.
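To make the relationship between groups and the flattened result concrete, here is a minimal sketch. The class and method names (GroupingSketch, DescribeGroups, Flatten) are hypothetical and only serve the illustration; the LINQ calls are the same GroupBy and SelectMany discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingSketch
{
    // Hypothetical helper: describes each group as "key xCount",
    // in the order in which each key first appears in the line.
    public static List<string> DescribeGroups(string line)
    {
        return line.Split(',')
                   .GroupBy(record => record)   // one group per distinct record
                   .Select(group => $"{group.Key} x{group.Count()}")
                   .ToList();
    }

    // Hypothetical helper: concatenates the groups back into one
    // sequence, which is what SelectMany does.
    public static string Flatten(string line)
    {
        return string.Join(",",
            line.Split(',')
                .GroupBy(record => record)
                .SelectMany(group => group));
    }
}
```

For the line sajid,mark,i,k,i, DescribeGroups yields sajid x1, mark x1, i x2, k x1, and Flatten yields sajid,mark,i,i,k.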
using (StreamWriter writer = File.CreateText(FinishedFile))
{
int lineNum = 0;
while (lineNum < FilesLineCount.Min())
{
for (int i = 0; i <= FilesToMerge.Count() - 1; i++)
{
if (i != FilesToMerge.Count() - 1)
{
var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
string CurrentLine = string.Join("", CurrentFile);
writer.Write(CurrentLine + ",");
}
else
{
var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
string CurrentLine = string.Join("", CurrentFile);
writer.Write(CurrentLine + "\n");
}
}
lineNum++;
}
}
The current way I am doing this is just too slow. I am merging files that are each 50k+ lines long with various amounts of data.
for ex:
File 1
1
2
3
4
File 2
4
3
2
1
I need these merged into a third file:
File 3
1,4
2,3
3,2
4,1
P.S. The user can pick as many files as they want from any location.
Thanks for the help.
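Your approach is slow because of the Skip and Take inside the loops: every call to File.ReadLines(...).Skip(lineNum) reopens the file and reads it from the beginning again.
You could use a dictionary to collect the lines found at each line index: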
string[] allFileLocationsToMerge = { "filepath1", "filepath2", "..." };
var mergedLists = new Dictionary<int, List<string>>();
foreach (string file in allFileLocationsToMerge)
{
string[] allLines = File.ReadAllLines(file);
for (int lineIndex = 0; lineIndex < allLines.Length; lineIndex++)
{
bool indexKnown = mergedLists.TryGetValue(lineIndex, out List<string> allLinesAtIndex);
if (!indexKnown)
allLinesAtIndex = new List<string>();
allLinesAtIndex.Add(allLines[lineIndex]);
mergedLists[lineIndex] = allLinesAtIndex;
}
}
IEnumerable<string> mergeLines = mergedLists.Values.Select(list => string.Join(",", list));
File.WriteAllLines("targetPath", mergeLines);
Here's another approach - this implementation only stores in memory one set of lines from each file simultaneously, thus reducing memory pressure significantly (if that is an issue).
public static void MergeFiles(string output, params string[] inputs)
{
var files = inputs.Select(File.ReadLines).Select(iter => iter.GetEnumerator()).ToArray();
StringBuilder line = new StringBuilder();
bool any;
using (var outFile = File.CreateText(output))
{
do
{
line.Clear();
any = false;
foreach (var iter in files)
{
if (!iter.MoveNext())
continue;
if (line.Length != 0)
line.Append(", ");
line.Append(iter.Current);
any = true;
}
if (any)
outFile.WriteLine(line.ToString());
}
while (any);
}
foreach (var iter in files)
{
iter.Dispose();
}
}
This also handles files of different lengths.
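To see what happens with inputs of different lengths, here is a minimal in-memory sketch of the same lockstep-enumerator idea. It works on string sequences rather than files, and the names LockstepMergeSketch and MergeLines are hypothetical. Note the assumption it shares with the method above: an exhausted input is simply skipped, so the remaining columns shift left on later rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LockstepMergeSketch
{
    // Hypothetical in-memory variant: advances every input by one
    // element per row and joins whatever elements are still available.
    public static IEnumerable<string> MergeLines(params IEnumerable<string>[] inputs)
    {
        var iterators = inputs.Select(input => input.GetEnumerator()).ToList();
        try
        {
            while (true)
            {
                var row = new List<string>();
                foreach (var iterator in iterators)
                {
                    // An exhausted input contributes nothing to this row.
                    if (iterator.MoveNext())
                        row.Add(iterator.Current);
                }
                if (row.Count == 0)
                    yield break;    // every input is exhausted
                yield return string.Join(",", row);
            }
        }
        finally
        {
            foreach (var iterator in iterators)
                iterator.Dispose();
        }
    }
}
```

Merging the sequences 1,2,3,4 and 4,3,2,1 yields the rows 1,4 then 2,3 then 3,2 then 4,1. Merging 1,2,3 with the single element a yields 1,a then 2 then 3, which shows the column shift; if columns must stay aligned, an empty field would have to be emitted for exhausted inputs instead.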
I'm trying to create a new list from two separate lists like so:
List 1 (sCat) = MD0, MD1, MD3, MD4
List 2 (sLev) = 01, 02, 03, R
Output-->
MD0-01
MD0-02
MD0-03
MD0-R
MD1-01
MD1-02
MD1-03
MD1-R
MD3-01
MD3-02
MD3-03
MD3-R
etc...
I would like to know if there is a function that would produce the results above. Ultimately, I would like the user to provide List 2 and have that information added to List 1 and stored as a new list that I could call later.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
List<string> sCat = new List<string>();
// add Categories for the Sheets
sCat.Add("MD0");
sCat.Add("MD1");
sCat.Add("MD3");
List<string> sLev = new List<string>();
// add Levels for the Project
sLev.Add("01");
sLev.Add("02");
sLev.Add("03");
sLev.Add("R");
for (int i = 0; i < sCat.Count; i++)
{
// I am getting stuck here.
// I don't know how to take one item from the sCat list and
// add it to the sLev List incrementally.
Console.WriteLine(sCat[i],i);
}
Console.ReadLine();
}
}
Combine the values of all the elements selected from the first collection with the elements contained in the other collection:
var combined = sCat.SelectMany(s => sLev.Select(s1 => $"{s}-{s1}")).ToList();
Which is like iterating the two collections in a nested for/foreach loop, adding each combined element to a new List<string>:
List<string> combined = new List<string>();
foreach (var s1 in sCat)
foreach (var s2 in sLev) {
combined.Add(s1 + "-" + s2);
}
You can replace your for loop with the following:
foreach(var sCatValue in sCat)
{
foreach(var sLevValue in sLev)
{
Console.WriteLine($"{sCatValue}-{sLevValue}");
}
}
private static void Main()
{
List<string> sCat = new List<string>();
// add Categories for the Sheets
sCat.Add("MD0");
sCat.Add("MD1");
sCat.Add("MD3");
List<string> sLev = new List<string>();
// add Levels for the Project
sLev.Add("01");
sLev.Add("02");
sLev.Add("03");
sLev.Add("R");
string dash = "-";
List<string> newList = new List<string>();
for (int i = 0; i < sCat.Count; i++)
{
for (int j = 0; j < sLev.Count; j++)
{
newList.Add(sCat[i] + dash + sLev[j]);
}
}
foreach (var item in newList)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
I am looking for the quickest algorithm:
GOAL: output the total number of pair occurrences found on a line. The individual elements may be in any order on any given line.
INPUT:
a;b;c;d
a;e;f;g
a;b;f;h
OUTPUT
a;b = 2
a;c = 1
a;d = 1
a;e = 1
a;f = 2
a;g = 1
b;c = 1
b;d = 1
I am programming in C#. I've got a nested for loop adding to a common dictionary of type Dictionary<string, int>, where the string is like a;b; when an occurrence is found it adds to the existing int tally or adds a new entry at tally = 0.
Note this:
a;b = 1
b;a = 1
Should be reduced to this:
a;b = 1
I am open to using other languages, the output is in a plain text file which I feed into Gephi visualization tool.
Bonus: Very interested to know the name of this particular algorithm if it's out there. Pretty sure it is.
String[] data = File.ReadAllLines(@"C:\input.txt");
Dictionary<string, int> ress = new Dictionary<string, int>();
foreach (var line in data)
{
string[] outStrings = line.Split(';');
for (int i = 0; i < outStrings.Count(); i++)
{
for (int y = 0; y < outStrings.Count(); y++)
{
if (outStrings[i] != outStrings[y])
{
try
{
if (ress.Any(x => x.Key == outStrings[i] + ";" + outStrings[y]))
{
ress[outStrings[i] + ";" + outStrings[y]] += 1;
}
else
{
ress.Add(outStrings[i] + ";" + outStrings[y], 0);
}
}
catch (Exception)
{
}
}
}
}
}
foreach (var val in ress)
{
Console.WriteLine(val.Key + "----" + val.Value);
}
I think your inner loop should start with i + 1 instead of starting back at 0 again, and the outer loop should only run until Length - 1, since the last item will be compared on the inner loop. Also, when you add a new item, you should add the value 1, not 0 (since the whole reason we're adding it is because we found one).
You can also just store the key into a string once instead of doing multiple concatenations during your comparison and assignment, and you can use the ContainsKey method to determine if a key exists already.
Also, you might want to consider avoiding empty catch blocks unless you're really certain that you don't care if or what went wrong. If I'm expecting an exception and know how to handle it, then I catch that exception, otherwise I'll just let it bubble up the stack.
Here's one way you could modify your code to find all pairs and their counts:
Update
I added a check to ensure that the "pair" key is always sorted, so that "b;a" becomes "a;b". This wasn't an issue in your sample data, but I extended the data to include lines like b;a;a;b;a;b;a;. I also added StringSplitOptions.RemoveEmptyEntries to the Split call to handle lines that begin or end with a ; (otherwise the resulting empty string produced a pair like ";a").
private static void Main()
{
var data = File.ReadAllLines(@"f:\public\temp\temp.txt");
var pairCount = new Dictionary<string, int>();
foreach (var line in data)
{
var lineItems = line.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries);
for (var outer = 0; outer < lineItems.Length - 1; outer++)
{
for (var inner = outer + 1; inner < lineItems.Length; inner++)
{
var outerComparedToInner = string.Compare(lineItems[outer],
lineItems[inner], StringComparison.Ordinal);
// If both items are the same character, ignore them and keep looping
if (outerComparedToInner == 0) continue;
// Create the pair such that the lower of the two
// values is first, so that "b;a" becomes "a;b"
var thisPair = outerComparedToInner < 0
? $"{lineItems[outer]};{lineItems[inner]}"
: $"{lineItems[inner]};{lineItems[outer]}";
if (pairCount.ContainsKey(thisPair))
{
pairCount[thisPair]++;
}
else
{
pairCount.Add(thisPair, 1);
}
}
}
}
Console.WriteLine("Pair\tCount\n----\t-----");
foreach (var val in pairCount.OrderBy(i => i.Key))
{
Console.WriteLine($"{val.Key}\t{val.Value}");
}
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
Output
Given a file containing your sample data, the output is:
@mrmcgreg, after finally changing the implementation to the ECLAT algorithm, everything runs in seconds instead of hours.
Basically, for each unique tag, keep track of the line numbers where that tag is found; then, for each pair of tags, intersect the two lists of line numbers to get the pair's count.
Dictionary<string, List<int>> uniqueTagList = new Dictionary<string, List<int>>();
foreach (var uniqueTag in uniquetags)
{
List<int> lineNumbers = new List<int>();
foreach (var item in data.Select((value, i) => new { i, value }))
{
var value = item.value;
var index = item.i;
//split data into tags
var tags = value.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var tag in tags)
{
if (uniqueTag == tag)
{
lineNumbers.Add(index);
}
}
}
//keep only the tags whose support exceeds the threshold.
if (lineNumbers.Count > 5)
{
uniqueTagList.Add(uniqueTag, lineNumbers);
}
}
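The code above only builds the tag-to-line-numbers index. The intersection step could look like the following minimal sketch. It is not the poster's implementation: the names PairSupportSketch, BuildIndex and CountPairs are hypothetical, it uses a HashSet<int> per tag (so a tag repeated on one line counts once for that line), and it omits the support threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairSupportSketch
{
    // Builds tag -> set of line numbers in a single pass over the data.
    public static Dictionary<string, HashSet<int>> BuildIndex(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, HashSet<int>>();
        var lineNumber = 0;
        foreach (var line in lines)
        {
            foreach (var tag in line.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            {
                if (!index.TryGetValue(tag, out var lineNumbers))
                    index[tag] = lineNumbers = new HashSet<int>();
                lineNumbers.Add(lineNumber);
            }
            lineNumber++;
        }
        return index;
    }

    // Counts each unordered pair of tags by intersecting their line sets.
    public static SortedDictionary<string, int> CountPairs(IEnumerable<string> lines)
    {
        var index = BuildIndex(lines);
        // Sorting the tags first guarantees "a;b" is produced, never "b;a".
        var tags = index.Keys.OrderBy(tag => tag, StringComparer.Ordinal).ToList();
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        for (var i = 0; i < tags.Count - 1; i++)
        {
            for (var j = i + 1; j < tags.Count; j++)
            {
                var other = index[tags[j]];
                // Size of the intersection = number of lines containing both tags.
                var shared = index[tags[i]].Count(n => other.Contains(n));
                if (shared > 0)
                    counts[$"{tags[i]};{tags[j]}"] = shared;
            }
        }
        return counts;
    }
}
```

For the sample input from the question (a;b;c;d, a;e;f;g, a;b;f;h), CountPairs reports a;b = 2 and a;f = 2, with every other co-occurring pair at 1.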
I want to be able to write some values to a file whilst creating blank lines in between. Here is the code that I have so far:
TextWriter w_Test = new StreamWriter(file_test);
foreach (string results in searchResults)
{
w_Test.WriteLine(Path.GetFileNameWithoutExtension(results));
var list1 = File.ReadAllLines(results).Skip(10);
foreach (string listResult in list1)
{
w_Test.WriteLine(listResult);
}
}
w_Test.Close();
This creates 'Test' with the following output:
result1
listResult1
listResult2
result2
listResult3
result3
result4
I want to write the results so that each result block is 21 lines in size before writing the next, e.g.
result1
(20 lines even if no 'listResult' found)
result2
(20 lines even if no 'listResult' found)
etc.......
What would be the best way of doing this?
TextWriter w_Test = new StreamWriter(file_test);
foreach (string results in searchResults)
{
int noLinesOutput = 0;
w_Test.WriteLine(Path.GetFileNameWithoutExtension(results));
noLinesOutput++;
var list1 = File.ReadAllLines(results).Skip(10);
foreach (string listResult in list1)
{
w_Test.WriteLine(listResult);
noLinesOutput++;
}
// Pad the block to 21 lines: the header line plus 20 result lines.
for (int i = noLinesOutput; i < 21; i++)
w_Test.WriteLine();
}
w_Test.Close();
Here's a simple helper method I use in such cases:
// pad the sequence with 'elem' until it's 'count' elements long
static IEnumerable<T> PadRight<T>(IEnumerable<T> enm,
T elem,
int count)
{
int ii = 0;
// The loop variable must not reuse the name of the 'elem' parameter.
foreach(var item in enm)
{
yield return item;
ii += 1;
}
for (; ii < count; ++ii)
{
yield return elem;
}
}
Then
foreach (string listResult in
PadRight(list1, "", 20))
{
w_Test.WriteLine(listResult);
}
should do the trick.
Perhaps with this loop:
var lines = 20;
foreach(string fullPath in searchResults)
{
List<string> allLines = new List<string>();
allLines.Add(Path.GetFileNameWithoutExtension(fullPath));
int currentLine = 0;
foreach(string line in File.ReadLines(fullPath).Skip(10))
{
if(++currentLine > lines) break;
allLines.Add(line);
}
while (currentLine++ < lines)
allLines.Add(String.Empty);
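// Append to the output file (file_test from the question) rather than overwriting the source file.
File.AppendAllLines(file_test, allLines);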
}
I know there's a couple similarly worded questions on SO about permutation listing, but they don't seem to be quite addressing really what I'm looking for. I know there's a way to do this but I'm drawing a blank. I have a flat file that resembles this format:
Col1|Col2|Col3|Col4|Col5|Col6
a|b,c,d|e|f|g,h|i
. . .
Now here's the trick: I want to create a list of all possible permutations of these rows, where a comma-separated list in the row represents possible values. For example, I should be able to take an IEnumerable<string> representing the above row as such:
IEnumerable<string> row = new string[] { "a", "b,c,d", "e", "f", "g,h", "i" };
IEnumerable<string> permutations = GetPermutations(row, delimiter: "/");
This should generate the following collection of string data:
a/b/e/f/g/i
a/b/e/f/h/i
a/c/e/f/g/i
a/c/e/f/h/i
a/d/e/f/g/i
a/d/e/f/h/i
This to me seems like it would elegantly fit into a recursive method, but apparently I have a bad case of the Mondays and I can't quite wrap my brain around how to approach it. Some help would be greatly appreciated. What should GetPermutations(IEnumerable<string>, string) look like?
You had me at "recursive". Here's another suggestion:
private IEnumerable<string> GetPermutations(string[] row, string delimiter,
int colIndex = 0, string[] currentPerm = null)
{
//First-time initialization:
if (currentPerm == null) { currentPerm = new string[row.Length]; }
var values = row[colIndex].Split(',');
foreach (var val in values)
{
//Update the current permutation with this column's next possible value..
currentPerm[colIndex] = val;
//..and find values for the remaining columns..
if (colIndex < (row.Length - 1))
{
foreach (var perm in GetPermutations(row, delimiter, colIndex + 1, currentPerm))
{
yield return perm;
}
}
//..unless we've reached the last column, in which case we create a complete string:
else
{
yield return string.Join(delimiter, currentPerm);
}
}
}
I'm not sure whether this is the most elegant approach, but it might get you started.
private static IEnumerable<string> GetPermutations(IEnumerable<string> row,
string delimiter = "|")
{
var separator = new[] { ',' };
var permutations = new List<string>();
foreach (var cell in row)
{
var parts = cell.Split(separator);
var perms = permutations.ToArray();
permutations.Clear();
foreach (var part in parts)
{
if (perms.Length == 0)
{
permutations.Add(part);
continue;
}
foreach (var perm in perms)
{
permutations.Add(string.Concat(perm, delimiter, part));
}
}
}
return permutations;
}
Of course, if the order of the permutations is important, you can add an .OrderBy() at the end.
Edit: added an alternative
You could also build a list of string arrays, by calculating some numbers before determining the permutations.
private static IEnumerable<string> GetPermutations(IEnumerable<string> row,
string delimiter = "|")
{
var permutationGroups = row.Select(o => o.Split(new[] { ',' })).ToArray();
var numberOfGroups = permutationGroups.Length;
var numberOfPermutations =
permutationGroups.Aggregate(1, (current, pg) => current * pg.Length);
var permutations = new List<string[]>(numberOfPermutations);
for (var n = 0; n < numberOfPermutations; n++)
{
permutations.Add(new string[numberOfGroups]);
}
// Treat each permutation index as a mixed-radix number: 'stride' is how many
// consecutive permutations share the same value at the current position.
var stride = numberOfPermutations;
for (var position = 0; position < numberOfGroups; position++)
{
var permutationGroup = permutationGroups[position];
stride /= permutationGroup.Length;
for (var n = 0; n < numberOfPermutations; n++)
{
permutations[n][position] = permutationGroup[(n / stride) % permutationGroup.Length];
}
}
return permutations.Select(p => string.Join(delimiter, p));
}
One algorithm you can use is basically like counting:
Start with the 0th item in each list (00000)
Increment the last value (00001, 00002 etc.)
When you can't increase one value, reset it and increment the next (00009, 00010, 00011 etc.)
When you can't increase any value, you're done.
Function:
static IEnumerable<string> Permutations(
string input,
char separator1, char separator2,
string delimiter)
{
var enumerators = input.Split(separator1)
.Select(s => s.Split(separator2).GetEnumerator()).ToArray();
if (!enumerators.All(e => e.MoveNext())) yield break;
while (true)
{
yield return String.Join(delimiter, enumerators.Select(e => e.Current));
if (enumerators.Reverse().All(e => {
bool finished = !e.MoveNext();
if (finished)
{
e.Reset();
e.MoveNext();
}
return finished;
}))
yield break;
}
}
Usage:
foreach (var perm in Permutations("a|b,c,d|e|f|g,h|i", '|', ',', "/"))
{
Console.WriteLine(perm);
}
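The same counting idea can also be written with an explicit array of indices instead of enumerators, which may be easier to follow. This is a minimal sketch under that assumption, not a replacement for the function above; the names OdometerSketch and Combine are hypothetical, and it takes the already-split row from the question rather than a single delimited string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OdometerSketch
{
    // Hypothetical helper: enumerates every combination by counting
    // through the columns like the digits of an odometer.
    public static IEnumerable<string> Combine(IEnumerable<string> row, string delimiter)
    {
        var columns = row.Select(cell => cell.Split(',')).ToArray();
        if (columns.Length == 0)
            yield break;

        var digits = new int[columns.Length];   // start at 00000
        while (true)
        {
            yield return string.Join(delimiter,
                columns.Select((column, i) => column[digits[i]]));

            // Increment the last digit; on overflow reset it and carry left.
            var position = columns.Length - 1;
            while (position >= 0 && ++digits[position] == columns[position].Length)
            {
                digits[position] = 0;
                position--;
            }
            if (position < 0)
                yield break;    // every digit overflowed: all combinations done
        }
    }
}
```

Called with the row a, b,c,d, e, f, g,h, i and the delimiter "/", this produces the six strings from the question in the same order, starting with a/b/e/f/g/i and ending with a/d/e/f/h/i.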
I really thought this would be a great recursive function, but I ended up not writing it that way. Ultimately, this is the code I created:
public IEnumerable<string> GetPermutations(IEnumerable<string> possibleCombos, string delimiter)
{
var permutations = new Dictionary<int, List<string>>();
var comboArray = possibleCombos.ToArray();
var splitCharArr = new char[] { ',' };
permutations[0] = new List<string>();
permutations[0].AddRange(
possibleCombos
.First()
.Split(splitCharArr)
.Where(x => !string.IsNullOrEmpty(x.Trim()))
.Select(x => x.Trim()));
for (int i = 1; i < comboArray.Length; i++)
{
permutations[i] = new List<string>();
foreach (var permutation in permutations[i - 1])
{
permutations[i].AddRange(
comboArray[i].Split(splitCharArr)
.Where(x => !string.IsNullOrEmpty(x.Trim()))
.Select(x => string.Format("{0}{1}{2}", permutation, delimiter, x.Trim()))
);
}
}
return permutations[permutations.Keys.Max()];
}
... my test conditions provided me with exactly the output I expected:
IEnumerable<string> row = new string[] { "a", "b,c,d", "e", "f", "g,h", "i" };
IEnumerable<string> permutations = GetPermutations(row, delimiter: "/");
foreach(var permutation in permutations)
{
Debug.Print(permutation);
}
This produced the following output:
a/b/e/f/g/i
a/b/e/f/h/i
a/c/e/f/g/i
a/c/e/f/h/i
a/d/e/f/g/i
a/d/e/f/h/i
Thanks to everyone's suggestions, they really were helpful in sorting out what needed to be done in my mind. I've upvoted all your answers.