Distinguishing string being parsed using String Split - c#

I need to parse a line in a format similar to the following:
s = "Jun 21 09:47:50 ez-x5 user.debug if_comm: [TX] 02 30 20 0f 30 31 39 24 64 31 30 31 03 54 ";
I am splitting the line on [TX] or [RX]. Here's what I do with the parsed string:
s = "Jun 21 09:47:50 ez-x5 user.debug if_comm: [TX] 02 30 20 0f 30 31 39 24 64 31 30 31 03 54 ";
string[] stringSeparators = new string[] { "[TX] " + start_key };
string transfer = s.Split(stringSeparators, 2, StringSplitOptions.None)[1];
//At this point, transfer[] = 02 30 20 0f 30 31 39 24 64 31 30 31 03 54
if (!string.IsNullOrEmpty(transfer))
{
string key = "";
string[] split = transfer.Split(' ');
if (split[0] == start_key)
{
for (int i = 0; i < key_length; i++)
{
key += split[i + Convert.ToInt32(key_index)];
}
TX_Handle(key);
}
}
stringSeparators = new string[] { "[RX] " + start_key };
transfer = s.Split(stringSeparators, 2, StringSplitOptions.None)[1];
if (!string.IsNullOrEmpty(transfer))
{
string key = "";
string[] split = transfer.Split(' ');
if (split[0] == start_key)
{
for (int i = 0; i < key_length; i++)
{
key += split[i + Convert.ToInt32(key_index)];
}
RX_Handle(key);
}
}
Basically, because I have no realistic way of comparing whether the given token is [TX] or [RX], I am forced to use the above approach to separate the string, which requires me to write essentially the same code twice.
What is a way I can get around this problem and know which token is being parsed so that I don't have to duplicate my code?

The best way to do this is to look at what is common. What is common in your code? Splitting on 2 different tokens, and a function call based on those same 2 tokens. The only thing that differs is which token is involved, so why not pick the token with a conditional and share everything else?
const string receiveToken = "[RX] ";
const string transmitToken = "[TX] ";
string token = s.IndexOf(receiveToken) > -1 ? receiveToken : transmitToken;
Now you have your token, so you can remove most of the duplication:
string[] stringSeparators = new string[] { token + start_key };
string transfer = s.Split(stringSeparators, 2, StringSplitOptions.None)[1];
if (!string.IsNullOrEmpty(transfer))
{
string key = "";
string[] split = transfer.Split(' ');
if (split[0] == start_key)
{
for (int i = 0; i < key_length; i++)
{
key += split[i + Convert.ToInt32(key_index)];
}
RX_TX_Handle(key, token);
}
}
Then you can have a common handler, e.g.:
void RX_TX_Handle(string key, string token)
{
// A ?: expression can't be used as a statement or with void methods, so use if/else.
if (token == receiveToken) RX_Handle(key);
else TX_Handle(key);
}
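If more message types might show up later, a lookup table from token text to handler avoids growing the conditional. A minimal sketch, assuming the handlers receive the raw hex tokens that follow the marker and that a line contains at most one marker; `LineDispatcher`, `Dispatch` and the `string[]` handler signature are hypothetical, not from the question:

```csharp
using System;
using System.Collections.Generic;

static class LineDispatcher
{
    // Finds the first registered token (e.g. "[TX]" or "[RX]") that occurs in the line,
    // splits the text after it into hex byte tokens, and calls that token's handler.
    // Returns the token that was handled, or null if no registered token occurred.
    // Assumes a line contains at most one of the registered tokens.
    public static string Dispatch(string line, IDictionary<string, Action<string[]>> handlers)
    {
        foreach (KeyValuePair<string, Action<string[]>> entry in handlers)
        {
            int pos = line.IndexOf(entry.Key, StringComparison.Ordinal);
            if (pos < 0)
                continue; // this token is not in the line; try the next one

            string[] bytes = line.Substring(pos + entry.Key.Length)
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            entry.Value(bytes);
            return entry.Key;
        }
        return null;
    }
}
```

The question's key_index/key_length selection could then live inside each handler, for example `string.Concat(bytes.Skip(keyIndex).Take(keyLength))` (with `using System.Linq;`).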

How about a different approach: use a regular expression. Mix in a little bit of LINQ and you have some pretty easy-to-follow code.
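using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
static void ParseLine(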
string line,
int keyIndex,
int keyLength,
Action<List<byte>> txHandler,
Action<List<byte>> rxHandler)
{
var re = new Regex(@"\[(TX|RX)\](?: ([0-9a-f]{2}))+");
var match = re.Match(line);
if (match.Success)
{
var mode = match.Groups[1].Value; // either TX or RX
var values = match.Groups[2]
.Captures.Cast<Capture>()
.Skip(keyIndex)
.Take(keyLength)
.Select(c => Convert.ToByte(c.Value, 16))
.ToList();
if (mode == "TX") txHandler(values);
else if (mode == "RX") rxHandler(values);
}
}
Or without regular expressions:
static void ParseLine(
string line,
int keyIndex,
int keyLength,
Action<List<byte>> txHandler,
Action<List<byte>> rxHandler)
{
var start = line.IndexOf('[');
var end = line.IndexOf(']', start);
var mode = line.Substring(start + 1, end - start - 1);
var values = line.Substring(end + 1)
.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
.Skip(keyIndex)
.Take(keyLength)
.Select(s => Convert.ToByte(s, 16))
.ToList();
if (mode == "TX") txHandler(values);
else if (mode == "RX") rxHandler(values);
}

I'm not 100% sure this answers your question, but I would create a TokenParser class that is responsible for parsing a token. You'll find it much easier to unit test.
public enum TokenType
{
Unknown = 0,
Tx = 1,
Rx = 2
}
public class Token
{
public TokenType TokenType { get; set; }
public IEnumerable<string> Values { get; set; }
}
public class TokenParser
{
public Token ParseToken(string input)
{
if (string.IsNullOrWhiteSpace(input)) throw new ArgumentNullException("input");
var token = new Token { TokenType = TokenType.Unknown };
input = input.ToUpperInvariant();
if (input.Contains("[TX]"))
{
token.TokenType = TokenType.Tx;
}
if (input.Contains("[RX]"))
{
token.TokenType = TokenType.Rx;
}
input = input.Substring(input.LastIndexOf("]", System.StringComparison.Ordinal) + 1);
token.Values = input.Trim().Split(' ');
return token;
}
}
The example could be easily extended to allow multiple token parsers if the logic for parsing each token is vastly different.

Related

Errors using F__AnonymousType in C#

I've spent hours trying to resolve compilation errors related to f__AnonymousType. I get a lot of errors about expressions needing directives, but I'm not sure exactly what to do.
public static void ChangeSerialNumber(char volume, uint newSerial)
{
var source = new <>f__AnonymousType0<string, int, int>[]
{
new
{
Name = "FAT32",
NameOffs = 82,
SerialOffs = 67
},
new
{
Name = "FAT",
NameOffs = 54,
SerialOffs = 39
},
new
{
Name = "NTFS",
NameOffs = 3,
SerialOffs = 72
}
};
using (Helpers.Disk disk = new Helpers.Disk(volume))
{
byte[] sector = new byte[512];
disk.ReadSector(0U, sector);
var <>f__AnonymousType = source.FirstOrDefault(f => Helpers.Strncmp(f.Name, sector, f.NameOffs));
if (<>f__AnonymousType == null)
{
throw new NotSupportedException("This file system is not supported");
}
uint num = newSerial;
int i = 0;
while (i < 4)
{
sector[<>f__AnonymousType.SerialOffs + i] = (byte)(num & 255U);
i++;
num >>= 8;
}
disk.WriteSector(0U, sector);
}
}
This is used for USB stick refurbishment: as part of the secure-wipe software, we would like to change the serial number of each drive (in effect, spoof it) so that, in case of a chargeback, we can match the drive they return and make sure it's the one we sent out.
The point of anonymous types is that you don't have to give them a name; the compiler does it for you.
<>f__AnonymousType0 is not a valid name in user code; it looks like a name generated by the compiler, and you can't use it.
Just use the anonymous-type syntax:
public static void ChangeSerialNumber(char volume, uint newSerial)
{
var sources = new[]
{
new
{
Name = "FAT32",
NameOffs = 82,
SerialOffs = 67
},
new
{
Name = "FAT",
NameOffs = 54,
SerialOffs = 39
},
new
{
Name = "NTFS",
NameOffs = 3,
SerialOffs = 72
}
};
using (Helpers.Disk disk = new Helpers.Disk(volume))
{
byte[] sector = new byte[512];
disk.ReadSector(0U, sector);
var source = sources.FirstOrDefault(f => Helpers.Strncmp(f.Name, sector, f.NameOffs));
if (source == null)
{
throw new NotSupportedException("This file system is not supported");
}
var num = newSerial;
var i = 0;
while (i < 4)
{
sector[source.SerialOffs + i] = (byte) (num & 255U);
i++;
num >>= 8;
}
disk.WriteSector(0U, sector);
}
}
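The while loop at the end stores newSerial into the sector least-significant byte first (little-endian), in the four bytes starting at SerialOffs. A small sketch isolating just that step, using a hypothetical `SerialWriter` helper; on .NET Core 2.1 or later, `BinaryPrimitives.WriteUInt32LittleEndian` from System.Buffers.Binary should produce the same bytes in one call:

```csharp
using System;
using System.Buffers.Binary;

static class SerialWriter
{
    // Same logic as the answer's while loop: write value into buffer[offset..offset+3],
    // least-significant byte first (little-endian).
    public static void WriteSerial(byte[] buffer, int offset, uint value)
    {
        for (int i = 0; i < 4; i++)
        {
            buffer[offset + i] = (byte)(value & 0xFF);
            value >>= 8;
        }
    }

    // Equivalent single call using the BCL helper (.NET Core 2.1+ / .NET 5+).
    public static void WriteSerialBcl(byte[] buffer, int offset, uint value) =>
        BinaryPrimitives.WriteUInt32LittleEndian(buffer.AsSpan(offset, 4), value);
}
```

For example, writing 0x12345678 at offset 2 leaves the bytes 0x78, 0x56, 0x34, 0x12 at indices 2 to 5.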

Find out if a list of strings contains permutations of words from another string (counter for each combination)

I didn't know exactly how to phrase this question, so I will explain it as best I can.
Let's say I have one list of 20 strings, myList1 (a List<string>), and another string, ToCompare. Each of the strings in the list, as well as ToCompare, has 8 words separated by spaces. I want to know how many times a combination of any three words from ToCompare, in any order, is found in the strings of myList1. For example:
This is the list (short version, as an example):
string1 = "AA BB CC DD EE FF GG HH";
string2 = "BB DD EE AA HH II JJ MM";
.......
string20 = "NN OO AA RR EE BB FF KK";
string ToCompare = "BB GG AA FF CC MM RR II";
Now I want to know how many times any combination of 3 words from ToCompare is found in myList1. To clarify further: the three words "BB AA CC" from ToCompare are found in string1 of the list, so the counter for these 3 words would be 1. Another 3 words from ToCompare, "BB AA II", are found in string2, but the counter here would also be 1 because it's not the same combination of words (I have "AA" and "BB" but also "II"; they are not equal). The order of these 3 words doesn't matter, so "AA BB CC" = "BB AA CC" = "CC BB AA". I want to know how many combinations of any 3 words from ToCompare are found in myList1. I hope it's clear what I mean.
Any help would be appreciated, I don't have a clue how to solve this. Thanks.
Example using Vanest's answer:
List<string> source = new List<string>();
source.Add("2 4 6 8 10 12 14 99");
source.Add("16 18 20 22 24 26 28 102");
source.Add("33 6 97 38 50 34 87 88");
string ToCompare = "2 4 6 15 20 22 28 44";
The rest of the code is exactly the same, and the result is:
Key = 2 4 6, Value = 2
Key = 2 4 20, Value = 1
Key = 2 4 22, Value = 1
Key = 2 4 28, Value = 1
Key = 2 6 20, Value = 1
Key = 2 6 22, Value = 1
Key = 2 6 28, Value = 1
Key = 2 20 22, Value = 1
Key = 2 20 28, Value = 1
Key = 2 22 28, Value = 1
Key = 4 6 20, Value = 1
Key = 4 6 22, Value = 1
Key = 4 6 28, Value = 1
Key = 4 20 22, Value = 1
Key = 4 20 28, Value = 1
Key = 4 22 28, Value = 1
Key = 6 20 22, Value = 1
Key = 6 20 28, Value = 1
Key = 6 22 28, Value = 1
Key = 20 22 28, Value = 1
As you can see, there are combinations that don't exist in any of the strings, and the value of the first combination is 2 even though it occurs only once, in the first string.
I think this should do what you're asking:
List<string> source = new List<string>();
source.Add("AA BB CC DD EE FF GG HH");
source.Add("BB DD EE AA HH II JJ MM");
source.Add("NN OO AA RR EE BB FF KK");
string ToCompare = "BB GG AA FF CC MM RR II";
string word1, word2, word3, existingKey;
string[] compareList = ToCompare.Split(new string[] { " " }, StringSplitOptions.None);
Dictionary<string, int> ResultDictionary = new Dictionary<string, int>();
for (int i = 0; i < compareList.Length - 2; i++)
{
word1 = compareList[i];
for (int j = i + 1; j < compareList.Length - 1; j++)
{
word2 = compareList[j];
for (int z = j + 1; z < compareList.Length; z++)
{
word3 = compareList[z];
source.ForEach(x =>
{
if (x.Contains(word1) && x.Contains(word2) && x.Contains(word3))
{
existingKey = ResultDictionary.Keys.FirstOrDefault(y => y.Contains(word1) && y.Contains(word2) && y.Contains(word3));
if (string.IsNullOrEmpty(existingKey))
{
ResultDictionary.Add(word1 + " " + word2 + " " + word3, 1);
}
else
{
ResultDictionary[existingKey]++;
}
}
});
}
}
}
ResultDictionary will hold the 3-word combinations that occur in myList1 along with their number of occurrences. To get the total count, sum all the values in ResultDictionary.
EDIT:
The snippet below produces the correct result for the given input. It splits each source string into words, so "2" no longer matches inside "22" as it did with string.Contains:
List<string> source = new List<string>();
source.Add("2 4 6 8 10 12 14 99");
source.Add("16 18 20 22 24 26 28 102");
source.Add("33 6 97 38 50 34 87 88");
string ToCompare = "2 4 6 15 20 22 28 44";
string word1, word2, word3, existingKey;
string[] compareList = ToCompare.Split(new string[] { " " }, StringSplitOptions.None);
string[] sourceList, keywordList;
Dictionary<string, int> ResultDictionary = new Dictionary<string, int>();
source.ForEach(x =>
{
sourceList = x.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < compareList.Length - 2; i++)
{
word1 = compareList[i];
for (int j = i + 1; j < compareList.Length - 1; j++)
{
word2 = compareList[j];
for (int z = j + 1; z < compareList.Length; z++)
{
word3 = compareList[z];
if (sourceList.Contains(word1) && sourceList.Contains(word2) && sourceList.Contains(word3))
{
existingKey = ResultDictionary.Keys.FirstOrDefault(y =>
{
keywordList = y.Split(new string[] { " " }, StringSplitOptions.None);
return keywordList.Contains(word1) && keywordList.Contains(word2) && keywordList.Contains(word3);
});
if (string.IsNullOrEmpty(existingKey))
{
ResultDictionary.Add(word1 + " " + word2 + " " + word3, 1);
}
else
{
ResultDictionary[existingKey]++;
}
}
}
}
}
});
Hope this helps...
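Because the loops always satisfy i < j < z, each combination is built in the same order (the order of the words in ToCompare), so its key can be constructed directly rather than searched for among the existing keys. A sketch of the same whole-word counting using a HashSet per line and a direct dictionary update; `TripleCounter` and `Count` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

static class TripleCounter
{
    // For every 3-word combination taken from `toCompare`, counts how many lines of
    // `source` contain all three words as whole words (not as substrings).
    public static Dictionary<string, int> Count(IEnumerable<string> source, string toCompare)
    {
        string[] words = toCompare.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new Dictionary<string, int>();

        foreach (string line in source)
        {
            // Whole-word lookup: "2" does not match inside "22".
            var lineWords = new HashSet<string>(
                line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

            for (int i = 0; i < words.Length - 2; i++)
                for (int j = i + 1; j < words.Length - 1; j++)
                    for (int k = j + 1; k < words.Length; k++)
                    {
                        if (!lineWords.Contains(words[i]) || !lineWords.Contains(words[j]) || !lineWords.Contains(words[k]))
                            continue;

                        // i < j < k, so a given combination always yields the same key.
                        string key = words[i] + " " + words[j] + " " + words[k];
                        result.TryGetValue(key, out int n); // n is 0 if the key is new
                        result[key] = n + 1;
                    }
        }
        return result;
    }
}
```

With Vanest's numeric input this yields only "2 4 6" and "20 22 28", each with a count of 1.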
I think this will do what you're asking for:
void Main()
{
var list =
new List<String>
{
"AA BB CC DD EE FF GG HH",
"BB DD EE AA HH II JJ MM",
"NN OO AA RR EE BB FF KK"
};
var toCompare = "BB GG AA FF CC MM RR II";
var permutations = CountPermutations(list, toCompare);
}
public Int32 CountPermutations(List<String> list, String compare)
{
var words = compare.Split(' ');
return list
.Select(l => l.Split(' '))
.Select(l => new { String = String.Join(" ", l), Count = l.Join(words, li => li, wi => wi, (li, wi) => li).Count()})
.Sum(x => x.Count - 3);
}
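Note that x.Count - 3 is not the number of 3-word combinations in a line: a line that shares k distinct words with the compare string contains C(k, 3) = k(k-1)(k-2)/6 of them, and lines with fewer than 3 matches would contribute negative values. A sketch of that count, assuming repeated words within one line should count once; `CombinationCount`, `Choose` and `CountTriples` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CombinationCount
{
    // n choose r via the multiplicative formula; each intermediate value is C(n-r+i, i),
    // so the division is always exact. Returns 0 when n < r.
    static long Choose(int n, int r)
    {
        if (n < r) return 0;
        long result = 1;
        for (int i = 1; i <= r; i++)
            result = result * (n - r + i) / i;
        return result;
    }

    // For each line, k = number of distinct words it shares with `compare`;
    // that line contains C(k, 3) different 3-word combinations. Sums over all lines.
    public static long CountTriples(IEnumerable<string> lines, string compare)
    {
        var words = new HashSet<string>(compare.Split(' '));
        return lines.Sum(line => Choose(line.Split(' ').Distinct().Count(w => words.Contains(w)), 3));
    }
}
```

For the three sample lines above, the lines share 5, 4 and 4 words with toCompare, giving 10 + 4 + 4 = 18.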
[edit: 2/20/2019]
You can use the following to get all the matches for each list item, along with the total number of unique combinations:
void Main()
{
var list =
new List<String>
{
"AA BB CC DD EE FF GG HH",
"BB DD EE AA HH II JJ MM",
"NN OO AA RR EE BB FF KK",
"AA AA CC DD EE FF GG HH"
};
list.Select((l, i) => new { Index = i, Item = l }).ToList().ForEach(x => Console.WriteLine($"List Item{x.Index + 1}: {x.Item}"));
var toCompare = "BB GG AA FF CC MM RR II";
Console.WriteLine($"To Compare: {toCompare}");
Func<Int32, Int32> Factorial = x => x < 0 ? -1 : x == 0 || x == 1 ? 1 : Enumerable.Range(1, x).Aggregate((c, v) => c * v);
var words = toCompare.Split(' ');
var matches = list
// Get a list of the list items with all their parts
.Select(l => new { Parts = l.Split(' '), Original = l })
// Join each part from the to-compare item to each part of the list item
.Select(l => new { String = String.Join(" ", l), Matches = l.Parts.Join(words, li => li, wi => wi, (li, wi) => li), l.Original })
// Only consider items with at least 3 matches
.Where(l => l.Matches.Count() >= 3)
// Get the each item including how many parts matched and how many unique parts there are of each part
.Select(l => new { l.Original, Matches = String.Join(" ", l.Matches), Count = l.Matches.Count(), Groups = l.Matches.GroupBy(m => m).Select(m => m.Count()) })
// To calculate the unique combinations for each match use the following mathematical equation: match_count! / (frequency_part_1! * frequency_part_2! * ... * frequency_part_n!)
.Select(l => new { l.Original, l.Matches, Combinations = Factorial(l.Count) / l.Groups.Aggregate((c, v) => c * Factorial(v)) })
.ToList();
matches.ForEach(m => Console.WriteLine($"Original: {m.Original}, Matches: {m.Matches}, Combinations: {m.Combinations}"));
var totalUniqueCombinations = matches.Sum(x => x.Combinations);
Console.WriteLine($"Total Unique Combinations: {totalUniqueCombinations}");
}

Converting a delimited string to 2-D array

I have a grid of two-digit numbers written as a string delimited with newlines and spaces, e.g.:
string grid = "58 96 69 22 \n" +
"87 54 21 36 \n" +
"02 26 08 15 \n" +
"88 09 12 45";
I would like to split it into a 4-by-4 array, so that I can access it via something like separatedGrid[i, j]. I know I can use grid.Split(' ') to separate the numbers in each row, but how do I get a 2-D array out of it?
So what you want is to convert a delimited multi-line string into a 2-D array:
string grid = "58 96 69 22 \n" +
"87 54 21 36 \n" +
"02 26 08 15 \n" +
"88 09 12 45";
var lines = grid.Split(new string[] { "\n" }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim().Split(' ')).ToArray();
int numberOfRows = lines.Length;
int maxNumberOfColumns = lines.Max(x => x.Length);
string[,] separatedGrid = new string[numberOfRows, maxNumberOfColumns];
for (int i = 0; i < lines.Length; i++)
{
string[] values = lines[i];
for (int j = 0; j < values.Length; j++)
{
separatedGrid[i, j] = values[j];
}
}
Yes, use Split like this:
string grid = "58 96 69 22 \n87 54 21 36 \n02 26 08 15 \n88 09 12 45";
var jagged = grid.Split('\n')
.Select(x => x.Trim().Split(' '))
.ToArray();
And if you want a 2-D Array:
var _2D = new String[jagged.Length, jagged[0].Length];
for (var i = 0; i != jagged.Length; i++)
for (var j = 0; j != jagged[0].Length; j++)
_2D[i, j] = jagged[i][j];
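Since the grid holds numbers, you may want an int[,] rather than string[,]. A sketch, assuming every row has the same number of columns as the first one; `GridParser` and `Parse` are hypothetical names:

```csharp
using System;
using System.Linq;

static class GridParser
{
    // Parses space-separated numbers, one row per line, into an int[,].
    // Assumes every row has the same number of columns as the first row.
    public static int[,] Parse(string grid)
    {
        string[][] rows = grid
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();

        var result = new int[rows.Length, rows[0].Length];
        for (int i = 0; i < rows.Length; i++)
            for (int j = 0; j < rows[i].Length; j++)
                result[i, j] = int.Parse(rows[i][j]); // "08" parses as 8
        return result;
    }
}
```

With the grid from the question, `GridParser.Parse(grid)[2, 2]` is 8 and `[3, 3]` is 45.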

Find duplicate entries in a file for each day

I want to write C# code that reads my file, which is in the format given below, and prints all the duplicate entries for each date along with the number of occurrences.
Example.txt :
March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz
Output :
March 03 2014 abcd 2
March 04 2014 xyz 2
Can someone help me with this?
I was thinking about using a dictionary where the event would be the key, incrementing the value for each duplicate event. But I am not sure how to group the result by day.
This might be a good case for the power of LINQ:
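The dictionary idea works if the outer dictionary is keyed by date and each date gets its own event-to-count dictionary. A sketch, assuming every record is exactly four space-separated tokens (month, day, year, and a one-word event) as in Example.txt; `DuplicateFinder` and `FindDuplicates` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicateFinder
{
    // Groups records by date and counts each event per date; returns "date event count"
    // lines for events that occur more than once on the same date.
    // Assumes each record is four tokens: month, day, year, one-word event.
    public static List<string> FindDuplicates(string text)
    {
        string[] tokens = text.Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var perDay = new Dictionary<string, Dictionary<string, int>>();
        var dayOrder = new List<string>(); // remembers the order in which dates first appear

        // Walk the tokens four at a time; an incomplete trailing record is ignored.
        for (int i = 0; i + 3 < tokens.Length; i += 4)
        {
            string day = tokens[i] + " " + tokens[i + 1] + " " + tokens[i + 2];
            string evt = tokens[i + 3];

            if (!perDay.TryGetValue(day, out var counts))
            {
                counts = new Dictionary<string, int>();
                perDay[day] = counts;
                dayOrder.Add(day);
            }
            counts.TryGetValue(evt, out int n); // n is 0 if the event is new for this day
            counts[evt] = n + 1;
        }

        var output = new List<string>();
        foreach (string day in dayOrder)
            foreach (var pair in perDay[day].Where(p => p.Value > 1))
                output.Add($"{day} {pair.Key} {pair.Value}");
        return output;
    }
}
```

Passing the contents of Example.txt (e.g. read with File.ReadAllText) should give the two lines of the expected output.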
var input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
var format = "MMMM dd yyyy";
var results = input.Split(' ')
.Select((v, i) => new { v, i })
.GroupBy(x => x.i / 4, x => x.v, (k, g) => g.ToList())
.Select(g => new
{
Date = DateTime.ParseExact(String.Join(" ", g.Take(3)), format, CultureInfo.InvariantCulture),
Event = g[3]
})
.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => new
{
Item = g.Key,
Count = g.Count()
});
foreach (var i in results)
Console.WriteLine("{0} {1} {2}", i.Item.Date.ToString(format), i.Item.Event, i.Count.ToString());
Prints exactly what you need.
Going by your original description of the problem and sample data, this code will probably work with some tweaks. You could probably do it using some of the LINQ libraries as well.
List<String> outputStringList = new List<string>();
IEnumerable<String> stringEnumerable = System.IO.File.ReadLines(@"c:\tmp\test.txt");
System.Collections.Generic.HashSet<String> uniqueHashSet = new System.Collections.Generic.HashSet<String>();
foreach (String line in stringEnumerable) { uniqueHashSet.Add(line); }
foreach (String output in uniqueHashSet)
{
Int32 count = stringEnumerable.Count(element => element == output);
if (count > 1) { outputStringList.Add(output + " " + count); }
//if (count > 1) { System.Diagnostics.Debug.WriteLine(output + " " + count); }
}
I see that you changed the formatting of your data while I was writing up my answer. Please disregard as this solution will no longer work.
You can split your text using a regular expression:
public IEnumerable<KeyValuePair<String, Int32>> SearchDuplicates(string path){
// The parameter and the local variable need different names (the original reused "file" for both, which doesn't compile).
var lines = File.ReadLines(path);
var pattern = new Regex("[A-Za-z]+ [0-9]{2} [0-9]{4} [A-Za-z]+");
var results = new Dictionary<string, int>();
foreach(var line in lines) {
foreach(Match match in pattern.Matches(line)) {
if(!results.ContainsKey(match.Value))
results.Add(match.Value, 0);
results[match.Value]++;
}
}
return results.Where(v => v.Value > 1);
}
Note: I've written this to be easy to read, with comments explaining the process.
If you are also the one writing this file, separate each "file" with the ASCII Record Separator character, which has the value 30. If that's not the case and you HAVE to use the file format given in the OP, let me know and I can add a case for that.
// Reads the entire file into one string variable (filePath is the path to your file).
string allTheText = File.ReadAllText(filePath);
// Splits each "file" into a string of its own.
string[] files = allTheText.Split((char)30);
// Use this instead if you have a newline between each "file" rather than just spaces:
// string[] files = File.ReadAllLines(filePath);
// Make a Dictionary<string, string> to hold all these (you could use DateTime, but I opted not to).
Dictionary<string, string> entries = new Dictionary<string, string>();
foreach(string file in files)
{
// Now let's get the date of this "file".
// We need the index of the 3rd space.
var offset = file.IndexOf(' ');
offset = file.IndexOf(' ', offset+1);
offset = file.IndexOf(' ', offset+1);
// Now split the string at this offset (the date ends just before the 3rd space).
string date = file.Substring(0, offset);
string filecont = file.Substring(offset+1);
// Only add it if it isn't already in there.
if(!entries.ContainsKey(date))
entries.Add(date, filecont);
}
// Print them out.
foreach(string key in entries.Keys)
{
Console.WriteLine(key + " " + entries[key]);
}
You can tokenize it using the month names as delimiters, if you want:
public static void Main (string[] args)
{
var str = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
var rawResults = tokenize (str).GroupBy(i => i);
foreach (var item in rawResults) {
Console.WriteLine ("Item {0} happened {1} times", item.Key, item.Count());
}
}
static List<String> tokenize (string str)
{
var months = new[]{ "March", "April", "May" }; //etc
var strTokens = str.Split (new []{ ' ' }, StringSplitOptions.RemoveEmptyEntries);
var results = new List<string> ();
var current = "";
foreach (var token in strTokens) {
if (months.Contains(token)) {
if (current != null && current != "") {
results.Add (current);
}
current = token + " ";
} else {
current += token + " ";
}
}
results.Add (current);
return results;
}
Better yet, use a parser combinator to do it.
A simple solution using regex:
string input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
List<string> dates = new List<string>();
string[] splitted = input.Split(' ');
for (int i = 0; i < splitted.Length; i = i + 4)
{
string strDate = splitted[i] + " " + splitted[i + 1] + " " + splitted[i + 2] + " " + splitted[i + 3];
if (!dates.Contains(strDate))
{
dates.Add(strDate);
if (Regex.Matches(input, strDate).Count > 1)
Console.WriteLine(strDate + " " + Regex.Matches(input, strDate).Count);
}
}

Dividing a list of strings

Didn't quite know what to title this question so please feel free to edit.
I have a list of strings where all elements are strings with a length of 40.
What I want to do is split each element at character 20 and insert the second half as the next element of the list, shifting all the following elements down.
E.g. (with 4-character strings, split after character 2):
list[0] = 0011
list[1] = 2233
list[2] = 4455
// split each element in the middle: 00|11, 22|33, 44|55
// new list results in:
list[0] = 00
list[1] = 11
list[2] = 22
list[3] = 33
list[4] = 44
list[5] = 55
How can this be achieved?
list = list.SelectMany(s => new [] { s.Substring(0, 20), s.Substring(20, 20) })
.ToList();
list = list.SelectMany(x=>new[]{x.Substring(0, 20), x.Substring(20)}).ToList();
Not sure why you want to do that, but it's quite simple with LINQ:
List<string> split = list.SelectMany(s => new []{s.Substring(0, 2), s.Substring(2)}).ToList();
If you must work with the existing array:
const int elementCount = 3;
const int indexToSplit = 2;
string[] list = new string[elementCount * 2] { "0011", "0022", "0033", null, null, null };
for (int i = elementCount; i > 0; --i)
{
var str = list[i-1];
var left = str.Substring( 0, indexToSplit );
var right = str.Substring( indexToSplit, str.Length - indexToSplit );
var rightIndex = i * 2 - 1;
list[rightIndex] = right;
list[rightIndex - 1] = left;
}
foreach( var str in list )
{
Console.WriteLine( str );
}
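If you'd rather modify an existing List<string> in place instead of building a new one, walking backwards and using List<T>.Insert works. A sketch with hypothetical names (`ListSplitter`, `SplitInPlace`); each Insert shifts the remaining elements, so this is O(n²) for large lists, and the SelectMany versions are simpler when a new list is acceptable:

```csharp
using System;
using System.Collections.Generic;

static class ListSplitter
{
    // Splits every element of `list` at `index`, in place: the element keeps its first
    // part and the remainder is inserted right after it. Iterating from the end means
    // each Insert only shifts elements that have already been processed.
    public static void SplitInPlace(List<string> list, int index)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            string s = list[i];
            list[i] = s.Substring(0, index);
            list.Insert(i + 1, s.Substring(index));
        }
    }
}
```

For the 40-character strings in the question, the call would be `ListSplitter.SplitInPlace(list, 20)`.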
