How to parse delimited files to be compared - c#

There is a text file formatted like the example below that I need to search for a student's class name:
Michael | Straham | Eng101(4.0) | Mth303
Jacob | Black | SCI 210 (2.3) | Eng101
Ian | Summers | Mth303(3.30) | Sci 210
The delimiter is ( | )
The class names are "ENG101, SCI210, MTH303." I would like to search each line from the text for that class name and somehow index them so that they can be compared. The end result would be this:
ENG101:
Michael Straham, Jacob Black
Please assist. Thanks in advance!

I'm assuming you're already reading in the input line by line.
You can use String.Split() to accomplish (the first part of) what you are trying to do.
For example, the following code
String s1 = "Michael | Straham | Eng101(4.0) | Mth303";
char[] separators = { '|' };
String[] values = s1.Split(separators);
would give you an array of 4 strings ("Michael ", " Straham ", " Eng101(4.0) ", " Mth303"). Note that Split keeps the spaces around each delimiter, so call Trim() on each value before comparing it. You can then analyze the values array to see who is in which class. I'd probably have code that looks roughly like this (in pseudocode):
foreach (line in input)
{
String s1 = line;
char[] separators = { '|' };
String[] values = s1.Split(separators);
String firstName = values[0].Trim();
String lastName = values[1].Trim();
for (i = 2; i < values.Length; i++)
{
if (values[i] looks like "ENG101")
{
add firstName lastName to "ENG101" student list
}
else if (values[i] looks like "MTH303")
{
....
}
....
}
}
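The pseudocode above can be sketched as runnable C#. This is a minimal sketch under stated assumptions: the names ClassRoster, NormalizeClassName, and Group are hypothetical, and it assumes a class name is whatever remains of a field once the parenthesised grade and any inner spaces are removed, so "SCI 210 (2.3)" and "Sci 210" both become "SCI210".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClassRoster
{
    // Hypothetical helper: "SCI 210 (2.3)" -> "SCI210" (drops the grade and inner spaces).
    public static string NormalizeClassName(string field)
    {
        string withoutGrade = Regex.Replace(field, @"\(.*?\)", "");
        return Regex.Replace(withoutGrade, @"\s+", "").ToUpperInvariant();
    }

    // Maps each class name to the students whose line mentions it, in file order.
    public static Dictionary<string, List<string>> Group(IEnumerable<string> lines)
    {
        var roster = new Dictionary<string, List<string>>();
        foreach (string line in lines)
        {
            string[] values = line.Split('|').Select(v => v.Trim()).ToArray();
            if (values.Length < 3) continue; // skip blank or malformed lines

            string student = values[0] + " " + values[1];
            for (int i = 2; i < values.Length; i++)
            {
                string className = NormalizeClassName(values[i]);
                if (className.Length == 0) continue;

                List<string> students;
                if (!roster.TryGetValue(className, out students))
                {
                    students = new List<string>();
                    roster[className] = students;
                }
                students.Add(student);
            }
        }
        return roster;
    }
}
```

Calling ClassRoster.Group(File.ReadLines(path)) and printing each key followed by string.Join(", ", entry.Value) would produce the "ENG101:" style listing from the question. A dictionary avoids the chain of if/else branches, so new class names need no code changes.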

Related

I want to split column data into different columns

I have data in a column which I want to split into different columns.
The data in the column is not consistent.
For example:
974/mt (ICD TKD)
974/mt (+AD 91.27/mt, ICD/TKD)
970-980/mt
970-980/mt
I have tried with Substring but have not found a solution.
The output should be:
min |max | unit | description
-------------------------
NULL | 974 | /mt | ICD TKD
NULL | 974 | /mt |+AD 91.27/mt, ICD/TKD
970 | 980 | /mt |NULL
You can use Regex to parse the information, and then add columns with the parsed data.
Assumptions (due to lack of clarity in the question):
Min Value is optional
If present, Min Value is followed by a "-" and then Max Value
Description is optional
Since the question doesn't say what to assume when Min Value is not available, I have used the string type for the Min/Max values; ideally these should be replaced by a more suitable data type.
public Sample Split(string columnValue)
{
var regex = new Regex(@"(?<min>\d+-)?(?<max>\d+)(?<unit>[\/a-zA-Z]+)\s?(\((?<description>(.+))\))?",RegexOptions.Compiled);
var match = regex.Match(columnValue);
if(match.Success)
{
return new Sample
{
Min = match.Groups["min"].Value,
Max = match.Groups["max"].Value,
Unit = match.Groups["unit"].Value,
Description = match.Groups["description"].Value
};
}
return default;
}
public class Sample
{
public string Min{get;set;}
public string Max{get;set;}
public string Unit{get;set;}
public string Description{get;set;}
}
For Example,
var list = new []
{
@"974/mt (ICD TKD)",
@"974/mt (+AD 91.27/mt, ICD/TKD)",
@"970-980/mt",
"970-980/mt"
};
foreach(var item in list)
{
var result = Split(item);
Console.WriteLine($"Min={result.Min},Max={result.Max},Unit={result.Unit},Description={result.Description}");
}
Output
Min=,Max=974,Unit=/mt,Description=ICD TKD
Min=,Max=974,Unit=/mt,Description=+AD 91.27/mt, ICD/TKD
Min=970-,Max=980,Unit=/mt,Description=
Min=970-,Max=980,Unit=/mt,Description=
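In the output above, Min keeps its trailing dash ("970-") and missing parts come back as empty strings rather than the NULL the question asked for. A possible variation, offered as a sketch rather than the answer's own method, moves the dash outside the min group and maps unmatched groups to null. The names PriceParts, PriceColumn, and ValueOrNull are hypothetical, and the pattern assumes the unit always starts with "/".

```csharp
using System.Text.RegularExpressions;

// Hypothetical result type; null marks a part that was absent from the input.
public sealed class PriceParts
{
    public string Min { get; set; }
    public string Max { get; set; }
    public string Unit { get; set; }
    public string Description { get; set; }
}

public static class PriceColumn
{
    // The dash sits outside the "min" group, so Min is captured as "970", not "970-".
    private static readonly Regex Pattern = new Regex(
        @"^\s*(?:(?<min>\d+)-)?(?<max>\d+)(?<unit>/[a-zA-Z]+)\s*(?:\((?<description>.+)\))?\s*$",
        RegexOptions.Compiled);

    // Returns null for a group that did not take part in the match.
    private static string ValueOrNull(Group group)
    {
        return group.Success ? group.Value : null;
    }

    // Returns null when the whole value does not fit the assumed shape.
    public static PriceParts Parse(string columnValue)
    {
        Match match = Pattern.Match(columnValue);
        if (!match.Success) return null;

        return new PriceParts
        {
            Min = ValueOrNull(match.Groups["min"]),
            Max = match.Groups["max"].Value,
            Unit = match.Groups["unit"].Value,
            Description = ValueOrNull(match.Groups["description"])
        };
    }
}
```

Because Parse returns null for values that do not match, callers should check the result before reading its properties.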

C# Regular Expression Nested Parentheses Recursive

I need help finding the regular expression.
Text = @"{'the quick' | 'the lazy'}{{{'BEFORE'} 'fox' } | {{'BEFORE'} 'lion'}}"
The resulting string array should be:
[0] = 'the quick' | 'the lazy',
[1] = BEFORE/1 fox | BEFORE/2 lion
Unless two or more strings are split by |, I need them side by side.
Thanks for your help. After thinking a bit, I was able to find a simple solution.
string sampleText = "{'what is' | 'when is'}{BEFORE/2 }{doing | 'to do'}{ NEAR/3 }{'to ensure our' | 'to ensure that' | 'to make sure our' | 'to make sure that our' | 'so our' | 'so that our'}{ BEFORE/4 }{doesn*t | 'does not' | 'will not' | won*t}";
List<List<string>> list = new List<List<string>>();
Match mt = Regex.Match(sampleText, @"\}([^{]*)\{");
string value = sampleText.Replace(mt.Value, "},{");
string[] newArray = value.Split(",".ToCharArray());
foreach (string st in newArray)
{
list.Add(new List<string>(st.Replace("{", "").Replace("}", "").Split('|')));
}
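The solution above inserts commas between groups and then splits on them, which would break if any term itself contained a comma. An alternative sketch, assuming the groups are flat (not nested) as in sampleText, pulls each {...} group out directly with Regex.Matches. The name BraceGroups is hypothetical, and unlike the original code this version trims the spaces around each term.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BraceGroups
{
    // Assumes flat groups; nested input such as "{{a} b}" would need a real parser,
    // because this pattern only sees the innermost braces.
    public static List<List<string>> Parse(string text)
    {
        return Regex.Matches(text, @"\{([^{}]*)\}")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value
                .Split('|')
                .Select(part => part.Trim())
                .ToList())
            .ToList();
    }
}
```

Each inner list holds the alternatives of one group, in the order the groups appear in the text.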

Splitting an array into 2 parts

I am attempting to read a log file in this format:
date | cost
date | cost
...etc.
I am using the following code to read the file into an array:
string[] lines = File.ReadAllLines("log.txt");
My question is: how do I slice the array into 2 parts per line so that I can add them to a list view with 2 columns? I was thinking perhaps a dictionary would be a good start.
Assuming this is C# rather than C, the following may do what you're looking for:
public class LogEntry{
public string Date;
public string Cost;
public LogEntry(string date,string cost){
Date=date;
Cost=cost;
}
}
...
// Grab the lines from the file:
string[] lines = File.ReadAllLines("log.txt");
// Create our output set:
LogEntry[] logEntries=new LogEntry[lines.Length];
// For each line in the file:
for(int i=0;i<lines.Length;i++){
// Split the line:
string[] linePieces=lines[i].Split('|');
// Safety check - make sure this is a line we want:
if(linePieces.Length!=2){
// No thanks! (Note: this leaves logEntries[i] as null for the skipped line.)
continue;
}
// Create the entry:
logEntries[i]=new LogEntry( linePieces[0] , linePieces[1] );
}
// Do something with logEntries.
Note that this sort of processing should only be done with a relatively small log file. File.ReadAllLines("log.txt") loads the entire file into memory, which becomes very inefficient with large files; at that point reading the file incrementally (File.ReadLines, a StreamReader, or a raw FileStream) is more suitable.
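As a rough sketch of the incremental approach, the parsing can be written against any sequence of lines and fed from File.ReadLines, which yields lines lazily instead of building the whole array first. The name LogReader is hypothetical. This version also skips malformed lines outright, rather than leaving null slots in a pre-sized array, and trims the spaces around each field.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LogReader
{
    // Lazily parses "date | cost" lines; lines without exactly two fields are skipped.
    public static IEnumerable<KeyValuePair<string, string>> Parse(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            string[] pieces = line.Split('|');
            if (pieces.Length != 2) continue;

            yield return new KeyValuePair<string, string>(pieces[0].Trim(), pieces[1].Trim());
        }
    }

    // Convenience wrapper: streams the file one line at a time.
    public static IEnumerable<KeyValuePair<string, string>> ParseFile(string path)
    {
        return Parse(File.ReadLines(path));
    }
}
```

Each pair can then be added to the two-column list view as it is produced, for example inside a foreach over LogReader.ParseFile("log.txt").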
var lines = File.ReadAllLines("log.txt").Select(l=> l.Split('|'));
var dictionary= lines.ToDictionary(x => x[0], y => y[1]);
Use a 2D array and string.Split('|')
string[] lines = File.ReadAllLines("log.txt");
//Create an array with lines.Length rows and 2 columns
string[,] table = new string[lines.Length,2];
for (int i = 0; i < lines.Length; i++)
{
//Split the line in 2 with the | character
string[] parts = lines[i].Split('|');
//Store them in the array, trimming the spaces off
table[i,0] = parts[0].Trim();
table[i,1] = parts[1].Trim();
}
Now you will have an array that looks like this:
table[date, cost]
You could use a dictionary so you only have to look up the date if you want to improve it. EDIT: As @Damith has done
Additionally, with LINQ you could simplify this into:
var table = File.ReadAllLines("log.txt").Select(s => s.Split('|')).ToDictionary(k => k[0].TrimEnd(' '), v => v[1].TrimStart(' '));
You can then read the results of the LINQ expression with:
foreach (KeyValuePair<string, string> kv in table)
{
Console.WriteLine("Key: " + kv.Key + " Value: " + kv.Value);
}
Also note if you do not need the spaces in your file you can omit the Trim()s
And just because this post was originally tagged C :)
Here is a C example:
With a data file (I called it temp.txt) that looks like this:
3/13/56 | 13.34
3/14/56 | 14.14
3/15/56 | 15.00
3/16/56 | 16.56
3/17/56 | 17.87
3/18/56 | 18.34
3/19/56 | 19.31
3/20/56 | 20.01
3/21/56 | 21.00
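This code will read it and parse it into a single two-dimensional array of strings, char col[2][80][20]:
#include <stdio.h>
#include <string.h>
#define MAX_ROWS 80
#define MAX_LEN 20
int main(void)
{
    int i = 0;
    char *buf;
    char line[260];
    char col[2][MAX_ROWS][MAX_LEN] = {{{0}}};
    FILE *fp = fopen("c:\\dev\\play\\temp.txt", "r");
    if (fp == NULL) return 1; /* could not open the file */
    /* Stop at MAX_ROWS so the array is never overrun */
    while (i < MAX_ROWS && fgets(line, sizeof line, fp))
    {
        /* Split on '|' and on the trailing newline; fields keep their padding spaces */
        buf = strtok(line, "|\n");
        if (buf) snprintf(col[0][i], MAX_LEN, "%s", buf); /* truncates instead of overflowing */
        buf = strtok(NULL, "|\n");
        if (buf) snprintf(col[1][i], MAX_LEN, "%s", buf);
        i++;
    }
    fclose(fp);
    return 0;
}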

Regex Pattern for filter out anything that doesn't Match

Using Regex.Replace(mystring, @"[^MV:0-9]", "") removes any characters that are not M, V, :, or 0-9 (\d could also be used). The problem is that I want to remove anything that is not "MV:" followed by numbers.
I need to replace anything that is not this pattern with nothing:
Starting String | Wanted Result
---------------------------------------------------------
sdhfuiosdhusdhMV:1234567890sdfahuosdho | MV:1234567890
MV:2138911230989hdsafh89ash32893h8u098 | MV:2138911230989
809308ej0efj0934jf0934jf4fj84j8904jf09 | Null
123MV:1234321234mnnnio234324234njiojh3 | MV:1234321234
mdfmsdfuiovvvajio123oij213432ofjoi32mm | Null
But what I get with what I have is:
Starting String | Returned Result
---------------------------------------------------------
sdhfuiosdhusdhMV:1234567890sdfahuosdho | MV:1234567890
MV:2138911230989hdsafh89ash32893h8u098 | MV:213891123098989328938098
809308ej0efj0934jf0934jf4fj84j8904jf09 | 809308009340934484890409
123MV:1234321234mnnnio234324234njiojh3 | 123MV:12343212342343242343
mdfmsdfuiovvvajio123oij213432ofjoi32mm | mmvvv1232134232mm
And even if there is a Regex pattern for this, would I be better off using something along the lines of:
if (Regex.IsMatch(strMyString, @"MV:"))
{
string[] strarMyString = Regex.Split(strMyString, @"MV:");
string[] strarNumbersAfterMV = Regex.Split(strarMyString[1], @"[^\d]");
string WhatIWant = strarNumbersAfterMV[0];
}
If I went with the latter option, would there be a way to have:
string[] strarNumbersAfterMV = Regex.Split(strarMyString[1], @"[^\d]");
only make one split, at the first change from numbers? (It will always start with a number following the MV:)
Can't you just do:
string matchedText = null;
var match = Regex.Match(myString, @"MV:[0-9]+");
if (match.Success)
{
matchedText = match.Value;
}
}
Console.WriteLine((matchedText == null) ? "Not found" : matchedText);
That should give you exactly what you need.
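To check the approach against the table in the question, the match can be wrapped in a small helper. This is only a sketch; the name MvExtractor is hypothetical, and it returns null where the question's table says "Null".

```csharp
using System.Text.RegularExpressions;

public static class MvExtractor
{
    // Returns "MV:" plus the digits that follow it, or null when no such token exists.
    public static string Extract(string input)
    {
        Match match = Regex.Match(input, @"MV:[0-9]+");
        return match.Success ? match.Value : null;
    }
}
```

Regex.Match stops at the first occurrence, so no splitting is needed: the digits end at the first non-digit character after "MV:".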

sorting when name includes letters and numeric digits

I have following array
[0] = GB_22_T0001.jpg
[1] = GB_22_T0002.jpg
[2] = GB_22_T0003.jpg
[3] = GB_22_T0006.jpg
[4] = GB_22_T0007.jpg
[5] = GB_22_T0008.jpg
[6] = GB_22_T0009.jpg
[7] = GB_22_T00010.jpg
[8] = GB_22_T00011.jpg
[9] = GB_22_T00012.jpg
[10] = GB_22_T00013.jpg
I have put these items in a listbox and noticed that 'GB_22_T00010' comes straight after 'GB_22_T0001' instead of 'GB_22_T0002'.
This seems to be a common issue with C#, but I cannot find a common answer to the problem.
I tried sorting the array with Array.Sort(data) and also tried LINQ's OrderBy method, but neither of them helps.
Does anyone have a solution?
This is my code to sort a string having both alpha and numeric characters.
First, this extension method :
public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
return me.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(50, '0')));
}
Then, simply use it anywhere in your code like this:
List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort().ToList();
How does it work? By padding each number with leading zeros before comparing:
Original | Regex Replace | The | Returned
List | Apply PadLeft | Sorting | List
| | |
"The 1st" | "The 001st" | "The 001st" | "The 1st"
"The 12th" | "The 012th" | "The 002nd" | "The 2nd"
"The 2nd" | "The 002nd" | "The 012th" | "The 12th"
It also works with multiple numbers:
Alphabetical Sorting | Alphanumeric Sorting
|
"Page 21, Line 42" | "Page 3, Line 7"
"Page 21, Line 5" | "Page 3, Line 32"
"Page 3, Line 32" | "Page 21, Line 5"
"Page 3, Line 7" | "Page 21, Line 42"
Hope that helps.
GB_22_T0001 is a string not a number. So it's sorted lexicographically instead of numerically. So you need to parse a part of the string to an int.
var ordered = array.Select(Str => new { Str, Parts=Str.Split('_') })
.OrderBy(x => int.Parse(x.Parts.Last().Substring(1)))
.Select(x => x.Str);
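Split('_') splits the string into substrings on the delimiter _. The last substring contains your numeric value. Then I use String.Substring to take only the numeric part (removing the leading T) for int.Parse. This integer is used for Enumerable.OrderBy. The last step is to select just the string instead of the anonymous type. Note that this assumes the strings carry no file extension: with a name such as GB_22_T0001.jpg, int.Parse would throw on "0001.jpg", which the version below avoids.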
Edit: Here is the version that supports Paths:
var ordered = array.Select(str => {
string fileName = Path.GetFileNameWithoutExtension(str);
string[] parts = fileName.Split('_');
int number = int.Parse(parts.Last().Substring(1));
return new{ str, fileName, parts, number };
})
.OrderBy(x => x.number)
.Select(x => x.str);
Windows has a built-in comparison function that you can use to compare strings like this (mix of strings and numbers): StrCmpLogicalW
You can use it as the guts of a IComparer to do your sorting.
This blog entry has many details about this: http://gregbeech.com/blog/natural-sort-order-of-strings-and-files
It works really well.
Edit: The implementation I used based on the above blog:
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Security;
public sealed class NaturalStringComparer : IComparer<string>
{
public static readonly NaturalStringComparer Default = new NaturalStringComparer();
public int Compare(string x, string y)
{
return SafeNativeMethods.StrCmpLogicalW(x, y);
}
}
[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
public static extern int StrCmpLogicalW(string psz1, string psz2);
}
Then to be used using LINQ:
var sortedItems = items.OrderBy(i => i, new NaturalStringComparer());
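StrCmpLogicalW lives in shlwapi.dll, so the comparer above only works on Windows. Where that is a problem, a purely managed comparer is one option. The following is a minimal sketch, not a drop-in replacement: the name ManagedNaturalComparer is hypothetical, it treats only ASCII digits as numbers, compares other characters case-insensitively one at a time, and its ordering may differ from StrCmpLogicalW in edge cases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ManagedNaturalComparer : IComparer<string>
{
    private static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (IsAsciiDigit(x[i]) && IsAsciiDigit(y[j]))
            {
                // Skip leading zeros so "0010" and "10" denote the same number.
                while (i < x.Length && x[i] == '0') i++;
                while (j < y.Length && y[j] == '0') j++;

                int startX = i, startY = j;
                while (i < x.Length && IsAsciiDigit(x[i])) i++;
                while (j < y.Length && IsAsciiDigit(y[j])) j++;

                // More significant digits means a larger number.
                int lenX = i - startX, lenY = j - startY;
                if (lenX != lenY) return lenX.CompareTo(lenY);

                // Same digit count: ordinal order equals numeric order.
                int cmp = string.CompareOrdinal(x, startX, y, startY, lenX);
                if (cmp != 0) return cmp;
            }
            else
            {
                int cmp = char.ToUpperInvariant(x[i]).CompareTo(char.ToUpperInvariant(y[j]));
                if (cmp != 0) return cmp;
                i++;
                j++;
            }
        }

        // Whichever string has characters left over sorts last.
        return (x.Length - i).CompareTo(y.Length - j);
    }
}
```

It plugs into LINQ the same way, for example items.OrderBy(s => s, new ManagedNaturalComparer()). Comparing digit runs by length avoids int.Parse, so very long numbers cannot overflow.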
