Splitting column data into different columns - C#

I have data in a column that I want to split into different columns.
The data in the column is not consistent.
E.g.:
974/mt (ICD TKD)
974/mt (+AD 91.27/mt, ICD/TKD)
970-980/mt
970-980/mt
I have tried with Substring but have not found a solution.
The output should be:
min  | max | unit | description
-----|-----|------|----------------------
NULL | 974 | /mt  | ICD TKD
NULL | 974 | /mt  | +AD 91.27/mt, ICD/TKD
970  | 980 | /mt  | NULL

You can use Regex to parse the information, and then add columns with the parsed data.
Assumptions (the question leaves these points open):
Min Value is optional.
If present, Min Value is followed by a "-" and then Max Value.
Description is optional.
Since the OP hasn't said what to assume when Min Value is missing, I have used the string type for the Min/Max values; ideally they should be replaced by a more suitable data type.
public Sample Split(string columnValue)
{
var regex = new Regex(@"(?<min>\d+-)?(?<max>\d+)(?<unit>[\/a-zA-Z]+)\s?(\((?<description>(.+))\))?",RegexOptions.Compiled);
var match = regex.Match(columnValue);
if(match.Success)
{
return new Sample
{
Min = match.Groups["min"].Value,
Max = match.Groups["max"].Value,
Unit = match.Groups["unit"].Value,
Description = match.Groups["description"].Value
};
}
return default;
}
public class Sample
{
public string Min{get;set;}
public string Max{get;set;}
public string Unit{get;set;}
public string Description{get;set;}
}
For Example,
var list = new []
{
@"974/mt (ICD TKD)",
@"974/mt (+AD 91.27/mt, ICD/TKD)",
@"970-980/mt",
"970-980/mt"
};
foreach(var item in list)
{
var result = Split(item);
Console.WriteLine($"Min={result.Min},Max={result.Max},Unit={result.Unit},Description={result.Description}");
}
Output
Min=,Max=974,Unit=/mt,Description=ICD TKD
Min=,Max=974,Unit=/mt,Description=+AD 91.27/mt, ICD/TKD
Min=970-,Max=980,Unit=/mt,Description=
Min=970-,Max=980,Unit=/mt,Description=
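Note that the output above keeps the hyphen in Min ("970-"), while the question asks for 970. The following is a minimal sketch of one possible variant, not the answerer's code: it moves the hyphen outside the min group and returns nullable integers so a missing Min becomes null. The names PriceParser and Parse are hypothetical, and the pattern assumes the unit always starts with "/" as in the sample rows.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PriceParser
{
    // Hypothetical variant of the pattern above: the hyphen sits outside the
    // "min" group, so "970-980/mt" yields min = "970" rather than "970-".
    private static readonly Regex Pattern = new Regex(
        @"^(?:(?<min>\d+)-)?(?<max>\d+)(?<unit>/[a-zA-Z]+)\s*(?:\((?<description>.+)\))?$");

    // Returns null when the input does not match the assumed format.
    public static (int? Min, int Max, string Unit, string Description)? Parse(string value)
    {
        var match = Pattern.Match(value.Trim());
        if (!match.Success) return null;

        // A group that did not take part in the match has Success == false.
        int? min = match.Groups["min"].Success
            ? int.Parse(match.Groups["min"].Value)
            : (int?)null;
        string description = match.Groups["description"].Success
            ? match.Groups["description"].Value
            : null;
        return (min, int.Parse(match.Groups["max"].Value), match.Groups["unit"].Value, description);
    }

    public static void Main()
    {
        foreach (var item in new[] { "974/mt (ICD TKD)", "974/mt (+AD 91.27/mt, ICD/TKD)", "970-980/mt" })
        {
            var r = Parse(item);
            if (!r.HasValue) { Console.WriteLine("no match: " + item); continue; }
            Console.WriteLine($"{r.Value.Min?.ToString() ?? "NULL"} | {r.Value.Max} | {r.Value.Unit} | {r.Value.Description ?? "NULL"}");
        }
    }
}
```

Using int? for Min is only one option; if the values can be decimals, decimal? with decimal.Parse and a suitably widened pattern would be the analogous choice.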

Related

How to convert from a string to a Flags enum format in C#

I have this enum:
[Flags]
public enum Countries
{
None = 0,
USA = 1,
Mexico = 2,
Canada = 4,
Brazil = 8,
Chile = 16
}
I receive in input strings like these:
string selectedCountries = "Usa, Brazil, Chile";
How can I convert it (in C#) back to:
var myCountries = Countries.USA | Countries.Brazil | Countries.Chile;
Use Enum.Parse.
e.g. Countries c = (Countries)Enum.Parse(typeof(Countries), "Usa, Brazil...", true);
Pass true for ignoreCase, because the enum member is named USA rather than Usa.
This seems to work for me assuming your country string is separated by a comma:
private static Countries ConvertToCountryEnum(string country)
{
return country.Split(',').Aggregate(Countries.None, (seed, c) => (Countries)Enum.Parse(typeof(Countries), c, true) | seed);
}
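If the input may contain names that are not in the enum, Enum.Parse throws an ArgumentException. A minimal sketch of a non-throwing alternative using Enum.TryParse follows; the CountryParser class and the choice to fall back to Countries.None are assumptions made for illustration.

```csharp
using System;

[Flags]
public enum Countries { None = 0, USA = 1, Mexico = 2, Canada = 4, Brazil = 8, Chile = 16 }

public static class CountryParser
{
    // Returns Countries.None when the text cannot be parsed (an assumed fallback).
    public static Countries Parse(string text)
    {
        // ignoreCase: true lets "Usa" match the USA member.
        // Comma-separated names are combined into a single flags value.
        return Enum.TryParse<Countries>(text, true, out var result)
            ? result
            : Countries.None;
    }

    public static void Main()
    {
        Console.WriteLine(Parse("Usa, Brazil, Chile"));      // the combined flags
        Console.WriteLine((int)Parse("Usa, Brazil, Chile")); // their numeric value
        Console.WriteLine(Parse("Atlantis"));                // falls back to None
    }
}
```

Enum.Parse and Enum.TryParse also accept the numeric form, so the string "25" parses to the same combined value.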
Actually, I realized that it is easier than I thought.
All I need to do is convert that string to an int (in my case; a long in general) and cast it to Countries.
The cast turns that number into the expected flags value.
In other words:
(Countries)25 == (Countries.USA | Countries.Brazil | Countries.Chile)

Processing a text file where the fields are not consistent

A vendor is providing a delimited text file but the file can and likely will be custom for each customer. So if the specification provides 100 fields I may only receive 10 fields.
My concern is the overhead of each loop. In all I am using a while and 2 for loops just for the header and there will at least as many for the detail.
My answer is as follows:
using (StreamReader sr = new StreamReader(flName))
{
//Process first line to get field names
flHeader = sr.ReadLine().Split(charDelimiters);
//Check first field to determine header or detail file
if (flHeader[0].ToUpper() == "ORDERID")
{
header = true;
} else if (flHeader[0].ToUpper() == "ORDERITEMID"){
detail = true;
}
}
//Use TextFieldParser to read and parse files
using (TextFieldParser parser = new TextFieldParser(flName))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(delimiters);
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
//Send read line to header or detail processor
if (header == true)
{
if (flHeader[0] != fields[0])
{
ProcessHeader(fields);
}
}
if (detail == true)
{
if (flHeader[0] != fields[0])
{
ProcessDetail(fields);
}
}
}
}
//Header Processor snippet
//Declare header class
Data.BLL.OrderExportHeader_BLL OrderHeaderBLL = new Data.BLL.OrderExportHeader_BLL();
foreach (string field in fields)
{
int fldCnt = fields.Count();
//Loop through each field then use the switch to determine which field is to be filled in
for (int flds = 0; flds < fldCnt; flds++ )
{
string strField = field.Trim();
switch (flHeader[flds].ToUpper())
{
case "ORDERID":
OrderHeaderBLL.OrderID = strField;
break;
}
}
}
//header file
OrderID ManufacturerID CustomerID SalesRepID PONumber OrderDate CustomerName CustomerNumber RepNumber Discount Terms ShipVia Notes ShipToCompanyName ShipToContactName ShipToContactPhone ShipToFax ShipToContactEmail ShipToAddress1 ShipToAddress2 ShipToCity ShipToState ShipToZip ShipToCountry ShipDate BillingAddress1 BillingAddress2 BillingCity BillingState BillingZip BillingCountry FreightTerm PriceLevel OrderType OrderStatus IsPlaced ContactName ContactPhone ContactEmail ContactFax Exported ExportDate Source ContainerName ContainerCubes Origin MarketName FOB SubTotal OrderTotal TaxRate TaxTotal ShippingTotal IsDeleted IsContainer OrderGUID CancelDate DoNotShipBefore WrittenByName WrittenForName WrittenForRepNumber CatalogCode CatalogName ShipToCode
491975 18 0 2621 1234 7/17/2014 RepZio 2499174 0 Test 561-351-7416 max@repzio.com 465 Ocean Ridge Way Juno Beach FL 33408 7/18/2014 465 Ocean Ridge Way Juno Beach FL 33408 USA 0 ShopZio True Max Fraser 561-351-7416 max@repzio.com False ShopZio 0.00 ShopZio 1500.0000 1500.0000 0.000 0.0000 0.0000 False False 63960a7b-86b7-47a2-ad11-9763a6b52fd0 7/31/2014 7/18/2014
Your sample data is the key, and your sample is currently obscure, but I think it matches the description that follows.
Per your example of 10 fields out of a possible 100.
When parsing each line, you only need to split it into 10 fields. It looks like your data is delimited by whitespace, but that is a problem because fields can contain embedded whitespace. Perhaps your data is actually tab delimited, in which case you are OK.
For simplicity, I am going to assume your 100 fields are named 'fld0', 'fld1', ..., 'fld99'.
Now, assuming the received file contains this header
fld10, fld50, fld0, fld20, fld80, fld70, fld0, fld90, fld50, fld60
and a line of data looks like
Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet
e.g.
split[0] = "Alpha", split[1] = "Bravo", etc.
You parse the header and find that the indexes in your master list of 100 fields are 10,50,0 etc.
So you build a lookupFld array with these index values, i.e., lookupFld[0] = 10, lookupFld[1] = 50, etc.
Now, as you process each line, split it into 10 fields and you have an immediate indexed lookup of the correct corresponding field in your master field list.
Now MasterList[0] = "fld0", MasterList[1] = "fld1", ..., MasterList[99] = "fld99"
for (int ii = 0; ii < lookupFld.Length; ++ii)
{
// MasterList[lookupFld[ii]] is represented by split[ii]
// when ii = 0
// lookupFld[0] is 10
// so MasterList[10] /* fld10 */ is represented by split[0] /* Alpha */
}
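The lookup idea above can be sketched as follows. This is only an illustration under stated assumptions: the HeaderLookup class, the Build method, the -1 marker for unknown columns, and the tab delimiter are all hypothetical choices, not part of the vendor's specification.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderLookup
{
    // Maps each column position in the received file to its index in the master list.
    // Columns whose name is not in the master list get -1 (an assumed convention).
    public static int[] Build(string[] masterList, string[] receivedHeader)
    {
        var masterIndex = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < masterList.Length; i++)
            masterIndex[masterList[i]] = i;

        var lookupFld = new int[receivedHeader.Length];
        for (int i = 0; i < receivedHeader.Length; i++)
            lookupFld[i] = masterIndex.TryGetValue(receivedHeader[i].Trim(), out int idx) ? idx : -1;
        return lookupFld;
    }

    public static void Main()
    {
        // Hypothetical master list: fld0 .. fld99.
        var masterList = new string[100];
        for (int i = 0; i < masterList.Length; i++) masterList[i] = "fld" + i;

        string[] header = "fld10,fld50,fld0".Split(',');
        string[] split = "Alpha\tBravo\tCharlie".Split('\t'); // one data line, tab delimited

        int[] lookupFld = Build(masterList, header);          // built once per file
        var record = new string[masterList.Length];           // one slot per master field
        for (int ii = 0; ii < lookupFld.Length && ii < split.Length; ii++)
            if (lookupFld[ii] >= 0) record[lookupFld[ii]] = split[ii];

        Console.WriteLine($"{record[10]} {record[50]} {record[0]}");
    }
}
```

The header is examined once, so each data line costs a single pass over its fields with an array index per field, instead of a switch on the field name inside nested loops.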

Regex Pattern for filter out anything that doesn't Match

Using Regex.Replace(mystring, @"[^MV:0-9]", "") will remove any characters that are not M, V, :, or 0-9 (\d could also be used). The problem is that I want to remove anything that is not "MV:" followed by numbers.
I need to replace anything that is not this pattern with nothing:
Starting String | Wanted Result
---------------------------------------------------------
sdhfuiosdhusdhMV:1234567890sdfahuosdho | MV:1234567890
MV:2138911230989hdsafh89ash32893h8u098 | MV:2138911230989
809308ej0efj0934jf0934jf4fj84j8904jf09 | Null
123MV:1234321234mnnnio234324234njiojh3 | MV:1234321234
mdfmsdfuiovvvajio123oij213432ofjoi32mm | Null
But what I get with what I have is:
Starting String | Returned Result
---------------------------------------------------------
sdhfuiosdhusdhMV:1234567890sdfahuosdho | MV:1234567890
MV:2138911230989hdsafh89ash32893h8u098 | MV:213891123098989328938098
809308ej0efj0934jf0934jf4fj84j8904jf09 | 809308009340934484890409
123MV:1234321234mnnnio234324234njiojh3 | 123MV:12343212342343242343
mdfmsdfuiovvvajio123oij213432ofjoi32mm | mmvvv1232134232mm
And even if there is a Regex pattern for this, would I be better off using something along the lines of:
if (Regex.IsMatch(strMyString, @"MV:"))
{
string[] strarMyString = Regex.Split(strMyString, @"MV:");
string[] strarNumbersAfterMV = Regex.Split(strarMyString[1], @"[^\d]");
string WhatIWant = strarNumbersAfterMV[0];
}
If I went with the latter option, would there be a way to have:
string[] strarNumbersAfterMV = Regex.Split(strarMyString[1], @"[^\d]");
only make one split at the first change from numbers? (It will always start with a number following the MV:)
Can't you just do:
string matchedText = null;
var match = Regex.Match(myString, @"MV:[0-9]+");
if (match.Success)
{
matchedText = match.Value;
}
Console.WriteLine(matchedText ?? "Not found");
That should give you exactly what you need.
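As a quick check, here is a small self-contained sketch that applies the same pattern to the five sample strings from the question. The MvExtractor class and Extract method are hypothetical names; the pattern is the one suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MvExtractor
{
    // Returns "MV:" plus the digits that directly follow it, or null when absent.
    public static string Extract(string input)
    {
        var match = Regex.Match(input, @"MV:[0-9]+");
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            "sdhfuiosdhusdhMV:1234567890sdfahuosdho",
            "MV:2138911230989hdsafh89ash32893h8u098",
            "809308ej0efj0934jf0934jf4fj84j8904jf09",
            "123MV:1234321234mnnnio234324234njiojh3",
            "mdfmsdfuiovvvajio123oij213432ofjoi32mm"
        };
        foreach (var s in samples)
            Console.WriteLine($"{s} | {Extract(s) ?? "Null"}");
    }
}
```

Because the pattern matches rather than replaces, the digit run stops at the first non-digit, which also covers the "only one split" concern in the question. The match is case-sensitive, so a lowercase "mv:" would not be picked up.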

sorting when name includes letters and numeric digits

I have following array
[0] = GB_22_T0001.jpg
[1] = GB_22_T0002.jpg
[2] = GB_22_T0003.jpg
[3] = GB_22_T0006.jpg
[4] = GB_22_T0007.jpg
[5] = GB_22_T0008.jpg
[6] = GB_22_T0009.jpg
[7] = GB_22_T00010.jpg
[8] = GB_22_T00011.jpg
[9] = GB_22_T00012.jpg
[10] = GB_22_T00013.jpg
I have put these items in a listbox and noticed that 'GB_22_T00010' comes straight after 'GB_22_T0001' instead of 'GB_22_T0002'.
This seems to be a common issue with C#, but I cannot find a common answer to the problem.
I tried sorting the array with Array.Sort(data) and also tried LINQ's OrderBy method, but neither of them helps.
Does anyone have a solution?
This is my code to sort a string having both alpha and numeric characters.
First, this extension method :
public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
return me.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(50, '0')));
}
Then, simply use it anywhere in your code like this:
List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort().ToList();
How does it work? By left-padding every number with zeros (the table pads to 3 digits for readability; the code above pads to 50):
Original   | Regex Replace | The         | Returned
List       | Apply PadLeft | Sorting     | List
           |               |             |
"The 1st"  | "The 001st"   | "The 001st" | "The 1st"
"The 12th" | "The 012th"   | "The 002nd" | "The 2nd"
"The 2nd"  | "The 002nd"   | "The 012th" | "The 12th"
It also works with multiple numbers:
Alphabetical Sorting | Alphanumeric Sorting
                     |
"Page 21, Line 42"   | "Page 3, Line 7"
"Page 21, Line 5"    | "Page 3, Line 32"
"Page 3, Line 32"    | "Page 21, Line 5"
"Page 3, Line 7"     | "Page 21, Line 42"
Hope that helps.
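To see the padding approach on the file names from the question, here is a minimal self-contained sketch. The NaturalOrder class, the Sort method, and the width parameter are hypothetical; the explicit StringComparer.Ordinal is an added assumption to keep the ordering independent of the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NaturalOrder
{
    // Sorts by a key in which every digit run is left-padded with zeros to 'width'
    // characters, so "T0002" sorts before "T00010". Digit runs longer than 'width'
    // are not padded and may sort incorrectly.
    public static List<string> Sort(IEnumerable<string> items, int width = 50)
    {
        return items
            .OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(width, '0')),
                     StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var files = new[]
        {
            "GB_22_T00010.jpg", "GB_22_T0002.jpg", "GB_22_T00011.jpg",
            "GB_22_T0001.jpg", "GB_22_T0009.jpg"
        };
        foreach (var name in Sort(files))
            Console.WriteLine(name);
    }
}
```

The original strings are returned unchanged; the padded form is used only as the sort key.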
GB_22_T0001 is a string not a number. So it's sorted lexicographically instead of numerically. So you need to parse a part of the string to an int.
var ordered = array.Select(Str => new { Str, Parts=Str.Split('_') })
.OrderBy(x => int.Parse(x.Parts.Last().Substring(1)))
.Select(x => x.Str);
Split('_') splits the string into substrings on the delimiter _. The last substring contains your numeric value. Then I use String.Substring to take only the numeric part (removing the leading T) for int.Parse. This integer is used for Enumerable.OrderBy. The last step is to select just the string instead of the anonymous type.
Edit: Here is the version that supports paths and file extensions:
var ordered = array.Select(str => {
string fileName = Path.GetFileNameWithoutExtension(str);
string[] parts = fileName.Split('_');
int number = int.Parse(parts.Last().Substring(1));
return new{ str, fileName, parts, number };
})
.OrderBy(x => x.number)
.Select(x => x.str);
Windows has a built-in comparison function that you can use to compare strings like this (a mix of text and numbers): StrCmpLogicalW
You can use it as the guts of an IComparer to do your sorting.
This blog entry has many details about this: http://gregbeech.com/blog/natural-sort-order-of-strings-and-files
It works really well.
Edit: The implementation I used based on the above blog:
public sealed class NaturalStringComparer : IComparer<string>
{
public static readonly NaturalStringComparer Default = new NaturalStringComparer();
public int Compare(string x, string y)
{
return SafeNativeMethods.StrCmpLogicalW(x, y);
}
}
[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
public static extern int StrCmpLogicalW(string psz1, string psz2);
}
Then use it with LINQ:
var sortedItems = items.OrderBy(i => i, new NaturalStringComparer());

How to parse delimited files to be compared

There is a text file formatted like the example below that I need to search for a student's class name:
Michael | Straham | Eng101(4.0) | Mth303
Jacob | Black | SCI 210 (2.3) | Eng101
Ian | Summers | Mth303(3.30) | Sci 210
The delimiter is ( | ).
The class names are "ENG101, SCI210, MTH303." I would like to search each line of the text for that class name and somehow index the matches so that they can be compared. The end result would be this:
ENG101:
Michael Straham, Jacob Black
Please assist. Thanks in advance!
I'm assuming you're already reading in the input line by line.
You can use String.Split() to accomplish (the first part of) what you are trying to do.
For example, the following code
String s1 = "Michael | Straham | Eng101(4.0) | Mth303";
char[] separators = { '|' };
String[] values = s1.Split(separators);
would give you an array of 4 strings ("Michael ", " Straham ", " Eng101(4.0) ", " Mth303"); note that Split keeps the spaces around each value, so Trim them before use. You can then analyze the values array to see who is in which class. I'd probably have code that looks roughly like this (in pseudocode):
foreach (line in input)
{
String s1 = line;
char[] separators = { '|' };
String[] values = s1.Split(separators);
String firstName = values[0];
String lastName = values[1];
for (i = 2; i < values.Length; i++)
{
if (values[i] looks like "ENG101")
{
add firstName lastName to "ENG101" student list
}
else if (values[i] looks like "MTH303")
{
....
}
....
}
}
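The pseudocode above can be sketched as runnable C#. This is one possible reading, not a definitive implementation: the ClassRoster class and Build method are hypothetical, and "looks like" is interpreted as letters followed by digits, ignoring case, inner spaces, and any trailing "(grade)".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClassRoster
{
    // Assumed class-code shape: letters, optional spaces, digits.
    // Anything after the digits, such as "(4.0)", is ignored.
    private static readonly Regex ClassCode = new Regex(@"^\s*([A-Za-z]+)\s*(\d+)");

    public static SortedDictionary<string, List<string>> Build(IEnumerable<string> lines)
    {
        var roster = new SortedDictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            string[] values = line.Split('|');
            if (values.Length < 3) continue; // need first name, last name, one class
            string student = values[0].Trim() + " " + values[1].Trim();

            foreach (var field in values.Skip(2))
            {
                var m = ClassCode.Match(field);
                if (!m.Success) continue;
                // Normalise "SCI 210 (2.3)" and "Sci 210" to "SCI210".
                string code = m.Groups[1].Value.ToUpperInvariant() + m.Groups[2].Value;
                if (!roster.TryGetValue(code, out var students))
                    roster[code] = students = new List<string>();
                students.Add(student);
            }
        }
        return roster;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "Michael | Straham | Eng101(4.0) | Mth303",
            "Jacob | Black | SCI 210 (2.3) | Eng101",
            "Ian | Summers | Mth303(3.30) | Sci 210"
        };
        foreach (var entry in Build(lines))
        {
            Console.WriteLine(entry.Key + ":");
            Console.WriteLine(string.Join(", ", entry.Value));
        }
    }
}
```

A dictionary keyed by the normalised class code avoids one if/else branch per class, so new class names need no code change.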
