Get specific char from a string request - c#

I need to get 4 variable from a string
That's a resquest someone can make to a server :
String request= "Jouer_un_bulletin <nbT> <mise> <numeros_grilleA> <numeros_grilleB>"
I need to get nbt, mise, numeros_grilleA and numeros_grilleB

You can try using regular expressions:
String request = "Jouer_un_bulletin <nbT> <mise> <numeros_grilleA> <numeros_grilleB>";
String pattern = #"(?<=<)[^<>]*(?=>)";
String[] prms = Regex
.Matches(request, pattern)
.OfType<Match>()
.Select(match => match.Value)
.ToArray();
Test:
// nbT
// mise
// numeros_grilleA
// numeros_grilleB
Console.Write(String.Join(Environment.NewLine, prms));

You need to parse the request. How to do this is entirely based on what you expect to receive and how you validate it. Assuming (and this is a big assumption) that you will always get the format above, you can use String.Split to split the string on the open bracket character, and then take the component pieces (ignoring the first one) and trim off the close bracket and any additional spaces. These will be your variables. This is not, by any means, a good way of sending data, and you should at a minimum validate this data a lot before using it.
The very basic (and by all means this is terrible code you shouldn't use) concept is:
String request= "Jouer_un_bulletin <nbT> <mise> <numeros_grilleA> <numeros_grilleB>";
var pieces = request.Split('<');
var strList = new List<string>();
for(int i = 1 ; i < pieces.Length; i++)
{
strList.Add(pieces[i].Trim(' ','>'));
}

Related

How to find one of many possible substrings in a larger string?

I have a simple problem, but I could not find a simple solution yet.
I have a string containing for example this
UNB+123UNH+234BGM+345DTM+456
The actual string is lots larger, but you get the idea
now I have a set of values I need to find in this string
for example UNH and BGM and DTM and so on
So I need to search in the large string, and find the position of the first set of values.
something like this (not existing but to explain the idea)
string[] chars = {"UNH", "BGM", "DTM" };
int pos = test.IndexOfAny(chars);
in this case pos would be 8 because from all 3 substrings, UNH is the first occurrence in the variable test
What I actually trying to accomplish is splitting the large string into a list of strings, but the delimiter can be one of many values ("BGM", "UNH", "DTM")
So the result would be
UNB+123
UNH+234
BGM+345
DTM+456
I can off course build a loop that does IndexOf for each of the substrings, and then remember the smallest value, but that seems so inefficient. I am hoping for a better way to do this
EDIT
the substrings to search for are always 3 letters, but the text in between can be anything at all with any length
EDIT
It are always 3 alfanumeric characters, and then anything can be there, also lots of + signs
You will find more problems with EDI than just splitting into corresponding fields, what about conditions or multiple values or lists?. I recommend you to take a look at EDI.net
EDIT:
EDIFact is a format pretty complex to just use regex, as I mentioned before, you will have conditions for each format/field/process, you will need to catch the whole field in order to really parse it, means as example DTM can have one specific datetime format and in another EDI can have a DateTime format totally different.
However, this is the structure of a DTM field:
DTM DATE/TIME/PERIOD
Function: To specify date, and/or time, or period.
010 C507 DATE/TIME/PERIOD M 1
2005 Date or time or period function code
qualifier M an..3
2380 Date or time or period text C an..35
2379 Date or time or period format code C an..3
So you will have always something like 'DTM+d3:d35:d3' to search for.
Really, it doesn't worth the struggle, use EDI.net, create your own POCO classes and work from there.
Friendly reminder that EDIFact changes every 6 months on Europe.
If the separators can be any one of UNB, UNH, BGM, or DTM, the following Regex could work:
foreach (Match match in Regex.Matches(input, #"(UNB|UNH|BGM|DTM).+?(?=(UNB|UNH|BGM|DTM)|$)"))
{
Console.WriteLine(match.Value);
}
Explanation:
(UNB|UNH|BGM|DTM) matches either of the separators
.+? matches any string with at least one character (but as short as possible)
(?=(UNB|UNH|BGM|DTM)|$) matches if either a separator follows or if the string ends there - the match is however not included in the value.
It sounds like the other answer recognises the format - you should definitely consider a library specifically for parsing this format!
If you're intent on parsing it yourself, you could simply find the index of your identifiers in the string, determine the first 2 by position, and use those positions to Substring the original input
var input = "UNB+123UNH+234BGM+345DTM+456";
var chars = new[]{"UNH", "BGM", "DTM" };
var indexes = chars.Select(c => new{Length=c.Length,Position= input.IndexOf(c)}) // Get position and length of each input
.Where(x => x.Position>-1) // where there is actually a match
.OrderBy(x =>x.Position) // put them in order of the position in the input
.Take(2) // only interested in first 2
.ToArray(); // make it an array
if(indexes.Length < 2)
throw new Exception("Did not find 2");
var result = input.Substring(indexes[0].Position + indexes[0].Length, indexes[1].Position - indexes[0].Position - indexes[0].Length);
Live example: https://dotnetfiddle.net/tDiQLG
There is already a lot of answers here, but I took the time to write mine so might as well post it even if it's not as elegant.
The code assumes all tags are accounted for in the chars array.
string str = "UNB+123UNH+234BGM+345DTM+456";
string[] chars = { "UNH", "BGM", "DTM" };
var locations = chars.Select(o => str.IndexOf(o)).Where(i => i > -1).OrderBy(o => o);
var resultList = new List<string>();
for(int i = 0;i < locations.Count();i++)
{
var nextIndex = locations.ElementAtOrDefault(i + 1);
nextIndex = nextIndex > 0 ? nextIndex : str.Length;
nextIndex = nextIndex - locations.ElementAt(i);
resultList.Add(str.Substring(locations.ElementAt(i), nextIndex));
}
This is a fairly efficient O(n) solution using a HashSet
It's extremely simple, low allocations, more efficient than regex, and doesn't need a library
Given
private static HashSet<string> _set;
public static IEnumerable<string> Split(string input)
{
var last = 0;
for (int i = 0; i < input.Length-3; i++)
{
if (!_set.Contains(input.Substring(i, 3))) continue;
yield return input.Substring(last, i - last);
last = i;
}
yield return input.Substring(last);
}
Usage
_set = new HashSet<string>(new []{ "UNH", "BGM", "DTM" });
var results = Split("UNB+123UNH+234BGM+345DTM+456");
foreach (var item in results)
Console.WriteLine(item);
Output
UNB+123
UNH+234
BGM+345
DTM+456
Full Demo Here
Note : You could get this faster with a simple sorted tree, but would require more effort

How can i analise millions of strings that merge into each other?

I have millions of strings, around 8GB worth of HEX; each string is 3.2kb in length.
Each of these strings contains multiple parts of data I need to extract.
This is an example of one such string:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ$GPGGA,104646.091,,,,,0,0,,,M,,M,,*41$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test2ÿÿ3ÿÿ4ÿÿ5ÿÿ6ÿÿ7ÿÿ8ÿÿ9ÿÿ:ÿÿ;ÿÿ<ÿÿ=ÿÿ>ÿÿ?ÿÿ#ÿÿAÿÿBÿÿCÿÿDÿÿEÿÿFÿÿGÿÿHÿÿIÿÿJÿÿ$GPGGA,104647.091,,,,,0,0,,,M,,M,,*40$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header TestKÿÿLÿÿMÿÿNÿÿOÿÿPÿÿQÿÿRÿÿSÿÿTÿÿUÿÿVÿÿWÿÿXÿÿYÿÿZÿÿ[ÿÿ\ÿÿ]ÿÿ^ÿÿ_ÿÿ`ÿÿaÿÿbÿÿcÿÿ$GPGGA,104648.091,,,,,0,0,,,M,,M,,*4F$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Testdÿÿeÿÿfÿÿgÿÿhÿÿiÿÿjÿÿkÿÿlÿÿmÿÿnÿÿoÿÿpÿÿqÿÿrÿÿsÿÿtÿÿuÿÿvÿÿwÿÿxÿÿyÿÿzÿÿ{ÿÿ|ÿÿ$GPGGA,104649.091,,,,,0,0,,,M,,M,,*4E$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test}ÿÿ~ÿÿ.ÿÿ€ÿÿ.ÿÿ‚ÿÿƒÿÿ„ÿÿ…ÿÿ†ÿÿ‡ÿÿˆÿÿ‰ÿÿŠÿÿ‹ÿÿŒÿÿ.ÿÿŽÿÿ.ÿÿ.ÿÿ‘ÿÿ’ÿÿ“ÿÿ”ÿÿ•ÿÿ$GPGGA,104650.091,,,,,0,0,,,M,,M,,*46$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Head
as you can see it is pretty much this repeated:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
I want to separate this string into two lists like this:
_GPSList
$GPGGA,104644.091,,,,,0,0,,,M,,M,,*43
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N
_WavList
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
Issue 1:
This repetition isn't containing within a single string, it overflows into the next string. so if some data crosses the end and start of two strings how to I deal with that?
Issue 2: How do I analyse the string and extract only the parts I need?
The solution I'm providing is not a complete answer but more like an idea which might help you get what you want.
Everything else which I present is an assumption on my behalf.
//Assuming your data is stored in a file "yourdatafile"
//Splitting all the text on "$" assuming this will separate GPSData
string[] splittedstring = File.ReadAllText("yourdatafile").Split('$');
//I found an extra string lingering in the sample you provided
//because I splitted on "$", so you gotta take that into account
var GPSList = new List<string>();
var WAVList = new List<string>();
foreach (var str in splittedstring)
{
//So if the string contains "Header" we would want to separate it from GPS data
if (str.Contains("Header"))
{
string temp = str.Remove(str.IndexOf("Header"));
int indexOfAsterisk = temp.LastIndexOf("*");
string stringBeforeAsterisk = str.Substring(0, indexOfAsterisk + 1);
string stringAfterAsterisk = str.Replace(stringBeforeAsterisk, "");
WAVList.Add(stringAfterAsterisk);
GPSList.Add("$" + stringBeforeAsterisk);
}
else
GPSList.Add("$" + str);
}
This provides the exact output as you need, only exception is with that extra string. Also some non-standard characters might look like black blocks.

Parse Line and Break it into Variables

I have a text file that contain only the FULL version number of an application that I need to extract and then parse it into separate Variables.
For example lets say the version.cs contains 19.1.354.6
Code I'm using does not seem to be working:
char[] delimiter = { '.' };
string currentVersion = System.IO.File.ReadAllText(#"C:\Applicaion\version.cs");
string[] partsVersion;
partsVersion = currentVersion.Split(delimiter);
string majorVersion = partsVersion[0];
string minorVersion = partsVersion[1];
string buildVersion = partsVersion[2];
string revisVersion = partsVersion[3];
Altough your problem is with the file, most likely it contains other text than a version, why dont you use Version class which is absolutely for this kind of tasks.
var version = new Version("19.1.354.6");
var major = version.Major; // etc..
What you have works fine with the correct input, so I would suggest making sure there is nothing else in the file you're reading.
In the future, please provide error information, since we can't usually tell exactly what you expect to happen, only what we know should happen.
In light of that, I would also suggest looking into using Regex for parsing in the future. In my opinion, it provides a much more flexible solution for your needs. Here's an example of regex to use:
var regex = new Regex(#"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9])");
var match = regex.Match("19.1.354.6");
if (match.Success)
{
Console.WriteLine("Match[1]: "+match.Groups[1].Value);
Console.WriteLine("Match[2]: "+match.Groups[2].Value);
Console.WriteLine("Match[3]: "+match.Groups[3].Value);
Console.WriteLine("Match[4]: "+match.Groups[4].Value);
}
else
{
Console.WriteLine("No match found");
}
which outputs the following:
// Match[1]: 19
// Match[2]: 1
// Match[3]: 354
// Match[4]: 6

Regex matching dynamic words within an html string

I have an html string to work with as follows:
string html = new MvcHtmlString(item.html.ToString()).ToHtmlString();
There are two different types of text I need to match although very similar. I need the initial ^^ removed and the closing |^^ removed. Then if there are multiple clients I need the ^ separating clients changed to a comma(,).
^^Client One- This text is pretty meaningless for this task, but it will exist in the real document.|^^
^^Client One^Client Two^Client Three- This text is pretty meaningless for this task, but it will exist in the real document.|^^
I need to be able to match each client and make it bold.
Client One- This text is pretty meaningless for this task, but it will exist in the real document.
Client One, Client Two, Client Three- This text is pretty meaningless for this task, but it will exist in the real document.
A nice stack over flow user provided the following but I could not get it to work or find any matches when I tested it on an online regex tester.
const string pattern = #"\^\^(?<clients>[^-]+)(?<text>-.*)\|\^\^";
var result = Regex.Replace(html, pattern,
m =>
{
var clientlist = m.Groups["clients"].Value;
var newClients = string.Join(",", clientlist.Split('^').Select(s => string.Format("<strong>{0}</strong>", s)));
return newClients + m.Groups["text"];
});
I am very new to regex so any help is appreciated.
I'm new to C# so forgive me if I make rookie mistakes :)
const string pattern = #"\^\^([^-]+)(-[^|]+)\|\^\^";
var temp = Regex.Replace(html, pattern, "<strong>$1</strong>$2");
var result = Regex.Replace(temp, #"\^", "</strong>, <strong>");
I'm using $1 even though MSDN is vague about using that syntax to reference subgroups.
Edit: if it's possible that the text after - contains a ^ you can do this:
var result = Regex.Replace(temp, #"\^(?=.*-)", "</strong>, <strong>");

cutting from string in C#

My strings look like that: aaa/b/cc/dd/ee . I want to cut first part without a / . How can i do it? I have many strings and they don't have the same length. I tried to use Substring(), but what about / ?
I want to add 'aaa' to the first treeNode, 'b' to the second etc. I know how to add something to treeview, but i don't know how can i receive this parts.
Maybe the Split() method is what you're after?
string value = "aaa/b/cc/dd/ee";
string[] collection = value.Split('/');
Identifies the substrings in this instance that are delimited by one or more characters specified in an array, then places the substrings into a String array.
Based on your updates related to a TreeView (ASP.Net? WinForms?) you can do this:
foreach(string text in collection)
{
TreeNode node = new TreeNode(text);
myTreeView.Nodes.Add(node);
}
Use Substring and IndexOf to find the location of the first /
To get the first part:
// from memory, need to test :)
string output = String.Substring(inputString, 0, inputString.IndexOf("/"));
To just cut the first part:
// from memory, need to test :)
string output = String.Substring(inputString,
inputString.IndexOf("/"),
inputString.Length - inputString.IndexOf("/");
You would probably want to do:
string[] parts = "aaa/b/cc/dd/ee".Split(new char[] { '/' });
Sounds like this is a job for... Regular Expressions!
One way to do it is by using string.Split to split your string into an array, and then string.Join to make whatever parts of the array you want into a new string.
For example:
var parts = input.Split('/');
var processedInput = string.Join("/", parts.Skip(1));
This is a general approach. If you only need to do very specific processing, you can be more efficient with string.IndexOf, for example:
var processedInput = input.Substring(input.IndexOf('/') + 1);

Categories

Resources