Counting a string with special characters in a string in C#

I would like to count the occurrences of a string (search term) in another string (log file).
Splitting the string with the Split method and then searching the resulting array is too inefficient for me, because the log file is very large.
Online I found the following approach, which has worked quite well so far:
count = Regex.Matches(_editor.Text, txtLookFor.Text, RegexOptions.IgnoreCase).Count;
However, I am now running into another problem: I get the following error when I count a string such as "Nachricht erhalten (" ("message received (").
Error message:
System.ArgumentException: "Nachricht erhalten (" analysed - not enough )-characters.

You need to escape the ( symbol, because it has a special meaning in regular expressions:
var test = Regex.Matches("Nachricht erhalten (3)", @"Nachricht erhalten \(", RegexOptions.IgnoreCase).Count;
If the search term comes from user input and the user is not familiar with regular expressions, you are probably better off using IndexOf in a while loop, where each search starts after the match found in the previous iteration. That may also perform a bit better than a regular expression. Example:
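var test = "This is a test";
var searchFor = "is";
var count = 0;
// OrdinalIgnoreCase is case-insensitive like RegexOptions.IgnoreCase, without culture rules;
// an empty search term would otherwise loop forever, so it is treated as "not found"
var index = searchFor.Length == 0 ? -1 : test.IndexOf(searchFor, 0, StringComparison.OrdinalIgnoreCase);
while (index != -1)
{
    ++count;
    // resume the search right after the previous match (matches do not overlap)
    index = test.IndexOf(searchFor, index + searchFor.Length, StringComparison.OrdinalIgnoreCase);
}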
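If you would rather keep the original Regex.Matches one-liner, a minimal sketch is to pass the search term through Regex.Escape, which escapes regex metacharacters such as ( so the user's text is matched literally. logText and searchTerm below are hypothetical stand-ins for _editor.Text and txtLookFor.Text from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for _editor.Text and txtLookFor.Text
string logText = "Nachricht erhalten (1)\nNACHRICHT ERHALTEN (2)\nsomething else";
string searchTerm = "Nachricht erhalten (";

// Regex.Escape turns "(" into "\(", so the search term is treated as plain text
int count = Regex.Matches(logText, Regex.Escape(searchTerm), RegexOptions.IgnoreCase).Count;

Console.WriteLine(count); // 2
```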

Related

How to find one of many possible substrings in a larger string?

I have a simple problem, but I could not find a simple solution yet.
I have a string containing, for example, this:
UNB+123UNH+234BGM+345DTM+456
The actual string is much larger, but you get the idea.
Now I have a set of values I need to find in this string,
for example UNH, BGM, DTM and so on.
So I need to search the large string and find the position of the first occurrence of any of these values,
something like this (this method does not exist; it's just to explain the idea):
string[] chars = {"UNH", "BGM", "DTM" };
int pos = test.IndexOfAny(chars);
In this case pos would be 7 (zero-based), because of the 3 substrings, UNH is the first to occur in the variable test.
What I am actually trying to accomplish is splitting the large string into a list of strings, where the delimiter can be one of many values ("BGM", "UNH", "DTM").
So the result would be:
UNB+123
UNH+234
BGM+345
DTM+456
I can of course build a loop that calls IndexOf for each of the substrings and then remembers the smallest value, but that seems so inefficient. I am hoping for a better way to do this.
EDIT
The substrings to search for are always 3 letters, but the text in between can be anything at all, of any length.
EDIT
They are always 3 alphanumeric characters, followed by anything, including lots of + signs.
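For reference, the "IndexOf for each substring, keep the smallest" loop described in the question might be sketched as follows; IndexOfAnyString is a hypothetical helper name, not a .NET API. Each IndexOf call is a linear scan, so the cost is roughly the string length times the number of values, which is usually acceptable for a handful of 3-letter tags.

```csharp
using System;

// Hypothetical helper: smallest index at which any of the values occurs, or -1 if none occurs
static int IndexOfAnyString(string text, string[] values)
{
    int best = -1;
    foreach (string value in values)
    {
        // Ordinal comparison: the tags are plain codes, not culture-sensitive text
        int index = text.IndexOf(value, StringComparison.Ordinal);
        if (index >= 0 && (best < 0 || index < best))
            best = index;
    }
    return best;
}

string test = "UNB+123UNH+234BGM+345DTM+456";
string[] chars = { "UNH", "BGM", "DTM" };
int pos = IndexOfAnyString(test, chars);
Console.WriteLine(pos); // 7
```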

You will run into more problems with EDI than just splitting it into the corresponding fields: what about conditions, multiple values or lists? I recommend you take a look at EDI.net.
EDIT:
EDIFACT is too complex a format to handle with regex alone. As I mentioned before, you will have conditions for each format/field/process, and you will need to capture the whole field in order to really parse it. For example, DTM can have one specific date/time format in one EDI message and a totally different date/time format in another.
However, this is the structure of a DTM field:
DTM   DATE/TIME/PERIOD
Function: To specify date, and/or time, or period.
010   C507  DATE/TIME/PERIOD                                 M  1
      2005  Date or time or period function code qualifier   M  an..3
      2380  Date or time or period text                      C  an..35
      2379  Date or time or period format code               C  an..3
(M = mandatory, C = conditional; an..3 = alphanumeric, up to 3 characters)
So you will always have something like 'DTM+d3:d35:d3' to search for.
Really, it isn't worth the struggle: use EDI.net, create your own POCO classes and work from there.
Friendly reminder that EDIFACT changes every 6 months in Europe.
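Purely to illustrate the DTM structure above (not as a replacement for a proper EDIFACT library), here is a sketch that splits one DTM segment into its C507 components. It assumes the default separators ('+' between data elements, ':' between components), ignores the '?' release character, and uses a hypothetical sample segment.

```csharp
using System;

// Hypothetical sample segment; assumes '+' and ':' separators and no '?' escaping
string segment = "DTM+137:20160829:102";

string[] elements = segment.Split('+');
if (elements[0] != "DTM" || elements.Length < 2)
    throw new FormatException("Not a DTM segment");

// C507 composite: qualifier (2005, mandatory), text (2380, conditional), format code (2379, conditional)
string[] c507 = elements[1].Split(':');
string qualifier = c507[0];
string value = c507.Length > 1 ? c507[1] : "";
string formatCode = c507.Length > 2 ? c507[2] : "";

Console.WriteLine($"qualifier={qualifier}, value={value}, format={formatCode}");
```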

If the separators can be any one of UNB, UNH, BGM, or DTM, the following Regex could work:
foreach (Match match in Regex.Matches(input, @"(UNB|UNH|BGM|DTM).+?(?=(UNB|UNH|BGM|DTM)|$)"))
{
    Console.WriteLine(match.Value);
}
Explanation:
(UNB|UNH|BGM|DTM) matches any one of the separators
.+? matches at least one character, but as few as possible (lazy)
(?=(UNB|UNH|BGM|DTM)|$) is a lookahead: it succeeds if either a separator or the end of the string follows, but the text it looks at is not included in the match value.
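A related alternative, swapped in here rather than taken from the answer above, is Regex.Split with a zero-width lookahead, which splits just before each tag without consuming it. A minimal sketch, assuming the four tags from the question; empty entries are filtered out because a tag at the very start of the input produces an empty first element. The pattern has no capturing groups, so Regex.Split does not add separator text to the result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "UNB+123UNH+234BGM+345DTM+456";

// (?=...) matches the empty position in front of each tag, so the tag stays in its piece
string[] pieces = Regex.Split(input, "(?=UNB|UNH|BGM|DTM)")
                       .Where(p => p.Length > 0)   // drop the empty piece before a leading tag
                       .ToArray();

foreach (string piece in pieces)
    Console.WriteLine(piece);
// UNB+123
// UNH+234
// BGM+345
// DTM+456
```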

It sounds like the other answer recognises the format - you should definitely consider a library specifically for parsing this format!
If you're intent on parsing it yourself, you could simply find the index of each identifier in the string, determine the first 2 by position, and use those positions to take a Substring of the original input:
using System;
using System.Linq;

var input = "UNB+123UNH+234BGM+345DTM+456";
var chars = new[] { "UNH", "BGM", "DTM" };
var indexes = chars
    .Select(c => new { Length = c.Length, Position = input.IndexOf(c, StringComparison.Ordinal) }) // position and length of each identifier
    .Where(x => x.Position > -1)  // keep only identifiers that actually occur
    .OrderBy(x => x.Position)     // order them by their position in the input
    .Take(2)                      // only the first 2 are of interest
    .ToArray();                   // materialize as an array
if (indexes.Length < 2)
    throw new InvalidOperationException("Did not find 2 identifiers");
// Text between the end of the first identifier and the start of the second ("+234" here)
var result = input.Substring(indexes[0].Position + indexes[0].Length,
                             indexes[1].Position - indexes[0].Position - indexes[0].Length);
Live example: https://dotnetfiddle.net/tDiQLG

There are already a lot of answers here, but I took the time to write mine, so I might as well post it even if it's not as elegant.
The code assumes all tags are accounted for in the chars array.
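using System;
using System.Collections.Generic;
using System.Linq;

string str = "UNB+123UNH+234BGM+345DTM+456";
// UNB is not listed here, so the leading "UNB+123" is not part of the result
string[] chars = { "UNH", "BGM", "DTM" };
// ToList() materializes the query once; otherwise it is re-run on every Count()/ElementAt() call.
// IndexOf finds only the first occurrence of each tag.
var locations = chars.Select(o => str.IndexOf(o, StringComparison.Ordinal))
                     .Where(i => i > -1)
                     .OrderBy(o => o)
                     .ToList();
var resultList = new List<string>();
for (int i = 0; i < locations.Count; i++)
{
    // each piece runs to the next tag, or to the end of the string for the last tag
    int end = i + 1 < locations.Count ? locations[i + 1] : str.Length;
    resultList.Add(str.Substring(locations[i], end - locations[i]));
}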

This is a fairly efficient O(n) solution using a HashSet.
It's extremely simple, more efficient than regex, and doesn't need a library (although input.Substring(i, 3) still allocates a small string at every position).
Given
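private static HashSet<string> _set;
public static IEnumerable<string> Split(string input)
{
    var last = 0;
    // All tags are 3 characters long, so every 3-character window is checked.
    // Starting at 1 avoids an empty first piece when the input begins with a tag;
    // "<=" makes sure a tag in the last 3 characters is still found.
    for (int i = 1; i <= input.Length - 3; i++)
    {
        if (!_set.Contains(input.Substring(i, 3))) continue;
        yield return input.Substring(last, i - last);
        last = i;
    }
    yield return input.Substring(last);
}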
Usage
_set = new HashSet<string>(new []{ "UNH", "BGM", "DTM" });
var results = Split("UNB+123UNH+234BGM+345DTM+456");
foreach (var item in results)
Console.WriteLine(item);
Output
UNB+123
UNH+234
BGM+345
DTM+456
Note: you could make this faster with a simple sorted tree, but that would require more effort.

Can't get Messagebox to display List

I'm trying to have a MessageBox appear that shows the changelog inside my C# program.
This is the text file:
Current Version 0.2.3.4
Added Hash decoder
Attempted to change code into OOP design
Cleaned up random code with ReSharper
Version 0.1.3.4 - 8/29/2016
No change logs before this point
The goal is to get the text between "Current Version 0.2.3.4" and "Version 0.1.3.4 - 8/29/2016".
I've tried doing this with the code below:
Regex changeLogMatch = new Regex("Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d");
Match changeLogInfo = changeLogMatch.Match(changeLog);
int changeLogCount = Regex.Matches(changeLog, "Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d").Count;
List<string> changeLogList = new List<string>();
for (int i = 0; i < changeLogCount; i++)
{
changeLogList.Add(changeLogInfo.Groups[1].Captures[i].ToString());
}
string changeLogString = string.Join(Environment.NewLine, changeLogList);
Console.WriteLine(changeLogString);
MessageBox.Show("New Changes" + Environment.NewLine + changeLogString
, "New Version Found: " + newVersion);
The issue I'm having is that changeLogString only displays Added Hash decoder and nothing else.
Any ideas on what I'm doing wrong?

In your case changeLogCount will always be 1, so changeLogList will always contain only changeLogInfo.Groups[1].Captures[0].ToString(), which refers to the "Added Hash decoder" line.
Your regex "Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d" matches the whole changelog block exactly once. But the first group, (.*?\\n), is captured 3 times, once per line. So if you count the matches of the full regex, you get 1; if you count the captures of the first group, you get 3.
So you should fix your code in the following manner:
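using System.Linq;
using System.Text.RegularExpressions;
Regex changeLogMatch = new Regex("Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d");
Match changeLogInfo = changeLogMatch.Match(changeLog);
// Join every capture of group 1 (one per line), trimming each capture's trailing line break
string changeLogString = string.Join(Environment.NewLine,
    changeLogInfo.Groups[1].Captures.OfType<Capture>().Select(c => c.Value.TrimEnd('\r', '\n')));
Console.WriteLine(changeLogString);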
Note that you don't need to iterate through the captures yourself: the required string ends up in changeLogString.
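To see the difference between matches and captures described above, here is a small self-contained sketch using the changelog from the question. It assumes "\n" line endings; a file read on Windows may contain "\r\n", which is why each capture is trimmed of both characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Changelog text from the question, assuming "\n" line endings
string changeLog =
    "Current Version 0.2.3.4\n" +
    "Added Hash decoder\n" +
    "Attempted to change code into OOP design\n" +
    "Cleaned up random code with ReSharper\n" +
    "Version 0.1.3.4 - 8/29/2016\n" +
    "No change logs before this point\n";

const string pattern = "Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d";

// The whole pattern matches only once...
int matchCount = Regex.Matches(changeLog, pattern).Count;

// ...but the repeated group (.*?\n)+ records one capture per line it consumed
Match m = Regex.Match(changeLog, pattern);
string[] lines = m.Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value.TrimEnd('\r', '\n'))
    .ToArray();

Console.WriteLine(matchCount);   // 1
Console.WriteLine(lines.Length); // 3
Console.WriteLine(string.Join(Environment.NewLine, lines));
```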

How to replace text within a string based on their indices

I have a string of text coming from a database. I also have a list of links from a database, each with a start index and length corresponding to my string. I want to turn the links within the text into HTML links (<a href=...).
For example:
var stringText = "Hello look at http://www.google.com and this hello.co.uk";
The database would contain:
Link:http://www.google.com
Index:14
Length:21
Link:hello.co.uk
Index:45
Length:11
I eventually want:
var stringText = "Hello look at <a href=\"http://www.google.com\">http://www.google.com</a> and this <a href=\"hello.co.uk\">hello.co.uk</a>";
There may be many links in the string, so I need a way of looping through these links and replacing based on the index and length. I would just loop through and replace based on the link text (string.Replace), but that causes issues if the same link appears twice:
var stringText = "www.google.com www.google.com www.google.com";
www.google.com would become a link, and the second pass would turn the link inside the link... into a link.
I can obviously find the first index, but if I change the text at that point, the indexes are no longer valid.
Is there an easy way to do this or am I missing something?

You simply need to remove the original text from the source using String.Remove, then use String.Insert to insert your replacement string.
As @Hogan suggested in the comments, you need to sort the replacement list and do the replacements in reverse order (from last to first), so that the indices of the remaining links stay valid.
If you need to perform many replacements in a single string, I recommend StringBuilder for performance reasons.
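A minimal sketch of that approach with StringBuilder, assuming the link ranges do not overlap; the anonymous-type records below stand in for the rows from the database. Ordering by descending Index means each edit only shifts text that has already been processed, and because positions are used instead of the link text, a link that appears twice is handled correctly.

```csharp
using System;
using System.Linq;
using System.Text;

string stringText = "Hello look at http://www.google.com and this hello.co.uk";

// Stand-ins for the link rows from the database (Index/Length refer to stringText)
var links = new[]
{
    new { Link = "http://www.google.com", Index = 14, Length = 21 },
    new { Link = "hello.co.uk", Index = 45, Length = 11 },
};

var sb = new StringBuilder(stringText);

// Work from the last link to the first so earlier indices are not shifted
foreach (var link in links.OrderByDescending(l => l.Index))
{
    string original = stringText.Substring(link.Index, link.Length);
    sb.Remove(link.Index, link.Length);
    sb.Insert(link.Index, $"<a href=\"{link.Link}\">{original}</a>");
}

string result = sb.ToString();
Console.WriteLine(result);
```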

I would use regular expressions.
Take a look at this: Regular expression to find URLs within a string
It might help.

Here's a solution without Remove, Insert or regexes; just string concatenation:
string stringText = "Hello look at http://www.google.com and this hello.co.uk!";
// Replacements must be sorted by Index and must not overlap
var replacements = new[] {
    new { Link = "http://www.google.com", Index = 14, Length = 21 },
    new { Link = "hello.co.uk", Index = 45, Length = 11 } };
string result = "";
// The extra iteration (i == replacements.Length) copies the text after the last link
for (int i = 0; i <= replacements.Length; i++)
{
    // plain text between the previous link (or the start) and the next link (or the end)
    int previousIndex = i == 0 ? 0 : replacements[i - 1].Index + replacements[i - 1].Length;
    int nextIndex = i < replacements.Length ? replacements[i].Index : stringText.Length;
    result += stringText.Substring(previousIndex, nextIndex - previousIndex);
    if (i < replacements.Length)
    {
        result += String.Format("<a href=\"{0}\">{1}</a>", replacements[i].Link,
            stringText.Substring(replacements[i].Index, replacements[i].Length));
    }
}

Retrieving string from a source string which is between 2 strings

It might be very simple, but I would like to know whether there is any alternative way to find the string between a given start string and end string in a source string.
The following code achieves this, but is there any better code? I think this will slow the system down if it is used in many places.
string strSource = "The LoadUserProfile call failed with the following error: ";
string strResult = string.Empty;
string strStart = "loaduserProfile";
string strEnd = "error";
int startindex = strSource.IndexOf(strStart, StringComparison.OrdinalIgnoreCase);
int endindex = strSource.LastIndexOf(strEnd, StringComparison.OrdinalIgnoreCase);
startindex = startindex + strStart.Length;
// number of characters between the end of strStart and the start of strEnd
int length = endindex - startindex;
strResult = strSource.Substring(startindex, length);
Thanks
D.Mahesh

You could use a regex and read the group value, but I'm not sure whether it will be faster or slower.
Here is example code implementing this with Regex (written without Visual Studio, so excuse any syntax errors):
string pattern = Regex.Escape(strStart) + @"(?<middle>[\s\S]*)" + Regex.Escape(strEnd);
// IgnoreCase to match the OrdinalIgnoreCase comparison in the question
Match match = Regex.Match(strSource, pattern, RegexOptions.IgnoreCase);
if (match.Success)
{
    // read the value of the group named "middle"
    strResult = match.Groups["middle"].Value;
}

Your code is pretty spot-on string manipulation. I don't think it can be made faster algorithmically. You could also do this with a regular expression, but I don't believe that would be any faster either.
If you don't need case insensitivity, changing StringComparison.OrdinalIgnoreCase to StringComparison.Ordinal should provide some speedup.
Otherwise, you probably have to look elsewhere for speed improvements.
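If the concern is reuse rather than raw speed, the same IndexOf/LastIndexOf logic can be wrapped in a helper that also handles missing markers, which the snippet in the question does not (Substring would throw). TryGetBetween is a hypothetical name following the usual .NET Try pattern.

```csharp
using System;

// Hypothetical helper wrapping the question's IndexOf/LastIndexOf approach.
// Returns false (and an empty result) when a marker is missing or the end marker precedes the start.
static bool TryGetBetween(string source, string start, string end,
                          StringComparison comparison, out string result)
{
    result = string.Empty;
    int from = source.IndexOf(start, comparison);
    if (from < 0) return false;
    from += start.Length;

    // LastIndexOf, as in the question: the end marker is searched for from the end of the string
    int to = source.LastIndexOf(end, comparison);
    if (to < from) return false; // also covers "not found" (-1)

    result = source.Substring(from, to - from);
    return true;
}

string strSource = "The LoadUserProfile call failed with the following error: ";
bool found = TryGetBetween(strSource, "loaduserProfile", "error",
                           StringComparison.OrdinalIgnoreCase, out string middle);
Console.WriteLine(found ? $"[{middle}]" : "not found"); // [ call failed with the following ]
```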

Remove characters after specific character in string, then remove substring?

I feel kind of dumb posting this when this seems kind of simple and there are tons of questions on strings/characters/regex, but I couldn't find quite what I needed (except in another language: Remove All Text After Certain Point).
I've got the following code:
[Test]
public void stringManipulation()
{
String filename = "testpage.aspx";
String currentFullUrl = "http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue";
String fullUrlWithoutQueryString = currentFullUrl.Replace("?.*", "");
String urlWithoutPageName = fullUrlWithoutQueryString.Remove(fullUrlWithoutQueryString.Length - filename.Length);
String expected = "http://localhost:2000/somefolder/myrep/";
String actual = urlWithoutPageName;
Assert.AreEqual(expected, actual);
}
I tried the solution in the question above (hoping the syntax would be the same!), but no luck. I want to first remove the query string, which could be of any length, and then remove the page name, which again could be any length.
How can I remove the query string from the full URL so that this test passes?

For string manipulation, if you just want to remove everything after the ?, you can do this:
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.IndexOf("?");
if (index >= 0)
input = input.Substring(0, index);
Edit: if you want to remove everything after the last slash instead, do something like:
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.LastIndexOf("/");
if (index >= 0)
input = input.Substring(0, index); // or index + 1 to keep slash
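Alternatively, since you're working with a URL, you can use the Uri class, like this:
System.Uri uri = new Uri("http://www.somesite.com/what/test.aspx?hello=1");
// string.Replace throws on an empty search string, so only strip the query when there is one
string fixedUri = uri.Query.Length > 0 ? uri.AbsoluteUri.Replace(uri.Query, string.Empty) : uri.AbsoluteUri;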

To remove everything before the first /
input = input.Substring(input.IndexOf("/"));
To remove everything after the first /
input = input.Substring(0, input.IndexOf("/") + 1);
To remove everything before the last /
input = input.Substring(input.LastIndexOf("/"));
To remove everything after the last /
input = input.Substring(0, input.LastIndexOf("/") + 1);
An even simpler way to remove the characters after a specified char is to use the String.Remove() method, as follows:
To remove everything after the first /
input = input.Remove(input.IndexOf("/") + 1);
To remove everything after the last /
input = input.Remove(input.LastIndexOf("/") + 1);

Here's another simple solution. The following code will return everything before the '|' character:
if (path.Contains('|'))
path = path.Split('|')[0];
In fact, you could have as many separators as you want, but assuming you only have one separator character, here is how you would get everything after the '|':
if (path.Contains('|'))
path = path.Split('|')[1];
(All I changed in the second piece of code was the index of the array.)
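If the string may contain more than one '|', a small variation (not part of the answer above) is the Split overload that takes a maximum count: with a count of 2, only the first '|' acts as a separator, so the second element holds everything after it. The sample path is hypothetical.

```csharp
using System;

string path = "first|second|third";

// Split(separator, count): with count = 2 the string is split at the first '|' only
string[] parts = path.Split(new[] { '|' }, 2);
string before = parts[0];                         // everything before the first '|'
string after = parts.Length > 1 ? parts[1] : "";  // everything after it, further '|' included

Console.WriteLine(before); // first
Console.WriteLine(after);  // second|third
```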

The Uri class is generally your best bet for manipulating Urls.
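As a sketch of how the Uri class could make the test in the question pass: Uri.GetLeftPart(UriPartial.Path) returns the scheme, authority and path without the query string or fragment, and cutting after the last '/' then drops the page name. This assumes the URL is absolute.

```csharp
using System;

string currentFullUrl = "http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue";
var uri = new Uri(currentFullUrl);

// Scheme, host, port and path; the query string and fragment are dropped
string withoutQuery = uri.GetLeftPart(UriPartial.Path);

// Keep everything up to and including the last '/', which removes the page name
string folderUrl = withoutQuery.Substring(0, withoutQuery.LastIndexOf('/') + 1);

Console.WriteLine(folderUrl); // http://localhost:2000/somefolder/myrep/
```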

To remove everything before a specific char, use the following:
string1 = string1.Substring(string1.IndexOf('$') + 1);
This removes everything up to and including the $ char. If you want to remove everything after the character instead, keep only the part before it: string1 = string1.Substring(0, string1.IndexOf('$')); (this also removes the $ itself).
But for a URL, I would use the built-in .NET class to take care of that.

In an ASP.NET (System.Web) page, Request.QueryString gives you the parameters and values included in the URL of the current request.
Example:
// current request URL: http://dave.com/customers.aspx?customername=dave
string customername = Request.QueryString["customername"];
So the customername variable should be equal to dave.
regards

I second Hightechrider: there is a specialized Uri class already built for you.
I must also point out, however, that the replaceAll used in that other language takes a regular expression as its search pattern, which you can do in .NET as well: look at the Regex class.
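A minimal sketch of the regular-expression route: the question's currentFullUrl.Replace("?.*", "") searches for the literal text "?.*", whereas Regex.Replace treats its pattern as a regex. The ? has to be escaped, because on its own it is a quantifier.

```csharp
using System;
using System.Text.RegularExpressions;

string currentFullUrl = "http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue";

// \? matches a literal question mark; .* then matches the rest of the line
string withoutQuery = Regex.Replace(currentFullUrl, @"\?.*", "");

Console.WriteLine(withoutQuery); // http://localhost:2000/somefolder/myrep/test.aspx
```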

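You can use .NET's built-in method to remove a value from the QueryString,
i.e., Request.QueryString.Remove("whatever");
Here "whatever" is the name of the query string parameter you want to remove. (Note that in ASP.NET, Request.QueryString is read-only, so this call throws a NotSupportedException unless you work on a writable copy of the collection.)
Try this...
I hope this will help.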

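You can use this extension method to remove the query parameters (everything after the ?) from a string:
// Extension methods must live in a non-generic static class (the class name is arbitrary)
public static class StringExtensions
{
    public static string RemoveQueryParameters(this string str)
    {
        int index = str.IndexOf('?');
        return index >= 0 ? str.Substring(0, index) : str;
    }
}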
