C# locating where the * is in a string separated by pipes

I have to find which position a * occupies: it may be absent altogether, or it may be in the 1st, 2nd, or 3rd position.
The positions are separated by pipes |
Thus
No * wildcard would be
`ABC|DEF|GHI`
However, while that could be 1 scenario, the other 3 are
string testPosition1 = "*|DEF|GHI";
string testPosition2 = "ABC|*|GHI";
string testPosition3 = "ABC|DEF|*";
I gather that I should use IndexOf, but it seems like I should incorporate the | (pipe) to know the position, not just the character offset, as the values could be long or short in each of the 3 places. So I just want to end up knowing if * is in the first, second or third position (or not at all).
Thus I was doing this, but I'm not going to know whether it is before the 1st or 2nd pipe:
if(testPosition1.IndexOf("*") > 0)
{
// Look for pipes?
}

There are lots of ways you could approach this. The most readable might actually just be to do it the hard way (i.e. scan the string to find the first '*' character, keeping track of how many '|' characters you see along the way).
That said, this could be similarly readable and more concise:
int wildcardPosition = Array.IndexOf(testPosition1.Split('|'), "*");
Returns -1 if not found, otherwise 0-based index for which segment of the '|' delimited string contains the wildcard string.
This only works if the wildcard is exactly the one-character string "*". If you need to support other variations on that, you will still want to split the string, but then you can loop over the array looking for whatever criteria you need.
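As a rough sketch of that "split, then apply your own criteria" idea, Array.FindIndex accepts a predicate, so the matching rule can be swapped without changing the search. The class and method names below are made up for illustration:

```csharp
using System;

public static class WildcardSegments
{
    // Returns the 0-based index of the first '|'-delimited segment that
    // satisfies the caller-supplied rule, or -1 when no segment does.
    public static int FindSegment(string input, Predicate<string> isWildcard)
    {
        string[] segments = input.Split('|');
        return Array.FindIndex(segments, isWildcard);
    }
}
```

For example, FindSegment("ABC| * |GHI", s => s.Trim() == "*") tolerates padding around the wildcard, and FindSegment("ABC|DEF|G*H", s => s.Contains("*")) accepts a wildcard embedded in a longer value.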

You can try with linq splitting the string at the pipe character and then getting the index of the element that contains just a *
var x = testPosition2.Split('|').Select((k, i) => new { text = k, index = i}).FirstOrDefault(p => p.text == "*" );
if(x != null) Console.WriteLine(x.index);
So the first line starts by splitting the string at the pipe, creating an array of strings. This sequence is passed to the Select extension, which enumerates the sequence passing the string text (k) and the index (i). With these two parameters we build a sequence of anonymous objects with two properties (text and index). FirstOrDefault extracts from this sequence the first object whose text equals *, and we can print the index property of that object.

The other answers are fine (and likely better), however here is another approach, the good old fashioned for loop and the try-get pattern
public bool TryGetStar(string input, out int index)
{
var split = input.Split('|');
for (index = 0; index < split.Length; index++)
if (split[index] == "*")
return true;
return false;
}
Or, if you were dealing with large strings and trying to save allocations, you could remove the Split entirely and use a single O(n) pass:
public bool TryGetStar(string input, out int index)
{
index = 0;
for (var i = 0; i < input.Length; i++)
if (input[i] == '|') index++;
else if (input[i] == '*') return true;
return false;
}
Note : if performance was a consideration, you could also use unsafe and pointers, or Span<Char> which would afford a small amount of efficiency.
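For what it's worth, a span-based variant might look like the sketch below. It assumes a runtime where ReadOnlySpan<char> is available (.NET Core 2.1 or later, or the System.Memory package), and the class name is invented here. Unlike the loop above, it leaves index at 0 when no wildcard is found:

```csharp
using System;

public static class StarFinder
{
    // Allocation-free: no Split and no intermediate substrings.
    public static bool TryGetStar(ReadOnlySpan<char> input, out int index)
    {
        index = 0;
        int star = input.IndexOf('*');
        if (star < 0)
            return false;

        // Count the pipes that precede the wildcard.
        foreach (char c in input.Slice(0, star))
        {
            if (c == '|') index++;
        }
        return true;
    }
}
```

A string converts implicitly to ReadOnlySpan<char>, so StarFinder.TryGetStar("ABC|DEF|*", out int index) can be called directly.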

Try DotNETFiddle:
testPosition.IndexOf("*") - testPosition.Replace("|","").IndexOf("*")
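Find the index of the wildcard ("*") and see how far it moves if you remove the pipe ("|") characters. The result is a zero-based index. Note that the expression also evaluates to 0 when there is no wildcard at all (both IndexOf calls return -1), so check for that case separately.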
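A small sketch of that subtraction trick wrapped in a method, with a guard for the missing-wildcard case. The class and method names are just placeholders:

```csharp
public static class PipeOffset
{
    // Returns the 0-based segment index of the first '*', or -1 when there is none.
    public static int SegmentOfStar(string testPosition)
    {
        int starIndex = testPosition.IndexOf('*');
        if (starIndex < 0)
            return -1; // without this guard the subtraction below would give 0

        // Each pipe before the '*' shifts it one place to the right, so the
        // difference between the two indexes is the number of preceding pipes.
        return starIndex - testPosition.Replace("|", "").IndexOf('*');
    }
}
```

With "ABC|*|GHI" the wildcard sits at index 4, and at index 3 once the pipes are removed, giving 1.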

From the question you have the following code segment:
if(testPosition1.IndexOf("*") > 0)
{
}
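If you're inside the if statement, you know the asterisk exists. Note that the condition needs to be >= 0 rather than > 0: IndexOf returns 0 when the asterisk is the first character, so > 0 would skip the first-position case.
From that point, an efficient solution could be to check the first two chars, and the last two chars.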
if (testPosition1.IndexOf("*") >= 0)
{
    if (testPosition1[0] == '*' && testPosition1[1] == '|')
    {
        // First position.
    }
    else if (testPosition1[testPosition1.Length - 1] == '*' && testPosition1[testPosition1.Length - 2] == '|')
    {
        // Third (last) position.
    }
    else
    {
        // Second position.
    }
}
This assumes that no more than one * can exist, and also that if an * exists, it can only be surrounded by pipes. For example, I assume an input like ABC|DEF|G*H is invalid.
If you want to remove these assumptions, you could do a one-pass loop over the string, keeping track of the necessary information.
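One possible shape for such a one-pass loop is sketched below. Rather than assuming the input is well formed, it reports malformed input by throwing. The names and the choice of FormatException are illustrative, not something the question prescribes:

```csharp
using System;

public static class StrictStarLocator
{
    // Returns the 0-based segment index of a lone "*" segment, or -1 if the
    // input contains no '*'. Throws when '*' occurs in more than one segment
    // or inside a longer value such as "G*H".
    public static int Locate(string input)
    {
        int segment = 0;            // pipes seen so far
        int segmentLength = 0;      // characters in the current segment
        bool starInSegment = false;
        int found = -1;

        for (int i = 0; i <= input.Length; i++)
        {
            if (i < input.Length && input[i] != '|')
            {
                if (input[i] == '*') starInSegment = true;
                segmentLength++;
                continue;
            }

            // Reached a pipe or the end of the string: the segment is complete.
            if (starInSegment)
            {
                if (segmentLength != 1 || found != -1)
                    throw new FormatException("Wildcard must be a single, whole segment.");
                found = segment;
            }
            segment++;
            segmentLength = 0;
            starInSegment = false;
        }
        return found;
    }
}
```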

Related

how to add a sign between each letter in a string in C#?

I have a task, in which i have to write a function called accum, which transforms given string into something like this:
Accumul.Accum("abcd"); // "A-Bb-Ccc-Dddd"
Accumul.Accum("RqaEzty"); // "R-Qq-Aaa-Eeee-Zzzzz-Tttttt-Yyyyyyy"
Accumul.Accum("cwAt"); // "C-Ww-Aaa-Tttt"
So far I only converted each letter to uppercase and... Now that I am writing about it, I think it could be easier for me to - firstly multiply the number of each letter and then add a dash there... Okay, well let's say I already multiplied the number of them(I will deal with it later) and now I need to add the dash. I tried several manners to solve this, including: for and foreach(and now that I think of it, I can't use foreach if I want to add a dash after multiplying the letters) with String.Join, String.Insert or something called StringBuilder with Append(which I don't exactly understand) and it does nothing to the string.
One of those loops that I tried was:
for (int letter = 0; letter < s.Length-1; letter += 2) {
if (letter % 2 == 0) s.Replace("", "-");
}
and
for (int letter = 0; letter < s.Length; letter++) {
return String.Join(s, "-");
}
The second one returns an "unreachable code" error. What am I doing wrong here, that it does nothing to the string (after the uppercase conversion)? Also, is there any method to copy each letter, in order to increase the number of them?
As you say string.join can be used as long as an enumerable is created instead of a foreach. Since the string itself is enumerable, you can use the Linq select overload which includes an index:
var input = "abcd";
var res = string.Join("-", input.Select((c,i) => Char.ToUpper(c) + new string(Char.ToLower(c),i)));
(Repeated characters are treated independently, e.g. "aab" would become "A-Aa-Bbb".)
Explanation:
The Select extension method takes a lambda function as parameter with c being a char and i the index. The lambda returns an uppercase version of the char (c) followed by a string of the lowercase char repeated i times (new string(char, count)), which is an empty string for the first index. Finally, string.Join concatenates the resulting enumeration with a - between each element.
Use this code.
string result = String.Empty;
for (int i = 0; i < s.Length; i++)
{
char c = s[i];
result += char.ToUpper(c);
result += new String(char.ToLower(c), i);
if (i < s.Length - 1)
{
result += "-";
}
}
It would be better to use a StringBuilder instead of string concatenation, but this code may be a bit clearer.
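For comparison, a StringBuilder version of the same loop could look roughly like this. The Accumul.Accum name comes from the question; the rest is only a sketch:

```csharp
using System.Text;

public static class Accumul
{
    public static string Accum(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (i > 0) sb.Append('-');          // dash between groups, not after the last
            sb.Append(char.ToUpper(s[i]));      // first letter of the group in upper case
            sb.Append(char.ToLower(s[i]), i);   // Append(char, repeatCount): i lower-case copies
        }
        return sb.ToString();
    }
}
```

Here Accumul.Accum("abcd") builds "A-Bb-Ccc-Dddd" without creating a new string on every concatenation.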
Strings are immutable, which means that you cannot modify them once you have created them. This means that the Replace method returns a new string that you need to capture somehow:
s = s.Replace("x", "-");
You are currently not assigning the result of the Replace method anywhere, which is why you don't see any change.
For the future, the best way to approach problems like this one is not to search for the code snippet, but write down step by step algorithm of how you can achieve the expected result in plain English or some other pseudo code, e.g.
Given I have input string 'abcd' which should turn into output string 'A-Bb-Ccc-Dddd'.
Copy first character 'a' from the input to Buffer.
Store the index of the character to Index.
If Buffer has only one character make it Upper Case.
If Index is greater than 1 trail Buffer with Index-1 lower case characters.
Append dash '-' to the Buffer.
Copy Buffer content to Output and clear Buffer.
Copy second character 'b' from the input to Buffer.
...
etc.
Aha moment often happens on the third iteration. Hope it helps! :)

Checking two conditions while iterating over an array of characters

Having learned the basics/fundamentals of the C# programming language, I am now trying to tackle my first real-world problem: Write a program that, given a string, finds its longest sub-string that contains at least one upper-case letter but no digits (and then displays the length of this longest sub-string). This could be two qualifying conditions for an acceptable password, for example...
I have written the code below all by myself, which means there is probably performance issues, but that is for later consideration. I am stuck at the point where I have to make sure there is no digit in the sub-string. The comments in my code show my thinking while writing the program...
I thought first I should check to see if there is an upper-case letter in an extracted sub-string, and if there was, then I can store that qualifying sub-string in a list and then break out of the loop. But now I wonder how to check the no-digit condition at the same time in the same sub-string?
I am trying to keep it neat and simple (as I said I have only just started writing programs longer than a few lines!) so I thought doing a nested loop to check every character against !char.IsNumber(letter) might not be optimal. Or should I first check to see if there is no digit, then see if there is at least a capital character?
I feel confused how to achieve both restrictions, so I would appreciate some help in resolving this issue. I would also appreciate any observations or suggestions you might have. For example, is it OK to store my sub-strings in a list? Should I make a dictionary of some sort? Is my all-possible-sub-string extraction nested-loop optimal?
p.s. Some bits are still unfinished; for example I am still to implement the last step to find the longest sub-string and display to the user its length...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace PasswordRestriction
{
class Program /// Write a program that, given a string, finds the longest substring that is a valid password and returns its length.
{
static void Main(string[] args)
{
// Ask the user for an alphanumeric string.
Console.WriteLine("Enter a string of alphanumeric values:");
// Receive the user input as string.
string password = Console.ReadLine();
// Print the length of the longest qualifying substring of the user string.
Console.WriteLine("Length of the longest qualifying substring:\n" + Solution(password).Length );
// Prevent the console window from closing.
Console.ReadLine();
}
/// The method that extracts the longest substring that is a valid password.
/// Note that a substring is a 'contiguous' segment of a string.
public static string Solution(string str)
{
// Only allow non-empty strings.
if ( String.IsNullOrEmpty(str) )
{
return "";
}
else
{
// Only allow letters and digits.
if ( str.All(char.IsLetterOrDigit) )
{
// A list for containing qualifying substrings.
List<string> passwordList = new List<string>();
// Generate all possible substrings. Note that
// string itself is not a substring of itself!
for (int i = 1; i < str.Length; i++)
{
for (int j = 0; j <= (str.Length-i); j++)
{
string subStr = str.Substring(j, i);
Console.WriteLine(subStr);
bool containsNum = false;
bool containsUpper = false;
// Convert the substrings to arrays of characters with ToCharArray.
// This method is called on a string and returns a new character array.
// You can manipulate this array in-place, which you cannot do with a string.
char[] subStrArray = subStr.ToCharArray();
// Go through every character in each substring.
// If there is at least one uppercase letter and
// no digits, put the qualifying substring in a list.
for (int k = 0; k < subStrArray.Length; k++)
{
char letter = subStrArray[k];
if ( char.IsNumber(letter) )
{
containsNum = true;
break;
}
if ( char.IsUpper(letter) )
{
containsUpper = true;
}
if ( containsUpper && (containsNum == false) && (k == subStrArray.Length - 1) )
{
Console.WriteLine("Found the above legit password!");
passwordList.Add(subStr);
}
}
}
}
//Find the longest stored string in the list.
//if (passwordList.Count != 0)
//{
string maxLength = passwordList[0];
foreach (string s in passwordList)
{
if (s.Length > maxLength.Length)
{
maxLength = s;
}
}
//}
// Return the qualifying substring.
return maxLength;
}
else
{
return "aaaaaaaaaa";
}
}
}
}
}
A good problem for Linq:
contains no digits - Split on digits
at least one upper-case letter - Where + Any
longest (not shortest) - OrderByDescending
longest (just one) - FirstOrDefault
Implementation
string source = ....
var result = source
.Split('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
.Where(line => line.Any(c => c >= 'A' && c <= 'Z')) // or char.IsUpper(c)
.OrderByDescending(line => line.Length)
.FirstOrDefault(); // null if there're no such substrings at all
As an alternative to the Linq answer, and if I understand you correctly, this is what I'd do, replacing the content of the str.All condition:
string qualifier = "";
string tempQualifier = "";
bool containsUpper = false;
// Run one step past the end so the final substring is also evaluated.
for (int i = 0; i <= str.Length; i++) {
    if (i == str.Length || char.IsNumber(str[i])) {
        // A digit (or the end of the string) closes the current substring.
        if (containsUpper && tempQualifier.Length > qualifier.Length && tempQualifier.Length != str.Length) {
            qualifier = tempQualifier;
        }
        containsUpper = false;
        tempQualifier = "";
    } else {
        tempQualifier += str[i];
        if (char.IsUpper(str[i])) {
            containsUpper = true;
        }
    }
}
return qualifier;
This would go through the string, building up the substring until it comes across a number. If the substring contains an uppercase letter and is longer than any previous qualifier, it is stored as the new qualifier (also assuming that it isn't the length of the string provided). Apologies if I've made any mistakes (I'm not well versed in C#).
It's much longer than the Linq answer, but I thought it'd be handy for you to see the process broken down so you can understand it better.
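If the repeated string concatenation bothers you, the same scan can be sketched with indexes only, taking a single Substring at the end. This is just one way to do it. The names are invented, it uses char.IsDigit, and unlike the question's code it accepts the whole input as a candidate:

```csharp
public static class PasswordScanner
{
    // Returns the longest digit-free run containing at least one upper-case
    // letter, or "" when no run qualifies.
    public static string LongestValid(string str)
    {
        int bestStart = 0, bestLength = 0;
        int runStart = 0;
        bool hasUpper = false;

        for (int i = 0; i <= str.Length; i++)
        {
            // A digit or the end of the string closes the current run.
            if (i == str.Length || char.IsDigit(str[i]))
            {
                int runLength = i - runStart;
                if (hasUpper && runLength > bestLength)
                {
                    bestStart = runStart;
                    bestLength = runLength;
                }
                runStart = i + 1;
                hasUpper = false;
            }
            else if (char.IsUpper(str[i]))
            {
                hasUpper = true;
            }
        }
        return str.Substring(bestStart, bestLength);
    }
}
```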

Bitwise OR on strings for large strings in c#

I have two strings(with 1's and 0's) of equal lengths(<=500) and would like to apply Logical OR on these strings.
How should i approach on this. I'm working with c#.
When I consider the obvious solution, reading each char and applying OR (|) to them, I have to deal with approximately 250,000 strings, each of length 500. This would kill my performance.
Performance is my main concern.
Thanks in advance!
This is the fastest way:
string x="";
string y="";
StringBuilder sb = new StringBuilder(x.Length);
for (int i = 0; i < x.Length;i++ )
{
sb.Append(x[i] == '1' || y[i] == '1' ? '1' : '0');
}
string result = sb.ToString();
Since it was mentioned that speed is a big factor, it would be best to use bit-wise operations.
Take a look at an ASCII table:
The character '0' is 0x30, or 00110000 in binary.
The character '1' is 0x31, or 00110001 in binary.
Only the last bit of the character is different. As such - we can safely say that performing a bitwise OR on the characters themselves will produce the correct character.
Another important thing we can do to optimize speed is to use a StringBuilder, initialized to the capacity of our string. Or even better: we can reuse our StringBuilder for multiple operations, although we have to ensure the StringBuilder has enough capacity.
With those optimizations considered, we can make this method:
string BinaryStringBitwiseOR(string a, string b, StringBuilder stringBuilder = null)
{
if (a.Length != b.Length)
{
throw new ArgumentException("The length of given string parameters didn't match");
}
if (stringBuilder == null)
{
stringBuilder = new StringBuilder(a.Length);
}
else
{
stringBuilder.Clear().EnsureCapacity(a.Length);
}
for (int i = 0; i < a.Length; i++)
{
stringBuilder.Append((char)(a[i] | b[i]));
}
return stringBuilder.ToString();
}
Note that the same approach works for a bitwise AND by swapping the | operator for &. XOR needs one extra step, because the shared high bits of '0' and '1' cancel out: use (char)((a[i] ^ b[i]) | '0') to restore them.
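To illustrate how the character trick carries over to other operators, here is a small sketch. The helper names are made up, and the delegate adds some call overhead, so in a hot path you would likely inline the operator instead. OR and AND map '0'/'1' straight to '0'/'1'; XOR clears the common 0x30 bits, so they are OR-ed back in:

```csharp
using System;

public static class BinaryStringOps
{
    public static string Or(string a, string b)  => Combine(a, b, (x, y) => (char)(x | y));
    public static string And(string a, string b) => Combine(a, b, (x, y) => (char)(x & y));

    // '0' ^ '1' is 0x01, not '1', so the 0x30 prefix is restored with | '0'.
    public static string Xor(string a, string b) => Combine(a, b, (x, y) => (char)((x ^ y) | '0'));

    private static string Combine(string a, string b, Func<char, char, char> op)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("The lengths of the given strings do not match.");

        var result = new char[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            result[i] = op(a[i], b[i]);
        }
        return new string(result);
    }
}
```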
I've found this to be faster than all proposed solutions. It combines elements from @Gediminas and @Sakura's answers, but uses a pre-initialized char[] rather than a StringBuilder.
While StringBuilder is efficient at memory management, each Append operation requires some bookkeeping of the marker, and performs more actions than only an index into an array.
string x = ...
string y = ...
char[] c = new char[x.Length];
for (int i = 0; i < x.Length; i++)
{
c[i] = (char)(x[i] | y[i]);
}
string result = new string(c);
I have two strings(with 1's and 0's) of equal lengths(<=500) and would
like to apply Logical OR on these strings.
You can write a custom logical OR operator or function which takes two characters as input and produces result (e.g. if at least one of input character is '1' return '1' - otherwise return '0'). Apply this function to each character in your strings.
You can also look at this approach. You'd first need to convert each character to boolean (e.g. '1' corresponds to true), perform OR operation between two boolean values, convert back result to character '0' or '1' - depending if result of logical OR was false or true respectively. Then just append each result of this operation to each other.
You can use a Linq query to zip and then aggregate the results:
var a = "110010";
var b = "001110";
var result = a.Zip(b, (i, j) => i == '1' || j == '1' ? '1' : '0')
.Select(i => i + "").Aggregate((i, j) => i + j);
Basically, the Zip extension method takes two sequences and applies a function to each pair of corresponding elements. Then I use Select to convert each char to a String, and finally I aggregate the resulting sequence of strings (of "0" and "1") into a single String.

Is there a fast extraction of specific value located inside tags in a slice from a list of strings possible?

I'm using a 2-step regex to extract the value of the first occurrence of a specific marker inside a list of strings:
Regex regexComplete = new Regex(
    @"MyNumberMarker"
    + @"[\d]+"
    + @"[\s]+Endmarker"
);
Regex regexOnlyNumber = new Regex(
    @"MyNumberMarker"
    + @"[\d]+"
);
int indexmyNumber = eintraegeListe.FindIndex(
5,
10000,
x => regexComplete.IsMatch(x)
);
if (indexmyNumber >= 0)
{
int myNumber = 0;
string myNumberString = regexOnlyNumber.Match(regexComplete.Match(eintraegeListe[indexmyNumber]).Value).Value;
myNumberString = myNumberString.Replace("MyNumberMarker", "").Replace("\n", "").Replace("\r", "").Trim();
if (Int32.TryParse(myNumberString, out myNumber))
{
return myNumber;
}
}
As one can see, the value I really want is located between "MyNumberMarker" and "Endmarker". It is in a specific part of the list, which I search through with the FindIndex command. Then I use regex to extract the complete value + tag, reduce it to "just" the begin tag and the value, and then manually cut away the begin tag and any white space (including \n and \r).
Now this works quite fine as intended but if I do this a couple of thousand times it is quite slow in the end. Thus my question.
Is there any better (faster) way to do this?
As a note: eintraegeListe can have between 100 and 30000 entries.
For example if I have the following small list:
[0]This is a test
[1]22.09.2015 01:00:00
[2]Until 22.09.2015 03:00:00
[3]................................
[4]................................
[5]........ TESTDATA
[6]...............................
[7]................................
[8]MyNumberMarker519 Endmarker
[9]This is a small
[10]Slice of Test data with
[11]520 - 1 as data.
I would expect 519 to be returned.
Since you are returning a single item, the performance of code past FindIndex is irrelevant: it is executed only once, and it takes a single string, so it should complete in microseconds on any modern hardware.
The code that takes the bulk of CPU is in x => regexComplete.IsMatch(x) call. You can tell that this code is returning false most of the time, because the loop is over the first time it returns true.
This means that you should be optimizing for the negative case, i.e. returning false as soon as you can. One way to achieve this would be to look for "MyNumberMarker" before employing regex. If there is no marker, return false right away; otherwise, fall back on using the regex, and start from the position where you found the marker:
int indexmyNumber = eintraegeListe.FindIndex(
5,
10000,
x => {
// Scan the string for the marker in non-regex mode
int pos = x.IndexOf("MyNumberMarker", StringComparison.Ordinal);
// If the marker is not there, do not bother with regex, and return false
return pos < 0
? false
// Only if the marker is there, check the match with regex.
: regexComplete.IsMatch(x, pos);
}
);
You can actually merge the two regexps into 1 containing a capturing group that will let you access the sequence of digits directly via the group name (here, "number").
Regex regexComplete = new Regex(@"MyNumberMarker(?<number>\d+)\s+Endmarker");
Now, you do not need regexOnlyNumber.
Then, you can add a non-regex condition as in the other answer. Maybe this will be enough: thanks to short-circuit evaluation, .Contains is executed first, and the whole expression evaluates to false without running the regex if that first condition is not met (IndexOf with StringComparison.Ordinal looks preferable anyway):
int indexmyNumber = eintraegeListe.FindIndex(5, 10000, x => x.Contains("MyNumberMarker") && regexComplete.IsMatch(x));
And then:
if (indexmyNumber >= 0)
{
int myNumber = 0;
string myNumberString = regexComplete.Match(eintraegeListe[indexmyNumber]).Groups["number"].Value;
if (Int32.TryParse(myNumberString, out myNumber))
{
return myNumber;
}
}
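Putting the pieces together, a self-contained sketch might look like the following. The class and method names are invented, and it swaps FindIndex for a plain loop so each candidate line is matched only once and the search window is clamped to the list size. That clamping is an assumption about what you want when the list is shorter than startIndex + count:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MarkerExtractor
{
    private static readonly Regex MarkerRegex =
        new Regex(@"MyNumberMarker(?<number>\d+)\s+Endmarker", RegexOptions.Compiled);

    // Returns the first marked number found in entries[startIndex .. startIndex + count),
    // or null when no entry in that window matches.
    public static int? FindFirstNumber(List<string> entries, int startIndex, int count)
    {
        int end = Math.Min(entries.Count, startIndex + count);
        for (int i = startIndex; i < end; i++)
        {
            string entry = entries[i];

            // Cheap ordinal pre-check: most lines fail here without touching the regex.
            if (entry.IndexOf("MyNumberMarker", StringComparison.Ordinal) < 0)
                continue;

            Match match = MarkerRegex.Match(entry);
            if (match.Success && int.TryParse(match.Groups["number"].Value, out int number))
                return number;
        }
        return null;
    }
}
```

With the sample list from the question, MarkerExtractor.FindFirstNumber(eintraegeListe, 5, 10000) finds the entry "MyNumberMarker519 Endmarker" and yields 519.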

How to parse a numbered sequence from a List of filenames?

I would like to automatically parse a range of numbered sequences from an already sorted List<FileData> of filenames by checking which part of the filename changes.
Here is an example (file extension has already been removed):
First filename: IMG_0000
Last filename: IMG_1000
Numbered Range I need: 0000 and 1000
Except I need to deal with every possible type of file naming convention such as:
0000 ... 9999
20080312_0000 ... 20080312_9999
IMG_0000 - Copy ... IMG_9999 - Copy
8er_green3_00001 .. 8er_green3_09999
etc.
I would like the entire 0-padded range e.g. 0001 not just 1
The sequence number is 0-padded e.g. 0001
The sequence number can be located anywhere e.g. IMG_0000 - Copy
The range can start and end with anything i.e. doesn't have to start with 1 and end with 9999
Numbers may appear multiple times in the filename of the sequence e.g. 20080312_0000
Whenever I get something working for 8 random test cases, the 9th test breaks everything and I end up re-starting from scratch.
I've currently been comparing only the first and last filenames (as opposed to iterating through all filenames):
void FindRange(List<FileData> files, out string startRange, out string endRange)
{
string firstFile = files.First().ShortName;
string lastFile = files.Last().ShortName;
...
}
Does anyone have any clever ideas? Perhaps something with Regex?
If you're guaranteed to know the files end with the number (e.g. _\d+), and are sorted, just grab the first and last elements and that's your range. If the filenames are otherwise all the same, you can sort the list to get them in numerical order. Unless I'm missing something obvious here, where's the problem?
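Use a regex to parse out the last run of digits from each filename (the leading part is matched lazily so that the capture group receives the whole run rather than just its final digit):
^.*?(\d+)\D*$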
From these parsed strings, find the maximum length, and left-pad any that are less than the maximum length with zeros.
Sort these padded strings alphabetically. Take the first and last from this sorted list to give you your min and max numbers.
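The parse, pad, and sort steps could be sketched as follows. This assumes the counter is the last run of digits in each name (so a name like "IMG_0000 - Copy (2)" would not work), and the class name and pattern are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SequenceRange
{
    // Lazy prefix, then the last run of digits, then any non-digit suffix.
    private static readonly Regex LastNumber = new Regex(@"^.*?(\d+)\D*$");

    public static void FindRange(IEnumerable<string> names,
        out string startRange, out string endRange)
    {
        // Step 1: parse the number out of every name that has one.
        List<string> numbers = names
            .Select(name => LastNumber.Match(name))
            .Where(match => match.Success)
            .Select(match => match.Groups[1].Value)
            .ToList();
        if (numbers.Count == 0)
            throw new ArgumentException("No numbered names found.", nameof(names));

        // Step 2: left-pad everything to the longest number's width.
        int width = numbers.Max(n => n.Length);
        List<string> padded = numbers
            .Select(n => n.PadLeft(width, '0'))
            .OrderBy(n => n, StringComparer.Ordinal)   // Step 3: ordinal sort
            .ToList();

        startRange = padded[0];
        endRange = padded[padded.Count - 1];
    }
}
```

Because all strings have equal width after padding, an ordinal sort orders them numerically.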
Firstly, I will assume that the numbers are always zero-padded so that they are the same length. If not then bigger headaches lie ahead.
Secondly, assume that the file names are exactly the same apart from the increment number component.
If these assumptions are true then the algorithm should be to look at each character in the first and last filenames to determine which same-positioned characters do not match.
var start = String.Empty;
var end = String.Empty;
for (var index = 0; index < firstFile.Length; index++)
{
char c = firstFile[index];
if (filenames.Any(filename => filename[index] != c))
{
start += firstFile[index];
end += lastFile[index];
}
}
// convert to int if required
edit: Changed to check every filename until a difference is found. Not as efficient as it could be but very simple and straightforward.
Here is my solution. It works with all of the examples that you have provided and it assumes the input array to be sorted.
Note that it doesn't look exclusively for numbers; it looks for a consistent sequence of characters that might differ across all of the strings. So if you provide it with {"0000", "0001", "0002"} it will hand back "0" and "2" as the start and end strings, since that's the only part of the strings that differ. If you give it {"0000", "0010", "0100"}, it will give you back "00" and "10".
But if you give it {"0000", "0101"}, it will whine since the differing parts of the string are not contiguous. If you would like this behavior modified so it will return everything from the first differing character to the last, that's fine; I can make that change. But if you are feeding it a ton of filenames that will have sequential changes to the number region, this should not be a problem.
public static class RangeFinder
{
    public static void FindRange(IEnumerable<string> strings,
        out string startRange, out string endRange)
    {
        // Note: ArgumentException takes (message, paramName), in that order.
        using (var e = strings.GetEnumerator()) {
            if (!e.MoveNext())
                throw new ArgumentException("No elements.", "strings");
            if (e.Current == null)
                throw new ArgumentException(
                    "Null element encountered at index 0.", "strings");
            var template = e.Current;
            // If an element in here is true, it means that index differs.
            var matchMatrix = new bool[template.Length];
            int index = 1;
            string last = null;
            while (e.MoveNext()) {
                if (e.Current == null)
                    throw new ArgumentException(
                        "Null element encountered at index " + index + ".", "strings");
                last = e.Current;
                if (last.Length != template.Length)
                    throw new ArgumentException(
                        "Element at index " + index + " has incorrect length.", "strings");
                for (int i = 0; i < template.Length; i++)
                    if (last[i] != template[i])
                        matchMatrix[i] = true;
                index++; // keep the index reported in error messages accurate
            }
            // Verify the matrix:
            // * There must be at least one true value.
            // * All true values must be consecutive.
            int start = -1;
            int end = -1;
            for (int i = 0; i < matchMatrix.Length; i++) {
                if (matchMatrix[i]) {
                    if (end != -1)
                        throw new ArgumentException(
                            "Inconsistent match matrix; no usable pattern discovered.", "strings");
                    if (start == -1)
                        start = i;
                } else {
                    if (start != -1 && end == -1)
                        end = i;
                }
            }
            if (start == -1)
                throw new ArgumentException(
                    "Strings did not vary; no usable pattern discovered.", "strings");
            if (end == -1)
                end = matchMatrix.Length;
            startRange = template.Substring(start, end - start);
            endRange = last.Substring(start, end - start);
        }
    }
}
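The variation mentioned above, returning everything from the first differing character to the last instead of rejecting non-contiguous differences, could be sketched like this. The class name is invented, and it takes an indexable list rather than an enumerable purely to keep the example short:

```csharp
using System;
using System.Collections.Generic;

public static class LooseRangeFinder
{
    public static void FindRange(IReadOnlyList<string> names,
        out string startRange, out string endRange)
    {
        if (names == null || names.Count == 0)
            throw new ArgumentException("No elements.", nameof(names));

        string first = names[0];
        int lo = first.Length;  // smallest index at which any name differs from the first
        int hi = -1;            // largest such index

        foreach (string name in names)
        {
            if (name.Length != first.Length)
                throw new ArgumentException("All names must have the same length.", nameof(names));

            for (int i = 0; i < first.Length; i++)
            {
                if (name[i] != first[i])
                {
                    if (i < lo) lo = i;
                    if (i > hi) hi = i;
                }
            }
        }

        if (hi < 0)
            throw new ArgumentException("Names did not vary; no usable pattern discovered.", nameof(names));

        string last = names[names.Count - 1];
        startRange = first.Substring(lo, hi - lo + 1);
        endRange = last.Substring(lo, hi - lo + 1);
    }
}
```

Given {"0000", "0101"}, this hands back "000" and "101" rather than throwing.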
