Find word(s) between two values in a string - c#

I have a txt file as a string, and I need to find words between two characters and Ltrim/Rtrim everything else. It may have to be conditional because the two characters may change depending on the string.
Example:
car= (data between here I want) ;
car = (data between here I want) </value>
Code:
int pos = st.LastIndexOf("car=", StringComparison.OrdinalIgnoreCase);
if (pos >= 0)
{
server = st.Substring(0, pos);..............
}

This is a simple extension method I use:
public static string Between(this string src, string findfrom, string findto)
{
int start = src.IndexOf(findfrom);
int to = src.IndexOf(findto, start + findfrom.Length);
if (start < 0 || to < 0) return "";
string s = src.Substring(
start + findfrom.Length,
to - start - findfrom.Length);
return s;
}
With this you can use
string valueToFind = sourceString.Between("car=", "</value>")
You can also try this:
public static string Between(this string src, string findfrom,
params string[] findto)
{
int start = src.IndexOf(findfrom);
if (start < 0) return "";
foreach (string sto in findto)
{
int to = src.IndexOf(sto, start + findfrom.Length);
if (to >= 0) return
src.Substring(
start + findfrom.Length,
to - start - findfrom.Length);
}
return "";
}
With this you can give multiple ending tokens (their order is important)
string valueToFind = sourceString.Between("car=", ";", "</value>")

You could use regex
var input = "car= (data between here I want) ;";
var pattern = #"car=\s*(.*?)\s*;"; // where car= is the first delimiter and ; is the second one
var result = Regex.Match(input, pattern).Groups[1].Value;

Related

C# Get particular string between two string (multiple occurance)

how can I retrieve both string between STRING & END in this sentence
"This is STRING a222 END, and this is STRING b2838 END."
strings that I want to get:
a222
b2838
Following is my code, and i only manage to get first string which is a222
string myString = "This is STRING a222 END, and this is STRING b2838 END.";
int first = myString.IndexOf("STRING") + "STRING".Length;
int second= myString.LastIndexOf("END");
string result = St.Substring(first, second - first);
.
Here is the solution using Regular Expressions. Working Code here
var reg = new Regex("(?<=STRING ).*?(?= END)");
var matched = reg.Matches("This is STRING a222 END, and this is STRING b2838 END.");
foreach(var m in matched)
{
Console.WriteLine(m.ToString());
}
You can pass a value for startIndex to string.IndexOf(), you can use this while looping:
IEnumerable<string> Find(string input, string startDelimiter, string endDelimiter)
{
int first = 0, second;
do
{
// Find start delimiter
first = input.IndexOf(startDelimiter, startIndex: first) + startDelimiter.Length;
if (first == -1)
yield break;
// Find end delimiter
second = input.IndexOf(endDelimiter, startIndex: first);
if (second == -1)
yield break;
yield return input.Substring(first, second - first).Trim();
first = second + endDelimiter.Length + 1;
}
while (first < input.Length);
}
You can iterate over indexes,
string myString = "This is STRING a222 END, and this is STRING b2838 END.";
//Jump to starting index of each `STRING`
for(int i = myString.IndexOf("STRING");i > 0; i = myString.IndexOf("STRING", i+1))
{
//Get Index of each END
var endIndex = myString.Substring(i + "STARTING".Length).IndexOf("END");
//PRINT substring between STRING and END of each occurance
Console.WriteLine(myString.Substring(i + "STARTING".Length-1, endIndex));
}
.NET FIDDLE
In your case, STRING..END occurs multiple times, but you were getting index of only first STRING and last index of END which will return substring, starts with first STRING to last END.
i.e.
a222 END, and this is STRING b2838
You've already got some good answers but I'll add another that uses ReadOnlyMemory from .NET core. That provides a solution that doesn't allocate new strings which can be nice. C# iterators are a common way to transform one sequence, of chars in this case, into another. This method would be used to transform the input string into sequence of ReadOnlyMemory each containing the tokens your after.
public static IEnumerable<ReadOnlyMemory<char>> Tokenize(string source, string beginPattern, string endPattern)
{
if (string.IsNullOrEmpty(source) ||
string.IsNullOrEmpty(beginPattern) ||
string.IsNullOrEmpty(endPattern))
yield break;
var sourceText = source.AsMemory();
int start = 0;
while (start < source.Length)
{
start = source.IndexOf(beginPattern, start);
if (-1 != start)
{
int end = source.IndexOf(endPattern, start);
if (-1 != end)
{
start += beginPattern.Length;
yield return sourceText.Slice(start, (end - start));
}
else
break;
start = end + endPattern.Length;
}
else
{
break;
}
}
}
Then you'd just call it like so to iterate over the tokens...
static void Main(string[] args)
{
const string Source = "This is STRING a222 END, and this is STRING b2838 END.";
foreach (var token in Tokenize(Source, "STRING", "END"))
{
Console.WriteLine(token);
}
}
string myString = "This is STRING a222 END, and this is STRING b2838 END.";
// Fix the issue based on #PaulF's comment.
if (myString.StartsWith("STRING"))
myString = $"DUMP {myString}";
var arr = myString.Split(new string[] { "STRING", "END" }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < arr.Length; i++)
{
if(i%2 > 0)
{
// This is your string
Console.WriteLine(arr[i].Trim());
}
}

How to remove all but the first occurences of a character from a string?

I have a string of text and want to ensure that it contains at most one single occurrence of a specific character (,). Therefore I want to keep the first one, but simply remove all further occurrences of that character.
How could I do this the most elegant way using C#?
This works, but not the most elegant for sure :-)
string a = "12,34,56,789";
int pos = 1 + a.IndexOf(',');
return a.Substring(0, pos) + a.Substring(pos).Replace(",", string.Empty);
You could use a counter variable and a StringBuilder to create the new string efficiently:
var sb = new StringBuilder(text.Length);
int maxCount = 1;
int currentCount = 0;
char specialChar = ',';
foreach(char c in text)
if(c != specialChar || ++currentCount <= maxCount)
sb.Append(c);
text = sb.ToString();
This approach is not the shortest but it's efficient and you can specify the char-count to keep.
Here's a more "elegant" way using LINQ:
int commasFound = 0; int maxCommas = 1;
text = new string(text.Where(c => c != ',' || ++commasFound <= maxCommas).ToArray());
I don't like it because it requires to modify a variable from a query, so it's causing a side-effect.
Regular expressions are elegant, right?
Regex.Replace("Eats, shoots, and leaves.", #"(?<=,.*),", "");
This replaces every comma, as long as there is a comma before it, with nothing.
(Actually, it's probably not elegant - it may only be one line of code, but it may also be O(n^2)...)
If you don't deal with large strings and you reaaaaaaly like Linq oneliners:
public static string KeepFirstOccurence (this string #string, char #char)
{
var index = #string.IndexOf(#char);
return String.Concat(String.Concat(#string.TakeWhile(x => #string.IndexOf(x) < index + 1)), String.Concat(#string.SkipWhile(x=>#string.IndexOf(x) < index)).Replace(#char.ToString(), ""));
}
You could write a function like the following one that would split the string into two sections based on the location of what you were searching (via the String.Split() method) for and it would only remove matches from the second section (using String.Replace()) :
public static string RemoveAllButFirst(string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
You could simply call it by using :
var output = RemoveAllButFirst(input,",");
A prettier approach might actually involve building an extension method that handled this a bit more cleanly :
public static class StringExtensions
{
public static string RemoveAllButFirst(this string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the
// original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
}
which would be called via :
var output = input.RemoveAllButFirst(",");
You can see a working example of it here.
static string KeepFirstOccurance(this string str, char c)
{
int charposition = str.IndexOf(c);
return str.Substring(0, charposition + 1) +
str.Substring(charposition, str.Length - charposition)
.Replace(c, ' ').Trim();
}
Pretty short with Linq; split string into chars, keep distinct set and join back to a string.
text = string.Join("", text.Select(c => c).Distinct());

Convert string to GSM 7-bit using C#

How can I convert a string to correct GSM encoded value to be sent to a mobile operator?
Below is a port of gnibbler's answer, slightly modified and with a detailed explantation.
Example:
string output = GSMConverter.StringToGSMHexString("Hello World");
// output = "48-65-6C-6C-6F-20-57-6F-72-6C-64"
Implementation:
// Data/info taken from http://en.wikipedia.org/wiki/GSM_03.38
public static class GSMConverter
{
// The index of the character in the string represents the index
// of the character in the respective character set
// Basic Character Set
private const string BASIC_SET =
"#£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
"¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà";
// Basic Character Set Extension
private const string EXTENSION_SET =
"````````````````````^```````````````````{}`````\\````````````[~]`" +
"|````````````````````````````````````€``````````````````````````";
// If the character is in the extension set, it must be preceded
// with an 'ESC' character whose index is '27' in the Basic Character Set
private const int ESC_INDEX = 27;
public static string StringToGSMHexString(string text, bool delimitWithDash = true)
{
// Replace \r\n with \r to reduce character count
text = text.Replace(Environment.NewLine, "\r");
// Use this list to store the index of the character in
// the basic/extension character sets
var indicies = new List<int>();
foreach (var c in text)
{
int index = BASIC_SET.IndexOf(c);
if(index != -1) {
indicies.Add(index);
continue;
}
index = EXTENSION_SET.IndexOf(c);
if(index != -1) {
// Add the 'ESC' character index before adding
// the extension character index
indicies.Add(ESC_INDEX);
indicies.Add(index);
continue;
}
}
// Convert indicies to 2-digit hex
var hex = indicies.Select(i => i.ToString("X2")).ToArray();
string delimiter = delimitWithDash ? "-" : "";
// Delimit output
string delimited = string.Join(delimiter, hex);
return delimited;
}
}
The code of Omar Didn't work for me. But I found The actual code that works:
public static string Encode7bit(string s)
{
string empty = string.Empty;
for (int index = s.Length - 1; index >= 0; --index)
empty += Convert.ToString((byte)s[index], 2).PadLeft(8, '0').Substring(1);
string str1 = empty.PadLeft((int)Math.Ceiling((Decimal)empty.Length / new Decimal(8)) * 8, '0');
List<byte> byteList = new List<byte>();
while (str1 != string.Empty)
{
string str2 = str1.Substring(0, str1.Length > 7 ? 8 : str1.Length).PadRight(8, '0');
str1 = str1.Length > 7 ? str1.Substring(8) : string.Empty;
byteList.Add(Convert.ToByte(str2, 2));
}
byteList.Reverse();
var messageBytes = byteList.ToArray();
var encodedData = "";
foreach (byte b in messageBytes)
{
encodedData += Convert.ToString(b, 16).PadLeft(2, '0');
}
return encodedData.ToUpper();
}

Replacing character after single quote

I am using the following C# code to modify a lowercase letter to uppercase after a single quote:
public virtual string FirstName
{
get { return _firstName; }
set
{
if (value != null)
{
int pos = value.IndexOf("'", 0);
int strlength = value.Length - 1;
if (pos >= 0 && pos != strlength)
{
string temp = value[pos + 1].ToString();
temp = temp.ToUpper();
value = value.Remove(pos + 1, 1);
value = value.Insert(pos + 1, temp);
}
}
}
}
To me this looks like overkill. Is there an easier way to achieve the desired result:
Value: Mc'donald
Expected: Mc'Donald
here is without regex
int pos = data.IndexOf("'");
if (pos >= 0 && pos < data.Length - 1)
{
StringBuilder sbl = new StringBuilder(data);
sbl[pos + 1] = char.ToUpper(sbl[pos + 1]);
data = sbl.ToString();
}
Since you're open to Regex, would this overload of the Regex.Replace do what you need?
Regex.Replace Method (String, MatchEvaluator)
Here's a modified version of the example given at the link above. I've changed it to use the '\w pattern and to return the match in upper case.
using System;
using System.Text.RegularExpressions;
class RegExSample
{
static string CapText(Match m)
{
// Return the match in upper case
return m.ToString().ToUpperInvariant();
}
static void Main()
{
string text = "Mc'donald";
System.Console.WriteLine("text=[" + text + "]");
Regex rx = new Regex(#"'\w");
string result = rx.Replace(text, new MatchEvaluator(RegExSample.CapText));
System.Console.WriteLine("result=[" + result + "]");
}
}
Perhaps regular expressions?
string value = "Mc'donald";
string found = Regex.Match(value, "'[\\w]").Value;
string result = value.Replace(found, found.ToUpper());
Console.WriteLine(result); // Mc'Donald

finding an string expression in an sentence

I have like a three word expression: "Shut The Door" and I want to find it in a sentence. Since They are kind of seperated by space what would be the best solution for it.
If you have the string:
string sample = "If you know what's good for you, you'll shut the door!";
And you want to find where it is in a sentence, you can use the IndexOf method.
int index = sample.IndexOf("shut the door");
// index will be 42
A non -1 answer means the string has been located. -1 means it does not exist in the string. Please note that the search string ("shut the door") is case sensitive.
Use build in Regex.Match Method for matching strings.
string text = "One car red car blue car";
string pat = #"(\w+)\s+(car)";
// Compile the regular expression.
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
// Match the regular expression pattern against a text string.
Match m = r.Match(text);
int matchCount = 0;
while (m.Success)
{
Console.WriteLine("Match"+ (++matchCount));
for (int i = 1; i <= 2; i++)
{
Group g = m.Groups[i];
Console.WriteLine("Group"+i+"='" + g + "'");
CaptureCollection cc = g.Captures;
for (int j = 0; j < cc.Count; j++)
{
Capture c = cc[j];
System.Console.WriteLine("Capture"+j+"='" + c + "', Position="+c.Index);
}
}
m = m.NextMatch();
}
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.match(v=vs.71).aspx
http://support.microsoft.com/kb/308252
if (string1.indexOf(string2) >= 0)
...
The spaces are nothing special, they are just characters, so you can find a string like this like yuo would find any other string in your sentence, for example using "indexOf" if you need the position, or just "Contains" if you need to know if it exists or not.
E.g.
string sentence = "foo bar baz";
string phrase = "bar baz";
Console.WriteLine(sentence.Contains(phrase)); // True
Here is some C# code to find a substrings using a start string and end string point but you can use as a base and modify (i.e. remove need for end string) to just find your string...
2 versions, one to just find the first instance of a substring, other returns a dictionary of all starting positions of the substring and the actual string.
public Dictionary<int, string> GetSubstringDic(string start, string end, string source, bool includeStartEnd, bool caseInsensitive)
{
int startIndex = -1;
int endIndex = -1;
int length = -1;
int sourceLength = source.Length;
Dictionary<int, string> result = new Dictionary<int, string>();
try
{
//if just want to find string, case insensitive
if (caseInsensitive)
{
source = source.ToLower();
start = start.ToLower();
end = end.ToLower();
}
//does start string exist
startIndex = source.IndexOf(start);
if (startIndex != -1)
{
//start to check for each instance of matches for the length of the source string
while (startIndex < sourceLength && startIndex > -1)
{
//does end string exist?
endIndex = source.IndexOf(end, startIndex + 1);
if (endIndex != -1)
{
//if we want to get length of string including the start and end strings
if (includeStartEnd)
{
//make sure to include the end string
length = (endIndex + end.Length) - startIndex;
}
else
{
//change start index to not include the start string
startIndex = startIndex + start.Length;
length = endIndex - startIndex;
}
//add to dictionary
result.Add(startIndex, source.Substring(startIndex, length));
//move start position up
startIndex = source.IndexOf(start, endIndex + 1);
}
else
{
//no end so break out of while;
break;
}
}
}
}
catch (Exception ex)
{
//Notify of Error
result = new Dictionary<int, string>();
StringBuilder g_Error = new StringBuilder();
g_Error.AppendLine("GetSubstringDic: " + ex.Message.ToString());
g_Error.AppendLine(ex.StackTrace.ToString());
}
return result;
}
public string GetSubstring(string start, string end, string source, bool includeStartEnd, bool caseInsensitive)
{
int startIndex = -1;
int endIndex = -1;
int length = -1;
int sourceLength = source.Length;
string result = string.Empty;
try
{
if (caseInsensitive)
{
source = source.ToLower();
start = start.ToLower();
end = end.ToLower();
}
startIndex = source.IndexOf(start);
if (startIndex != -1)
{
endIndex = source.IndexOf(end, startIndex + 1);
if (endIndex != -1)
{
if (includeStartEnd)
{
length = (endIndex + end.Length) - startIndex;
}
else
{
startIndex = startIndex + start.Length;
length = endIndex - startIndex;
}
result = source.Substring(startIndex, length);
}
}
}
catch (Exception ex)
{
//Notify of Error
result = string.Empty;
StringBuilder g_Error = new StringBuilder();
g_Error.AppendLine("GetSubstring: " + ex.Message.ToString());
g_Error.AppendLine(ex.StackTrace.ToString());
}
return result;
}
You may want to make sure the check ignores the case of both phrases.
string theSentence = "I really want you to shut the door.";
string thePhrase = "Shut The Door";
bool phraseIsPresent = theSentence.ToUpper().Contains(thePhrase.ToUpper());
int phraseStartsAt = theSentence.IndexOf(
thePhrase,
StringComparison.InvariantCultureIgnoreCase);
Console.WriteLine("Is the phrase present? " + phraseIsPresent);
Console.WriteLine("The phrase starts at character: " + phraseStartsAt);
This outputs:
Is the phrase present? True
The phrase starts at character: 21

Categories

Resources