C# using regex while working with strings

I have a string in the format "string1;string2;string3;...;stringn", with ; as the delimiter.
I need to delete one of the values, which I know in advance; call it valueForDelete.
I currently use the string.Split(';') method to find my value, delete it, and then build a new string without the deleted value.
I'm wondering whether regex could make this process simpler.

using System.Linq;
var values = "string1;string2;string3;string4";
var cleanedValues = String.Join(";",
    values.Split(';')
        .Where(x => x != "string3")
        .ToArray());
Regex is a useful tool and could be used for this, but it is often hard to maintain. Something like the above is likely easier to maintain. Regex can also be tricky if your strings contain regex special characters. As a bonus, this approach is easy to extend:
static string CleanItUp(string values, params string[] removeMe)
{
return String.Join(";",
values.Split(';')
.Except(removeMe)
.ToArray());
}
Usage:
var retString = CleanItUp("string1;string2;string3;", "string1", "string2");
// returns "string3;" (the empty entry after the trailing ';' is kept)
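One caveat with CleanItUp: Enumerable.Except is a set operation, so it also collapses repeated entries that are not being removed ("a;b;a;c" minus "c" becomes "a;b"). If repetition and order matter, a HashSet plus Where keeps them. A minimal sketch; the class and method names are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DelimitedCleaner
{
    // Removes every entry listed in removeMe from a ';'-delimited string.
    // Unlike Except, Where + HashSet keeps the remaining entries,
    // duplicates included, in their original order.
    public static string CleanItUpKeepingDuplicates(string values, params string[] removeMe)
    {
        var toRemove = new HashSet<string>(removeMe);
        return string.Join(";", values.Split(';').Where(v => !toRemove.Contains(v)));
    }
}
```

With Except, CleanItUp("a;b;a;c", "c") would return "a;b"; this variant returns "a;b;a".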

Why not just:
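string s = "string1;string2;string3;valueForDelete;string4";
s = s.Replace("valueForDelete;", string.Empty).Replace("valueForDelete", string.Empty);
The second Replace handles the case where the value is the last one (a trailing ; is left behind). Note that both calls also match valueForDelete inside longer entries, such as "myvalueForDelete;".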

Although this is possible with regex, using Split and Join will be your easiest and most functional choice. If you had a more complex way of choosing which strings to delete, you could use the Where clause.
String input = "string1;string2;string3";
String valueForDelete = "string2";
String[] parts = input.Split(';');
var allowed = parts.Where(str => !str.Equals(valueForDelete));
String output = String.Join(";", allowed);
If you are simply removing an exact value, then String.Replace would be better.

Use this regex to find valueForDelete: (?<=;|^)valueForDelete(?=;|$)
using System.Text.RegularExpressions;
const string Pattern = @"(?<=;|^)string3(?=;|$)";
var s = "string1;string2;string3;string4;string5;";
var res = Regex.Replace(s, Pattern, string.Empty); // "string1;string2;;string4;string5;" (the delimiter stays)
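The lookaround pattern above leaves the delimiter behind ("string1;string2;;string4;string5;"), and it misbehaves if the value contains regex metacharacters. A sketch that addresses both, assuming exact-match entries and ';' as the only delimiter: escape the value with Regex.Escape, prepend a ';' so even the first entry has a delimiter in front of it, and remove ";value" in one go. DelimitedList and RemoveEntry are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class DelimitedList
{
    // Removes every entry exactly equal to valueForDelete from a ';'-delimited
    // string, together with one delimiter, so no empty entry is left behind.
    public static string RemoveEntry(string values, string valueForDelete)
    {
        // Regex.Escape makes characters such as '.', '+' or '(' match literally.
        string pattern = ";" + Regex.Escape(valueForDelete) + "(?=;|$)";

        // Prepending ';' gives the first entry a delimiter as well,
        // so ";value" can be removed wherever it occurs.
        string result = Regex.Replace(";" + values, pattern, string.Empty);

        // The remainder is either empty or starts with one extra ';'; drop it.
        return result.Length > 0 ? result.Substring(1) : result;
    }
}
```

For the input above, RemoveEntry("string1;string2;string3;string4;string5;", "string3") returns "string1;string2;string4;string5;", and "a.b" only removes the literal entry "a.b", not "axb".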

Regex will do that, but I'm not sure it would be faster for what you are asking. Split is fast; you would only need regex if you were splitting on something more complex. I assume you are using a StringBuilder to build the new string? String += is slow. When you create the StringBuilder, give it the capacity you expect.

For a replacement that leaves all other data untouched (using LINQ):
string test = "string1;string2;string3;valueForDelete;stringn";
test = String.Join(";", test.Split(';').Where(s => s != "valueForDelete"));
For simple replacement (using String.Replace()):
string test = "string1;string2;string3;valueForDelete;stringn";
test = test.Replace("valueForDelete;", "");

Couldn't you just say
var myString = "string1;string2;string3;string4;string5;";
myString = myString.Replace("string3;", "");
The result would be a myString with the value "string1;string2;string4;string5;"
EDIT: Created as a regex
public static Regex regex = new Regex("(?:^|;)string3(;|$)",
RegexOptions.CultureInvariant | RegexOptions.Compiled
);
myString = regex.Replace(myString, ";");
The only flaw I see at the moment: if myString = "string3", the result is myString = ";". Likewise, a value at the start ("string3;string4") leaves a leading ";".

Why can't you just do this?
public static string Replace(string input)
{
    input = input.Replace("valueToDelete;", "");
    return input;
}

Related

Need to replace a string C#

I need to replace:
string input = "%someAlphabets%.ZIP";
string replaceWith = "Hello";
string result = "Hello.ZIP";
I tried Regex.Replace(input, "[%][A-Za-z][%]", replaceWith); but it is not working.
The problem with your expression is that it allows only one letter between the % signs. You need to repeat the letter class:
Regex.Replace(input, "[%][A-Za-z]{1,}[%]", replaceWith);
Try this:
string input = "%someAlphabets%.ZIP";
string regex = "(%.*%)";
string result = Regex.Replace(input, regex, "Hello");
It doesn't check that the name contains only letters, but you can change that by replacing the .* clause with your own selection logic. Note that .* is greedy, so with several %...% tokens it matches from the first % to the last one.
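To see the difference between the letter-class pattern and the .* pattern, here is a small sketch (class and method names are hypothetical). [A-Za-z]+ is shorthand for [A-Za-z]{1,}; the greedy .* version swallows everything between the first and the last % when a string holds more than one placeholder:

```csharp
using System.Text.RegularExpressions;

public static class PlaceholderReplace
{
    // Replaces each %letters% token; '+' means "one or more", like {1,}.
    public static string ReplaceLetters(string input, string replaceWith) =>
        Regex.Replace(input, "%[A-Za-z]+%", replaceWith);

    // Greedy .* runs from the first % to the last % in the input.
    public static string ReplaceGreedy(string input, string replaceWith) =>
        Regex.Replace(input, "%.*%", replaceWith);
}
```

Both return "Hello.ZIP" for "%someAlphabets%.ZIP"; for "%a%.ZIP and %b%.ZIP" only ReplaceLetters keeps the two file names apart.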
As already mentioned in the comments, you don't need RegEx for this.
Simpler alternatives include:
Using string.Format
var result = string.Format("{0}.ZIP", "Hello");
Using string interpolation
var input = "Hello";
var result = $"{input}.ZIP";
Using the string.Replace method
var input = "%pattern%.ZIP";
var replacement = "Hello";
var result = input.Replace("%pattern%", replacement);

How to use String.Replace

Quick question:
I have the strings m_Author and m_Editor, but they contain some odd ID data, so a WriteLine prints something like:
'16;#Luca Hostettler'
I know I can do the following:
string author = m_Author.Replace("16;#", "");
string editor = m_Editor.Replace("16;#", "");
After that I have just the name.
But in the future I will have other people and other IDs.
So the question: can I tell String.Replace to remove "#AndEverythingBeforeThat"?
That way I could also have
'14;#Luca Hostettler'
'15;#Hans Meier'
and get the output Luca Hostettler, Hans Meier without manually changing the code to m_Editor.Replace("14;#", ""), m_Editor.Replace("15;#", ""), and so on.
It sounds like you want a regex of "at least one digit, then a semicolon and a hash", with an anchor for "only at the start of the string":
string author = Regex.Replace(m_Author, @"^\d+;#", "");
Or, to make it more reusable:
private static readonly Regex IdentifierMatcher = new Regex(@"^\d+;#");
...
string author = IdentifierMatcher.Replace(m_Author, "");
string editor = IdentifierMatcher.Replace(m_Editor, "");
Note that there may be different appropriate solutions if:
The ID can be non-numeric
There may be other ignorable parts and you only want the value after the last hash
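For the second case, where only the value after the last hash matters, LastIndexOf plus Substring is enough and does not care whether the ID is numeric. A minimal sketch; IdStripper.AfterLastHash is a hypothetical helper:

```csharp
public static class IdStripper
{
    // Returns the text after the last '#', or the whole string if there is no '#'.
    public static string AfterLastHash(string value)
    {
        int index = value.LastIndexOf('#');
        return index >= 0 ? value.Substring(index + 1) : value;
    }
}
```

AfterLastHash("15;#Hans Meier") gives "Hans Meier", and a value without an ID prefix is returned unchanged.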
You could use regex or (what I'd prefer) IndexOf + Substring:
string author = m_Author;
int indexOfHash = m_Author.IndexOf('#');
if (indexOfHash >= 0)
{
    author = m_Author.Substring(indexOfHash + 1);
}
Or just:
var author = m_Author.Split('#').Last();
You can split your string on # using string.Split(); this gives you two strings, the first with everything before # and the second with everything after it.
Use String.Format:
int number = 5;
string userId = String.Format("{0};#", number);
string author = m_Author.Replace(userId, "");
If all you want to do is filter out everything that is not a letter or space, try:
var originalName = "#123;Firstname Lastname";
var filteredName = new string(originalName
.Where(c => Char.IsLetter(c) ||
Char.IsWhiteSpace(c))
.ToArray());
The example will produce Firstname Lastname
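A LINQ-only variant (IndexOf(x) finds the first occurrence of each character, so this drops any character after # that also appears before it):
List<char> originalName = "15;#Hans Meier".ToList();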
string newString = string.Concat(originalName.Where(x => originalName.IndexOf(x) > originalName.IndexOf('#')).ToList());

Best possible way to get a given substring

Let's say I have a string in the following format:
[val1].[val2].[val3] ...
What is the best way to get the value from the last bracket set [valx]?
So for the given example
[val1].[val2].[val3]
the result would be val3
You have to define "best" first: best in terms of readability or CPU cycles?
I assume this is efficient and readable enough:
string values = "[val1].[val2].[val3]";
string lastValue = values.Split('.').Last().Trim('[',']');
Or with Substring, which can be more efficient; just think about the case where there is no dot at all (LastIndexOf then returns -1, so this line returns the whole string with its brackets trimmed):
lastValue = values.Substring(values.LastIndexOf('.') + 1).Trim('[',']');
If you prefer to handle that case explicitly, check first:
int indexOflastDot = values.LastIndexOf('.');
if(indexOflastDot >= 0)
{
lastValue = values.Substring(indexOflastDot + 1).Trim('[',']');
}
For a quick (not structural) solution to your problem,
I'd say:
var startIndex = input.LastIndexOf(".["); // getting the last
then use the Substring method:
var value = input.Substring(startIndex + 2, input.Length - (startIndex + 2)); // 2 comes from the length of ".[".
then remove the "]" with the TrimEnd function:
value = value.TrimEnd(']');
But this is by no means the only solution, and it is not a structural one. Just one of many answers to your problem.
I think you want to access valx.
The easiest solution that comes to my mind is this one:
public void Test()
{
    var splitted = "[val1].[val2].[val3]".Split('.');
    var val3 = splitted[2]; // "[val3]", brackets included
}
You can use the following:
string[] myStrings = "[val1].[val2].[val3]".Split('.');
Now you can access each part by index. For the last one, use myStrings[myStrings.Length - 1].
Provided that none of val1...valN contains '.', '[' or ']', you can use simple LINQ code:
String str = @"[val1].[val2].[val3]";
String[] vals = str.Split('.').Select(x => x.TrimStart('[').TrimEnd(']')).ToArray();
Or, if all you want is the last value:
String str = @"[val1].[val2].[val3]";
String last = str.Split('.').Last().TrimStart('[').TrimEnd(']');
I'm assuming you always need the last bracket pair. I would do it like this:
string input = "[val1].[val2].[val3]";
string[] splittedInput = input.Split('.');
string lastBraceSet = splittedInput[splittedInput.Length - 1];
string result = lastBraceSet.Substring(1, lastBraceSet.Length - 2);
string str = "[val1].[val2].[val3]";
string last = str.Split('.').LastOrDefault();
string result = last.Replace("[", "").Replace("]", "");
string input="[val1].[val2].[val3]";
int startpoint=input.LastIndexOf("[")+1;
string result=input.Substring(startpoint,input.Length-startpoint-1);
I'd use the regex below. One warning: it won't work if there are unbalanced square brackets after the last pair of brackets. Most of the answers given suffer from that, though.
string s = "[val1].[val2].[val3]";
string pattern = @"(?<=\[)[^\]]+(?=\][^\[\]]*$)";
Match m = Regex.Match(s, pattern);
string result = null;
if (m.Success)
{
    result = m.Value;
}
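Wrapped in a reusable helper, the same pattern can be constructed once and return null when no bracket pair is found. A sketch; BracketValues and GetLastValue are illustrative names:

```csharp
using System.Text.RegularExpressions;

public static class BracketValues
{
    // Text after a '[' up to a ']' that is followed only by non-bracket
    // characters until the end of the string, i.e. the last [..] group.
    private static readonly Regex LastBracketContent =
        new Regex(@"(?<=\[)[^\]]+(?=\][^\[\]]*$)");

    // Returns the content of the last [..] group, or null if there is none.
    public static string GetLastValue(string input)
    {
        Match m = LastBracketContent.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

GetLastValue("[val1].[val2].[val3] ...") still returns "val3", because the trailing " ..." contains no brackets.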
I would use a regular expression, as it states the intention most clearly:
string input = "[val1].[val2].[val3] ...";
string match = Regex.Matches(input, @"\[val\d+\]")
    .Cast<Match>()
    .Select(m => m.Value)
    .Last(); // "[val3]", brackets included

Remove String After Determinate String

I need to remove certain strings after another string within a piece of text.
I have a text file with some URLs and after the URL there is the RESULT of an operation. I need to remove the RESULT of the operation and leave only the URL.
Example of text:
http://website1.com/something Result: OK(registering only mode is on)
http://website2.com/something Result: Problems registered 100% (SOMETHING ELSE) Other Strings;
http://website3.com/something Result: error: "Attention, an error was detected - The place of residence contains invalid
I need to remove everything starting from Result:, so the remaining strings have to be:
http://website1.com/something
http://website2.com/something
http://website3.com/something
That is, without the "Result: ..." part.
The results are generated randomly, so I don't know exactly what comes after Result:.
One option is to use regular expressions as per some other answers. Another is just IndexOf followed by Substring:
int resultIndex = text.IndexOf("Result:");
if (resultIndex != -1)
{
text = text.Substring(0, resultIndex);
}
Personally, if I can get away with a couple of very simple, easy-to-understand string operations, I find that easier to get right than a regex. Once you get into real patterns (at least 3 of these, then one of those), regexes become a lot more useful, of course.
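For the multi-line text in the question, calling IndexOf on the whole text would cut off everything after the first "Result:", including the following lines. Applied line by line, and with TrimEnd to drop the space before "Result:", it could look like this (a sketch; the class and method names are hypothetical):

```csharp
using System;
using System.Linq;

public static class ResultStripper
{
    // Keeps the part of a line before "Result:" (trailing spaces removed);
    // a line without "Result:" is returned unchanged.
    public static string StripResult(string line)
    {
        int resultIndex = line.IndexOf("Result:", StringComparison.Ordinal);
        return resultIndex >= 0 ? line.Substring(0, resultIndex).TrimEnd() : line;
    }

    // Applies StripResult to every line.
    public static string[] StripResults(string[] lines) =>
        lines.Select(line => StripResult(line)).ToArray();
}
```

To process the file, ResultStripper.StripResults(File.ReadAllLines(path)) would give the cleaned lines.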
string input = "Action2 Result: Problems registered 100% (SOMETHING ELSE) Other Strings; ";
string pattern = "^(Action[0-9]*) (.*)$";
string replacement = "$1";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
The $1 in the replacement keeps the first captured group, ActionXX.
Use Regex for this.
Example:
var r = new System.Text.RegularExpressions.Regex("Result:.*");
var result = r.Replace("Action Result:1231231", "");
Then result will be "Action " (with a trailing space).
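You can try this code, which uses string.Replace:
var pattern = "Result:";
var lineContainYourValue = "jdfhkjsdfhsdf Result:ljksdfljh"; // the line to clean up
lineContainYourValue = lineContainYourValue.Replace(pattern, ""); // strings are immutable, so assign the result
Note that this removes only the "Result:" label itself, not the text after it.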
Something along these lines, perhaps?
using System.IO;
string line;
using (var reader = new StreamReader(File.Open(@"C:\temp\test.txt", FileMode.Open)))
using (var sw = new StreamWriter(File.Open(@"C:\Temp\test.edited.txt", FileMode.CreateNew)))
    while ((line = reader.ReadLine()) != null)
    {
        // Keep only the text before "Result:" (the URL); lines without it are copied unchanged.
        int resultIndex = line.IndexOf("Result:");
        sw.WriteLine(resultIndex >= 0 ? line.Substring(0, resultIndex).TrimEnd() : line);
    }
You can use RegEx for this kind of processing.
using System.Text.RegularExpressions;
private string ParseString(string originalString)
{
string pattern = ".*(?=Result:.*)";
Match match = Regex.Match(originalString, pattern);
return match.Value;
}
A Linq approach:
IEnumerable<String> result = System.IO.File
.ReadLines(path)
.Where(l => l.StartsWith("Action") && l.Contains("Result"))
.Select(l => l.Substring(0, l.IndexOf("Result")));
Given your current example, where you want only the website, use a regex that matches up to the first whitespace:
var fileLine = "http://example.com/sub/ random text";
Regex regexPattern = new Regex("(.*?)\\s");
var websiteMatch = regexPattern.Match(fileLine).Groups[1].ToString();
Debug.Print("!" + websiteMatch + "!");
Repeat this for each line in your text file. The regex explained: .* matches anything, ? makes the match lazy (non-greedy), the parentheses capture the match into a group, and \\s matches whitespace (the backslash is doubled because this is a regular C# string literal).
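(.*?)\s finds nothing on a line that has no whitespace after the URL. A variation, swapped in here and not part of the answer above, matches the leading run of non-whitespace characters with ^\S+ instead. A sketch with a hypothetical UrlExtractor class:

```csharp
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // ^\S+ matches the non-whitespace characters at the start of the line,
    // so it also works for a line that contains only the URL.
    private static readonly Regex LeadingToken = new Regex(@"^\S+");

    // Returns the leading token, or an empty string if the line starts with whitespace or is empty.
    public static string GetUrl(string line)
    {
        Match m = LeadingToken.Match(line);
        return m.Success ? m.Value : string.Empty;
    }
}
```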

string replace using a List<string>

I have a List of words I want to ignore, like this one:
public List<String> ignoreList = new List<String>()
{
"North",
"South",
"East",
"West"
};
For a given string, say "14th Avenue North" I want to be able to remove the "North" part, so basically a function that would return "14th Avenue " when called.
I feel like there is something I should be able to do with a mix of LINQ, regex and replace, but I just can't figure it out.
The bigger picture is, I'm trying to write an address matching algorithm. I want to filter out words like "Street", "North", "Boulevard", etc. before I use the Levenshtein algorithm to evaluate the similarity.
How about this:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));
or, for .NET 3.5 (where string.Join needs an array):
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());
Note that this method splits the string up into individual words so it only removes whole words. That way it will work properly with addresses like Northampton Way #123 that string.Replace can't handle.
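The whole-word idea, sketched as a reusable method. Two assumptions go beyond the answer: a HashSet with StringComparer.OrdinalIgnoreCase so "north" is also dropped, and splitting on any whitespace while discarding empty entries so repeated spaces do not produce empty words. AddressFilter is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressFilter
{
    // Words to drop; case-insensitive matching is an assumption, not from the answer.
    private static readonly HashSet<string> IgnoreWords =
        new HashSet<string>(new[] { "North", "South", "East", "West" },
                            StringComparer.OrdinalIgnoreCase);

    // Splits on whitespace, drops ignored whole words, and rejoins with single spaces.
    public static string RemoveIgnoredWords(string text) =>
        string.Join(" ", text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                             .Where(w => !IgnoreWords.Contains(w)));
}
```

Because whole words are compared, "Northampton Way #123" passes through unchanged.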
Regex r = new Regex(string.Join("|", ignoreList.Select(w => Regex.Escape(w)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);
Something like this should work:
string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}
What's wrong with a simple for loop?
string street = "14th Avenue North";
foreach (string word in ignoreList)
{
street = street.Replace(word, string.Empty);
}
If you know that the list of words contains only characters that do not need escaping inside a regular expression, then you can do this:
string s = "14th Avenue North";
Regex regex = new Regex(string.Format(@"\b({0})\b",
    string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");
Result:
14th Avenue
If there are special characters you will need to fix two things:
Use Regex.Escape on each element of ignore list.
The word-boundary \b will not match a whitespace followed by a symbol or vice versa. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.
Here's how to fix these two problems:
Regex regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));
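A small sketch of the fixed version, packaged so it can be checked (IgnoreListRegex.Build is a hypothetical name). With an entry such as "St.", the \b version would not match before a following space, while the lookaround version does:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IgnoreListRegex
{
    // Builds (?<= |^)(w1|w2|...)(?= |$): each word is escaped and must be
    // bounded by a space or by the start/end of the string.
    public static Regex Build(IEnumerable<string> ignoreList) =>
        new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
            string.Join("|", ignoreList.Select(w => Regex.Escape(w)))));
}
```

The surrounding spaces are not consumed, so Build(...).Replace("14th Avenue North", "") leaves "14th Avenue " with a trailing space.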
If it's a short string, as in your example, you can just loop through the strings and replace them one at a time. If you want to get fancy, you can use the LINQ Aggregate method to do it:
address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));
If it's a large string, that would be slow. Instead, you can replace all strings in a single pass through the string, which is much faster. I made a method for that in this answer.
LINQ makes this easy and readable. This requires normalized data though, particularly in that it is case-sensitive.
List<string> ignoreList = new List<string>()
{
"North",
"South",
"East",
"West"
};
string s = "123 West 5th St"
.Split(' ') // Separate the words to an array
.ToList() // Convert the array to a List<string>
.Except(ignoreList) // Remove ignored keywords
.Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string
Why not just keep it simple?
public static string Trim(string text)
{
    var rv = text.Trim();
    foreach (var ignore in ignoreList)
    {
        if (rv.EndsWith(ignore))
        {
            rv = rv.Replace(ignore, string.Empty);
        }
    }
    return rv;
}
You can do this with an expression if you like, but it's easier to turn it around than to use Aggregate. I would do something like this:
string s = "14th Avenue North";
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "
public static string Trim(string text)
{
var rv = text;
foreach (var ignore in ignoreList)
rv = rv.Replace(ignore, "");
return rv;
}
Updated For Gabe
public static string Trim(string text)
{
    var rv = "";
    var words = text.Split(' ');
    foreach (var word in words)
    {
        var present = false;
        foreach (var ignore in ignoreList)
            if (word == ignore)
                present = true;
        if (!present)
            rv += (rv.Length > 0 ? " " : "") + word; // re-insert the separating space
    }
    return rv;
}
If you have a list, I think you're going to have to touch all the items. You could create one large regex with all your ignore keywords and replace the matches with String.Empty.
Here's a start:
(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)
If you have a single RegEx for ignore words, you can do a single replace for each phrase you want to pass to the algorithm.
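Applied to a whole phrase, the starting pattern could be used as below. Replacing each match with a single space and then trimming is an assumption made here: replacing with String.Empty would turn "123 West 5th St" into "1235th St", because the match includes the whitespace on both sides. CompassWordFilter is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

public static class CompassWordFilter
{
    // The starting pattern: a compass word (one or two run together, optionally
    // followed by "ern") surrounded by whitespace or the string edges.
    private static readonly Regex Compass =
        new Regex(@"(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)");

    // Replacing with one space keeps neighbouring words apart;
    // Trim removes the space left at either end of the phrase.
    public static string RemoveCompassWords(string phrase) =>
        Compass.Replace(phrase, " ").Trim();
}
```

One limitation remains: in "North South Rd" only "North" is removed, because that match consumes the space the next word would need in front of it.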
