Regex.Match doesn't work correctly - C#

I have a string extension that was defined exactly like this:
public static string GetStringBetween(this string value, string start, string end)
{
start = Regex.Escape(start);
end = Regex.Escape(end);
GroupCollection matches = Regex.Match(value, start + @"([^)]*)" + end).Groups;
return matches[1].Value;
}
But when I call this:
string str = "The pre-inspection image A. Valderama (1).jpg of client Valderama is not...";
Console.WriteLine(str.GetStringBetween("pre-inspection image ", " of client"));
It doesn't write anything. But when the str value is like this:
string str = "The pre-inspection image A. Valderama.jpg of client Valderama is not...";
It works fine. Why does this happen?
My code is in C#, .NET Framework 4, built in VS2010 Pro.
Please help. Thanks in advance.

Because the capturing group of your regex excludes the character ): see [^)] in @"([^)]*)".
Since ) appears in the first string, Valderama (1).jpg, the pattern cannot match.
You probably want @"(.*)" instead.
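A minimal sketch of the corrected extension method, for illustration only. It assumes a lazy group `(.*?)` is acceptable, so the capture stops at the first occurrence of `end` rather than the last; the class names `StringExtensions` and `Program` and the `RegexOptions.Singleline` flag are assumptions, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Returns the text between the first 'start' and the next 'end',
    // or an empty string when the pattern does not match.
    public static string GetStringBetween(this string value, string start, string end)
    {
        // Escape both delimiters so characters such as '(' or '.' are matched literally.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        Match match = Regex.Match(value, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}

public static class Program
{
    public static void Main()
    {
        string str = "The pre-inspection image A. Valderama (1).jpg of client Valderama is not...";
        Console.WriteLine(str.GetStringBetween("pre-inspection image ", " of client"));
        // prints: A. Valderama (1).jpg
    }
}
```

With the greedy `(.*)` the capture would instead run to the last occurrence of `end` in the input, which may or may not be what you want.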

Related

Can I shorten this code with a loop in C#?

I have this code written in C# but looks kind of "bad" and I would like to shorten it somehow and keep it clean and simple.
All this code works pretty fine but I want to know if there's any other way I can achieve the same thing.
EDIT: I forgot to mention that the firstLine has a bad date format attached with it, so it is like this: "This_is_my_first_line_20220126". So I split the string and then only join it with the corrected date. The problem is that I can never know how long the new string would be and I don't want to handle the code like this and go up to 100 parts.
Here's my code:
string correctDate = "26012022";
string[] lines = File.ReadAllLines("text.txt");
string firstLine = lines.FirstOrDefault();
//note: firstLine looks like this: This_is_my_first_line_20220126
string[] sub = firstLine.Split('_');
string name="";
if(sub.Length==2)
name = sub[0]+"_"+sub[1]+"_"+correctDate;
else if(sub.Length==3)
name = sub[0]+"_"+sub[1]+"_"+sub[2]+"_"+correctDate;
...
else if(sub.Length==20)
name = sub[0]+"_"+ ... "_" + sub[19];
Now, my final name value should be "This_is_my_first_line_26012022", but I want it to depend on the length of the given string. So far I know that the maximum length would go up to 20 parts, but I don't want my code to look like this. Can I shorten it somehow?

You can find the last underscore with LastIndexOf and replace the date by using Substring:
string firstLine = "This_is_my_first_line_20220126";
string correctDate = "26012022";
string correctString = firstLine.Substring(0, firstLine.LastIndexOf("_") + 1) + correctDate;

I'm still a little perplexed by the split approach, but this is a way to join all the elements back together:
string name = string.Join("_", sub.Take(sub.Length - 1).Append(correctDate));
Or use the Substring method (no need for all that split and join):
name = firstLine.Substring(0, firstLine.LastIndexOf("_") + 1) + correctDate;

I forgot to mention that firstLine has a bad date format like "This_is_my_Line_20220125"
If you want to correct just the first line:
string correctDate = "26012022";
string[] lines = File.ReadAllLines("text.txt");
lines[0] = lines[0][..^8] + correctDate;
[..^8] uses the "indices and ranges" feature introduced in C# 8, which allows a more compact way of taking a substring. It means "from the start of the string, up to the index 8 back from the end of the string".
If you get a wiggly line and possibly a message like "... is not available in C# version X", you can use the older syntax, which would be more like lines[0] = lines[0].Remove(lines[0].Length - 8) + correctDate;
If you want to correct all lines:
string correctDate = "26012022";
string[] lines = File.ReadAllLines("text.txt");
for(int x = 0; x < lines.Length; x++)
lines[x] = lines[x][..^8] + correctDate;
If the incorrect date isn't always 8 characters long, you can use LastIndexOf('_') to locate the last _ and snip the string at that point.
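The LastIndexOf variant could be sketched as follows. The helper name `ReplaceSuffix`, the class name `DateSuffixFixer`, and the choice to return lines without an underscore unchanged are assumptions made for illustration; the sample lines stand in for the contents of text.txt.

```csharp
using System;

public static class DateSuffixFixer
{
    // Replaces everything after the last underscore with 'correctDate'.
    // Assumption: a line without any underscore is returned unchanged.
    public static string ReplaceSuffix(string line, string correctDate)
    {
        int lastUnderscore = line.LastIndexOf('_');
        if (lastUnderscore < 0)
            return line;
        return line.Substring(0, lastUnderscore + 1) + correctDate;
    }

    public static void Main()
    {
        string[] lines = { "This_is_my_first_line_20220126", "short_2022", "no underscore here" };
        for (int x = 0; x < lines.Length; x++)
            lines[x] = ReplaceSuffix(lines[x], "26012022");

        foreach (string line in lines)
            Console.WriteLine(line);
        // This_is_my_first_line_26012022
        // short_26012022
        // no underscore here
    }
}
```

Because it does not rely on range syntax, this version also compiles with older C# versions.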

Allow only single type of mark in string

I'm trying to figure out how to allow only a single type of punctuation mark in a string. For example, if the string inputStr contains different marks:
string inputStr = "hello, how are you? ~ say something: what's up? hi... tell me. what?! ok: so,";
and I strip them this way:
string outputStr = Regex.Replace(inputStr, @"[^\w\s]", "");
the result is outputStr without any marks:
hello how are you say something whatsup hi tell me what ok so
As the desired result, I want to keep only one specific mark, the colon ":", in outputStr:
hello how are you say something: whatsup hi tell me what ok: so
Any guidance, advice or example would be helpful.

I hope I understood your question correctly. You could use the following pattern:
[^\w\s:]
Code
string outputStr = Regex.Replace(inputStr, @"[^\w\s:]", "");
If you want a reusable function that works with different characters, you can do the following:
public static string ExtendedReplace(string sourceString, char charToRetain)
{
return Regex.Replace(sourceString, $@"[^\w\s{charToRetain}]", "");
}
Now you can use it as follows:
string inputStr = "hello, how are you? ~ say something: what's up? hi... tell me. what?! ok: so";
string outputStr = ExtendedReplace(inputStr, ':');
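Interpolating the character straight into a character class can misbehave for characters that are special inside [...] (for example ], ^, - or \). As a hedged alternative, here is a sketch that swaps the regex for a plain character filter. The names `MarkFilter` and `KeepOnly` are hypothetical, and the check only approximates \w and \s (letters, digits, underscore and white space), so results may differ from the regex for unusual Unicode input.

```csharp
using System;
using System.Text;

public static class MarkFilter
{
    // Keeps letters, digits, underscores, white space and the one retained character;
    // every other character is dropped.
    public static string KeepOnly(string source, char charToRetain)
    {
        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            if (char.IsLetterOrDigit(c) || c == '_' || char.IsWhiteSpace(c) || c == charToRetain)
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(KeepOnly("say something: what's up? ok: so,", ':'));
        // say something: whats up ok: so
    }
}
```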

Substring issue

I have the below code:
sDocType = pqReq.Substring(0, pqReq.IndexOf(@"\t"));
The string pqReq is like this: "CSTrlsEN\t001\t\\sgprt\Projects2\t001\tCSTrl". But even though I can clearly see the \t in the string, pqReq.IndexOf(@"\t") returns -1, so an error is thrown.
What's the correct way to do this? I don't want to split the string pqReq until later on in the code.

Use \\t instead of \t. The \t is seen as a tab character.
sDocType = pqReq.Substring(0, pqReq.IndexOf(@"\t"));
Edit:
I didn't notice the \t being literal due to the @. But is your input string a verbatim literal as well? If not, place an @ before the value of pqReq:
string pqReq = @"CSTrlsEN\t001\t\\sgprt\Projects2\t001\tCSTrl";
int i = pqReq.IndexOf(@"\t");
//i = 8

I can't reproduce this issue. The following code:
var pqReq=@"CSTrlsEN\t001\t\\sgprt\Projects2\t001\tCSTrl";
var idx=pqReq.IndexOf(@"\t");
Console.WriteLine(idx);
var sDocType = pqReq.Substring(0, idx);
Console.WriteLine(sDocType);
produces:
8
CSTrlsEN
Did you forget to prefix pqReq with @?
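The difference between the two kinds of literal can be seen in a small sketch (the class name `TabDemo` and the shortened sample strings are illustrative only). In a regular literal, \t is a single tab character; in a verbatim literal prefixed with @, it is a backslash followed by the letter t.

```csharp
using System;

public static class TabDemo
{
    public static void Main()
    {
        string regular = "CSTrlsEN\t001";    // \t is one tab character (U+0009)
        string verbatim = @"CSTrlsEN\t001";  // a backslash followed by the letter t

        // The regular string contains no backslash, so searching for one fails.
        Console.WriteLine(regular.IndexOf(@"\t", StringComparison.Ordinal));   // -1
        Console.WriteLine(regular.IndexOf("\t", StringComparison.Ordinal));    // 8
        Console.WriteLine(regular.IndexOf('\t'));                              // 8

        // The verbatim string contains the two characters '\' and 't'.
        Console.WriteLine(verbatim.IndexOf(@"\t", StringComparison.Ordinal));  // 8
        Console.WriteLine(verbatim.IndexOf("\\t", StringComparison.Ordinal));  // 8
    }
}
```

So if pqReq really contains tab characters, searching for "\t" or '\t' (without the @) is the match that succeeds.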

How To Remove the Symbol from the string

I have a string like this:
string value="{\"email\":\"test@example.com\",\"password\":\"passworddata\"}";
I want to remove this symbol ("\")
so that the string looks like this:
"{"email":"gg.com","password":"ff"}"

The backslashes are escape characters in the C# source code; they are not part of the string's value, so you shouldn't need to do anything.
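A quick way to check this is to print the string and test it for a backslash. This is only a sketch; the class name `EscapeDemo` is an assumption, and the literal is the one from the question.

```csharp
using System;

public static class EscapeDemo
{
    public static void Main()
    {
        // Each \" in the source is an escaped double quote, not a backslash plus a quote.
        string value = "{\"email\":\"test@example.com\",\"password\":\"passworddata\"}";

        Console.WriteLine(value);
        // {"email":"test@example.com","password":"passworddata"}

        Console.WriteLine(value.Contains("\\"));  // False
    }
}
```

The backslashes you see in the debugger or watch window are just how the IDE displays quotes inside a string.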

If you don't mind, you could try this code:
string result = value.Replace("\\", string.Empty);

That looks like it could be JSON, although you're representing it as an embedded C# string.
Let's say it was this in C#:
string value = "{\\\"email\\\":\\\"xxx@example.com\\\",\\\"password\\\":\\\"passworddata\\\"}";
that on console output looked like:
{\"email\":\"xxx@example.com\",\"password\":\"passworddata\"}
You could use regex to strip the escapes:
var val = Regex.Replace(value, "\\\\([^\\\\])", "$1");
so that on output you would have:
{"email":"xxx@example.com","password":"passworddata"}

I think this may help you:
public string Formatter(string MainText, char CharToRemove)
{
string result = MainText;
foreach (char c in result)
{
if(c == CharToRemove)
result = result.Remove(result.IndexOf(c), 1);
}
return result;
}

Try this:
string value = "{\"email\":\"xxx@gamil.com\",\"password\":\"passworddata\"}";
value="\"" + value.Replace("\\", "") + "\"";
Output:
"{"email":"xxx@gamil.com","password":"passworddata"}"

Place the edit cursor (I-bar) behind each \ character and press Bksp
As per King King's request:
An alternative method would be to place the I-bar in front of the \ and press Del

Remove characters after specific character in string, then remove substring?

I feel kind of dumb posting this when this seems kind of simple and there are tons of questions on strings/characters/regex, but I couldn't find quite what I needed (except in another language: Remove All Text After Certain Point).
I've got the following code:
[Test]
public void stringManipulation()
{
String filename = "testpage.aspx";
String currentFullUrl = "http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue";
String fullUrlWithoutQueryString = currentFullUrl.Replace("?.*", "");
String urlWithoutPageName = fullUrlWithoutQueryString.Remove(fullUrlWithoutQueryString.Length - filename.Length);
String expected = "http://localhost:2000/somefolder/myrep/";
String actual = urlWithoutPageName;
Assert.AreEqual(expected, actual);
}
I tried the solution in the question above (hoping the syntax would be the same!) but nope. I want to first remove the queryString which could be any variable length, then remove the page name, which again could be any length.
How can I remove the query string from the full URL such that this test passes?

For string manipulation, if you just want to kill everything after the ?, you can do this
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.IndexOf("?");
if (index >= 0)
input = input.Substring(0, index);
Edit: If you want to remove everything after the last slash, do something like
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.LastIndexOf("/");
if (index >= 0)
input = input.Substring(0, index); // or index + 1 to keep slash
Alternatively, since you're working with a URL, you can use the Uri class, like this:
System.Uri uri = new Uri("http://www.somesite.com/what/test.aspx?hello=1");
string fixedUri = uri.AbsoluteUri.Replace(uri.Query, string.Empty);
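As a hedged variation on the Uri approach, `Uri.GetLeftPart(UriPartial.Path)` returns the URL without its query string, and it also works when there is no query at all (whereas `Replace` throws an ArgumentException if `uri.Query` is empty). The sketch below then trims the page name using the last slash; the names `UrlDemo` and `GetFolderUrl` are made up for illustration.

```csharp
using System;

public static class UrlDemo
{
    // Returns the URL up to and including the last '/' of its path,
    // with the query string removed.
    public static string GetFolderUrl(string url)
    {
        var uri = new Uri(url);
        // Scheme, host, port and path, but no query or fragment.
        string withoutQuery = uri.GetLeftPart(UriPartial.Path);
        return withoutQuery.Substring(0, withoutQuery.LastIndexOf('/') + 1);
    }

    public static void Main()
    {
        Console.WriteLine(GetFolderUrl("http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue"));
        // http://localhost:2000/somefolder/myrep/
    }
}
```

This avoids needing to know the page name's length in advance, which is what the test in the question relies on.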

To remove everything before the first /
input = input.Substring(input.IndexOf("/"));
To remove everything after the first /
input = input.Substring(0, input.IndexOf("/") + 1);
To remove everything before the last /
input = input.Substring(input.LastIndexOf("/"));
To remove everything after the last /
input = input.Substring(0, input.LastIndexOf("/") + 1);
An even simpler solution for removing the characters after a specified char is to use the String.Remove() method as follows:
To remove everything after the first /
input = input.Remove(input.IndexOf("/") + 1);
To remove everything after the last /
input = input.Remove(input.LastIndexOf("/") + 1);

Here's another simple solution. The following code will return everything before the '|' character:
if (path.Contains('|'))
path = path.Split('|')[0];
In fact, you could have as many separators as you want, but assuming you only have one separation character, here is how you would get everything after the '|':
if (path.Contains('|'))
path = path.Split('|')[1];
(All I changed in the second piece of code was the index of the array.)

The Uri class is generally your best bet for manipulating Urls.

To remove everything before a specific char, use below.
string1 = string1.Substring(string1.IndexOf('$') + 1);
This takes everything up to and including the $ char and removes it. If you want to remove everything from the character onward instead, use string1.Substring(0, string1.IndexOf('$')).
But for a URL, I would use the built-in .NET class to take care of that.

Request.QueryString helps you get the parameters and values included in the URL.
Example:
string http = "http://dave.com/customers.aspx?customername=dave";
string customername = Request.QueryString["customername"].ToString();
so the customername variable should be equal to dave
regards

I second Hightechrider: there is a specialized Uri class already built for you.
I must also point out, however, that PHP's replaceAll uses regular expressions for the search pattern, which you can do in .NET as well: look at the Regex class.

You can use .NET's built-in method to remove a query string parameter,
i.e., Request.QueryString.Remove("whatever");
where "whatever" is the name of the query string parameter you want to remove.
Try this; I hope it helps.

You can use this extension method to remove query parameters (everything after the ?) in a string
public static string RemoveQueryParameters(this string str)
{
int index = str.IndexOf("?");
return index >= 0 ? str.Substring(0, index) : str;
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
