Regex to find and replace a year in a string - c#

I have a string in my c#:
The.Big.Bang.Theory.(2013).S07E05.Release.mp4
I need to find an occurance of (2013), and replace the whole thing, including the brackets, with _ (Three underscores). So the output would be:
The.Big.Bang.Theory._.S07E05.Release.mp4
Is there a regex that can do this? Or is there a better method?
I then do some processing on the new string - but later, need to report that '(2013)' was removed .. so I need to store the value that is replaced.

Tried with your string. It works
string pattern = #"\(\d{4}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var m = Regex.Replace(search, pattern, "___");
Console.WriteLine(m);
This will find any 4 digits number enclosed in open/close brakets.
If the year number can change, I think that Regex is the best approach .
Instead this code will tell you if there a match for your pattern
var k = Regex.Matches(search, pattern);
if(k.Count > 0)
Console.WriteLine(k[0].Value);

Many of these answers forgot the original question in that you wanted to know what you are replacing.
string pattern = #"\((19|20)\d{2}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
string replaced = Regex.Match(search, pattern).Captures[0].ToString();
string output = Regex.Replace(search, pattern, "___");
Console.WriteLine("found: {0} output: {1}",replaced,output);
gives you the output
found: (2013) output: The.Big.Bang.Theory.___.S07E05.Release.mp4
Here is an explanation of my pattern too.
\( -- match the (
(19|20) -- match the numbers 19 or 20. I assume this is a date for TV shows or movies from 1900 to now.
\d{2} -- match 2 more digits
\) -- match )

Here is a working snippet from a console application, note the regex \(\d{4}\):
var r = new System.Text.RegularExpressions.Regex(#"\(\d{4}\)");
var s = r.Replace("The.Big.Bang.Theory.(2013).S07E05.Release.mp4", "___");
Console.WriteLine(s);
and the output from the console application:
The.Big.Bang.Theory.___.S07E05.Release.mp4
and you can reference this Rubular for proof.
Below is a modified solution taking into consideration your additional requirement:
var m = r.Match("The.Big.Bang.Theory.(2013).S07E05.Release.mp4");
if (m.Success)
{
var s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4".Replace(m.Value, "___");
var valueReplaced = m.Value;
}

Try this:
string s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var info = Regex.Split(
Regex.Matches(s, #"\(.*?\)")
.Cast<Match>().First().ToString(), #"[\s,]+");
s = s.Replace(info[0], "___");
Result
The.Big.Bang.Theory.___.S07E05.Release.mp4

try this :
string str="The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var matches = Regex.Matches(str, #"\([0-9]{4}\)");
List<string> removed=new List<string>();
if (matches.Count > 0)
{
for (int i = 0; i < matches.Count; i++)
{
List.add(matches.value);
}
}
str=Regex.replace(str,#"\([0-9]{4}\)","___");
System.out.println("Removed Strings are:")
foreach(string s in removed )
{
System.out.println(s);
}
output:
Removed Strings are:
(2013)

You don't need a regex for a simple replace (you can use one, but's it's not needed)
var name = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var replacedName = name.Replace("(2013)", "___");

Related

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.split, because it doesn't really have a delimeter.
You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($#"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if your string between the ISA and ESA have the same pattern that we are looking for, then you will have to find some smart way around it.
To explain the Regex a bit:
(?<=ESA) Look-behind assertion. This portion is not captured but still matched
(?=ISA) Look-ahead assertion. This portion is not captured but still matched
Using these look-around assertions you can find the correct | character for splitting
Simply use the
int x = whateverString.indexOf("?ISA"); // replace ? with the actual character here
and then just use the substring from 0 to that indexOf, indexOf to length.
Edit:
If ? is not known,
can we just use the regex Pattern and Matcher.
Matcher matcher = Patter.compile("ISA.*ESA").match(whateverString);
if(matcher.find()) {
matcher.find();
int x = matcher.start();
}
Here x would give that start index of that match.
Edit: I mistakenly saw it as java one, for C#
string pattern = #"ISA.*ESA";
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.writeLine(m.value);
m = m.NextMatch(); // more matches
}
RegEx will probably be the best for this. See this link
Mask would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, #"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches If you need multiple matches found, and specify different RegexOptions if needed.
It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.
You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
firstPart = "ISA~this is date~ESA|"
secondPart = "ISA~this is more data~ESA|";
Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, #"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}
Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(#"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}
There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.

Extract digit in a string

I have a list of string
goal0=1234.4334abc12423423
goal1=-234234
asdfsdf
I want to extract the number part from string that start with goal,
in the above case is
1234.4334, -234234
(if two fragments of digit get the first one)
how should i do it easily?
Note that "goal0=" is part of the string, goal0 is not a variable.
Therefore I would like to have the first digit fragment that come after "=".
You can do the following:
string input = "goal0=1234.4334abc12423423";
input = input.Substring(input.IndexOf('=') + 1);
IEnumerable<char> stringQuery2 = input.TakeWhile(c => Char.IsDigit(c) || c=='.' || c=='-');
string result = string.Empty;
foreach (char c in stringQuery2)
result += c;
double dResult = double.Parse(result);
Try this
string s = "goal0=-1234.4334abc12423423";
string matches = Regex.Match(s, #"(?<=^goal\d+=)-?\d+(\.\d+)?").Value;
The regex says
(?<=^goal\d+=) - A positive look behind which means look back and make sure goal(1 or more number)= is at the start of the string, but dont make it part of the match
-? - A minus sign which is optional (the ? means 1 or more)
\d+ - One or more digits
(\.\d+)? - A decimal point followed by 1 or more digits which is optional
This will work if your string contains multiple decimal points as well as it will only take the first set of numbers after the first decimal point if there are any.
Use a regex for extracting:
x = Regex.Match(string, #"\d+").Value;
Now convert the resulting string to the number by using:
finalNumber = Int32.Parse(x);
Please try this:
string sample = "goal0=1234.4334abc12423423goal1=-234234asdfsdf";
Regex test = new Regex(#"(?<=\=)\-?\d*(\.\d*)?", RegexOptions.Singleline);
MatchCollection matchlist = test.Matches(sample);
string[] result = new string[matchlist.Count];
if (matchlist.Count > 0)
{
for (int i = 0; i < matchlist.Count; i++)
result[i] = matchlist[i].Value;
}
Hope it helps.
I didn't get the question at first. Sorry, but it works now.
I think this simple expression should work:
Regex.Match(string, #"\d+")
You can use the old VB Val() function from C#. That will extract a number from the front of a string, and it's already available in the framework:
result0 = Microsoft.VisualBasic.Conversion.Val(goal0);
result1 = Microsoft.VisualBasic.Conversion.Val(goal1);
string s = "1234.4334abc12423423";
var result = System.Text.RegularExpressions.Regex.Match(s, #"-?\d+");
List<String> list = new List<String>();
list.Add("goal0=1234.4334abc12423423");
list.Add("goal1=-23423");
list.Add("asdfsdf");
Regex regex = new Regex(#"^goal\d+=(?<GoalNumber>-?\d+\.?\d+)");
foreach (string s in list)
{
if(regex.IsMatch(s))
{
string numberPart = regex.Match(s).Groups["GoalNumber"];
// do something with numberPart
}
}

Remove String After Determinate String

I need to remove certain strings after another string within a piece of text.
I have a text file with some URLs and after the URL there is the RESULT of an operation. I need to remove the RESULT of the operation and leave only the URL.
Example of text:
http://website1.com/something Result: OK(registering only mode is on)
http://website2.com/something Result: Problems registered 100% (SOMETHING ELSE) Other Strings;
http://website3.com/something Result: error: "Âíèìàíèå, îáíàðóæåíà îøèáêà - Ìåñòî æèòåëüñòâà ñîäåðæèò íåäîïóñòèìûå ê
I need to remove all strings starting from Result: so the remaining strings have to be:
http://website1.com/something
http://website2.com/something
http://website3.com/something
Without Result: ........
The results are generated randomly so I don't know exactly what there is after RESULT:
One option is to use regular expressions as per some other answers. Another is just IndexOf followed by Substring:
int resultIndex = text.IndexOf("Result:");
if (resultIndex != -1)
{
text = text.Substring(0, resultIndex);
}
Personally I tend to find that if I can get away with just a couple of very simple and easy to understand string operations, I find that easier to get right than using regex. Once you start going into real patterns (at least 3 of these, then one of those) then regexes become a lot more useful, of course.
string input = "Action2 Result: Problems registered 100% (SOMETHING ELSE) Other Strings; ";
string pattern = "^(Action[0-9]*) (.*)$";
string replacement = "$1";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
You use $1 to keep the match ActionXX.
Use Regex for this.
Example:
var r = new System.Text.RegularExpressions.Regex("Result:(.)*");
var result = r.Replace("Action Result:1231231", "");
Then you will have "Action" in the result.
You can try with this code - by using string.Replace
var pattern = "Result:";
var lineContainYourValue = "jdfhkjsdfhsdf Result:ljksdfljh"; //I want replace test
lineContainYourValue.Replace(pattern,"");
Something along the lines of this perhaps?
string line;
using ( var reader = new StreamReader ( File.Open ( #"C:\temp\test.txt", FileMode.Open ) ) )
using ( var sw = new StreamWriter(File.Open( #"C:\Temp\test.edited.txt", FileMode.CreateNew ) ))
while ( (line = reader.ReadLine()) != null )
if(!line.StartsWith("Result:")) sw.WriteLine(line);
You can use RegEx for this kind of processing.
using System.Text.RegularExpressions;
private string ParseString(string originalString)
{
string pattern = ".*(?=Result:.*)";
Match match = Regex.Match(originalString, pattern);
return match.Value;
}
A Linq approach:
IEnumerable<String> result = System.IO.File
.ReadLines(path)
.Where(l => l.StartsWith("Action") && l.Contains("Result"))
.Select(l => l.Substring(0, l.IndexOf("Result")));
Given your current example, where you want only the website, regex match the spaces.
var fileLine = "http://example.com/sub/ random text";
Regex regexPattern = new Regex("(.*?)\\s");
var websiteMatch = regexPattern.Match(fileLine).Groups[1].ToString();
Debug.Print("!" + websiteMatch + "!");
Repeating for each line in your text file. Regex explained: .* matches anything, ? makes the match ungreedy, (brackets) puts the match into a group, \\s matches whitespace.

regex for PropertyName e.g. HelloWorld2HowAreYou would get Hello HelloWorld2 HelloWorld2How

I need a regex for PropertyName e.g. HelloWorld2HowAreYou would get:
Hello HelloWorld2 HelloWorld2How etc.
I want to use it in C#
[A-Z][a-z0-9]+ would give you all words that start with capital letter. You can write code to concat them one by one to get the complete set of words.
For example matching [A-Z][a-z0-9]+ against HelloWorld2HowAreYou with global flag set, you will get the following matches.
Hello
World2
How
Are
You
Just iterate through the matches and concat them to form the words.
Port this to C#
var s = "HelloWorld2HowAreYou";
var r = /[A-Z][a-z0-9]+/g;
var m;
var matches = [];
while((m = r.exec(s)) != null)
matches.push(m[0]);
var o = "";
for(var i = 0; i < matches.length; i++)
{
o += matches[i]
console.log(o + "\n");
}
I think something like this is what you want:
var s = "HelloWorld2HowAreYou";
Regex r = new Regex("(?=[A-Z]|$)(?<=(.+))");
foreach (Match m in r.Matches(s)) {
Console.WriteLine(m.Groups[1]);
}
The output is (as seen on ideone.com):
Hello
HelloWorld2
HelloWorld2How
HelloWorld2HowAre
HelloWorld2HowAreYou
How it works
The regex is based on two assertions:
(?=[A-Z]|$) matches positions just before an uppercase, and at the end of the string
(?<=(.+)) is a capturing lookbehind for .+ behind the current position into group 1
Essentially, the regex translates to:
"Everywhere just before an uppercase, or at the end of the string"...
"grab everything behind you if it's not an empty string"

Regular Expression to match numbers inside parenthesis inside square brackets with optional text

Firstly, I'm in C# here so that's the flavor of RegEx I'm dealing with. And here are thing things I need to be able to match:
[(1)]
or
[(34) Some Text - Some Other Text]
So basically I need to know if what is between the parentheses is numeric and ignore everything between the close parenthesis and close square bracket. Any RegEx gurus care to help?
This should work:
\[\(\d+\).*?\]
And if you need to catch the number, simply wrap \d+ in parentheses:
\[\((\d+)\).*?\]
Do you have to match the []? Can you do just ...
\((\d+)\)
(The numbers themselves will be in the groups).
For example ...
var mg = Regex.Match( "[(34) Some Text - Some Other Text]", #"\((\d+)\)");
if (mg.Success)
{
var num = mg.Groups[1].Value; // num == 34
}
else
{
// No match
}
Regex seems like overkill in this situation. Here is the solution I ended up using.
var src = test.IndexOf('(') + 1;
var dst = test.IndexOf(')') - 1;
var result = test.SubString(src, dst-src);
Something like:
\[\(\d+\)[^\]]*\]
Possibly with some more escaping required?
How about "^\[\((d+)\)" (perl style, not familiar with C#). You can safely ignore the rest of the line, I think.
Depending on what you're trying to accomplish...
List<Boolean> rslt;
String searchIn;
Regex regxObj;
MatchCollection mtchObj;
Int32 mtchGrp;
searchIn = #"[(34) Some Text - Some Other Text] [(1)]";
regxObj = new Regex(#"\[\(([^\)]+)\)[^\]]*\]");
mtchObj = regxObj.Matches(searchIn);
if (mtchObj.Count > 0)
rslt = new List<bool>(mtchObj.Count);
else
rslt = new List<bool>();
foreach (Match crntMtch in mtchObj)
{
if (Int32.TryParse(crntMtch.Value, out mtchGrp))
{
rslt.Add(true);
}
}
How's this? Assuming you only need to determine if the string is a match, and need not extract the numeric value...
string test = "[(34) Some Text - Some Other Text]";
Regex regex = new Regex( "\\[\\(\\d+\\).*\\]" );
Match match = regex.Match( test );
Console.WriteLine( "{0}\t{1}", test, match.Success );

Categories

Resources