C# Replace and Remove text - c#

I am having a little problem with how to replace and remove the text from the label.
label1.Text = Users online: 1 browsing: 1 pages
I am using gethtmldocument to receive the label1.Text to be like above. My problem is I want the text to show only Users Online: (number).
Now I am using label1.Text.Remove(17). So I will get Users online: 1 but the problem is when the users exceed the limit is 10 the text will count to 1 again not 10.
And I am trying to use label1.Text.replace("browsing: 1 pages",""). But when user is online the browsing: 1 pages will change to browsing: 2 pages or others.
So my question is how can I receive the text only Users online: ???
Thank you.

Try using regular expressions: match the groups and represent them in the desired way:
using System.Text.RegularExpressions;
...
string source = "Users online: 479 browsing: 153 pages";
// match.Groups["text"] - "Users online: "
// match.Groups["number"] - "479"
var match = Regex.Match(source, "^(?<text>.*?)(?<number>[0-9]+)");
// Users online: (479)
label1.Text = $"{match.Groups["text"].Value.Trim()} ({match.Groups["number"].Value})";
Edit: Regular expression's pattern ^(?<text>.*?)(?<number>[0-9]+) explanation:
^ - anchor: string's beginning
(?<text> ...) - group named "text" which contains
.*? - any characters, as few as possible
(?<number> ...) - group named "number" which contains
[0-9]+ - digits (char in [0..9] range); "+" - at least one

You could try to use substring. Something like this:
var x = //get the text
var textToDisplay = x.Substring(0, x.IndexOf("b");
Label1.Text = textToDisplay;

Related

Why does my reg ex not capture 2nd and subsequent lines?

Update
I tried adding RegexOptions.Singleline to my regex options. It worked in that it captured the lines that weren't previously captured, but it put the entire text file into the first match instead of creating one match per date as desired.
End of Update
Update #2
Added new output showing matches and groups when using Poul Bak's modification. See screen shot below titled Output from Poul Bak's modification
End of Update #2
Final Update
Updating the target framework from 4.6.1 to 4.7.1 and tweaking Poul Bak's reg ex a little bit solved all problems. See Poul Bak's answer below
End of Final Update
Original Question: Background
I have the following text file test_text.txt:
2018-10-16 12:00:01 - Error 1<CR><LF>
Error 1 text line 1<CR><LF>
Error 1 text line 2<CR><LF>
2018-10-16 12:00:02 AM - Error 2<CR><LF>
Error 2 text line 1<CR><LF>
Error 2 text line 2<CR><LF>
Error 2 text line 3<CR><LF>
Error 2 text line 4<CR><LF>
2018-10-16 12:00:03 PM - Error 3
Objective
My objective is to have each match be comprised of 3 named groups: Date, Delim, and Text as shown below.
Note: apostrophes used only to denote limits of matched text.
Matches I expect to see:
Match 1: '2018-10-16 12:00:01 - Error 1<CR><LF>'
Date group = '2018-10-16 12:00:01'
Delim group = ' - '
Text group = 'Error 1<CR><LF>Error 1 text line 1<CR><LF>Error 1 text line 2<CR><LF>'
Match 2: '2018-10-16 12:00:02 AM - Error 2<CR><LF>'
Date group = '2018-10-16 12:00:02 AM'
Delim group = ' - '
Text group = 'Error 2 text line 1<CR><LF>Error 2 text line 2<CR><LF>Error 2 text line 3<CR><LF>Error 2 text line 4<CR><LF>'
Match 3: `2018-10-16 12:00:03 PM - Error 3`
Date group = '2018-10-16 12:00:03 PM'
Delim group = ' - '
Text group = 'Error 3'
The problem
My regex is not working in that 2nd and subsequent lines of text (e.g., 'Error 1 text line 1', 'Error 2 text line 1') are not being captured. I expect them to be captured because I'm using the Multiline option.
How do I modify my regex to capture 2nd and subsequent lines of text?
Current code
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp_RegEx
{
class Program
{
static void Main(string[] args)
{
string text = System.IO.File.ReadAllText(#"C:\Users\bill\Desktop\test_text.txt");
string pattern = #"(?<Date>\d{4}-\d{2}-\d{2}\s{1}\d{2}:\d{2}:\d{2}.*)(?<Delim>\s-\s)(?<Text>.*\n|.*)";
RegexOptions regexOptions = (RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.CultureInvariant | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);
Regex rx = new Regex(pattern, regexOptions);
MatchCollection ms = rx.Matches(text);
// Find matches.
MatchCollection matches = rx.Matches(text);
Console.WriteLine("Input Text\n--------------------\n{0}\n--------------------\n", text);
// Report the number of matches found.
Console.WriteLine("Output ({0} matches found)\n--------------------\n", matches.Count);
int m = 1;
// Report on each match.
foreach (Match match in matches)
{
Console.WriteLine("Match #{0}: ", m++, match.Value);
int g = 1;
GroupCollection groups = match.Groups;
foreach (Group group in groups)
{
Console.WriteLine(" Group #{0} {1}", g++, group.Value);
}
Console.WriteLine();
}
Console.Read();
}
}
}
Current Output
Output from Poul Bak's modification (on the right track, but not quite there yet)
#"(?<Date>\d{4}-\d{2}-\d{2}\s{1}\d{2}:\d{2}:\d{2}(?:\s\w\w)?)(?<Delim>\s-\s)(?<Text>([\s\S](?!\d{4}))*)"
You can use the following regex, modified from yours:
#"(?<Date>\d{4}-\d{2}-\d{2}\s{1}\d{2}:\d{2}:\d{2}(?:\s\w\w)?)(?<Delim>\s-\s)(?<Text>([\s\S](?!\d{4}))*)"
I have changed the 'Date' Group so it accepts 'AM' or 'PM' (otherwise it will only match the first).
Then I have changed the 'Text' Group, so it matches any number of any char (including Newlines) until it looks forward and finds a new date.
Edit:
I don't understand it, when you say 'AM' and 'PM' are not matched, they are part of the 'Date' Group. I assume you want them to be part of the 'Delim' Group, so I have moved the check to that Group.
I have also changed a Group to a non capturing Group.
The new regex:
#"(?<Date>\d{4}-\d{2}-\d{2}\s{1}\d{2}:\d{2}:\d{2})(?<Delim>(?:\s\w\w)?\s-\s)(?<Text>(?:[\s\S](?!\d{4}))*)"
BTW: You should change your code for checking Groups, like this:
foreach (Group group in groups)
{
Console.WriteLine(" Group #{0} {1}", group.Name, group.Value);
}
Then you will see your named Groups by Name and Value. When you have named Groups, there's no need for accessing by index.
Edit 2:
About 'group.Name': I had mistakenly used 'Group' (capitalized), it should be: 'group.Name'.
This is what the regex look like now:
#"(?<Date>\d{4}-\d{2}-\d{2}\s{1}\d{2}:\d{2}:\d{2}(?:\s\w\w)?)(?<Delim>\s-\s)(?<Text>(?:[\s\S](?!\d{4}))*)"
I suggest you set the 'RegexOptions.ExplicitCapture' flag, then you only get named groups.

How to delete first text? (c#)

I want to code
var text = "14. hello my friends we meet 1 test, 2 baby 3 wiki 4 marvel";
string[] split = text.Split('14.', 1, 2, 3, 4);
var needText = split[0].Replace('14.', '');
"1" "2" "3" "4" is static text.
but, "14." is dynamic text.
ex)
var text2 = "1972. google youtube. 1 phone, 2 star 3 tv 4 mouse";
string[] split = text.Split('1972.', 1, 2, 3, 4);
var needText = split[0].Replace('1972.', '');
If you have dynamic separators like this, String.Split is not suitable. Use Regex.Split instead.
You can give a pattern to Regex.Split and it will treat every substring that matches the pattern as a separator.
In this case, you need a pattern like this:
\d+\. |1|2|3|4
| are or operators. \d matches any digit character. + means match between 1 to unlimited times. \. matches the dot literally because . has special meaning in regex.
Usage:
var split = Regex.Split(text, "\\d+\\. |1|2|3|4");
And I think the text you need is at index 1 of split.
Remember to add a using directive to System.Text.RegularExpressions!
If you use IndexOf() with Substring(), you can very easily grab the information you need. If it's any more complex than your examples then use Regex.
var text = "14. hello my friends we meet 1 test, 2 baby 3 wiki 4 marvel";
var strArr = text.Substring(text.IndexOf(' ')).Split('1', '2', '3', '4');

Regex masking of words that contain a digit

Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I"m going to need some greediness and lookahead functionality, but I have zero experience in those areas.
This works for your example:
var result = Regex.Replace(
input,
#"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
I don't think that regex is the best way to solve this problem and that's why I am posting this answer. For so complex situations, building the corresponding regex is too difficult and, what is worse, its clarity and adaptability is much lower than a longer-code approach.
The code below these lines delivers the exact functionality you are after, it is clear enough and can be easily extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
if (word.ToCharArray().Where(x => char.IsDigit(x)).Count() > 0)
{
previousNum = true;
tempOutput = tempOutput + word;
}
else
{
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
previousNum = false;
}
output = output + " " + word;
}
}
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
previousNum = false;
}
Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)
Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now as for C# code, you'll need to check each match to see if there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected)
here is the C# code to do the replace:
var redacted = Regex.Replace(record, #"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String("x",match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));

Regex for matching season and episode

I'm making small app for myself, and I want to find strings which match to a pattern but I could not find the right regular expression.
Stargate.SG-1.S01E08.iNT.DVDRip.XviD-LOCK.avi
That is expamle of string I have and I only want to know if it contains substring of S[NUMBER]E[NUMBER] with each number max 2 digits long.
Can you give me a clue?
Regex
Here is the regex using named groups:
S(?<season>\d{1,2})E(?<episode>\d{1,2})
Usage
Then, you can get named groups (season and episode) like this:
string sample = "Stargate.SG-1.S01E08.iNT.DVDRip.XviD-LOCK.avi";
Regex regex = new Regex(#"S(?<season>\d{1,2})E(?<episode>\d{1,2})");
Match match = regex.Match(sample);
if (match.Success)
{
string season = match.Groups["season"].Value;
string episode = match.Groups["episode"].Value;
Console.WriteLine("Season: " + season + ", Episode: " + episode);
}
else
{
Console.WriteLine("No match!");
}
Explanation of the regex
S // match 'S'
( // start of a capture group
?<season> // name of the capture group: season
\d{1,2} // match 1 to 2 digits
) // end of the capture group
E // match 'E'
( // start of a capture group
?<episode> // name of the capture group: episode
\d{1,2} // match 1 to 2 digits
) // end of the capture group
There's a great online test site here: http://gskinner.com/RegExr/
Using that, here's the regex you'd want:
S\d\dE\d\d
You can do lots of fancy tricks beyond that though!
Take a look at some of the media software like XBMC they all have pretty robust regex filters for tv shows
See here, here
The regex I would put for S[NUMBER1]E[NUMBER2] is
S(\d\d?)E(\d\d?) // (\d\d?) means one or two digit
You can get NUMBER1 by <matchresult>.group(1), NUMBER2 by <matchresult>.group(2).
I would like to propose a little more complex regex. I don't have ". : - _"
because i replace them with space
str_replace(
array('.', ':', '-', '_', '(', ')'), ' ',
This is the capture regex that splits title to title season and episode
(.*)\s(?:s?|se)(\d+)\s?(?:e|x|ep)\s?(\d+)
e.g. Da Vinci's Demons se02ep04 and variants
https://regex101.com/r/UKWzLr/3
The only case that i can't cover is to have interval between season and the number, because the letter s or se is becoming part if the title that does not work for me. Anyhow i haven't seen such a case, but still it is an issue.
Edit:
I managed to get around it with a second line
$title = $matches[1];
$title = preg_replace('/(\ss|\sse)$/i', '', $title);
This way i remove endings on ' s' and ' se' if name is part of series

How does this regex find triangular numbers?

Part of a series of educational regex articles, this is a gentle introduction to the concept of nested references.
The first few triangular numbers are:
1 = 1
3 = 1 + 2
6 = 1 + 2 + 3
10 = 1 + 2 + 3 + 4
15 = 1 + 2 + 3 + 4 + 5
There are many ways to check if a number is triangular. There's this interesting technique that uses regular expressions as follows:
Given n, we first create a string of length n filled with the same character
We then match this string against the pattern ^(\1.|^.)+$
n is triangular if and only if this pattern matches the string
Here are some snippets to show that this works in several languages:
PHP (on ideone.com)
$r = '/^(\1.|^.)+$/';
foreach (range(0,50) as $n) {
if (preg_match($r, str_repeat('o', $n))) {
print("$n ");
}
}
Java (on ideone.com)
for (int n = 0; n <= 50; n++) {
String s = new String(new char[n]);
if (s.matches("(\\1.|^.)+")) {
System.out.print(n + " ");
}
}
C# (on ideone.com)
Regex r = new Regex(#"^(\1.|^.)+$");
for (int n = 0; n <= 50; n++) {
if (r.IsMatch("".PadLeft(n))) {
Console.Write("{0} ", n);
}
}
So this regex seems to work, but can someone explain how?
Similar questions
How to determine if a number is a prime with regex?
Explanation
Here's a schematic breakdown of the pattern:
from beginning…
| …to end
| |
^(\1.|^.)+$
\______/|___match
group 1 one-or-more times
The (…) brackets define capturing group 1, and this group is matched repeatedly with +. This subpattern is anchored with ^ and $ to see if it can match the entire string.
Group 1 tries to match this|that alternates:
\1., that is, what group 1 matched (self reference!), plus one of "any" character,
or ^., that is, just "any" one character at the beginning
Note that in group 1, we have a reference to what group 1 matched! This is a nested/self reference, and is the main idea introduced in this example. Keep in mind that when a capturing group is repeated, generally it only keeps the last capture, so the self reference in this case essentially says:
"Try to match what I matched last time, plus one more. That's what I'll match this time."
Similar to a recursion, there has to be a "base case" with self references. At the first iteration of the +, group 1 had not captured anything yet (which is NOT the same as saying that it starts off with an empty string). Hence the second alternation is introduced, as a way to "initialize" group 1, which is that it's allowed to capture one character when it's at the beginning of the string.
So as it is repeated with +, group 1 first tries to match 1 character, then 2, then 3, then 4, etc. The sum of these numbers is a triangular number.
Further explorations
Note that for simplification, we used strings that consists of the same repeating character as our input. Now that we know how this pattern works, we can see that this pattern can also match strings like "1121231234", "aababc", etc.
Note also that if we find that n is a triangular number, i.e. n = 1 + 2 + … + k, the length of the string captured by group 1 at the end will be k.
Both of these points are shown in the following C# snippet (also seen on ideone.com):
Regex r = new Regex(#"^(\1.|^.)+$");
Console.WriteLine(r.IsMatch("aababc")); // True
Console.WriteLine(r.IsMatch("1121231234")); // True
Console.WriteLine(r.IsMatch("iLoveRegEx")); // False
for (int n = 0; n <= 50; n++) {
Match m = r.Match("".PadLeft(n));
if (m.Success) {
Console.WriteLine("{0} = sum(1..{1})", n, m.Groups[1].Length);
}
}
// 1 = sum(1..1)
// 3 = sum(1..2)
// 6 = sum(1..3)
// 10 = sum(1..4)
// 15 = sum(1..5)
// 21 = sum(1..6)
// 28 = sum(1..7)
// 36 = sum(1..8)
// 45 = sum(1..9)
Flavor notes
Not all flavors support nested references. Always familiarize yourself with the quirks of the flavor that you're working with (and consequently, it almost always helps to provide this information whenever you're asking regex-related questions).
In most flavors, the standard regex matching mechanism tries to see if a pattern can match any part of the input string (possibly, but not necessarily, the entire input). This means that you should remember to always anchor your pattern with ^ and $ whenever necessary.
Java is slightly different in that String.matches, Pattern.matches and Matcher.matches attempt to match a pattern against the entire input string. This is why the anchors can be omitted in the above snippet.
Note that in other contexts, you may need to use \A and \Z anchors instead. For example, in multiline mode, ^ and $ match the beginning and end of each line in the input.
One last thing is that in .NET regex, you CAN actually get all the intermediate captures made by a repeated capturing group. In most flavors, you can't: all intermediate captures are lost and you only get to keep the last.
Related questions
(Java) method matches not work well - with examples on how to do prefix/suffix/infix matching
Is there a regex flavor that allows me to count the number of repetitions matched by * and + (.NET!)
Bonus material: Using regex to find power of twos!!!
With very slight modification, you can use the same techniques presented here to find power of twos.
Here's the basic mathematical property that you want to take advantage of:
1 = 1
2 = (1) + 1
4 = (1+2) + 1
8 = (1+2+4) + 1
16 = (1+2+4+8) + 1
32 = (1+2+4+8+16) + 1
The solution is given below (but do try to solve it yourself first!!!!)
(see on ideone.com in PHP, Java, and C#):
^(\1\1|^.)*.$

Categories

Resources