Removing part of a JSON string using Regex - C#

My JSON string is structured as below:
...
}],"twitter":[{"id": .... blaa"}]}
...
I am trying to remove this part as below:
Regex.Replace(_VarJson, string.Format("{0}.*?{1}", "\"twitter\":[{", "\"}]"), string.Empty)
But nothing is removed. What am I doing wrong?
Thank you in advance.

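In your regex pattern, the [ and { symbols must be escaped with the \ symbol because they are reserved regex symbols ([] delimits a character group and {} a repetition count). Unescaped, [{.*?"}] is parsed as a character group that matches a single character, and the [ that follows "twitter": in your JSON is not in that group, so nothing matches.
So your replacement could be done as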
_VarJson = Regex.Replace(_VarJson,
string.Format("{0}.*?{1}",
"\"twitter\":\\[\\{", "\"\\}\\]"),
string.Empty);
But I strongly agree with the opinion @CommuSoft posted in the comments: it's better to use a JSON library to parse your source JSON, remove whatever you need from the object model, and write the JSON back as text if needed.
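To make both suggestions concrete, here is a minimal sketch rather than a definitive implementation. It assumes .NET 6 or later (for System.Text.Json.Nodes) and a complete, valid JSON document; the class and method names are made up for illustration. Regex.Escape is used in place of hand-written backslashes.

```csharp
using System;
using System.Text.Json.Nodes;
using System.Text.RegularExpressions;

public static class TwitterRemoval
{
    // Regex route: Regex.Escape escapes the metacharacters in the literal
    // delimiters, so [ and { are matched as plain text.
    public static string RemoveWithRegex(string json)
    {
        string pattern = Regex.Escape("\"twitter\":[{") + ".*?" + Regex.Escape("\"}]");
        return Regex.Replace(json, pattern, string.Empty);
    }

    // Parser route (assumes .NET 6+): drop the property from the object model
    // and serialize the result back to text.
    public static string RemoveWithParser(string json)
    {
        JsonObject root = JsonNode.Parse(json)!.AsObject();
        root.Remove("twitter");
        return root.ToJsonString();
    }
}
```

Note that the regex route removes only the matched text, so a comma that preceded "twitter" stays behind and the result may no longer be valid JSON. The parser route does not have that problem.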

Related

How do I see if a string contains another string with quotes in it?

I am trying to see if a large string contains this line of HTML:
<label ng-class="choiceCaptionClass" class="ng-binding choice-caption">Was this information helpful?</label>
As you can see, this snippet has quotations in multiple places and it's causing problems when I do something like this:
Assert.IsTrue(responseContent.Contains("<label ng-class="choiceCaptionClass" class="ng - binding choice - caption">Was this information helpful?</label>"));
I've tried both of these ways of defining the string:
@"<label ng-class=""choiceCaptionClass"" class=""ng - binding choice - caption"">Was this information helpful?</label>"
and
"<label ng-class=\"choiceCaptionClass\" class=\"ng - binding choice - caption\">Was this information helpful?</label>"
But in each case the Contains() method looks for the literal string with either the double quotes or the backslashes. Is there another way I could define this string so I can correctly search for it?

Escaping the double-quotes with backslashes is the proper thing to do.
The reason your search may be failing is that the strings don't actually match. For example, in your version with backslashes, you have spaces around some of the dashes but your HTML string does not.
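As a small illustration of the point above, the sketch below shows that the backslash-escaped literal and the verbatim literal produce the same string, and that Contains only succeeds when the spacing matches exactly. The class and method names are hypothetical.

```csharp
using System;

public static class QuoteLiterals
{
    // Regular literal: each embedded double quote is written as \"
    public const string Escaped =
        "<label ng-class=\"choiceCaptionClass\" class=\"ng-binding choice-caption\">Was this information helpful?</label>";

    // Verbatim literal: each embedded double quote is written as ""
    public const string Verbatim =
        @"<label ng-class=""choiceCaptionClass"" class=""ng-binding choice-caption"">Was this information helpful?</label>";

    // Contains compares characters exactly, so stray spaces around the
    // dashes in the search string would make this return false.
    public static bool ContainsLabel(string responseContent)
    {
        return responseContent.Contains(Escaped);
    }
}
```

The backslashes and doubled quotes exist only in the source code; the string in memory holds plain double quotes either way.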

Try using regular expressions. I made this one for you, but you can test your own in an online regex tester.
var regex = new Regex(@"<label\s+ng-class\s*=\s*""choiceCaptionClass""\s+class\s*=\s*""ng-binding choice-caption""\s*>\s*Was this information helpful\?\s*</label>", RegexOptions.IgnoreCase);
Assert.IsTrue(regex.IsMatch(responseContent));
If this does not work, use a regex tester to figure out which part of the pattern fails to match.
Hope this helps!

c# regex I cannot remove two lines from text because of the spacing

I am trying to remove one of the lines from an XML file, but I cannot, I guess because of the spacing. Can anybody help me with the regex? I am not very experienced with it.
Here are the XML lines in question:
<otv_ek44_Bildirimi>
<otv_ek44_Bildirimi>
I want to remove one of these two lines from the XML regardless of any spacing before, between, or after them. How can I do that?
Here is my poor code:
string s2 = @" <otv_ek44_Bildirimi>
<otv_ek44_Bildirimi>";
fileContents = Regex.Replace(fileContents, s2, "");

If you really want to use Regex, try replacing the spaces in your s2 regular expression with \s+ (which matches one or more whitespace characters: space, tab, newline, etc.).
string s2 = @"\s+<otv_ek44_Bildirimi>
\s+<otv_ek44_Bildirimi>";
I would strongly suggest using string.Replace(old, new) in this case.
Furthermore, I suggest not modifying XML or any other structured data with string manipulation or Regex. You could use an XML parser, or use CsQuery to run jQuery (CSS)-like queries on your XML and manipulate it that way.
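If the regex route is kept, one possible variation is sketched below. It assumes the goal is to collapse two consecutive copies of the tag into one, whatever whitespace separates them; the class and method names are hypothetical. It uses \s* instead of \s+ so that the match also succeeds when there is no whitespace at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DuplicateTagRemover
{
    // Captures the first tag, skips any whitespace (spaces, tabs, line breaks),
    // consumes the second tag, and puts back only the captured first tag.
    public static string RemoveDuplicate(string fileContents)
    {
        return Regex.Replace(
            fileContents,
            @"(<otv_ek44_Bildirimi>)\s*<otv_ek44_Bildirimi>",
            "$1");
    }
}
```

Whitespace before the first tag and after the second one is left untouched, so the indentation of the surviving line is preserved.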

Strip out content between and including h2 tag

I am trying to strip the content from between the h2 tags in a string using a Regex in C#:
<h2>content needs removing</h2> other content...
I have the following Regex, which according to the Regex buddy software I used to test it, should work, but it doesn't:
myString = Regex.Replace(myString, @"<h[0-9]>.*</h[0-9]>", String.Empty);
I have another Regex that is run after this to remove all other HTML tags, it is called in the same way and works fine. Can anyone help me out with why this isn't working?

Don't use Regular Expressions.
HTML is not a Regular Language, thus it can't be parsed correctly with a Regular Expression.
For example, your Regex would match:
<h2>sample</h1>
which is not valid HTML. With several or nested headings it also gives unexpected results, because .* is greedy and matches everything up to the last closing h[0-9] tag in your input HTML string.
You can use XmlDocument (HTML is not XML, but that would be sufficient for what you're trying to do) or you can use Html Agility Pack.

Try this code:
String sourcestring = "<h2>content needs removing</h2> other content...";
String matchpattern = @"\s?<h[0-9]>[^<]+</h[0-9]>\s?";
String replacementpattern = @"";
MessageBox.Show(Regex.Replace(sourcestring,matchpattern,replacementpattern));
[^<]+ is safer than .+ because it stops collecting as soon as it sees a <.

This works fine for me:
string myString = "<h2>content needs removing</h2> other content...";
Console.WriteLine(myString);
myString = Regex.Replace(myString, "<h[0-9]>.*</h[0-9]>", string.Empty);
Console.WriteLine(myString);
Displays:
<h2>content needs removing</h2> other content...
other content...
As expected.
If your problem is that your real case has several different heading tags, then you have an issue with the greedy * quantifier. It will create the longest match that it can. For example, if you have:
<h2>content needs removing</h2> other content...<h3>some more headings</h3> and some other stuff
You will match everything from <h2> to </h3> and replace it. To fix this, you need to use a lazy quantifier:
myString = Regex.Replace(myString, "<h[0-9]>.*?</h[0-9]>", string.Empty);
Will leave you with:
other content... and some other stuff
Note, however, that this will not fix nested <h> tags. As @fardjad said, using Regex for HTML isn't generally a good idea.
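The difference between the greedy and lazy quantifiers can be checked with a short sketch like the one below. The class and method names are hypothetical. The lazy variant also adds two things that are not in the answers above: a backreference (\1) so the closing tag must have the same level as the opening one, and RegexOptions.Singleline so that . also matches line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingStripper
{
    // Greedy: .* runs to the LAST closing heading tag in the input.
    public static string StripGreedy(string html)
    {
        return Regex.Replace(html, "<h[0-9]>.*</h[0-9]>", string.Empty);
    }

    // Lazy: .*? stops at the first closing tag whose digit matches the
    // opening tag (\1), so <h2>...</h1> is not treated as a heading.
    public static string StripLazy(string html)
    {
        return Regex.Replace(html, @"<h([0-9])>.*?</h\1>", string.Empty, RegexOptions.Singleline);
    }
}
```

As noted above, neither variant copes with nested headings; for arbitrary HTML a parser is the more robust choice.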

Regex for a string

It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples provided above, I need to match the string no matter how many </div> tags occur. If any other text occurs between </div> and <br>, such as <div>abc</div></div></div>DEF</div></div><br> or <div>abc</div></div></div></div></div>DEF<br>, then the regex should not match.
Thanks in advance.

Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are not tags in the abc part (or anything that has a < symbol).
You might want to use start and end of string anchors (^<div>([^<]+)(?:<\/div>)*<br>$) if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
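As a quick check of the anchored pattern against the samples from the question, here is a minimal C# sketch (C# is an assumption here; the question does not name a language, and the class name is made up). The forward slash needs no escaping in .NET, so <\/div> is written as </div>.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivMatcher
{
    // ^ and $ force the whole string to follow the pattern:
    // <div>, text without '<', any number of </div>, then <br>.
    private static readonly Regex Pattern =
        new Regex(@"^<div>([^<]+)(?:</div>)*<br>$");

    public static bool IsMatch(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

With this pattern both samples match, while the two DEF variants from the question do not, because DEF is neither a </div> tag nor the final <br>.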

You need to use a real parser. Things like infinitely nested tags can't be handled via regex.

You could also include a named group in the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(@"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));

NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.

I think this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I don't include ^ and $ at the beginning and end of my regex because we cannot be sure that your sample will always be on a single line.

Best way to escape javascript string? (json?)

Using C# .NET, I am parsing some data with partial HTML/JavaScript inside it (I don't know who made that decision), and I need to pull out a link. The link looks like this:
http:\/\/fc0.site.net\/fs50\/i\/2009\/name.jpg
It came from this, which I assume is JavaScript and looks like JSON:
"name":{"id":"589","src":"http:\/\/fc0.site.net\/fs50\/i\/2009\/name.jpg"}
Anyway, how should I escape the first link so I get
http://fc0.site.net/fs50/i/2009/name.jpg
In this case I could just replace '\' with '', since links contain neither \ nor ", but I am a fan of knowing the right solution and doing things properly. So how might I escape this? After looking at that link for a minute, I wondered: is that even valid? Does JavaScript or JSON escape / with \? It doesn't seem like it should.

In your case:
"name":{"id":"589","src":"http://fc0.site.net/fs50/i/2009/name.jpg"}
"\/" is a valid escape sequence in JSON. However, it is not required that / be escaped. You may escape it if you need to. The reason JSON explicitly allows escaping the slash is that HTML does not allow a string in a <script> element to contain "...

Odd, it doesn’t look like any JavaScript/JSON escaping you’d expect. You can have forward slashes in JavaScript strings just fine.

Why don't you try a regex on the escaped slashes to replace them in the C# code...
String url = @"http:\/\/fc0.site.net\/fs50\/i\/2009\/name.jpg";
String pattern = @"\\/";
String cleanUrl = Regex.Replace(url, pattern, "/");
Hope it helps!

Actually you want to unescape the string. Answered in this question.
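One way to unescape the value without hand-written replacements is to let a JSON parser do it. The sketch below assumes System.Text.Json is available (.NET Core 3.0 or later) and that the fragment from the question has been wrapped in { } so that it forms a complete JSON object; the class and method names are hypothetical.

```csharp
using System;
using System.Text.Json;

public static class SrcExtractor
{
    // Parses the JSON text and returns the "src" value of the "name" object.
    // GetString() returns the unescaped value, so \/ comes back as /.
    public static string GetSrc(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.GetProperty("name").GetProperty("src").GetString();
        }
    }
}
```

For example, passing @"{""name"":{""id"":""589"",""src"":""http:\/\/fc0.site.net\/fs50\/i\/2009\/name.jpg""}}" yields the plain URL. Any other JSON escapes in the value are handled the same way.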
