C# Regex Replace - c#

I have a file that is supposed to be \n\n delimited, but of course it's not. Some of the lines contain spaces after the \n\n. How do I find and remove all spaces after a \n\n that starts a new line, but only those that come before any other character?
Sample:
\n\nData,Mo re,Data
\n\n Some,Li st,Of
\n\n\nOther,St uff
\n\n\n\n This is another
Desired Output
\n\nData,Mo re,Data
\n\nSome,Li st,Of
\n\nOther,St uff
\n\nThis is another
Regex is probably the answer, but I'm still learning regex. Here's more or less what I've come up with Regex.Replace(input,"^(\n{2,}\s*)", "\n\n") but it doesn't work.
Edit: I should note that I pre-convert from various different line break encodings to \n before this code is needed.

The backslash character needs escaping. Try:
Regex.Replace(input,"^(\n{2,}\\s*)", "\n\n")
Also, consider changing \\s* to \\s+ so that you don't unnecessarily replace line starts that are already valid.

string test = "\n\nData,Mo re,Data \r\n\n\n Some,Li st,Of \r\n\n\n\nOther,St uff \r\n\n\n\n\n This is another \r\n";
string pattern = "^\n{2,}\\s*";
string result = Regex.Replace(test, pattern, "\n\n", RegexOptions.Multiline);
First, you need the Multiline option. Second, how does your data really look? Notice that where you have a visible cr-lf, I put in \r\n. I did so because you said that it is a \n\n at the beginning of a line that delimits the data. The \n is confusing to work with, so be sure of your data.
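If the records really are separated by nothing but runs of \n (an assumption; the test string above assumes each record ends with \r\n instead), the ^ anchor cannot match in the middle of such a run, even with Multiline. A minimal sketch of a variation that drops the anchor and only strips spaces and tabs is shown here. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterCleaner
{
    // Collapses every run of two or more '\n', plus any spaces or tabs that
    // directly follow it, into exactly "\n\n". There is no '^' anchor, so
    // RegexOptions.Multiline is not needed.
    public static string Normalize(string input)
    {
        return Regex.Replace(input, @"\n{2,}[ \t]*", "\n\n");
    }

    public static void Main()
    {
        // Sample data from the question, assuming plain '\n' separators.
        string input = "\n\nData,Mo re,Data\n\n Some,Li st,Of\n\n\nOther,St uff\n\n\n\n This is another";
        string result = Normalize(input);

        // Show the newlines visibly so the result is easy to compare.
        Console.WriteLine(result.Replace("\n", "\\n"));
        // prints \n\nData,Mo re,Data\n\nSome,Li st,Of\n\nOther,St uff\n\nThis is another
    }
}
```

Spaces inside a record (such as "Mo re") are untouched because the pattern only consumes whitespace that directly follows a run of newlines.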

Related

Replacing strings with changing data

I've been looking at encryption and long story short, I need to remove an xml tag from a string.
Each string needs a global rule for replacing the string, and I've looked at regex for this, but it doesn't make sense.
Here are some examples
<BitStrength>384</BitStrength>
<BitStrength>1024</BitStrength>
<BitStrength>12300</BitStrength>
I need to replace the whole string and the number inside as well, with nothing.
I've tried things like:
string.replace("<BitStrength>12300</BitStrength>","");
But the issue is the length and characters of the number, and a match is never found.
Has anyone got a solution? Maybe regex is the way to go?
PS. Preferably a solution in C#.
EDIT: I'm looking for a solution that replaces the whole string in not only this kind of example but strings in general.
<BitStrength>4633</BitStrength>
<BitStrength>336</BitStrength>
!!SomeConstantData!!5437!!EndConstant!!
I would like 2 eggs today.
I would like 17 eggs today.
I would like 258367 eggs today.
Now if I put string.Replace("I would like ","").Replace(" eggs today.","") I would be left with the number 258367, because I didn't cover this in my statement. I'm looking for a solution to delete this data. It can be any value.
In my particular example I'm looking to replace <BitStrength>384</BitStrength> in <BitStrength>384</BitStrength><RSAKeyValue><Modulus>code</Modulus><Exponent>code</Exponent></RSAKeyValue>
The Issue I face is that the number between the bitstrength tags can be anything between 386 and 16384, and I need to remove the entire bitstrength string.
string input = "<BitStrength>384</BitStrength>";
string pattern = @"<BitStrength>\d*</BitStrength>";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Console.WriteLine("Original String: {0}", input);
Console.WriteLine("Replacement String: {0}", result);
returns
Original String: <BitStrength>384</BitStrength>
Replacement String:
You don't show what you have tried, and I wonder if the failure is because you are not making the pattern a verbatim string literal by putting @ before it?
This example works:
Regex.Replace(@"Blah<BitStrength>12300</BitStrength>Blah",
@"(\<BitStrength\>12300\</BitStrength\>)",
string.Empty)
and returns
BlahBlah
If the actual number does not matter use this pattern:
(\<BitStrength\>\d+\</BitStrength\>)
This works:
var source = string.Join(Environment.NewLine,
"<BitStrength>384</BitStrength>",
"<BitStrength>1024</BitStrength>",
"<BitStrength>12300</BitStrength>");
var result = source.Replace("<BitStrength>12300</BitStrength>", string.Empty);
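For the case in the question, where the number between the tags varies, the \d+ pattern suggested above can be applied to the full key string. This is only a sketch: the class and method names are hypothetical, and it assumes the tag always contains digits and nothing else.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BitStrengthStripper
{
    // Removes every <BitStrength>NNN</BitStrength> element, whatever the number is.
    public static string Strip(string xml)
    {
        return Regex.Replace(xml, @"<BitStrength>\d+</BitStrength>", string.Empty);
    }

    public static void Main()
    {
        // Input taken from the question.
        string input = "<BitStrength>384</BitStrength><RSAKeyValue><Modulus>code</Modulus><Exponent>code</Exponent></RSAKeyValue>";
        Console.WriteLine(Strip(input));
        // prints <RSAKeyValue><Modulus>code</Modulus><Exponent>code</Exponent></RSAKeyValue>
    }
}
```

A plain string.Replace only works when the exact number is known in advance, which is why the regex version is needed here.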

Regular Expression to match a quoted string embedded in another quoted string

I have a data source that is comma-delimited, and quote-qualified. A CSV. However, the data source provider sometimes does some wonky things. I've compensated for all but one of them (we read in the file line-by-line, then write it back out after cleansing), and I'm looking to solve the last remaining problem when my regex-fu is pretty weak.
Matching a Quoted String inside of another Quoted String
So here is our example string...
"foobar", 356, "Lieu-dit "chez Métral", Chilly, FR", "-1,000.09", 467, "barfoo", 1,345,456,235,231, "935.18"
I am looking to match the substring "chez Métral", in order to replace it with the substring chez Métral. Ideally, in as few lines of code as possible. The final goal is to write the line back out (or return it as a method return value) with the replacement already done.
So our example string would end up as...
"foobar", 356, "Lieu-dit chez Métral, Chilly, FR", "-1,000.09", 467, "barfoo", 1,345,456,235,231, "935.18"
I know I could define a pattern such as (?<quotedstring>\"\w+[^,]+\") to match quoted strings, but my regex-fu is weak (database developer, almost never use C#), so I'm not sure how to match another quoted string within the named group quotedstring.
FYI: For those noticing the large integer that is formatted with commas but not quote-qualified, that's already handled. As is the random use of row-delimiters (sometimes CR, sometimes LF). As other problems...
Match with this regex:
(?<!,\s*|^)"([^",]*)"
and replace each match with $1.
In a C# verbatim string, where " is escaped as "", the pattern becomes:
(?<!,\s*|^)""([^"",]*)""
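A minimal sketch of how that pattern might be used on the sample line from the question. The class and method names are invented for illustration, and it assumes an embedded quoted string never contains a comma and never sits directly after a field separator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteFixer
{
    // A quote that is not at the start of the line and not preceded by
    // ",<optional whitespace>" opens an embedded quoted string. The captured
    // group is the text between the embedded quotes.
    private static readonly Regex Embedded =
        new Regex(@"(?<!,\s*|^)""([^"",]*)""");

    public static string UnquoteEmbedded(string line)
    {
        return Embedded.Replace(line, "$1");
    }

    public static void Main()
    {
        string line = "\"foobar\", 356, \"Lieu-dit \"chez Métral\", Chilly, FR\", \"-1,000.09\", 467, \"barfoo\", 1,345,456,235,231, \"935.18\"";
        Console.WriteLine(UnquoteEmbedded(line));
        // prints "foobar", 356, "Lieu-dit chez Métral, Chilly, FR", "-1,000.09", 467, "barfoo", 1,345,456,235,231, "935.18"
    }
}
```

The variable-length lookbehind (,\s*) is supported by the .NET regex engine but not by every other engine, so the pattern may not port unchanged.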

Splitting a string in C#

Let's say I have this string:
"param1,r:1234,p:myparameters=1,2,3"
...and I would like to split it into:
param1
r:1234
p:myparameters=1,2,3
I've used the split function and of course it splits it at every comma. Is there a way to do this using regex or will I have to write my own split function?
Personally, I would try something like this:
,(?=[^,]+:.*?)
Basically, use a positive look-ahead to find a comma followed by a "key-value" pair, defined here as a key, a colon, and more information (data, which may include other commas). This should disqualify the commas between the numbers, too.
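As a rough sketch, the look-ahead pattern can be passed to Regex.Split. The class and method names below are hypothetical, and the approach assumes every item after the first starts with a key followed by a colon.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyValueSplitter
{
    // Splits only on commas that are followed by "key:", i.e. some
    // non-comma characters and then a colon. The look-ahead consumes
    // nothing, so the key stays in the next item.
    public static string[] Split(string input)
    {
        return Regex.Split(input, @",(?=[^,]+:.*?)");
    }

    public static void Main()
    {
        foreach (string part in Split("param1,r:1234,p:myparameters=1,2,3"))
        {
            Console.WriteLine(part);
        }
        // prints:
        // param1
        // r:1234
        // p:myparameters=1,2,3
    }
}
```

A value that itself contains a colon after a comma (for example p:x=1,a:2) would be split as well, so this only works when the data cannot look like a key.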
You can use ; to separate the values, which makes the string easier to work with.
Since , is used both as the separator and inside the values, it is difficult to split on.
You have
string str = "param1,r:1234,p:myparameters=1,2,3";
Recommended to use
string str = "param1;r:1234;p:myparameters=1,2,3";
which can be split as
var strArray = str.Split(';');
strArray[0]; // contains param1
strArray[1]; // r:1234
strArray[2]; // p:myparameters=1,2,3
I'm not sure how you would write a split that knew which commas to split on there, honestly.
Unless it's a fixed number each time, in which case just use the String.Split overload that takes an int specifying the maximum number of substrings to return.
If you're going to have comma-delimited data that's not always a fixed number of items and it could have literal commas in the data itself, they really should be quoted. If you can control the input in any way, you should encourage that, and use an actual CSV parser instead of String.Split
That depends. You can't parse it with regex (or anything else) unless you can identify a consistent rule separating one group from another. Based on your sample, I can't clearly identify such a rule (though I have some guesses). How does the system know that p:myparameters=1,2,3 is a single item? For example, if there were another item after it, what would be the difference between that and the 1,2,3? Figure that out and you'll be pretty close to a solution.
If you're able to change the format of the input string, why not decide on a consistent delimiter between your groups? ; would be a good choice. Use an input like param1;r:1234;p:myparameters=1,2,3 and there will be no ambiguity where the groups are, plus you can just split on ; and you won't need regex.
The simplest approach would be changing your delimiter from "," to something like "|". Then you can split on "|" no problem. However if you can't change the delimiting character then maybe you could encode the sections in a fashion similar to CSV.
CSV files have the same issue... the standard there is to put double quotes "" around columns.
For example, your string would be "param1","r:1234","p:myparameters=1,2,3".
Then you could use Microsoft.VisualBasic.FileIO.TextFieldParser to split/parse. You can use it from C# even though it's in the VisualBasic namespace.
TextFieldParser
Do you mean this: string[] str = System.Text.RegularExpressions.Regex.Split("param1,r:1234,p:myparameters=1,2,3", @"\,");

C# Reading a file and writing out replacing string

What I have is a C# windows app that reads a bunch of SQL tables and creates a bunch of queries based on the results. What I'm having a small issue with is the final "," on my query
This is what I have
ColumnX,
from
I need to read the entire file, write out exactly what is in the file and just replace the last , before the from with nothing.
I tried .Replace(@",\n\nfrom", @"\n\nfrom") but it's not finding it. Any help is appreciated.
Example:
ColumnX,
from
Result:
ColumnX
from
The line break is most likely the two character combination CR + LF:
.Replace(",\r\n\r\nfrom","\r\n\r\nfrom")
If you want the line break for the current system, you can use the Environment.NewLine property:
.Replace(","+Environment.NewLine+Environment.NewLine+"from",Environment.NewLine+Environment.NewLine+"from")
Note that the @ in front of a string means that it doesn't use backslash escape sequences, but on the other hand it can contain line breaks, so you could write it in this somewhat confusing way:
str = str.Replace(@",
from", @"
from");
There are two solutions that you can try:
Remove the @ symbol, as that means it's going to look for the literal characters \n rather than a newline.
Try .Replace("," + Environment.NewLine + Environment.NewLine + "from", Environment.NewLine + Environment.NewLine + "from")
Instead of replacing or removing the comma when you read the file, it would probably be preferable to remove it before the file is written. That way you only have to bother with the logic once. As you are building your column list, just remove the last comma after the list is created. Hopefully you are in a position where you have control over that process.
If you can assume you always want to remove the last occurrence of the comma you can use the string function LastIndexOf to find the index for the last comma and use Remove from there.
myString = myString.Remove(myString.LastIndexOf(","), 1);
What about using Regex? Does that handle different forms of linefeed better?
var result = Regex.Replace(input, @",(\n*)from", "$1from");
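The \n* pattern above only covers LF line breaks. One possible variation, sketched here with hypothetical names, uses \s* so that CR LF, LF, and any indentation before from are all preserved while the comma is dropped. It assumes the keyword is written in lowercase, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingCommaRemover
{
    // Drops a comma that is followed only by whitespace (including \r and \n)
    // and then the word "from". The whitespace is captured and written back.
    public static string Remove(string sql)
    {
        return Regex.Replace(sql, @",(\s*)from\b", "${1}from");
    }

    public static void Main()
    {
        string windows = "ColumnX,\r\n\r\nfrom";
        string unix = "ColumnX,\n\nfrom";

        Console.WriteLine(Remove(windows) == "ColumnX\r\n\r\nfrom"); // True
        Console.WriteLine(Remove(unix) == "ColumnX\n\nfrom");        // True
    }
}
```

Commas between columns are left alone because they are followed by a column name rather than by from.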

.NET string IndexOf unexpected result

A string variable str contains the following somewhere inside it: se\">
I'm trying to find the beginning of it using:
str.IndexOf("se\\\">")
which returns -1
Why isn't it finding the substring?
Note: due to editing the snippet showed 5x \ for a while, the original had 3 in a row.
Your code is in fact searching for 'se\\">'. When searching for strings including backslashes I usually find it easier to use verbatim strings:
str.IndexOf(@"se\"">")
In this case you also have a quote in the search string, so there is still some escaping, but I personally find it easier to read.
Update: my answer was based on the edit that introduced extra slashes in the parameter to the IndexOf call. Based on current version, I would place my bet on str simply not containing the expected character sequence.
Update 2:
Based on the comments on this answer, there seems to be some confusion regarding the role of the '\' character in the strings. When you inspect a string in the Visual Studio debugger, it is displayed with escape characters.
So, if you have a text box and type 'c:\' in it, inspecting the Text property in the debugger will show 'c:\\'. An extra backslash is added for escaping purposes. The actual string content is still 'c:\' (which can be verified by checking the Length property of the string; it will be 3, not 4).
If we take the following string (taken from the comment below)
" '<em
class=\"correct_response\">a
night light</em><br
/><br /><table
width=\"100%\"><tr><td
class=\"right\">Ingrid</td></tr></table>')"
...the \" sequences are simply escaped quotation marks; the backslashes are not part of the string content. So, you are in fact looking for 'se">', not 'se\">'. Either of these will work:
str.IndexOf(@"se"">"); // verbatim string; escape quotation mark by doubling it
str.IndexOf("se\">"); // regular string; escape quotation mark using backslash
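To make the difference concrete, here is a small sketch using a made-up sample string (not the one from the question). It shows that the two literals above describe the same four characters, while the three-backslash version searches for a backslash that the string does not contain.

```csharp
using System;

public static class EscapeDemo
{
    public static void Main()
    {
        // Hypothetical sample: the backslashes here only escape the quotes in
        // source code; they are not stored in the string.
        string str = "<td class=\"response\">";

        int regular = str.IndexOf("se\">", StringComparison.Ordinal);      // looks for se">
        int verbatim = str.IndexOf(@"se"">", StringComparison.Ordinal);    // same four characters
        int withSlash = str.IndexOf("se\\\">", StringComparison.Ordinal);  // looks for se\">

        Console.WriteLine(regular);        // 17
        Console.WriteLine(verbatim);       // 17
        Console.WriteLine(withSlash);      // -1
        Console.WriteLine("se\">".Length); // 4
    }
}
```

Checking Length, as suggested earlier for 'c:\', is a quick way to confirm what a literal really contains.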
This works:
string str = "<case\\\">";
int i = str.IndexOf("se\\\">"); // i = 3
Maybe you're not correctly escaping one of the two strings?
EDIT there's an extra couple of \ in the string you are searching for.
Maybe the str variable does not actually contain the backslash.
It may be just that when you mouse over the variable while debugging, the debugger tooltip will show the escape character.
e.g. If you put a breakpoint after this assignment
string str = "123\"456";
the tooltip will show 123\"456 and not 123"456.
However if you click on the visualize icon, you will get the correct string 123"456
Following code:
public static void RunSnippet()
{
string s = File.ReadAllText (@"D:\txt.txt");
Console.WriteLine (s);
int i = s.IndexOf("se\\\">");
Console.WriteLine (i);
}
Gives following output:
some text before se\"> some text after
17
Seems to work for me...
TextBox2.Text = TextBox1.Text.IndexOf("se\"">")
seems to work in VB.
Double quotes within a verbatim string need to be written as "". Also consider using verbatim strings. An example would be
var source = @"abdefghise\"">jklmon";
Console.WriteLine(source.IndexOf(@"se\"">")); // returns 8
If you are looking for se\">
then
str.IndexOf(@"se\"">")
is less error-prone. Note the double "" and single \
Edit, after the comment: it seems the string may itself contain escaping, in which case the \" in se\"> was an escaped quote, so the literal text is simply se"> and the call to use is IndexOf("se\">")
