String custom formatter .NET - c#

I have a problem with custom string formatting.
I want to do something like this:
var str = "SOME_ORIGINAL_FIELD_NAME";
var format1 = "XX_X_X";
var format2 = "X_XXX";
var strFormat1 = String.Format(str, format1); // SOMEORIGINAL_FIELD_NAME
var strFormat2 = String.Format(str, format2); // SOME_ORIGINALFIELDNAME
Can anyone point me in the right direction? Maybe I should look at IFormatProvider and ICustomFormatter.

Sure, you just have to:
split the source string into its components,
use {i} placeholders instead of X, and
reverse the parameters to String.Format (format is first, data follows).
Example code (fiddle):
var components = "SOME_ORIGINAL_FIELD_NAME".Split('_');
var format1 = "{0}{1}_{2}_{3}";
var format2 = "{0}_{1}{2}{3}";
var strFormat1 = String.Format(format1, components); // SOMEORIGINAL_FIELD_NAME
var strFormat2 = String.Format(format2, components); // SOME_ORIGINALFIELDNAME

Replace the X's in the format string with successive placeholders, split the input string into a string array, and then apply string.Format():
// requires: using System; using System.Text.RegularExpressions;
public static string FormatSplitAndJoin(string input, string formatTemplate, string delimiter = "_", string placeholder = "X")
{
// split "a_b_c" into ["a", "b", "c"]; this Split overload also exists on .NET Framework
var parts = input.Split(new[] { delimiter }, StringSplitOptions.None);
// turn "X_X_X" into "{0}_{1}_{2}"; Regex.Escape makes the placeholder match literally
var index = 0;
var formatString = Regex.Replace(formatTemplate, Regex.Escape(placeholder), m => string.Format("{{{0}}}", index++));
// validate input length
if (index > parts.Length)
throw new ArgumentException(string.Format("input string resulted in fewer arguments than expected, {0} placeholders found", index));
// apply string.Format(); literal { or } in the template would need to be doubled first
return string.Format(formatString, parts);
}
Now you can do:
var str = "SOME_ORIGINAL_FIELD_NAME";
var format1 = "XX_X_X";
var format2 = "X_XXX";
var strFormat1 = FormatSplitAndJoin(str, format1); // SOMEORIGINAL_FIELD_NAME
var strFormat2 = FormatSplitAndJoin(str, format2); // SOME_ORIGINALFIELDNAME
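If you'd rather avoid regex and format strings altogether, one minimal sketch is to walk the template character by character: each X takes the next part of the split input, and every other character is copied as-is. FormatByTemplate is a hypothetical name, and the code assumes C# 9+ top-level statements. Because it never calls string.Format, a template containing literal { or } characters doesn't cause a FormatException.

```csharp
using System;
using System.Text;

// Hypothetical helper: each placeholder char in the template consumes the next
// delimiter-separated part of the input; every other char is copied literally.
string FormatByTemplate(string input, string template, char delimiter = '_', char placeholder = 'X')
{
    string[] parts = input.Split(delimiter);
    var sb = new StringBuilder();
    int next = 0;
    foreach (char c in template)
    {
        if (c == placeholder)
        {
            if (next >= parts.Length)
                throw new ArgumentException(
                    $"Template has more '{placeholder}' placeholders than the input has parts ({parts.Length}).",
                    nameof(template));
            sb.Append(parts[next++]);
        }
        else
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Console.WriteLine(FormatByTemplate("SOME_ORIGINAL_FIELD_NAME", "XX_X_X")); // SOMEORIGINAL_FIELD_NAME
Console.WriteLine(FormatByTemplate("SOME_ORIGINAL_FIELD_NAME", "X_XXX"));  // SOME_ORIGINALFIELDNAME
```

Like FormatSplitAndJoin, this silently ignores input parts that have no matching X.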

Related

How to perform multiple Regex replacements in sequence from a list of unique items cleanly in C#

I'm trying to find a cleaner way of performing multiple sequential replacements on a single string where each replacement has a unique pattern and string replacement.
For example, if I have three pattern/substitution pairs:
1. /(?<!\\)\\n/, "\n"
2. /(\\)(?=[\;\:\,])/, ""
3. /(\\{2})/, "\\"
I want to apply regex replacement 1 on the original string, then apply 2 on the output of 1, and so on and so forth.
The following console program does exactly what I want, but it has a lot of repetition, and I'm looking for a cleaner way to do the same thing.
SanitizeString
static public string SanitizeString(string param)
{
string retval = param;
//first replacement
Regex SanitizePattern = new Regex(@"([\\\;\:\,])");
retval = SanitizePattern.Replace(retval, @"\$1");
//second replacement
SanitizePattern = new Regex(@"\r\n?|\n");
retval = SanitizePattern.Replace(retval, @"\n");
return retval;
}
ParseCommands
static public string ParseCommands(string param)
{
string retval = param;
//first replacement
Regex SanitizePattern = new Regex(@"(?<!\\)\\n");
retval = SanitizePattern.Replace(retval, System.Environment.NewLine);
//second replacement
SanitizePattern = new Regex(@"(\\)(?=[\;\:\,])");
retval = SanitizePattern.Replace(retval, "");
//third replacement
SanitizePattern = new Regex(@"(\\{2})");
retval = SanitizePattern.Replace(retval, @"\");
return retval;
}
Main
using System;
using System.IO;
using System.Text.RegularExpressions;
...
static void Main(string[] args)
{
//read text that contains user input
string sampleText = File.ReadAllText(@"c:\sample.txt");
//sanitize input with certain rules
sampleText = SanitizeString(sampleText);
File.WriteAllText(@"c:\sanitized.txt", sampleText);
//parses escaped characters back into the original text
sampleText = ParseCommands(sampleText);
File.WriteAllText(@"c:\parsed_back.txt", sampleText);
}
Don't mind the file operations. I just used them as a quick way to visualize the actual output. In my program I'm going to use something different.
Here's one way:
var replacements = new List<(Regex regex, string replacement)>()
{
(new Regex(@"(?<!\\)\\n"), System.Environment.NewLine),
(new Regex(@"(\\)(?=[\;\:\,])"), ""),
(new Regex(@"(\\{2})"), @"\"),
};
(Ideally, cache that list in a static readonly field.)
Then:
string retval = param;
foreach (var (regex, replacement) in replacements)
{
retval = regex.Replace(retval, replacement);
}
Or you could go down the LINQ route:
string retval = replacements
.Aggregate(param, (str, x) => x.regex.Replace(str, x.replacement));
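Putting the pieces together, here is a self-contained sketch (C# 9+ top-level statements; the names sanitizeRules, parseRules and ApplyAll are illustrative) that keeps both rule lists from the question in arrays and runs each one with Aggregate. RegexOptions.Compiled is my own optional addition, worthwhile only when the same patterns are reused many times. In a class, the two arrays would be the static readonly fields suggested above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Each rule is applied to the output of the previous one, in array order.
// In a real class these would be static readonly fields, so each Regex is built once.
(Regex regex, string replacement)[] sanitizeRules =
{
    (new Regex(@"([\\\;\:\,])", RegexOptions.Compiled), @"\$1"), // escape \ ; : , with a backslash
    (new Regex(@"\r\n?|\n", RegexOptions.Compiled), @"\n"),      // line breaks become a literal \n
};

(Regex regex, string replacement)[] parseRules =
{
    (new Regex(@"(?<!\\)\\n", RegexOptions.Compiled), Environment.NewLine), // literal \n back to a line break
    (new Regex(@"(\\)(?=[\;\:\,])", RegexOptions.Compiled), ""),            // drop the escaping backslash
    (new Regex(@"(\\{2})", RegexOptions.Compiled), @"\"),                   // a doubled backslash back to one
};

string ApplyAll(string input, (Regex regex, string replacement)[] rules) =>
    rules.Aggregate(input, (current, rule) => rule.regex.Replace(current, rule.replacement));

string sanitized = ApplyAll("a;b\nc", sanitizeRules);
string parsed = ApplyAll(sanitized, parseRules);
Console.WriteLine(sanitized); // a\;b\nc
```

Note that the round trip turns the original "\n" into Environment.NewLine, which is "\r\n" on Windows.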

Objects to be comma separated and with double quote

I have an object array:
object[] keys
I need to turn this array into a comma-separated string, which I did like this:
var newKeys = string.Join(",", keys);
My problem is that I want these values to be double-quoted.
For example:
"value1","value2","value3"
var quoted = "\"" + string.Join("\",\"", keys) + "\"";
To include a double quote in a string, you escape it with a backslash, so "\"" is a string consisting of a single double quote character, and "\",\"" is a string containing a double quote, a comma, and another double quote.
Give this a try:
var keys = new object[] { "test1", "hello", "world", null, "", "oops"};
var csv = string.Join(",", keys.Select(k => string.Format("\"{0}\"", k)));
Because you have an object[] array, string.Format can deal with null values as well as types other than string; a null key becomes an empty quoted value (""). On .NET 3.5, where string.Join only accepts a string[], add .ToArray() after the Select call.
When the array is empty, an empty string is returned.
If performance is key, you can always use a StringBuilder to concatenate everything.
Here's a fiddle to see it in action, but the main part can be summarized as:
// these look like snails, but they are actually pretty fast
using @_____ = System.Collections.Generic.IEnumerable<object>;
using @______ = System.Func<object, object>;
using @_______ = System.Text.StringBuilder;
public static string GetCsv(object[] input)
{
// use a string builder to make things faster
var @__ = new StringBuilder();
// the rest should be self-explanatory
Func<@_____, @______, @_____>
@____ = (_6,
_2) => _6.Select(_2);
Func<@_____, object> @_3 = _6
=> _6.FirstOrDefault();
Func<@_____, @_____> @_4 = _8
=> _8.Skip(input.Length - 1);
Action<@_______, object> @_ = (_9,
_2) => _9.Append(_2);
Action<@_______>
@___ = _7 =>
{ if (_7.Length > 0) @_(
@__, ",");
}; var @snail =
@____(input, (@_0 =>
{ @___(@__); @_(@__, @"""");
@_(@__, @_0); @_(@__, @"""");
return @__; }));
var @linq = @_4(@snail);
var @void = @_3(@linq);
// get the result
return #__.ToString();
}
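For a StringBuilder version that is easier to read than the snail code, here's a minimal sketch (C# 9+ top-level statements; ToQuotedCsv is a hypothetical name). It goes one step beyond the question by doubling any double quote inside a value, which is the RFC 4180 CSV convention; drop the Replace call if your values can never contain quotes. Null values become an empty quoted field, as with the string.Format answer.

```csharp
using System;
using System.Text;

// Quotes every value and joins them with commas: "value1","value2",...
// Assumption: embedded " characters are escaped by doubling them (RFC 4180 style).
string ToQuotedCsv(object[] values)
{
    var sb = new StringBuilder();
    for (int i = 0; i < values.Length; i++)
    {
        if (i > 0)
            sb.Append(',');
        // null (or a null ToString() result) becomes an empty quoted field
        string text = values[i]?.ToString() ?? string.Empty;
        sb.Append('"').Append(text.Replace("\"", "\"\"")).Append('"');
    }
    return sb.ToString();
}

Console.WriteLine(ToQuotedCsv(new object[] { "value1", "value2", "value3" }));
// "value1","value2","value3"
```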

Regex without escaping Characters - Problems

I've found some possible solutions for my problem, which is quite simple.
I have a string that looks like this:
"\r\nContent-Disposition: form-data; name=\"ctl00$cphMainContent$grid$ctl03$ucPicture$ctl00\""
My goal is to break it down into a Dictionary of values, like:
Key = "name", value = "ctl..."
My approach was to split it by "\r\n" and then by the equals sign or the colon.
This worked fine until a tester uploaded a file whose name uses every allowed character, which made the string look like this:
"\r\nContent-Disposition: form-data; name=\"ctl00_cphMainContent_grid_ctl03_ucPicture_btnUpload$fileUpload\"; filename=\"C:\\Users\\matthias.mueller\\Desktop\\- ie+![]{}_-´;,.$¨##ç %&()=~^`'.jpg\"\r\nContent-Type: image/jpeg"
Of course, the simple split doesn't work anymore, since it now splits the filename as well.
I worked around this by extracting the "filename=" value, escaping the characters I split on, and then splitting with a regex.
Now comes my problem: I found two regex samples that could do the job for the equals sign, the semicolon, and the colon. One is:
[^\\]=
The other one I found was:
(?<!\\\\)=
The problem is that the first one doesn't just split: it also removes the character before the equals sign, so my key in the Dictionary is "nam" instead of "name".
The second one works fine in that respect, but it still splits on the escaped equals sign in the filename.
Does my approach even work? Is there a better solution? And why does the first regex cut off a character?
Edit: To avoid confusion, my escaped string looks like this:
"Content-Disposition: form-data; name=\"ctl00_cphMainContent_grid_ctl03_ucPicture_btnUpload$fileUpload\"; filename=\"C\:\Users\matthias.mueller\Desktop\- ie+![]{}_-´\;,.$¨##ç %&()\=~^`'.jpg\""
So basically I want to split on the equals sign EXCEPT where it is escaped. By the way, the string here shows only one \, but there are actually two.
Edit 2: OK, it seems I have a working solution, but it's really ugly:
Dictionary<string, string> ParseHeader(byte[] bytes, int pos)
{
Dictionary<string, string> items;
string header;
string[] headerLines;
int start;
int end;
string input = _encoding.GetString(bytes, pos, bytes.Length - pos);
start = input.IndexOf("\r\n", 0);
if (start < 0) return null;
end = input.IndexOf("\r\n\r\n", start);
if (end < 0) return null;
WriteBytes(false, bytes, pos, end + 4 - 0); // Write the header to the form content
header = input.Substring(start, end - start);
items = new Dictionary<string, string>();
headerLines = Regex.Split(header, "\r\n");
Regex regLineParts = new Regex(@"(?<!\\\\);");
Regex regColon = new Regex(@"(?<!\\\\):");
Regex regEqualSign = new Regex(@"(?<!\\\\)=");
foreach (string hl in headerLines)
{
string workString = hl;
//Escape the Semicolon in filename
if (hl.Contains("filename"))
{
String orig = hl.Substring(hl.IndexOf("filename=\"") + 10);
orig = orig.Substring(0, orig.IndexOf('"'));
string toReplace = orig;
toReplace = toReplace.Replace(toReplace, toReplace.Replace(";", @"\\;"));
toReplace = toReplace.Replace(toReplace, toReplace.Replace(":", @"\\:"));
toReplace = toReplace.Replace(toReplace, toReplace.Replace("=", @"\\="));
workString = hl.Replace(orig, toReplace);
}
string[] lineParts = regLineParts.Split(workString);
for (int i = 0; i < lineParts.Length; i++)
{
string[] p;
if (i == 0)
p = regColon.Split(lineParts[i]);
else
p = regEqualSign.Split(lineParts[i]);
if (p.Length == 2)
{
string orig = p[0];
orig = orig.Replace(@"\\;", ";");
orig = orig.Replace(@"\\:", ":");
orig = orig.Replace(@"\\=", "=");
p[0] = orig;
orig = p[1];
orig = orig.Replace(@"\\;", ";");
orig = orig.Replace(@"\\:", ":");
orig = orig.Replace(@"\\=", "=");
p[1] = orig;
items.Add(p[0].Trim(), p[1].Trim());
}
}
}
return items;
}
Needs some further testing.
I had a go at writing a parser for you. It handles literal strings, like "here is a string", as the values in name-value pairs. I've also written a few tests; the last one shows an '=' character inside a literal string. It also handles double quotes (") inside literal strings when they are escaped as \". I'm not sure that's the right convention, but you could change it.
A quick explanation: I first find anything that looks like a literal string and replace it with a value like PLACEHOLDER8230498234098230498. This means the whole thing is now plain name-value pairs, e.g.
key="value"
becomes
key=PLACEHOLDER8230498234098230498
The original string value is stored off in the literalStrings dictionary for later.
So now we split on semicolons (to get key=value strings) and then on equals, to get the proper key/value pairs.
Then I substitute the placeholder values back in before returning the result.
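using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Text.RegularExpressions;
public class HttpHeaderParser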
{
public NameValueCollection Parse(string header)
{
var result = new NameValueCollection();
// 'register' any string values;
var stringLiteralRx = new Regex(@"""(?<content>(\\""|[^\""])+?)""", RegexOptions.IgnorePatternWhitespace);
var equalsRx = new Regex("=", RegexOptions.IgnorePatternWhitespace);
var semiRx = new Regex(";", RegexOptions.IgnorePatternWhitespace);
Dictionary<string, string> literalStrings = new Dictionary<string, string>();
var cleanedHeader = stringLiteralRx.Replace(header, m =>
{
var replacement = "PLACEHOLDER" + Guid.NewGuid().ToString("N");
var stringLiteral = m.Groups["content"].Value.Replace("\\\"", "\"");
literalStrings.Add(replacement, stringLiteral);
return replacement;
});
// now it's safe to split on semicolons to get name-value pairs
var nameValuePairs = semiRx.Split(cleanedHeader);
foreach(var nameValuePair in nameValuePairs)
{
var nameAndValuePieces = equalsRx.Split(nameValuePair);
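var name = nameAndValuePieces[0].Trim();
// a piece without '=' (e.g. "form-data") gets a null value instead of throwing
var value = nameAndValuePieces.Length > 1 ? nameAndValuePieces[1].Trim() : null;
string replacementValue;
if (value != null && literalStrings.TryGetValue(value, out replacementValue))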
{
value = replacementValue;
}
result.Add(name, value);
}
return result;
}
}
There's every chance there are some proper bugs in it.
Here are some unit tests you should incorporate, too:
[TestMethod]
public void TestMethod1()
{
var tests = new[] {
new { input=@"foo=bar; baz=quux", expected = @"foo|bar^baz|quux"},
new { input=@"foo=bar;baz=""quux""", expected = @"foo|bar^baz|quux"},
new { input=@"foo=""bar"";baz=""quux""", expected = @"foo|bar^baz|quux"},
new { input=@"foo=""b,a,r"";baz=""quux""", expected = @"foo|b,a,r^baz|quux"},
new { input=@"foo=""b;r"";baz=""quux""", expected = @"foo|b;r^baz|quux"},
new { input=@"foo=""b\""r"";baz=""quux""", expected = @"foo|b""r^baz|quux"},
new { input=@"foo=""b=r"";baz=""quux""", expected = @"foo|b=r^baz|quux"},
};
var parser = new HttpHeaderParser();
foreach(var test in tests)
{
var actual = parser.Parse(test.input);
var actualAsString = String.Join("^", actual.Keys.Cast<string>().Select(k => string.Format("{0}|{1}", k, actual[k])));
Assert.AreEqual(test.expected, actualAsString);
}
}
It looks to me like you'll need a more solid parser for this than a regex split. According to this page, the name/value pairs can be either 'raw':
x=1
or quoted:
x="foo bar baz"
So you'll need a solution that not only splits on the equals sign but also ignores any equals signs inside quotes:
x="y=z"
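As a minimal sketch of such a parser (not a full implementation of the header grammar), the hypothetical helper SplitOutsideQuotes below splits on a separator only when it is outside double quotes, so semicolons and equals signs inside the filename survive. It assumes quoted values contain no embedded double quotes, and it treats backslashes as ordinary characters because the Windows path in the question contains them. The code uses C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Splits text on separator, ignoring separators that appear inside "double quotes".
// Assumption: quoted values never contain an embedded double quote.
List<string> SplitOutsideQuotes(string text, char separator)
{
    var parts = new List<string>();
    var current = new StringBuilder();
    bool inQuotes = false;
    foreach (char c in text)
    {
        if (c == '"')
            inQuotes = !inQuotes;          // toggle state; the quote itself is kept
        if (c == separator && !inQuotes)
        {
            parts.Add(current.ToString()); // a separator outside quotes ends the part
            current.Clear();
        }
        else
        {
            current.Append(c);
        }
    }
    parts.Add(current.ToString());
    return parts;
}

string header = "Content-Disposition: form-data; name=\"upload\"; filename=\"C:\\temp\\a;b=c.jpg\"";
var items = new Dictionary<string, string>();
foreach (string pair in SplitOutsideQuotes(header, ';'))
{
    List<string> kv = SplitOutsideQuotes(pair, '=');
    if (kv.Count < 2)
        continue;                          // e.g. "Content-Disposition: form-data" has no '='
    string key = kv[0].Trim();
    // re-join in case an unquoted value itself contained '=', then strip the quotes
    string value = string.Join("=", kv.GetRange(1, kv.Count - 1)).Trim().Trim('"');
    items[key] = value;
}

foreach (var item in items)
    Console.WriteLine($"{item.Key} = {item.Value}");
```

No escaping step is needed, because separators are only ever recognized outside quotes.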
It might be that there is a better, more managed way for you to access this info. If you are using a classic ASP.NET WebForms FileUpload control, you can access the filename through the control's properties, like:
FileUpload1.HasFile
FileUpload1.FileName
If you're using MVC, you can use the HttpPostedFileBase class as a parameter to the action method. See this answer
[HttpPost]
public ActionResult Index(HttpPostedFileBase file)
{
// Verify that the user selected a file
if (file != null && file.ContentLength > 0)
{
// extract only the filename
var fileName = Path.GetFileName(file.FileName);
// store the file inside ~/App_Data/uploads folder
var path = Path.Combine(Server.MapPath("~/App_Data/uploads"), fileName);
file.SaveAs(path);
}
// redirect back to the index action to show the form once again
return RedirectToAction("Index");
}
This:
(?<!\\\\)=
matches an = that is not preceded by \\.
It should be:
(?<!\\)=
(Make sure you use @ (verbatim) strings for the regex, to avoid confusion.)
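As for why the first regex cuts off a character: [^\\] is an ordinary character class, so the character it matches is part of the match, and Regex.Split removes it together with the =. (?<!\\) is a zero-width lookbehind that only checks the preceding character without consuming it. Here is a small demonstration using a made-up input with one escaped = (C# 9+ top-level statements):

```csharp
using System;
using System.Text.RegularExpressions;

string header = @"filename=a\=b";   // the second '=' is escaped with one backslash

// [^\\]= consumes the character before '=', so the 'e' disappears from the key.
string[] consuming = Regex.Split(header, @"[^\\]=");
// (?<!\\)= is zero-width: it only checks that '=' is not preceded by a backslash.
string[] lookbehind = Regex.Split(header, @"(?<!\\)=");

Console.WriteLine(string.Join(" | ", consuming));   // filenam | a\=b
Console.WriteLine(string.Join(" | ", lookbehind));  // filename | a\=b
```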

Remove specific symbol from string

I have several strings that start and end with braces, like {somestring}. I want to remove the delimiters so that only somestring is shown. I can't do anything that counts the characters, because I don't always know the length of the string.
Maybe this will help. Here is the code; somewhere in here I want to remove the delimiters.
private static MvcHtmlString RenderDropDownList(FieldModel model)
{
ISerializer serializer = new SerializeJSon();
var value = "";
var tb1 = new TagBuilder("select");
tb1.MergeAttribute("id", model.QuestionId);
tb1.MergeAttribute("name", model.QuestionId);
tb1.MergeAttributes(GetHtmlAttributes(model.HtmlAttributes));
tb1.AddCssClass("form-field");
var sb = new StringBuilder();
MatchCollection matches = RegexHelper.GetBetweenDelimiter(model.FieldValues, "{", "}");
foreach (Match match in matches)
{
var o = match; //Solution: var o = match.ToString();
var tb2 = new TagBuilder("option");
//Solution: string newString = o.Trim(new[] { '{', '}' });
tb2.SetInnerText(o.ToString()); //Solution: tb2.SetInnerText(newString);
sb.Append(tb2.ToString(TagRenderMode.Normal) + "\n");
}
tb1.InnerHtml = sb.ToString();
return new MvcHtmlString(tb1.ToString(TagRenderMode.Normal));
}
string newString = originalString.Trim(new[] {'{', '}'});
Can you use Replace?
somestring = somestring.Replace("{", "").Replace("}", "");
Alternatively, you can use StartsWith and EndsWith which will only remove from the beginning and the end of the string, for example:
string foo = "{something}";
if (foo.StartsWith("{"))
{
foo = foo.Remove(0, 1);
}
if (foo.EndsWith("}"))
{
foo = foo.Remove(foo.Length-1, 1);
}
You could use Replace, e.g.:
string someString = "{somestring}";
string someOtherString = someString.Replace("{","").Replace("}","");
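Another option, sketched under the assumption that FieldValues looks like {Red}{Green}{Dark Blue} (the question's RegexHelper.GetBetweenDelimiter isn't shown, so a plain Regex.Matches call stands in for it): put a capture group inside the braces and read match.Groups[1].Value, which never contains the delimiters in the first place. C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input; in the question this would be model.FieldValues.
string fieldValues = "{Red}{Green}{Dark Blue}";

// \{ and \} match the literal braces; ([^{}]*) captures only the text between them.
var options = new List<string>();
foreach (Match match in Regex.Matches(fieldValues, @"\{([^{}]*)\}"))
{
    options.Add(match.Groups[1].Value);
}

Console.WriteLine(string.Join(", ", options)); // Red, Green, Dark Blue
```

The answers above behave differently on unusual input: Trim strips any number of braces, but only at the two ends; Replace removes every brace anywhere in the string; and the StartsWith/EndsWith version removes at most one brace from each end.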

Extract data from a big string

First of all, I'm using the function below to read data from a PDF file.
public string ReadPdfFile(string fileName)
{
StringBuilder text = new StringBuilder();
if (File.Exists(fileName))
{
PdfReader pdfReader = new PdfReader(fileName);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
}
pdfReader.Close(); // close once, after all pages have been read
}
return text.ToString();
}
As you can see, all the data is saved in a single string. The string looks like this:
label1: data1;
label2: data2;
label3: data3;
.............
labeln: datan;
My question: how can I get the data from the string based on its labels?
I've tried this, but I'm getting stuck:
if (text.Contains("label1"))
{
extracted_data1 = text.Substring(text.IndexOf(':'), text.IndexOf(';') - text.IndexOf(':') - 1);
}
if (text.Contains("label2"))
{
extracted_data2 = text.Substring(text.IndexOf("label2") + text.IndexOf(':'), text.IndexOf(';') - text.IndexOf(':') - 1);
}
Have a look at the String.Split() function; it tokenises a string based on an array of supplied characters.
e.g.
string[] lines = text.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries);
Now loop through that array and split each entry again:
foreach(string line in lines) {
string[] pair = line.Split(new[] {':'});
string key = pair[0].Trim();
string val = pair[1].Trim();
....
}
Obviously check for empty lines, and use .Trim() where needed...
[EDIT]
Or, alternatively, as a LINQ query:
var result = from line in text.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries)
let tokens = line.Split(new[] {':'}, 2) // split on the first ':' only
where tokens.Length == 2 // skip leftovers such as a trailing newline
select tokens;
Dictionary<string, string> labelData =
result.ToDictionary(tokens => tokens[0].Trim(), tokens => tokens[1].Trim());
It's pretty hard-coded, but you could use something like this (trimmed to your needs):
string input = "label1: data1;"; // Example of your input
string data = input.Split(':')[1].Replace(";", "").Trim();
You can do this by using a Dictionary<string, string>:
Dictionary<string, string> dicLabelData = new Dictionary<string, string>();
List<string> listStrSplit = strBig.Split(';').ToList(); // strBig is the big string you want to parse
foreach (string strSplit in listStrSplit)
{
string[] labelParts = strSplit.Split(':');
if (labelParts.Length > 1)
{
dicLabelData.Add(labelParts[0].Trim(), labelParts[1].Trim()); // Key = label, Value = data
}
}
dicLabelData now contains the data for every label.
I think you can use a regex to solve this problem: just split the string on line breaks and use a regex to get the right number.
You can use a regex to do it:
Regex rx = new Regex("label([0-9]+): ([^;]*);");
var matches = rx.Matches("label1: a string; label2: another string; label100: a third string;");
foreach (Match match in matches) {
var id = match.Groups[1].ToString();
var data = match.Groups[2].ToString();
var idAsNumber = int.Parse(id);
// Here you use an array or a dictionary to save id/data
}
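If the labels aren't always of the form labelN, a sketch that combines the regex and Dictionary ideas above can capture any label text before the colon. It assumes every entry ends with ; and that labels are unique (ToDictionary throws on a duplicate key). ParseLabels is a hypothetical name, and the code uses C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: each entry has the form "<label>: <data>;" and labels are unique.
// Labels may be any text without ':' or ';'; surrounding whitespace and newlines are trimmed.
Dictionary<string, string> ParseLabels(string text) =>
    Regex.Matches(text, @"(?<label>[^:;]+):(?<data>[^;]*);")
         .Cast<Match>()
         .ToDictionary(m => m.Groups["label"].Value.Trim(),
                       m => m.Groups["data"].Value.Trim());

var text = "label1: data1;\nlabel2: data2;\nlabel3: data3;";
var data = ParseLabels(text);
Console.WriteLine(data["label2"]); // data2
```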
