Is there a way to trim leading characters using String.Format()? - c#

I need to know if there is a way to use String.Format to remove leading characters from a string. I have a limitation in some existing code: I can only pass in a string and a format string for it.
So can you do something like
String.Format("Test output: {0:#}","001")
and produce the output
"Test output: 1"
I think the answer is 'No' but I wanted to make sure.
EDIT: To clarify, the format string will be put in a configuration file and the string to be formatted is a value coming out of a database. I can't execute any code on it. Has to be through the format string.

You could do it on the arg you are passing
String.Format("Test output: {0:#}", "001".TrimStart('0'))
Alternatively you could probably do a find with replace using a regular expression on the resulting string.
Another alternative is to define and pass in your own formatter using a custom implementation of IFormatProvider. I am not sure whether this is allowed based on your last edit.
However, based on the restrictions listed, there is no way to do it with just the format string input.
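The custom-formatter alternative mentioned above could look roughly like the sketch below. `TrimLeadingFormatter` and its convention of `trim` followed by one character (for example `trim0`) are hypothetical names invented for this illustration, not part of .NET. It still requires passing a provider to String.Format, so it only helps if the calling code can be changed.

```csharp
using System;
using System.Globalization;

// Hypothetical formatter: a format of "trim" followed by one character
// (e.g. "trim0") strips that leading character from string arguments.
public sealed class TrimLeadingFormatter : IFormatProvider, ICustomFormatter
{
    // String.Format asks the provider for an ICustomFormatter; return ourselves.
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        string text = arg as string;
        if (text != null && format != null && format.Length == 5
            && format.StartsWith("trim", StringComparison.Ordinal))
        {
            return text.TrimStart(format[4]);
        }

        // Anything else falls back to the argument's normal formatting.
        IFormattable formattable = arg as IFormattable;
        if (formattable != null)
        {
            return formattable.ToString(format, CultureInfo.CurrentCulture);
        }
        return arg == null ? string.Empty : arg.ToString();
    }
}
```

With this in place, String.Format(new TrimLeadingFormatter(), "Test output: {0:trim0}", "001") yields "Test output: 1".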

Related

replace items in a string while also confirming format

I have a string input in the format of "string#int" and I want to convert it to "string-int" for web friendliness reasons for an API I am using.
To do this I could obviously just replace the single character # with a - using string.replace, but ideally I'd like to do a check that the input (which is user provided by the way) is in the correct format (string#int) while or before converting to the web friendly version with a "-" instead. Essentially I'm wondering if there is a method in C# that I could use to check that this input is in the correct format and convert it to the required result format.
There is obviously no built-in way, since the format you describe is quite specific. Also, a string can contain anything, including a hash sign (#), so you will need to narrow down what counts as valid.
You could use regular expressions to check if the string is in the correct format. This would be a possible expression (anchored so that the whole input has to match):
^[A-Za-z ]+#[0-9]+$
Which matches for:
this is a string#123
There's nothing built in, but you could do the following:
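var parts = input.Split('#');
if (parts.Length != 2) throw new FormatException("Expected exactly one '#'.");
int result;
// TryParse returns false instead of throwing when the second part is not an integer.
if (!int.TryParse(parts[1], out result)) throw new FormatException("Expected an integer after '#'.");
string output = String.Join("-", parts);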
This takes the input and splits it on the "#" character. If the result isn't two parts then the string is invalid. You then check that the second part is an integer - if the TryParse fails it's not valid. The last step is to rejoin the two parts, but this time with a - as the separator.
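The regular-expression check and the conversion can also be combined in one step. The following is a minimal sketch, assuming the "string" part may only contain ASCII letters and spaces as in the expression suggested above; `TagConverter` and `TryConvert` are hypothetical names chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagConverter
{
    // Letters/spaces, one '#', then digits. ^ and \z force the whole input to match.
    private static readonly Regex Pattern = new Regex(@"^([A-Za-z ]+)#([0-9]+)\z");

    // Returns false for malformed input instead of throwing.
    public static bool TryConvert(string input, out string output)
    {
        output = null;
        if (input == null) return false;

        Match match = Pattern.Match(input);
        if (!match.Success) return false;

        // Rebuild the value with '-' in place of '#'.
        output = match.Groups[1].Value + "-" + match.Groups[2].Value;
        return true;
    }
}
```

Note that this only checks that the second part consists of digits; it does not verify that the number fits in an int, which the TryParse approach does.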

unicode to human readable string c# .net

This is probably a very basic question, but really appreciate if you could help me with this:
I want to convert a string that contains characters like \u000d\u000a\u000d\u000 to a human readable string. However, I don't want to use the .Replace method, since there may be many more Unicode escapes than the ones I tell the software to check and replace.
string text = "Test \u000d\u000a\u000d\u000aTesting with new line. \u000d\u000a\u000d\u000aone more new line";
I receive this string as a JSON object from my server.
Do you even need that?
For example, the following code will print abc which is the actual decoded value:
var unicodeString = "\u0061\u0062\u0063";
Console.WriteLine(unicodeString);
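If the value really does reach your code with the escapes still spelled out (a backslash, a u and four hex digits as separate characters, which can happen when the JSON was escaped twice), one option is Regex.Unescape. This is a sketch under that assumption; `UnicodeEscapes.Decode` is a hypothetical wrapper name. Regex.Unescape also interprets other backslash escapes such as \n and \t, and it can throw on malformed escapes, so it is only appropriate when the input is known to contain nothing but such escapes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Turns literal \uXXXX sequences (and other regex-style escapes)
    // into the characters they stand for.
    public static string Decode(string escaped)
    {
        return Regex.Unescape(escaped);
    }
}
```

For example, UnicodeEscapes.Decode(@"Test \u000d\u000aTesting") returns the text with a real carriage return and line feed between the two words.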

Surprising int.ToString output

I have been working on a project, and found an interesting problem:
2.ToString("TE"+"000"); // output = TE000
2.ToString("TR"+"000"); // output = TR002
I also have tried with several strings other than "TE" but all have the same correct output.
Out of curiosity, I am wondering how come this could have happened?
Simply based on Microsoft's documentation, Custom Numeric Format Strings, your strings "TE000" and "TR000" are both custom format strings, but clearly they are parsed differently.
2.ToString("TE000") is just a bug in the formatter; it's going down a buggy path because of the unescaped "E". So it's unexpectedly assuming the whole thing is a literal.
2.ToString("TR000") is being interpreted as an implied "TR" literal plus 3 zero-filled digits for an integer value; therefore, you get "TR002".
If you truly want TE and TR verbatim, the expressions 2.ToString("\"TE\"000") and 2.ToString("\"TR\"000") will accomplish that for you by specifying TE and TR as explicit literals, instead of letting the formatter guess if they are valid format specifiers (and getting it wrong).
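As a small illustration of the explicit-literal approach, the sketch below shows two ways to keep the E from being read as an exponent specifier: quoting the prefix, or escaping only the E with a backslash. `LiteralPrefixDemo` is a hypothetical name, and the invariant culture is passed only to keep the example independent of machine settings.

```csharp
using System;
using System.Globalization;

public static class LiteralPrefixDemo
{
    // Single quotes mark TE as literal text inside a custom numeric format.
    public static string Quoted(int value)
    {
        return value.ToString("'TE'000", CultureInfo.InvariantCulture);
    }

    // A backslash escapes just the E so it is not read as an exponent specifier.
    public static string Escaped(int value)
    {
        return value.ToString(@"T\E000", CultureInfo.InvariantCulture);
    }
}
```

Both LiteralPrefixDemo.Quoted(2) and LiteralPrefixDemo.Escaped(2) return "TE002".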
The ToString needs to PARSE the format string and understand what to do with it.
Let's take a look at the following examples:
2.ToString("TE000"); //output TE000
2.ToString("E000"); //output 2E+000
2.ToString("0TE000"); //output 2TE000
2.ToString("T"); //throws exception
2.ToString("TT"); //output TT
This shows that if the ToString parser can understand at least part of the format, it will assume that the rest is just extra characters to print with it. If the format is invalid for the given number (like when you use a DateTime string format on a number), it will throw an exception. If it can not make sense of the format, it will return the format string itself as the result.
Don't put the prefix inside the numeric format string; keep it outside the format item instead:
int i = 2;
String.Format("TE{0:D3}", i); // "TE002"; D3 pads decimal digits (X3 would format the value as hexadecimal)
See Custom Numeric Format Strings. The E means the exponent part of the scientific notation of the number. Since 2 is 2E000 in exponential notation, that might explain it.

Convert string to char

I get a string from another class that must be converted to a char. It usually contains only one char, and that's not a problem. But I receive control chars like '\\n' or '\\t'.
Are there standard methods to convert these to a newline or tab char, or do I need to parse them myself?
edit:
Sorry, the parser ate one slash. I receive '\\t'
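I assume you mean that the class sending you the data gives you a string like "\n" (a backslash followed by n). If the string holds exactly one character, you can convert it with:
Char.Parse(returnedChar)
Char.Parse throws a FormatException for longer strings, so a two-character escape like this has to be mapped to the control character yourself. To go the other way, from a char to a string, call
returnedChar.ToString()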
New line:
string escapedNewline = @"\\n";
string cleanupNewLine = escapedNewline.Replace(@"\\n", Environment.NewLine);
OR
string cleanupNewLine = escapedNewline.Replace(@"\\n", "\n");
Tab:
string escapedTab = @"\\t";
string cleanupTab = escapedTab.Replace(@"\\t", "\t");
Note the lack of the verbatim string literal in the replacement (i.e. I did not use @"\t", because that would not represent a tab).
Alternatively you could consider Regular Expressions if you need to replace a range of different string patterns.
You should probably write a utility function to encapsulate the common behaviour above for all the possible Escape Sequences
Then you'd write some Unit Tests to cover each of the cases you can think of.
As you encounter any bugs you add more unit tests to cover those cases.
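Such a utility function might look like the sketch below. It is a minimal illustration, not a complete implementation: `EscapeSequences.Unescape` is a hypothetical name, only a handful of escapes are handled, and the choice to leave unknown escapes unchanged is an assumption.

```csharp
using System;
using System.Text;

public static class EscapeSequences
{
    // Converts two-character sequences such as backslash + 'n'
    // into the control character they stand for.
    public static string Unescape(string input)
    {
        var result = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            // Ordinary characters, and a trailing lone backslash, are copied as-is.
            if (c != '\\' || i == input.Length - 1)
            {
                result.Append(c);
                continue;
            }

            char next = input[++i];
            switch (next)
            {
                case 'n': result.Append('\n'); break;
                case 't': result.Append('\t'); break;
                case 'r': result.Append('\r'); break;
                case '0': result.Append('\0'); break;
                case '\\': result.Append('\\'); break;
                default:
                    // Unknown escape: keep both characters unchanged.
                    result.Append('\\').Append(next);
                    break;
            }
        }
        return result.ToString();
    }
}
```

When a single char is needed, EscapeSequences.Unescape(received)[0] gives it once the result is known to be one character long.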
UPDATE
You could represent a tab in the XML with a special character sequence:
see this article
This article applies to SQL Server but may well be relevant to C# also?
To be absolutely sure, you could try generating a string with a tab in it and putting it into some XML (programmatically) and using XmlSerializer to serialize that to a file to see what the output is, then you can be sure that this will faithfully 'round-trip' the string with the tab still in it.
how about using string.ToCharArray()
You can then add the appropriate logic to process whatever was in the string.
char.Parse(string) converts a one-character string to a char, and the reverse is
char.ToString();
100% solved

How to split a user-generated string which may contain the delimiter?

I'd like to String.Split() the following string using a comma as the delimiter:
John,Smith,123 Main Street,212-555-1212
The above content is entered by a user. If they enter a comma in their address, the resulting string would cause problems to String.Split() since you now have 5 fields instead of 4:
John,Smith,123 Main Street, Apt 101,212-555-1212
I can use String.Replace() on all user input to replace commas with something else, and then use String.Replace() again to convert things back to commas:
value = value.Replace(",", "*");
However, this can still be fooled if a user happens to use the placeholder delimiter "*" in their input. Then you'd end up with extra commas and no asterisks in the result.
I see solutions online for dealing with escaped delimiters, but I haven't found a solution for this seemingly common situation. What am I missing?
EDIT: This is called delimiter collision.
This is a common scenario — you have some arbitrary string values that you would like to compose into a structure, which is itself a string, but without allowing the values to interfere with the delimiters in structure around them.
You have several options:
Input restriction: If it is acceptable for your scenario, the simplest solution is to restrict the use of delimiters in the values. In your specific case, this means disallow commas.
Encoding: If input restriction is not appropriate, the next easiest option would be to encode the entire input value. Choose an encoding that does not have delimiters in its range of possible outputs (e.g. Base64 does not feature commas in its encoded output)
Escaping delimiters: A slightly more complex option is to come up with a convention for escaping delimiters. If you're working with something mainstream like CSV it is likely that the problem of escaping is already solved, and there's a standard library that you can use. If not, then it will take some thought to come up with a complete escaping system, and implement it.
If you have the flexibility to not use CSV for your data representation this would open up a host of other options. (e.g. Consider the way in which parameterised SQL queries sidestep the complexity of input escaping by storing the parameter values separately from the query string.)
This may not be an option for you, but would it not be easier to use a very uncommon character, say a pipe |, as your delimiter and not allow this character to be entered in the first place?
If this is CSV, the address should be surrounded by quotes. CSV parsers are widely available that take this into account when parsing the text.
John,Smith,"123 Main Street, Apt. 6",212-555-1212
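To show what quote-aware parsing involves, here is a minimal hand-written sketch for a single line. `CsvLine.Split` is a hypothetical helper, and it assumes the common convention that a doubled quote inside a quoted field stands for one literal quote. An established CSV library is usually the better choice for real data, since it will also cover cases such as line breaks inside quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one line on commas, ignoring commas inside double quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote = one literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // commas in here are plain text
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field has no trailing comma
        return fields;
    }
}
```

Applied to the quoted line above, this returns four fields, with the third being the full address including its comma.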
One foolproof solution would be to convert the user input to base64 and then delimit with a comma. It will mean that you will have to convert back after parsing.
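A rough sketch of the Base64 approach follows. `Base64Fields` is a hypothetical helper, and UTF-8 is an assumed choice of text encoding. Because the Base64 alphabet contains no comma, the joined record can be split safely.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Base64Fields
{
    // Encodes each field as Base64, then joins the encoded fields with commas.
    public static string Join(params string[] fields)
    {
        return string.Join(",",
            fields.Select(f => Convert.ToBase64String(Encoding.UTF8.GetBytes(f))));
    }

    // Splits on commas, then decodes each part back to the original text.
    public static string[] Split(string record)
    {
        return record.Split(',')
            .Select(part => Encoding.UTF8.GetString(Convert.FromBase64String(part)))
            .ToArray();
    }
}
```

The trade-off is that the stored record is no longer human readable and is roughly a third larger than the raw text.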
You could try putting quotes, or some other begin and end delimiters, around each of the user inputs, and ignore any special character between a set of quotes.
This really comes down to a situation of cleansing user inputs. You should only allow desired characters in the user input and reject/strip invalid inputs from the user. This way you could use your asterisk delimiter.
The best solution is to define the valid characters and reject invalid ones somehow, then use an invalid character (which will not appear in the input, since it is "banned") as your delimiter.
Don't allow the user to enter the character that you are using as a delimiter. I personally feel this is the best way.
Funny solution (works if the address is the only field with a comma):
Split the string by comma. The first two pieces will be the first name and last name; the last piece is the telephone number. Take those away, then join the remaining pieces back together with commas, and that is the address ;)
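That idea can be sketched as follows, assuming the record always has the layout first name, last name, address, phone and that only the address may contain commas. `AddressRecord.Parse` is a hypothetical name.

```csharp
using System;

public static class AddressRecord
{
    // Returns { first name, last name, address, phone }.
    public static string[] Parse(string record)
    {
        string[] parts = record.Split(',');
        if (parts.Length < 4)
            throw new FormatException("Expected at least 4 comma-separated fields.");

        // Everything between the first two pieces and the last one is the address.
        string address = string.Join(",", parts, 2, parts.Length - 3);
        return new[] { parts[0], parts[1], address, parts[parts.Length - 1] };
    }
}
```

For "John,Smith,123 Main Street, Apt 101,212-555-1212" the third element comes back as "123 Main Street, Apt 101".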
In a sense, the user is already "escaping" the comma with the space afterward.
So, try this:
string[] values = Regex.Split(value, ",(?![ ])");
The user can still break this if they don't put a space, and there is a more foolproof method (using the standard CSV method of quoting values that contain commas), but this will do the trick for the use case you've presented.
One more solution: provide an "Address 2" field, which is where things like apartment numbers would traditionally go. Users can still break it if they are lazy, though what they'll actually break is the fields after Address 2.
Politely remind your users that properly-formed street addresses in the United States and Canada should NEVER contain any punctuation whatsoever, perhaps?
The process of automatically converting corrupted data into useful data is non-trivial without heuristic logic. You could try to outsource the parsing by calling a third-party address-formatting library to apply the USPS formatting rules.
Even USPS requires the user to perform much of the work, by having components of the address entered into distinct fields on their address "canonicalizer" page (http://zip4.usps.com/zip4/welcome.jsp).
