Is there a class similar to HttpUtility to encode the content of a custom header? Ideally I would like to keep the content readable.
You can use the HttpEncoder.HeaderNameValueEncode Method in the .NET Framework 4.0 and above.
For previous versions of the .NET Framework, you can roll your own encoder, using the logic noted on the HttpEncoder.HeaderNameValueEncode reference page:
All characters whose Unicode value is less than ASCII character 32,
except ASCII character 9, are URL-encoded into a format of %NN where
the N characters represent hexadecimal values.
ASCII character 9 (the horizontal tab character) is not URL-encoded.
ASCII character 127 is encoded as %7F.
All other characters are not encoded.
Update:
As OliverBock point out the HttpEncoder.HeaderNameValueEncode Method is protected and internal. I went to open source Mono project and found the mono's implementation
void HeaderNameValueEncode (string headerName, string headerValue, out string encodedHeaderName, out string encodedHeaderValue)
{
if (String.IsNullOrEmpty (headerName))
encodedHeaderName = headerName;
else
encodedHeaderName = EncodeHeaderString (headerName);
if (String.IsNullOrEmpty (headerValue))
encodedHeaderValue = headerValue;
else
encodedHeaderValue = EncodeHeaderString (headerValue);
}
static void StringBuilderAppend (string s, ref StringBuilder sb)
{
if (sb == null)
sb = new StringBuilder (s);
else
sb.Append (s);
}
static string EncodeHeaderString (string input)
{
StringBuilder sb = null;
for (int i = 0; i < input.Length; i++) {
char ch = input [i];
if ((ch < 32 && ch != 9) || ch == 127)
StringBuilderAppend (String.Format ("%{0:x2}", (int)ch), ref sb);
}
if (sb != null)
return sb.ToString ();
return input;
}
Just FYI
[here ] (https://github.com/mono/mono/blob/master/mcs/class/System.Web/System.Web.Util/HttpEncoder.cs)
For me helped Uri.EscapeDataString(headervalue)
This does the same job as HeaderNameValueEncode(), but will also encode % characters so the header can be reliably decoded later.
static string EncodeHeaderValue(string value)
{
return Regex.Replace(value, #"[\u0000-\u0008\u000a-\u001f%\u007f]", (m) => "%"+((int)m.Value[0]).ToString("x2"));
}
static string DecodeHeaderValue(string encoded)
{
return Regex.Replace(encoded, #"%([0-9a-f]{2})", (m) => new String((char)Convert.ToInt32(m.Groups[1].Value, 16), 1), RegexOptions.IgnoreCase);
}
Mayeb This one ?
UrlEncode Function
sorry its off the top of my head but for your request object there should be a headers object you can add to.
i.e. request.headers.add("blah");
Thats not spot on but it should point you in the right direction.
Related
I am using Selenium(C#) on NUnit Framework and is getting a string from UI as $4850.19.
I want to compare above string with the value from backend (DB) to assert they are equal.
I am using a below method to parse my dollar amount from front-end, but the issue is that is also stripping the decimal point; and obviously the comparison with backend is failing.
Method used:
public static string RemoveNonNumeric(string s)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.Length; i++)
if (Char.IsNumber(s[i]))
sb.Append(s[i]);
return sb.ToString();
}
How to strip out any '$' or ',' but keep '.' in the value?
With Reg ex it's trivial
Regex.Replace(s, "[^0-9.]", "")
You can also use the decimal.Parse method to parse a string formatted as currency into a decimal type:
string input = "$4,850.19";
decimal result = decimal.Parse(input, NumberStyles.Currency);
Console.WriteLine($"{input} => {result}");
Output:
Another way to do it if you don't want to go the decimal.Parse route is to simply return only numeric and '.' characters from the string:
public static string RemoveNonNumeric2(string s)
{
return string.Concat(s?.Where(c => char.IsNumber(c) || c == '.') ?? "");
}
Thanks for all inputs above, I kind of figured out an easy way to handle this for now (as shown below) -
public static string RemoveNonNumeric(string s)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.Length; i++)
if (Char.IsNumber(s[i]) || s[i] == '.')
sb.Append(s[i]);
return sb.ToString();
}
I would like to try other ways of handling this as well soon.
I have a large XML file that contain tag names that implement the dash-separated naming convention. How can I use C# to convert the tag names to the camel case naming convention?
The rules are:
1. Convert all characters to lower case
2. Capitalize the first character after each dash
3. Remove all dashes
Example
Before Conversion
<foo-bar>
<a-b-c></a-b-c>
</foo-bar>
After Conversion
<fooBar>
<aBC></aBC>
</fooBar>
Here's a code example that works, but it's slow to process - I'm thinking that there is a better way to accomplish my goal.
string ConvertDashToCamelCase(string input)
{
input = input.ToLower();
char[] ca = input.ToCharArray();
StringBuilder sb = new StringBuilder();
for(int i = 0; i < ca.Length; i++)
{
if(ca[i] == '-')
{
string t = ca[i + 1].ToString().toUpper();
sb.Append(t);
i++;
}
else
{
sb.Append(ca[i].ToString());
}
}
return sb.ToString();
}
The reason your original code was slow is because you're calling ToString all over the place unnecessarily. There's no need for that. There's also no need for the intermediate array of char. The following should be much faster, and faster than the version that uses String.Split, too.
string ConvertDashToCamelCase(string input)
{
StringBuilder sb = new StringBuilder();
bool caseFlag = false;
for (int i = 0; i < input.Length; ++i)
{
char c = input[i];
if (c == '-')
{
caseFlag = true;
}
else if (caseFlag)
{
sb.Append(char.ToUpper(c));
caseFlag = false;
}
else
{
sb.Append(char.ToLower(c));
}
}
return sb.ToString();
}
I'm not going to claim that the above is the fastest possible. In fact, there are several obvious optimizations that could save some time. But the above is clean and clear: easy to understand.
The key is the caseFlag, which you use to indicate that the next character copied should be set to upper case. Also note that I don't automatically convert the entire string to lower case. There's no reason to, since you'll be looking at every character anyway and can do the appropriate conversion at that time.
The idea here is that the code doesn't do any more work than it absolutely has to.
For completeness, here's also a regular expression one-liner (inspred by this JavaScript answer):
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-.", m => m.Value.ToUpper().Substring(1));
It replaces all occurrences of -x with x converted to upper case.
Special cases:
If you want lower-case all other characters, replace input with input.ToLower() inside the expression:
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input.ToLower(), "-.", m => m.Value.ToUpper().Substring(1));
If you want to support multiple dashes between words (dash--case) and have all of the dashes removed (dashCase), replace - with -+ in the regular expression (to greedily match all sequences of dashes) and keep only the final character:
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-+.", m => m.Value.ToUpper().Substring(m.Value.Length - 1));
If you want to support multiple dashes between words (dash--case) and remove only the final one (dash-Case), change the regular expression to match only a dash followed by a non-dash (rather than a dash followed by any character):
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-[^-]", m => m.Value.ToUpper().Substring(1));
string ConvertDashToCamelCase(string input)
{
string[] words = input.Split('-');
words = words.Select(element => wordToCamelCase(element));
return string.Join("", words);
}
string wordToCamelCase(string input)
{
return input.First().ToString().ToUpper() + input.Substring(1).ToLower();
}
Here is an updated version of #Jim Mischel's answer that will ignore the content - i.e. it will only camelCase tag names.
string ConvertDashToCamelCase(string input)
{
StringBuilder sb = new StringBuilder();
bool caseFlag = false;
bool tagFlag = false;
for(int i = 0; i < input.Length; i++)
{
char c = input[i];
if(tagFlag)
{
if (c == '-')
{
caseFlag = true;
}
else if (caseFlag)
{
sb.Append(char.ToUpper(c));
caseFlag = false;
}
else
{
sb.Append(char.ToLower(c));
}
}
else
{
sb.Append(c);
}
// Reset tag flag if necessary
if(c == '>' || c == '<')
{
tagFlag = (c == '<');
}
}
return sb.ToString();
}
using System;
using System.Text;
public class MyString
{
public static string ToCamelCase(string str)
{
char[] s = str.ToCharArray();
StringBuilder sb = new StringBuilder();
for(int i = 0; i < s.Length; i++)
{
if (s[i] == '-' || s[i] == '_')
sb.Append(Char.ToUpper(s[++i]));
else
sb.Append(s[i]);
}
return sb.ToString();
}
}
I have a string like:
About \xee\x80\x80John F Kennedy\xee\x80\x81\xe2\x80\x99s Assassination . unsolved mystery \xe2\x80\x93 45 years later. Over the last decade, a lot of individuals have speculated on conspiracy theories that ...
I understand that \xe2\x80\x93 is a dash character. But how should I decode the above string in C#?
If you have a string like that, then you have used the wrong encoding when you decoded it in the first place. There is no "UTF-8 string", the UTF-8 data is whent the text is encoded into binary data (bytes). When it's decoded into a string, then it's not UTF-8 any more.
You should use the UTF-8 encoding when you create the string from binary data, once the string is created using the wrong encoding, you can't reliably fix it.
If there is no other alternative, you could try to fix the string by encoding it again using the same wrong encoding that was used to create it, and then decode it using the corrent encoding. There is however no guarantee that this will work for all strings, some characters will simply be lost during the wrong decoding. Example:
// wrong use of encoding, to try to fix wrong decoding
str = Encoding.UTF8.GetString(Encoding.Default.GetBytes(str));
Scan the input string char-by-char and convert values starting with \x (string to byte[] and back to string using UTF8 decoder), leaving all other characters unchanged:
static string Decode(string input)
{
var sb = new StringBuilder();
int position = 0;
var bytes = new List<byte>();
while(position < input.Length)
{
char c = input[position++];
if(c == '\\')
{
if(position < input.Length)
{
c = input[position++];
if(c == 'x' && position <= input.Length - 2)
{
var b = Convert.ToByte(input.Substring(position, 2), 16);
position += 2;
bytes.Add(b);
}
else
{
AppendBytes(sb, bytes);
sb.Append('\\');
sb.Append(c);
}
continue;
}
}
AppendBytes(sb, bytes);
sb.Append(c);
}
AppendBytes(sb, bytes);
return sb.ToString();
}
private static void AppendBytes(StringBuilder sb, List<byte> bytes)
{
if(bytes.Count != 0)
{
var str = System.Text.Encoding.UTF8.GetString(bytes.ToArray());
sb.Append(str);
bytes.Clear();
}
}
Output:
About John F Kennedy’s Assassination . unsolved mystery – 45 years later. Over the last decade, a lot of individuals have speculated on conspiracy theories that ...
Finally I've used something like this:
public static string UnescapeHex(string data)
{
return Encoding.UTF8.GetString(Array.ConvertAll(Regex.Unescape(data).ToCharArray(), c => (byte) c));
}
I was wondering if you could tell me what the most efficient way to repeat a string would be. I need to create a string 33554432 bytes long, repeating the string "hello, world" until it fills that buffer. What is the best way to do it, C is easy in this case:
for (i = 0; i < BIGSTRINGLEN - step; i += step)
memcpy(bigstring+i, *s, step);
Thanks.
An efficient way would be to use a StringBuilder:
string text = "hello, world";
StringBuilder builder = new StringBuilder(BIGSTRINGLEN);
while (builder.Length + text.Length <= BIGSTRINGLEN) {
builder.Append(text);
}
string result = builder.ToString();
First, do you want the string to be 33554432 bytes long, or characters long? .NET and C# use 16-bit characters, so they are not equivalent.
If you want 33554432 characters, naive solution would be string concatenation. See Frédéric Hamidi's answer.
If you want bytes, you will need to do something a bit more interesting:
int targetLength = 33554432;
string filler = "hello, world";
byte[] target = new byte[targetLength];
// Convert filler to bytes. Can use other encodings here.
// I am using ASCII to match C++ output.
byte[] fillerBytes = Encoding.ASCII.GetBytes(filler);
//byte[] fillerBytes = Encoding.Unicode.GetBytes(filler);
//byte[] fillerBytes = Encoding.UTF8.GetBytes(filler);
int position = 0;
while((position + fillerBytes.Length) < target.Length)
{
fillerBytes.CopyTo(target, position);
position += fillerBytes.Length;
}
// At this point, need to possibly do a partial copy.
if (position < target.Length)
{
int bytesNecessary = target.Length - position;
Array.Copy(fillerBytes, 0, target, position, bytesNecessary);
}
I don't know if it's the most efficient way, but if you're using .NET 3.5 or later, this could work:
String.Join("", System.Linq.Enumerable.Repeat("hello, world", 2796203).ToArray()).Substring(0, 33554432);
If the length you want is dynamic, then you can replace some of the hard-coded numbers with simple math.
What about this? Set the StringBuilder to the max expected size and then add the desired string as long as adding another one will not exceed the desired max size.
StringBuilder sb = new StringBuilder(33554432);
int max = sb.MaxCapacity;
String hello = "hello, world";
while (sb.Length + hello.Length <= max)
{
sb.Append(hello);
}
string longString = sb.ToString();
This avoids a loop that repeatedly adds the string. Instead, I "double" the string until it gets close to the right length and then I put the "doubled" pieces together appropriately.
static string Repeat(string s, int length) {
if (length < s.Length) {
return s.Substring(0, length);
}
var list = new List<string>();
StringBuilder t = new StringBuilder(s);
do {
string temp = t.ToString();
list.Add(temp);
t.Append(temp);
} while(t.Length < length);
int index = list.Count - 1;
StringBuilder sb = new StringBuilder(length);
while (sb.Length < length) {
while (list[index].Length > length) {
index--;
}
if (list[index].Length <= length - sb.Length) {
sb.Append(list[index]);
}
else {
sb.Append(list[index].Substring(0, length - sb.Length));
}
}
return sb.ToString();
}
So, for example, on input ("Hello, world!", 64) we build the strings
13: Hello, World!
26: Hello, World!Hello, World!
52: Hello, World!Hello, World!Hello, World!Hello, World!
Then we would build the result by concatenating the string of length 52 to the substring of length 12 of the string of length 13.
I am, of course, assuming that by bytes you meant length. Otherwise, you can easily modify the above using encodings to get what you want in terms of bytes.
I'm a little surprised that there isn't some information on this on the web, and I keep finding that the problem is a little stickier than I thought.
Here's the rules:
You are starting with delimited/escaped data to split into an array.
The delimiter is one arbitrary character
The escape character is one arbitrary character
Both the delimiter and the escape character could occur in data
Regex is fine, but a good-performance solution is best
Edit: Empty elements (including leading or ending delimiters) can be ignored
The code signature (in C# would be, basically)
public static string[] smartSplit(
string delimitedData,
char delimiter,
char escape) {}
The stickiest part of the problem is the escaped consecutive escape character case, of course, since (calling / the escape character and , the delimiter): ////////, = ////,
Am I missing somewhere this is handled on the web or in another SO question? If not, put your big brains to work... I think this problem is something that would be nice to have on SO for the public good. I'm working on it myself, but don't have a good solution yet.
A simple state machine is usually the easiest and fastest way. Example in Python:
def extract(input, delim, escape):
# states
parsing = 0
escaped = 1
state = parsing
found = []
parsed = ""
for c in input:
if state == parsing:
if c == delim:
found.append(parsed)
parsed = ""
elif c == escape:
state = escaped
else:
parsed += c
else: # state == escaped
parsed += c
state = parsing
if parsed:
found.append(parsed)
return found
void smartSplit(string const& text, char delim, char esc, vector<string>& tokens)
{
enum State { NORMAL, IN_ESC };
State state = NORMAL;
string frag;
for (size_t i = 0; i<text.length(); ++i)
{
char c = text[i];
switch (state)
{
case NORMAL:
if (c == delim)
{
if (!frag.empty())
tokens.push_back(frag);
frag.clear();
}
else if (c == esc)
state = IN_ESC;
else
frag.append(1, c);
break;
case IN_ESC:
frag.append(1, c);
state = NORMAL;
break;
}
}
if (!frag.empty())
tokens.push_back(frag);
}
private static string[] Split(string input, char delimiter, char escapeChar, bool removeEmpty)
{
if (input == null)
{
return new string[0];
}
char[] specialChars = new char[]{delimiter, escapeChar};
var tokens = new List<string>();
var token = new StringBuilder();
for (int i = 0; i < input.Length; i++)
{
var c = input[i];
if (c.Equals(escapeChar))
{
if (i >= input.Length - 1)
{
throw new ArgumentException("Uncompleted escape sequence has been encountered at the end of the input");
}
var nextChar = input[i + 1];
if (nextChar != escapeChar && nextChar != delimiter)
{
throw new ArgumentException("Unknown escape sequence has been encountered: " + c + nextChar);
}
token.Append(nextChar);
i++;
}
else if (c.Equals(delimiter))
{
if (!removeEmpty || token.Length > 0)
{
tokens.Add(token.ToString());
token.Length = 0;
}
}
else
{
var index = input.IndexOfAny(specialChars, i);
if (index < 0)
{
token.Append(c);
}
else
{
token.Append(input.Substring(i, index - i));
i = index - 1;
}
}
}
if (!removeEmpty || token.Length > 0)
{
tokens.Add(token.ToString());
}
return tokens.ToArray();
}
The implementation of this kind of tokenizer in terms of a FSM is fairly straight forward.
You do have a few decisions to make (like, what do I do with leading delimiters? strip or emit NULL tokens).
Here is an abstract version which ignores leading and multiple delimiters, and doesn't allow escaping the newline:
state(input) action
========================
BEGIN(*): token.clear(); state=START;
END(*): return;
*(\n\0): token.emit(); state=END;
START(DELIMITER): ; // NB: the input is *not* added to the token!
START(ESCAPE): state=ESC; // NB: the input is *not* added to the token!
START(*): token.append(input); state=NORM;
NORM(DELIMITER): token.emit(); token.clear(); state=START;
NORM(ESCAPE): state=ESC; // NB: the input is *not* added to the token!
NORM(*): token.append(input);
ESC(*): token.append(input); state=NORM;
This kind of implementation has the advantage of dealing with consecutive excapes naturally, and can be easily extended to give special meaning to more escape sequences (i.e. add a rule like ESC(t) token.appeand(TAB)).
Here's my ported function in C#
public static void smartSplit(string text, char delim, char esc, ref List<string> listToBuild)
{
bool currentlyEscaped = false;
StringBuilder fragment = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (currentlyEscaped)
{
fragment.Append(c);
currentlyEscaped = false;
}
else
{
if (c == delim)
{
if (fragment.Length > 0)
{
listToBuild.Add(fragment.ToString());
fragment.Remove(0, fragment.Length);
}
}
else if (c == esc)
currentlyEscaped = true;
else
fragment.Append(c);
}
}
if (fragment.Length > 0)
{
listToBuild.Add(fragment.ToString());
}
}
Hope this helps someone in the future. Thanks to KenE for pointing me in the right direction.
Here's a more idiomatic and readable way to do it:
public IEnumerable<string> SplitAndUnescape(
string encodedString,
char separator,
char escape)
{
var inEscapeSequence = false;
var currentToken = new StringBuilder();
foreach (var currentCharacter in encodedString)
if (inEscapeSequence)
{
currentToken.Append(currentCharacter);
inEscapeSequence = false;
}
else
if (currentCharacter == escape)
inEscapeSequence = true;
else
if (currentCharacter == separator)
{
yield return currentToken.ToString();
currentToken.Clear();
}
else
currentToken.Append(currentCharacter);
yield return currentToken.ToString();
}
Note that this doesn't remove empty elements. I don't think that should be the responsibility of the parser. If you want to remove them, just call Where(item => item.Any()) on the result.
I think this is too much logic for a single method; it gets hard to follow. If someone has time, I think it would be better to break it up into multiple methods and maybe its own class.
You'ew looking for something like a "string tokenizer". There's a version I found quickly that's similar. Or look at getopt.