Uri.EscapeDataString weirdness - c#

Why does EscapeDataString behave differently between .NET 4 and 4.5? The outputs, on .NET 4 and on .NET 4.5 respectively, are
Uri.EscapeDataString("-_.!~*'()") => "-_.!~*'()"
Uri.EscapeDataString("-_.!~*'()") => "-_.%21~%2A%27%28%29"
The documentation
By default, the EscapeDataString method converts all characters except
for RFC 2396 unreserved characters to their hexadecimal
representation. If International Resource Identifiers (IRIs) or
Internationalized Domain Name (IDN) parsing is enabled, the
EscapeDataString method converts all characters, except for RFC 3986
unreserved characters, to their hexadecimal representation. All
Unicode characters are converted to UTF-8 format before being escaped.
For reference, unreserved characters are defined as follows in RFC 2396:
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
       "(" | ")"
And in RFC 3986:
ALPHA / DIGIT / "-" / "." / "_" / "~"
The source code
It looks like whether each character passed to EscapeDataString is escaped is determined roughly like this
is unicode above \x7F
  ? PERCENT ENCODE
  : is a percent symbol
    ? is an escape char
      ? LEAVE ALONE
      : PERCENT ENCODE
    : is a forced character
      ? PERCENT ENCODE
      : is an unreserved character
        ? LEAVE ALONE
        : PERCENT ENCODE
It's at that final check "is an unreserved character" where the choice between RFC2396 and RFC3986 is made. The source code of the method verbatim is
internal static unsafe bool IsUnreserved(char c)
{
    if (Uri.IsAsciiLetterOrDigit(c))
    {
        return true;
    }
    if (UriParser.ShouldUseLegacyV2Quirks)
    {
        return (RFC2396UnreservedMarks.IndexOf(c) >= 0);
    }
    return (RFC3986UnreservedMarks.IndexOf(c) >= 0);
}
And that code refers to
private static readonly UriQuirksVersion s_QuirksVersion =
    (BinaryCompatibility.TargetsAtLeast_Desktop_V4_5
        // || BinaryCompatibility.TargetsAtLeast_Silverlight_V6
        // || BinaryCompatibility.TargetsAtLeast_Phone_V8_0
    ) ? UriQuirksVersion.V3 : UriQuirksVersion.V2;
internal static bool ShouldUseLegacyV2Quirks {
    get {
        return s_QuirksVersion <= UriQuirksVersion.V2;
    }
}
Confusion
It seems contradictory that the documentation says the output of EscapeDataString depends on whether IRI/IDN parsing is enabled, whereas the source code says the output is determined by the value of TargetsAtLeast_Desktop_V4_5. Could someone clear this up?

A lot of changes were made in 4.5 compared to 4.0 in terms of system functions and how they behave.
You can have a look at this thread:
Why does Uri.EscapeDataString return a different result on my CI server compared to my development machine?
Or you can go directly to the following link:
http://msdn.microsoft.com/en-us/library/hh367887(v=vs.110).aspx
All of this was done with input from users around the world.
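If the escaped output has to be the same whichever framework version the application targets, one option is to post-process the result so that the five marks RFC 2396 leaves unreserved are always percent-encoded. The following is only a sketch under that assumption; the class and method names (Rfc3986Escaper, Escape) are made up for illustration and are not part of the framework.

```csharp
using System;
using System.Text;

public static class Rfc3986Escaper
{
    // Marks that are unreserved in RFC 2396 but not in RFC 3986.
    private const string LegacyMarks = "!*'()";

    // Hypothetical helper: escapes with Uri.EscapeDataString, then
    // percent-encodes any legacy mark that was left untouched.
    public static string Escape(string value)
    {
        string escaped = Uri.EscapeDataString(value);
        var sb = new StringBuilder(escaped.Length);
        foreach (char c in escaped)
        {
            if (LegacyMarks.IndexOf(c) >= 0)
                sb.Append('%').Append(((int)c).ToString("X2"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Where the runtime already applies the RFC 3986 rules, the loop finds nothing to replace and the string passes through unchanged.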

Related

c# ASHX addHeader causing error

I'm working on a C# .ashx handler file that has this code:
context.Response.AddHeader("HTTP Header", "200");
context.Response.AddHeader("Content", "OK");
When this page is accessed over HTTP it works fine, but over HTTPS it generates the error below in chrome://net-internals/#events:
t=10983 [st=37] HTTP2_SESSION_RECV_INVALID_HEADER
--> error = "Invalid character in header name."
--> header_name = "http%20header"
--> header_value = "200"
t=10983 [st=37] HTTP2_SESSION_SEND_RST_STREAM
--> description = "Could not parse Spdy Control Frame Header."
--> error_code = "1 (PROTOCOL_ERROR)"
--> stream_id = 1
Is "HTTP Header" a safe header name? I read that a space shouldn't be a problem in a header, so what's the actual issue?
So far this happens in Chrome and Safari, but it works fine in Firefox.
Any advice?
Space is not a valid character in a header name. HTTP is defined by RFC 7230.
The syntax of a header field is defined in section 3.2. Header Fields
Each header field consists of a case-insensitive field name followed
by a colon (":"), optional leading whitespace, the field value, and
optional trailing whitespace.
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
; obsolete line folding
; see Section 3.2.4
So the field name is a token. Tokens are defined in 3.2.6. Field Value Components
Most HTTP header field values are defined using common syntax
components (token, quoted-string, and comment) separated by
whitespace or specific delimiting characters. Delimiters are chosen
from the set of US-ASCII visual characters not allowed in a token
(DQUOTE and "(),/:;<=>?@[\]{}").
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
; any VCHAR, except delimiters
The last piece is in 1.2. Syntax Notation
The following core rules are included by reference, as defined in
[RFC5234], Appendix B.1: ALPHA (letters), CR (carriage return), CRLF
(CR LF), CTL (controls), DIGIT (decimal 0-9), DQUOTE (double quote),
HEXDIG (hexadecimal 0-9/A-F/a-f), HTAB (horizontal tab), LF (line
feed), OCTET (any 8-bit sequence of data), SP (space), and VCHAR (any
visible [USASCII] character).
So whitespace is not allowed in the name of a header.
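The token rule quoted above can be turned into a small check to run before calling AddHeader. This is a minimal sketch, assuming only ASCII input matters; HeaderNameCheck and IsToken are illustrative names, not an ASP.NET API.

```csharp
using System;

public static class HeaderNameCheck
{
    // Punctuation allowed by the tchar rule in RFC 7230, section 3.2.6.
    private const string TcharPunctuation = "!#$%&'*+-.^_`|~";

    // Returns true when every character of the name is a tchar
    // (ASCII letter, ASCII digit, or one of the punctuation marks above).
    public static bool IsToken(string name)
    {
        if (string.IsNullOrEmpty(name))
            return false; // token = 1*tchar, so at least one character is required

        foreach (char c in name)
        {
            bool isAlphaNum = (c >= 'a' && c <= 'z')
                           || (c >= 'A' && c <= 'Z')
                           || (c >= '0' && c <= '9');
            if (!isAlphaNum && TcharPunctuation.IndexOf(c) < 0)
                return false;
        }
        return true;
    }
}
```

With this check, "HTTP Header" is rejected because of the space, while a name such as "X-Status" passes.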

Regex for decimal number dot instead of comma (.NET)

I am using regex to parse data from an OCR'd document and I am struggling to match the scenarios where a 1000s comma separator has been misread as a dot, and also where the dot has been misread as a comma!
So if the true value is 1234567.89 printed as 1,234,567.89 but being misread as:
1.234,567.89
1,234.567.89
1,234,567,89
etc
I could probably sort this in C# but I'm sure that a regex could do it. Any regex-wizards out there that can help?
UPDATE:
I realise this is a pretty dumb question, as the regex to catch all of these is pretty straightforward; what matters is how I then choose to interpret the match, which will be in C#. Thanks, and sorry to waste your time on this!
I will accept Dmitry's answer as it is close to what I was looking for. Thank you.
Please note that there is an ambiguity, since:
123,456 // thousand separator
123.456 // decimal separator
are both possible (123456 and 123.456). However, we can detect some cases:
Too many decimal separators 123.456.789
Wrong order 123.456,789
Wrong digits count 123,45
So we can set up a rule: a separator can be the decimal one if it is the last one and is not followed by exactly three digits (see the ambiguity above); all the other separators should be treated as thousand separators:
1?234?567?89
 ^   ^   ^
 |   |   the last one, followed by two digits (not three), thus decimal
 |   not the last one, thus thousand
 not the last one, thus thousand
Now let's implement a routine
// Requires: using System; using System.Linq; (for Take)
private static String ClearUp(String value) {
  String[] chunks = value.Split(',', '.');
  // No separators
  if (chunks.Length <= 1)
    return value;
  // Let's look at the last chunk:
  // if it is not exactly three digits long, the last separator
  // is definitely a decimal separator (e.g. "123,45")
  if (chunks[chunks.Length - 1].Length != 3)
    return String.Concat(chunks.Take(chunks.Length - 1)) +
           "." +
           chunks[chunks.Length - 1];
  // Three digits after the last separator: may be decimal or thousand,
  // so trust the separator character itself
  if (value[value.Length - 4] == ',')
    return String.Concat(chunks);
  else
    return String.Concat(chunks.Take(chunks.Length - 1)) +
           "." +
           chunks[chunks.Length - 1];
}
Now let's try some tests:
String[] data = new String[] {
  // your tests
  "1.234,567.89",
  "1,234.567.89",
  "1,234,567,89",
  // my tests
  "123,456", // "," should be left intact, i.e. thousand separator
  "123.456", // "." should be left intact, i.e. decimal separator
};
String report = String.Join(Environment.NewLine, data
  .Select(item => String.Format("{0} -> {1}", item, ClearUp(item))));
Console.Write(report);
the outcome is
1.234,567.89 -> 1234567.89
1,234.567.89 -> 1234567.89
1,234,567,89 -> 1234567.89
123,456 -> 123456
123.456 -> 123.456
Try this Regex:
\b[\.,\d][^\s]*\b
\b = Word boundaries
containing: . or comma or digits
Not containing spaces
Responding to update/comments: you do not need regex to do this. Instead, if you can isolate the number string from the surrounding spaces, you can pull it into a string-array using Split(',','.'). Based on the logic you outlined above, you could then use the last element of the array as the fractional part, and concatenate the first elements together for the whole part. (Actual code left as an exercise...) This will even work if the ambiguous-dot-or-comma is the last character in the string: the last element in the split-array will be empty.
Caveat: This will only work if there is always a decimal point--otherwise, you would not be able to differentiate logically between a thousands-place comma and a decimal with thousandths.
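The Split-based approach described in this answer could look roughly like the following. It is a sketch under the stated caveat that a decimal separator is always present; OcrNumber and Normalize are names invented for the example.

```csharp
using System;
using System.Linq;

public static class OcrNumber
{
    // Treats the text after the LAST separator as the fractional part and
    // joins everything before it into the whole part.
    // Assumption: the input always contains a decimal separator.
    public static string Normalize(string value)
    {
        string[] chunks = value.Split(',', '.');
        if (chunks.Length <= 1)
            return value; // no separators at all

        string whole = string.Concat(chunks.Take(chunks.Length - 1));
        string fraction = chunks[chunks.Length - 1];

        // A trailing separator yields an empty last chunk: no fraction.
        return fraction.Length == 0 ? whole : whole + "." + fraction;
    }
}
```

Because the last separator is always read as the decimal point, an input such as "123,456" comes out as "123.456", which is exactly the ambiguity the caveat warns about.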

What's the difference between Microsoft.JScript.GlobalObject.escape and Uri.EscapeUriString

The strings my service receives from Uri.EscapeUriString and from Microsoft.JScript.GlobalObject.escape are different; when I use Microsoft.JScript.GlobalObject.escape to handle the URL, it works.
What's the difference between Microsoft.JScript.GlobalObject.escape and Uri.EscapeUriString in C#?
Although Uri.EscapeUriString is available to use in C# out of the box, it can not convert all the characters exactly the same way as JavaScript escape function does.
For example let's say the original string is: "Some String's /Hello".
Uri.EscapeUriString("Some String's /Hello")
output:
"Some%20String's%20/Hello"
Microsoft.JScript.GlobalObject.escape("Some String's /Hello")
output:
"Some%20String%27s%20/Hello"
Note how Uri.EscapeUriString did not escape the '.
That being said, let's look at a more extreme example. Suppose we have this string "& / \ # , + ( ) $ ~ % .. ' " : * ? < > { }". Let's see what escaping it with both methods gives us.
Microsoft.JScript.GlobalObject.escape("& / \\ # , + ( ) $ ~ % .. ' \" : * ? < > { }")
output: "%26%20/%20%5C%20%23%20%2C%20+%20%28%20%29%20%24%20%7E%20%25%20..%20%27%20%22%20%3A%20*%20%3F%20%3C%20%3E%20%7B%20%7D"
Uri.EscapeUriString("& / \\ # , + ( ) $ ~ % .. ' \" : * ? < > { }")
output: "&%20/%20%5C%20#%20,%20+%20(%20)%20$%20~%20%25%20..%20'%20%22%20:%20*%20?%20%3C%20%3E%20%7B%20%7D"
Notice that Microsoft.JScript.GlobalObject.escape escaped every character in this string except +, /, * and ., even those that are valid in a URI. For example, ? and & were escaped even though they are valid in a query string.
So it all depends on where and when you wish to escape your URI and what type of URI you are creating/escaping.
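To see why the two outputs differ without referencing the Microsoft.JScript assembly, the JavaScript escape rules can be approximated by hand. The sketch below follows the ECMAScript definition of escape (letters, digits and @*_+-./ are left alone, other code units below 256 become %XX, the rest become %uXXXX). It is a hand-written approximation, not the Microsoft.JScript implementation, and JsEscapeSketch is a hypothetical name.

```csharp
using System;
using System.Text;

public static class JsEscapeSketch
{
    // Characters the ECMAScript escape function leaves untouched.
    private const string Unescaped =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./";

    public static string Escape(string value)
    {
        var sb = new StringBuilder();
        foreach (char c in value)
        {
            if (Unescaped.IndexOf(c) >= 0)
                sb.Append(c);                                       // left alone
            else if (c < 256)
                sb.Append('%').Append(((int)c).ToString("X2"));     // %XX
            else
                sb.Append("%u").Append(((int)c).ToString("X4"));    // %uXXXX
        }
        return sb.ToString();
    }
}
```

For "Some String's /Hello" this produces "Some%20String%27s%20/Hello", matching the escape output shown above.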

Using a regular expression, how to get (16.00 + 28.66 = 44.66) as 44.66 and (99) as 99

I'm using a regular expression to get values such as (16.00 + 28.66 = 44.66) as 44.66 and (26.00) as 26.00.
I have trouble displaying the data when it is just (99), which should give 99 without any decimals.
This is the code I have used until now:
string amount = DropDownList1.SelectedItem.Text;
Regex regex = new Regex("(\\d+\\.\\d{2})(?=\\))", RegexOptions.Multiline | RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
Could someone please tell me how I can display a value without any decimals?
E.g. (99) as 99
Does your drop down list contain values like these?
(20.01 + 20.01 = 40.02)
(40.02)
(40)
If yes, you can try this Regular Expression
(\\d+(\\.\\d{2})?)(?=\\))
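As a quick check of the suggested pattern, the sketch below runs it against the three shapes of input listed above. AmountExtractor and Extract are illustrative names, and the pattern is written as a verbatim string so the backslashes are not doubled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmountExtractor
{
    // Digits, optionally followed by a two-digit fraction, that sit
    // immediately before a closing parenthesis.
    private static readonly Regex LastAmount =
        new Regex(@"(\d+(\.\d{2})?)(?=\))", RegexOptions.CultureInvariant);

    // Returns the matched amount, or null when the text has none.
    public static string Extract(string text)
    {
        Match m = LastAmount.Match(text);
        return m.Success ? m.Value : null;
    }
}
```

Making the fractional group optional is what lets (99) match, while the lookahead still pins the match to the number right before the closing parenthesis.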
You can do it without Regex by using a ToString format that fixes the number of decimal places, provided the text is a plain number with no parentheses: 99 becomes 99.00. You can read more in the documentation on custom numeric format strings.
string formattedNum = double.Parse(DropDownList1.SelectedItem.Text).ToString(".00");
The "0" custom format specifier serves as a zero-placeholder symbol.
If the value that is being formatted has a digit in the position where
the zero appears in the format string, that digit is copied to the
result string; otherwise, a zero appears in the result string. The
position of the leftmost zero before the decimal point and the
rightmost zero after the decimal point determines the range of digits
that are always present in the result string, MSDN.

C# Email Address validation

I just want to clarify one thing. Per a client request, we have to create a regular expression that allows an apostrophe in an email address.
My question: according to the RFC standard, can an email address contain an apostrophe? If so, how do I rewrite the regular expression to allow it?
The regular expression below implements the official RFC 2822 standard for email addresses. Using this regular expression in actual applications is NOT recommended. It is shown to illustrate that with regular expressions there's always a trade-off between what's exact and what's practical.
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
You could use the simplified one:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
And yes, apostrophe is allowed in the email, as long as it is not in domain name.
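A short sketch of how the simplified pattern behaves with an apostrophe, assuming @ as the separator between local part and domain. The ^ and $ anchors and the IgnoreCase option are additions for this example, and SimpleEmailCheck is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleEmailCheck
{
    // The simplified pattern, anchored so the whole string must match.
    private static readonly Regex Pattern = new Regex(
        @"^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*" +
        @"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static bool IsMatch(string address)
    {
        return Pattern.IsMatch(address);
    }
}
```

An address such as o'brien@example.com is accepted, while o'brien@exam'ple.com is rejected because the apostrophe is not part of the domain character set.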
Here's the validation attribute I wrote. It validates pretty much every "raw" email address, that is those of the form local-part@domain. It doesn't support any of the other, more...creative constructs that the RFCs allow (this list is not comprehensive by any means):
comments (e.g., jsmith@whizbang.com (work))
quoted strings (escaped text, to allow characters not allowed in an atom)
domain literals (e.g. foo@[123.45.67.012])
bang-paths (aka source routing)
angle addresses (e.g. John Smith <jsmith@whizbang.com>)
folding whitespace
double-byte characters in either local-part or domain (7-bit ASCII only).
etc.
It should accept almost any email address that can be expressed thusly
foo.bar@bazbat.com
without requiring the use of quotes ("), angle brackets ('<>') or square brackets ([]).
No attempt is made to validate that the rightmost DNS label in the domain is a valid TLD (top-level domain). That is because the list of TLDs is far larger now than the "big 6" (.com, .edu, .gov, .mil, .net, .org) plus 2-letter ISO country codes. ICANN actually updates the TLD list daily, though I suspect that the list doesn't actually change daily. Further, ICANN just approved a big expansion of the generic TLD namespace. And some email addresses don't have what you'd recognize as a TLD (did you know that postmaster@. is theoretically valid and mailable? Mail to that address should get delivered to the postmaster of the DNS root zone.)
Extending the regular expression to support domain literals shouldn't be too difficult.
Here you go. Use it in good health:
using System;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;
namespace ValidationHelpers
{
  [AttributeUsage( AttributeTargets.Property | AttributeTargets.Field , AllowMultiple = false )]
  sealed public class EmailAddressValidationAttribute : ValidationAttribute
  {
    static EmailAddressValidationAttribute()
    {
      RxEmailAddress = CreateEmailAddressRegex();
      return;
    }
    private static Regex CreateEmailAddressRegex()
    {
      // references: RFC 5321, RFC 5322, RFC 1035, plus errata.
      string atom = @"([A-Z0-9!#$%&'*+\-/=?^_`{|}~]+)" ;
      string dot = @"(\.)" ;
      string dotAtom = "(" + atom + "(" + dot + atom + ")*" + ")" ;
      string dnsLabel = "([A-Z]([A-Z0-9-]{0,61}[A-Z0-9])?)" ;
      string fqdn = "(" + dnsLabel + "(" + dot + dnsLabel + ")*" + ")" ;
      string localPart = "(?<localpart>" + dotAtom + ")" ;
      string domain = "(?<domain>" + fqdn + ")" ;
      string emailAddrPattern = "^" + localPart + "@" + domain + "$" ;
      Regex instance = new Regex( emailAddrPattern , RegexOptions.Singleline | RegexOptions.IgnoreCase );
      return instance;
    }
    private static Regex RxEmailAddress;
    public override bool IsValid( object value )
    {
      string s = Convert.ToString( value ) ;
      bool fValid = string.IsNullOrEmpty( s ) ;
      // we'll take an empty field as valid and leave it to the [Required] attribute to enforce that it's been supplied.
      if ( !fValid )
      {
        Match m = RxEmailAddress.Match( s ) ;
        if ( m.Success )
        {
          string emailAddr = m.Value ;
          string localPart = m.Groups[ "localpart" ].Value ;
          string domain = m.Groups[ "domain" ].Value ;
          bool fLocalPartLengthValid = localPart.Length >= 1 && localPart.Length <= 64 ;
          bool fDomainLengthValid = domain.Length >= 1 && domain.Length <= 255 ;
          bool fEmailAddrLengthValid = emailAddr.Length >= 1 && emailAddr.Length <= 256 ; // might be 254 in practice -- the RFCs are a little fuzzy here.
          fValid = fLocalPartLengthValid && fDomainLengthValid && fEmailAddrLengthValid ;
        }
      }
      return fValid ;
    }
  }
}
Cheers!
