C# ASHX AddHeader causing error

I'm working on a C# .ashx handler file that has this code:
context.Response.AddHeader("HTTP Header", "200");
context.Response.AddHeader("Content", "OK");
When this page is accessed over HTTP it works fine, but over HTTPS it generates the error below in chrome://net-internals/#events:
t=10983 [st=37] HTTP2_SESSION_RECV_INVALID_HEADER
--> error = "Invalid character in header name."
--> header_name = "http%20header"
--> header_value = "200"
t=10983 [st=37] HTTP2_SESSION_SEND_RST_STREAM
--> description = "Could not parse Spdy Control Frame Header."
--> error_code = "1 (PROTOCOL_ERROR)"
--> stream_id = 1
Is "HTTP Header" a safe header name? I read that a space shouldn't be a problem in a header, so what's the actual issue?
So far, this happens in Chrome and Safari, but it works fine in Firefox.
Any advice?

Space is not a valid character in a header name. HTTP is defined by RFC 7230.
The syntax of a header field is defined in section 3.2. Header Fields
Each header field consists of a case-insensitive field name followed
by a colon (":"), optional leading whitespace, the field value, and
optional trailing whitespace.
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
; obsolete line folding
; see Section 3.2.4
So the field name is a token. Tokens are defined in 3.2.6. Field Value Components
Most HTTP header field values are defined using common syntax
components (token, quoted-string, and comment) separated by
whitespace or specific delimiting characters. Delimiters are chosen
from the set of US-ASCII visual characters not allowed in a token
(DQUOTE and "(),/:;<=>?@[\]{}").
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
; any VCHAR, except delimiters
The last piece is in 1.2. Syntax Notation
The following core rules are included by reference, as defined in
[RFC5234], Appendix B.1: ALPHA (letters), CR (carriage return), CRLF
(CR LF), CTL (controls), DIGIT (decimal 0-9), DQUOTE (double quote),
HEXDIG (hexadecimal 0-9/A-F/a-f), HTAB (horizontal tab), LF (line
feed), OCTET (any 8-bit sequence of data), SP (space), and VCHAR (any
visible [USASCII] character).
So whitespace is not allowed in the name of a header.
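As a rough illustration of the grammar quoted above, here is a minimal sketch of a check for whether a string is a valid token. The names HeaderNameCheck and IsValidHeaderName are made up for this example; they are not part of ASP.NET.

```csharp
using System;
using System.Linq;

public static class HeaderNameCheck
{
    // The non-alphanumeric tchar characters from RFC 7230, section 3.2.6.
    private const string TokenSymbols = "!#$%&'*+-.^_`|~";

    // True when c is an ASCII letter, an ASCII digit, or one of the tchar symbols.
    private static bool IsTokenChar(char c)
    {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || TokenSymbols.IndexOf(c) >= 0;
    }

    // token = 1*tchar: at least one character, and every character is a tchar.
    public static bool IsValidHeaderName(string name)
    {
        return !string.IsNullOrEmpty(name) && name.All(IsTokenChar);
    }
}
```

With this check, "HTTP Header" is rejected because of the space, while "Content" passes. If the handler only needs to report success, setting context.Response.StatusCode = 200 and writing "OK" to the response body would probably avoid the custom headers altogether, though the question does not say what the headers were meant for.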

Related

Extracting dollar prices and numbers with comma as thousand separator from PDF converted to text format

I am trying to redact some pdfs with dollar amounts using c#. Below is what I have tried
@"/ (\d)(?= (?:\d{ 3})+(?:\.|$))| (\.\d\d ?)\d *$/ g"
@"(?<=each)(((\d*[,|.]\d{2,3}))*)"
@"\d+\.\d{2}"
Here are some test cases that it needs to match
76,249.25
131,588.00
7.09
21.27
420.42
54.77
32.848
3,056.12
0.009
0.01
32.85
2,948.59
$99,249.25
$9.0000
$1,800.0000
$1,000,000
Here are some test cases that it should not target
666-257-6443
F1A 5G9
Bolt, Locating, M8 x 1.25 x 30 L
Precision Washer, 304 SS, 0.63 OD x 0.31
Flat Washer 300 Series SS; Pack of 50
U-SSFAN 0.63-L6.00-F0.75-B0.64-T0.38-SC5.62
U-CLBUM 0.63-D0.88-L0.875
U-WSSS 0.38-D0.88-T0.125
U-BGHK 6002ZZ - H1.50
U-SSCS 0.38-B0.38
6412K42
Std Dowel, 3/8" x 1-1/2" Lg, Steel
2019.07.05
2092-002.0180
SHCMG 0.25-L1.00
280160717
Please note the c# portion is interfacing with iText 7 pdfSweep.
Guid g = Guid.NewGuid(); // new Guid() would always produce the all-zero GUID
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
string guid = g.ToString();
string input = @"C:\Users\JM\Documents\pdftest\61882 _280011434 (1).pdf";
string output = @"C:\Users\JM\Documents\pdftest\61882 _2800011434 (1) x2" + guid + ".pdf";
string regex = @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$";
strategy.Add(new RegexBasedCleanupStrategy(regex));
PdfDocument pdf = new PdfDocument(new PdfReader(input), new PdfWriter(output));
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.CleanUp(pdf);
pdf.Close();
Please share your wisdom
You may use
\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?
Or, if the prices occur on whole lines:
^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$
See the regex demo
Bonus: To obtain only price values, you need to remove the ? after \$ to make it obligatory:
\$([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)
(I added a capturing group in case you need to access the number value separately from the $ char).
If you need to support any currency char, not just $, replace \$ with \p{Sc}.
Details
^ - start of string
\$? - an optional dollar symbol
[0-9]{1,3} - one to three digits
(?:,[0-9]{3})* - any 0 or more repetitions of a comma and then three digits
(?:\.[0-9]+)? - an optional sequence of a dot and then any 1 or more digits
$ - end of string.
C# check for a match:
if (Regex.IsMatch(str, @"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$"))
{
    // there is a match
}
pdfSweep notice:
Apply the fix from this answer. The point is that the line breaks are lost when parsing the text. The regex you need then is
@"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$"
where (?m) makes ^ and $ match start/end of lines and \r? is required as $ only matches before LF, not before CRLF in .NET regex.
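To see the effect of (?m) and \r? outside of pdfSweep, here is a small sketch that runs the same pattern over plain multi-line text. The PriceLines class and its Find method are hypothetical helpers for this illustration only; they are not part of iText.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PriceLines
{
    // Same pattern as above: (?m) makes ^ and $ line anchors, \r? tolerates CRLF endings.
    private static readonly Regex PriceLine = new Regex(
        @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$");

    // Returns every line of text that consists solely of a price-like number.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in PriceLine.Matches(text))
        {
            // \r? may have consumed the CR of a CRLF line ending, so trim it off.
            found.Add(m.Value.TrimEnd('\r'));
        }
        return found;
    }
}
```

For example, PriceLines.Find("Bolt, Locating, M8 x 1.25 x 30 L\r\n$1,800.0000\r\n6412K42\n7.09") should return only "$1,800.0000" and "7.09", because the other two lines do not consist solely of a price.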

Regex for ClassName.PropertyName

I don't know regex, but I need a regular expression that checks for the ClassName.PropertyName format.
I need to validate some values from appSettings for compliance with the ClassName.PropertyName convention.
"ClassName.PropertyName" - this is the only format that is valid, the rest below is invalid:
"Personnel.FirstName1" <- the only string that should match
"2Personnel.FirstName1"
"Personnel.33FirstName"
"Personnel..FirstName"
"Personnel.;FirstName"
"Personnel.FirstName."
"Personnel.FirstName "
" Personnel.FirstName"
" Personnel. FirstName"
" 23Personnel.3FirstName"
I have tried this (from the link posted as duplicate):
^\w+(.\w+)*$
but it doesn't work: I have false positives, e.g. 2Personnel.FirstName1 as well as Personnel.33FirstName passes the check when both should have been rejected.
Can someone help me with that?
Let's start with a single identifier:
Its first character must be a letter or an underscore
It can contain letters, underscores and digits
So the regular expression for an identifier is
[A-Za-z_][A-Za-z0-9_]*
Next, we chain identifiers with . (do not forget to escape .): an identifier followed by zero or more . + identifier groups:
^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$
In case it must be exactly two identifiers (and not, say abc.def.hi - three ones)
^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$
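For the exact two-identifier form, a minimal sketch of a validator that also returns the two parts could look like this. SettingNames and TrySplit are names invented for this example. One deliberate deviation from the pattern above: it ends with \z instead of $, because in .NET a non-multiline $ also matches just before a trailing newline.

```csharp
using System.Text.RegularExpressions;

public static class SettingNames
{
    // A single identifier: a letter or underscore, then letters, digits or underscores.
    private const string Identifier = "[A-Za-z_][A-Za-z0-9_]*";

    // Exactly two identifiers separated by one dot; \z anchors at the very end of the input.
    private static readonly Regex TwoPart = new Regex(
        "^(" + Identifier + @")\.(" + Identifier + @")\z");

    // Returns true and fills both parts when value has the ClassName.PropertyName form.
    public static bool TrySplit(string value, out string className, out string propertyName)
    {
        Match m = TwoPart.Match(value ?? string.Empty);
        className = m.Success ? m.Groups[1].Value : null;
        propertyName = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }
}
```

With this sketch, "Personnel.FirstName1" splits into "Personnel" and "FirstName1", while "abc.def.hi" and "2Personnel.FirstName1" are rejected.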
Tests:
using System;
using System.Linq;
using System.Text.RegularExpressions;
string[] tests = new string[] {
    "Personnel.FirstName1", // the only string that should be matched
    "2Personnel.FirstName1",
    "Personnel.33FirstName",
    "Personnel..FirstName",
    "Personnel.;FirstName",
    "Personnel.FirstName.",
    "Personnel.FirstName ",
    " Personnel.FirstName",
    " Personnel. FirstName",
    " 23Personnel.3FirstName",
};
string pattern = @"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$";
// Print each quoted test string, padded to 25 characters, with its match result
var results = tests
    .Select(test =>
        $"{"\"" + test + "\"",-25} : {(Regex.IsMatch(test, pattern) ? "matched" : "failed")}");
Console.WriteLine(String.Join(Environment.NewLine, results));
Outcome:
"Personnel.FirstName1" : matched
"2Personnel.FirstName1" : failed
"Personnel.33FirstName" : failed
"Personnel..FirstName" : failed
"Personnel.;FirstName" : failed
"Personnel.FirstName." : failed
"Personnel.FirstName " : failed
" Personnel.FirstName" : failed
" Personnel. FirstName" : failed
" 23Personnel.3FirstName" : failed
Edit: In case culture specific names (like äöü.FirstName) should be accepted (see Rand Random's comments) then [A-Za-z] range should be changed into \p{L} - any letter. Exotic possibility - culture specific digits (e.g. Persian ones - ۰۱۲۳۴۵۶۷۸۹) can be solved by changing 0-9 into \d
// culture specific letters, but not digits
string pattern = @"^[\p{L}_][\p{L}0-9_]*(?:\.[\p{L}_][\p{L}0-9_]*)*$";
If each identifier should not exceed a certain length (say, 16), we should redesign the initial identifier pattern: a mandatory letter or underscore followed by [0..16-1] == {0,15} letters, digits or underscores
[A-Za-z_][A-Za-z0-9_]{0,15}
And we have
string pattern = @"^[A-Za-z_][A-Za-z0-9_]{0,15}(?:\.[A-Za-z_][A-Za-z0-9_]{0,15})*$";
^[A-Za-z]*\.[A-Za-z]*[0-9]$
or
^[A-Za-z]*\.[A-Za-z]*[0-9]+$
if you need more than one numerical character in the number suffix

Url parameter values with special characters and IE

I have a web page that is opened from another system with parameters that can contain extended ASCII characters:
http://<host>/submitpage.cshtml?pname=SomeName
The cshtml webpage reads the parameters as usual with:
var pname = Request["pname"];
and shows it on the page with @pname
Works fine for all browsers except IE (even IE11) when pname=Günther or another name with foreign characters; ü, ø and so on.
Example:
http://<host>/submitpage.cshtml?pname=Günther
results in G�nther
The webpage is using <meta charset="UTF-8" />
Any solution? I have no control over the submitting system, so the URL cannot be encoded before submit.
So between RFC3986 and RFC2234, we have the following relevant rules for URIs
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
pct-encoded = "%" HEXDIG HEXDIG
query = *( pchar / "/" / "?" )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
DIGIT = %x30-39 ; 0-9
so any implementation that accepts the unencoded letter ü is non-standards compliant and any user agent issuing requests with such characters is also non-compliant. It sounds like you know that this is the case.
IMO, it's a dangerous game making your own system more permissive to patch over the faults of "out of your control" systems. Are you sure the issuer of this request can't fix their code?
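To make the difference concrete, here is a small sketch of what a compliant sender would transmit and why the raw character can turn into a replacement character. It assumes, without confirmation from the question, that IE puts the unencoded ü on the wire as the single ISO-8859-1 byte 0xFC. QueryEncodingDemo and its methods are names made up for this illustration.

```csharp
using System;
using System.Text;

public static class QueryEncodingDemo
{
    // Percent-encodes a value as UTF-8, the form a compliant sender would put in the query.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Decodes raw bytes as UTF-8; bytes that are not valid UTF-8 become U+FFFD.
    public static string DecodeAsUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    // Decodes the same bytes as ISO-8859-1, where every byte maps to one character.
    public static string DecodeAsLatin1(byte[] raw)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }
}
```

Encode("Günther") gives "G%C3%BCnther", which survives the round trip. The bytes 47 FC 6E 74 68 65 72 read back as "Günther" through DecodeAsLatin1, but DecodeAsUtf8 cannot interpret 0xFC and substitutes the replacement character, which matches the symptom in the question.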

What's the difference between Microsoft.JScript.GlobalObject.escape and Uri.EscapeUriString

The strings my service receives from Uri.EscapeUriString and from Microsoft.JScript.GlobalObject.escape are different; when I use Microsoft.JScript.GlobalObject.escape to handle the URL, it works.
What's the difference between Microsoft.JScript.GlobalObject.escape and Uri.EscapeUriString in C#?
Although Uri.EscapeUriString is available to use in C# out of the box, it can not convert all the characters exactly the same way as JavaScript escape function does.
For example let's say the original string is: "Some String's /Hello".
Uri.EscapeUriString("Some String's /Hello")
output:
"Some%20String's%20/Hello"
Microsoft.JScript.GlobalObject.escape("Some String's /Hello")
output:
"Some%20String%27s%20/Hello"
Note how the Uri.EscapeUriString did not escape the '.
That being said, let's look at a more extreme example. Suppose we have this string "& / \ # , + ( ) $ ~ % .. ' " : * ? < > { }". Let's see what escaping this with both methods gives us.
Microsoft.JScript.GlobalObject.escape("& / \\ # , + ( ) $ ~ % .. ' \" : * ? < > { }")
output: "%26%20/%20%5C%20%23%20%2C%20+%20%28%20%29%20%24%20%7E%20%25%20..%20%27%20%22%20%3A%20*%20%3F%20%3C%20%3E%20%7B%20%7D"
Uri.EscapeUriString("& / \\ # , + ( ) $ ~ % .. ' \" : * ? < > { }")
output: "&%20/%20%5C%20#%20,%20+%20(%20)%20$%20~%20%25%20..%20'%20%22%20:%20*%20?%20%3C%20%3E%20%7B%20%7D"
Notice that Microsoft.JScript.GlobalObject.escape escaped all characters except +, /, * and ., even those that are valid in a URI. For example, ? and & were escaped even though they are valid in a query string.
So it all depends on where and when you wish to escape your URI and what type of URI you are creating/escaping.
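If Microsoft.JScript is not available, the behaviour shown above can be approximated by hand. The sketch below follows the rules of the ECMAScript escape function (letters, digits and @*_+-./ are kept, everything else becomes %XX or %uXXXX); it is an emulation written for this example, not Microsoft's implementation, and the JsEscape class name is made up.

```csharp
using System.Text;

public static class JsEscape
{
    // Characters the ECMAScript escape function leaves untouched.
    private const string Unescaped =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./";

    public static string Escape(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            if (Unescaped.IndexOf(c) >= 0)
            {
                sb.Append(c);                                     // kept as-is
            }
            else if (c < 256)
            {
                sb.Append('%').Append(((int)c).ToString("X2"));   // %XX
            }
            else
            {
                sb.Append("%u").Append(((int)c).ToString("X4"));  // %uXXXX
            }
        }
        return sb.ToString();
    }
}
```

For the two inputs used above, this emulation produces the same strings as the Microsoft.JScript.GlobalObject.escape outputs quoted in the answer.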

Uri.EscapeDataString weirdness

Why does EscapeDataString behave differently between .NET 4 and 4.5? The outputs are
.NET 4: Uri.EscapeDataString("-_.!~*'()") => "-_.!~*'()"
.NET 4.5: Uri.EscapeDataString("-_.!~*'()") => "-_.%21~%2A%27%28%29"
The documentation
By default, the EscapeDataString method converts all characters except
for RFC 2396 unreserved characters to their hexadecimal
representation. If International Resource Identifiers (IRIs) or
Internationalized Domain Name (IDN) parsing is enabled, the
EscapeDataString method converts all characters, except for RFC 3986
unreserved characters, to their hexadecimal representation. All
Unicode characters are converted to UTF-8 format before being escaped.
For reference, unreserved characters are defined as follows in RFC 2396:
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
       "(" | ")"
And in RFC 3986:
ALPHA / DIGIT / "-" / "." / "_" / "~"
The source code
It looks like whether each character of EscapeDataString is escaped is determined roughly like this
is unicode above \x7F
  ? PERCENT ENCODE
  : is a percent symbol
    ? is an escape char
      ? LEAVE ALONE
      : PERCENT ENCODE
    : is a forced character
      ? PERCENT ENCODE
      : is an unreserved character
        ? LEAVE ALONE
        : PERCENT ENCODE
It's at that final check "is an unreserved character" where the choice between RFC2396 and RFC3986 is made. The source code of the method verbatim is
internal static unsafe bool IsUnreserved(char c)
{
    if (Uri.IsAsciiLetterOrDigit(c))
    {
        return true;
    }
    if (UriParser.ShouldUseLegacyV2Quirks)
    {
        return (RFC2396UnreservedMarks.IndexOf(c) >= 0);
    }
    return (RFC3986UnreservedMarks.IndexOf(c) >= 0);
}
And that code refers to
private static readonly UriQuirksVersion s_QuirksVersion =
    (BinaryCompatibility.TargetsAtLeast_Desktop_V4_5
    // || BinaryCompatibility.TargetsAtLeast_Silverlight_V6
    // || BinaryCompatibility.TargetsAtLeast_Phone_V8_0
    ) ? UriQuirksVersion.V3 : UriQuirksVersion.V2;
internal static bool ShouldUseLegacyV2Quirks {
    get {
        return s_QuirksVersion <= UriQuirksVersion.V2;
    }
}
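The effect of that switch can be reproduced with a simplified stand-in. The sketch below is not the framework code: it ignores the percent-sign and forced-character branches and only models the final unreserved check. UnreservedDemo and its two mark constants are names chosen for this example, with the mark sets taken from the RFC excerpts above.

```csharp
using System.Text;

public static class UnreservedDemo
{
    // Mark characters that stay unescaped under each RFC (hypothetical constants).
    private const string Rfc2396Marks = "-_.!~*'()";
    private const string Rfc3986Marks = "-._~";

    // Percent-encodes every UTF-8 byte that is not an ASCII letter, digit or allowed mark.
    public static string Escape(string input, bool legacyRfc2396)
    {
        string marks = legacyRfc2396 ? Rfc2396Marks : Rfc3986Marks;
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(input))
        {
            char c = (char)b;
            bool isAsciiLetterOrDigit =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (isAsciiLetterOrDigit || marks.IndexOf(c) >= 0)
            {
                sb.Append(c);                             // unreserved: leave alone
            }
            else
            {
                sb.Append('%').Append(b.ToString("X2"));  // everything else: percent encode
            }
        }
        return sb.ToString();
    }
}
```

Calling Escape("-_.!~*'()", true) leaves the string unchanged, and calling it with false yields "-_.%21~%2A%27%28%29", the same pair of results reported for .NET 4 and 4.5.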
Confusion
It seems contradictory that the documentation says the output of EscapeDataString depends on whether IRI/IDN parsing is enabled, whereas the source code says the output is determined by the value of TargetsAtLeast_Desktop_V4_5. Could someone clear this up?
A lot of changes were made in .NET 4.5 compared to 4.0 in how system functions behave.
You can have a look at this thread:
Why does Uri.EscapeDataString return a different result on my CI server compared to my development machine?
or go directly to the following link:
http://msdn.microsoft.com/en-us/library/hh367887(v=vs.110).aspx
All of this was done with input from users around the world.
