Remove strange hidden characters from my JSON before deserializing - c#

I am being sent some JSON that fails to deserialize. It seems to contain a black diamond with a ? in it. I cannot see the character in the raw text, but it is obviously there, and it makes deserialization fail on my system.
How do I get rid of it and still leave my JSON intact for deserialization?
UPDATE:
Here is an example of what will be in the middle of my JSON:
"UDF5" : "�65",
I am even open to just removing this property from my JSON altogether via RegEx.

As answered for "remove piece of string (JSON string) with regex", and based on the formatting you provided in that question (which I assume you will edit into this one):
Assuming I can rely on the formatting you show above, and that there is only one such field per regex run, this can be accomplished with something as simple as
([\S\s]*\"])\"UDF5\" : \"[\S\s]*?\",([\S\s]*)
Use the replacement $1$2, which refers to the parts before and after the UDF5 field, to write the result back out.
If there is a trailing newline to remove, this does not handle it yet. This could be better, so if someone else has time to correct it or provide an additional answer, please do. But in the interest of getting you an emergency fix, I hope this helps.
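If the goal is only to drop the unreadable character rather than the whole property, a minimal sketch along these lines may be enough. It assumes the black diamond is U+FFFD (the Unicode replacement character, which typically appears when bytes were decoded with the wrong encoding). JsonCleaner and StripUndecodable are hypothetical names, not part of any library.

```csharp
using System;
using System.Linq;

public static class JsonCleaner
{
    // Hypothetical helper: drops U+FFFD and control characters,
    // keeping tab, CR and LF, which JSON allows as whitespace.
    public static string StripUndecodable(string json)
    {
        return new string(json
            .Where(c => c != '\uFFFD'
                        && (!char.IsControl(c) || c == '\t' || c == '\r' || c == '\n'))
            .ToArray());
    }
}
```

If the sender's actual encoding is known, decoding the incoming bytes with that encoding is likely a better fix than deleting characters after the fact.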

Related

How to use AntiXss with a Web API

This is a question that has been asked before, but I've not found the information I'm looking for or maybe I'm just missing the point so please bear with me. I can always adjust my question if I'm asking it the wrong way.
For example, I have a POST endpoint that takes a simple DTO object with 2 properties (i.e. companyRequestDto), and the posted data contains a script tag in one of its properties. When I call my endpoint from Postman I use the following:
{
"company": "My Company<script>alert(1);</script>",
"description": "This is a description"
}
When it is received by the action in my endpoint,
public void Post(CompanyRequestDto companyRequestDto)
my DTO object is automatically populated and its properties are set to:
companyRequestDto.Company = "My Company<script>alert(1);</script>";
companyRequestDto.Description = "This is a description";
I clearly don't want this information to be stored in our database as is, nor do I want it stored as an escaped string as displayed above.
1) Request: So my first question is how do I throw an error if the posted DTO contains some invalid content such as the <script> tag?
I've looked at Microsoft AntiXss, but I don't understand how to use it here: the data in the properties of a DTO object is not an HTML string but just a plain string, so I don't see how it helps with sanitizing or validating the passed data. What am I missing?
When I call
var test = AntiXss.AntiXssEncoder.HtmlEncode(companyRequestDto.Company, true);
It returns an encoded string, but then what??
Is there a way to remove disallowed keywords or just simply throw an error?
2) Response: Assuming 1) was not implemented or didn't work properly and the value ended up being stored in our database, am I supposed to return encoded data in the JSON? So instead of returning:
"My Company<script>alert(1)</script>"
am I supposed to return:
"My Company&lt;script&gt;alert(1)&lt;/script&gt;"
Is the browser (or whatever app) then just supposed to display it as below?
"My Company&lt;script&gt;alert(1)&lt;/script&gt;"
3) Code: Assuming there is a way to sanitize or throw an error, should I apply it at the property level, using an attribute on all the properties of my various DTO objects, or is there a way to apply it at the class level, using an attribute that will validate and/or sanitize all string properties of a DTO object?
I found some interesting articles, but none really answers my question, or I'm having other problems with some of the answers:
asp.net mvc What is the difference between AntiXss.HtmlEncode and HttpUtility.HtmlEncode?
Stopping XSS when using WebAPI (currently looking into this one but don't see how example is solving problem as property is always failing whether I use the script tag or not)
how to sanitize input data in web api using anti xss attack (also looking at this one but having a problem calling ReadFromStreamAsync from my project at work. Might be down to some of the settings in my web.config but haven't figured out why but it always seems to return an empty string)
Thanks.
UPDATE 1:
I've just finished going through the answer from Stopping XSS when using WebAPI
This is probably the closest one to what I am looking for. Except I don't want to encode the data, as I don't want to store it in my database, so I'll see if I can figure out how to throw an error but I'm not sure what the condition will be. Maybe I should just look for characters such as <, >, ; , etc... as these will not likely be used in any of our fields.
You need to consider where your data will be used when you think about encoding. Data with <script> in it is only a problem if it's rendered as HTML, so if you are going to display user-provided data anywhere, the point of display is probably where you want to HTML-encode it (you want to avoid repeatedly HTML-encoding the same string when saving it, for example).
Again, it depends on what the response is going to be used for. You probably want to HTML-encode it at the point it's going to be displayed. Remember that if you encode something in the response, it may no longer match what's in the data, so calling code that does something like calling your API to search for a company with that name could run into problems. If the browser does display the HTML-encoded version, it might look ugly, but that's better than users being compromised by XSS attacks.
It's quite difficult to sanitize text for things like <script> tags if you allow most characters for normal use. It's easier if you can whitelist the allowed characters and only allow, say, alphanumerics, but that isn't often possible. When it is, it can be done using a regex validation attribute on the DTO object. The best approach, I think, is to encode values for display if you can't block certain characters. It's really difficult to allow all characters but still avoid things like <script>, as people can start using ASCII characters etc.
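The whitelist idea could be sketched roughly as below, using the standard RegularExpression data annotation. The allowed character set and the DtoValidator helper are assumptions made for illustration; the real set depends on what your fields legitimately contain. In Web API, the same attributes should surface through ModelState.IsValid, so the action can return a 400 instead of storing the value.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class CompanyRequestDto
{
    // Assumed whitelist: letters, digits, space and a little punctuation.
    // '<', '>' and ';' are rejected because they are simply not listed.
    [Required]
    [RegularExpression(@"^[A-Za-z0-9 .,'-]+$", ErrorMessage = "Company contains disallowed characters.")]
    public string Company { get; set; }

    [RegularExpression(@"^[A-Za-z0-9 .,'-]+$", ErrorMessage = "Description contains disallowed characters.")]
    public string Description { get; set; }
}

public static class DtoValidator
{
    // Hypothetical helper: runs all data-annotation checks on the DTO.
    public static bool IsValid(object dto, out List<ValidationResult> errors)
    {
        errors = new List<ValidationResult>();
        return Validator.TryValidateObject(
            dto, new ValidationContext(dto), errors, validateAllProperties: true);
    }
}
```

This rejects the request rather than rewriting it, which matches the wish to throw an error instead of storing encoded text.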

Deserializing JSON with Newtonsoft with quotes in a value

I am retrieving JSON from an API and have the following problem:
Some JSON values look like this and cannot be deserialized the standard way:
"key": "This is just a "dummy" value to show the problem",
The problem is the quotes around dummy. Newtonsoft obviously thinks the value ends with the quote before dummy, but it actually ends after problem.
Is there a way to ignore those quotes or somehow remove them automatically?
I've tried to remove them with a StringBuilder and String.Replace, but that didn't work because such a pattern occurs multiple times in the JSON file, and sometimes the nested quotes enclose a single word, sometimes a whole sentence.
The whole JSON from the API has around 50.000 lines, so it's impossible to correct the error by hand.
Can this be solved somehow in C#?
Update: Since this is clearly not valid JSON, you have to write a custom parser. What you have to do is fix the serialized text before you deserialize it: iterate through the entire string and escape or remove the unwanted quotes.
One rule you can use: when a JSON value ends and the next property begins, the closing quote is followed by a comma, so a quote that is not followed by a comma (or a closing brace) is part of the value.
It's basically a huge nested if condition to fix this.
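A rough sketch of that pass, under the assumption that a quote inside a string only really closes it when the next non-whitespace character is a colon, comma, closing brace, closing bracket, or the end of the input. QuoteFixer is a hypothetical name. This is a heuristic, not a parser: a value that itself contains a quote followed by a comma will still be misread.

```csharp
using System.Text;

public static class QuoteFixer
{
    // Escapes quotes that appear inside a string value but do not close it.
    public static string EscapeInnerQuotes(string json)
    {
        var sb = new StringBuilder(json.Length + 16);
        bool inString = false;
        for (int i = 0; i < json.Length; i++)
        {
            char c = json[i];
            if (c == '\\' && inString && i + 1 < json.Length)
            {
                // Copy an existing escape sequence through untouched.
                sb.Append(c).Append(json[++i]);
                continue;
            }
            if (c != '"') { sb.Append(c); continue; }
            if (!inString) { inString = true; sb.Append(c); continue; }

            // Inside a string: peek at the next non-whitespace character.
            int j = i + 1;
            while (j < json.Length && char.IsWhiteSpace(json[j])) j++;
            bool closes = j >= json.Length
                || json[j] == ':' || json[j] == ',' || json[j] == '}' || json[j] == ']';
            if (closes) { inString = false; sb.Append(c); }
            else sb.Append("\\\"");
        }
        return sb.ToString();
    }
}
```

The repaired text can then be handed to JsonConvert.DeserializeObject as usual. Getting the API owner to emit valid JSON remains the more reliable fix.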
Original Answer
As you can see, it does not parse as valid JSON. Quotes inside a value have to be escaped with a backslash (\"dummy\"). If that's not something in your control, you have to come up with a custom parser.

How to convert JSON format plain text to simple plain text

I have a plain-text string that contains braces in JSON format, as it is created using the JavaScriptSerializer().Serialize() method. I need to remove the braces and colons and convert it into key=value,key=value format.
I need to convert
{
"account":"rf750",
"type":null,
"amount":"31",
"auth_type":"5",
"balance":"2.95",
"card":"re0724"
}
to
'account=rf750,type=null,amount=31,auth_type=5,balance=2.95,card=re0724'
Well, you've got three different things going on here.
The first, and surface issue, is: how do you change the string?
Simple - you do some string substitutions, preferably using Regex. Remove the starting/ending braces, change [a]:"[b]", to [a]=[b], - or however you want the final format to look like.
The second, and slightly deeper issue is: JSON isn't just a simple list of keys=values. You can have nesting. You can have non-string data. Simply saying you want to change the JSON result to key=value,key=value,key=value, etc - is fragile. How do you know the JSON structure will be what you're expecting? JSON Serialization will serialize successfully even if you've got nested structures, non string/int data, etc. And if you want solid code that doesn't easily break, you have to figure out: how do I handle this? Can I handle this?
The third, and final thing is: you're taking a standard data format schema and figuring out how to translate it to a nonstandard data format. 90% of the time someone does that, they deserve to be shot. Seriously, spend some solid time asking yourself whether you can use the JSON as-is, and whether the process wanting key=value,key=value,etc can be changed to use an actual standardized data format.
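To make the second point concrete, here is a minimal sketch that refuses nested input instead of silently producing a broken string. It swaps in System.Text.Json rather than the serializer from the question, and FlatJson and ToKeyValueList are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class FlatJson
{
    // Converts a flat JSON object to "key=value,key=value".
    // Throws if any property holds a nested object or array.
    public static string ToKeyValueList(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (doc.RootElement.ValueKind != JsonValueKind.Object)
                throw new FormatException("Expected a JSON object at the top level.");

            var parts = new List<string>();
            foreach (JsonProperty p in doc.RootElement.EnumerateObject())
            {
                switch (p.Value.ValueKind)
                {
                    case JsonValueKind.Object:
                    case JsonValueKind.Array:
                        throw new FormatException($"Property '{p.Name}' is nested and cannot be flattened.");
                    case JsonValueKind.String:
                        parts.Add($"{p.Name}={p.Value.GetString()}");
                        break;
                    default:
                        // Numbers, true, false and null keep their raw JSON text.
                        parts.Add($"{p.Name}={p.Value.GetRawText()}");
                        break;
                }
            }
            return string.Join(",", parts);
        }
    }
}
```

Values containing a comma or an equals sign would still make the output ambiguous, which is part of why a standard format is the safer choice.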
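Here is a simple solution which (1) parses the JSON into a Dictionary and (2) uses String.Join and LINQ Select to produce the desired output:
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
..
// Assumes a flat JSON object whose values all convert to strings.
var dict = JsonConvert.DeserializeObject<Dictionary<string, string>>(json);
// A null value is rendered as an empty string by the interpolation.
var str = string.Join(",", dict.Select(r => $"{r.Key}={r.Value}"));
The str variable now contains: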
account=rf750,type=,amount=31,auth_type=5,balance=2.95,card=re0724
Thanks, everyone, for your time and responses. Your answers led me towards a solution, and I finally found the following, which resolved the issue perfectly:
var jObj = (JObject)JsonConvert.DeserializeObject(modelString);
modelString = String.Join("&", jObj.Children().Cast<JProperty>().Select(jp => jp.Name + "=" + HttpUtility.UrlEncode(jp.Value.ToString())));
The above code converts the JSON into a URL-encoded string and removes the JSON formatting.

HttpRequestHeaders.Add splits the header on a whitespace

I'm trying to add a new header to a request I already had (which worked before), in which I want to put some sort of User-Agent string formatted like this:
AppName/AppVersion (DeviceOS DeviceOSVersion)
The code for it is written like this (request is a HttpRequestMessage):
request.Headers.Add(UserAgentKey, $"{AppName}/{DependencyService.Get<IVersionProperties>().GetAppVersion()} ({Device.RuntimePlatform} {DependencyService.Get<IVersionProperties>().GetOSVersion()})");
But weirdly enough, it splits the string in two parts on the whitespace (between the app version and the opening parenthesis), resulting in 2 values for the User-Agent header instead of 1 unified whole.
So I'm curious what I'm doing wrong here. I think it has something to do with the whitespace, and I might need to escape it somehow, but I'm not sure how. I hope someone can help me with this issue.
Thanks in advance.
Maybe not a full-on solution, but at least a workaround: why not compose the string first?
var userAgent = $"{AppName}/{DependencyService.Get<IVersionProperties>().GetAppVersion()} ({Device.RuntimePlatform} {DependencyService.Get<IVersionProperties>().GetOSVersion()})";
And then take out the newlines:
userAgent = userAgent.Replace(Environment.NewLine, " ");
As for the cause, I would guess that one of these values has a newline in it, although I don't really see why or which. Did you inspect each of the values individually?
Apparently it had to do with the header I was using.
I used the header "User-Agent", which expects a certain format and has some other funny business attached to it. When I changed it to "User-Agentt", for example, it worked just fine, and since I don't explicitly need the header to be called that, I will just change the name of the header.
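For what it's worth, the split is likely by design: HttpRequestHeaders treats User-Agent as a list of product tokens and parenthesized comments, so a string like "AppName/1.0 (iOS 12)" is parsed into two values that are sent separated by a space. If keeping the standard header name matters, a sketch like the following builds the same value explicitly. UserAgentExample, the parameter names and the example.com URL are placeholders, and it assumes the app name and version contain no spaces.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;

public static class UserAgentExample
{
    public static HttpRequestMessage Build(string appName, string appVersion, string os, string osVersion)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/");

        // Product token: "AppName/AppVersion".
        request.Headers.UserAgent.Add(new ProductInfoHeaderValue(appName, appVersion));

        // Comment: must be wrapped in parentheses.
        request.Headers.UserAgent.Add(new ProductInfoHeaderValue($"({os} {osVersion})"));

        return request;
    }
}
```

Another option is request.Headers.TryAddWithoutValidation("User-Agent", value), which stores the string without parsing it.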

Parsing Log using something else than string split c#

I'm pretty sure it has been asked before, but I could not find anything good.
I'm trying to parse a log but having trouble with it.
At first it looked pretty easy because the log is built like this:
thing,thing,thing,thing
so I split the string on the ,
However, a , can also appear inside a value itself, and this is where I did not know what to do anymore.
How would I successfully parse this kind of log?
Edit~~
Here is a log example:
1326139200953,info,,0,"str value which may contain, ",,,0
1326139201109,info,,0,"str value which may contain, ",,,0
1326139201265,info,,0,"str value which may contain, ",,,0
1326139201999,start,,0,,,,0
1326139368296,new,F:\Dir\Dir\file.txt,1536,,0,,0
If your log file doesn't have field encapsulators, the fields have variable width, and the separator/delimiter can also appear in a field, then it's likely you can't program something that will work in all cases.
Can you supply an example of your log file data? It may be possible to match the parts you need with a regex.
Unfortunately I think your question is not answerable in its current state, please provide more info.
Edit: Thanks for updating the question, you do have field encapsulators (double quotes). This will make it easier!
I think there are many ways to do this. Personally, I think I would carry on splitting on commas, but then loop over the resulting array, checking whether the first character of any value is a double quote. If it is, you need to join it to the array item after it. If the last character of the joined item isn't a double quote, you need to continue joining until you've closed your opening double quote.
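The split-and-rejoin idea could be sketched like this. LogLineParser and SplitLine are hypothetical names, and the sketch assumes quoted fields never contain doubled quotes ("") as an escape.

```csharp
using System;
using System.Collections.Generic;

public static class LogLineParser
{
    // Splits on commas, then glues back together the pieces of a quoted field.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        string pending = null; // a quoted field that has not been closed yet

        foreach (string part in line.Split(','))
        {
            if (pending != null)
            {
                pending += "," + part;
                if (part.EndsWith("\"", StringComparison.Ordinal))
                {
                    fields.Add(pending.Trim('"'));
                    pending = null;
                }
            }
            else if (part.StartsWith("\"", StringComparison.Ordinal)
                     && (part.Length == 1 || !part.EndsWith("\"", StringComparison.Ordinal)))
            {
                // Opening quote without a closing one: keep joining.
                pending = part;
            }
            else
            {
                fields.Add(part.Trim('"'));
            }
        }

        // An unterminated quote is kept as-is rather than dropped.
        if (pending != null) fields.Add(pending);
        return fields;
    }
}
```

For the sample lines in the question this should yield eight fields per line, with the quoted text kept together as one field.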
There's certainly a better way so you may wish to wait for another solution.
Edit 2: Give this a go and let me know how you get on:
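// Split on commas preceded by an even number of double quotes, i.e. commas outside a quoted field.
string myRegex = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*),";
string[] outputArray = Regex.Split(myStr, myRegex);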
