Regular Expression C#, HTML parse - c#

I have some text extracted from an HTML page, and I need to parse it.
Text:
converter.rates =
{"3":{"USD":{"buy":27.950001,"sell":28.190001},"EUR":{"buy":32.049999,"sell":32.689999}},"8":{"RUB":{"buy":0.27,"sell":0.43},"USD":{"buy":27.799999,"sell":28.200001},"EUR":{"buy":31.700001,"sell":32.549999}},"41":{"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":31.950001,"sell":32.650002}},"46":{"RUB":{"buy":0.413,"sell":0.443},"USD":{"buy":28.0,"sell":28.25},"EUR":{"buy":31.73,"sell":32.73}},"47":{"RUB":{"buy":0.41,"sell":0.448},"USD":{"buy":27.98,"sell":28.15},"EUR":{"buy":31.889999,"sell":32.540001}},"48":{"RUB":{"buy":0.4,"sell":0.43},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.490002}},"52":{"RUB":{"buy":0.41,"sell":0.43},"USD":{"buy":27.950001,"sell":28.25},"EUR":{"buy":32.0,"sell":32.5}},"77":{"RUB":{"buy":0.38,"sell":0.43},"USD":{"buy":28.049999,"sell":28.200001},"EUR":{"buy":32.049999,"sell":32.5}},"79":{"RUB":{"buy":0.412,"sell":0.444},"USD":{"buy":27.950001,"sell":28.799999},"EUR":{"buy":31.959999,"sell":33.099998}},"80":{"RUB":{"buy":0.38,"sell":0.43},"USD":{"buy":28.030001,"sell":28.190001},"EUR":{"buy":32.0,"sell":32.450001}},"70":{"RUB":{"buy":0.39,"sell":0.42},"USD":{"buy":28.0,"sell":28.25},"EUR":{"buy":32.0,"sell":32.200001}},"1":{"RUB":{"buy":0.42658,"sell":0.42658},"USD":{"buy":28.036648,"sell":28.036648},"EUR":{"buy":32.256161,"sell":32.256161}},"4":{"RUB":{"buy":0.42,"sell":0.43},"USD":{"buy":27.950001,"sell":28.25},"EUR":{"buy":32.150002,"sell":32.599998}},"10":{"RUB":{"buy":0.414,"sell":0.435},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.0,"sell":32.599998}},"13":{"RUB":{"buy":0.275,"sell":0.46},"USD":{"buy":27.9,"sell":28.200001},"EUR":{"buy":31.67,"sell":32.599998}},"15":{"RUB":{"buy":0.3749,"sell":0.4395},"USD":{"buy":27.985001,"sell":28.2075},"EUR":{"buy":32.036366,"sell":32.529091}},"31":{"RUB":{"buy":0.275,"sell":0.42},"USD":{"buy":27.9,"sell":28.139999},"EUR":{"buy":31.799999,"sell":32.400002}},"32":{"RUB":{"buy":0.42,"sell":0.5},"USD":{"buy":28.07,"sell":28.299999},"EUR":{"buy":32.150002,"sell":32.5999
98}},"39":{"USD":{"buy":28.07,"sell":28.25},"EUR":{"buy":32.150002,"sell":32.549999}},"40":{"RUB":{"buy":0.41,"sell":0.43},"USD":{"buy":27.950001,"sell":28.139999},"EUR":{"buy":32.049999,"sell":32.400002}},"64":{"RUB":{"buy":0.4,"sell":0.425},"USD":{"buy":27.9,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.599998}},"73":{"RUB":{"buy":0.4,"sell":0.43},"USD":{"buy":28.0,"sell":28.299999},"EUR":{"buy":32.0,"sell":32.549999}},"74":{"RUB":{"buy":0.41,"sell":0.435},"USD":{"buy":28.049999,"sell":28.25},"EUR":{"buy":31.799999,"sell":32.5}},"85":{"RUB":{"buy":0.3,"sell":0.43},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.52}},"86":{"RUB":{"buy":0.37,"sell":0.42},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.0,"sell":32.799999}},"88":{"RUB":{"buy":0.35,"sell":0.5},"USD":{"buy":28.0,"sell":28.15},"EUR":{"buy":32.099998,"sell":32.450001}},"90":{"RUB":{"buy":4.0,"sell":4.4},"USD":{"buy":28.0,"sell":28.15},"EUR":{"buy":31.950001,"sell":32.450001}}}
I need the following information from it:
the bank code - "3"
and the USD rates - 27.950001, 28.190001
My expression:
@"(\d+)":..USD....\w+..(\d+.\d+)........(\d+.\d+)"
But it didn't work, because USD does not always come first after the bank code.

This is a JSON document. JSON is a recursive format, and regular expressions are notoriously poor at parsing recursive data.
Use a dedicated parser instead, such as Newtonsoft Json.NET:
var rawData = @"converter.rates = { ... }"; // original string
var rawJson = rawData.Substring("converter.rates = ".Length); // remove the prefix
var json = JObject.Parse(rawJson); // convert to a JSON data structure
Then you can use it like a dictionary:
// requires: using Newtonsoft.Json.Linq;
foreach (var codeEntry in json)
{
    // cast to JObject so that each entry is a key/value pair
    foreach (var currencyEntry in (JObject)codeEntry.Value)
    {
        var code = codeEntry.Key;
        var currency = currencyEntry.Key;
        var buy = currencyEntry.Value["buy"].Value<double>();
        var sell = currencyEntry.Value["sell"].Value<double>();
        Console.WriteLine($"code of bank - {code} and {currency} rate - {buy}, {sell} ");
    }
}
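Since the question only asks for the USD rates, the loop above can be narrowed down. The following is a minimal sketch, not the answerer's code: it swaps in the built-in System.Text.Json parser instead of Newtonsoft, and the class name UsdRates and method name Extract are made up for illustration. It assumes the text contains nothing after the closing brace of the JSON object.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class UsdRates
{
    // Returns (bank code, buy, sell) for every bank that quotes USD.
    // Hypothetical helper; the name and return shape are assumptions.
    public static List<(string Code, double Buy, double Sell)> Extract(string rawData)
    {
        // Drop everything before the first '{' (the "converter.rates =" prefix).
        string rawJson = rawData.Substring(rawData.IndexOf('{'));
        var result = new List<(string Code, double Buy, double Sell)>();

        using (JsonDocument doc = JsonDocument.Parse(rawJson))
        {
            foreach (JsonProperty bank in doc.RootElement.EnumerateObject())
            {
                // Skip banks without a USD entry instead of throwing.
                if (bank.Value.TryGetProperty("USD", out JsonElement usd))
                {
                    result.Add((bank.Name,
                                usd.GetProperty("buy").GetDouble(),
                                usd.GetProperty("sell").GetDouble()));
                }
            }
        }
        return result;
    }
}
```

Because the lookup goes by property name, it does not matter whether USD comes first, second or last inside a bank entry, which is exactly where the regex in the question broke down.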

If you still want to use a regex, this one works:
@"""(?<code>\d+)"":\{.*?(?<=""USD""):\{""buy"":(?<buy>\d+\.\d+),""sell"":(?<sell>\d+\.\d+)\}"
It is built from your example. It creates three named groups, 'code', 'buy' and 'sell'. Apart from that it matches literal characters, using only a lookbehind, '(?<=""USD"")', to locate 'USD' and capture the wanted rates.
Edit:
If you have an HTML document and want to grab the 'converter.rates' variable as text, you can use this regex:
@"converter\.rates\s?=.*\}\}\}"
It looks for the three '}' characters that end the string.

Related

Regular Expressions C#: Clean bad Json string

I get an answer from the server in the form of such JSON:
var zohozoho_atliview92 = {\"Itinerary\":[
{\"Client_Email\":\"garymc\",
\"Client_Name\":\"Gary\",
\"NT_Number\":\"NT-1237\",\"Number_of_Nights\":7,
\"ID\":\"24297940\",
\"Itinerary_Name\":\"Icelandnights\",
\"Tour_Template_Name\":\"Iceland FireDrive\",
\"Departure_Date\":\"2018-07-04\"}
]};
I need to remove the leading var zohozoho_atliview92 = {\"Itinerary\":[ and the last three characters, ]};, so that I can deserialize the rest into my object.
How can I do this with regular expressions? Or is there a better variant?
"is there a better variant?"
Yes, you can parse your escaped JSON string into a JObject.
You can then access any key/value pair in the JSON by querying it with LINQ to JSON.
Or you can map the JObject to your custom type with var result = jObject.ToObject<T>();
class Program
{
static void Main(string[] args)
{
var zohozoho_atliview92 = "{\"Itinerary\":[ {\"Client_Email\":\"garymc\", \"Client_Name\":\"Gary\", \"NT_Number\":\"NT-1237\",\"Number_of_Nights\":7, \"ID\":\"24297940\", \"Itinerary_Name\":\"Icelandnights\", \"Tour_Template_Name\":\"Iceland FireDrive\", \"Departure_Date\":\"2018-07-04\"}]}";
JObject jObject = JObject.Parse(zohozoho_atliview92);
Console.WriteLine(jObject);
Console.ReadLine();
}
}
This is not JSON, it's JavaScript (in which the object literal happens to be JSON).
Regular expressions are comparatively slow here; I would advise using Substring instead:
var start = inputString.IndexOf("[");
var end = inputString.LastIndexOf("]");
var json = inputString.Substring(start, end - start + 1); // includes both brackets
Watch for off-by-one errors; test and adjust the offsets to match what you need.
Hardcoding start would be even faster, but more fragile.
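The Substring idea can be wrapped in a small helper that also copes with input where the brackets are missing. This is only a sketch under assumptions: the names PayloadExtractor and InnerOfArray are invented for illustration, and it returns the text strictly between the first '[' and the last ']', which is the part the question wants to deserialize.

```csharp
using System;

public static class PayloadExtractor
{
    // Returns the text strictly between the first '[' and the last ']',
    // or null when the brackets are missing or out of order.
    // Hypothetical helper, not part of any library.
    public static string InnerOfArray(string input)
    {
        int start = input.IndexOf('[');
        int end = input.LastIndexOf(']');
        if (start < 0 || end <= start)
        {
            return null;
        }
        // start + 1 skips the '[' itself; the length excludes the ']'.
        return input.Substring(start + 1, end - start - 1).Trim();
    }
}
```

Returning null rather than throwing leaves it to the caller to decide how to treat a malformed response.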

Is there any way to "substitute" numbers in string C#?

I have html code, which I need to parse on the fly. I need to find exact divs there, which all have id of "content-text-" and then 6 numbers (like "content-text-123456"), which I don't know beforehand. Is there any way to "substitute" the numbers at the end of the string I'm searching for (like "content-text-######")? Searching for "content-text-" does not work.
I'm doing this project on Windows Phone 8.1 with C# if it matters.
EDIT:
WPPageResponse response = JsonConvert.DeserializeObject<WPPageResponse>(json);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(response.content);
foreach (var node in doc.DocumentNode.Descendants("div").Where(div => div.GetAttributeValue("id", "") == "content-text-######"))
{
// Gather data what it returns
}
Here is some code if it helps. It works if I know the numbers and search with them, but the thing is that I can't know all the numbers there.
You can use Regex for this.
string data = "MyTest = 5564327";
string output = Regex.Replace(data, @"\d", "#"); // requires: using System.Text.RegularExpressions;
Console.WriteLine(output);
Console.Read();
Output is:
MyTest = #######
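The same Regex class can also test whether an id has the shape "content-text-" plus six digits, which is closer to what the question asks. A minimal sketch, assuming the ids carry exactly six ASCII digits and nothing else; ContentIdMatcher and IsContentId are names made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class ContentIdMatcher
{
    // "content-text-" followed by exactly six digits, with nothing before or after.
    // \z anchors at the very end of the input (unlike $, it ignores a trailing newline).
    private static readonly Regex Pattern = new Regex(@"^content-text-[0-9]{6}\z");

    // Hypothetical helper: true when the id matches the expected shape.
    public static bool IsContentId(string id)
    {
        return id != null && Pattern.IsMatch(id);
    }
}
```

In the question's loop, the comparison inside Where could then become div => ContentIdMatcher.IsContentId(div.GetAttributeValue("id", "")).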

Regex to replace JSON structure

I have JSON text like this :
...
"simples":{
"AS100ELABQVKANID-91057":{
"meta":{
"sku":"AS100ELABQVKANID-91057",
"price":"3669000.00",
"original_price":"3379000.00",
"special_to_date":"2015-03-19 23:59:59",
"shipment_type":"1",
"special_price":"3299000.00",
"tax_percent":"10.00",
"sourceability":"Sourceable",
"quantity":"15",
"variation":"...",
"package_type_position":"0",
"min_delivery_time":"1",
"max_delivery_time":"3",
"attribute_set_name":"electronics",
"3hours_shipment_available":false,
"estimated_delivery":"",
"estimated_delivery_position":""
},
"attributes":{
"package_type":"Parcel"
}
}
},
"description":
...
The above text appears repeatedly in my JSON text. I am trying to turn every occurrence into this:
"simples":[],"description"
So far, I have written this regex:
\"simples\":{(?:.*v(?:|=)|(?:.*)?)},\"description\"
But it removes everything from the first "simples" to the last "description".
Regex newbie here.
Thanks in advance
I recommend parsing the JSON, replacing the value, then re-stringifying it (shown here in JavaScript):
var obj = JSON.parse(json);
obj.simples = [];
json = JSON.stringify(obj);
Using a regexp for this is pure insanity
Don't use Regex to parse JSON; use a JSON parser.
Here is how you can do this using JSON.Net, assuming the object containing simples is part of a flat list of results:
JArray array = JArray.Parse(json);
foreach (JObject obj in array)
{
obj["simples"].Parent.Remove();
}
json = array.ToString();
If your JSON is more complicated (i.e. "simples" can appear at more than one level in the JSON), then you will need a recursive search to find and remove it. This answer has a helper method that can find a specific property by name anywhere in the JSON and return a list of all occurrences. Once you have the list of occurrences, you can then loop through them and remove them as shown above.
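The recursive search described above can be sketched as follows. This is not the helper from the linked answer: it swaps in System.Text.Json.Nodes (available from .NET 6) instead of Json.NET, and it replaces each "simples" value with an empty array, as the question asks, rather than removing the property. SimplesCleaner, ClearSimples and Clean are hypothetical names.

```csharp
using System.Linq;
using System.Text.Json.Nodes;

public static class SimplesCleaner
{
    // Walks the tree and replaces the value of every "simples" property with [].
    public static void ClearSimples(JsonNode node)
    {
        if (node is JsonObject obj)
        {
            // ToList() takes a snapshot so the object can be modified while looping.
            foreach (var pair in obj.ToList())
            {
                if (pair.Key == "simples")
                {
                    obj["simples"] = new JsonArray();
                }
                else
                {
                    ClearSimples(pair.Value);
                }
            }
        }
        else if (node is JsonArray arr)
        {
            // Only the children are modified, never the array itself.
            foreach (JsonNode item in arr)
            {
                ClearSimples(item);
            }
        }
        // Plain values (strings, numbers, null) need no work.
    }

    // Convenience wrapper: JSON text in, cleaned JSON text out.
    public static string Clean(string json)
    {
        JsonNode root = JsonNode.Parse(json);
        ClearSimples(root);
        return root.ToJsonString();
    }
}
```

Working on the parsed tree avoids the greedy-match problem from the question, because each "simples" value is replaced as a whole node regardless of how deeply it nests.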

Using Regexp to get information in a KeyValuePair

Help me to parse this message:
text=&direction=re&orfo=rus&files_id=&message=48l16qL2&old_charset=utf-8&template_id=&HTMLMessage=1&draft_msg=&re_msg=&fwd_msg=&RealName=0&To=john+%3Cjohn11%40gmail.com%3E&CC=&BCC=&Subject=TestSubject&Body=%3Cp%3EHello+%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82+%D1%82%D0%B5%D0%BA%D1%81%D1%82%3Cbr%3E%3Cbr%3E%3C%2Fp%3E&secur
I would like to get information in an KeyValuePair:
Key - Value
text -
direction - re
and so on.
And how to convert this: Hello+%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82+%D1%82%D0%B5%D0%BA%D1%81%...
there are cyrillic character.
Thanks.
If you want to use a Regex, you can do it like this:
// I only added the first 3 keys, but the others follow the same pattern
// [^&]* stops at the next '&', so a value never swallows the keys that follow it
Regex r = new Regex(@"text=(?<text>[^&]*)&direction=(?<direction>[^&]*)&orfo=(?<orfo>[^&]*)");
Match m = r.Match(inputText);
if (m.Success)
{
    var text = m.Groups["text"].Value; // result is ""
    var direction = m.Groups["direction"].Value; // re
    var orfo = m.Groups["orfo"].Value; // rus
}
However, the method suggested by BoltClock is much better:
System.Collections.Specialized.NameValueCollection collection =
System.Web.HttpUtility.ParseQueryString(inputString);
It looks like you are dealing with a URI, so it is better to use the proper class than to work out the detailed processing yourself.
http://msdn.microsoft.com/en-us/library/system.uri.aspx
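To show how ParseQueryString covers both parts of the question (key/value pairs and the Cyrillic text), here is a minimal sketch. It assumes System.Web.HttpUtility is available on the target platform and that the values are percent-encoded UTF-8; FormBodyParser and Parse are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Web;

public static class FormBodyParser
{
    // Splits "a=1&b=2" style text into key/value pairs.
    // ParseQueryString also decodes '+' and %XX escapes (UTF-8 by default).
    public static Dictionary<string, string> Parse(string body)
    {
        NameValueCollection parsed = HttpUtility.ParseQueryString(body);
        var result = new Dictionary<string, string>();
        foreach (string key in parsed.AllKeys)
        {
            // A segment without '=' is stored under a null key; skip it here.
            if (key != null)
            {
                result[key] = parsed[key];
            }
        }
        return result;
    }
}
```

Because the decoding happens inside ParseQueryString, the Body value comes back as readable text and no separate conversion step is needed for the Cyrillic characters.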

Best way to deserialize a long string (response of an external web service)

I am querying a web service that was built by another developer. It returns a result set in a JSON-like format. I get three column values (I already know what the ordinal position of each column means):
[["Boston","142","JJK"],["Miami","111","QLA"],["Sacramento","042","PPT"]]
In reality, this result set can be thousands of records long.
What's the best way to parse this string?
I guess a JSON deserializer would be nice, but what is a good one to use in C#/.NET? I'm pretty sure the System.Runtime.Serialization.Json serializer won't work.
Using the built-in libraries for asp.net (System.Runtime.Serialization and System.ServiceModel.Web), you can get what you want pretty easily:
string[][] parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
// UTF-8 is the safe choice here; Encoding.Default depends on the machine's code page on .NET Framework
using (var ms = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(jsonStr)))
{
    var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(string[][]));
    parsed = serializer.ReadObject(ms) as string[][];
}
A little more complex example (which was my original answer)
First make a dummy class to use for serialization. It just needs one member to hold the result which should be of type string[][].
[DataContract]
public class Result
{
[DataMember(Name="d")]
public string[][] d { get; set; }
}
Then it's as simple as wrapping your result like so: { "d": /* your results */ }. See below for an example:
Result parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
// requires: using System.IO; using System.Text;
using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(string.Format(@"{{ ""d"": {0} }}", jsonStr))))
{
    var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(Result));
    parsed = serializer.ReadObject(ms) as Result;
}
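On newer runtimes the same array-of-arrays can be read without a stream or a wrapper class. This is a sketch that swaps in System.Text.Json (built into .NET Core 3.0 and later) instead of DataContractJsonSerializer; CityRows and Parse are hypothetical names, and the column order is assumed to be the one shown in the question.

```csharp
using System.Text.Json;

public static class CityRows
{
    // Deserializes [["Boston","142","JJK"], ...] into a jagged string array.
    // Each inner array is one record; columns are addressed by position.
    public static string[][] Parse(string json)
    {
        return JsonSerializer.Deserialize<string[][]>(json);
    }
}
```

A record's columns are then read by ordinal, for example rows[0][0] for the first city name.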
How about this?
It sounds like you have a pretty simple format that you could write a custom parser for, since you may not want to wait for the entire response to be parsed before you start using it.
I would just write a recursive parser that looks for the tokens "[", ",", "\"", and "]" and handles each one appropriately.
