Regular Expression C#, HTML parse - c#
Please help.
I have some text taken from an HTML page, and I need to parse it.
Text:
converter.rates =
{"3":{"USD":{"buy":27.950001,"sell":28.190001},"EUR":{"buy":32.049999,"sell":32.689999}},"8":{"RUB":{"buy":0.27,"sell":0.43},"USD":{"buy":27.799999,"sell":28.200001},"EUR":{"buy":31.700001,"sell":32.549999}},"41":{"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":31.950001,"sell":32.650002}},"46":{"RUB":{"buy":0.413,"sell":0.443},"USD":{"buy":28.0,"sell":28.25},"EUR":{"buy":31.73,"sell":32.73}},"47":{"RUB":{"buy":0.41,"sell":0.448},"USD":{"buy":27.98,"sell":28.15},"EUR":{"buy":31.889999,"sell":32.540001}},"48":{"RUB":{"buy":0.4,"sell":0.43},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.490002}},"52":{"RUB":{"buy":0.41,"sell":0.43},"USD":{"buy":27.950001,"sell":28.25},"EUR":{"buy":32.0,"sell":32.5}},"77":{"RUB":{"buy":0.38,"sell":0.43},"USD":{"buy":28.049999,"sell":28.200001},"EUR":{"buy":32.049999,"sell":32.5}},"79":{"RUB":{"buy":0.412,"sell":0.444},"USD":{"buy":27.950001,"sell":28.799999},"EUR":{"buy":31.959999,"sell":33.099998}},"80":{"RUB":{"buy":0.38,"sell":0.43},"USD":{"buy":28.030001,"sell":28.190001},"EUR":{"buy":32.0,"sell":32.450001}},"70":{"RUB":{"buy":0.39,"sell":0.42},"USD":{"buy":28.0,"sell":28.25},"EUR":{"buy":32.0,"sell":32.200001}},"1":{"RUB":{"buy":0.42658,"sell":0.42658},"USD":{"buy":28.036648,"sell":28.036648},"EUR":{"buy":32.256161,"sell":32.256161}},"4":{"RUB":{"buy":0.42,"sell":0.43},"USD":{"buy":27.950001,"sell":28.25},"EUR":{"buy":32.150002,"sell":32.599998}},"10":{"RUB":{"buy":0.414,"sell":0.435},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.0,"sell":32.599998}},"13":{"RUB":{"buy":0.275,"sell":0.46},"USD":{"buy":27.9,"sell":28.200001},"EUR":{"buy":31.67,"sell":32.599998}},"15":{"RUB":{"buy":0.3749,"sell":0.4395},"USD":{"buy":27.985001,"sell":28.2075},"EUR":{"buy":32.036366,"sell":32.529091}},"31":{"RUB":{"buy":0.275,"sell":0.42},"USD":{"buy":27.9,"sell":28.139999},"EUR":{"buy":31.799999,"sell":32.400002}},"32":{"RUB":{"buy":0.42,"sell":0.5},"USD":{"buy":28.07,"sell":28.299999},"EUR":{"buy":32.150002,"sell":32.5999
98}},"39":{"USD":{"buy":28.07,"sell":28.25},"EUR":{"buy":32.150002,"sell":32.549999}},"40":{"RUB":{"buy":0.41,"sell":0.43},"USD":{"buy":27.950001,"sell":28.139999},"EUR":{"buy":32.049999,"sell":32.400002}},"64":{"RUB":{"buy":0.4,"sell":0.425},"USD":{"buy":27.9,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.599998}},"73":{"RUB":{"buy":0.4,"sell":0.43},"USD":{"buy":28.0,"sell":28.299999},"EUR":{"buy":32.0,"sell":32.549999}},"74":{"RUB":{"buy":0.41,"sell":0.435},"USD":{"buy":28.049999,"sell":28.25},"EUR":{"buy":31.799999,"sell":32.5}},"85":{"RUB":{"buy":0.3,"sell":0.43},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.52}},"86":{"RUB":{"buy":0.37,"sell":0.42},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.0,"sell":32.799999}},"88":{"RUB":{"buy":0.35,"sell":0.5},"USD":{"buy":28.0,"sell":28.15},"EUR":{"buy":32.099998,"sell":32.450001}},"90":{"RUB":{"buy":4.0,"sell":4.4},"USD":{"buy":28.0,"sell":28.15},"EUR":{"buy":31.950001,"sell":32.450001}}}
I need the following information from it:
the bank code - "3"
and the USD rate - 27.950001, 28.190001
My expression:
@"(\d+)":..USD....\w+..(\d+.\d+)........(\d+.\d+)"
But it doesn't work, because USD is not always the first entry after the bank code.
This is a JSON document. JSON is a recursive format, and regular expressions are notoriously hard to use when parsing recursive data.
Please use a dedicated parser, such as Newtonsoft's Json.NET:
var rawData = @"converter.rates = { ... }"; // original string
var rawJson = rawData.Substring("converter.rates = ".Length); // remove the prefix
var json = JObject.Parse(rawJson); // convert to a JSON data structure
Then you can use it like a dictionary:
foreach (var codeEntry in json)
{
    var code = codeEntry.Key;
    // Cast to JObject so the inner loop yields key/value pairs
    foreach (var currencyEntry in (JObject)codeEntry.Value)
    {
        var currency = currencyEntry.Key;
        var buy = currencyEntry.Value["buy"].Value<double>();
        var sell = currencyEntry.Value["sell"].Value<double>();
        Console.WriteLine($"code of bank - {code} and {currency} rate - {buy}, {sell}");
    }
}
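If only the USD rate is needed, the currency can be looked up by key, which makes its position inside each bank entry irrelevant. The following is a minimal sketch, assuming the Newtonsoft.Json package is referenced; the class name UsdRateReader and the tuple shape are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class UsdRateReader
{
    // Returns (bank code, buy, sell) for every bank entry that lists USD.
    public static List<(string Code, double Buy, double Sell)> Read(string rawText)
    {
        // Skip whatever precedes the first '{' (for example "converter.rates =").
        var json = JObject.Parse(rawText.Substring(rawText.IndexOf('{')));
        var result = new List<(string Code, double Buy, double Sell)>();

        foreach (var bank in json)
        {
            // Look USD up by key; banks without a USD entry are skipped.
            if (bank.Value is JObject currencies && currencies["USD"] is JObject usd)
            {
                result.Add((bank.Key, usd["buy"].Value<double>(), usd["sell"].Value<double>()));
            }
        }
        return result;
    }
}
```

Calling UsdRateReader.Read(rawData) on the text from the question should yield one entry per bank code, starting with code "3".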
If you still want to use regex, this can do it:
@"""(?<code>\d+)"":\{.*?(?<=""USD""):\{""buy"":(?<buy>\d+\.\d+),""sell"":(?<sell>\d+\.\d+)\}"
It is built from your example. Basically, it creates three named groups: 'code', 'buy' and 'sell'. Other than that, it matches literal characters, using only a lookbehind '(?<=""USD"")' to find 'USD' and get the wanted rates.
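As a rough illustration of how the named groups could be read back, here is a sketch that uses only System.Text.RegularExpressions. The class name UsdRateRegex is hypothetical. The pattern is the one above with both decimal points escaped. One caveat that follows from the lazy '.*?': if a bank had no USD entry, the match would run on into the next bank and report that bank's USD rate under the wrong code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UsdRateRegex
{
    private static readonly Regex Pattern = new Regex(
        @"""(?<code>\d+)"":\{.*?(?<=""USD""):\{""buy"":(?<buy>\d+\.\d+),""sell"":(?<sell>\d+\.\d+)\}");

    // Returns (bank code, buy, sell) for each match of the pattern.
    public static List<(string Code, double Buy, double Sell)> Read(string text)
    {
        var result = new List<(string Code, double Buy, double Sell)>();
        foreach (Match m in Pattern.Matches(text))
        {
            // InvariantCulture: the data uses '.' as the decimal separator
            // regardless of the machine's regional settings.
            var buy = double.Parse(m.Groups["buy"].Value, CultureInfo.InvariantCulture);
            var sell = double.Parse(m.Groups["sell"].Value, CultureInfo.InvariantCulture);
            result.Add((m.Groups["code"].Value, buy, sell));
        }
        return result;
    }
}
```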
Edit:
If you have an HTML document and want to grab the 'converter.rates' variable as text, you can use this regex:
@"converter\.rates\s?=.*\}\}\}"
It looks for the three '}' characters that end the string.
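A variation on that idea, sketched under a few assumptions: the JSON may start on the line after the '=' sign (as it does in the question), and other script code may follow it. For that reason this version uses RegexOptions.Singleline and a lazy quantifier that stops at the first '}}}', which differs from the greedy pattern above. The class name RatesExtractor is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RatesExtractor
{
    // Returns the JSON text assigned to converter.rates,
    // or null when the page contains no such assignment.
    public static string ExtractJson(string html)
    {
        var m = Regex.Match(
            html,
            @"converter\.rates\s*=\s*(?<json>\{.*?\}\}\})",
            RegexOptions.Singleline); // let '.' match line breaks too
        return m.Success ? m.Groups["json"].Value : null;
    }
}
```

The returned string can then be handed to a JSON parser. Stopping at the first '}}}' relies on the three-level shape of this particular document, so it would need adjusting for data nested differently.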
Related
Regular Expressions C#: Clean bad Json string
I get an answer from the server in the form of such JSON: var zohozoho_atliview92 = {\"Itinerary\":[ {\"Client_Email\":\"garymc\", \"Client_Name\":\"Gary\", \"NT_Number\":\"NT-1237\",\"Number_of_Nights\":7, \"ID\":\"24297940\", \"Itinerary_Name\":\"Icelandnights\", \"Tour_Template_Name\":\"Iceland FireDrive\", \"Departure_Date\":\"2018-07-04\"} ]}; I need to remove this: var zohozoho_atliview92 = {\"Itinerary\":[ and delete last 3 characters ]}; to Deserialize it in my object. How can i make it using Regular Expressions? Or is there a better variant?
Is there a better variant? Yes: you can parse your escaped JSON string into a JObject. You can then access any key/value pair as described in "Querying JSON with LINQ", or map the JObject to your custom type with var result = jObject.ToObject<T>();
class Program
{
    static void Main(string[] args)
    {
        var zohozoho_atliview92 = "{\"Itinerary\":[ {\"Client_Email\":\"garymc\", \"Client_Name\":\"Gary\", \"NT_Number\":\"NT-1237\",\"Number_of_Nights\":7, \"ID\":\"24297940\", \"Itinerary_Name\":\"Icelandnights\", \"Tour_Template_Name\":\"Iceland FireDrive\", \"Departure_Date\":\"2018-07-04\"}]}";
        JObject jObject = JObject.Parse(zohozoho_atliview92);
        Console.WriteLine(jObject);
        Console.ReadLine();
    }
}
This is not JSON, it's JavaScript (whose object declaration is JSON). Regular expressions are slow, so I would advise you to use Substring instead:
var start = inputString.IndexOf("[") + 1;
var end = inputString.LastIndexOf("]");
var json = inputString.Substring(start, end - start);
There might be some off-by-one errors, so test and correct. Hardcoding start would be even faster, but more fragile.
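The Substring approach can be wrapped in a small helper that also guards against missing brackets. This is only a sketch; the names JsPayload and InnerObject are hypothetical, and it assumes the array holds a single object, as in the question. With several elements it would return them all, comma-separated.

```csharp
using System;

public static class JsPayload
{
    // Returns the text between the first '[' and the last ']', brackets excluded.
    public static string InnerObject(string input)
    {
        int start = input.IndexOf('[');
        int end = input.LastIndexOf(']');
        if (start < 0 || end <= start)
            throw new FormatException("No [...] section found.");

        // start + 1 skips the '['; the length stops just before the ']'.
        return input.Substring(start + 1, end - start - 1).Trim();
    }
}
```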
Is there any way to "substitute" numbers in string C#?
I have html code, which I need to parse on the fly. I need to find exact divs there, which all have id of "content-text-" and then 6 numbers (like "content-text-123456"), which I don't know beforehand. Is there any way to "substitute" the numbers at the end of the string I'm searching for (like "content-text-######")? Searching for "content-text-" does not work. I'm doing this project on Windows Phone 8.1 with C# if it matters. EDIT: WPPageResponse response = JsonConvert.DeserializeObject<WPPageResponse>(json); HtmlDocument doc = new HtmlDocument(); doc.LoadHtml(response.content); foreach (var node in doc.DocumentNode.Descendants("div").Where(div => div.GetAttributeValue("id", "") == "content-text-######")) { // Gather data what it returns } Here is some code if it helps. It works if I know the numbers and search with them, but the thing is that I can't know all the numbers there.
You can use Regex for this.
string data = "MyTest = 5564327";
string output = Regex.Replace(data, @"\d", "#");
Console.WriteLine(output);
Console.Read();
Output is:
MyTest = #######
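For the question as asked (finding ids of the form "content-text-" followed by six digits), a pattern match on the id may be closer to what is needed than a replacement. A minimal sketch follows; the class name ContentIdFilter is hypothetical, and it works on plain id strings so that it does not depend on any HTML library.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ContentIdFilter
{
    // "content-text-" followed by exactly six digits, and nothing else.
    private static readonly Regex IdPattern = new Regex(@"^content-text-\d{6}$");

    // Keeps only the ids that have the expected shape.
    public static List<string> Matching(IEnumerable<string> ids) =>
        ids.Where(id => IdPattern.IsMatch(id)).ToList();
}
```

In the code from the question, the same check could presumably replace the equality test inside Where, for example div => Regex.IsMatch(div.GetAttributeValue("id", ""), @"^content-text-\d{6}$").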
Regex to replace JSON structure
I have JSON text like this : ... "simples":{ "AS100ELABQVKANID-91057":{ "meta":{ "sku":"AS100ELABQVKANID-91057", "price":"3669000.00", "original_price":"3379000.00", "special_to_date":"2015-03-19 23:59:59", "shipment_type":"1", "special_price":"3299000.00", "tax_percent":"10.00", "sourceability":"Sourceable", "quantity":"15", "variation":"...", "package_type_position":"0", "min_delivery_time":"1", "max_delivery_time":"3", "attribute_set_name":"electronics", "3hours_shipment_available":false, "estimated_delivery":"", "estimated_delivery_position":"" }, "attributes":{ "package_type":"Parcel" } } }, "description": ... The above text appears repeatedly in my JSON text. I am trying to build every result to this : "simples":[],"description" So far, I have made this regex : \"simples\":{(?:.*v(?:|=)|(?:.*)?)},\"description\" But the result is cut everything from my first "simples" into last "description". Regex newbie here. Thanks in advance
I recommend parsing the JSON, replacing the value, then re-stringifying it:
var obj = JSON.parse(json);
obj.simples = [];
json = JSON.stringify(obj);
Using a regexp for this is pure insanity.
Don't use Regex to parse JSON; use a JSON parser. Here is how you can do this using JSON.Net, assuming the object containing simples is part of a flat list of results: JArray array = JArray.Parse(json); foreach (JObject obj in array) { obj["simples"].Parent.Remove(); } json = array.ToString(); If your JSON is more complicated (i.e. "simples" can appear at more than one level in the JSON), then you will need a recursive search to find and remove it. This answer has a helper method that can find a specific property by name anywhere in the JSON and return a list of all occurrences. Once you have the list of occurrences, you can then loop through them and remove them as shown above.
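The recursive case can be sketched with Json.NET's Descendants(), which walks every nested token. This is not the helper method the answer refers to. It is a separate, minimal sketch that assumes the Newtonsoft.Json package is referenced, and it sets each "simples" value to an empty array (the result the question asks for) instead of removing the property. The name SimplesCleaner is hypothetical.

```csharp
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class SimplesCleaner
{
    // Replaces the value of every property named "simples", at any depth, with [].
    public static string EmptyAllSimples(string json)
    {
        JToken root = JToken.Parse(json);
        if (root is JContainer container)
        {
            // Collect the properties first; changing the tree while
            // enumerating Descendants() would be unsafe.
            var targets = container.Descendants()
                .OfType<JProperty>()
                .Where(p => p.Name == "simples")
                .ToList();

            foreach (var property in targets)
                property.Value = new JArray();
        }
        return root.ToString(Formatting.None);
    }
}
```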
Using Regexp to get information in a KeyValuePair
Help me to parse this message: text=&direction=re&orfo=rus&files_id=&message=48l16qL2&old_charset=utf-8&template_id=&HTMLMessage=1&draft_msg=&re_msg=&fwd_msg=&RealName=0&To=john+%3Cjohn11%40gmail.com%3E&CC=&BCC=&Subject=TestSubject&Body=%3Cp%3EHello+%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82+%D1%82%D0%B5%D0%BA%D1%81%D1%82%3Cbr%3E%3Cbr%3E%3C%2Fp%3E&secur I would like to get information in an KeyValuePair: Key - Value text - direction - re and so on. And how to convert this: Hello+%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82+%D1%82%D0%B5%D0%BA%D1%81%... there are cyrillic character. Thanks.
If you want to use a Regex, you can do it like this:
// Only the first 3 keys are shown; the others follow the same pattern
Regex r = new Regex(@"text=(?<text>[^&]*)&direction=(?<direction>[^&]*)&orfo=(?<orfo>[^&]*)");
Match m = r.Match(inputText);
if (m.Success)
{
    var text = m.Groups["text"].Value; // result is ""
    var direction = m.Groups["direction"].Value; // re
    var orfo = m.Groups["orfo"].Value; // rus
}
However, the method suggested by BoltClock is much better:
System.Collections.Specialized.NameValueCollection collection = System.Web.HttpUtility.ParseQueryString(inputString);
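To show what ParseQueryString gives back, here is a small sketch that copies the result into a dictionary. The values come back already URL-decoded ('+' becomes a space and the %XX sequences are read as UTF-8), which also covers the Cyrillic part of the question. The name FormBody is hypothetical. On .NET Framework this assumes a reference to System.Web.

```csharp
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Web;

public static class FormBody
{
    // Parses "a=1&b=2" style text into decoded key/value pairs.
    public static Dictionary<string, string> ToDictionary(string body)
    {
        NameValueCollection parsed = HttpUtility.ParseQueryString(body);
        var result = new Dictionary<string, string>();
        foreach (string key in parsed.AllKeys)
        {
            // A fragment without '=' (such as a truncated trailing "&secur")
            // is reported under a null key; it is skipped here.
            if (key != null)
                result[key] = parsed[key];
        }
        return result;
    }
}
```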
It looks like you are dealing with a URI, better to use the proper class than try and figure out the detailed processing. http://msdn.microsoft.com/en-us/library/system.uri.aspx
Best way to deserialize a long string (response of an external web service)
I am querying a web service that was built by another developer. It returns a result set in a JSON-like format. I get three column values (I already know what the ordinal position of each column means): [["Boston","142","JJK"],["Miami","111","QLA"],["Sacramento","042","PPT"]] In reality, this result set can be thousands of records long. What's the best way to parse this string? I guess a JSON deserializer would be nice, but what is a good one to use in C#/.NET? I'm pretty sure the System.Runtime.Serialization.Json serializer won't work.
Using the built-in libraries for ASP.NET (System.Runtime.Serialization and System.ServiceModel.Web), you can get what you want pretty easily:
string[][] parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
// UTF-8 is used so that non-ASCII characters survive the round trip
using (var ms = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(jsonStr)))
{
    var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(string[][]));
    parsed = serializer.ReadObject(ms) as string[][];
}
A slightly more complex example (which was my original answer):
First make a dummy class to use for serialization. It just needs one member to hold the result, which should be of type string[][].
[DataContract]
public class Result
{
    [DataMember(Name="d")]
    public string[][] d { get; set; }
}
Then it's as simple as wrapping your result up like so: { "d": /your results/ }. See below for an example:
Result parsed = null;
var jsonStr = @"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(string.Format(@"{{ ""d"": {0} }}", jsonStr))))
{
    var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(Result));
    parsed = serializer.ReadObject(ms) as Result;
}
How about this?
It sounds like you have a pretty simple format that you could write a custom parser for, since you may not want to wait for the entire response to be parsed before you start using it. I would just write a recursive parser that looks for the tokens "[", ",", "\"", and "]" and does the appropriate thing.
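One way such a hand-written parser might look is sketched below. It is iterative, not recursive, because the format is only two levels deep, and it yields each row as soon as its closing "]" is read, so a caller can start working before the whole response has arrived. The names RowParser and ReadRows are hypothetical, and the sketch assumes every value is a double-quoted string with only backslash escapes for quotes and backslashes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RowParser
{
    // Lazily yields each inner array from text shaped like [["a","b"],["c","d"]].
    public static IEnumerable<string[]> ReadRows(TextReader reader)
    {
        int depth = 0;
        List<string> row = null;
        int c;
        while ((c = reader.Read()) != -1)
        {
            switch ((char)c)
            {
                case '[':
                    depth++;
                    if (depth == 2) row = new List<string>(); // a new row starts
                    break;
                case ']':
                    if (depth == 2 && row != null)
                    {
                        yield return row.ToArray(); // the row is complete
                        row = null;
                    }
                    depth--;
                    break;
                case '"':
                    string value = ReadString(reader);
                    if (row != null) row.Add(value);
                    break;
                // Commas and whitespace between tokens carry no information here.
            }
        }
    }

    // Reads up to the closing quote; a backslash makes the next character literal.
    private static string ReadString(TextReader reader)
    {
        var sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1 && c != '"')
        {
            if (c == '\\')
            {
                c = reader.Read();
                if (c == -1) break;
            }
            sb.Append((char)c);
        }
        return sb.ToString();
    }
}
```

Passing a StreamReader over the web response stream to ReadRows and iterating with foreach would process rows one at a time. For anything beyond this exact shape, a real JSON parser is the safer choice.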