MDX parser in C# - c#

I need to parse MDX in my .NET application. Initially I used regular expressions, but the expressions are getting complicated, and a regex expert suggested that I would be better off using a parser.
Is there a parser specifically for MDX? I tried Ranet, but for some unknown reason it does not install on my machine (and it does not show any error message).
I need to split the parts of an MDX statement into separate strings: for example, the WHERE clause in one string, the FROM clause in another, and so on.

The best solution would be to find an existing parser, but it is often hard to find one that fits your specific needs. If you end up writing your own, Ve Parser is a better tool than regex: it provides more parsing functionality, you can generate better output, and because you are calling .NET methods you get IntelliSense while writing your parser.
The downside is that it is not yet well documented, so you may find some special scenarios difficult.
Project Link : http://veparser.codeplex.com
NuGet identifier : veparser
If you need to get the text for different parts of an MDX statement, here is a partial code sample:
using VeParser;
using System.Linq;
using System.Collections.Generic;
using System;
public class MDXParser : TokenParser
{
protected override Parser GetRootParser()
{
// Read the following line as: fill the 'select' property of the current object (a statement) with a new selectStatment, created after matching the 'select' keyword, then the symbol '(', then a list of identifiers delimited by ',' that fills the 'fields' property of the current selectStatment object, and finally the symbol ')'.
var selectStatement = fill("select", create<selectStatment>( seq(expectKeyword_of("select"), expectSymbol_of("("), deleimitedList(expectSymbol_of(","), fill("fields",identifier) ), expectSymbol_of(")"))));
// Read the following line as: fill the 'from' property of the current object (a statement) with the identifier expected after a 'from' keyword.
var fromStatement = seq(expectKeyword_of("from"), fill("from", identifier));
// The following statement is incomplete, as I just wanted to show a sample. If you are interested, I can help you complete the parser until the full documentation becomes available.
var whereStatement = fill("where", create<whereStatement>(seq(expectKeyword_of("where"))));
var statement = create<statement>(seq(selectStatement, fromStatement, whereStatement));
return statement;
}
public statement Parse(string code)
{
var keywords = new[] { "select", "where", "from" };
var symbols = new[] { "(",")", ".", "[", "]" };
var tokenList = Lexer.Parser(code, keywords, symbols, ignoreWhireSpaces : true);
// The input string is now a list of tokens: essentially a list of words, each with some additional information. For example, "select" is marked as a keyword.
var parseResult = base.Parse(tokenList.tokens);
if (parseResult == null)
throw new Exception("Invalid Code, at the moment Ve Parser does not support any error reporting feature.");
else
return (statement)parseResult;
}
}
public class statement
{
public selectStatment select;
public string where;
public identifier from;
}
public class selectStatment
{
public List<identifier> fields;
}
public class whereStatement
{
}
This code is not complete; I just wanted to demonstrate how to use Ve Parser to write your own parser for MDX. If you like the library and want to use it, I would be happy to provide all the descriptions and techniques you need.

You could take a look at parser generators such as http://grammatica.percederberg.net/
However, it is hard work to formulate a grammar and keep it up to date.
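If neither a ready-made parser nor a full grammar is practical, a small hand-written scanner can split a statement into its top-level clauses. The following is only a minimal sketch under stated assumptions: MdxClauseSplitter and its methods are hypothetical names, only SELECT, FROM and WHERE are recognised, and anything before SELECT (such as a WITH clause) is dropped. It skips keywords that appear inside [...] names, (...) or {...} groups, and double-quoted strings.

```csharp
using System;
using System.Collections.Generic;

public static class MdxClauseSplitter
{
    // Hypothetical helper: returns the text of each top-level SELECT / FROM / WHERE clause.
    public static Dictionary<string, string> Split(string mdx)
    {
        var keywords = new[] { "SELECT", "FROM", "WHERE" };
        var starts = new List<KeyValuePair<string, int>>();
        int depth = 0;          // nesting level of ( and {
        bool inBracket = false; // inside [ ... ]
        bool inQuote = false;   // inside " ... "

        for (int i = 0; i < mdx.Length; i++)
        {
            char c = mdx[i];
            if (inQuote) { if (c == '"') inQuote = false; continue; }
            if (inBracket) { if (c == ']') inBracket = false; continue; }
            if (c == '"') { inQuote = true; continue; }
            if (c == '[') { inBracket = true; continue; }
            if (c == '(' || c == '{') { depth++; continue; }
            if (c == ')' || c == '}') { depth--; continue; }
            if (depth != 0) continue; // nested text, e.g. a subselect

            foreach (string keyword in keywords)
            {
                if (IsKeywordAt(mdx, i, keyword))
                {
                    starts.Add(new KeyValuePair<string, int>(keyword, i));
                    i += keyword.Length - 1;
                    break;
                }
            }
        }

        // Each clause runs from the end of its keyword to the start of the next keyword.
        var clauses = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int k = 0; k < starts.Count; k++)
        {
            int from = starts[k].Value + starts[k].Key.Length;
            int to = k + 1 < starts.Count ? starts[k + 1].Value : mdx.Length;
            clauses[starts[k].Key] = mdx.Substring(from, to - from).Trim();
        }
        return clauses;
    }

    // True when 'keyword' occurs at 'index' as a whole word (case-insensitive).
    private static bool IsKeywordAt(string text, int index, string keyword)
    {
        if (index + keyword.Length > text.Length) return false;
        if (string.Compare(text, index, keyword, 0, keyword.Length,
                StringComparison.OrdinalIgnoreCase) != 0) return false;
        bool startsWord = index == 0 || !char.IsLetterOrDigit(text[index - 1]);
        int end = index + keyword.Length;
        bool endsWord = end >= text.Length || !char.IsLetterOrDigit(text[end]);
        return startsWord && endsWord;
    }
}
```

Calling MdxClauseSplitter.Split(query)["WHERE"] then gives the WHERE clause as one string. A real MDX grammar has many more cases, so treat this as a starting point rather than a parser.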

Related

Build JSON Safely

I'm running a project that returns dynamically built JSON.
Recently I discovered that carriage returns and double quotes make the JSON string invalid (it can't be loaded via AJAX). I'm now replacing the parameter in question, removing any double quotes and such, but I feel like I'm playing whack-a-mole.
Is there a better way?
In XML, for example, if I'm building a node, I can just call setAttribute( strMyJunkyString ), and it safely creates an attribute that will never break the XML, even if it has special characters, entities, etc.
Is there some sort of MakeStringJSONSafe() function that removes anything that would break the array ([{}"\r\n])?
Here are a couple of examples of broken strings that my program is creating:
// String built with " included.
var t1 = [{"requestcomment":"Please complete "Education Provided" for all Medications "}];
// String built with embedded returns.
var t2 = [{"requestcomment":"Please complete
Education Provided
History
Allergies
"}];
Use JSON.NET.
var jsonString = Newtonsoft.Json.JsonConvert.SerializeObject(new { requestcomment = "Please complete \"Education Provided\" for all Medications" });
and...
var jsonString = Newtonsoft.Json.JsonConvert.SerializeObject(new { requestcomment = "Please complete\nEducation Provided\nHistory\nAllergies" });

regex in string is being stripped out when converting to BsonDocument

I am creating quite a complex query for mongodb within .net using C#. To do this I am building the query as a string then parsing it to get a QueryDocument:
var Q = new QueryDocument(BsonDocument.Parse(QueryString));
My problem is that part of the query contains a regex:
{""Str.tagkw"":{$regex : "" \\b(rasberry|ice cream|sweeties)\\b ""}}
After parsing, the $regex part appears to have been removed when I look at the query Q (as above).
Any help would be welcome.
Your code appears to work for me:
string queryString = @"{""Str.tagkw"":{$regex : "" \\b(rasberry|ice cream|sweeties)\\b ""}}";
var Q = new QueryDocument(BsonDocument.Parse(queryString));
When you look at this in an IDE such as Visual Studio, it will be displayed as
{ "Str.tagkw" : / \b(rasberry|ice cream|sweeties)\b / }
That's the JavaScript representation. In JavaScript, you can create regular expressions using either
var regex = new RegExp("(foo|bar)");
or, as syntactic sugar
var regex = /(foo|bar)/;
The ToString method, which the debugger uses, seems to prefer the second representation, but that's just a matter of how it's displayed.

How to read a string containing XML elements without using the XML properties

I'm working on XML reading in my project, where I have to read the contents of an XML file. I have achieved that.
Just out of curiosity, I also tried keeping the XML content inside a string and reading only the values inside the element tags. I have achieved this too. Below is my code.
string xml = @"<Login-Form>
<User-Authentication>
<username>Vikneshwar</username>
<password>xxx</password>
</User-Authentication>
<User-Info>
<firstname>Vikneshwar</firstname>
<lastname>S</lastname>
<email>xxx@xxx.com</email>
</User-Info>
</Login-Form>";
XDocument document = XDocument.Parse(xml);
var block = from file in document.Descendants("User-Authentication")
select new
{
Username = file.Element("username").Value,
Password = file.Element("password").Value,
};
foreach (var file in block)
{
Console.WriteLine(file.Username);
Console.WriteLine(file.Password);
}
Similarly, I obtained my other set of elements (firstname, lastname, and email). Now my curiosity draws me on again: can I do the same using only string functions?
The same string as in the code above is to be used. I'm trying not to use any XML-related classes, that is, XDocument, XmlReader, etc. The same output should be achieved using only string functions. I'm not able to do that. Is it possible?
Don't do it. XML is more complex than can appear the case, with complex rules surrounding nesting, character-escaping, named-entities, namespaces, ordering (attributes vs elements), comments, unparsed character data, and whitespace. For example, just add
<!--
<username>evil</username>
-->
Or
<parent xmlns="this:is-not/the/data/you/expected">
<username>evil</username>
</parent>
Or maybe the same in a CDATA section - and see how well basic string-based approaches work. Hint: you'll get a different answer to what you get via a DOM.
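The comment case can be made concrete with a small sketch. CommentPitfallDemo and its sample document are hypothetical and exist only for illustration: the first username element sits inside a comment, so a plain IndexOf search and a DOM lookup return different values.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentPitfallDemo
{
    // Hypothetical sample: the first <username> lives inside an XML comment.
    public const string Xml =
        "<Login-Form><!-- <username>evil</username> -->" +
        "<User-Authentication><username>Vikneshwar</username></User-Authentication>" +
        "</Login-Form>";

    // Naive string search: takes the first "<username>" it finds, comment or not.
    public static string ViaIndexOf()
    {
        const string open = "<username>", close = "</username>";
        int start = Xml.IndexOf(open, StringComparison.Ordinal) + open.Length;
        int end = Xml.IndexOf(close, start, StringComparison.Ordinal);
        return Xml.Substring(start, end - start);
    }

    // DOM lookup: a comment is its own node, so only the real element matches.
    public static string ViaXDocument()
    {
        XDocument document = XDocument.Parse(Xml);
        return document.Descendants("username").First().Value;
    }
}
```

Here ViaIndexOf() returns "evil" while ViaXDocument() returns "Vikneshwar".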
Using a dedicated tool designed for reading XML is the correct approach. At the minimum, use XmlReader - but frankly, a DOM (such as your existing code) is much more convenient. Alternatively, use a serializer such as XmlSerializer to populate an object model, and query that.
Trying to properly parse XML and XML-like data with string tools does not end well; see the question "RegEx match open tags except XHTML self-contained tags".
You could use methods such as IndexOf, Equals, and Substring, provided by the String class, to do this.
Regex is another option worth considering.
But it's advisable to use the XmlDocument class for this purpose.
It can be done without regular expressions, like this:
string[] elementNames = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames)
{
int startingIndex = xml.IndexOf(elementName);
string value = xml.Substring(startingIndex + elementName.Length,
xml.IndexOf(elementName.Insert(1, "/"))
- (startingIndex + elementName.Length));
Console.WriteLine(value);
}
With a regular expression:
string[] elementNames2 = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames2)
{
string value = Regex.Match(xml, String.Concat(elementName, "(.*)",
elementName.Insert(1, "/"))).Groups[1].Value;
Console.WriteLine(value);
}
Of course, the only recommended thing is to use the XML parsing classes.
Build an extension method that gets the text between tags, like this:
public static class StringExtension
{
public static string Between(this string content, string start, string end)
{
int startIndex = content.IndexOf(start) + start.Length;
int endIndex = content.IndexOf(end);
string result = content.Substring(startIndex, endIndex - startIndex);
return result;
}
}

Any way to make IList.Contains() act more like a wildcard contains?

I am trying to parse a CSV string, put the results into an IList collection, and then find a way to do a wildcard 'contains' based on what was passed in. Right now I have the following:
public static IList<string> DBExclusionList
{
get
{
Regex splitRx = new Regex(@",\s*", RegexOptions.Compiled);
String list = (string)_asr.GetValue("DBExclusionList",typeof(string));
string[] fields = splitRx.Split(list);
return fields;
}
}
if (DBExclusionList.Contains(dbx.Name.ToString())==false)
{...}
So if the string I am parsing (key value from .config file) contains:
key="DBExclusionList" value="ReportServer,ReportServerTempDB,SQLSentry20,_TEST"
DBExclusionList.Contains() works very well for exact matches on the first three items in the list, but I ALSO want it to match any name that partially matches the fourth item, '_TEST'.
Is there any way to do it? I could certainly hardcode it to always exclude such names, but I'd rather not.
Thanks.
Using LINQ (checking whether the database name contains any list entry):
if (DBExclusionList.Any(s => dbx.Name.ToString().Contains(s)))
Since you're using .NET 3.5, you could use the Where() LINQ extension method to get the entries that occur in the database name:
DBExclusionList.Where(item => dbx.Name.ToString().Contains(item))
Don't use a regex to split. Use string.Split() and delimit your _asr.GetValue("DBExclusionList") value with something like semicolons.
Each item in _asr.GetValue("DBExclusionList") should be a regex pattern, so you check against each one in turn.
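The pattern-per-entry idea could look roughly like the sketch below. ExclusionMatcher and IsExcluded are hypothetical names, and the semicolon-delimited setting format is an assumption taken from the suggestion above: anchored entries such as ^ReportServer$ match exactly, while an unanchored entry such as _TEST matches anywhere in the name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExclusionMatcher
{
    // Hypothetical helper: splits a semicolon-delimited setting into regex patterns
    // and reports whether any of them matches the database name.
    public static bool IsExcluded(string setting, string databaseName)
    {
        string[] patterns = setting.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        return patterns.Any(p => Regex.IsMatch(databaseName, p.Trim(), RegexOptions.IgnoreCase));
    }
}
```

With the setting "^ReportServer$;^ReportServerTempDB$;^SQLSentry20$;_TEST", the name "Sales_TEST" is excluded and "ReportServerOld" is not.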

Newbie needs to know how to parse a whole text group like this

I need to parse a text file with about 10,000 groupings like this:
group "C_BatTemp" -- block-group
{
block: "Constant"
flags: BLOCK|COLLAPSED
}
-- Skipping output Out1
p_untitled_P_real_T_0[1]
{
type: flt(64,IEEE)*
alias: "Value"
flags: PARAM
}
endgroup -- block-group "C_BatTemp"
The desired objects I expect the parser to fill look like this
string Varname = "C_BatTemp";
string GroupType = "Constant";
string BaseAdressName = "p_untitled_P_real_T_0";
int AdressOffset = 1; // number in parenthesis p_untitled_P_real_T_0[1]<----
string VarType = "flt(64, IEEE)";
bool IsPointer = true; // true if VarType is "flt(64, IEEE)*" ,
//false if "flt(64, IEEE)"
string VarAlias = "Value";
What is the best way of parsing this?
How should I start?
One solution might be using a regular expression. I quickly made up one, but it might require some additional tuning to fit your exact needs. It works for your example, but might fail for other inputs. The expression is very much tailored to the given example especially in respect to line breaks and comments.
CODE
String input =
@"group ""C_BatTemp"" -- block-group
{
block: ""Constant""
flags: BLOCK|COLLAPSED
}
-- Skipping output Out1
p_untitled_P_real_T_0[1]
{
type: flt(64,IEEE)*
alias: ""Value""
flags: PARAM
}
endgroup -- block-group ""C_BatTemp""";
String pattern = @"^group\W*""(?<varname>[^""]*)""[^{]*{\W*block:\W*""(?<grouptype>[^""]*)""[^}]*}$(\W*--.*$)*\W*(?<baseaddressname>[^[]*)\[(?<addressoffset>[^\]]*)][^{]*{\W*type:\W*(?<vartype>.*)$\W*alias:\W*""(?<alias>[^""]*)""[^}]*}\W*endgroup.*$";
foreach (Match match in Regex.Matches(input.Replace("\r\n", "\n"), pattern, RegexOptions.Multiline))
{
Console.WriteLine(match.Groups["varname"].Value);
Console.WriteLine(match.Groups["grouptype"].Value);
Console.WriteLine(match.Groups["baseaddressname"].Value);
Console.WriteLine(match.Groups["addressoffset"].Value);
Console.WriteLine(match.Groups["vartype"].Value);
Console.WriteLine(match.Groups["vartype"].Value.EndsWith("*"));
Console.WriteLine(match.Groups["alias"].Value);
}
OUTPUT
C_BatTemp
Constant
p_untitled_P_real_T_0
1
flt(64,IEEE)*
True
Value
I had to do something similar recently.
Break each block of data into records (would these be your 'groups'?). Extract each element you need from each record using Regular Expressions.
Without a clearer idea of the data I can't elaborate.
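The two-step approach (split into records first, then extract fields per record) might be sketched as follows. GroupRecordSplitter and its methods are hypothetical, and the sketch assumes that every record starts with "group" and ends with "endgroup" at the beginning of a line, as in the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupRecordSplitter
{
    // Hypothetical helper: cuts the input into one record per group ... endgroup pair.
    public static List<string> SplitRecords(string text)
    {
        var records = new List<string>();
        // Singleline lets '.' span line breaks; the lazy quantifier stops at the first endgroup line.
        foreach (Match m in Regex.Matches(text, @"^group\b.*?^endgroup\b[^\n]*",
                     RegexOptions.Multiline | RegexOptions.Singleline))
        {
            records.Add(m.Value);
        }
        return records;
    }

    // Pulls one quoted value such as alias: "Value" out of a record; empty string if absent.
    public static string QuotedField(string record, string key)
    {
        Match m = Regex.Match(record, Regex.Escape(key) + @":\s*""([^""]*)""");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Working on one record at a time keeps each expression short, and a malformed group then affects only its own record.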
