Syntax Highlighting in FlowDocumentControl for RScript - c#

We use the following regex properties to highlight strings and numbers.
String regex:
public string StringRegEx
{
    get { return @"#?""""|#?"".*?(?!\\).""|''|'.*?(?!\\).'"; }
}
Number regex:
public string NumberRegEX
{
    get { return @"[0-9].*?(?=:[0-9]*)?"; }
}
With these regexes we run into a problem when a name contains a digit:
p1 = 1
p2 = 0.2
In this example, the 1 in p1 and the 2 in p2 are highlighted as well. How can we stop digits that are part of a name from being highlighted as numbers?

For a more general approach on how to properly catch things when dealing with a programming language snippet, take a look here.
Your problem might not be "comments in strings, strings in comments" but it is similar, namely "letters in a string that started with a number, numbers in a string that started with a letter" so you'll need a similar approach with pipe-separated regexes for the different matches you wanna have.
A more thorough explanation of this design-pattern is given here.
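A minimal sketch of the pipe-separated idea, written in Python rather than C# (the alternation works the same way in .NET). The token names and the identifier and number rules are assumptions for illustration, not the asker's actual grammar for RScript. Because the identifier alternative is listed before the number alternative, a name such as p1 is consumed as a whole and its digit never reaches the number rule.

```python
import re

# Alternatives are tried left to right at every position, so strings and
# identifiers are consumed before the number rule gets a chance to match.
# The three rules below are simplified assumptions, not a full R grammar.
TOKEN = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<number>[0-9]+(?:\.[0-9]+)?)
""", re.VERBOSE)


def find_tokens(source, kind):
    """Return the text of every token of the given kind, in order."""
    return [m.group(kind) for m in TOKEN.finditer(source) if m.lastgroup == kind]


print(find_tokens("p1 = 1\np2 = 0.2", "number"))      # only the standalone numbers
print(find_tokens("p1 = 1\np2 = 0.2", "identifier"))  # the names, digits included
```

A highlighter would then color only the spans whose group is number (or string), and leave identifier spans alone.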

Related

Read input with different datatypes and space separation

I'm trying to figure out how to write code to let the user input three values (string, int, int) in one line with space to separate the values.
I thought of doing it with String.Split Method but that only works if all the values have the same datatype.
How can I do it with different datatypes?
For example:
The user might want to input
Hello 23 54
I'm writing a C# console application.
Well the first problem is that you need to decide whether the text the user enters itself can contain spaces. For example, is the following allowed?
Hello World, it's me 08 15
In that case, String.Split will not really be helpful.
What I'd try is using a regular expression. The following may serve as a starting point:
Match m = Regex.Match(input, @"^(?<text>.+) (?<num1>(\+|\-)?\d+) (?<num2>(\+|\-)?\d+)$");
if (m.Success)
{
string stringValue = m.Groups["text"].Value;
int num1 = Convert.ToInt32(m.Groups["num1"].Value);
int num2 = Convert.ToInt32(m.Groups["num2"].Value);
}
BTW: The following part of your question makes me frown:
I thought of doing it with String.Split Method but that only works if all the values have the same datatype.
A string is always just a string. Whether it contains a text, your email-address or your bank account balance. It is always just a series of characters. The notion that the string contains a number is just your interpretation!
So from a program's point of view, the string you gave is a series of characters. And for splitting that it doesn't matter at all what the real semantics of the content are.
That's why the splitting part is separate from the conversion part. You need to tell your application that the first part is a string, while the second and third parts are supposed to be numbers. That's what you need type conversions for.
You are confusing things. A string is either null, empty or contains a sequence of characters. It never contains other data types. However, it might contain parts that could be interpreted as numbers, dates, colors etc... (but they are still strings). "123" is not an int! It is a string containing a number.
In order to extract these pieces you need to do two things:
1. Split the string into several string parts.
2. Convert the string parts that are supposed to represent whole numbers to the int type (= System.Int32).
string input = "Abc 123 456";
string[] parts = input.Split(); // Whitespace characters are the default separators.
if (parts.Length == 3) {
Console.WriteLine("The text is \"{0}\"", parts[0]);
int n1;
if (Int32.TryParse(parts[1], out n1)) {
Console.WriteLine("The 1st number is {0}", n1);
} else {
Console.WriteLine("The second part is supposed to be a whole number.");
}
int n2;
if (Int32.TryParse(parts[2], out n2)) {
Console.WriteLine("The 2nd number is {0}", n2);
} else {
Console.WriteLine("The third part is supposed to be a whole number.");
}
} else {
Console.WriteLine("You must enter three parts separated by a space.");
}
Read "Hello 23 54" into a string variable, split it on " ", and then convert the parts.
string value = "Hello 23 54";
var listValues = value.Split(' ').ToList();
After that you have to parse each item from listValues to your related types.
Hope it helps. ;)
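The split-then-convert steps described in the answers above can be sketched as follows, in Python rather than C#. The function name parse_line is made up for illustration. Splitting from the right with a limit of two splits is an assumption that goes slightly beyond a plain split: it lets the text part contain spaces, which is the case the first answer raised.

```python
def parse_line(line):
    """Split 'text int int' into its parts and convert the two numbers.

    Splitting from the right (at most two splits) keeps any spaces
    inside the leading text part intact.
    """
    parts = line.rsplit(None, 2)
    if len(parts) != 3:
        raise ValueError("expected a text followed by two whole numbers")
    text, first, second = parts
    # int() raises ValueError when a part is not a whole number.
    return text, int(first), int(second)


print(parse_line("Hello 23 54"))
print(parse_line("Hello World, it's me 08 15"))
```

The splitting works on plain strings; only the int() calls give the last two parts a numeric meaning.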

grouping adjacent similar substrings

I am writing a program in which I want to group adjacent repeated substrings, e.g. ABCABCBC can be compressed as 2ABC1BC or 1ABCA2BC.
Among all the possible options I want to find the resulting string with the minimum length.
Here is the code I have written so far, but it does not do the job. Kindly help me in this regard.
using System;
using System.Collections.Generic;
using System.Linq;
namespace EightPrgram
{
class Program
{
static void Main(string[] args)
{
string input;
Console.WriteLine("Please enter the set of operations: ");
input = Console.ReadLine();
char[] array = input.ToCharArray();
List<string> list = new List<string>();
string temp = "";
string firstTemp = "";
foreach (var x in array)
{
if (temp.Contains(x))
{
firstTemp = temp;
if (list.Contains(firstTemp))
{
list.Add(firstTemp);
}
temp = "";
list.Add(firstTemp);
}
else
{
temp += x;
}
}
/*foreach (var item in list)
{
Console.WriteLine(item);
}*/
Console.ReadLine();
}
}
}
You can do this with recursion. I cannot give you a C# solution, since I do not have a C# compiler here, but the general idea together with a Python solution should do the trick, too.
So you have an input string ABCABCBC, and you want to transform it into an advanced variant of run-length encoding (let's call it advanced RLE).
My idea consists of a general first idea onto which I then apply recursion:
The overall target is to find the shortest representation of the string using advanced RLE, let's create a function shortest_repr(string).
You can divide the string into a prefix and a suffix and then check if the prefix can be found at the beginning of the suffix. For your input example this would be:
(A, BCABCBC)
(AB, CABCBC)
(ABC, ABCBC)
(ABCA, BCBC)
...
This input can be put into a function shorten_prefix, which checks how often the suffix starts with the prefix. For example, for the prefix ABC and the suffix ABCBC, the prefix occurs only once at the beginning of the suffix, making a total of 2 ABC following each other, so we can compact this prefix/suffix combination to the output (2ABC, BC).
This function shorten_prefix will be used on each of the above tuples in a loop.
After using the function shorten_prefix once, there is still a suffix left for most of the string combinations. E.g. in the output (2ABC, BC), there is still the string BC as suffix. So we need to find the shortest representation for this remaining suffix. Luckily, we already have a function for this, shortest_repr, so let's just call it on the remaining suffix.
This image displays how this recursion works (I only expanded one of the nodes after the 3rd level, but in fact all of the orange circles would go through recursion):
We start at the top with a call of shortest_repr on the string ABABB (I selected a shorter sample for the image). Then we split this string at all possible split positions and get a list of prefix/suffix pairs in the second row. On each element of this list we first call the prefix/suffix optimization (shorten_prefix) and retrieve a shortened prefix/suffix combination, which already has the run-length numbers in the prefix (third row). Now, on each suffix, we call our recursive function shortest_repr.
I did not display the upward-direction of the recursion. When a suffix is the empty string, we pass an empty string into shortest_repr. Of course, the shortest representation of the empty string is the empty string, so we can return the empty string immediately.
When the result of the call to shortest_repr was received inside our loop, we just select the shortest string inside the loop and return this.
This is some quickly hacked code that does the trick (here the two functions are named shorten_beginning and find_shortest_repr):
def shorten_beginning(beginning, ending):
    count = 1
    while ending.startswith(beginning):
        count += 1
        ending = ending[len(beginning):]
    return str(count) + beginning, ending


def find_shortest_repr(string):
    possible_variants = []
    if not string:
        return ''
    for i in range(1, len(string) + 1):
        beginning = string[:i]
        ending = string[i:]
        shortened, new_ending = shorten_beginning(beginning, ending)
        shortest_ending = find_shortest_repr(new_ending)
        possible_variants.append(shortened + shortest_ending)
    return min([(len(x), x) for x in possible_variants])[1]


print(find_shortest_repr('ABCABCBC'))
print(find_shortest_repr('ABCABCABCABCBC'))
print(find_shortest_repr('ABCABCBCBCBCBCBC'))
Open issues
I think this approach has the same problem as the recursive Levenshtein distance calculation: it calculates the same suffixes multiple times. So it would be a nice exercise to try to implement this with dynamic programming.
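One possible take on that exercise is to keep the recursion and simply cache the result for each suffix (top-down dynamic programming). This is a sketch, not a tuned implementation: it uses functools.lru_cache from the standard library, and the recursion depth still grows with the length of the input, so it is meant for short strings.

```python
from functools import lru_cache


def shorten_beginning(beginning, ending):
    """Count how often `beginning` repeats at the start of `ending`."""
    count = 1
    while ending.startswith(beginning):
        count += 1
        ending = ending[len(beginning):]
    return str(count) + beginning, ending


@lru_cache(maxsize=None)
def find_shortest_repr(string):
    """Shortest encoding of `string`; each distinct suffix is solved once."""
    if not string:
        return ""
    variants = []
    for i in range(1, len(string) + 1):
        shortened, rest = shorten_beginning(string[:i], string[i:])
        variants.append(shortened + find_shortest_repr(rest))
    # Shortest first; ties are broken alphabetically, as in the code above.
    return min(variants, key=lambda v: (len(v), v))


print(find_shortest_repr("ABCABCBC"))
```

The only functional change from the original is the cache decorator; the splitting and counting logic is unchanged.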
If this is not a school assignment or performance critical part of the code, RegEx might be enough:
string input = "ABCABCBC";
var re = new Regex(@"(.+)\1+|(.+)", RegexOptions.Compiled); // RegexOptions.Compiled is optional and only worthwhile if the regex is reused
string output = re.Replace(input,
m => (m.Length / m.Result("$1$2").Length) + m.Result("$1$2")); // "2ABC1BC" (case sensitive by default)
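For comparison, here is the same regex idea sketched in Python (the function name regex_rle is made up for illustration). It also shows a caveat: the pattern takes the longest repeating unit it can find at each position, so the result is not guaranteed to be the shortest possible encoding.

```python
import re


def regex_rle(text):
    """Encode runs of a repeated unit as <count><unit> using a backreference."""
    def encode(match):
        # Group 1 is the repeated unit; group 2 is a stretch with no repeat.
        unit = match.group(1) or match.group(2)
        return str(len(match.group(0)) // len(unit)) + unit

    return re.sub(r"(.+)\1+|(.+)", encode, text)


print(regex_rle("ABCABCBC"))  # same result as the recursive search
print(regex_rle("AAAA"))      # unit "AA" is found first, so this is not "4A"
```

For the sample input both approaches agree, but for AAAA the regex settles on the unit AA, while the recursive search finds the shorter 4A.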

Complex string compare logic

I need help with some complex (for me anyway, as I am not too experienced) string comparison logic. Basically, I want to validate a string to make sure it matches a format rule. I am using C#, targeting .NET 4.5.2.
I am trying to work with an API which gives me the expected format of the string this way:
1:420+4:9#### (must have “420” starting in position 1 AND have a “9” in position 4 AND have numeric digits in positions 5-8)
2:Z+14:&&+20:10,11,12 (must have a “Z” in position 2 AND have alpha letters in positions 14 and 15 AND have either “10”, “11”, or “12” starting in position 20)
Legend:
":" = position/valuelist separator
"," = value separator
"+" = test separator
"#" = numeric digit-only wildcard
"&" = alpha letter-only wildcard
Given this, my first thought is to do a series of substrings and splits of the input string and then do compare on each section? Or, I could do a for loop and iterate through each character one by one until I hit the end of the length of the input string.
Let's assume in this case that the input string is something like "420987435744585". Using rule number one, I should get a pass on this since the first three are 420, position 4 is a 9 and the next 5-8 are numeric.
So far, I have created a method that returns a bool if I pass/fail validation. The input string is passed in. I then started to split on + or - to get all of the and or not sections and then split on comma to get the groups of rules. But this is where I am stuck. It seems like it should be easy and maybe it is but I just can't seem to wrap my head around it and I am thinking I am going to end up with a ton of arrays, foreach loops, if statements, etc... Just to validate and return true/false if the input string matches my format.
Can somebody please assist and give some guidance?
Thank you!!!!
The best way to handle these conditions would be to use regular expressions (Regex). At first you may find them a bit complicated, but it's worth putting time into learning them so you can handle all types of string patterns in a simple, non-verbose way.
You can start with these tutorials :
http://www.codeproject.com/Articles/9099/The-Minute-Regex-Tutorial
http://www.tutorialspoint.com/csharp/csharp_regular_expressions.htm
And use this one as a reference :
https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
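To make the regex suggestion concrete, here is one way the API's format rule could be translated into a regex automatically. This is a sketch in Python rather than C#, and the function names are made up for illustration. Each test becomes a lookahead that skips to the 1-based position and then tries the comma-separated alternatives. It assumes that the characters +, : and , never appear as literal values in a rule.

```python
import re

# '#' stands for one digit, '&' for one letter (see the legend in the question).
WILDCARDS = {"#": "[0-9]", "&": "[A-Za-z]"}


def rule_to_regex(rule):
    """Turn e.g. '1:420+4:9####' into one lookahead per '+'-separated test."""
    lookaheads = []
    for test in rule.split("+"):
        position, alternatives = test.split(":")
        options = "|".join(
            "".join(WILDCARDS.get(ch, re.escape(ch)) for ch in alternative)
            for alternative in alternatives.split(",")
        )
        # Skip (position - 1) characters, then require one of the options.
        lookaheads.append("(?=.{%d}(?:%s))" % (int(position) - 1, options))
    return re.compile("".join(lookaheads), re.DOTALL)


def matches_rule(rule, value):
    return rule_to_regex(rule).match(value) is not None


print(matches_rule("1:420+4:9####", "420987435744585"))
```

Because every test is a lookahead anchored at the start of the value, all tests must hold at once, which corresponds to the AND in the rule descriptions.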
I think the best way is a custom function, it will be faster than RegEx, and it would be a lot of manual work to convert that format to RegEx.
I've made a start at the validation function, and it's testing ok for the samples you provided.
Here is the code:
static bool CheckFormat(string formatString, string value)
{
    // Every '+'-separated test must pass.
    foreach (string test in formatString.Split('+'))
    {
        string[] testElement = test.Split(':');
        // Positions in the format string are 1-based.
        int startIndex = int.Parse(testElement[0]) - 1;
        // A test passes if any one of its ','-separated values matches.
        bool anyMatch = false;
        foreach (string patternElement in testElement[1].Split(','))
        {
            if (MatchesAt(patternElement, value, startIndex))
            {
                anyMatch = true;
                break;
            }
        }
        if (!anyMatch)
            return false;
    }
    return true;
}

static bool MatchesAt(string pattern, string value, int startIndex)
{
    // Value string not long enough, so fail.
    if (startIndex < 0 || startIndex + pattern.Length > value.Length)
        return false;
    for (int i = 0; i < pattern.Length; i++)
    {
        char c = value[startIndex + i];
        switch (pattern[i])
        {
            case '#': // numeric digit-only wildcard
                if (!Char.IsDigit(c))
                    return false;
                break;
            case '&': // alpha letter-only wildcard
                if (!Char.IsLetter(c))
                    return false;
                break;
            default: // literal character
                if (pattern[i] != c)
                    return false;
                break;
        }
    }
    return true;
}
The dotnet fiddle is here if you want to play with it: https://dotnetfiddle.net/52olLQ.
Good luck.

Match string with hex string

I'm converting some of my code from C++ and wanted to take advantage of Regex for a scenario in my program. The user story says that the string needs to be 3 sets of hex numbers between 4 tags (however, these tags don't have end tags, sigh). The 4 tags to be used are <DIV>, <GKY>, <UID>, and <END>. I like to give my users a little more flexibility in their input if they so desire, so I was hoping for a simple regex that I could write a simple method around. I found the pattern I wanted for matching a hex string (I think, at least), but I can't get my regex test tool to match it with a tag behind it. Take this string for example.
<DIV>A9F81123C8288B34758D0481E8271843<GKY><UID><END>
I wouldn't mind if the regex returned <DIV>A9... or if it returned just the hex string, but I would want it to be able to return it in all 3 of these scenarios:
<DIV>A9F81123C8288B34758D0481E8271843<GKY><UID><END>
<GKY><DIV>A9F81123C8288B34758D0481E8271843<UID><END>
<GKY><UID><DIV>A9F81123C8288B34758D0481E8271843<END>
A full key example would look something like this:
<DIV>A9F81123C8288B34758D0481E8271843<GKY>1234568790ABCDEF0<UID>0422ABCDEF<END>
So far, all my unit test does is check that the string contains the 4 tags, so I'm stuck right here:
public static KeyInputParser ParseKeyInputString(string inputKey)
{
if (string.IsNullOrEmpty(inputKey)) throw new ArgumentNullException("inputKey", "Input Key can't be null or empty");
inputKey = inputKey.ToUpper();
var key = new KeyInputParser();
AssertKeyContainsTheseTags(inputKey, "<DIV>", "<GKY>", "<UID>", "<END>");
//DIV must always be 16 bytes
string div = Regex.Match(inputKey, @"<DIV>^([A-Fa-f0-9]{2}){16}$").Value;
//UID can be 5, 7, or 10 bytes
//not sure on GKY but it must be more than 1 byte
return key;
}
div is returning empty
If you do not really care about tags themselves, you can try this:
(?<=>)[A-Fa-f0-9]+(?=<)
It correctly matches all your test cases, see it in action on Rubular.
If you want the preceding tag as well, this is ok (preview here):
(?<tag><\w+>)(?<string>[A-Fa-f0-9]+)(?=<)
string div = Regex.Match(inputKey, @"<DIV>([A-Fa-f0-9]{32})").Value;
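The pattern in the question returns nothing because ^ only matches at the start of the input, so it can never match after <DIV> has already been consumed, and $ requires the hex digits to run to the end of the input. A small Python sketch of the difference (the variable names are made up for illustration):

```python
import re

key = "<DIV>A9F81123C8288B34758D0481E8271843<GKY><UID><END>"

# '^' in the middle of the pattern cannot match once "<DIV>" is consumed.
anchored = re.search(r"<DIV>^([A-Fa-f0-9]{2}){16}$", key)

# Without the anchors, exactly 32 hex digits (16 bytes) after the tag match.
unanchored = re.search(r"<DIV>([A-Fa-f0-9]{32})", key)
div_value = unanchored.group(1)

print(anchored)
print(div_value)
```

In the C# one-liner, Match.Value holds the whole match including the <DIV> tag; Groups[1].Value would give only the hex digits.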
It should work for you:
^((?<gdiv><DIV>[A-Fa-f0-9]*)|(?<ggky><GKY>[A-Fa-f0-9]*)|(?<guid><UID>[A-Fa-f0-9]*))*<END>$
Tests:
input: <DIV>A9F81123C8288B34758D0481E8271843<GKY><UID><END>
matches: gdiv <DIV>A9F81123C8288B34758D0481E8271843
ggky <GKY>
guid <UID>
input: <GKY><DIV>A9F81123C8288B34758D0481E8271843<UID><END>
matches: gdiv <DIV>A9F81123C8288B34758D0481E8271843
ggky <GKY>
guid <UID>
input: <GKY><UID><DIV>A9F81123C8288B34758D0481E8271843<END>
matches: gdiv <DIV>A9F81123C8288B34758D0481E8271843
ggky <GKY>
guid <UID>
input: <UID>0422ABCDEF<DIV>A9F81123C8288B34758D0481E8271843<GKY>1234568790ABCDEF0<END>
matches: gdiv <DIV>A9F81123C8288B34758D0481E8271843
ggky <GKY>1234568790ABCDEF0
guid <UID>0422ABCDEF
input: <GKY>1234568790ABCDEF0<DIV>A9F81123C8288B34758D0481E8271843<UID>0422ABCDEF<END>
matches: gdiv <DIV>A9F81123C8288B34758D0481E8271843
ggky <GKY>1234568790ABCDEF0
guid <UID>0422ABCDEF
See examples at Rubular.
NOTE:
Because the value of any one of the tags (DIV, GKY, or UID) may be empty, I recommend using [A-Fa-f0-9]* rather than, for example, [A-Fa-f0-9]{16}, and checking the length of each value yourself.
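A sketch of that "match loosely, then check the lengths yourself" approach, in Python rather than C#. The function name parse_key is made up for illustration. The DIV and UID lengths come from the comments in the question's code; the GKY check (more than one byte) is an assumption, since the asker was not sure about it.

```python
import re

FIELD = re.compile(r"<(DIV|GKY|UID)>([0-9A-F]*)")
WHOLE_KEY = re.compile(r"(?:<(?:DIV|GKY|UID)>[0-9A-F]*)*<END>")


def parse_key(input_key):
    """Return {'DIV': ..., 'GKY': ..., 'UID': ...} or raise ValueError."""
    key = input_key.upper()
    if not WHOLE_KEY.fullmatch(key):
        raise ValueError("expected tagged hex values followed by <END>")
    fields = dict(FIELD.findall(key))  # a repeated tag keeps its last value
    if len(fields.get("DIV", "")) != 32:  # 16 bytes
        raise ValueError("DIV must be 16 bytes")
    if len(fields.get("UID", "")) not in (10, 14, 20):  # 5, 7 or 10 bytes
        raise ValueError("UID must be 5, 7 or 10 bytes")
    if len(fields.get("GKY", "")) <= 2:  # assumption: more than 1 byte
        raise ValueError("GKY must be longer than 1 byte")
    return fields


full_key = ("<DIV>A9F81123C8288B34758D0481E8271843"
            "<GKY>1234568790ABCDEF0<UID>0422ABCDEF<END>")
print(parse_key(full_key))
```

Since the tags are collected into a dictionary, their order in the input does not matter.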

Code an elegant way to strip strings

I am using C#, and in one place I get a list of people's names with their email IDs in the format
name(email)\n
I came up with this substring code off the top of my head. I am looking for a more elegant, fast (in terms of access time and the operations it performs), easy-to-remember line of code to do this.
string pattern = "jackal(jackal@gmail.com)";
string email = pattern.Substring(pattern.IndexOf("("), pattern.LastIndexOf(")") - pattern.IndexOf("("));
//extra
string email = pattern.Split('(',')')[1];
I think doing the above would do sequential access to each character until it finds the index of the character. Works ok now since name is short, but would struggle when having a large name ( hope people don't have one)
A dirty hack would be to let Microsoft do it for you.
try
{
new MailAddress(input);
//valid
}
catch (Exception ex)
{
// invalid
}
I hope they would do a better job than a custom reg-ex.
Maintaining a custom reg-ex that takes care of everything might involve some effort.
Refer: MailAddress
Your format is actually very close to some supported formats.
Text within () is treated as a comment, but if you replace ( with < and ) with >, you get a supported format.
The second parameter in Substring() is the length of the string to take, not the ending index.
Your code should read:
string pattern = "jackal(jackal@gmail.com)";
int start = pattern.IndexOf("(") + 1;
int end = pattern.LastIndexOf(")");
string email = pattern.Substring(start, end - start);
Alternatively, have a look at Regular Expression to find a string included between two characters while EXCLUDING the delimiters
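The index arithmetic from the corrected code can be sketched in Python as follows (the function name extract_email is made up for illustration). The slice takes a start index and an end index, so it begins one character after the opening parenthesis and stops just before the last closing one.

```python
def extract_email(entry):
    """Return the text between the first '(' and the last ')'."""
    start = entry.index("(") + 1  # first character after '('
    end = entry.rindex(")")       # position of the last ')'
    return entry[start:end]


print(extract_email("jackal(jackal@gmail.com)"))
```

Both searches scan the string once from one end, so even for a long name the cost stays proportional to the length of a single line.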
