Can you improve this 'lines of code algorithm' in F#?

I've written a little script to iterate across files in folders to count lines of code.
The heart of the script is this function to count lines of whitespace, comments, and code. (Note that for the moment it is tailored to C# and doesn't know about multi-line comments).
It just doesn't look very nice to me - has anyone got a cleaner version?
// from list of strings return tuple with count of (whitespace, comments, code)
let loc (arr:List<string>) =
    let innerloc (whitesp, comment, code) (l:string) =
        let s = l.Trim([|' ';'\t'|]) // remove leading and trailing spaces and tabs
        match s with
        | "" -> (whitesp + 1, comment, code) //blank lines
        | "{" -> (whitesp + 1, comment, code) //opening blocks
        | "}" -> (whitesp + 1, comment, code) //closing blocks
        | _ when s.StartsWith("#") -> (whitesp + 1, comment, code) //regions
        | _ when s.StartsWith("//") -> (whitesp, comment + 1, code) //comments
        | _ -> (whitesp, comment, code + 1)
    // List.fold is the current name of the older List.fold_left
    List.fold innerloc (0,0,0) arr

I think what you have is fine, but here's some variety to mix it up. (This solution repeats your problem of ignoring trailing whitespace.)
type Line =
    | Whitespace = 0
    | Comment = 1
    | Code = 2

let Classify (l:string) =
    let s = l.TrimStart([|' ';'\t'|])
    match s with
    | "" | "{" | "}" -> Line.Whitespace
    | _ when s.StartsWith("#") -> Line.Whitespace
    | _ when s.StartsWith("//") -> Line.Comment
    | _ -> Line.Code

let Loc (arr:list<_>) =
    let sums = Array.create 3 0
    arr
    |> List.iter (fun line ->
        let i = Classify line |> int
        sums.[i] <- sums.[i] + 1)
    sums
"Classify" as a separate entity might be useful in another context.

A better site for this might be refactormycode - it's tailored exactly for these questions.

Can't see much wrong with that other than the fact you will count a single brace with trailing spaces as code instead of whitespace.
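To make the trailing-whitespace point concrete, here is a minimal sketch of the same classification in Python (not the F# from the thread). It assumes lines are stripped on both ends, so a brace followed by spaces still counts as whitespace. The names classify and loc are made up for this illustration.

```python
from collections import Counter


def classify(line: str) -> str:
    """Classify one source line as 'whitespace', 'comment', or 'code'."""
    s = line.strip()  # strip both ends, so "{   " is still a lone brace
    if s in ("", "{", "}") or s.startswith("#"):
        return "whitespace"
    if s.startswith("//"):
        return "comment"
    return "code"


def loc(lines):
    """Return (whitespace, comment, code) counts for an iterable of lines."""
    counts = Counter(classify(line) for line in lines)
    return counts["whitespace"], counts["comment"], counts["code"]


sample = ["// header", "class A", "{   ", "    int x; // field", "", "#region r", "}"]
print(loc(sample))  # (4, 1, 2)
```

Like the original, this treats a line with code followed by a trailing // comment as code, and it knows nothing about multi-line comments.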

Related

How to extract specific value from a string with Regex? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am new to Regex and I want to extract a specific value from a string. I have strings like:
"20098: Blue Quest"
"95: Internal Comp"
"33: ICE"
and so on. Every string has the same pattern: a number followed by ":" followed by a space and random text. I want to get the numbers at the start, e.g. "20098", "95", "33", etc.
I tried
Regex ex = new Regex(@"[0-9]+\: [a-zA-Z]$");
This is not giving me any solution. Where am I going wrong?
(I am using C#)
This is a totally silly solution. However, I decided to benchmark an unchecked pointer version against the regex and int parse solutions in the other answers here.
You mentioned the strings are always in the same format, so I decided to see how fast we could get it.
Yehaa
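public unsafe static int? FindInt(string val)
{
    var result = 0;
    fixed (char* p = val)
    {
        for (var i = 0; i < val.Length; i++)
        {
            // index the pointer so each iteration reads the next character
            if (p[i] == ':') return result;
            result = result * 10 + p[i] - 48; // 48 is the character code of '0'
        }
        return null;
    }
}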
I ran each test 50 times, at 100,000 and at 1,000,000 comparisons, covering Lee Gunn's Int.Parse version, The fourth bird's ^\d+(?=: [A-Z]), a plain ^\d+, and my pointer version.
Results
Test Framework : .NET Framework 4.7.1
Scale : 100000
Name                  | Time       | Delta     | Deviation | Cycles
----------------------------------------------------------------------------
Pointers              |   2.597 ms |  0.144 ms |      0.19 |     8,836,015
Int.Parse             |  17.111 ms |  1.009 ms |      2.91 |    57,167,918
Regex ^\d+            |  85.564 ms | 10.957 ms |      6.14 |   290,724,120
Regex ^\d+(?=: [A-Z]) |  98.912 ms |  1.508 ms |      7.16 |   336,716,453
Scale : 1000000
Name                  | Time       | Delta     | Deviation | Cycles
----------------------------------------------------------------------------
Pointers              |  25.968 ms |  1.150 ms |      1.15 |    88,395,856
Int.Parse             | 143.382 ms |  2.536 ms |      2.62 |   487,929,382
Regex ^\d+            | 847.109 ms | 14.375 ms |     21.92 | 2,880,964,856
Regex ^\d+(?=: [A-Z]) | 950.591 ms |  6.281 ms |     20.38 | 3,235,489,411
Not surprisingly regex sucks
If they are all separate strings, you don't need a regex; you can simply use:
var s = "20098: Blue Quest";
var index = s.IndexOf(':');
if (index > 0)
{
    if (int.TryParse(s.Substring(0, index), out var number))
    {
        // Do stuff
    }
}
If they're all contained in one string, you can loop over each line and perform the Substring. This is perhaps a bit easier to read, as a lot of people aren't comfortable with regular expressions.
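The same no-regex idea, sketched in Python rather than C#: split at the first colon and accept the prefix only if it consists of ASCII digits. The helper name parse_prefix is hypothetical.

```python
def parse_prefix(text):
    """Return the integer before the first ':' or None if there is no such prefix."""
    head, sep, _rest = text.partition(":")
    # sep is empty when no ':' was found; isdigit() is False for an empty head
    if not sep or not (head.isascii() and head.isdigit()):
        return None
    return int(head)


# One string holding several lines: loop over the lines and parse each prefix.
blob = "20098: Blue Quest\n95: Internal Comp\n33: ICE"
print([parse_prefix(line) for line in blob.splitlines()])  # [20098, 95, 33]
```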
In your regex [0-9]+\: [a-zA-Z]$ you match one or more digits followed by a colon, a space, and then a single lowercase or uppercase character at the end of the string.
That would match 20098: B and would not match the digits only.
As suggested, there are better alternatives than a regex, but you could match from the beginning of the string ^ one or more digits \d+ and use a positive lookahead (?=...) to assert that what follows is a colon, a space, and an uppercase character [A-Z]:
^\d+(?=: [A-Z])
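As a quick check of that pattern, here is a small sketch using Python's re module instead of .NET; the pattern text is the same. The names LEADING_ID and leading_number are invented for the example.

```python
import re

# Leading digits, accepted only when followed by ": " and an uppercase letter.
# The lookahead checks the suffix without including it in the match.
LEADING_ID = re.compile(r"^\d+(?=: [A-Z])")


def leading_number(text):
    """Return the leading number as a string, or None when the pattern fails."""
    m = LEADING_ID.match(text)
    return m.group(0) if m else None


for s in ["20098: Blue Quest", "95: Internal Comp", "33: ICE", "7: lowercase"]:
    print(repr(s), "->", leading_number(s))
```

The last sample yields None because the lookahead requires an uppercase letter after the space.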
Firstly, after the colon, you should use \s instead of a literal space. Also, if the text after the colon can include spaces, the character class should also allow \s and have a + after it.
[0-9]+\:\s[a-zA-Z\s]+$
Secondly, that entire regex will return the entire string. If you only want the first number, then the regex would be simply:
[0-9]+
You can use look-behind ?<= to find any number following ^" (where ^ is the beginning of line):
(?<=^")[0-9]+

ReSharper Formatting Ternary Operator in C#

This is driving me up the wall now. The ternary formatting options in ReSharper -> Options -> C# do not cover indentation, just spacing of '?' and ':' characters, and line chopping.
What I want is:
var x = expr1
? expr2
: expr3;
But what I get is:
var x = expr1
? expr2
: expr3;
If the ternary operator formatting was offering no assistance, I thought that the Chained binary expressions may help, but no. That is set as follows.
var a = someOperand + operand2
+ operand3
+ operand4;
Any ideas?
Try enabling ReSharper | Options | Code Editing | C# | Formatting Style | Other | Align Multiline Constructs | Expression

X and Y Axis Indices in List<string> for Roguelike

After analyzing a snippet of code from this link (the C# portion), I tried doing this on my own for some practice.
However, I'm confused about how the portion below translates to an X,Y index in the string list, and why the if() statement has the Y index before the X.
if (Map[playerY][playerX] == ' ')
Here's what the list looks like:
List<string> Map = new List<string>()
{
    "##########",
    "# #",
    "# > #",
    "# > #",
    "# #",
    "##########"
};
Any help would be appreciated, thank you in advance!
The first [ ] picks one string from the array. The second [ ] picks a character from the string.
Because strings can be indexed like arrays, an indexer call such as myString[n] returns the character at position n.
So when you are trying to get the character the player is on, you get the Y coordinate by indexing the array of strings, because the first string in the array is the top row of the map.
Y |
------------------
0 | ##########
1 | # #
2 | # > #
3 | # > #
4 | # #
5 | ##########
We then pick out the X by matching it to the character at the X position in the string:
X | 0123456789
------------------
| ##########
| # #
| # > #
| # > #
| # #
| ##########
So Map[Y][X] will get the appropriate character.
The Y index selects which string, as you would expect from a List. The X index actually picks a character from that string. This wouldn't work on a List of, say, ints, because this example is actually using the [] operator on the List and then using it again on the String the List returns.
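A tiny sketch of the row-then-column lookup, written in Python rather than C#. The map below is a made-up 10-wide example, not the exact layout from the question, and tile_at is a hypothetical helper.

```python
game_map = [
    "##########",
    "#        #",
    "#   >    #",
    "#        #",
    "##########",
]


def tile_at(grid, x, y):
    """The first index (y) picks the row string; the second (x) picks the character in it."""
    row = grid[y]
    return row[x]


print(tile_at(game_map, 4, 2))  # prints >
```

Swapping the two indexes would read column y of row x, which is why the condition is written Map[playerY][playerX].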

How does this regex find triangular numbers?

Part of a series of educational regex articles, this is a gentle introduction to the concept of nested references.
The first few triangular numbers are:
1 = 1
3 = 1 + 2
6 = 1 + 2 + 3
10 = 1 + 2 + 3 + 4
15 = 1 + 2 + 3 + 4 + 5
There are many ways to check if a number is triangular. There's this interesting technique that uses regular expressions as follows:
Given n, we first create a string of length n filled with the same character
We then match this string against the pattern ^(\1.|^.)+$
n is triangular if and only if this pattern matches the string
Here are some snippets to show that this works in several languages:
PHP (on ideone.com)
$r = '/^(\1.|^.)+$/';
foreach (range(0,50) as $n) {
    if (preg_match($r, str_repeat('o', $n))) {
        print("$n ");
    }
}
Java (on ideone.com)
for (int n = 0; n <= 50; n++) {
    String s = new String(new char[n]);
    if (s.matches("(\\1.|^.)+")) {
        System.out.print(n + " ");
    }
}
C# (on ideone.com)
Regex r = new Regex(@"^(\1.|^.)+$");
for (int n = 0; n <= 50; n++) {
    if (r.IsMatch("".PadLeft(n))) {
        Console.Write("{0} ", n);
    }
}
So this regex seems to work, but can someone explain how?
Similar questions
How to determine if a number is a prime with regex?
Explanation
Here's a schematic breakdown of the pattern:
from beginning…
|         …to end
|         |
^(\1.|^.)+$
 \______/|___match
  group 1    one-or-more times
The (…) brackets define capturing group 1, and this group is matched repeatedly with +. This subpattern is anchored with ^ and $ to see if it can match the entire string.
Group 1 tries to match this|that alternates:
\1., that is, what group 1 matched (self reference!), plus one of "any" character,
or ^., that is, just "any" one character at the beginning
Note that in group 1, we have a reference to what group 1 matched! This is a nested/self reference, and is the main idea introduced in this example. Keep in mind that when a capturing group is repeated, generally it only keeps the last capture, so the self reference in this case essentially says:
"Try to match what I matched last time, plus one more. That's what I'll match this time."
Similar to a recursion, there has to be a "base case" with self references. At the first iteration of the +, group 1 had not captured anything yet (which is NOT the same as saying that it starts off with an empty string). Hence the second alternation is introduced, as a way to "initialize" group 1, which is that it's allowed to capture one character when it's at the beginning of the string.
So as it is repeated with +, group 1 first tries to match 1 character, then 2, then 3, then 4, etc. The sum of these numbers is a triangular number.
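The growth of group 1 can be traced without a regex engine. The sketch below is in Python, whose re module rejects a reference to a group that is still open, so it imitates the matching with a loop instead of running the pattern. The function name is hypothetical.

```python
def triangular_by_growing_chunks(n):
    """Imitate ^(\\1.|^.)+$ on a string of n identical characters.

    Returns k when n == 1 + 2 + ... + k, otherwise None.
    """
    consumed = 0  # characters matched so far
    last = 0      # length of group 1's most recent capture
    while consumed < n:
        last += 1         # "what I matched last time, plus one more"
        consumed += last  # first pass takes 1 char (the ^. branch), then 2, 3, ...
    # The + needs at least one pass, so the empty string (n == 0) never matches.
    return last if n > 0 and consumed == n else None


print([n for n in range(51) if triangular_by_growing_chunks(n)])
# [1, 3, 6, 10, 15, 21, 28, 36, 45]
```

The loop has no choices to make because ^. can only succeed at the very start, so every later repetition is forced to take the \1. branch.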
Further explorations
Note that for simplification, we used strings that consist of the same repeating character as our input. Now that we know how this pattern works, we can see that this pattern can also match strings like "1121231234", "aababc", etc.
Note also that if we find that n is a triangular number, i.e. n = 1 + 2 + … + k, the length of the string captured by group 1 at the end will be k.
Both of these points are shown in the following C# snippet (also seen on ideone.com):
Regex r = new Regex(@"^(\1.|^.)+$");
Console.WriteLine(r.IsMatch("aababc")); // True
Console.WriteLine(r.IsMatch("1121231234")); // True
Console.WriteLine(r.IsMatch("iLoveRegEx")); // False
for (int n = 0; n <= 50; n++) {
    Match m = r.Match("".PadLeft(n));
    if (m.Success) {
        Console.WriteLine("{0} = sum(1..{1})", n, m.Groups[1].Length);
    }
}
// 1 = sum(1..1)
// 3 = sum(1..2)
// 6 = sum(1..3)
// 10 = sum(1..4)
// 15 = sum(1..5)
// 21 = sum(1..6)
// 28 = sum(1..7)
// 36 = sum(1..8)
// 45 = sum(1..9)
Flavor notes
Not all flavors support nested references. Always familiarize yourself with the quirks of the flavor that you're working with (and consequently, it almost always helps to provide this information whenever you're asking regex-related questions).
In most flavors, the standard regex matching mechanism tries to see if a pattern can match any part of the input string (possibly, but not necessarily, the entire input). This means that you should remember to always anchor your pattern with ^ and $ whenever necessary.
Java is slightly different in that String.matches, Pattern.matches and Matcher.matches attempt to match a pattern against the entire input string. This is why the anchors can be omitted in the above snippet.
Note that in other contexts, you may need to use \A and \Z anchors instead. For example, in multiline mode, ^ and $ match the beginning and end of each line in the input.
One last thing is that in .NET regex, you CAN actually get all the intermediate captures made by a repeated capturing group. In most flavors, you can't: all intermediate captures are lost and you only get to keep the last.
Related questions
(Java) method matches not work well - with examples on how to do prefix/suffix/infix matching
Is there a regex flavor that allows me to count the number of repetitions matched by * and + (.NET!)
Bonus material: Using regex to find powers of two!!!
With a very slight modification, you can use the same techniques presented here to find powers of two.
Here's the basic mathematical property that you want to take advantage of:
1 = 1
2 = (1) + 1
4 = (1+2) + 1
8 = (1+2+4) + 1
16 = (1+2+4+8) + 1
32 = (1+2+4+8+16) + 1
The solution is given below (but do try to solve it yourself first!!!!)
(see on ideone.com in PHP, Java, and C#):
^(\1\1|^.)*.$
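For readers who want to check the arithmetic behind that pattern, here is a rough Python simulation (again a plain loop, since Python's re does not accept this kind of self reference). It assumes input made of n identical characters; the function name is made up.

```python
def power_of_two_by_doubling(n):
    """Imitate ^(\\1\\1|^.)*.$ on a string of n identical characters."""
    consumed = 0  # characters taken by the repeated group so far
    last = 0      # length of the group's most recent capture
    while True:
        if consumed + 1 == n:  # the trailing '.' takes the one remaining character
            return True
        step = 1 if last == 0 else 2 * last  # ^. on the first pass, \1\1 afterwards
        if consumed + step + 1 > n:          # no room left for another pass plus '.'
            return False
        last = step
        consumed += step


print([n for n in range(40) if power_of_two_by_doubling(n)])  # [1, 2, 4, 8, 16, 32]
```

The group consumes 1 + 2 + 4 + … characters, and the final dot supplies the "+ 1" from the table of sums above.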

Can someone help me compare using F# over C# in this specific example (IP Address expressions)?

So, I am writing code to parse an IP Address expression and turn it into a regular expression that could be run against an IP Address string and return a boolean response. I wrote the code in C# (OO) and it was 110 lines of code. I am trying to compare the amount of code and the expressiveness of C# to F# (I am a C# programmer and a noob at F#). I don't want to post both the C# and F#, just because I don't want to clutter the post. If needed, I will do so.
Anyway, I will give an example. Here is an expression:
192.168.0.250,244-248,108,51,7;127.0.0.1
I would like to take that and turn it into this regular expression:
((192\.168\.0\.(250|244|245|246|247|248|108|51|7))|(127\.0\.0\.1))
Here are some steps I am following:
Operations:
Break by ";"    192.168.0.250,244-248,108,51,7    127.0.0.1
Break by "."    192    168    0    250,244-248,108,51,7
Break by ","    250    244-248    108    51    7
Break by "-"    244    248
I came up with F# that produces the output. I am trying to forward-pipe through my operations listed above, as I think that would be more expressive. Can anyone make this code better? Teach me something :)
open System

let createItemArray (group:bool) (y:char) (items:string[]) =
  [|
    let indexes = items.Length - 1
    let group = indexes > 0 && group
    if group then
      yield "("
    for i in 0 .. indexes do
      yield items.[i].ToString()
      if i < indexes then
        yield y.ToString()
    if group then
      yield ")"
  |]

let breakBy (group:bool) (x:string) (y:char): string[] =
  x.Split(y)
    |> createItemArray group y

let breakItem (x:string) (y:char): string[] = breakBy false x y
let breakGroup (x:string) (y:char): string[] = breakBy true x y

let AddressExpression address:string =
  let builder = new System.Text.StringBuilder "("
  breakGroup address ';'
    |> Array.collect (fun octet -> breakItem octet '.')
    |> Array.collect (fun options -> breakGroup options ',')
    |> Array.collect (fun (ranges : string) ->
      match (breakGroup ranges '-') with
      | x when x.Length > 3
        -> match (Int32.TryParse(x.[1]), Int32.TryParse(x.[3])) with
            | ((true, a) ,(true, b))
                -> [|a .. b|]
                    |> Array.map (int >> string)
                    |> createItemArray false '-'
            | _ -> [|ranges|]
      | _ -> [|ranges|]
    )
    |> Array.iter (fun item ->
      match item with
      | ";" -> builder.Append ")|("
      | "." -> builder.Append "\."
      | "," | "-" -> builder.Append "|"
      | _ -> builder.Append item
      |> ignore
    )
  builder.Append(")").ToString()

let address = "192.168.0.250,244-248,108,51,7;127.0.0.1"
AddressExpression address
Here's mine in 63 lines of F# (including the one test case); it worked the first time, and feels pretty readable to me. It's a typical parser-followed-by-pretty-printer. What do we think?
type IPs = IP[]
and IP = IP of OrParts * OrParts * OrParts * OrParts
and OrParts = Or of Part[]
and Part = Num of int | Range of int * int

let Valid(x) = if x < 0 || x > 255 then failwithf "Invalid number %d" x

let rec parseIPs (s:string) =
    s.Split [|';'|] |> Array.map parseIP
and parseIP s =
    let [|a;b;c;d|] = s.Split [|'.'|]
    IP(parseOrParts a, parseOrParts b, parseOrParts c, parseOrParts d)
and parseOrParts s =
    Or(s.Split [|','|] |> Array.map parsePart)
and parsePart s =
    if s.Contains("-") then
        let [|a;b|] = s.Split [|'-'|]
        let x,y = int a, int b
        Valid(x)
        Valid(y)
        if x > y then failwithf "Invalid range %d-%d" x y
        Range(x, y)
    else
        let x = int s
        Valid(x)
        Num(x)

let rec printIPsAsRegex ips =
    let sb = new System.Text.StringBuilder()
    let add s = sb.Append(s:string) |> ignore
    add "("
    add(System.String.Join("|", ips |> Array.map printIPAsRegex))
    add ")"
    sb.ToString()
and printIPAsRegex (IP(a, b, c, d)) : string =
    let sb = new System.Text.StringBuilder()
    let add s = sb.Append(s:string) |> ignore
    add "("
    printPartsAsRegex add a
    add "."
    printPartsAsRegex add b
    add "."
    printPartsAsRegex add c
    add "."
    printPartsAsRegex add d
    add ")"
    sb.ToString()
and printPartsAsRegex add (Or(parts)) =
    match parts with
    | [| Num x |] -> // exactly one Num
        add(string x)
    | _ ->
        add "("
        add(System.String.Join("|", parts |> Array.collect (function
                                                  | Num x -> [| x |]
                                                  | Range(x,y) -> [| x..y |])
                                          |> Array.map (fun x -> x.ToString())))
        add ")"

let Main() =
    let ips = parseIPs "192.168.0.250,244-248,108,51,7;127.0.0.1"
    printfn "%s" (printIPsAsRegex ips)

Main()
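For comparison, the four "break by" steps from the question can be sketched compactly in Python. This is not a translation of either F# version, only an illustration of the nesting of splits. It assumes well-formed input and does no range validation, and all function names are hypothetical.

```python
import re


def expand_part(part):
    """'244-248' -> ['244', ..., '248']; '108' -> ['108']."""
    if "-" in part:
        low, high = (int(p) for p in part.split("-"))
        return [str(v) for v in range(low, high + 1)]
    return [part]


def octet_regex(octet):
    """Break by ',' then by '-'; group the alternatives only when there are several."""
    options = [v for part in octet.split(",") for v in expand_part(part)]
    return options[0] if len(options) == 1 else "(" + "|".join(options) + ")"


def address_regex(expression):
    """Break by ';' into addresses and by '.' into octets, escaping the dots."""
    addresses = (
        "(" + r"\.".join(octet_regex(o) for o in addr.split(".")) + ")"
        for addr in expression.split(";")
    )
    return "(" + "|".join(addresses) + ")"


pattern = address_regex("192.168.0.250,244-248,108,51,7;127.0.0.1")
print(pattern)
# ((192\.168\.0\.(250|244|245|246|247|248|108|51|7))|(127\.0\.0\.1))
print(bool(re.fullmatch(pattern, "192.168.0.246")))  # True
```

Each function handles one level of the expression, which mirrors the parser-followed-by-printer split in the answer above without the intermediate types.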
