Linq Fuzzy Search


I am filtering results from the database based on the Pickup Location & Drop Location.
My Database contains values like:
Pickup Location: San Jose 95002, San Jose 95112, San Jose 95119, etc.
Drop Location: SFO, SJC, Castro Valley
My search strings are:
Pickup Search String: 195, San Jose, California, 95119
Drop Search String: 56, Castro Valley Boulevard, Castro Valley, California
I am splitting the search string and looking for the best match using "Contains" in my Linq query. To save time, I break out of the loop as soon as a match is found, which returns undesirable results. Is there a better way to do this?
Pickup Address: 195|San Jose|California|95119
Destination Address: 56|Castro Valley Boulevard|Castro Valley|California
String startLocation = PickupAddress.Replace("\"", "").Replace("/", "");
string[] locationList = startLocation.Split('|');
var rateData = new Rate();
foreach (var location in locationList)
{
String startLocationData = location.Trim();
rateData = (from p in _context.Rates
where (p.StartLocation.Contains(startLocationData)
&& p.EndLocation.Contains(DestinationAddress))
&& p.VehicleCategoryID == VehicleID
select p).FirstOrDefault();
if (rateData != null)
{
break;
}
}
if (rateData != null)
{
String amount = Convert.ToString(rateData.Amount).Replace('$', ' ');
String plusRate = Convert.ToString(rateData.PlusRate).Replace('$', ' ');
String fee = Convert.ToString(rateData.QwykrFee).Replace('$', ' ');
Decimal retVal = Convert.ToDecimal(amount) + Convert.ToDecimal(plusRate) + Convert.ToDecimal(fee);
return retVal;
}
else
{
return 0;
}

https://nugetmusthaves.com/Tag/fuzzy
https://github.com/DanHarltey/Fastenshtein
The fastest .Net Levenshtein around.
Fastenshtein is an optimized and unit tested Levenshtein implementation. It is optimized for speed and memory usage.
Its included benchmarking tests compare random words of 3 to 20 random chars against other NuGet Levenshtein implementations.
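As a rough illustration of how an edit-distance score could replace the "break on first Contains match" loop, here is a minimal sketch. It does not use the Fastenshtein API; it carries its own small Levenshtein function, and the `LocationMatcher` class, its method names, and the scoring rule (sum of each candidate word's best distance to any search word) are assumptions made for this example. It also assumes the candidate locations have already been loaded into memory, since a LINQ provider is unlikely to translate this scoring into SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question or of Fastenshtein.
public static class LocationMatcher
{
    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // the finished row becomes "prev"
        }
        return prev[b.Length];
    }

    // Lower-case the text and split it on the separators used in the question.
    private static string[] Tokenize(string s) =>
        s.ToLowerInvariant().Split(new[] { '|', ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Assumed scoring rule: for each word of the candidate, take its smallest
    // distance to any search word, then add those up. Lower is better.
    private static int Cost(string candidate, string[] searchWords) =>
        Tokenize(candidate).Sum(word => searchWords.Min(sw => Levenshtein(word, sw)));

    // Scores every candidate instead of stopping at the first hit.
    // Returns null when there is nothing to compare.
    public static string BestMatch(string search, IEnumerable<string> candidates)
    {
        var searchWords = Tokenize(search);
        if (searchWords.Length == 0) return null;

        return candidates
            .Select(c => new { Candidate = c, Cost = Cost(c, searchWords) })
            .OrderBy(x => x.Cost)
            .Select(x => x.Candidate)
            .FirstOrDefault();
    }
}
```

With the data from the question, `LocationMatcher.BestMatch("195|San Jose|California|95119", new[] { "San Jose 95002", "San Jose 95112", "San Jose 95119" })` picks "San Jose 95119", because every word of that candidate appears in the search string. The scoring favors short candidates, so a real implementation would probably want to normalize the cost or set a cut-off.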

Related

C# Regex Split but include empty string if fails to split

I am trying to split a string into an array of strings.
My current string looks like this and this is all in one string. It also has newlines (\r\n) and spaces. I put a better-looking example here.
BFFPPB14 Dark Chocolate Dried Cherries 14 oz (397g)
INGREDIENTS: DARK CHOCOLATE (SUGAR, CHOCOLATE LIQUOR, COCOA BUTTER,
ANHYDROUS MILK FAT, SOYA LECITHIN, VANILLIN [AN ARTIFICIAL FLAVOR]), DRIED
TART CHERRIES (CHERRIES, SUGAR), GUM ARABIC, CONFECTIONER'S GLAZE.
CONTAINS: MILK, SOY
ALLERGEN INFORMATION: MAY CONTAIN TREE NUTS, PEANUTS, EGG AND
WHEAT.
01/11/2019
Description: Sweetened dried Montmorency cherries that are panned with dark chocolate.
Storage Conditions: Store at ambient temperatures with a humidity less than 50%.
Shelf Life: 9 months
Company Name
Item No.: 701804
Bulk: 415265
Supplier: Cherryland's Best
WARNING: CHERRIES MAY CONTAIN PITS
My Regex looks like this
List<string> result = Regex.Split(text, @"INGREDIENTS: |CONTAINS: |ALLERGEN INFORMATION: |(\d{1,2}/\d{1,2}/\d{2,4})|Description: |Storage Conditions: |Shelf Life: |Company Name|Item No.: |Bulk: |Supplier: |WARNING: ").ToList();
This is what result looks like
Note: The first string is the product name
Sometimes I get strings that don't have a supplier or a warning. I want the split to produce empty strings if it doesn't find that split value.
EX:
result[0] = "blabla"
result[1] = ""
result[2] = "blabla"
That way I know that result 1 was split on the value (INGREDIENTS: ) and I can assign it to something

Using a regex may have performance concerns if you are using this in a high volume application. Below is one possible regex you could use. It is somewhat difficult to parse the product line and the "company name" line since it wasn't clear if the product code had a pattern and the company name line didn't have a ':' like the other fields, so the regex is somewhat "hacky" in those areas:
using System;
using System.Text.RegularExpressions;
using System.Linq;
namespace so20190113_01 {
class Program {
static void Main(string[] args) {
string text =
@"BFFPPB14 Dark Chocolate Dried Cherries 14 oz (397g)
INGREDIENTS: DARK CHOCOLATE (SUGAR, CHOCOLATE LIQUOR, COCOA BUTTER, ANHYDROUS MILK FAT, SOYA LECITHIN, VANILLIN [AN ARTIFICIAL FLAVOR]), DRIED TART CHERRIES (CHERRIES, SUGAR), GUM ARABIC, CONFECTIONER'S GLAZE.
CONTAINS: MILK, SOY
ALLERGEN INFORMATION: MAY CONTAIN TREE NUTS, PEANUTS, EGG AND WHEAT.
01/11/2019
Description: Sweetened dried Montmorency cherries that are panned with dark chocolate.
Storage Conditions: Store at ambient temperatures with a humidity less than 50%. Shelf Life: 9 months
Company Name
Item No.: 701804
Bulk: 415265
Supplier: Cherryland's Best
WARNING: CHERRIES MAY CONTAIN PITS";
string pat =
@"^\s*(?<product>\w+\s+\w+\s+\w*[^:]+)$
|^ingredients:\s*(?<ingredients>.*)$
|^contains:\s*(?<contains>.*)$
|^allergen\s+information:\s*(?<allergen>.*)$
|^(?<date>(\d{1,2}/\d{1,2}/\d{2,4}))$
|^description:\s*(?<description>.*)$
|^storage\sconditions:\s*(?<storage>.*)$
|^shelf\slife:\s*(?<shelf>.*)$
|^company\sname\s*(?<company>.*)$
|^item\sno\.:\s*(?<item>.*)$
|^bulk:\s*(?<bulk>.*)$
|^supplier:\s*(?<supplier>.*)$
|^warning:\s*(?<warning>.*)$
";
Regex r = new Regex(pat, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
// Match the regular expression pattern against a text string.
Match m = r.Match(text); // you might want to use the overload that supports a timeout value
Console.WriteLine("Start---");
while (m.Success) {
foreach (Group g in m.Groups.Where(x => x.Success)) {
switch (g.Name) {
case "product":
Console.WriteLine($"Product({g.Success}): '{g.Value.Trim()}'");
break;
case "ingredients":
Console.WriteLine($"Ingredients({g.Success}): '{g.Value.Trim()}'");
break;
// etc.
}
}
m = m.NextMatch();
}
Console.WriteLine("End---");
}
}
}

I think a parser is the only way. Originally, I tried using this regex:
^([\w \.]+?):([\s\S]+?)(?=((^[\w \.]+?):))
The key component there is the look-ahead ?= which allows the string to match all text from label to label. However, it doesn't work on the final line item since it does not precede another label and I could not find a regex that stops matching at a pattern that may not exist. If that regex exists, you can do it all in one line of code:
KeyValuePair<string, string>[] kvs = null;
//one line of code if the look-ahead would also consider non-existent matches
kvs = Regex.Matches(text, @"^([\w \.]+?):([\s\S]+?)(?=((^[\w \.]+?):))", RegexOptions.Multiline)
.Cast<Match>()
.Select(x => new KeyValuePair<string, string>(x.Groups[1].Value, x.Groups[2].Value.Trim(' ', '\r', '\n', '\t')))
.ToArray();
The code below does it well enough. Note that the document is not formatted consistently: Company Name is not followed by a colon. A label at the start of a line ending in a colon is the only anchor pattern that will work, since the values themselves are broken across new lines.
KeyValuePair<string, string>[] kvs = null;
//Otherwise, you have to write a parser
//get all start indexes of labels
var matches = Regex.Matches(text, @"^.+?:", RegexOptions.Multiline).Cast<Match>().ToArray();
kvs = new KeyValuePair<string, string>[matches.Length];
KeyValuePair<string, string> GetKeyValuePair(Match match1, int match1EndIndex)
{
//get the label
var label = text.Substring(match1.Index, match1.Value.Length - 1);
//get the desc and trim white space
var descStart = match1.Index + match1.Value.Length + 1;
var desc = text
.Substring(descStart, match1EndIndex - descStart)
.Trim(' ', '\r', '\n', '\t');
return new KeyValuePair<string, string>(label, desc);
}
for (int i = 0; i < matches.Length - 1; i++)
{
kvs[i] = GetKeyValuePair(matches[i], matches[i + 1].Index);
}
kvs[kvs.Length - 1] = GetKeyValuePair(matches[matches.Length - 1], text.Length);
foreach (var kv in kvs)
{
Console.WriteLine($"{kv.Key}: {kv.Value}");
}
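One way the look-ahead approach described above might be made to cover the final label is to add `\z` (end of input) as an alternative inside the look-ahead, so the last value runs to the end of the text. The sketch below is only an illustration under that assumption; the `LabelParser` class name is made up for the example, and lines without a colon (the product line, the date, Company Name) are still absorbed into the preceding value, just as with the original pattern.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the look-ahead with an end-of-input alternative.
public static class LabelParser
{
    // Group 1: a label made of word characters, spaces, or dots at the start of a line.
    // Group 2: everything after the colon, matched lazily until the look-ahead
    // sees either the next label or the end of the input (\z).
    private static readonly Regex LabelRegex = new Regex(
        @"^([\w .]+?):([\s\S]+?)(?=^[\w .]+?:|\z)",
        RegexOptions.Multiline);

    public static KeyValuePair<string, string>[] Parse(string text)
    {
        return LabelRegex.Matches(text)
            .Cast<Match>()
            .Select(m => new KeyValuePair<string, string>(
                m.Groups[1].Value,
                m.Groups[2].Value.Trim(' ', '\r', '\n', '\t')))
            .ToArray();
    }
}
```

Calling `LabelParser.Parse` on a text whose last line is `WARNING: PITS` yields a final pair with key "WARNING" and value "PITS", which the original look-ahead would have skipped.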

So if your requirement is:
find a line starting with a specific word
use Linq
use StartsWith
code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApp12
{
class Program
{
public static void Main(string[] args)
{
// test string
var str = @"BFFPPB10 Dark Chocolate Macadamia Nuts 11 oz (312g)\r\nINGREDIENTS: DARK CHOCOLATE (SUGAR, CHOCOLATE, COCOA BUTTER, \r\nANHYDROUS MILK FAT, SOY LECITHIN, VANILLA), MACADAMIA NUTS, SEA SALT.\r\nCONTAINS: MACADAMIA NUTS, MILK, SOY.\r\nALLERGEN INFORMATION: MAY CONTAIN OTHER TREE NUTS, PEANUTS, EGG AND\r\nWHEAT.\r\n01/11/2019\r\nDescription: Dry roasted, salted macadamias covered in dark chocolate.\r\nStorage Conditions: Store at ambient temperatures with a humidity less than 50%. \r\nShelf Life: 12 months\r\nBlain's Farm & Fleet\r\nItem No.: 701772\r\nBulk: 421172\r\nSupplier: Devon's\r\n";
// Keys
const string KEY_INGREDIENTS = "INGREDIENTS:";
const string KEY_CONTAINS = "CONTAINS:";
const string KEY_ALLERGEN_INFORMATION = "ALLERGEN INFORMATION:";
const string KEY_DESCRPTION = "Description:";
const string KEY_STORAGE_CONDITION = "Storage Conditions:";
const string KEY_SHELFLIFE = "Shelf Life:";
const string KEY_ITEM_NO = "Item No.:";
const string KEY_BULK = "Bulk:";
const string KEY_SUPPLIER = "Supplier:";
const string KEY_WARNING = "WARNING:";
const string KEY_YEAR_Regex = @"^\d{1,2}/\d{1,2}/\d{4}$";
const string KEY_AFTER_COMPANY_NAME = KEY_ITEM_NO;
// Helpers
var keys = new string[]
{ KEY_INGREDIENTS, KEY_CONTAINS, KEY_ALLERGEN_INFORMATION, KEY_DESCRPTION, KEY_STORAGE_CONDITION,
KEY_SHELFLIFE, KEY_ITEM_NO, KEY_BULK, KEY_SUPPLIER, KEY_WARNING };
var lines = str.Split(new string[] { @"\r\n" }, StringSplitOptions.RemoveEmptyEntries);
void log(string key, string val)
{
Console.WriteLine($"{key} => {val}");
Console.WriteLine();
}
void removeLine(string line)
{
if (line != null) lines = lines.Where(w => w != line).ToArray();
}
// get Multi Line Item with key
string getMultiLine(string key)
{
var line = lines
.Select((linetxt, index) => new { linetxt, index })
.Where(w => w.linetxt.StartsWith(key))
.FirstOrDefault();
if (line == null) return string.Empty;
var result = line.linetxt;
for (int i = line.index + 1; i < lines.Length; i++)
{
if (!keys.Any(a => lines[i].StartsWith(a)))
result += lines[i];
else
break;
}
return result;
}
// get the single line before a specific key, if that line is not itself a key
string getLinebefore(string the_after_key)
{
var the_after_line = lines
.Select((linetxt, index) => new { linetxt, index })
.Where(w => w.linetxt.StartsWith(the_after_key))
.FirstOrDefault();
if (the_after_line == null) return string.Empty;
var the_before_line_text = lines[the_after_line.index - 1];
//not a key
if (!keys.Any(a => the_before_line_text.StartsWith(a)))
return the_before_line_text;
else
return null;
}
// 1st get item without key
var itemName = lines.FirstOrDefault();
removeLine(itemName);
var year = lines.Where(w => Regex.Match(w, KEY_YEAR_Regex).Success).FirstOrDefault();
removeLine(year);
var companyName = getLinebefore(KEY_AFTER_COMPANY_NAME);
removeLine(companyName);
//2nd get item with Keys
var ingredients = getMultiLine(KEY_INGREDIENTS);
var contanins = getMultiLine(KEY_CONTAINS);
var allergenInfromation = getMultiLine(KEY_ALLERGEN_INFORMATION);
var description = getMultiLine(KEY_DESCRPTION);
var storageConditions = getMultiLine(KEY_STORAGE_CONDITION);
var shelfLife = getMultiLine(KEY_SHELFLIFE);
var itemNo = getMultiLine(KEY_ITEM_NO);
var bulk = getMultiLine(KEY_BULK);
var supplier = getMultiLine(KEY_SUPPLIER);
var warning = getMultiLine(KEY_WARNING);
// 3rd log
log("ItemName", itemName);
log("Ingredients", ingredients);
log("contanins", contanins);
log("Allergen Infromation", allergenInfromation);
log("Year", year);
log("Description", description);
log("Storage Conditions", storageConditions);
log("Shelf Life", shelfLife);
log("CompanyName", companyName);
log("Item No", itemNo);
log("Bulk", bulk);
log("Supplier", supplier);
log("warning", warning);
Console.ReadLine();
}
}
}
will output
ItemName => BFFPPB10 Dark Chocolate Macadamia Nuts 11 oz (312g)
Ingredients => INGREDIENTS: DARK CHOCOLATE (SUGAR, CHOCOLATE, COCOA
BUTTER, ANHYDROUS MILK FAT, SOY LECITHIN, VANILLA), MACADAMIA NUTS,
SEA SALT.
contanins => CONTAINS: MACADAMIA NUTS, MILK, SOY.
Allergen Infromation => ALLERGEN INFORMATION: MAY CONTAIN OTHER TREE
NUTS, PEANUTS, EGG ANDWHEAT.
Year => 01/11/2019
Description => Description: Dry roasted, salted macadamias covered in
dark chocolate.
Storage Conditions => Storage Conditions: Store at ambient
temperatures with a humidity less than 50%.
Shelf Life => Shelf Life: 12 months
CompanyName => Blain's Farm & Fleet
Item No => Item No.: 701772
Bulk => Bulk: 421172
Supplier => Supplier: Devon's
warning =>

How to remove pieces of data from string

I have a text file with multiple entries of this format:
Page: 1 of 1
Report Date: January 15 2018
Mr. Gerald M. Abridge ID #: 0000008 1 Route 81 Mr. Gerald Michael Abridge Pittaburgh PA 15668 SSN: XXX-XX-XXXX
Birthdate: 01/00/1998 Sex: M
COURSE Course Title CRD GRD GRDPT COURSE Course Title CRD GRD GRDPT
FALL 2017 (08/28/2017 to 12/14/2017) CS102F FUND. OF IT & COMPUTING 4.00 A 16.00 CS110 C++ PROGRAMMING I 3.00 A- 11.10 EL102 LANGUAGE AND RHETORIC 3.00 B+ 9.90 MA109 CALC WITH APPLICATIONS I 4.00 A 16.00 SP203 INTERMEDIATE SPANISH I 3.00 A 12.00
EHRS QHRS QPTS GPA Term 17.00 17.00 65.00 3.824 Cum 17.00 17.00 65.00 3.824
Current Program(s): Bachelor of Science in Computer Science
End of official record.
So far, I have read the text file into a string, full. I want to be able to remove first two lines of each of the entries. How would I go about doing this?
Here's the code that I used to read it in:
using (StreamReader sr = new StreamReader(fileName, Encoding.Default))
{
string full = sr.ReadToEnd();
}

If all the lines you want to skip begin with the same strings, you can put those prefixes in a list and then, when you're reading the lines, skip any that begin with one of the prefixes. This will leave you with a list of strings that represent all the file lines that don't begin with one of the specified prefixes:
var filePath = @"f:\public\temp\temp.txt";
var ignorePrefixes = new List<string> {"Page:", "Report Date:"};
var filteredContent = File.ReadAllLines(filePath)
.Where(line => ignorePrefixes.All(prefix => !line.StartsWith(prefix)))
.ToList();
If you want all the content as a single string, you can use String.Join:
var filteredAsString = string.Join(Environment.NewLine, filteredContent);
If Linq isn't your thing, or you don't understand what it's doing, here's the "old school" way of doing the same thing:
List<string> filtered = new List<string>();
foreach (string line in File.ReadLines(filePath))
{
bool okToAdd = true;
foreach (string prefix in ignorePrefixes)
{
if (line.StartsWith(prefix))
{
okToAdd = false;
break;
}
}
if (okToAdd)
{
filtered.Add(line);
}
}

public static IEnumerable<string> ReadReportFile(FileInfo file)
{
var line = String.Empty;
var page = "Page:";
var date = "Report Date:";
using(var reader = File.OpenText(file.FullName))
while((line = reader.ReadLine()) != null)
// yield the line only if it contains neither prefix
if(line.IndexOf(page) == -1 && line.IndexOf(date) == -1)
yield return line;
}
The code is pretty straightforward: for each line that is not null and doesn't contain the page or date prefix, return the line. You could condense it or get fancier, for example by building lookups for your prefixes, but if the requirement is this simple, it should suffice.

Parse and find string in between (English string inside double square bracket) with C#?

Below is a code snippet. I want to find items that start with "[[", are followed by English letters (a-z and A-Z), and end with "]]". What is an efficient way to do this?
string sample_input = "'''அர்காங்கெல்சுக் [[sam]] மாகாணம்''' (''Arkhangelsk Oblast'', {{lang-ru|Арха́нгельская о́бласть}}, ''அர்காங்கெல்சுக்யா ஓபிலாஸ்து'') என்பது [[உருசியா]]வின் [[I am sam]] [[உருசியாவின் கூட்டாட்சி அமைப்புகள்|நடுவண் அலகு]] ஆகும். <ref>{{cite news|author=Goldman, Francisco|date=5 April 2012|title=Camilla Vallejo, the World's Most Glamorous Revolutionary|newspaper=[[The New York Times Magazine]]| url=http://www.nytimes.com/2012/04/08/magazine/camila-vallejo-the-worlds-most-glamorous-revolutionary.html|accessdate=5 April 2013}}</ref>";
List<string> found = new List<string>();
foreach (var item in sample_input.Split(' '))
{
if (item.StartsWith("[[s") || item.StartsWith("[[S") || item.StartsWith("[[a") || item.StartsWith("[[a"))
{
found.Add(item);
}
}
Expected Results: [[Sam]], [[I am Sam]], [[The New York Times Magazine]].

Try this
string sample_input = "'''அர்காங்கெல்சுக் [[sam]] மாகாணம்''' (''Arkhangelsk Oblast'', {{lang-ru|Арха́нгельская о́бласть}}, ''அர்காங்கெல்சுக்யா ஓபிலாஸ்து'') என்பது [[உருசியா]]வின் [[உருசியாவின் கூட்டாட்சி அமைப்புகள்|நடுவண் அலகு]] ஆகும்.";
var regex= new Regex(@"\[\[[a-zA-Z]+\]\]");
var found = regex.Matches(sample_input).OfType<Match>().Select(x=>x.Value).ToList();
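The pattern above only accepts letters between the brackets, so an item such as [[I am sam]] or [[The New York Times Magazine]] from the question would be skipped. A possible variation, sketched here under the assumption that spaces between English words should be allowed, adds the space to the character class. The `WikiLinkFinder` class name is invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: finds [[...]] items whose content is English letters and spaces only.
public static class WikiLinkFinder
{
    // "[[", one English letter, then any number of English letters or spaces, then "]]".
    private static readonly Regex EnglishLink = new Regex(@"\[\[[a-zA-Z][a-zA-Z ]*\]\]");

    public static List<string> Find(string input) =>
        EnglishLink.Matches(input).Cast<Match>().Select(m => m.Value).ToList();
}
```

Items containing other scripts or a `|` separator are not matched, because those characters fall outside the character class.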

How to extract address components from a string?

I have a Xamarin Forms application that uses Xamarin.Mobile on the platforms to get the current location and then ascertain the current address. The address is returned in string format with line breaks.
The address can look like this:
111 Mandurah Tce
Mandurah WA 6210
Australia
or
The Glades
222 Mandurah Tce
Mandurah WA 6210
Australia
I have this code to break it down into the street address (including number), suburb, state and postcode (not very elegant, but it works)
string[] lines = address.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
List<string> addyList = new List<string>(lines);
int count = addyList.Count;
string lineToSplit = addyList.ElementAt(count - 2);
string[] splitLine = lineToSplit.Split(null);
List<string> splitList = new List<string>(splitLine);
string streetAddress = addyList.ElementAt (count - 3).ToString ();
string postCode = splitList.ElementAt(2);
string state = splitList.ElementAt(1);
string suburb = splitList.ElementAt(0);
I would like to extract the street number, and in the previous examples this would be easy, but what is the best way to do it, taking into account the number might be Lot 111 (only need to capture the 111, not the word LOT), or 123A or 8/123 - and sometimes something like 111-113 is also returned
I know that I can use regex and look for every possible combo, but is there an elegant built-in type solution, before I go writing any more messy code (and I know that the above code isn't particularly robust)?

These simple regular expressions will account for many types of address formats, but have you considered all the possible variations, such as:
PO Box 123 suburb state post_code
Unit, Apt, Flat, Villa, Shop X Y street name
7C/94 ALISON ROAD RANDWICK NSW 2031
and that is just to get the number. You will also have to deal with all the possible types of streets such as Lane, Road, Place, Av, Parkway.
Then there are street types such as:
12 Grand Ridge Road suburb_name
This could be interpreted as street = "Grand Ridge" and suburb = "Road suburb_name", as Ridge is also a valid street type.
I have done a lot of work in this area and found that the huge number of valid address patterns meant simple regexes didn't solve the problem on large amounts of data.
I ended up developing this parser http://search.cpan.org/~kimryan/Lingua-EN-AddressParse-1.20/lib/Lingua/EN/AddressParse.pm to solve the problem. It was originally written for Australian addresses so should work well for you.

Regex can capture the parts of a match into groups. Each parentheses () defines a group.
([^\d]*)(\d*)(.*)
For "Lot 222 Mandurah Tce" this returns the following groups
Group 0: "Lot 222 Mandurah Tce" (the input string)
Group 1: "Lot "
Group 2: "222"
Group 3: " Mandurah Tce"
Explanation:
[^\d]* Any number (including 0) of any character except digits.
\d* Any number (including 0) of digits.
.* Any number (including 0) of any character.
string input = "Lot 222 Mandurah Tce";
Match match = Regex.Match(input, @"([^\d]*)(\d*)(.*)");
string beforeNumber = match.Groups[1].Value; // --> "Lot "
string number = match.Groups[2].Value; // --> "222"
string afterNumber = match.Groups[3].Value; // --> " Mandurah Tce"
If a group finds no match, match.Groups[i].Value will be an empty string ("") for that group.
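The grouping pattern above captures plain digits only. For the other shapes mentioned in the question (123A, 8/123, 111-113), one possible extension is sketched below. The pattern and the `StreetNumber` class are assumptions made for illustration, not an exhaustive address grammar: it takes the first run of digits, an optional single letter suffix, and an optional second number joined by "/" or "-".

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: pulls the first street-number-like token out of a street line.
public static class StreetNumber
{
    // digits, optional letter suffix, then optionally "/" or "-" and a second number
    private static readonly Regex NumberRegex =
        new Regex(@"\d+[A-Za-z]?(?:[/-]\d+[A-Za-z]?)?");

    // Returns the matched number, or an empty string when the line has no digits.
    public static string Extract(string streetLine)
    {
        Match m = NumberRegex.Match(streetLine);
        return m.Success ? m.Value : string.Empty;
    }
}
```

For "Lot 111 Mandurah Tce" this returns "111", leaving out the word Lot. It always takes the first number in the line, so an input such as "Unit 3 12 Smith St" would return the unit number rather than the street number.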

You could check if the content starts with a number for each entry in the splitLine.
// streetAddress is the street line from the question's code
string[] splitLine = streetAddress.Split(' ');
var streetNumber = string.Empty;
foreach(var s in splitLine)
{
// Take the first token that starts with a digit
if (Regex.IsMatch(s, @"^\d"))
{
streetNumber = s;
break;
}
}
// Handle an empty streetNumber separately if no token matched
Console.WriteLine("My streetnumber is " + streetNumber);

Yea I think you have to identify what will work.
If:
it is always in the address line and it must always start with a Digit
nothing else in that line can start with a digit (or if something else does you know which always comes in what order, ie the code below will always work if the street number is always first)
you want every contiguous character to the digit that isn't whitespace (the - and / examples suggest that to me)
Then it could be as simple as:
var regx = new Regex(@"(?:\s|^)\d[^\s]*");
var mtch = regx.Match(addressline);
You would sort of have to sift and see if any of those assumptions are broken.

Parsing Google calendar to DDay.iCal

I'm working on an application which parses Google Calendar via the Google API to DDay.iCal.
The main attributes and properties are handled easily... ev.Summary = evt.Title.Text;
The problem is when I get a recurring event; the XML contains a field like:
<gd:recurrence>
DTSTART;VALUE=DATE:20100916
DTEND;VALUE=DATE:20100917
RRULE:FREQ=YEARLY
</gd:recurrence>
or
<gd:recurrence>
DTSTART:20100915T220000Z
DTEND:20100916T220000Z
RRULE:FREQ=YEARLY;BYMONTH=9;WKST=SU"
</gd:recurrence>
using the following code:
String[] lines =
evt.Recurrence.Value.Split(new char[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
foreach (String line in lines)
{
if (line.StartsWith("R"))
{
RecurrencePattern rp = new RecurrencePattern(line);
ev.RecurrenceRules.Add(rp);
}
else
{
ISerializationContext ctx = new SerializationContext();
ISerializerFactory factory = new DDay.iCal.Serialization.iCalendar.SerializerFactory();
ICalendarProperty property = new CalendarProperty();
IStringSerializer serializer = factory.Build(property.GetType(), ctx) as IStringSerializer;
property = (ICalendarProperty)serializer.Deserialize(new StringReader(line));
ev.Properties.Add(property);
Console.Out.WriteLine(property.Name + " - " + property.Value);
}
}
RRULEs are parsed correctly, but the problem is that other property (datetimes) values are empty...

Here is the starting point of what I'm doing, going off of the RFC-5545 spec's recurrence rule. It isn't complete to the spec and may break given certain input, but it should get you going. I think this should all be doable using RegEx, and something as heavy as a recursive descent parser would be overkill.
RRULE:(?:FREQ=(DAILY|WEEKLY|SECONDLY|MINUTELY|HOURLY|DAILY|WEEKLY|MONTHLY|YEARLY);?)?(?:COUNT=([0-9]+);?)?(?:INTERVAL=([0-9]+);?)?(?:BYDAY=([A-Z,]+);?)?(?:UNTIL=([0-9]+);?)?
I am building this up using http://regexstorm.net/tester.
The test input I'm using is:
DTSTART;TZID=America/Chicago:20140711T133000\nDTEND;TZID=America/Chicago:20140711T163000\nRRULE:FREQ=WEEKLY;INTERVAL=8;BYDAY=FR;UNTIL=20141101
DTSTART;TZID=America/Chicago:20140711T133000\nDTEND;TZID=America/Chicago:20140711T163000\nRRULE:FREQ=WEEKLY;COUNT=5;INTERVAL=8;BYDAY=FR;UNTIL=20141101
DTSTART;TZID=America/Chicago:20140711T133000\nDTEND;TZID=America/Chicago:20140711T163000\nRRULE:FREQ=WEEKLY;BYDAY=FR;UNTIL=20141101
Sample matching results would look like this (- marks a group that captured nothing):
Index  Position  Matched String                                                $1      $2  $3  $4  $5
0      90        RRULE:FREQ=WEEKLY;INTERVAL=8;BYDAY=FR;UNTIL=20141101          WEEKLY  -   8   FR  20141101
1      236       RRULE:FREQ=WEEKLY;COUNT=5;INTERVAL=8;BYDAY=FR;UNTIL=20141101  WEEKLY  5   8   FR  20141101
2      390       RRULE:FREQ=WEEKLY;BYDAY=FR;UNTIL=20141101                     WEEKLY  -   -   FR  20141101
Usage is like:
string freqPattern = @"RRULE:(?:FREQ=(DAILY|WEEKLY|SECONDLY|MINUTELY|HOURLY|DAILY|WEEKLY|MONTHLY|YEARLY);?)?(?:COUNT=([0-9]+);?)?(?:INTERVAL=([0-9]+);?)?(?:BYDAY=([A-Z,]+);?)?(?:UNTIL=([0-9]+);?)?";
MatchCollection mc = Regex.Matches(rule, freqPattern, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
string frequency = m.Groups[1].ToString();
string count = m.Groups[2].ToString();
string interval = m.Groups[3].ToString();
string byday = m.Groups[4].ToString();
string until = m.Groups[5].ToString();
System.Console.WriteLine("recurrence => frequency: \"{0}\", count: \"{1}\", interval: \"{2}\", byday: \"{3}\", until: \"{4}\"", frequency, count, interval, byday, until);
}

This is a great example of when to use regular expressions. Try this out for general parsing:
\s*(\w+):((\w+=\w+;)+(\w+=\w+)?|\w+)
Or, you might decide to have something more schema-specific.
\s*(?:DTSTART:)(?'Start'\w+)
\s*(?:DTEND:)(?'End'\w+)
\s*(?:RRULE:)(?'Rule'(\w+=\w+;)+(\w+=\w+)?)
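To show how the schema-specific patterns above might be applied, here is a small sketch that reads the named groups and then splits the rule into key/value pairs. The `RecurrenceText` class and its method names are made up for this example. As written, the patterns only cover the `DTSTART:` / `DTEND:` form (not `DTSTART;VALUE=DATE:`), and the Rule pattern needs at least one `;`, so a lone `RRULE:FREQ=YEARLY` would come back empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper built around the three schema-specific patterns.
public static class RecurrenceText
{
    private static readonly Regex StartRegex = new Regex(@"\s*(?:DTSTART:)(?'Start'\w+)");
    private static readonly Regex EndRegex = new Regex(@"\s*(?:DTEND:)(?'End'\w+)");
    private static readonly Regex RuleRegex = new Regex(@"\s*(?:RRULE:)(?'Rule'(\w+=\w+;)+(\w+=\w+)?)");

    // Each accessor returns an empty string when its pattern does not match.
    public static string Start(string text) => StartRegex.Match(text).Groups["Start"].Value;
    public static string End(string text) => EndRegex.Match(text).Groups["End"].Value;

    // Splits "FREQ=YEARLY;BYMONTH=9;WKST=SU" into a dictionary of rule parts.
    public static Dictionary<string, string> Rule(string text)
    {
        string rule = RuleRegex.Match(text).Groups["Rule"].Value;
        return rule
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part => part.Split('='))
            .ToDictionary(kv => kv[0], kv => kv[1]);
    }
}
```

For the second recurrence example in the question, `RecurrenceText.Rule(text)["FREQ"]` would be "YEARLY" and `RecurrenceText.Start(text)` would be "20100915T220000Z". The values are left as strings; converting them to dates is a separate step.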
