I want to parse a month and year written in Chinese, as used in Hong Kong.
I have the month and year in the following format:
二零一六年六月份 ===>June 2016
二零一五年六月份 ===>June 2015
I have used CultureInfo (zh-HK) to get the month.
But how do I get the year? Please help.
Basically, you need to create a dictionary that uses the Chinese characters as the key and the corresponding numbers as the value:
var dict = new Dictionary<String, String>() {
{"零", "0"},
{"一", "1"},
{"二", "2"},
{"三", "3"},
{"四", "4"},
{"五", "5"},
{"六", "6"},
{"七", "7"},
{"八", "8"},
{"九", "9"},
{"十", "1"} // 十 means ten; mapping it to "1" turns 十一 into "11" and 十二 into "12"
};
Then, you split the input string with the separator "年":
string[] twoParts = inputString.Split('年');
Loop through each character of the first part. Using the dictionary you created, you can easily get 2016 from "二零一六".
For the second part, check whether "份" is present at the end, and if it is, remove it with Substring (months are sometimes written without "份"). Then call Substring once more to remove the "月".
Now use the dictionary again to turn something like "十二" into "12".
Now that you have the year and the month, just create a new DateTime instance!
Here's the full code:
string inputString = ...;
var dict = new Dictionary<String, String>() {
{"零", "0"},
{"一", "1"},
{"二", "2"},
{"三", "3"},
{"四", "4"},
{"五", "5"},
{"六", "6"},
{"七", "7"},
{"八", "8"},
{"九", "9"},
{"十", "1"} // 十 means ten; mapping it to "1" turns 十一 into "11" and 十二 into "12"
};
// StringBuilder requires using System.Text;
string[] twoParts = inputString.Split ('年');
// Year: translate each character, e.g. 二零一六 -> "2016".
StringBuilder yearBuilder = new StringBuilder ();
foreach (var character in twoParts[0]) {
    yearBuilder.Append (dict [character.ToString ()]);
}
// Month: remove the optional trailing "份", then the "月".
string month = twoParts [1];
if (month [month.Length - 1] == '份') {
    month = month.Substring (0, month.Length - 1);
}
month = month.Substring (0, month.Length - 1);
StringBuilder monthBuilder = new StringBuilder ();
foreach (var character in month) {
    monthBuilder.Append (dict [character.ToString ()]);
}
var date = new DateTime (Convert.ToInt32 (yearBuilder.ToString()), Convert.ToInt32 (monthBuilder.ToString()), 1);
Console.WriteLine (date);
EDIT:
I just realized that this doesn't work if the month is October (十月): 十 maps to "1", so it parses as January. To fix this, you need to use a separate dictionary for the months. Since the SE editor doesn't let me enter too many Chinese characters, I will tell you what to put in this dictionary in the comments.
When you parse the months, please use the new dictionary. So now the month parsing code will look like this:
month = month.Substring (0, month.Length - 1);
string monthNumberString = newDict[month];
The foreach loop is no longer needed.
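As a minimal sketch (not the answer's code, and assuming the input is always "<year digits>年<month>月" with an optional trailing "份"), the October problem can also be handled without a second dictionary by treating 十 as ten rather than as a single digit. The class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ChineseMonthYearParser
{
    // Chinese numeral characters and their digit values.
    private static readonly Dictionary<char, int> Digits = new Dictionary<char, int>
    {
        ['零'] = 0, ['一'] = 1, ['二'] = 2, ['三'] = 3, ['四'] = 4,
        ['五'] = 5, ['六'] = 6, ['七'] = 7, ['八'] = 8, ['九'] = 9
    };

    // Parses "<year digits>年<month>月", optionally followed by "份",
    // e.g. "二零一六年六月份" -> 1 June 2016.
    public static DateTime Parse(string input)
    {
        string[] parts = input.Split('年');
        if (parts.Length != 2)
            throw new FormatException("Expected exactly one '年': " + input);

        // The year is spelled digit by digit: 二零一六 -> 2016.
        int year = 0;
        foreach (char c in parts[0])
            year = year * 10 + Digits[c];

        // Drop the optional "份", then the mandatory "月".
        string month = parts[1];
        if (month.Length > 0 && month[month.Length - 1] == '份')
            month = month.Substring(0, month.Length - 1);
        if (month.Length == 0 || month[month.Length - 1] != '月')
            throw new FormatException("Expected '月' after the month: " + input);
        month = month.Substring(0, month.Length - 1);

        return new DateTime(year, ParseMonth(month), 1);
    }

    // 一..九 -> 1..9, 十 -> 10, 十一 -> 11, 十二 -> 12.
    private static int ParseMonth(string numeral)
    {
        if (numeral.Length == 1 && numeral[0] != '十')
            return Digits[numeral[0]];
        if ((numeral.Length == 1 || numeral.Length == 2) && numeral[0] == '十')
            return 10 + (numeral.Length == 2 ? Digits[numeral[1]] : 0);
        throw new FormatException("Unrecognised month numeral: " + numeral);
    }
}
```

For example, ChineseMonthYearParser.Parse("二零一六年十月") yields 1 October 2016 instead of January.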
I am trying to find the number of apples and oranges in different strings using Pidgin, but I can't seem to skip over variable-length text. I want to find the numbers 1, 2, 3 and 4 in the following:
List<string> testStrings = new List<string> {
"12 blerg",
"1 apples",
" 2 apples ignore_this",
"3 oranges ignore_this",
"this 5 6 should be ignored but fails 4 apples ignore_this"};
I can get 1, 2 and 3 to work, i.e. skip whitespace and ignore text after the keywords. I have tried SkipUntil but can't get it to work: the string "this 5 6 should be ignored but fails 4 apples ignore_this" should return the number 4, but it is skipped completely.
The code is in this fiddle, but it doesn't run there, so you need to run it locally :/
https://dotnetfiddle.net/jEppA8
Edit: full listing below:
using System;
using System.Collections.Generic;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;
public class Program
{
public static void Main()
{
Parser<char, string> multiThings = OneOf(
String("apples"),
String("oranges")
);
Parser<char, string> amountOfThings = Digit.ManyString().Between(Whitespaces, Whitespaces);
Parser<char, string> amountOfThingsFollowedBy = amountOfThings.Before(Whitespaces.Before(multiThings));
Parser<char, string> simpleSkipBefore = amountOfThings.Before(Any.SkipUntil(amountOfThings));
Parser<char, string> simpleSkipAfter = Any.SkipUntil(amountOfThings).Then(amountOfThingsFollowedBy);
Parser<char, string> amountOfThingsAnyWhere = OneOf(
amountOfThingsFollowedBy,
simpleSkipBefore,
simpleSkipAfter
);
List<string> testStrings = new List<string> {
"12 blerg",
"1 apples",
" 2 apples ignore_this",
"3 oranges ignore_this",
"this 5 6 should be ignored but fails 4 apples ignore_this"};
foreach (var str in testStrings)
{
try
{
Console.WriteLine(str + " ----> " + amountOfThingsAnyWhere.ParseOrThrow(str));
} catch (Exception e)
{
Console.WriteLine(str + " exception: " + e);
}
}
}
}
If you are OK with using regular expressions, here is how you can do it.
I am only showing for "apple"/"apples" but it would be similar for oranges, or any other word whose plural form requires suffixing with "s".
private List<string> CountApples(string s)
{
    // Requires: using System.Text.RegularExpressions;
    Regex rg = new Regex(@"([0-9]+)\s\bapples?\b");
    MatchCollection matches = rg.Matches(s);
    var result = new List<string>();
    foreach (Match match in matches)
    {
        // Group 1 holds the digits captured by ([0-9]+).
        var quantifierGroup = match.Groups[1];
        result.Add(quantifierGroup.Value);
    }
    return result;
}
Input:
"this 5 6 should be ignored but fails 4 apples ignore_this and 42 apples again"
Output:
[4, 42]
The regex ([0-9]+)\s\bapples?\b looks for one or more digits [0-9]+ and captures them in a group () so they can be retrieved later.
Then it expects a single whitespace character \s followed by exactly apple or apples (thanks to the \b word boundaries).
It will not match words that merely start with "apple", such as "applewood".
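Extending the same idea to oranges, here is a sketch (the FruitCounter name is hypothetical) that uses one alternation and named groups to return both the amount and the fruit. Like the regex above, it assumes exactly one whitespace character between the number and the word.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FruitCounter
{
    // Digits (group "amount"), one whitespace character, then exactly
    // "apple", "apples", "orange" or "oranges" (group "fruit"), bounded by \b.
    private static readonly Regex Pattern =
        new Regex(@"(?<amount>[0-9]+)\s\b(?<fruit>apples?|oranges?)\b");

    // Returns (amount, fruit) pairs in the order they appear in the input.
    public static List<(int Amount, string Fruit)> Find(string s)
    {
        var result = new List<(int Amount, string Fruit)>();
        foreach (Match match in Pattern.Matches(s))
        {
            result.Add((int.Parse(match.Groups["amount"].Value), match.Groups["fruit"].Value));
        }
        return result;
    }
}
```

On "this 5 6 should be ignored but fails 4 apples ignore_this" this returns a single pair, (4, "apples"); "12 blerg" returns an empty list.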
I was able to make it work with the following:
Parser<char, string> amount = Digit.ManyString();
Parser<char, string> amountOfThings =
amount.Before(Whitespace.AtLeastOnce()).Before(multiThings);
Parser<char, string> amountOfThingsAnyWhere =
Any.SkipUntil(Try(Lookahead(amountOfThings))).Then(amountOfThings);
This works with your given input as well as:
"12 oranges",
" 12 oranges",
"xx12 oranges",
" all text no digits ",
"2 apples ignore_this",
"xx2 apples ignore_this",
"xx2 2 apples ignore_this",
"apples",
"apples ignore_this"
Lookahead is necessary to prevent SkipUntil from consuming the terminator (amountOfThings).
Try is necessary to allow backtracking when the start of the pattern is encountered (in this case, any digit not followed by an apple or an orange).
I am still trying to get it to work when amountOfThings is modified to
Parser<char, string> amountOfThings =
amount.Then(Whitespace.AtLeastOnce()).Then(multiThings);
I was hoping this would return '4 apples' and so on, but it only returns the name of the fruit, because Then discards the result of the parser before it!
EDIT:
Here is how to make it return both the number and the name of the fruit:
Parser<char, string> amount = Digit.ManyString();
Parser<char, string> amountOfThings =
amount
.Then(Whitespace.AtLeastOnce())
.Then(multiThings);
Parser<char, string> amountOfThingsEntire =
amount
.Before(Whitespace.AtLeastOnce())
.Then(multiThings, (amountResult, fruitResult) => $"{amountResult} {fruitResult}");
Parser<char, string> amountOfThingsAnyWhere =
Any.SkipUntil(Try(Lookahead(amountOfThings))).Then(amountOfThingsEntire);
Suppose I have a list of strings {"boy", "car", "ball"} and a text "the boy sold his car to buy a ball".
Given another string list {"dog", "bar", "bone"}, my objective is to find all occurrences of the first list inside the text and swap them for the strings of the second list:
BEFORE: the [boy] sold his [car] to buy a [ball]
AFTER: the [dog] sold his [bar] to buy a [bone]
My first thought was to use Regex, but I have no idea how to turn a list of strings into a regex, and I don't want to write Aho-Corasick.
What is the right way to do this?
Another example:
Text: aaa bbb abab aabb bbaa ubab
replacing {aa, bb, ab, ub} with {11, 22, 35, &x}
BEFORE: [aa]a [bb]b [ab][ab] [aa][bb] [bb][aa] [ub][ab]
AFTER: [11]a [22]b [35][35] [11][22] [22][11] [&x][35]
If you want to use regex, you may use something like this:
var findList = new List<string>() { "boy", "car", "ball" };
var replaceList = new List<string>() { "dog", "bar", "bone" };
// Create a dictionary from the lists or have a dictionary from the beginning.
var dictKeywords = findList.Select((s, i) => new { s, i })
.ToDictionary(x => x.s, x => replaceList[x.i]);
string input = "the boy sold his car to buy a ball";
// Construct the regex pattern by joining the dictionary keys with an 'OR' operator.
string pattern = string.Join("|", dictKeywords.Keys.Select(s => Regex.Escape(s)));
string output =
Regex.Replace(input, pattern, delegate (Match m)
{
string replacement;
if (dictKeywords.TryGetValue(m.Value, out replacement)) return replacement;
return m.Value;
});
Console.WriteLine(output);
Output:
the dog sold his bar to buy a bone
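One thing to watch with the joined pattern: .NET regex alternation takes the first alternative that matches at a position, not the longest one, so with keys such as "ab" and "abc" the order of the keys matters. A sketch (the MultiReplacer helper is hypothetical, and it assumes all keys are non-empty) that sorts the keys longest-first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultiReplacer
{
    // Replaces every key of 'replacements' found in 'input' in one left-to-right pass.
    // Assumes all keys are non-empty.
    public static string ReplaceAll(string input, IDictionary<string, string> replacements)
    {
        if (replacements.Count == 0)
            return input;

        // Longer keys first so that "abc" wins over "ab" at the same position.
        string pattern = string.Join("|",
            replacements.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));

        // The MatchEvaluator's return value is inserted literally (no $ substitutions).
        return Regex.Replace(input, pattern, m => replacements[m.Value]);
    }
}
```

With the question's second example ({aa, bb, ab, ub} replaced with {11, 22, 35, &x}), the input "aaa bbb abab aabb bbaa ubab" becomes "11a 22b 3535 1122 2211 &x35", matching the expected AFTER line.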
No need to use Regex; string.Replace would suffice:
var input = "the boy sold his car to buy a ball";
var oldValues = new List<string>() { "boy", "car", "ball" };
var newValues = new List<string>() { "dog", "bar", "bone" };
var output = input;
for (int i = 0; i < oldValues.Count; i++)
{
    output = output.Replace(oldValues[i], newValues[i]);
}
Console.WriteLine(output);
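The loop works for the boy/car/ball example, but each Replace call runs over the output of the previous one, so a replacement value that contains a later search string gets replaced again. The sketch below (hypothetical helper names) contrasts the loop with a single regex pass on a tiny input:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReplaceComparison
{
    // Applies the replacements one after another, like the string.Replace loop.
    public static string Sequential(string input, IList<string> oldValues, IList<string> newValues)
    {
        string output = input;
        for (int i = 0; i < oldValues.Count; i++)
            output = output.Replace(oldValues[i], newValues[i]);
        return output;
    }

    // Replaces all old values in one regex pass, so inserted text is never re-scanned.
    public static string SinglePass(string input, IList<string> oldValues, IList<string> newValues)
    {
        string pattern = string.Join("|", oldValues.Select(s => Regex.Escape(s)));
        return Regex.Replace(input, pattern, m => newValues[oldValues.IndexOf(m.Value)]);
    }
}
```

For the input "ab" with {a, b} replaced by {b, c}, the loop returns "cc" while the single pass returns "bc".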
I'm making a Scrabble game for the command line in C#. The player must input some words like the list below:
Word      Points
some      6
first     8
potsie    8
day       7
could     8
postie    8
from      9
have      10
back      12
this      7
The letters the player has are:
sopitez
This value is a string. I want to check whether these letters are contained in the words. For this I've tried this code:
String highst = (from word
in words
where word.Contains(letters)
orderby points descending
select word).First();
But it doesn't work the way I want: this code doesn't select any word. I know why: "sopitez" is not contained in any of the words.
My question now: is there a way to check whether the characters in the string letters are contained in the words without looping over the characters?
Note: Each letter must be used at most once in the solution.
The correct result must be potsie or postie (I have to write the logic for that).
P.S.: I'm playing this game: www.codingame.com/ide/puzzle/scrabble
This will not be performant at all, but at least it will do the trick. Notice that I've used a dictionary just for the sake of simplicity (also, I don't see why you would have repeated words like "potsie"; I've never played Scrabble). You could just as well use a list of tuples with this code.
EDIT: I changed this according to the OP's new comments.
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var letters = new HashSet<char>("sopitez");
var wordsMap = new Dictionary<string, int>()
{
{"some", 6}, {"first", 8}, {"potsie", 8}, {"postie", 8}, {"day", 7},
{"could", 8}, {"from", 9}, {"have", 10}, {"back", 12},
{"this", 7}
};
var highest = wordsMap
.Select(kvp => {
var word = kvp.Key;
var points = kvp.Value;
var matchCount = kvp.Key.Sum(c => letters.Contains(c) ? 1 : 0);
return new {
Word = word,
Points = points,
MatchCount = matchCount,
FullMatch = matchCount == word.Length,
EstimatedScore = points * matchCount /(double) word.Length // This can vary... it's just my guess for an "estimated score"
};
})
.OrderByDescending(x => x.FullMatch)
.ThenByDescending(x => x.EstimatedScore);
foreach (var anon in highest)
{
Console.WriteLine("{0}", anon);
}
}
}
The problem here is that Contains checks whether one string contains another as a substring; it does not check whether it contains all of those characters. You need to replace each string in your dictionary with a HashSet<char> and perform set comparisons like IsSubsetOf or IsSupersetOf to determine whether the letters match.
Here is what you're doing:
string a= "Hello";
string b= "elHlo";
bool doesContain = b.Contains(a); //This returns false
Here is what you need to do:
var setA = new HashSet<char>(a);
var setB = new HashSet<char>(b);
bool isSubset = setA.IsSubsetOf(setB); //This returns true
Update
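Actually, this is wrong, because sets remove duplicate elements: the set for "Hello" is {H, e, l, o}, so a word that needs two l's would be accepted with tiles that contain only one. Essentially, you are misusing Contains; you need a more involved comparison that accounts for duplicate letters.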
Update2
You need this for word/letters comparison:
//Compares counts of each letter in word and tiles
bool WordCanBeMadeFromLetters(string word, string tileLetters) {
var tileLetterCounts = GetLetterCounts(tileLetters);
var wordLetterCounts = GetLetterCounts(word);
return wordLetterCounts.All(letter =>
tileLetterCounts.ContainsKey(letter.Key)
&& tileLetterCounts[letter.Key] >= letter.Value);
}
//Gets dictionary of letter/# of letter in word
Dictionary<char, int> GetLetterCounts(string word){
return word
.GroupBy(c => c)
.ToDictionary(
grp => grp.Key,
grp => grp.Count());
}
So your original example can look like this:
String highst = (from word
in words
where WordCanBeMadeFromLetters(word, letters)
orderby points descending
select word).First();
Since letters can repeat, I think you need something like this (it's not very efficient, of course, but it's pure LINQ):
var letters = "sopitezwss";
var words = new Dictionary<string, int>() {
{"some", 6}, {"first", 8}, {"potsie", 8}, {"day", 7},
{"could", 8}, {"from", 9}, {"have", 10}, {"back", 12},
{"this", 7}, {"postie", 8}, {"swiss", 15}
};
var highest = (from word
in words
where word.Key.GroupBy(c => c).All(c => letters.Count(l => l == c.Key) >= c.Count())
orderby word.Value descending
select word);
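Combining the letter-count check with the question's goal of picking the highest-scoring word, a minimal sketch might look like this. It assumes the word list is supplied as (word, points) tuples in input order; the ScrabbleHelper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScrabbleHelper
{
    // True if every letter of 'word' is available in 'tiles' at least as often as the word needs it.
    public static bool CanBuild(string word, string tiles)
    {
        var available = tiles.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());
        return word.GroupBy(c => c).All(g =>
            available.TryGetValue(g.Key, out int count) && count >= g.Count());
    }

    // Highest-scoring buildable word, or null if none can be built.
    // Ties keep the input order because OrderByDescending is a stable sort.
    public static string BestWord(IEnumerable<(string Word, int Points)> words, string tiles)
    {
        return words
            .Where(w => CanBuild(w.Word, tiles))
            .OrderByDescending(w => w.Points)
            .Select(w => w.Word)
            .FirstOrDefault();
    }
}
```

With the question's word list and the tiles "sopitez", only potsie and postie can be built; both score 8, so the result is whichever of the two comes first in the list.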
Below is a code snippet. I want to find items that start with "[[" and end with "]]" and contain English letters (a-z and A-Z). What is an efficient way to do this?
string sample_input = "'''அர்காங்கெல்சுக் [[sam]] மாகாணம்''' (''Arkhangelsk Oblast'', {{lang-ru|Арха́нгельская о́бласть}}, ''அர்காங்கெல்சுக்யா ஓபிலாஸ்து'') என்பது [[உருசியா]]வின் [[I am sam]] [[உருசியாவின் கூட்டாட்சி அமைப்புகள்|நடுவண் அலகு]] ஆகும். <ref>{{cite news|author=Goldman, Francisco|date=5 April 2012|title=Camilla Vallejo, the World's Most Glamorous Revolutionary|newspaper=[[The New York Times Magazine]]| url=http://www.nytimes.com/2012/04/08/magazine/camila-vallejo-the-worlds-most-glamorous-revolutionary.html|accessdate=5 April 2013}}</ref>";
List<string> found = new List<string>();
foreach (var item in sample_input.Split(' '))
{
    if (item.StartsWith("[[s") || item.StartsWith("[[S") || item.StartsWith("[[a") || item.StartsWith("[[A"))
{
found.Add(item);
}
}
Expected Results: [[Sam]], [[I am Sam]], [[The New York Times Magazine]].
Try this:
string sample_input = "'''அர்காங்கெல்சுக் [[sam]] மாகாணம்''' (''Arkhangelsk Oblast'', {{lang-ru|Арха́нгельская о́бласть}}, ''அர்காங்கெல்சுக்யா ஓபிலாஸ்து'') என்பது [[உருசியா]]வின் [[உருசியாவின் கூட்டாட்சி அமைப்புகள்|நடுவண் அலகு]] ஆகும்.";
var regex = new Regex(@"\[\[[a-zA-Z]+\]\]");
var found = regex.Matches(sample_input).OfType<Match>().Select(x=>x.Value).ToList();
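The question's expected results include links with spaces ("[[I am sam]]", "[[The New York Times Magazine]]"), which [a-zA-Z]+ does not accept. Assuming "English letters" means ASCII A-Z/a-z with single spaces between words, a sketch of a pattern that also allows spaces (WikiLinkFinder is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WikiLinkFinder
{
    // "[[", one or more ASCII-letter words separated by single spaces, then "]]".
    // Links containing other characters (Tamil text, '|', digits) are not matched.
    private static readonly Regex LinkPattern =
        new Regex(@"\[\[[A-Za-z]+(?: [A-Za-z]+)*\]\]");

    public static List<string> Find(string text)
    {
        return LinkPattern.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

On the question's full sample input this should return [[sam]], [[I am sam]] and [[The New York Times Magazine]], skipping the Tamil links.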
In C#, is there a way to get the equivalent custom numeric format for a standard numeric format, given a user's culture?
Examples (not sure my conversions are right):
C3 in fr-FR = 0.000 '€'
D2 = 0.00
P0 = %#0
See these links:
http://msdn.microsoft.com/en-us/library/dwhawy9k(v=vs.110).aspx
http://msdn.microsoft.com/en-us/library/0c899ak8(v=vs.110).aspx
Given a CultureInfo you can examine the NumberFormat property of type NumberFormatInfo to get all the information that .NET uses when formatting different number types. E.g.:
var frenchCultureInfo = CultureInfo.GetCultureInfo("fr-FR");
Console.WriteLine(frenchCultureInfo.NumberFormat.CurrencySymbol);
This will print €. To reconstruct the complete format you will have to inspect multiple properties on the NumberFormat property. This can become quite tedious. As an experiment I have tried to write the code necessary to format a Decimal using the C format specifier:
var value = -123456.789M;
var cultureInfo = CultureInfo.GetCultureInfo("as-IN");
var numberFormat = cultureInfo.NumberFormat;
// Create group format like "#,##,###".
var groups = numberFormat
.CurrencyGroupSizes
.Reverse()
.Select(groupSize => new String('#', groupSize));
var format1 = "#," + String.Join(",", groups);
// Create number format like "#,##,##0.00".
var format2 = format1.Substring(0, format1.Length - 1)
+ "0." + new String('0', numberFormat.CurrencyDecimalDigits);
// Format the number without a sign.
// Note that it is necessary to use the correct CultureInfo here.
var formattedNumber = Math.Abs(value).ToString(format2, cultureInfo);
// Combine sign, currency symbol and number.
var currencyPositivePatterns = new Dictionary<Int32, String> {
{ 0, "{0}{1}" },
{ 1, "{1}{0}" },
{ 2, "{0} {1}" },
{ 3, "{1} {0}" }
};
var currencyNegativePatterns = new Dictionary<Int32, String> {
{ 0, "({0}{1})" },
{ 1, "-{0}{1}" },
{ 2, "{0}-{1}" },
{ 3, "{0}{1}-" },
{ 4, "({1}{0})" },
{ 5, "-{1}{0}" },
{ 6, "{1}-{0}" },
{ 7, "{1}{0}-" },
{ 8, "-{1} {0}" },
{ 9, "-{0} {1}" },
{ 10, "{1} {0}-" },
{ 11, "{0} {1}-" },
{ 12, "{0} -{1}" },
{ 13, "{1}- {0}" },
{ 14, "({0} {1})" },
{ 15, "({1} {0})" }
};
var currencyPattern = value >= Decimal.Zero
? currencyPositivePatterns[numberFormat.CurrencyPositivePattern]
: currencyNegativePatterns[numberFormat.CurrencyNegativePattern];
var formattedValue = String.Format(
currencyPattern,
numberFormat.CurrencySymbol,
formattedNumber
);
The value of formattedValue is ₹ -1,23,456.79, which is the same as you get when evaluating value.ToString("C", cultureInfo). Obviously, the latter is much simpler.
Note that some currency symbols can contain . or ' which have special meaning in a custom numeric format. As a result of this you cannot always create a custom format string to replace C. E.g. for the da-DK culture the C format for a positive number is equivalent to kr. #,##0.00 except the dot after kr will make it impossible to use that format. Instead you have to use an approach where the currency symbol is added after the number has been formatted, or alternatively you can escape the problematic character.
AFAIK, unlike the decimal separator or the thousands separator, there is no custom format character for the culture's currency symbol. Maybe this is because a number is the same in every culture, while an amount of money depends on the currency. Thus, you would have to know the currency in advance.
However, if you do know the currency in advance, you can just use the currency symbol as a normal literal character.
Thus, you could use "0.000 €". Note that the . in a custom format is a placeholder for the culture's decimal separator, so formatting with the fr-FR culture still produces a comma.
If you know the culture in advance, you can get the format string that's used for currency and use it yourself.
In your example, you would retrieve the CultureInfo object for the fr-FR culture using this:
CultureInfo French = new CultureInfo("fr-FR");
You could just pass this CultureInfo object to the ToString() method, or you could get the NumberFormatInfo object from the CultureInfo:
NumberFormatInfo numberFormat = French.NumberFormat;
The NumberFormatInfo object has the following properties: CurrencySymbol, CurrencyDecimalDigits, CurrencyDecimalSeparator, CurrencyGroupSeparator and CurrencyGroupSizes. Using them, you can construct your own format string:
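// ',' and '.' are placeholders in a custom format: they are replaced by the
// culture's group and decimal separators when you format with this culture,
// so the separator strings themselves are not inserted literally.
string fmt = numberFormat.CurrencySymbol + "#,";
for ( int i = 0; i < numberFormat.CurrencyGroupSizes[0] - 1; i++ )
    fmt += "#";
fmt += "0.";
for ( int d = 0; d < numberFormat.CurrencyDecimalDigits; d++ )
    fmt += "0";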
There are two other properties of the NumberFormatInfo class, CurrencyNegativePattern and CurrencyPositivePattern, which tell you where the currency symbol and the sign go, and whether to put negative values in parentheses. I'll leave it to you to decide whether you need these and to write the code necessary to build the format string.
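As a sketch of that remaining step for non-negative amounts (the CurrencyFormatBuilder name is hypothetical), CurrencyPositivePattern can be turned into a custom format string. Every character of the currency symbol is escaped with a backslash, which sidesteps symbols such as "kr." that contain characters with special meaning in a custom format. One caveat: custom formats take their separators from NumberDecimalSeparator and NumberGroupSeparator, while the C specifier uses the Currency* variants; they are usually, but not necessarily, the same.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CurrencyFormatBuilder
{
    // Prefixes every character with '\' so it is copied literally (e.g. the '.' in "kr.").
    private static string Escape(string literal)
    {
        var sb = new StringBuilder();
        foreach (char c in literal)
            sb.Append('\\').Append(c);
        return sb.ToString();
    }

    // Custom format for non-negative amounts, following CurrencyPositivePattern.
    public static string BuildPositive(NumberFormatInfo nfi)
    {
        // Grouping on/off only: the group sizes used at formatting time come from
        // the culture's NumberGroupSizes, not from the number of '#' characters.
        bool grouped = nfi.CurrencyGroupSizes.Length > 0 && nfi.CurrencyGroupSizes[0] > 0;
        string number = grouped ? "#,##0" : "0";
        if (nfi.CurrencyDecimalDigits > 0)
            number += "." + new string('0', nfi.CurrencyDecimalDigits);

        string symbol = Escape(nfi.CurrencySymbol);
        switch (nfi.CurrencyPositivePattern)
        {
            case 0: return symbol + number;        // $n
            case 1: return number + symbol;        // n$
            case 2: return symbol + " " + number;  // $ n
            case 3: return number + " " + symbol;  // n $
            default: throw new NotSupportedException("Unknown CurrencyPositivePattern");
        }
    }
}
```

Negative amounts could be handled the same way in a second section after a ';', driven by CurrencyNegativePattern.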