Function to Make Pascal Case? (C#) - c#

I need a function that will take a string and "pascal case" it. The only indicator that a new word starts is an underscore. Here are some example strings that need to be cleaned up:
price_old => Should be PriceOld
rank_old => Should be RankOld
I started working on a function that makes the first character upper case:
public string FirstCharacterUpper(string value)
{
if (value == null || value.Length == 0)
return string.Empty;
if (value.Length == 1)
return value.ToUpper();
var firstChar = value.Substring(0, 1).ToUpper();
return firstChar + value.Substring(1, value.Length - 1);
}
The thing the above function doesn't do is remove the underscore and "ToUpper" the character to the right of the underscore.
Also, any ideas about how to pascal case a string that doesn't have any indicators (like the underscore). For example:
companysource
financialtrend
accountingchangetype
The major challenge here is determining where one word ends and another starts. I guess I would need some sort of lookup dictionary to determine where new words start? Are there libraries out there that do this sort of thing already?
Thanks,
Paul

You can use the TextInfo.ToTitleCase method then remove the '_' characters.
So, using the extension methods I've got:
http://theburningmonk.com/2010/08/dotnet-tips-string-totitlecase-extension-methods
you can do something like this:
var s = "price_old";
s.ToTitleCase().Replace("_", string.Empty);

Well the first thing is easy:
string.Join("", "price_old".Split(new [] { '_' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Substring(0, 1).ToUpper() + s.Substring(1)).ToArray());
returns PriceOld
The second thing is much more difficult. Since companysource could be CompanySource or maybe CompanysOurce, it can be automated, but the results will be error-prone. You will need an English dictionary and a lot of guessing about which combination of words is correct.
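A minimal sketch of the dictionary idea, assuming a tiny hard-coded word list (WordSegmenter and its Words set are hypothetical names; a real list would be loaded from a dictionary file). It tries to split the input into known words and returns null when it cannot. It takes the first split it finds and makes no attempt to choose between several valid splits, which is where the guessing mentioned above comes in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSegmenter
{
    // Hypothetical, tiny word list used only for illustration.
    private static readonly HashSet<string> Words = new HashSet<string>
    {
        "company", "source", "financial", "trend", "accounting", "change", "type"
    };

    // Returns the PascalCase form if the whole input can be split into
    // known words; otherwise returns null.
    public static string ToPascalCase(string input)
    {
        // best[i] holds one segmentation of the first i characters,
        // or null if no segmentation of that prefix was found.
        var best = new List<string>[input.Length + 1];
        best[0] = new List<string>();

        for (int end = 1; end <= input.Length; end++)
        {
            for (int start = 0; start < end; start++)
            {
                if (best[start] == null) continue;
                string candidate = input.Substring(start, end - start);
                if (!Words.Contains(candidate)) continue;
                best[end] = new List<string>(best[start]) { candidate };
                break; // keep the first split found for this prefix
            }
        }

        var words = best[input.Length];
        if (words == null) return null;
        return string.Concat(
            words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
    }
}
```

With this word list, WordSegmenter.ToPascalCase("accountingchangetype") yields AccountingChangeType, while an input containing an unknown word yields null.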

Try this:
public static string GetPascalCase(string name)
{
return Regex.Replace(name, @"^\w|_\w",
(match) => match.Value.Replace("_", "").ToUpper());
}
Console.WriteLine(GetPascalCase("price_old")); // => Should be PriceOld
Console.WriteLine(GetPascalCase("rank_old" )); // => Should be RankOld

With underscores:
s = Regex.Replace(s, @"(?:^|_)([a-z])",
m => m.Groups[1].Value.ToUpper());
Without underscores:
You're on your own there. But go ahead and search; I'd be surprised if nobody has done this before.

For your second problem, splitting concatenated words, you could use our best friends Google & Co. If your concatenated input is made up of common English words, the search engines have a good hit rate at suggesting the separate words as an alternative search query.
If you enter your sample input, Google and Bing suggest the following:
original             | Google                | Bing
=====================================================================
companysource        | company source        | company source
financialtrend       | financial trend       | financial trend
accountingchangetype | accounting changetype | accounting change type
See this example.
Writing a small screen scraper for that should be fairly easy.

For those who need a non-regex solution:
public static string RemoveAllSpaceAndConcertToPascalCase(string status)
{
var textInfo = new System.Globalization.CultureInfo("en-US").TextInfo;
var titleCaseStr = textInfo.ToTitleCase(status);
string result = titleCaseStr.Replace("_","").Replace(" ", "");
return result;
}

Related

How to find one of many possible substrings in a larger string?

I have a simple problem, but I could not find a simple solution yet.
I have a string containing for example this
UNB+123UNH+234BGM+345DTM+456
The actual string is lots larger, but you get the idea
now I have a set of values I need to find in this string
for example UNH and BGM and DTM and so on
So I need to search in the large string, and find the position of the first set of values.
something like this (not existing but to explain the idea)
string[] chars = {"UNH", "BGM", "DTM" };
int pos = test.IndexOfAny(chars);
In this case pos would be 7 (zero-based), because of all 3 substrings, UNH occurs first in the variable test.
What I am actually trying to accomplish is splitting the large string into a list of strings, where the delimiter can be one of many values ("BGM", "UNH", "DTM").
So the result would be
UNB+123
UNH+234
BGM+345
DTM+456
I can of course build a loop that calls IndexOf for each of the substrings and remembers the smallest value, but that seems so inefficient. I am hoping for a better way to do this.
EDIT
the substrings to search for are always 3 letters, but the text in between can be anything at all with any length
EDIT
The tags are always 3 alphanumeric characters, and after that anything can appear, including lots of + signs.
You will run into more problems with EDI than just splitting into the corresponding fields: what about conditions, multiple values, or lists? I recommend you take a look at EDI.net.
EDIT:
EDIFact is too complex a format to handle with regex alone. As I mentioned before, you will have conditions for each format/field/process, and you will need to capture the whole field in order to really parse it. For example, DTM can have one specific date-time format in one EDI message and a totally different one in another.
However, this is the structure of a DTM field:
DTM  DATE/TIME/PERIOD
Function: To specify date, and/or time, or period.
010  C507  DATE/TIME/PERIOD                                  M  1
     2005  Date or time or period function code qualifier    M  an..3
     2380  Date or time or period text                       C  an..35
     2379  Date or time or period format code                C  an..3
So you will always have something like 'DTM+d3:d35:d3' to search for.
Really, it isn't worth the struggle: use EDI.net, create your own POCO classes, and work from there.
Friendly reminder that EDIFact changes every 6 months in Europe.
If the separators can be any one of UNB, UNH, BGM, or DTM, the following Regex could work:
foreach (Match match in Regex.Matches(input, @"(UNB|UNH|BGM|DTM).+?(?=(UNB|UNH|BGM|DTM)|$)"))
{
Console.WriteLine(match.Value);
}
Explanation:
(UNB|UNH|BGM|DTM) matches either of the separators
.+? matches any string with at least one character (but as short as possible)
(?=(UNB|UNH|BGM|DTM)|$) matches if either a separator follows or if the string ends there - the match is however not included in the value.
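The same lookahead idea can also be sketched with Regex.Split, which cuts the string at each position where a tag begins without consuming the tag. SegmentSplitter is a hypothetical helper name, and the sketch assumes at least one tag is passed in and that the tags never occur inside the data itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SegmentSplitter
{
    // Splits the input in front of every occurrence of any of the given tags.
    public static string[] Split(string input, params string[] tags)
    {
        // Builds a zero-width lookahead such as (?=UNH|BGM|DTM).
        // Regex.Escape guards against regex metacharacters in a tag.
        string pattern = "(?=" + string.Join("|", tags.Select(Regex.Escape)) + ")";

        return Regex.Split(input, pattern)
                    .Where(part => part.Length > 0) // drop an empty leading piece
                    .ToArray();
    }
}
```

Calling SegmentSplitter.Split("UNB+123UNH+234BGM+345DTM+456", "UNH", "BGM", "DTM") gives the four segments UNB+123, UNH+234, BGM+345 and DTM+456.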
It sounds like the other answer recognises the format - you should definitely consider a library specifically for parsing this format!
If you're intent on parsing it yourself, you could simply find the index of your identifiers in the string, determine the first 2 by position, and use those positions to Substring the original input
var input = "UNB+123UNH+234BGM+345DTM+456";
var chars = new[]{"UNH", "BGM", "DTM" };
var indexes = chars.Select(c => new{Length=c.Length,Position= input.IndexOf(c)}) // Get position and length of each input
.Where(x => x.Position>-1) // where there is actually a match
.OrderBy(x =>x.Position) // put them in order of the position in the input
.Take(2) // only interested in first 2
.ToArray(); // make it an array
if(indexes.Length < 2)
throw new Exception("Did not find 2");
var result = input.Substring(indexes[0].Position + indexes[0].Length, indexes[1].Position - indexes[0].Position - indexes[0].Length);
Live example: https://dotnetfiddle.net/tDiQLG
There are already a lot of answers here, but I took the time to write mine, so I might as well post it even if it's not as elegant.
The code assumes all tags are accounted for in the chars array.
string str = "UNB+123UNH+234BGM+345DTM+456";
string[] chars = { "UNH", "BGM", "DTM" };
var locations = chars.Select(o => str.IndexOf(o)).Where(i => i > -1).OrderBy(o => o);
var resultList = new List<string>();
for(int i = 0;i < locations.Count();i++)
{
var nextIndex = locations.ElementAtOrDefault(i + 1);
nextIndex = nextIndex > 0 ? nextIndex : str.Length;
nextIndex = nextIndex - locations.ElementAt(i);
resultList.Add(str.Substring(locations.ElementAt(i), nextIndex));
}
This is a fairly efficient O(n) solution using a HashSet
It's extremely simple, low allocations, more efficient than regex, and doesn't need a library
Given
private static HashSet<string> _set;
public static IEnumerable<string> Split(string input)
{
var last = 0;
for (int i = 0; i <= input.Length - 3; i++) // <= so a tag in the final three characters is also checked
{
if (!_set.Contains(input.Substring(i, 3))) continue;
yield return input.Substring(last, i - last);
last = i;
}
yield return input.Substring(last);
}
Usage
_set = new HashSet<string>(new []{ "UNH", "BGM", "DTM" });
var results = Split("UNB+123UNH+234BGM+345DTM+456");
foreach (var item in results)
Console.WriteLine(item);
Output
UNB+123
UNH+234
BGM+345
DTM+456
Note : You could get this faster with a simple sorted tree, but would require more effort

Verify empty field Selenium C#

I am trying to check if a text field is empty and I can't convert bool to string.
I am trying this:
var firstName = driver.FindElement(By.Id("name_3_firstname"));
if (firstName.Equals(" ")) {
Console.WriteLine("This field can not be empty");
}
Also, how can I check if certain number field is exactly 20 digits?
Can you help me do this?
Thank you in advance!
If it's a string, then you can compare against string.Empty or "". Note that " " contains a space, so it is not empty.
For those 20 digits, you can use a bit of a workaround field.ToString().Length == 20 or you can repetitively divide it by 10 until the resulting value is 0, but I'd say the workaround might be easier to use.
This is more of a general C# answer. I'm not exactly sure how well it's gonna work in Selenium, but I've checked and string.Empty and ToString() appear to exist there.
For Empty / White space / Null, use following APIs of the string class
string.IsNullOrEmpty(value) or
string.IsNullOrWhiteSpace(value)
For exactly 20 digits, a regular expression is the best fit; it can also be adapted to a length range or a mix of digits and other characters if required. The expression below ensures that everything from the beginning to the end of the value is a digit:
string pattern = @"^\d{20}$";
var booleanResult = Regex.Match(value, pattern).Success;
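If a regex feels heavy for this, the same check can be sketched with plain string operations. FieldChecks and IsExactlyDigits are made-up names. The comparison against '0'..'9' is deliberate, since \d and char.IsDigit in .NET also accept non-ASCII digits by default.

```csharp
using System;
using System.Linq;

public static class FieldChecks
{
    // True only if value consists of exactly `count` ASCII digits.
    public static bool IsExactlyDigits(string value, int count)
    {
        return value != null
            && value.Length == count
            && value.All(c => c >= '0' && c <= '9');
    }
}
```

For the 20-digit field this would be called as FieldChecks.IsExactlyDigits(value, 20).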
I'm not sure this approach will work in your case. The code:
var firstName = driver.FindElement(By.Id("name_3_firstname"));
returns an IWebElement object, not a string. First you should get the text of this element, using something like firstName.Text or firstName.GetAttribute("value"). Once you have that, you can check it:
var text = firstName.GetAttribute("value");
if (string.IsNullOrEmpty(text)) { /* do something */ }
else if (text.Length == 20) { /* do something */ }

Regular Expression finding quotations

I am trying to check if a string is a quotation with regex in C#.
For e.g.
string x = "The flora and fauna of Britain \"has been transported to almost every corner of the globe since colonial times\" (Plants and Animals of Britain, 1942: 8).";
string y = "Morris et al (2000: 47) state \"that the debate of these particular issues should be left to representative committees.\"";
x and y are two quotations and the regex (or alternative solution) should be able to return true.
I came with this but there is a small problem:
string pattern = @"([‘'""]([\w\W]+?)[)])|(([\w\W]+?)[(]([\w\W]+?)[’'""])";
Are there any alternatives? Thanks in advance.
The project is an anti-plagiarism web application. The application found that these strings (quotations) were copied from the web. Now assume the user does not want these quotations included in the search results; the question is how to do that.
The search results are stored in database, i am using EF and linq as such:
var webSearches = _db.WebSearches.Where(x => x.SubmissionId == submissionId).GroupBy(x => x.PlagiarisedText).Select(x => x.FirstOrDefault()).OrderBy(x => x.Id);
I want to filter the result (plagiarisedText) by not including quotations.
Thanks for replies, I appreciate.
Use \\\".
Use Regex.IsMatch() to find if it contains or not.
Console.WriteLine(Regex.IsMatch(x, "\\\""));// true if it contains ", otherwise false
If Regex is not a requirement you can use String functions:
int first = str.IndexOf('"');
int last = str.LastIndexOf('"');
if (first >= 0 && last > first) // two distinct quote characters found; also avoids Substring throwing when there is no quote
{
// true
}
If it should be true when the first and last characters are both "s, then you can simply use the following regex:
".*"

C# - Searching strings

I can't seem to find a good solution to this issue. I've got an array of strings that are fed in from a report that I receive about lost or stolen equipment. I've been using the string.IndexOf function through the rest of the form and it works quite well. The issue is with the field that says whether the device was lost or stolen.
Example:
"Lost or Stolen? Lost"
"Lost or Stolen? Stolen"
I need to be able to read this, but when I do string.IndexOf(@"Lost") it will always find "Lost" because it's in the question.
Unfortunately I'm not able to change the form itself in any way, and due to the nature of how it's submitted I can't just write code that knocks the first 15 or so characters off the string, because that may be too few in some cases.
I would really like something in C# that would allow me to continue to search a string after the first result is found so that the logic would look like:
string my_string = "Lost or Stolen? Stolen";
searchFor(@"Stolen" in my_string)
{
Found Stolen;
Does it have "or " in front of it? yes;
ignore and keep searching;
Found Stolen again;
return "Equipment stolen";
}
Couple of options here. You could look for the last index of a space and take the rest of the string:
string input = "Lost or Stolen? Stolen";
int lastSpaceIndex = input.LastIndexOf(' ');
string result = input.Substring(lastSpaceIndex + 1);
Console.WriteLine(result);
Or you could split it and take the last word:
string input = "Lost or Stolen? Lost";
string result = input.Split(' ').Last();
Console.WriteLine(result);
Regex is also an option, but overkill given the simpler solutions above. A nice shortcut that fits this scenario is to use the RegexOptions.RightToLeft option to get the first match starting from the right:
string result = Regex.Match(input, @"\w+", RegexOptions.RightToLeft).Value;
If I understand your requirement, you're looking for an instance of Lost or Stolen after a ?:
var q = myString.IndexOf("?");
var lost = q >= 0 && myString.IndexOf("Lost", q) > 0;
var stolen = q >= 0 && myString.IndexOf("Stolen", q) > 0;
// or
var lost = myString.LastIndexOf("Lost") > myString.IndexOf("?");
var stolen = myString.LastIndexOf("Stolen") > myString.IndexOf("?");
// don't forget
var neither = !lost && !stolen;
You can look for the string 'Lost' and if it occurs twice, then you can confirm it is 'Lost'.
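A rough sketch of that counting idea, assuming the question text always contains each word exactly once so that a second occurrence can only come from the answer. OccurrenceCounter and CountOccurrences are hypothetical names, not an existing API.

```csharp
using System;

public static class OccurrenceCounter
{
    // Counts non-overlapping, case-sensitive occurrences of word in text.
    public static int CountOccurrences(string text, string word)
    {
        if (string.IsNullOrEmpty(word))
            throw new ArgumentException("word must not be empty", nameof(word));

        int count = 0;
        int index = text.IndexOf(word, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            // Continue searching just past the match that was found.
            index = text.IndexOf(word, index + word.Length, StringComparison.Ordinal);
        }
        return count;
    }
}
```

For "Lost or Stolen? Stolen", counting "Stolen" gives 2 and counting "Lost" gives 1, so the word that appears twice is the answer.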
It's possible in this case to use IndexOf and a substring, knowing that the string always says "Lost or Stolen?" first.
So you cut off the "Lost or Stolen?" part, then look for your keyword in the remaining string.
Something like:
int questionIndex = inputValue.IndexOf("?");
string toMatch = inputValue.Substring(questionIndex + 1).Trim(); // skip the "?" and the surrounding spaces
if (toMatch == "Lost")
If it works for your use case, it might be easier to use .EndsWith().
bool lost = my_string.EndsWith("Lost");

Remove formatting from a string: "(123) 456-7890" => "1234567890"?

I have a string when a telephone number is inputted - there is a mask so it always looks like "(123) 456-7890" - I'd like to take the formatting out before saving it to the DB.
How can I do that?
One possibility using linq is:
string justDigits = new string(s.Where(c => char.IsDigit(c)).ToArray());
Adding the cleaner/shorter version thanks to craigmoliver
string justDigits = new string(s.Where(char.IsDigit).ToArray());
You can use a regular expression to remove all non-digit characters:
string phoneNumber = "(123) 456-7890";
phoneNumber = Regex.Replace(phoneNumber, @"[^\d]", "");
Then further on - depending on your requirements - you can either store the number as a string or as an integer. To convert the number to an integer type you will have the following options:
// throws if phoneNumber is null or cannot be parsed
long number = Int64.Parse(phoneNumber, NumberStyles.Integer, CultureInfo.InvariantCulture);
// same as Int64.Parse, but returns 0 if phoneNumber is null
number = Convert.ToInt64(phoneNumber);
// does not throw, but returns true on success
if (Int64.TryParse(phoneNumber, NumberStyles.Integer,
CultureInfo.InvariantCulture, out number))
{
// parse was successful
}
Since nobody did a for loop.
long GetPhoneNumber(string PhoneNumberText)
{
// Returns 0 on error
StringBuilder TempPhoneNumber = new StringBuilder(PhoneNumberText.Length);
for (int i=0;i<PhoneNumberText.Length;i++)
{
if (!char.IsDigit(PhoneNumberText[i]))
continue;
TempPhoneNumber.Append(PhoneNumberText[i]);
}
PhoneNumberText = TempPhoneNumber.ToString();
if (PhoneNumberText.Length == 0)
return 0;// No point trying to parse nothing
long PhoneNumber = 0;
if(!long.TryParse(PhoneNumberText,out PhoneNumber))
return 0; // Failed to parse string
return PhoneNumber;
}
used like this:
long phoneNumber = GetPhoneNumber("(123) 456-7890");
Update
As pr commented, many countries have zeros at the beginning of the number. If you need to support that, you have to return a string, not a long. To change my code to do that, do the following:
1) Change the function return type from long to string.
2) Make the function return null instead of 0 on error.
3) On a successful parse, make it return PhoneNumberText.
You can make it work for that number with the addition of a simple regex replacement, but I'd look out for higher initial digits. For example, (876) 543-2019 will overflow an integer variable.
string digits = Regex.Replace(formatted, @"\D", String.Empty, RegexOptions.Compiled);
Aside from all of the other correct answers, storing phone numbers as integers or otherwise stripping out formatting might be a bad idea.
Here are a couple considerations:
Users may provide international phone numbers that don't fit your expectations, so the usual groupings for standard US numbers wouldn't apply.
Users may NEED to provide an extension, eg (555) 555-5555 ext#343 The # key is actually on the dialer/phone, but can't be encoded in an integer. Users may also need to supply the * key.
Some devices allow you to insert pauses (usually with the character P), which may be necessary for extensions or menu systems, or dialing into certain phone systems (eg, overseas). These also can't be encoded as integers.
[EDIT]
It might be a good idea to store both an integer version and a string version in the database. Also, when storing strings, you could reduce all punctuation to whitespace using one of the methods noted above. A regular expression for this might be:
// (222) 222-2222 ext# 333 -> 222 222 2222 # 333
phoneString = Regex.Replace(phoneString, @"[^\d#*P]", " ");
// (222) 222-2222 ext# 333 -> 2222222222333 (information lost)
phoneNumber = Regex.Replace(phoneString, @"[^\d]", "");
// you could try to avoid losing "ext" strings as in (222) 222-2222 ext.333 thus:
phoneString = Regex.Replace(phoneString, @"ex\w+", "#");
phoneString = Regex.Replace(phoneString, @"[^\d#*P]", " ");
Try this:
string s = "(123) 456-7890";
UInt64 i = UInt64.Parse(
s.Replace("(","")
.Replace(")","")
.Replace(" ","")
.Replace("-",""));
You should be safe with this since the input is masked.
You could use a regular expression or you could loop over each character and use char.IsNumber function.
You would be better off using regular expressions. An int by definition is just a number, but you desire the formatting characters to make it a phone number, which is a string.
There are numerous posts about phone number validation, see A comprehensive regex for phone number validation for starters.
As many answers already mention, you need to strip out the non-digit characters first before trying to parse the number. You can do this using a regular expression.
Regex.Replace("(123) 456-7890", @"\D", String.Empty) // "1234567890"
However, note that the largest positive value int can hold is 2,147,483,647 so any number with an area code greater than 214 would cause an overflow. You're better off using long in this situation.
Leading zeros won't be a problem for North American numbers, as area codes cannot start with a zero or a one.
Alternative using Linq:
string phoneNumber = "(403) 259-7898";
var phoneStr = new string(phoneNumber.Where(i=> i >= 48 && i <= 57).ToArray());
This is basically a special case of C#: Removing common invalid characters from a string: improve this algorithm, where your formatting characters, including whitespace, are treated as "bad characters".
'you can use module / inside sub main form VB.net
Public Function ClearFormat(ByVal Strinput As String) As String
Dim hasil As String
Dim Hrf As Char
For i = 0 To Strinput.Length - 1
Hrf = Strinput.Substring(i, 1)
If IsNumeric(Hrf) Then
hasil &= Hrf
End If
Next
Return hasil
End Function
'you can call this function like this
' Phone= ClearFormat(Phone)
public static string DigitsOnly(this string phoneNumber)
{
return new string(
new[]
{
// phoneNumber[0], (
phoneNumber[1], // 6
phoneNumber[2], // 1
phoneNumber[3], // 7
// phoneNumber[4], )
// phoneNumber[5],
phoneNumber[6], // 8
phoneNumber[7], // 6
phoneNumber[8], // 7
// phoneNumber[9], -
phoneNumber[10], // 5
phoneNumber[11], // 3
phoneNumber[12], // 0
phoneNumber[13] // 9
});
}
