How to match the second occurrence of a blank space in regex? - C#

Duration : 00:05:48.73
File Size 61.5M
As you can see, there are two lines. One uses a : to separate the word from the number; the other uses a blank space.
I need to separate the word from the number in both cases (for : as well as for a blank space).
I used String.Split(':') and String.Split(null). String.Split(':') worked and produced only two items in the array, but String.Split(null) produced three items: File, Size, 61.5M. I want two.
This is the code I'm using:
private static Regex _regex = new Regex(@"^([\p{L}_ ]+):?(.+)$");
Match match = _regex.Match(line);
if (match.Success)
{
string key = match.Groups[1].Captures[0].Value;
string value = match.Groups[2].Captures[0].Value;
}

Your split may not work, but the regex you mentioned should work fine.
See this fiddle: https://dotnetfiddle.net/g0apnE
var line1 = "Duration : 00:05:48.73";
var line2 = "File Size 61.5M";
Regex _regex = new Regex(@"^([\p{L}_ ]+):?(.+)$");
Match match = _regex.Match(line1);
if (match.Success)
{
    string key = match.Groups[1].Captures[0].Value;
    string value = match.Groups[2].Captures[0].Value;
    // Call Trim() to remove the surrounding whitespace.
    Console.WriteLine(key.Trim()); // Duration
    Console.WriteLine(value.Trim()); // 00:05:48.73
}
match = _regex.Match(line2);
if (match.Success)
{
    string key = match.Groups[1].Captures[0].Value;
    string value = match.Groups[2].Captures[0].Value;
    // Call Trim() to remove the surrounding whitespace.
    Console.WriteLine(key.Trim()); // File Size
    Console.WriteLine(value.Trim()); // 61.5M
}
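As a variation on the answer above, here is a minimal sketch that uses named groups and keeps the surrounding whitespace out of the captures, so no Trim() is needed. The class name KeyValueLine and its TryParse method are hypothetical, and the pattern assumes keys consist only of letters and underscores, with single spaces between words.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: splits "Key : value" and "Key value" lines.
public static class KeyValueLine
{
    // key:   words made of letters/underscores, separated by single spaces
    // then:  optional whitespace, an optional ':', optional whitespace
    // value: everything that is left on the line
    private static readonly Regex Pattern =
        new Regex(@"^(?<key>[\p{L}_]+(?: [\p{L}_]+)*)\s*:?\s*(?<value>.+)$");

    public static bool TryParse(string line, out string key, out string value)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            key = null;
            value = null;
            return false;
        }
        key = m.Groups["key"].Value;
        value = m.Groups["value"].Value;
        return true;
    }
}
```

For example, parsing "File Size 61.5M" should give the key "File Size" and the value "61.5M", and "Duration : 00:05:48.73" should give "Duration" and "00:05:48.73".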

Related

How to check if a string contains a word and ignore special characters?

I need to check whether a sentence contains any of the words from a string array, but the check should ignore special characters such as commas. The result should still be the original sentence.
For example, I have a sentence "Tesla car price is $ 250,000."
My word array is string[] wrdList = new string[] { "250000", "Apple", "40.00" };
I have written the line of code below, but it doesn't return a result because "250,000" and "250000" don't match.
List<string> res = row.ItemArray.Where(itmArr => wrdList.Any(wrd => itmArr.ToString().ToLower().Contains(wrd.ToString()))).OfType<string>().ToList();
One important point: if the sentence matches a word in the array, I need to get the original sentence back.
For example, the result should be "Tesla car price is $ 250,000.",
not "Tesla car price is $ 250000."
How about Replace(",", "")?
itmArr.ToString().ToLower().Replace(",", "").Contains(wrd.ToString())
Side note: .ToLower() isn't required, since digits have no case, and a string doesn't need .ToString(),
so the result could also be:
itmArr.Replace(",", "").Contains(wrd)
https://dotnetfiddle.net/A2zN0d
Update
Since the group separator can be a different character depending on the culture, you can also use
System.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberGroupSeparator
instead of ",".
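A minimal sketch of that culture-aware variant: strip the culture's group separator from a copy of each sentence, but return the untouched original. GroupSeparatorSearch and FindSentences are hypothetical names, and passing the culture in as a parameter (rather than reading CurrentThread inside the method) is an assumption made so it can be exercised with a fixed culture.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper illustrating the NumberGroupSeparator idea above.
public static class GroupSeparatorSearch
{
    // Returns the original sentences that contain any of the words once the
    // culture's number group separator has been removed from a copy of each sentence.
    public static string[] FindSentences(string[] sentences, string[] words, CultureInfo culture)
    {
        string separator = culture.NumberFormat.NumberGroupSeparator;
        return sentences
            .Where(sentence =>
            {
                // Strip the separator from a copy only; the sentence itself is kept intact.
                string stripped = string.IsNullOrEmpty(separator)
                    ? sentence
                    : sentence.Replace(separator, "");
                return words.Any(word => stripped.Contains(word));
            })
            .ToArray();
    }
}
```

In the question's setting you would pass CultureInfo.CurrentCulture. It only strips that one culture's separator, so numbers formatted with a different culture's separator will not match.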
The first option to consider for most text matching problems is to use regular expressions. This will work for your problem. The core part of the solution is to construct an appropriate regular expression to match what you need to match.
You have a list of words, but I'll focus on just one word. Your requirements specify that you want to match on a "word". So to start with, you can use the "word boundary" pattern \b. To match the word "250000", the regular expression would be \b250000\b.
Your requirements also specify that the word can "contain" characters that are "special". For it to work correctly, you need to be clear what it means to "contain" and which characters are "special".
For the "contain" requirement, I'll assume you mean that the special character can be between any two characters in the word, but not the first or last character. So for the word "250000", any of the question marks in this string could be a special character: "2?5?0?0?0?0".
For the "special" requirement, there are options that depend on your requirements. If it's simply punctuation, you can use the character class \p{P}. If you need to specify a specific list of special characters, you can use a character group. For example, if your only special character is comma, the character group would be [,].
To put all that together, you would create a function to build the appropriate regular expression for each target word, then use that to check your sentence. Something like this:
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string sentence = "Tesla car price is $ 250,000.";
        var targetWords = new string[]{ "250000", "350000", "400000"};
        Console.WriteLine($"Contains target word? {ContainsTarget(sentence, targetWords)}");
    }
    private static bool ContainsTarget(string sentence, string[] targetWords)
    {
        return targetWords.Any(targetWord => ContainsTarget(sentence, targetWord));
    }
    private static bool ContainsTarget(string sentence, string targetWord)
    {
        string targetWordExpression = TargetWordExpression(targetWord);
        var re = new Regex(targetWordExpression);
        return re.IsMatch(sentence);
    }
    private static string TargetWordExpression(string targetWord)
    {
        var sb = new StringBuilder();
        // If special characters means a specific list, use this:
        string specialCharacterMatch = "[,]?";
        // If special characters means any punctuation, then you can use this:
        //string specialCharacterMatch = "\\p{P}?";
        bool any = false;
        foreach (char c in targetWord)
        {
            if (any)
            {
                sb.Append(specialCharacterMatch);
            }
            any = true;
            // Escape each character so that, e.g., the '.' in "40.00" is matched literally.
            sb.Append(Regex.Escape(c.ToString()));
        }
        return $"\\b{sb}\\b";
    }
}
Working code: https://dotnetfiddle.net/5UJSur
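The answer above returns a bool. Since the question also asks for the original sentence back, here is a sketch that applies the same per-character pattern to a list of sentences and returns the unchanged matches. TargetWordFilter, BuildPattern and Filter are hypothetical names; the Regex.Escape call makes targets such as "40.00" match a literal dot rather than any character.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper built on the TargetWordExpression idea.
public static class TargetWordFilter
{
    // Allow one optional comma between any two characters of the target word,
    // and require word boundaries on both ends.
    public static Regex BuildPattern(string targetWord)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < targetWord.Length; i++)
        {
            if (i > 0) sb.Append("[,]?");
            sb.Append(Regex.Escape(targetWord[i].ToString()));
        }
        return new Regex(@"\b" + sb + @"\b");
    }

    // Returns the original sentences (unchanged) that contain any target word.
    public static string[] Filter(string[] sentences, string[] targetWords)
    {
        Regex[] patterns = targetWords.Select(BuildPattern).ToArray();
        return sentences.Where(s => patterns.Any(p => p.IsMatch(s))).ToArray();
    }
}
```

Building each Regex once up front avoids re-parsing the pattern for every sentence.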
Hopefully the solution below helps. It uses a regular expression to remove non-alphanumeric characters and returns the original string if it contains any matching word from wrdList.
string s = "Tesla car price is $ 250,000.";
string[] wrdList = new string[3] { "250000", "Apple", "40.00" };
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
string str = rgx.Replace(s, "");
if (wrdList.Any(str.Contains))
{
Console.Write(s);
}
else
{
Console.Write("No Match Found!");
}
Uploaded to a fiddle for further exploration:
https://dotnetfiddle.net/zbwuDy
For a paragraph, you can also split it into an array of sentences and iterate through them; see the fiddle below.
https://dotnetfiddle.net/AvO6FJ
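A sketch of the paragraph variant, assuming sentences end with '.', '!' or '?' followed by whitespace (so a decimal point inside a number does not split a sentence). ParagraphSearch and MatchingSentences are hypothetical names. Because the cleanup regex removes dots, a word such as "40.00" can never match with this approach.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParagraphSearch
{
    // Assumed sentence break: whitespace that follows '.', '!' or '?'.
    private static readonly Regex SentenceBreak = new Regex(@"(?<=[.!?])\s+");

    // Same cleanup as the answer above: keep only letters, digits, spaces and hyphens.
    private static readonly Regex NonAlphanumeric = new Regex("[^a-zA-Z0-9 -]");

    // Returns the original sentences whose cleaned copy contains any of the words.
    public static string[] MatchingSentences(string paragraph, string[] words)
    {
        return SentenceBreak.Split(paragraph)
            .Where(sentence =>
            {
                string cleaned = NonAlphanumeric.Replace(sentence, "");
                return words.Any(w => cleaned.Contains(w));
            })
            .ToArray();
    }
}
```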

How to match the first occurrence of a character and split it

I have a text file from which I want to store keys and values in a string array.
In this case, the key is something like "Input File" and the value is "'D:\myfile.wav'". I'm splitting each line of the text file on the **:** character. However, I want to restrict the split to the first occurrence of **:** only.
Sample lines:
Input File : 'D:\myfile.wav'
Duration : 00:00:18.57
This is my code:
if (Regex.IsMatch(line, @"[^0-9\p{L}:_ ]+", RegexOptions.IgnoreCase))
{
string[] dataArray = line.Split(':');
}
Using regular expression captures:
private static Regex _regex = new Regex(@"^([\p{L}_ ]+):?(.+)$");
....
Match match = _regex.Match(line);
if (match.Success)
{
string key = match.Groups[1].Captures[0].Value;
string value = match.Groups[2].Captures[0].Value;
}
The regex is a static member so that it is constructed once rather than on every use. The ? after the : makes the colon optional (it does not make the match lazy); the key stops at the first : because the character class [\p{L}_ ] cannot match a colon.
Link to Fiddle.
Edit
I've updated the code and fiddle after your comment. I think this is what you mean:
Key: Any letter, underscore and whitespace combination (no digits)
Value: anything
Separator between key and value: :
Basically, you don't want to split the entire string; you want to skip everything before the first ':' character, plus the ':' itself.
var data = line.Substring(line.IndexOf(':') + 1);
Or if you really want solution with Split:
var data = string.Join(":", line.Split(':').Skip(1));
Here, we first split the string into an array, then skip one element (the one we are trying to get rid of), and finally rebuild the string with ':' between the remaining elements.
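Another option is the count overload of String.Split, which caps the number of substrings so that everything after the first ':' stays together in the second element. FirstColonSplit and SplitFirst are hypothetical names, and trimming both parts is an assumption based on the sample lines.

```csharp
using System;

public static class FirstColonSplit
{
    // Split on the first ':' only: a count of 2 means any later ':' characters
    // remain inside the value.
    public static (string Key, string Value) SplitFirst(string line)
    {
        string[] parts = line.Split(new[] { ':' }, 2);
        string key = parts[0].Trim();
        string value = parts.Length > 1 ? parts[1].Trim() : string.Empty;
        return (key, value);
    }
}
```

For "Duration : 00:00:18.57" this should yield the key "Duration" and the value "00:00:18.57".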
Here's one way to do it with regex (comments in code):
string[] lines = {@"Input File : 'D:\myfile.wav'", @"Duration: 00:00:18.57"};
Regex regex = new Regex("^[^:]+");
Dictionary<string, string> dict = new Dictionary<string, string>();
for (int i = 0; i < lines.Length; i++)
{
// match in the string will be everything before first :,
// then we replace match with empty string and remove first
// character which will be :, and that will be the value
string key = regex.Match(lines[i]).Value.Trim();
string value = regex.Replace(lines[i], "").Remove(0, 1).Trim();
dict.Add(key, value);
}
It uses the pattern ^[^:]+, a negated character class that matches everything from the start of the line up to (but not including) the first ':'.
You need to read the information into a string, Line.
After that, do this:
String Key = Line.Split( ':' )[0];
String Value = Line.Substring( Key.Length + 1 );
This way you can read each line of the text file and fill the dictionary with Key = everything before the first ":" and Value = everything after it.
using System.Collections.Generic;
using System.IO;
using System.Text;

Dictionary<string, string> yourDictionary = new Dictionary<string, string>();
string pathF = "C:\\fich.txt";
using (StreamReader file = new StreamReader(pathF, Encoding.Default))
{
    string step;
    while ((step = file.ReadLine()) != null)
    {
        int colon = step.IndexOf(':');
        // Skip empty lines and lines without a ':' (Substring would throw on -1).
        if (colon >= 0)
        {
            yourDictionary.Add(step.Substring(0, colon), step.Substring(colon + 1));
        }
    }
}

Find all occurrences of substring in string

I have a text file that includes things such as the following:
_vehicle_12 = objNull;
if (true) then
{
_this = createVehicle ["Land_Mil_Guardhouse", [13741.654, 2926.7075, 3.8146973e-006], [], 0, "CAN_COLLIDE"];
_vehicle_12 = _this;
_this setDir -92.635818;
_this setPos [13741.654, 2926.7075, 3.8146973e-006];
};
I want to find all occurrences between { and }; and assign the following strings:
string direction = the value after "_this setDir"; in the _vehicle_12 example that means:
string direction = "-92.635818";
string position = the value after "_this setPos"; in the _vehicle_12 example that would be:
string position = "[13741.654, 2926.7075, 3.8146973e-006]";
There are multiple occurrences of these blocks, and I'd like to find the best way, each time a { }; block occurs, to set direction and position and then move on to the next occurrence.
The following code reads the string (which holds the whole file) and finds the first occurrence fine, but I'd like to adapt it to find every occurrence of { and };
string alltext = File.ReadAllText(@file);
string re1 = ".*?"; // Non-greedy match on filler
string re2 = "(\\{.*?\\})"; // Curly Braces 1
Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(alltext);
if (m.Success)
{
String cbraces1 = m.Groups[1].ToString();
MessageBox.Show("Found vehicle: " + cbraces1.ToString() + "\n");
}
1. Think of a regex that might work.
2. Test it.
3. If it does not work, modify it and return to step 2.
4. You have a working regex :-)
To get you started:
\{\n([0-z\[\]" ,-\.=]+;\n)+\}
should return the individual lines inside the curly braces.
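For the original goal (every block, not just the first), here is a sketch that uses Regex.Matches to iterate over each { ... }; block and then pulls the setDir and setPos values out of its body. VehicleBlocks and its three patterns are hypothetical, and they assume the layout shown in the question: no nested braces, and one setDir and one setPos per block.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VehicleBlocks
{
    // One match per "{ ... };" block. Singleline lets '.' cross line breaks, and the
    // lazy ".*?" stops at the first "};" so neighbouring blocks are not merged.
    private static readonly Regex Block =
        new Regex(@"\{(.*?)\};", RegexOptions.Singleline);

    // Inside a block: the text after "setDir" up to ';', and the bracketed list after "setPos".
    private static readonly Regex Dir = new Regex(@"_this\s+setDir\s+([^;]+);");
    private static readonly Regex Pos = new Regex(@"_this\s+setPos\s+(\[[^\]]*\]);");

    public static List<(string Direction, string Position)> Parse(string allText)
    {
        var results = new List<(string Direction, string Position)>();
        foreach (Match block in Block.Matches(allText))
        {
            string body = block.Groups[1].Value;
            Match dir = Dir.Match(body);
            Match pos = Pos.Match(body);
            if (dir.Success && pos.Success)
            {
                results.Add((dir.Groups[1].Value.Trim(), pos.Groups[1].Value));
            }
        }
        return results;
    }
}
```

Each entry in the returned list corresponds to one block, in file order, so you can process direction and position and then move on to the next entry.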

Get partial string from string

I have the following string:
This isMyTest testing
I want to get isMyTest as a result. Only the first two characters ("is") are known; the rest of the word can vary.
Basically, I need to select the first space-delimited word that starts with "is".
I've started with the following:
if (text.Contains(" is"))
{
text.LastIndexOf(" is"); //Should give me index.
}
Now I can't find the right boundary of the word, since I need to match on something like
You can use regular expressions:
string pattern = @"\bis\w*"; // \b: start of a word; \w*: the rest of the word
string input = "This isMyTest testing";
return Regex.Matches(input, pattern);
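A pattern of just \bis matches only the two letters "is"; adding \w* extends the match to the rest of the word. Here is a sketch that wraps this in a helper for any prefix. PrefixWord and FirstWordStartingWith are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixWord
{
    // Returns the first whole word that starts with the given prefix, or null.
    // \b anchors the prefix at the start of a word (so "This" is skipped for "is"),
    // \w* consumes the rest of the word, and Regex.Escape keeps the prefix literal.
    public static string FirstWordStartingWith(string input, string prefix)
    {
        Match m = Regex.Match(input, @"\b" + Regex.Escape(prefix) + @"\w*");
        return m.Success ? m.Value : null;
    }
}
```

With the question's input, FirstWordStartingWith("This isMyTest testing", "is") should return "isMyTest".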
You can use IndexOf to get the index of the next space:
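int startPosition = text.LastIndexOf(" is");
if (startPosition != -1)
{
    int endPosition = text.IndexOf(' ', startPosition + 1); // Find next space
    if (endPosition == -1)
        endPosition = text.Length; // No following space: the word runs to the end of the string
    // Skip the leading space that LastIndexOf(" is") points at.
    string word = text.Substring(startPosition + 1, endPosition - startPosition - 1);
}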
What about using a regex match? Regular expressions are well suited to searching for a pattern in a string (i.e., a space followed by some other characters). They mainly fall apart on context-sensitive input such as HTML, but they are a good fit for a plain string search.
// First we set up the input string.
string input = "This isMyTest testing";
// Here we call Regex.Match; the group captures "is" plus the rest of the word.
Match match = Regex.Match(input, @"[ ](is[A-Za-z0-9]*)", RegexOptions.IgnoreCase);
// Here we check the Match instance.
if (match.Success)
{
    // Finally, we get the Group value and display it.
    string key = match.Groups[1].Value;
    Console.WriteLine(key); // isMyTest
}

Search string pattern

If I have a string like MCCORMIC 3H R Final 08-26-2011.dwg or even MCCORMIC SMITH 2N L Final 08-26-2011.dwg and I want to capture the R in the first string or the L in the second string in a variable, what is the best way to do so? I tried the statement below, but it does not work.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string WhichArea = "";
int WhichIndex = 0;
WhichIndex = filename.IndexOf("Final");
WhichArea = filename.Substring(WhichIndex - 1,1); //Trying to get the R in front of word Final
Just split by space:
var parts = filename.Split(new [] {' '},
StringSplitOptions.RemoveEmptyEntries);
WhichArea = parts[parts.Length - 3];
It looks like the file names have a very specific format, so this will work just fine.
Even with any number of spaces, using StringSplitOptions.RemoveEmptyEntries means spaces will not be part of the split result set.
Code updated to deal with both examples - thanks Nikola.
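If the number of words before the letter varies, another split-based sketch is to locate the "Final" token and take the token immediately before it. DrawingName and AreaBeforeFinal are hypothetical names, and the sketch assumes "Final" appears as its own space-separated word with exactly that capitalization.

```csharp
using System;

public static class DrawingName
{
    // Find the "Final" token and return the token right before it.
    // Returns null if "Final" is missing or is the first token.
    public static string AreaBeforeFinal(string filename)
    {
        string[] parts = filename.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int finalIndex = Array.IndexOf(parts, "Final");
        return finalIndex > 0 ? parts[finalIndex - 1] : null;
    }
}
```

Anchoring on "Final" rather than counting from the end keeps working if the date or extension part of the name ever gains extra words.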
I had to do something similar, but with MicroStation drawings instead of AutoCAD. I used a regex in my case. Here's what I did, in case you want something more elaborate.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string filename2 = "MCCORMIC SMITH 2N L Final 08-26-2011.dwg";
Console.WriteLine(TheMatch(filename));
Console.WriteLine(TheMatch(filename2));
public static string TheMatch(string filename) {
Regex reg = new Regex(@"[A-Za-z0-9]*\s*([A-Z])\s*Final .*\.dwg");
Match match = reg.Match(filename);
if(match.Success) {
return match.Groups[1].Value;
}
return String.Empty;
}
I don't think Oded's answer covers all cases. The first example has two words before the wanted letter, and the second one has three words before it.
In my opinion, the best way to get this letter is with a regex, assuming that the word Final always comes after the letter, separated from it by any amount of whitespace.
Here's the RegEx code:
using System.Text.RegularExpressions;
private string GetLetter(string fileName)
{
string pattern = @"\S(?=\s*?Final)";
Match match = Regex.Match(fileName, pattern);
return match.Value;
}
And here's the explanation of RegEx pattern:
\S(?=\s*?Final)
\S // Anything other than whitespace
(?=\s*?Final) // Positive look-ahead
\s*? // Whitespace, unlimited number of repetitions, as few as possible.
Final // Exact text.
