How to use the Regex class for string manipulations - C#

I am working on string manipulations using regex.
Source: string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
Output required:
Foldername: folder1
content name: content
folderpath: /webdav/MyPublication/Building%20Blocks/folder0/folder1/
I am new to this. Can anyone say how it can be done using regex?
Thank you.

The rules you need seem to be the following:
Folder name = last string preceding a '/' character but not containing a '/' character
content name = last string not containing a '/' character until (but not including) a '_' or '.' character
folderpath = same as folder name except it can contain a '/' character
Assuming the rules above - you probably want this code:
string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
var foldernameMatch = Regex.Match(value, @"([^/]+)/[^/]+$");
var contentnameMatch = Regex.Match(value, @"([^/_\.]+)[_\.][^/]*$");
var folderpathMatch = Regex.Match(value, @"(.*/)[^/]*$");
if (foldernameMatch.Success && contentnameMatch.Success && folderpathMatch.Success)
{
var foldername = foldernameMatch.Groups[1].Value;
var contentname = contentnameMatch.Groups[1].Value;
var folderpath = folderpathMatch.Groups[1].Value;
}
else
{
// handle bad input
}
Note that you can also combine these to become one large regex, although it can be more cumbersome to follow (if it weren't already):
var matches = Regex.Match(value, @"(.*/)([^/]+)/([^/_\.]+)[_\.][^/]*$");
if (matches.Success)
{
var foldername = matches.Groups[2].Value;
var contentname = matches.Groups[3].Value;
var folderpath = matches.Groups[1].Value + foldername + "/";
}
else
{
// handle bad input
}

You could use named captures, but you're probably better off (from a security and implementation aspect) just using the Uri class.
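As a rough sketch of both suggestions (not the answerer's own code): the first half uses named captures to pull out the three values from the question, and the second half gets the folder and content names from Uri.Segments. The group names and the http://localhost base address are assumptions made for illustration; Uri needs an absolute address before it exposes Segments, so the relative path is combined with a placeholder base.

```csharp
using System;
using System.Text.RegularExpressions;

string value = "/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";

// Named captures: the group names are illustrative, not taken from the answers.
Match m = Regex.Match(
    value,
    @"^(?<folderpath>.*/(?<foldername>[^/]+)/)(?<contentname>[^/_.]+)[_.][^/]*$");
string folderPath = m.Success ? m.Groups["folderpath"].Value : "";
string folderName = m.Success ? m.Groups["foldername"].Value : "";
string contentName = m.Success ? m.Groups["contentname"].Value : "";

// Uri alternative: combine the relative path with a placeholder base address (assumed).
var uri = new Uri(new Uri("http://localhost"), value);
string[] segments = uri.Segments;   // "/", "webdav/", ..., "folder1/", "content_1.xml"
string uriFolderName = segments[segments.Length - 2].TrimEnd('/');
string fileName = segments[segments.Length - 1];

// The content name ends at the first '_' or '.' in the file name.
int cut = fileName.IndexOfAny(new[] { '_', '.' });
string uriContentName = cut >= 0 ? fileName.Substring(0, cut) : fileName;

Console.WriteLine($"{folderPath} | {folderName} | {contentName}");
Console.WriteLine($"{uriFolderName} | {uriContentName}");
```

The Uri version leaves the parsing of the path to the framework, so the only hand-written string logic is the cut at '_' or '.'.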

I agree with Jeff Moser on this one, but to answer the original question, I believe the following regular expression would work:
^(\/.+\/(.+?)\/)(.+?)[_.]
edit: Added example.
var value = "/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
var regex = Regex.Match(value, @"^(\/.+\/(.+?)\/)(.+?)[_.]");
// check if success
if (regex.Success)
{
// assign the values from the regular expression
var folderName = regex.Groups[2].Value;
var contentName = regex.Groups[3].Value;
var folderPath = regex.Groups[1].Value;
}

Related

C# - Delete the slash at the end of URI

I would like to remove the slash at the end of the URI, while keeping the query parameter.
For example:
I get an object of type URI as input with the value:
https://stackoverflow.com/questions/ask/
I would like to remove the last "/" to get:
https://stackoverflow.com/questions/ask
In some cases I may have parameters:
https://stackoverflow.com/questions/ask/?test=test1
I would like to get:
https://stackoverflow.com/questions/ask?test=test1
Thanks in advance.
string url1 = "https://stackoverflow.com/questions/ask/";
string url2 = "https://stackoverflow.com/questions/ask/?test=test1";
//remove slash at end
Console.WriteLine(url1.TrimEnd('/'));
//remove slash preceding query parameters
int last = url2.LastIndexOf('?');
url2 = url2.Remove(last - 1, 1);
Console.WriteLine(url2);
There probably is a way to search and replace the last slash using Regex as well, since there is Regex.Replace()
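One possible Regex.Replace version, as a sketch: the pattern below is an assumption about what "the last slash" should mean here, namely a single slash that sits directly before the query string or at the very end of the URL. RemoveTrailingSlash is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 is everything before the slash to drop; group 2 is the optional query string.
// If no such slash exists, the pattern does not match and the URL is returned unchanged.
string RemoveTrailingSlash(string url) =>
    Regex.Replace(url, @"^([^?]*?)/(\?.*)?$", "$1$2");

Console.WriteLine(RemoveTrailingSlash("https://stackoverflow.com/questions/ask/"));
Console.WriteLine(RemoveTrailingSlash("https://stackoverflow.com/questions/ask/?test=test1"));
Console.WriteLine(RemoveTrailingSlash("https://stackoverflow.com/questions/ask?test=test1"));
```

Because the first group cannot contain a ?, slashes inside the query string are never touched.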
You can first check whether the URL has any query parameters by looking for the last index of the ? character. If there is a ?, take the part before it, remove its last character (the / that you don't want) and combine the two parts again. If it doesn't have any query parameters, just remove the last character. You can create an extension method like this:
public static string RemoveSlash(this string url)
{
var queryIndex = url.LastIndexOf('?');
if (queryIndex >= 0)
{
var urlWithoutQueryParameters = url[..queryIndex];
if (urlWithoutQueryParameters.EndsWith("/"))
{
urlWithoutQueryParameters = urlWithoutQueryParameters[..^1];
}
var queryParameters = url[queryIndex..];
return urlWithoutQueryParameters + queryParameters;
}
else if (url.EndsWith("/"))
{
return url[..^1];
}
return url;
}
Then you can use it like this:
var url1 = "https://stackoverflow.com/questions/ask";
var url2 = "https://stackoverflow.com/questions/ask/";
Console.WriteLine(url1.RemoveSlash()); // https://stackoverflow.com/questions/ask
Console.WriteLine(url2.RemoveSlash()); // https://stackoverflow.com/questions/ask
var url3 = "https://stackoverflow.com/questions/ask/?test=test1";
var url4 = "https://stackoverflow.com/questions/ask?test=test1";
Console.WriteLine(url3.RemoveSlash()); // https://stackoverflow.com/questions/ask?test=test1
Console.WriteLine(url4.RemoveSlash()); // https://stackoverflow.com/questions/ask?test=test1

How to extract name from a file name in the form "<name>_<fileNum>of<fileNumTotal>" or "<name>"?

A user specifies a file name that can be either in the form "<name>_<fileNum>of<fileNumTotal>" or simply "<name>". I need to somehow extract the "<name>" part from the full file name.
Basically, I am looking for a solution to the method "ExtractName()" in the following example:
string fileName = "example_File"; // This var is specified by user
string extractedName = ExtractName(fileName); // Must return "example_File"
fileName = "example_File2_1of5";
extractedName = ExtractName(fileName); // Must return "example_File2"
fileName = "examp_File_3of15";
extractedName = ExtractName(fileName); // Must return "examp_File"
fileName = "example_12of15";
extractedName = ExtractName(fileName); // Must return "example"
Edit: Here's what I've tried so far:
string ExtractName(string fullName)
{
return fullName.Substring(0, fullName.LastIndexOf('_'));
}
But this clearly does not work for the case where the full name is just "<name>".
Thanks
This would be easier to parse using Regex, because you don't know how many digits either number will have.
var inputs = new[]
{
"example_File",
"example_File2_1of5",
"examp_File_3of15",
"example_12of15"
};
var pattern = new Regex(@"^(.+)(_\d+of\d+)$");
foreach (var input in inputs)
{
var match = pattern.Match(input);
if (!match.Success)
{
// file doesn't end with "#of#", so use the whole input
Console.WriteLine(input);
}
else
{
// it does end with "#of#", so use the first capture group
Console.WriteLine(match.Groups[1].Value);
}
}
This code returns:
example_File
example_File2
examp_File
example
The Regex pattern has three parts:
^ and $ are anchors to ensure you capture the entire string, not just a subset of characters.
(.+) - match everything, be as greedy as possible.
(_\d+of\d+) - match "_#of#", where "#" can be any number of consecutive digits.
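The pattern described above can be wrapped into the ExtractName() method the question asks for. This is a minimal sketch; the only change to the pattern is dropping the second capture group, since its value is not used.

```csharp
using System;
using System.Text.RegularExpressions;

// Returns the part before a trailing "_<fileNum>of<fileNumTotal>",
// or the whole name when there is no such suffix.
string ExtractName(string fileName)
{
    Match match = Regex.Match(fileName, @"^(.+)_\d+of\d+$");
    return match.Success ? match.Groups[1].Value : fileName;
}

Console.WriteLine(ExtractName("example_File"));        // example_File
Console.WriteLine(ExtractName("example_File2_1of5"));  // example_File2
Console.WriteLine(ExtractName("examp_File_3of15"));    // examp_File
Console.WriteLine(ExtractName("example_12of15"));      // example
```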

How to extract an url from a String in C#

I have this string :
"<figure><img
src='http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg'
href='JavaScript:void(0);' onclick='return takeImg(this)'
tabindex='1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>"
How can I retrieve this link :
http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg
All strings have the same format, so somehow I need to get the substring between src= and href. But I don't know how to do that. Thanks.
If you parse HTML, don't use string methods but a real HTML parser like HtmlAgilityPack:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // html is your string
var linksAndImages = doc.DocumentNode.SelectNodes("//a/@href | //img/@src");
var allSrcList = linksAndImages
.Select(node => node.GetAttributeValue("src", "[src not found]"))
.ToList();
You can use regex:
var src = Regex.Match("the string", "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
In general, you should use an HTML/XML parser when parsing a value from HTML code, but with a limited string like this, Regex would be fine.
string url = Regex.Match(htmlString, @"src='(.*?)'").Groups[1].Value;
If your string is always in same format, you can easily do this like so :
string input = "<figure><img src='http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg' href='JavaScript:void(0);' onclick='return takeImg(this)' tabindex='1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
// the link sits between the first two ' signs, so you can do :
int start = input.IndexOf("'") + 1;
int end = input.IndexOf("'", start);
input = input.Substring(start, end - start);
// now your string looks like : "http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg"
return input;
string str = "<figure><imgsrc = 'http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg'href = 'JavaScript:void(0);' onclick = 'return takeImg(this)'tabindex = '1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
int pFrom = str.IndexOf("src = '") + "src = '".Length;
int pTo = str.LastIndexOf("'href");
string url = str.Substring(pFrom, pTo - pFrom);
Source :
Get string between two strings in a string
Here q is your string. I look for the index of the attribute you want (src = '), then remove the first few characters (7 including spaces), and after that find where the text ends by looking for the next '.
For removing the first few characters, you could use .IndexOf to work out how many to delete so it's not hard-coded.
string q =
"<figure><img src = 'http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg' href = 'JavaScript:void(0);' onclick = 'return takeImg(this)'" +
"tabindex = '1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
string z = q.Substring(q.IndexOf("src = '"));
z = z.Substring(7);
z = z.Substring(0, z.IndexOf("'"));
MessageBox.Show(z);
This is certainly not the most elegant way (look at the other answers for that :)).
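To avoid the hard-coded 7, the marker's own length can be used as the offset. The sketch below assumes a hypothetical helper named GetBetween (the name and parameters are made up for illustration) and a shortened version of the HTML string from the question.

```csharp
using System;

// Hypothetical helper: returns the text between the first startMarker and the
// next endMarker, or an empty string when either marker is missing.
string GetBetween(string source, string startMarker, string endMarker)
{
    int start = source.IndexOf(startMarker, StringComparison.Ordinal);
    if (start < 0) return string.Empty;

    start += startMarker.Length;   // skip the marker itself, whatever its length
    int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
    return end < 0 ? string.Empty : source.Substring(start, end - start);
}

// Shortened sample input (assumed), in the same shape as the question's string.
string html = "<figure><img src='http://myphotos.net/image.ashx?type=2' href='JavaScript:void(0);'></figure>";
Console.WriteLine(GetBetween(html, "src='", "'"));   // http://myphotos.net/image.ashx?type=2
```

This still depends on the exact spacing and quote style around src, which is why a real HTML parser remains the safer choice for arbitrary input.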

Get specific words from string c#

I am working on a final year project. I have a file that contains some text. I need to get the words from this file that carry the "//jj" tag, e.g. abc//jj, bcd//jj etc.
Suppose the file contains the following text:
ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj
dsdsd sfsfhf//vv
dfdfdf
I need all the words that are associated with the //jj tag. I have been stuck here for the past few days.
The code I am trying:
// Create OpenFileDialog
Microsoft.Win32.OpenFileDialog dlg = new Microsoft.Win32.OpenFileDialog();
// Set filter for file extension and default file extension
dlg.DefaultExt = ".txt";
dlg.Filter = "Text documents (.txt)|*.txt";
// Display OpenFileDialog by calling ShowDialog method
Nullable<bool> result = dlg.ShowDialog();
// Get the selected file name and display in a TextBox
string filename = string.Empty;
if (result == true)
{
// Open document
filename = dlg.FileName;
FileNameTextBox.Text = filename;
}
string text;
using (var streamReader = new StreamReader(filename, Encoding.UTF8))
{
text = streamReader.ReadToEnd();
}
string FilteredText = string.Empty;
string pattern = @"(?<before>\w+) //jj (?<after>\w+)";
MatchCollection matches = Regex.Matches(text, pattern);
for (int i = 0; i < matches.Count; i++)
{
FilteredText="before:" + matches[i].Groups["before"].ToString();
//Console.WriteLine("after:" + matches[i].Groups["after"].ToString());
}
textbx.Text = FilteredText;
I can't get my result, please help me.
With LINQ you could do this with one line:
string[] taggedwords = input.Split(' ').Where(x => x.EndsWith(@"//jj")).ToArray();
And all your //jj words will be there...
Personally I think Regex is overkill if that's definitely how the string will look. You haven't specified that you definitely need to use Regex so why not try this instead?
// A list that will hold the words ending with '//jj'
List<string> results = new List<string>();
// The text you provided
string input = @"ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj dsdsd sfsfhf//vv dfdfdf";
// Split the string on the space character to get each word
string[] words = input.Split(' ');
// Loop through each word
foreach (string word in words)
{
// Does it end with '//jj'?
if(word.EndsWith(@"//jj"))
{
// Yes, add to the list
results.Add(word);
}
}
// Show the results
foreach(string result in results)
{
MessageBox.Show(result);
}
Results are:
ssss//jj
dsdsd//jj
Obviously this is not quite as robust as a regex, but you didn't provide any more detail for me to go on.
You have an extra space in your regex; it assumes there's a space before "//jj". What you want is:
string pattern = @"(?<before>\w+)//jj (?<after>\w+)";
This regular expression will yield the words you are looking for:
string pattern = "(\\S*)\\/\\/jj";
A bit nicer without backslash escaping:
(\S*)\/\/jj
Matches will include the //jj but you can get the word from the first bracketed group.
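A short sketch of collecting the words from the first group with Regex.Matches. The sample text keeps the line breaks from the question's file, which is where splitting on a single space would glue "dsdsd//jj" to the first word of the next line. Using \S+ instead of \S* and adding the (?!\S) lookahead are small assumptions on top of the pattern above, so that an empty word or a longer tag such as //jjx is not accepted.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj\ndsdsd sfsfhf//vv\ndfdfdf";

// (\S+) captures the run of non-whitespace characters directly before the //jj tag;
// (?!\S) requires the tag to be followed by whitespace or the end of the text.
string[] words = Regex.Matches(text, @"(\S+)//jj(?!\S)")
    .Cast<Match>()
    .Select(match => match.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", words));   // ssss, dsdsd
```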

Splitting a string which contain multiple symbols to get specific values

I cannot believe I am having trouble with the following string
String filter = "name=Default;pattern=%%;start=Last;end=Now";
This is a short and possibly duplicate question, but how would I split this string to get:
string Name = "Default";
string Pattern = "%%" ;
string start = "Last" ;
string end = "Now" ;
The reason I ask is that my deadline is very soon, and this is literally the last thing I must do. I'm panicking, and I'm stuck on this basic command. I tried:
pattern = filter.Split(new string[] { "pattern=", ";" },
StringSplitOptions.RemoveEmptyEntries)[1]; //Gets the pattern
startDate = filter.Split(new string[] { "start=", ";" },
StringSplitOptions.RemoveEmptyEntries)[1]; //Gets the start date
I happen to get the pattern which I needed, but as soon as I try to split start, I get the value as "Pattern=%%"
What can I do?
Forgot to mention
The list in this string which needs splitting may not be in any particular order. This is a single sample of a string which will be read out of a stringCollection (reading these filters from Properties.Settings.Filters).
Using string.Split, this is a two-stage process.
First, split on ; to get an array of keyword and value pairs:
string[] values = filter.Split(';');
Then loop over the resultant list splitting on = to get the keywords and values:
foreach (string value in values)
{
string[] pair = value.Split('=');
string key = pair[0];
string val = pair[1];
}
String filter = "name=Default;pattern=%%;start=Last;end=Now";
string[] temp = filter.Split('=');
string name = temp[1].Split(';')[0];
string pattern = temp[2].Split(';')[0];
string start = temp[3].Split(';')[0];
string end = temp[4].Split(';')[0];
This should do the trick:
string filter = "name=Default;pattern=%%;start=Last;end=Now";
// Make a dictionary.
var lookup = filter
.Split(';')
.Select(keyValuePair => keyValuePair.Split('='))
.ToDictionary(parts => parts[0], parts => parts[1]);
// Get values out of the dictionary.
string name = lookup["name"];
string pattern = lookup["pattern"];
string start = lookup["start"];
string end = lookup["end"];
The start date ends up at the third position in the array:
startDate = filter.Split(new string[] { "start=", ";" }, StringSplitOptions.RemoveEmptyEntries)[2];
Instead of splitting the string once for each value, you might want to split it into the separate key-value pairs, then split each pair:
string[] pairs = filter.Split(';');
string[] values = pairs.Select(pair => pair.Split('=')[1]).ToArray();
string name = values[0];
string pattern = values[1];
string start = values[2];
string end = values[3];
(This code of course assumes that the key-value pairs always come in the same order.)
You could also split the string into an interspersed array, so that every other item is a key or a value:
string[] values = filter.Split(new string[] { "=", ";" }, StringSplitOptions.None);
string name = values[1];
string pattern = values[3];
string start = values[5];
string end = values[7];
Edit:
To handle key-values in any order, make a lookup from the string, and pick values from it:
ILookup<string, string> values =
filter.Split(';')
.Select(s => s.Split('='))
.ToLookup(p => p[0], p => p[1]);
string name = values["name"].Single();
string pattern = values["pattern"].Single();
string start = values["start"].Single();
string end = values["end"].Single();
You can use SingleOrDefault if you want to support values being missing from the string:
string name = values["name"].SingleOrDefault() ?? "DefaultName";
The lookup also supports duplicate key-value pairs. If there might be duplicates, just loop through the values:
foreach (string name in values["name"]) {
// do something with the name
}
Well I tried something like this:
var result = "name=Default;pattern=%%;start=Last;end=Now".Split(new char[]{'=',';'});
for(int i=0;i<result.Length; i++)
{
if(i%2 == 0) continue;
Console.WriteLine(result[i]);
}
and the output is:
Default
%%
Last
Now
Is this what you want?
The thing is that your second Split on filter still starts from the beginning of the string, and it still matches against ;. Since the string hasn't changed, you retrieve the earlier matches as well (so your index accessor is off by X).
You could break this down into its problem parts, such that:
var keyValues = filter.Split(';');
var name = keyValues[0].Split('=')[1];
var pattern = keyValues[1].Split('=')[1];
var start = keyValues[2].Split('=')[1];
var end = keyValues[3].Split('=')[1];
Note that the above code is prone to error (it throws if a pair is missing or has no =, and it assumes a fixed order), so it should be adapted accordingly.
You can use the following:
String filter = "name=Default;pattern=%%;start=Last;end=Now";
string[] parts = filter.Split(';');
string Name = parts[0].Substring(parts[0].IndexOf('=') + 1);
string Pattern = parts[1].Substring(parts[1].IndexOf('=') + 1);
string start = parts[2].Substring(parts[2].IndexOf('=') + 1);
string end = parts[3].Substring(parts[3].IndexOf('=') + 1);
Use this:
String filter = "name=Default;pattern=%%;start=Last;end=Now";
var parts = filter.Split(';').Select(x => x.Split('='))
.Where(x => x.Length == 2)
.Select(x => new {key = x[0], value=x[1]});
string name = "";
string pattern = "";
string start = "";
string end = "";
foreach(var part in parts)
{
switch(part.key)
{
case "name":
name = part.value;
break;
case "pattern":
pattern = part.value;
break;
case "start":
start = part.value;
break;
case "end":
end = part.value;
break;
}
}
If you don't need the values in named variables, you only need the second line. It returns an enumerable with key/value pairs.
My solution has the added benefits that the order of those key/value pairs in the string is irrelevant and it silently ignores invalid parts instead of crashing.
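The same two properties (order does not matter, invalid parts are skipped) can also be had with a dictionary, which avoids the switch. This is a sketch under a few assumptions: the reordered sample string with a deliberately invalid "garbage" part is made up for illustration, keys are compared case-insensitively, and a repeated key keeps its last value.

```csharp
using System;
using System.Collections.Generic;

// Sample input (assumed): same pairs as the question, reordered, plus one invalid part.
string filter = "start=Last;name=Default;end=Now;pattern=%%;garbage";

var lookup = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
foreach (string part in filter.Split(';'))
{
    // Split on the first '=' only, so a value may itself contain '='.
    string[] pair = part.Split(new[] { '=' }, 2);
    if (pair.Length == 2)
    {
        lookup[pair[0].Trim()] = pair[1];   // a repeated key keeps its last value
    }
}

// TryGetValue avoids an exception when a key is missing from the string.
string name = lookup.TryGetValue("name", out var foundName) ? foundName : "DefaultName";
string pattern = lookup.TryGetValue("pattern", out var foundPattern) ? foundPattern : "";

Console.WriteLine($"{name} {pattern}");   // Default %%
```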
I found a simple solution on my own too. Most of your answers would have worked if the list were in the same order every single time, but it won't be. The format, however, will always stay the same. The solution is a simple iteration using a foreach loop, checking whether each part starts with a certain word, namely the word I am looking for, like Name, Pattern etc.
Probably not the most CPU-efficient way of doing it, but it is C# for dummies level. Really brain-fade level.
Here is my beauty.
foreach (string subfilter in filter.Split(';')) //filter.Split is a string [] which can be iterated through
{
if (subfilter.ToUpper().StartsWith("PATTERN"))
{
pattern = subfilter.Split('=')[1];
}
if (subfilter.ToUpper().StartsWith("START"))
{
startDate = subfilter.Split('=')[1];
}
if (subfilter.ToUpper().StartsWith("END"))
{
endDate = subfilter.Split('=')[1];
}
}
