How to extract a URL from a string in C#


I have this string :
"<figure><img
src='http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg'
href='JavaScript:void(0);' onclick='return takeImg(this)'
tabindex='1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>"
How can I retrieve this link :
http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg
All the strings have the same format, so somehow I need to get the substring between src= and href, but I don't know how to do that. Thanks.

If you are parsing HTML, do not use string methods; use a real HTML parser such as HtmlAgilityPack:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // html is your string
var linksAndImages = doc.DocumentNode.SelectNodes("//a/@href | //img/@src");
var allSrcList = linksAndImages
.Select(node => node.GetAttributeValue("src", "[src not found]"))
.ToList();

You can use regex:
var src = Regex.Match("the string", "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;

In general, you should use an HTML/XML parser when parsing a value from HTML code, but with a limited string like this, Regex would be fine.
string url = Regex.Match(htmlString, @"src='(.*?)'").Groups[1].Value;
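The regex answers above can be wrapped in a small helper that accepts either quote style and reports a missing attribute instead of silently returning an empty string. This is only a sketch: SrcExtractor and ExtractSrc are made-up names, and it assumes the attribute value does not itself contain the quote character that delimits it.

```csharp
using System.Text.RegularExpressions;

public static class SrcExtractor
{
    // Matches src='...' or src="..."; the closing quote must be the same
    // character as the opening one (backreference to the "q" group).
    private static readonly Regex SrcPattern = new Regex(
        @"\bsrc\s*=\s*(?<q>['""])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    // Returns the first src value found, or null when there is none.
    public static string ExtractSrc(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;
        Match m = SrcPattern.Match(html);
        return m.Success ? m.Groups["url"].Value : null;
    }
}
```

Calling SrcExtractor.ExtractSrc(html) on the string from the question should give back the image.ashx link; for anything beyond one known tag, the HTML parser approach is still the safer choice.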

If your string is always in same format, you can easily do this like so :
string input = "<figure><img src='http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg' href='JavaScript:void(0);' onclick='return takeImg(this)' tabindex='1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
// the link sits between the first pair of ' characters, so you can do:
int start = input.IndexOf('\'') + 1;
int end = input.IndexOf('\'', start);
input = input.Substring(start, end - start);
// now your string looks like : "http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg"
return input;

string str = "<figure><imgsrc = 'http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg'href = 'JavaScript:void(0);' onclick = 'return takeImg(this)'tabindex = '1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
int pFrom = str.IndexOf("src = '") + "src = '".Length;
int pTo = str.LastIndexOf("'href");
string url = str.Substring(pFrom, pTo - pFrom);
Source :
Get string between two strings in a string

Here q is your string. I look for the index of the attribute you want (src = '), remove the first few characters (7, including the spaces), and then find where the text ends by looking for the next '.
Rather than hard-coding how many characters to remove, you could work it out, for example with .IndexOf or the length of the search string.
string q =
"<figure><img src = 'http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg' href = 'JavaScript:void(0);' onclick = 'return takeImg(this)'" +
"tabindex = '1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
string z = q.Substring(q.IndexOf("src = '"));
z = z.Substring(7);
z = z.Substring(0, z.IndexOf("'"));
MessageBox.Show(z);
This is certainly not the most elegant way (look at the other answers for that :)).

Related

Parse a graph url string C#

I have the following string
https://graph.microsoft.com/v1.0/groups/group-id/members/user-id/$ref
How do I parse the url string and get the value user-id?
var a = requestStep; //requestStep is of type Microsoft.Graph.BatchRequestStep
var b = requestStep.Request.RequestUri;
b has the value:
https://graph.microsoft.com/v1.0/groups/group-id/members/user-id/$ref

There are a few ways to skin that cat ...
string url = "https://graph.microsoft.com/v1.0/groups/group-id/members/user-id/$ref";
string[] parts = url.Split('/');
string userId = parts[parts.Length - 2]; // the last part is "$ref", so the user id is second to last
Console.WriteLine(userId);

You can also use the following regex pattern :
^https:\/\/graph\.microsoft\.com\/.+\/members\/(.+)\/\$ref$
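Another option is to let System.Uri split the path and pick the segment that follows "members", so the result does not depend on how many segments come before or after it. A rough sketch, where GraphUrlParser and GetMemberId are hypothetical names and the only assumption is that the id always directly follows a "members" segment:

```csharp
using System;

public static class GraphUrlParser
{
    // Returns the path segment right after "members", or null if there is none.
    public static string GetMemberId(string url)
    {
        // Segments keeps the trailing '/' on each part, e.g. "members/".
        string[] segments = new Uri(url).Segments;
        for (int i = 0; i < segments.Length - 1; i++)
        {
            if (string.Equals(segments[i].TrimEnd('/'), "members",
                              StringComparison.OrdinalIgnoreCase))
            {
                return Uri.UnescapeDataString(segments[i + 1].TrimEnd('/'));
            }
        }
        return null;
    }
}
```

With the URL from the question, GraphUrlParser.GetMemberId(b.ToString()) should return user-id.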

How to get substrings from an Xpath using C#?

I have an XPath property inside a JSON file, and I'd like to get two substrings from this XPath and assign them to two variables.
The JSON object is as follows:
{
'selectGateway':'0',
'waitingTime':'20000',
'status':'200',
'correlationID':'1',
'matchString':[{'xpath':'/whitelist/locations/location/var-fields/var-field[@key="whitelist-entry" and @value="8.0440147AA44A80"]','value':''}],
'matchInteger':[],
'matchSortedList':[]
}
This is my attempt so far. It works properly; I'm just looking for a more dynamic and cleaner way to do this, if possible.
int firstStringPositionForKey = matchString[index].xpath.IndexOf("@key=\"");
int secondStringPositionForKey = matchString[index].xpath.IndexOf("\" and");
string betweenStringForKey = matchString[index].xpath.Substring(firstStringPositionForKey+6, secondStringPositionForKey-firstStringPositionForKey-6);
int firstStringPositionForValue = matchString[index].xpath.IndexOf("@value=\"");
int secondStringPositionForValue = matchString[index].xpath.IndexOf("\"]");
string betweenStringForValue = matchString[index].xpath.Substring(firstStringPositionForValue+8, secondStringPositionForValue-firstStringPositionForValue-8);
I expect the output to be like:
key is : whitelist-entry
value is : 8.0440147AA44A80

I believe you are getting value of xPath in matchString[index].xpath, so here is the solution
//Test is nothing but your xPath
string test = "/whitelist/locations/location/var-fields/var-field[@key=\"whitelist-entry\" and @value=\"8.0440147AA44A80\"]";
//Split your string by '/' and get last element from it.
string lastElement = test.Split('/').LastOrDefault();
//Use regex to get text present in "<text>"
var matches = new Regex("\".*?\"").Matches(lastElement);
//Remove double quotes
var key = matches[0].ToString().Trim('\"');
var @value = matches[1].ToString().Trim('\"');
//Print key and value
Console.WriteLine($"Key is: {key}");
Console.WriteLine($"Value is: {@value}");
Output:
Key is: whitelist-entry
Value is: 8.0440147AA44A80

Using Regex:
var obj = JObject.Parse("your_json");
var xpath = ((JArray)obj["matchString"])[0]["xpath"].Value<string>();
string pattern = "(?<=key=\")(.*?)(?=\").*(?<=value=\")(.*?)(?=\")";
var match = new Regex(pattern).Match(xpath);
string key = match.Groups[1].Value;
string value = match.Groups[2].Value;
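If "more dynamically" means not hard-coding the names key and value, one possible approach is to collect every @name="value" pair in the XPath into a dictionary. This is a sketch only: XPathPredicates and Parse are invented names, and it assumes standard XPath attribute syntax with double-quoted values that contain no embedded double quotes.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class XPathPredicates
{
    // Collects every @name="value" pair found in the XPath text.
    public static Dictionary<string, string> Parse(string xpath)
    {
        var result = new Dictionary<string, string>();
        var pattern = @"@(?<name>[\w-]+)\s*=\s*""(?<value>[^""]*)""";
        foreach (Match m in Regex.Matches(xpath, pattern))
        {
            // A later pair with the same name overwrites an earlier one.
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

For the XPath in the question, Parse(xpath)["key"] and Parse(xpath)["value"] should hold the two strings you are after, and any extra attributes added later would show up without code changes.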

Converting a string into SendKeys.Send() formatted string

Is there a smart and neat way to escape characters in a string so that it is compatible with the specific format that SendKeys uses?
At first I thought this would work:
line = Regex.Replace(line, @"\{{0}", "{{}");
line = Regex.Replace(line, @"\}{0}", "{}}");
But this won't work, because it does two passes and messes up the syntax entirely.
How can I handle this?

You can use placeholders instead of { and } and build the formatted result with them. Then, at the end, replace the placeholders with { and }. For example:
string PrepareForSendKeys(string input)
{
var specialChars = "+^%~(){}";
var c1 = "[BRACEOPEN]";
var c2 = "[BRACECLOSE]";
specialChars.ToList().ForEach(x =>
{
input = input.Replace(x.ToString(),
string.Format("{0}{1}{2}", c1, x.ToString(), c2));
});
input = input.Replace(c1, "{");
input = input.Replace(c2, "}");
return input;
}
And you can use it this way:
var input = "some string containing + ^ % ~ ( ) { }";
MessageBox.Show(PrepareForSendKeys(input));
And the result would be:
some string containing {+} {^} {%} {~} {(} {)} {{} {}}
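The two-pass problem from the question can also be avoided without placeholders by doing all the wrapping in a single Regex.Replace call, since text that has already been replaced is never scanned again. A minimal sketch, using the same set of special characters as the answer above (SendKeysEscaper and Escape are made-up names):

```csharp
using System.Text.RegularExpressions;

public static class SendKeysEscaper
{
    // Wraps each special character in braces in one pass,
    // so braces added by the replacement are not escaped a second time.
    public static string Escape(string input)
    {
        // $0 in the replacement string stands for the matched character.
        return Regex.Replace(input, @"[+^%~(){}]", "{$0}");
    }
}
```

SendKeysEscaper.Escape("some string containing + ^ % ~ ( ) { }") should produce the same result as shown above.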

How to get the center part of string on C#

I need only the middle part of the string, Rocky44, using C#:
Hi <span>Rocky44</span>
I tried a Split method, but it doesn't work:
string[] result = temp.Split(new string[] { "<span>" , "</span>" }, StringSplitOptions.RemoveEmptyEntries);
Example:
Hi <span>Rocky44</span>
To:
Rocky44

Use an html parser. I will give an example using HtmlAgilityPack
string html = @"Hi <span>Rocky44</span>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var text = doc.DocumentNode.SelectSingleNode("//span").InnerText;

You're on the right track; you're just not escaping your quotes correctly:
string[] result = temp.Split(new string[] { "<span>" , "</span>" }, StringSplitOptions.RemoveEmptyEntries);
Of course, this is assuming that your input will always be in exactly the given format. As I4V mentions, an HTML parser may come in handy if you're trying to do anything more complicated.

If you're only going to get this sort of thing (eg this sort of HTML) then I would use regex. Else, DO NOT USE IT.
string HTML = @"Hi <span>Rocky44</span>";
var result = Regex.Match(HTML, @"<span[^>]*>(.*?)</span>").Groups[1].Value;

Find the index of <span> and </span> using the IndexOf method.
Then (adjusting for the length of <span>) use the String.Substring method to get the desired text.
string FindLinkText(string linkHtml)
{
int startIndex = linkHtml.IndexOf("<span>") + "<span>".Length,
length = linkHtml.IndexOf("</span>") - startIndex;
return linkHtml.Substring(startIndex, length);
}

how to use regex class for string manipulations

I am working on string manipulations using regex.
Source: string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
output required:
Foldername: folder1
content name: content
folderpath:/webdav/MyPublication/Building%20Blocks/folder0/folder1/
I am new to this; can anyone tell me how this can be done using regex?
Thank you.

The rules you need seem to be the following:
Folder name = last string preceding a '/' character but not containing a '/' character
content name = last string not containing a '/' character until (but not including) a '_' or '.' character
folderpath = same as folder name except it can contain a '/' character
Assuming the rules above - you probably want this code:
string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
var foldernameMatch = Regex.Match(value, @"([^/]+)/[^/]+$");
var contentnameMatch = Regex.Match(value, @"([^/_\.]+)[_\.][^/]*$");
var folderpathMatch = Regex.Match(value, @"(.*/)[^/]*$");
if (foldernameMatch.Success && contentnameMatch.Success && folderpathMatch.Success)
{
var foldername = foldernameMatch.Groups[1].Value;
var contentname = contentnameMatch.Groups[1].Value;
var folderpath = folderpathMatch.Groups[1].Value;
}
else
{
// handle bad input
}
Note that you can also combine these to become one large regex, although it can be more cumbersome to follow (if it weren't already):
var matches = Regex.Match(value, @"(.*/)([^/]+)/([^/_\.]+)[_\.][^/]*$");
if (matches.Success)
{
var foldername = matches.Groups[2].Value;
var contentname = matches.Groups[3].Value;
var folderpath = matches.Groups[1].Value + foldername + "/";
}
else
{
// handle bad input
}

You could use named captures, but you're probably better off (from a security and implementation aspect) just using the Uri class.
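To make the Uri suggestion concrete, here is one way it might look. Treat it as a sketch under assumptions: the path in the question is relative, so it is resolved against a placeholder base address (http://localhost/ is arbitrary), WebDavPath and Split are invented names, and "content name" is taken to mean the file name up to the first '_' or '.'.

```csharp
using System;
using System.Linq;

public static class WebDavPath
{
    public static (string FolderName, string ContentName, string FolderPath) Split(string path)
    {
        // The base address is a placeholder; only the path segments are used.
        var uri = new Uri(new Uri("http://localhost/"), path);
        string[] segments = uri.Segments; // "/", "webdav/", ..., "content_1.xml"
        if (segments.Length < 2)
            throw new ArgumentException("Path has no file segment.", nameof(path));

        string file = segments[segments.Length - 1];
        string folderName = segments[segments.Length - 2].TrimEnd('/');
        // Everything except the file name, trailing '/' included.
        string folderPath = string.Concat(segments.Take(segments.Length - 1));

        int cut = file.IndexOfAny(new[] { '_', '.' });
        string contentName = cut >= 0 ? file.Substring(0, cut) : file;
        return (folderName, contentName, folderPath);
    }
}
```

Calling WebDavPath.Split(value) on the path from the question should give folder1, content and the folder path ending in folder1/.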

I agree with Jeff Moser on this one, but to answer the original question, I believe the following regular expression would work:
^(\/.+\/)(.+?)\/(.+?)\.
edit: Added example.
var value = "/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
var regex = Regex.Match(value, @"^(\/.+\/)(.+?)\/(.+?)\.");
// check if success
if (regex.Success)
{
// assign the values from the regular expression
var folderName = regex.Groups[2].Value;
var contentName = regex.Groups[3].Value;
var folderPath = regex.Groups[1].Value;
}
