How should I get the absolute URL in CsQuery? - c#

I'm trying to get the absolute URI of each anchor tag on a Wikipedia page. I think the .href property should give the absolute URI but when I'm trying it in CsQuery I'm finding that it still gives me the relative URI. How should I get the absolute URI?
static void Main(string[] args)
{
string url = "https://en.wikipedia.org/wiki/Barack_Obama";
var dom = CQ.CreateFromUrl(url);
var selected = dom["div#mw-content-text a"];
foreach (var a in selected)
Console.WriteLine(a["href"]);
}

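CsQuery returns the attribute value exactly as it appears in the HTML source, so relative links stay relative.
You can prefix the domain yourself:
string url = "https://en.wikipedia.org/wiki/Barack_Obama";
string domain = "https://en.wikipedia.org";
var dom = CQ.CreateFromUrl(url);
List<string> urls = new List<string>();
dom["a[href]"].Each(a =>
{
    string href = a.GetAttribute("href");
    // Site-relative links such as "/wiki/..." get the domain prepended;
    // links that already start with "https" are kept as they are.
    if (!href.StartsWith("https"))
        href = domain + href;
    urls.Add(href);
});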
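Prefixing the domain by hand mishandles protocol-relative links (//host/...), fragment-only links (#section) and plain http links. A more general option is to let System.Uri resolve each href against the URL of the page. The following is a minimal sketch rather than part of the original answer: the names LinkResolver and ToAbsolute are hypothetical, and it assumes the href values have already been read from CsQuery as plain strings.

```csharp
using System;

public static class LinkResolver
{
    // Resolves an href (site-relative, protocol-relative, fragment-only or
    // already absolute) against the URL of the page it was found on.
    // Returns null when the two parts cannot be combined into a valid URI.
    public static string ToAbsolute(string pageUrl, string href)
    {
        var baseUri = new Uri(pageUrl);
        Uri resolved;
        return Uri.TryCreate(baseUri, href, out resolved)
            ? resolved.AbsoluteUri
            : null;
    }
}
```

Inside the loop from the question this could be used as LinkResolver.ToAbsolute(url, a["href"]). With the Wikipedia page as the base, "/wiki/Hawaii" becomes "https://en.wikipedia.org/wiki/Hawaii".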

Related

Can I get the current screen name in ASP.NET without hard-coding it?

Is there any way to get the current screen name in ASP.NET without hard-coding it?
string ScreenName = HttpContext.Current.Request.Url.AbsoluteUri;
I tried this and got the full url.
If you want to get the domain name from the url, use the following code:
string[] hostParts = new System.Uri(sURL).Host.Split('.');
string domain = String.Join(".", hostParts.Skip(Math.Max(0, hostParts.Length - 2)).Take(2));
Or:
var host = new System.Uri(sURL).Host;
var domain = host.Substring(host.LastIndexOf('.', host.LastIndexOf('.') - 1) + 1);
where "sURL" is your URL.
I found this code. For me, the path string is the one I need:
string url = HttpContext.Current.Request.Url.AbsoluteUri;
// http://localhost:1302/TESTERS/Default6.aspx
string path = HttpContext.Current.Request.Url.AbsolutePath;
// /TESTERS/Default6.aspx
string host = HttpContext.Current.Request.Url.Host;
// localhost
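If "screen name" means the name of the page file itself, it can be derived from the path shown above. This is only a sketch under that assumption: the names PageName and FromUrl are hypothetical, and the URL is passed in as a string so the example does not depend on HttpContext.

```csharp
using System;
using System.IO;

public static class PageName
{
    // Extracts the page file name without its extension from an absolute URL.
    public static string FromUrl(string absoluteUrl)
    {
        // AbsolutePath drops the scheme, host, port and query string.
        string path = new Uri(absoluteUrl).AbsolutePath;
        return Path.GetFileNameWithoutExtension(path);
    }
}
```

Called with the URL from the comments above, http://localhost:1302/TESTERS/Default6.aspx, this returns "Default6". Inside a page, HttpContext.Current.Request.Url.AbsoluteUri could be passed in as the argument.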

C# Downloading Instagram Profile As HTML

I have been trying to download a public Instagram profile to fetch stats such as followers and bio. I have been doing this in a C# console application, downloading the HTML with HTML Agility Pack.
Code:
string url = @"https://www.instagram.com/" + Console.ReadLine() + @"/?hl=en";
Console.WriteLine();
HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(url);
document.Save(path1);
When I save it, though, all I get is a bunch of scripts and a blank screen.
I was wondering how to save the HTML once all the scripts have run and built the content.
When you retrieve content using a web request, it returns an HTML document, which the browser then renders to display the content.
Right now, you're saving the HTML document given to you by the server. Instead, you need to render it before extracting the details. One way to do this is with a web browser control: set its URL to the Instagram URL, let the rendering engine handle it, and once the control fires its load event, read the rendered HTML output.
From there, you can deserialize as an XmlDocument and identify exactly what details you need to retrieve from the rendered output.
public MainWindow()
{
InitializeComponent();
// Subscribe before navigating so the event cannot be missed
WB_1.LoadCompleted += wb_LoadCompleted;
WB_1.Navigate(@"https://www.instagram.com/" + Console.ReadLine() + @"/?hl=en");
}
void wb_LoadCompleted(object sender, NavigationEventArgs e)
{
dynamic doc = WB_1.Document;
string htmlText = doc.documentElement.InnerHtml;
}
ANSWER
Thanks for the suggestions on how to download the HTML! I managed to return some Instagram information in the end. Here is the code:
//(This was done using HTML Agility Pack)
string url = @"https://www.instagram.com/" + Console.ReadLine() + @"/?hl=en";
HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(url);
var metas = document.DocumentNode.Descendants("meta");
var followers = metas.FirstOrDefault(_ => _.HasProperty("name", "description"));
if (followers == null) { Console.WriteLine("Sorry, Can't Find Profile :("); return; }
var content = followers.Attributes["content"].Value.StopAt('-');
Console.WriteLine(content);
And the HasProperty() and StopAt() extension methods:
public static bool HasProperty(this HtmlNode node, string property, params string[] valueArray)
{
var propertyValue = node.GetAttributeValue(property, "");
var propertyValues = propertyValue.Split(' ');
return valueArray.All(c => propertyValues.Contains(c));
}
public static string StopAt(this string input, char stopAt)
{
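int x = input.IndexOf(stopAt);
// IndexOf returns -1 when the character is absent; return the whole string in that case
return x < 0 ? input : input.Substring(0, x);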
}
NOTE:
However, this is still not the answer I am looking for. The HTML I get is still a mess and is not structured the same as the HTML I receive when I look at the page in Google Chrome. After some searching, I managed to salvage a meta tag from the otherwise content-less HTML that contains the content. This is okay for now, but if I continue using this method of finding HTML content, it may not stay the same :(

How do I get the SPListItem from absolute URL?

I have problems when I try to use SPListItem.
This is the code:
string URL = "http://vstkmy36773/Lists/Permissions/DispForm.aspx?ID=6&ContentTypeId=0x0100F385377F0CAD6C438A23B301CE04E7BF";
using (SPSite cSite = new SPSite(URL))
{
using (SPWeb cWeb = cSite.OpenWeb())
{
// SPFile file = cWeb.GetFile(URL);
// SPListItem item = file.Item;
SPListItem item = cWeb.GetListItem(URL);
int id = item.ID;
item["Title"] = id+ " update and get " + URL;
}
}
And the output:
System.NullReferenceException: Object reference not set to an instance of an object.
at Custom.Workflow.Activities.AddListItemPermissionAssigment.Execute(ActivityExecutionContext executionContext)
That's not the proper URL of the actual list item, from SharePoint's perspective. It's just the URL of some page that happens to display that item, which is different.
You're going to need to parse that URL, extract out the required information from it (namely the list and item ID), and then use that information to find the item:
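// ParseQueryString expects only the query part, so extract it from the full URL first
var queryStrings = HttpUtility.ParseQueryString(new Uri(url).Query);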
var listGuid = Guid.Parse(queryStrings["ListId"]);
var itemId = int.Parse(queryStrings["ID"]);
var item = web.Lists[listGuid].GetItemById(itemId);
If you're curious what the actual item URL is, print out the item.Url property for that item. That's what your URL would need to contain for your code to work.
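One detail worth checking when parsing such a URL: HttpUtility.ParseQueryString expects only the query portion. If it is given the full URL, the first key ends up containing everything before the "?" as well, so looking up "ID" returns null. The sketch below isolates just the parsing step, without any SharePoint types; the names DispFormUrl and GetItemId are hypothetical.

```csharp
using System;
using System.Web;

public static class DispFormUrl
{
    // Reads the "ID" query parameter from a display-form URL.
    // int.Parse throws if the parameter is missing or is not a number.
    public static int GetItemId(string url)
    {
        // Uri.Query returns only the "?ID=6&..." part of the URL.
        var query = HttpUtility.ParseQueryString(new Uri(url).Query);
        return int.Parse(query["ID"]);
    }
}
```

For the URL in the question this yields 6. Note that the question's URL carries ID and ContentTypeId but no ListId parameter, so the list would have to be identified some other way in that case.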

Calling node.NiceUrl gives me # in Umbraco

I'm working on a project in Umbraco and have run into a problem: in one case, calling node.NiceUrl returns # as the result. What is weird, though, is that if I debug it, it somehow resolves to the correct URL.
var pages = Pages.Select((item, index) => new
{
Url = item.NiceUrl,
Selected = item.Id == currentPage.Id,
Index = index
}).ToList();
Where Pages is obtained from:
CurrentPage.Parent.ChildrenAsList
If I do it this way, it works, but I don't know why.
Url = new Node(item.Id).NiceUrl,
I've encountered this error and it was because the id belonged to a media node.
Media is treated differently from other content, and there's no easy way of getting the URL because different types of media store the URL in different ways depending on context. That's why the NiceUrl function doesn't work for media (according to the Umbraco developers).
My specific scenario was using images that had been selected using a media picker. I got the url via the following code. I wrapped it up in an extension method so you can consume it from a template in a convenient way.
public static string GetMediaPropertyUrl(this IPublishedContent thisContent, string alias, UmbracoHelper umbracoHelper = null)
{
string url = "";
if (umbracoHelper == null)
umbracoHelper = new UmbracoHelper(UmbracoContext.Current);
var property = thisContent.GetProperty(alias);
string nodeID = property != null ? property.Value.ToString() : "";
if (!string.IsNullOrWhiteSpace(nodeID))
{
//get the media via the umbraco helper
var media = umbracoHelper.TypedMedia(nodeID);
//if we got the media, return the url property
if (media != null)
url = media.Url;
}
return url;
}
Try it like this:
Url = umbraco.library.NiceUrl(item.Id);

C# I want the url, not the physical pathname

For this line of code:
string link = HttpContext.Current.Server.MapPath("/Contract/Details/" + this.ContractId.ToString());
I get the physical pathname on the C drive.
What I want is the URL, i.e.
http://localhost:1234/Contract/Details/1
How do I get this?
// Use the Uri constructor to form a URL relative to the current page
Uri linkUri = new Uri(HttpContext.Current.Request.Url, "/Contract/Details/" + this.ContractId.ToString());
string link = linkUri.ToString();
Try this:
string url = HttpContext.Current.Request.Url.AbsoluteUri;
There's a great article on .NET paths at http://west-wind.com/weblog/posts/132081.aspx
Take a look at the Url or PathInfo property.
// "base" is a reserved word in C#, so the variable is named baseUri
Uri baseUri = new Uri("http://localhost:1234/");
Uri file = new Uri(baseUri, "/Contract/Details/" + this.ContractId.ToString());
string URL = file.AbsoluteUri;
