I'm querying Wikipedia using the LinqToWiki library for C#.
In particular, I want to retrieve the full image URL that points to the wiki page File:Cinnamomum_verum.jpg.
Using the official MediaWiki API, the request is: http://it.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url&titles=File:Cinnamomum_verum.jpg
As you can see just by entering it in a browser, the XML response contains the imageinfo structure, and in particular the url.
I cannot retrieve this information using LinqToWiki.
I use the following code:
var pages = wiki.CreateTitlesSource("File:Cinnamomum_verum.jpg");
var source = pages
    .Select(
        p =>
        PageResult.Create(
            p.info,
            p.imageinfo()
                .Select(i => new { i.comment })
                .ToEnumerable())
    ).ToEnumerable();
foreach (var item in source)
{
    foreach (var item2 in item.Data)
    {
        // retrieve all urls detected
    }
}
The first foreach statement correctly retrieves one element (the page), but the inner one returns none.
Anybody encountered the same problem? Am I missing anything?
You're not missing anything, I just didn't expect that pages that don't actually exist (on the wiki you're using) could have useful data. I'll try to fix this soon, but as a temporary workaround, you could query http://commons.wikimedia.org directly for images from there.
EDIT: I have updated LinqToWiki, the new version should handle imageinfo correctly.
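For reference, a query along these lines should then surface the URL. This is only a sketch based on the code from the question; the exact shape of the generated imageinfo properties (i.url in particular) is an assumption:
// A sketch against the updated library, based on the question's code;
// the generated property name (url) on imageinfo is an assumption.
var pages = wiki.CreateTitlesSource("File:Cinnamomum_verum.jpg");
var urls = pages
    .Select(p => PageResult.Create(
        p.info,
        p.imageinfo()
            .Select(i => new { i.url })
            .ToEnumerable()))
    .ToEnumerable();
foreach (var page in urls)
    foreach (var info in page.Data)
        Console.WriteLine(info.url);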
I am trying to retrieve all of the items in a list from a SharePoint site. The fields are titled "Review Level Title", "Reviewer IDs", and "Review Level Priority". What I'm trying to do is get the information from all three fields separately, put it into the object I created, and then return the list with all of the objects I have created for each SharePoint item.
I have researched a lot on how to access this information from the SharePoint site, but I cannot get it to work. Here is what I have created so far:
public List<OperationsReviewLevel> Get()
{
    var operationsReviewLevels = new List<OperationsReviewLevel>();
    ClientContext context = new ClientContext(ConfigurationManager.AppSettings["SharePointEngineeringChangeRequest"]);
    var SPList = context.Web.Lists.GetByTitle("Review Levels");
    CamlQuery query = new CamlQuery();
    ListItemCollection entries = SPList.GetItems(query);
    context.Load(entries);
    context.ExecuteQuery();
    foreach (ListItem currentEntry in entries)
    {
        operationsReviewLevels.Add(new OperationsReviewLevel(currentEntry["Review Level Title"].ToString(), currentEntry["Reviewer IDs"].ToString(), (int)currentEntry["Review Level Priority"]));
    }
    return operationsReviewLevels;
}
Whenever I try this code, I receive an error saying:
Microsoft.SharePoint.Client.PropertyOrFieldNotInitializedException: The property or field has not been initialized. It has not been requested or the request has not been executed. It may need to be explicitly requested.
I cannot find any solutions to this error (in my scenario) online and was wondering if anyone could see what I am doing wrong here.
Thanks everyone!
After reading the comment from Alessandra Amosso under my question, I ended up debugging entries. It took a lot of digging in the debugger, but I was able to find what the field names were being retrieved as. Debugging your ListItemCollection, if you go into Data, then any entry there, and then into FieldValues, you can see the name under which each field value is actually stored.
In my case, all spaces were replaced with _x0020_, and the word Priority was cut to just Priorit due to the length of the field name.
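If you'd rather not dig through the debugger, a small sketch like this (reusing the entries collection already loaded above) prints every internal field name with its value:
foreach (ListItem currentEntry in entries)
{
    // FieldValues is a dictionary keyed by the internal field name
    foreach (KeyValuePair<string, object> fieldValue in currentEntry.FieldValues)
    {
        Console.WriteLine("{0} = {1}", fieldValue.Key, fieldValue.Value);
    }
}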
With this, I was able to change my foreach loop to:
foreach (ListItem currentEntry in entries)
{
    operationsReviewLevels.Add(new OperationsReviewLevel(currentEntry["Review_x0020_Level_x0020_Title"].ToString(), currentEntry["Reviewer_x0020_IDs"].ToString(), Convert.ToInt32(currentEntry["Review_x0020_Level_x0020_Priorit"].ToString())));
}
And it now works properly.
Hope this helps anyone in the future!
I guess you're using SharePoint Online. When creating fields, SharePoint Online strips special characters from the field's static name; for example, Review Level Title becomes ReviewLevelTitle.
Here is my test code.
foreach (ListItem currentEntry in entries)
{
    Console.WriteLine(currentEntry["ReviewLevelTitle"].ToString() + '-' + currentEntry["ReviewerIDs"].ToString() + '-' + currentEntry["ReviewLevelPriority"]);
    //operationsReviewLevels.Add(new OperationsReviewLevel(currentEntry["Review Level Title"].ToString(), currentEntry["Reviewer IDs"].ToString(), (int)currentEntry["Review Level Priority"]));
}
If you're not using SharePoint Online, make sure the field names match as well.
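To verify, a small sketch like this (reusing the context and SPList objects from the question) lists each field's display title alongside its internal and static names:
var fields = SPList.Fields;
context.Load(fields);
context.ExecuteQuery();
foreach (Field field in fields)
{
    // Title is the display name; InternalName/StaticName are the keys the item indexer expects
    Console.WriteLine("{0} -> InternalName: {1}, StaticName: {2}", field.Title, field.InternalName, field.StaticName);
}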
In a SharePoint Online site via my Office 365 account, I've added a column - "CustomerId" - to my documents. I want to find all documents with a CustomerId of 102 in C# (not in JavaScript).
So far, I'm able to get all files under a folder:
var files = graphClient.Sites[siteId].Drive.Items[driveItemId]
.Children.Request().GetAsync().Result;
Or see the same result in Graph Explorer
https://graph.microsoft.com/v1.0/sites/{siteId}/drive/items/{driveItemId}/children
but I haven't figured out the correct syntax to get all documents (driveItems) using the custom column as a filter condition, in C# or in Graph Explorer. Examples of things I've tried:
In C#
var files = graphClient.Sites[siteId].Drive.Items[driveItemId]
.Search("fields/CustomerId eq 102").Request().GetAsync().Result;
In Graph Explorer
https://graph.microsoft.com/v1.0/sites/{siteId}/drive/items/{driveItemId}/search(q='CustomerId eq 102')
Hope someone can help me out on this.
Update:
Previously I got the driveItemId from
var customerFolder = graphClient.Sites[siteId].Drive.Root
.ItemWithPath("CustomerGroup/IndustryGroup").Request().GetAsync().Result;
string driveItemId = customerFolder.Id;
I see I can get a ListItem
var customerFolder = graphClient.Sites[siteId].Drive.Root
.ItemWithPath("CustomerGroup/IndustryGroup").Request()
.Expand(d => d.ListItem).GetAsync().Result;
but I only found a list ID of "4" from customerFolder.ListItem.Id
How shall I get a list ID so that I can use it in graphClient.Sites[siteId].Lists[listId]?
I would suggest utilizing the following query:
https://graph.microsoft.com/v1.0/sites/{site-id}/lists/{list-id}/items?filter=fields/CustomerId eq 123&expand=driveItem
Explanation:
filter items in a list via filter query option
return associated drive items for a list item via expand query option
Here is an example for msgraph-sdk-dotnet:
var request = await graphClient.Sites[siteId].Lists[listId].Items
    .Request()
    .Filter("fields/CustomerId eq 123")
    .Expand(item => item.DriveItem)
    .GetAsync();
foreach (var item in request)
{
    Console.WriteLine(item.DriveItem.WebUrl);
}
Update
The underlying document library list (along with its properties) for a drive could be retrieved like this:
var list = await graphClient.Sites[siteId].Drive.List.Request().GetAsync();
Console.WriteLine(list.Id); //gives SharePoint List Id
Note: the https://graph.microsoft.com/beta/sites/{site-id}/drive endpoint returns the default drive (document library) for this site.
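Putting the two pieces together, a sketch like this (same graphClient and siteId as above, CustomerId value from the question) resolves the library's underlying list id and then applies the filter. Note that filtering on a non-indexed column may additionally require sending the Prefer: HonorNonIndexedQueriesWarningMayFailRandomly header:
// Resolve the document library's underlying list, then filter list items by the custom column.
var list = await graphClient.Sites[siteId].Drive.List.Request().GetAsync();
var items = await graphClient.Sites[siteId].Lists[list.Id].Items
    .Request()
    .Filter("fields/CustomerId eq 102")
    .Expand(item => item.DriveItem)
    .GetAsync();
foreach (var item in items)
{
    Console.WriteLine(item.DriveItem.WebUrl);
}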
Reference
Working with SharePoint sites in Microsoft Graph
I am trying to set up a very basic search index to index all items in a specific folder. I haven't really done much with search, but I'm trying to use out-of-the-box features because it's a very simple search. I just want to index all the fields. The Sitecore documentation really doesn't provide much information - I've read a few blogs, and they all seem to suggest that I need the Advanced Database Crawler (http://trac.sitecore.net/AdvancedDatabaseCrawler) - basically, something to the effect of 'it won't work without a custom crawler'.
Is this right? I just want to create a simple index and then start using it. What is the simplest way to do this, without any shared modules or otherwise? I went through the documentation on Sitecore, but it's not very clear (at least to me). It defines the different elements of the index configuration in web.config, but doesn't really explain what they do and what values are available. Maybe I'm not looking in the right place.
A simple way of creating a new Lucene index in Sitecore covering all the items below a specific node, in just 3 steps:
1: Add the configuration below under configuration/sitecore/search/configuration/indexes in the Sitecore configuration:
<!-- id must be unique -->
<index id="my-custom-index" type="Sitecore.Search.Index, Sitecore.Kernel">
  <!-- name - not sure if necessary, but use the id and forget about it -->
  <param desc="name">$(id)</param>
  <!-- folder - name of the directory on the hard drive -->
  <param desc="folder">__my-custom-index</param>
  <!-- analyzer - reference to the analyzer defined in Sitecore.config -->
  <Analyzer ref="search/analyzer" />
  <!-- list of locations to index - each of them with a unique xml tag -->
  <locations hint="list:AddCrawler">
    <!-- first location (and the only one in this case) - the specific folder from your question -->
    <!-- the type attribute is the crawler type - use the default one in this scenario -->
    <specificfolder type="Sitecore.Search.Crawlers.DatabaseCrawler,Sitecore.Kernel">
      <!-- indexing items from the master database -->
      <Database>master</Database>
      <!-- your folder path -->
      <Root>/sitecore/content/home/my/specific/folder</Root>
    </specificfolder>
  </locations>
</index>
2: Rebuild the new index (only once; all further changes will be detected automatically):
SearchManager.GetIndex("my-custom-index").Rebuild();
3: Use the new index:
// use the id from the index configuration
using (IndexSearchContext indexSearchContext = SearchManager.GetIndex("my-custom-index").CreateSearchContext())
{
    // MatchAllDocsQuery will return everything. Use a proper query from the link below
    SearchHits hits = indexSearchContext.Search(new MatchAllDocsQuery(), int.MaxValue);
    // Get Sitecore items from the results of the query
    List<Item> items = hits.FetchResults(0, int.MaxValue).Select(result => result.GetObject<Item>()).Where(item => item != null).ToList();
}
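Once the index is rebuilt, a quick sanity check is to swap in a real query; here is a minimal sketch reusing the same search context pattern and the FullTextQuery type used further down this page (the search term is just an example):
using (IndexSearchContext context = SearchManager.GetIndex("my-custom-index").CreateSearchContext())
{
    // Full-text query across the indexed content
    var hits = context.Search(new FullTextQuery("home"));
    Console.WriteLine("Matches: {0}", hits.Length);
}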
Here is a pdf describing Sitecore Search and Indexing.
And here is a blog post about Troubleshooting Sitecore Lucene search and indexing.
Here is a Lucene query syntax tutorial, and Introducing Lucene.Net.
Sitecore SearchContrib (the new name for Advanced Database Crawler) is the best option; you just edit its config file in the app config folder to tell it the start path, database, etc.
You can then use its API to search within folders, by template type, or where a certain field has a certain value. Here is a code example.
MultiFieldSearchParam parameters = new MultiFieldSearchParam();
parameters.Database = "web";
parameters.InnerCondition = QueryOccurance.Should;
parameters.FullTextQuery = searchTerm;
parameters.TemplateIds = templateIds; // pipe-separated template IDs, e.g. "{guid-1}|{guid-2}"
var refinements = Filters.Select(item => new MultiFieldSearchParam.Refinement(item.Value, item.Key.ToString())).ToList();
parameters.Refinements = refinements;

// The actual search
var returnItems = new List<Item>();
var runner = new QueryRunner(IndexName);
var skinnyItems = runner.GetItems(new[] { parameters });
skinnyItems.ForEach(x => returnItems.Add(Database.GetItem(new ItemUri(x.ItemID))));
return returnItems;
Otherwise you can just configure the web.config for standard Lucene search and use this code to search (database to use, e.g. "web", start item, etc.).
public Item[] Search(string searchterms)
{
    var children = new List<Item>();
    var searchIndx = SearchManager.GetIndex(IndexName);
    using (var searchContext = searchIndx.CreateSearchContext())
    {
        var ftQuery = new FullTextQuery(searchterms);
        var hits = searchContext.Search(ftQuery);
        var results = hits.FetchResults(0, hits.Length);
        foreach (SearchResult result in results)
        {
            if (result.GetObject<Item>() != null)
            {
                // Regular Sitecore item returned
                var resultItem = result.GetObject<Item>();
                if (ParentItem == null)
                {
                    children.Add(resultItem);
                }
                else if (resultItem.Publishing.IsPublishable(DateTime.Now, false) &&
                         ItemUtilities.IsDecendantOfItem(ParentItem, resultItem))
                {
                    children.Add(resultItem);
                }
            }
        }
    }
    return children.ToArray();
}
Brian Pedersen has a nice post on it. You would start with a simple crawler. You need to download the Advanced Database Crawler and add a reference to it in your project after building it.
Then you have to create the config files mentioned in Brian's blog, which you can copy as-is (except for the template IDs and the like). You basically get the point there.
Then you can download the Lucene Index Viewer extension for Sitecore to view the index, or you can download the Lucene Tool to view the indexes. See if you can populate the documents (files in your indexes). These are called 'Documents' in Lucene, and technically these documents are the content items present under the node that you specified.
Hope this helps!
Let me google that for you.
I want to get book information such as author name, page count, publish year, etc.
from Amazon using HtmlAgilityPack, but it seems Amazon's web pages have some problems and I can't access the appropriate fields.
Here is what I've done:
I use Firefox with Firebug + FirePath to retrieve the desired XPath, and then inside my code I summon HtmlAgilityPack and instruct it to get information using the XPath I acquired from Firebug,
but no luck; so far I couldn't access the "Product Details" part of amazon.com.
And this is my XPath (which is working only with HtmlAgilityPack):
HtmlAgilityPack.HtmlNodeCollection cnt = doc.DocumentNode.SelectNodes("//*[@class='content']");
int i = 1;
foreach (HtmlAgilityPack.HtmlNode content in cnt)
{
    if (i != 3)
    {
        i++;
        continue;
    }
    if (i == 3) // i==3 means I've reached the product details but I can't go any further :(
    {
        s = content.SelectSingleNode("").OuterHtml;
        // break;
    }
}
How can I access the Product Details using an appropriate, understandable XPath for HtmlAgilityPack?
And why is the syntax of the Firebug + FirePath XPath different from HtmlAgilityPack's?
As @Mystere said, I suggest using the API. But if you are doing this for test purposes, or just because you want to use web scraping to obtain the info (I'm not sure whether Amazon allows it or not; you should check that before doing this), here is the thing:
Why are you doing this?
s = content.SelectSingleNode("").OuterHtml;
The following is what you are looking for in case you want to get the HTML source of that part of the page.
s = content.OuterHtml;
When you are scraping, I suggest you try to identify the part you need to scrape and look at the particularities of that block of content.
If you use:
var node = doc.DocumentNode.SelectSingleNode("//td[@class='bucket']/div[@class='content']");
that will give you the Product Details block you are looking for.
If you want to get some fields like Paperback, Publisher, ... you can do:
string paperback = node.SelectSingleNode("./ul/li[1]/text()").InnerText;
string publisher = node.SelectSingleNode("./ul/li[2]/text()").InnerText;
string language = node.SelectSingleNode("./ul/li[3]/text()").InnerText;
...
If you want to be sure that the XPath you are using will be correct for HtmlAgilityPack, open the page in Internet Explorer 8 (or 9) and use the Developer Tools (F12) to get the XPath. The thing is that each browser renders the HTML in its own particular way. For example, in Firefox you will always see <tbody> tags right after a <table>, but HtmlAgilityPack may not, and that simple detail of adding /tbody/ to your XPath can make your program fail.
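For instance, one hedged way to sidestep the <tbody> discrepancy is to use the descendant axis (//) instead of a fixed child path, so the XPath matches with or without the extra element (the table id here is hypothetical):
// Matches rows whether or not a <tbody> sits between <table> and <tr>.
var rows = doc.DocumentNode.SelectNodes("//table[@id='productDetailsTable']//tr");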
Why don't you just use Amazon's web service API, which is designed to do this?
I've been given the task of content migration from another CMS system to SharePoint 2010.
The data in the old system is fairly easy to capture and the page hierarchy is simple so I'm not worried about that.
However, I am completely flummoxed about how to even create a page in code. I'm using the Microsoft.SharePoint.Client namespace, as I do not have SharePoint installed on my system and want to code this up as a console application, so I'm using ClientContext. (On the other hand, I am willing to go with other solutions if necessary.)
My end-game: To get a page uploaded into some folder hierarchy which uses a master page, has the page title in a header web part, and a big ol' content-editable web part in the body so any user can come along and edit the content.
Things I've tried so far:
Using FileCollection.Add() to add an aspx file to the folder "Site Pages". This renders the HTML in the browser but doesn't enable any features for the user to edit the page.
Using ListItemCollection.Add() to add a page to the site, but I didn't know what fields I needed. Also, I remember it came up with a runtime error saying I should use FileCollection.Add().
Uploading to 'Site Pages' instead of 'Pages'
So many others... ow my head :(
The only plausible thing I can see on the net is to use the PublishingPage type along with PublishingWeb. However, a PublishingWeb can only be constructed from an SPWeb object, which requires me to actually be hosting the SharePoint application on my workstation.
If anyone can lend a hand that would be greatly appreciated :)
Here is a method I use to create pages. It seems a more supported way of creating pages than Mr. Aquino's. Though this is for MOSS 2007, I'm sure the equivalent exists in 2010. Also, I'd recommend creating console apps using the full object model. You'll have to run them on the server itself, but that doesn't seem like much of a problem for a migration? This way you won't be limited in any way.
public static void CreatePage(string url, string pageName, string title, string layoutName, Dictionary<string, string> fieldDataCollection)
{
    var relUrl = new Uri(url);
    using (SPSite site = new SPSite(url))
    using (SPWeb web = site.AllWebs[relUrl.AbsolutePath])
    {
        if (!PublishingWeb.IsPublishingWeb(web))
            throw new ArgumentException("The specified web is not a publishing web.");
        PublishingWeb pubweb = PublishingWeb.GetPublishingWeb(web);
        PageLayout layout = null;
        string availableLayouts = string.Empty;
        foreach (PageLayout lo in pubweb.GetAvailablePageLayouts())
        {
            availableLayouts += "\t" + lo.Name + "\r\n";
            if (lo.Name.ToLowerInvariant() == layoutName.ToLowerInvariant())
            { layout = lo; break; }
        }
        if (layout == null)
            throw new ArgumentException("The layout specified could not be found. Available layouts are:\r\n" + availableLayouts);
        if (!pageName.ToLowerInvariant().EndsWith(".aspx")) pageName += ".aspx";
        PublishingPage page = pubweb.GetPublishingPages().Add(pageName, layout);
        page.Title = title;
        SPListItem item = page.ListItem;
        foreach (string fieldName in fieldDataCollection.Keys)
        {
            string fieldData = fieldDataCollection[fieldName];
            try
            {
                SPField field = item.Fields.GetFieldByInternalName(fieldName);
                if (field.ReadOnlyField)
                {
                    Console.WriteLine("Field '{0}' is read only and will not be updated.", field.InternalName);
                    continue;
                }
                if (field.Type == SPFieldType.Computed)
                {
                    Console.WriteLine("Field '{0}' is a computed column and will not be updated.", field.InternalName);
                    continue;
                }
                if (field.Type == SPFieldType.URL)
                {
                    item[field.Id] = new SPFieldUrlValue(fieldData);
                }
                else if (field.Type == SPFieldType.User)
                {
                    // AddListItem.SetUserField(web, item, field, fieldData);
                }
                else
                {
                    item[field.Id] = fieldData;
                }
            }
            catch (ArgumentException)
            {
                Console.WriteLine("WARNING: Could not set field {0} for item {1}.", fieldName, item.ID);
            }
        }
        page.Update();
    }
}
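A hypothetical call might look like the following; the site URL, layout name, and the PublishingPageContent field name are placeholders to swap for your own environment:
var fields = new Dictionary<string, string>
{
    // PublishingPageContent is the usual body field on publishing pages (verify it for your page layout)
    { "PublishingPageContent", "<p>Migrated body content goes here.</p>" }
};
CreatePage("http://myserver/sites/migration", "my-first-page", "My First Page", "ArticleLeft.aspx", fields);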
I don't see a way of creating a publishing page without the actual publishing methods.
When you create a new article page, it will only create a few XML parameters inside the page; the layout itself lives in the /_catalogs/masterpage/article-XXXX.aspx file.
You can try downloading a native file created in the Pages document library, understand its structure, fill the XML with your data, and then upload it back to the Pages document library using the FileCollection -- that's my only guess.
Edit: sample Article Page
<%@ Page Inherits="Microsoft.SharePoint.Publishing.TemplateRedirectionPage,Microsoft.SharePoint.Publishing,Version=12.0.0.0,Culture=neutral,PublicKeyToken=71e9bce111e9429c" %>
<%@ Reference VirtualPath="~TemplatePageUrl" %>
<%@ Reference VirtualPath="~masterurl/custom.master" %>
<html xmlns:mso="urn:schemas-microsoft-com:office:office" xmlns:msdt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"><head>
<!--[if gte mso 9]><xml>
<mso:CustomDocumentProperties>
<mso:PublishingContact msdt:dt="string">1073741823</mso:PublishingContact>
<mso:display_urn_x003a_schemas-microsoft-com_x003a_office_x003a_office_x0023_PublishingContact msdt:dt="string">System Account</mso:display_urn_x003a_schemas-microsoft-com_x003a_office_x003a_office_x0023_PublishingContact>
<mso:PublishingContactPicture msdt:dt="string"></mso:PublishingContactPicture>
<mso:PublishingContactName msdt:dt="string"></mso:PublishingContactName>
<mso:ContentTypeId msdt:dt="string">0x010100C568DB52D9D0A14D9B2FDCC96666E9F2007948130EC3DB064584E219954237AF390078FB5FE740F6714B9595501175ECD8F000727044016EAB3B45B9E104498E366C85</mso:ContentTypeId>
<mso:Comments msdt:dt="string"></mso:Comments>
<mso:PublishingContactEmail msdt:dt="string"></mso:PublishingContactEmail>
<mso:PublishingPageLayout msdt:dt="string">http://dmserver008/_catalogs/masterpage/ArticlePage.aspx, EstudoAndre</mso:PublishingPageLayout>
</mso:CustomDocumentProperties>
</xml><![endif]--><title>New Article</title></head>
To grab one, hit the Pages library => Content Menu => Send To => Download a Copy
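If you go that route from client-side code, a rough sketch of the upload step itself (URLs and paths are placeholders) could look like this:
using (var ctx = new ClientContext("http://myserver/sites/migration"))
{
    var pagesList = ctx.Web.Lists.GetByTitle("Pages");
    var fileInfo = new FileCreationInformation
    {
        Url = "MyMigratedArticle.aspx",
        Content = System.IO.File.ReadAllBytes(@"C:\temp\article-template.aspx"),
        Overwrite = true
    };
    var uploaded = pagesList.RootFolder.Files.Add(fileInfo);
    ctx.Load(uploaded);
    ctx.ExecuteQuery();
}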
Uploading a page file should work, as long as you get the settings right on the item as well as the document itself. After you upload the file, you can set the content type and properties appropriately. If you create a page manually first, you should be able to get an object that has all the right settings.
However, I would strongly recommend getting set up to develop a console app that will run on the SharePoint server rather than relying on the web services. The server-side APIs (including PublishingPage) tend to be a lot easier to work with.