Kentico CMS 10 - DocumentHelper Attachment URL - c#

When using the CMS.DocumentEngine.DocumentHelper to retrieve documents of a custom page type, how would one get the path of an image attached to a Direct Uploader form control property type?
If documents/pages are being retrieved in the following way:
DocumentQuery documents = DocumentHelper.GetDocuments("Custom.PageType")
.Path("/some-path", PathTypeEnum.Children)
.Page(page, size);
Then looping through retrieved documents to project to a custom class:
foreach (var document in documents) {
    Foo foo = new Foo(document.GetStringValue("SomeCustomColumnName", ""));
}
Using GetStringValue() targeting the Direct Uploader field/property of the custom page type returns a document GUID such as "123456f-5fa9-4f64-8f4b-75c52db096d5".
In transformations, I can use GetFileUrl(), such as GetFileUrl("AttachmentColumnName"), to get the path. How could I do the same in ASPX code-behind when working with retrieved documents?
The custom page type data is provided to the client via an ASMX service and Ajax. The returned JSON data is used to generate/append markup to the page, including an <img />.
Thank you for any help you can provide!

You can try something like this in your item template:
/CMSPages/GetFile.aspx?guid=<%# Eval("SomeCustomColumnName")%>
You can also check whether SomeCustomColumnName is empty before generating that link if you don't want a broken image.
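Since the question is about code-behind rather than a transformation, the same URL can be assembled there. A minimal sketch, assuming the /CMSPages/GetFile.aspx?guid= pattern above; the class and method names are hypothetical. It returns null for an empty or malformed value so the caller can skip the <img />:

```csharp
using System;

public static class AttachmentUrlBuilder
{
    // Hypothetical helper: turns the GUID string stored in the uploader
    // column into a GetFile.aspx URL, or null when no usable GUID is present.
    public static string BuildGetFileUrl(string rawGuid)
    {
        Guid guid;
        if (!Guid.TryParse(rawGuid, out guid) || guid == Guid.Empty)
        {
            return null; // nothing uploaded, or the value is not a GUID
        }

        // "D" is the usual 8-4-4-4-12 hyphenated form.
        return "/CMSPages/GetFile.aspx?guid=" + guid.ToString("D");
    }
}
```

In the loop from the question this would be called as AttachmentUrlBuilder.BuildGetFileUrl(document.GetStringValue("SomeCustomColumnName", "")). If the site runs in a virtual directory, the leading slash would need to be resolved against the application path.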
The other option is to convert that file upload to a URL selector and allow the users to upload files to the media library. This is probably a better approach IMHO.

It took me a few hours to figure this out.
In my case, I was able to get the image URL using the following:
Guid pageGuid = ValidationHelper.GetGuid(this.GetValue("ProductPage"), new Guid());
// Creates a new instance of the Tree provider
TreeProvider tree = new TreeProvider(MembershipContext.AuthenticatedUser);
CMS.DocumentEngine.TreeNode doc = DocumentHelper.GetDocuments()
.Where(d => d.NodeGUID == pageGuid)
CMS.DocumentEngine.TreeNode pageNode = tree
var document = DocumentHelper.GetDocument(pageNode, tree);
Guid imageGuid = document.GetGuidValue("ImageProperty", new Guid());
string imageUrl = DocumentHelper.GetAttachmentUrl(imageGuid, 0);


Extracting string from Html page using C#

I have a source HTML page and I want to do the following:
Extract a specific string from the HTML page and save the extracted string in a new HTML page.
Create a MySQL database table with 4 columns.
Import the data from the HTML page into the MySQL table.
I would be very grateful if someone could help me with this, because I don't have much experience with C#.
You could use this code:
HttpClient http = new HttpClient();
// Pass the URL of the page here; you could use any.
var response = await http.GetByteArrayAsync("");
// Decode the full response body as UTF-8.
String source = Encoding.UTF8.GetString(response, 0, response.Length);
source = WebUtility.HtmlDecode(source);
HtmlDocument Nodes = new HtmlDocument();
// Parse the downloaded markup so that Nodes actually contains the page.
Nodes.LoadHtml(source);
In the Nodes object, you will have all the DOM elements in the HTML page.
You could use LINQ to filter out whatever you need.
Example:
List<HtmlNode> RequiredNodes = Nodes.DocumentNode.Descendants()
    // GetAttributeValue returns the default ("") when the attribute is missing.
    .Where(x => x.GetAttributeValue("class", "").Contains("List-Item")).ToList();
You will probably need to install the Html Agility Pack NuGet package.
Hope this helps.

C# htmlAgility Webscrape html node inside the first Html node

I am trying to access these nodes on this website.
However, they appear to be in a secondary HTML document within the initial one.
I am confused about how to access the secondary HTML path and then parse it for those nodes.
This is an example of one of the nodes:
<div style="top:219px;left:555px;width:45px;height:14px;" id="" class="mls29">2</div>
I am using Html Agility Pack and I receive null whenever I try to access the div.
I tried working my way down the nodes, but it didn't work.
Any help, or a pointer to where I can look up the necessary information to figure this out, would be appreciated.
var webGet = new HtmlWeb();
var document = webGet.Load(" 623d-4f6a-9e49-e2e46ede136c&Report=Yes");
var divTags = document.DocumentNode.SelectNodes("/html");
var text = document.DocumentNode.InnerText;
You will be able to scrape the data if you access the following url:
As an example, scraping the Presented By field:
HtmlWeb w = new HtmlWeb();
var hd = w.Load("");
var presentedBy = hd.DocumentNode.CssSelect("");
if (presentedBy != null)
I use the ScrapySharp NuGet package along with HtmlAgilityPack, so I can scrape using CSS selectors instead of XPath expressions, which I find easier.
The url you are scraping from is your problem. I am scraping from the last GET request that is performed after the page is loaded, which I found by using the Firefox developer tools to analyze the site's network requests and responses.
I could not yet identify what triggers this HTTP request in the end (maybe JavaScript code, or maybe one of the frame HTML documents requested by the main, frame-enabled document).
If you only have a couple of urls like this to scrape, then even manually extracting the correct url will be an option.
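If the data does turn out to live in a frame, one way to find candidate urls without the browser tools is to list the frame sources of the outer page and resolve them against the page address. A minimal sketch under that assumption; it uses a plain regular expression instead of HtmlAgilityPack so that it stands alone, the class name is hypothetical, and it will not see requests that are triggered by JavaScript:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class FrameUrlFinder
{
    // Matches the src attribute of <frame ...> and <iframe ...> tags.
    private static readonly Regex FrameSrc = new Regex(
        @"<i?frame\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute urls of all frames referenced by the given html.
    public static List<Uri> Find(string html, Uri pageUri)
    {
        var result = new List<Uri>();
        foreach (Match m in FrameSrc.Matches(html))
        {
            // Attribute values may contain entities such as &amp;.
            string src = WebUtility.HtmlDecode(m.Groups[1].Value);

            Uri resolved;
            if (Uri.TryCreate(pageUri, src, out resolved))
            {
                result.Add(resolved); // relative src resolved against the page
            }
        }
        return result;
    }
}
```

Each returned url can then be passed to HtmlWeb.Load() and inspected in turn.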

How do I copy content from one Word document to another with images and links?

I've had some problems copying content from one Word document to another.
The document where the information should end up has a header.
So far I have managed to copy the content to the second document without affecting the header.
However, I can't figure out how to bind the relationships for links and images.
This is my code so far:
public static void AddContentToTemplateCopy(
    string sourceDocumentPath, string endDocumentPath)
{
    using (WordprocessingDocument sourceDoc =
        WordprocessingDocument.Open(sourceDocumentPath, false))
    using (WordprocessingDocument endDoc =
        WordprocessingDocument.Open(endDocumentPath, true))
    {
        var sourceMainPart = sourceDoc.MainDocumentPart;
        var sourceBody = sourceMainPart.Document.Body;
        var endSection = endDoc.MainDocumentPart.Document.Body.Elements<SectionProperties>();
        var endDocMainPart = endDoc.MainDocumentPart;
        var sourceBodyClone = sourceBody.CloneNode(true);
        sourceBodyClone.ReplaceChild(endSection.FirstOrDefault().CloneNode(true), sourceBodyClone.Elements<SectionProperties>().FirstOrDefault());
        endDocMainPart.Document.ReplaceChild(sourceBodyClone, endDocMainPart.Document.Body);
        foreach (HyperlinkRelationship link in sourceMainPart.HyperlinkRelationships)
        {
            endDocMainPart.AddHyperlinkRelationship(link.Uri, link.IsExternal, link.Id);
        }
    }
}
I get the following error: 'rId6' ID conflicts with the ID of an existing relationship for the specified source.
Also, if I have an image in the content, it can't be displayed.
If I zip the document and look at the files in the package I can find the Image but for the same reason as the links the Relation
So my question is: how do I bind the links and images with their "_rels" references, or how do I copy them so that it works?
This is a Relationship entry from when I added the link by hand:
<Relationship Target="media/image1.jpg" Type="" Id="rId11"/>
A picture to show that the link text is copied but has no formatting and that the image can't be displayed.
Thanks to the answer by JasonPlutext, I managed to use Open XML PowerTools (version 2.2). Keep in mind that the project targets .NET 3.5 when you import it; you might need to change that. (It supports Open XML 2.5 as well, from what I've noticed.)
It makes it very simple to create new documents and take parts from old documents.
The code here is for my case, where I want the formatting and content from one document and the header from a template document. The order matters.
Hopefully this will save time for others with the same problem.
public static void AddContentToTemplateCopy(string templateDocumentPath,
    string contentDocumentPath,
    List<Source> sources,
    string outName)
{
    sources = new List<Source>()
    {
        new Source(new WmlDocument(contentDocumentPath), false),
        new Source(new WmlDocument(templateDocumentPath), true),
    };
    DocumentBuilder.BuildDocument(sources, outName);
}
You might find it easier to try Eric White's document builder.

Sitecore media library paths

Using Sitecore 6.5, when images are rendered on a web page, a URL such as the one below is used
But if you add an image from the library in a content editor a path such as below is used
This makes sense as it's trying to use a meaningful path when rendered for the web.
We would like to use the first media image path to add images in the content editor in HTML view rather than the default second method. This is because we are taking some HTML files and automatically adding them to Sitecore via a script. If the first image format is used, we can change the image paths to a location in the media library by convention, so the images should appear in the newly created items. We have no idea what the media library image ID would be.
The first format does appear to work, as images are rendered in the content editor's design view and when the page is rendered, but Sitecore marks these as broken links in the Content Editor. Does anyone know whether we are safe to use this format?
You may want to avoid hard coding paths to media in the rich text field. The second "dynamic link" is an important feature of Sitecore in that it keeps a connection between the media and item in the Links database. This safeguards you if you ever delete or move the media.
Since it sounds like you are importing content from an external source and you already have a means of detecting the image paths, I would recommend (if possible) that you upload the images programmatically and insert the dynamic links.
Below is a function that you can call for uploading to the Media Library and getting back the media item:
Example usage:
var file = AddFile("/assets/images/my-image.jpg", "/sitecore/media library/images/example", "my-image");
The code:
private MediaItem AddFile(string relativeUrl, string sitecorePath, string mediaItemName)
{
    var extension = Path.GetExtension(relativeUrl);
    var localFilename = @"c:\temp\" + mediaItemName + extension;
    using (var client = new WebClient())
    {
        client.DownloadFile("" + relativeUrl, localFilename);
    }
    // Create the options
    var options = new MediaCreatorOptions
    {
        FileBased = false,
        IncludeExtensionInItemName = false,
        KeepExisting = false,
        Versioned = false,
        Destination = sitecorePath + "/" + mediaItemName,
        Database = Factory.GetDatabase("master")
    };
    // Now create the file
    var creator = new MediaCreator();
    var mediaItem = creator.CreateFromFile(localFilename, options);
    return mediaItem;
}
As for generating the dynamic link to the media, I actually haven't found a Sitecore method to do this, so I resorted to the following code:
var extension = !String.IsNullOrEmpty(Settings.Media.RequestExtension)
? Settings.Media.RequestExtension
: ((MediaItem)item).Extension;
var dynamicMediaUrl = String.Format(
No, it will not cause any rendering issues apart from the broken links notification, as you noted. Also, when you select an image in the editor and choose to edit it, the media folder will open at the root rather than at the image itself. But as Derek has noted, the use of dynamic links is an important feature to make sure your links do not break if something is moved or deleted.
I would add to his answer that since you are adding the text via a script you can detect images in the text using HtmlAgilityPack (already used in Sitecore) or FizzlerEx (more similar to jQuery syntax), use the code he provided to upload the images to the media library, grab the GUID and replace the src. Something along the lines of:
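string content = "<whatever your html to go in the rich text field>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(content);
// SelectNodes returns null when nothing matches (Enumerable needs System.Linq).
var images = doc.DocumentNode.SelectNodes("//img[starts-with(@src, '/media/')]");
foreach (HtmlNode img in images ?? Enumerable.Empty<HtmlNode>())
{
    HtmlAttribute attr = img.Attributes["src"];
    Item scMediaItem = UploadLocalMedia(attr.Value);
    attr.Value = GetDynamicMediaUrl(scMediaItem);
}
content = doc.DocumentNode.OuterHtml; // the rewritten html for the rich text field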
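GetDynamicMediaUrl is not shown in either answer. A rough sketch of what it might return, assuming the dynamic link has the form ~/media/<media item ID without dashes>.<extension>, with ashx as the default request extension. Both that form and the helper name BuildDynamicMediaUrl are assumptions here, so compare the output against a link that the rich text editor inserts in your own instance before relying on it. It works on a plain Guid so it can be tried outside Sitecore:

```csharp
using System;

public static class DynamicMediaLink
{
    // Hypothetical helper: formats a media item ID as a dynamic media link.
    // The "~/media/{id}.{extension}" shape is an assumption, not a Sitecore API.
    public static string BuildDynamicMediaUrl(Guid mediaItemId, string extension = "ashx")
    {
        // "N" formats the GUID as 32 hex digits without dashes or braces.
        string shortId = mediaItemId.ToString("N").ToUpperInvariant();
        return string.Format("~/media/{0}.{1}", shortId, extension);
    }
}
```

Inside Sitecore the GUID would come from the uploaded media item's ID (for example scMediaItem.ID.Guid), and the extension from the same Settings.Media.RequestExtension check used in the first answer.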

How can I add an external image to a word document using OpenXml?

I am trying to use C# and Open XML to insert an image from a url into a doc. The image may change so I don't want to download it, I want it to remain an external reference.
I've found several examples like this one that allow me to add a local image:
How can I adapt that to take a URI? Or is there another approach altogether?
You can add an external image to a Word document via a Quick Parts field.
For a description, please see the following answer on Super User.
To perform the described steps programmatically, you have to use an external relationship to include an image from a URL.
Here are the steps to accomplish this:
1. Create an instance of the Picture class.
2. Add a Shape to specify the style of the picture (width/height).
3. Use the ImageData class to specify the ID of the external relationship.
4. Add an external relationship to the main document part. Give the external relationship the same ID you specified in step 3.
The following code implements the steps described above. The image is added to the first paragraph in the Word document.
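using (WordprocessingDocument newDoc = WordprocessingDocument.Open(@"c:\temp\external_img.docx", true))
{
    var run = new Run();
    var picture = new Picture();
    var shape = new Shape() { Id = "_x0000_i1025", Style = "width:453.5pt;height:270.8pt" };
    var imageData = new ImageData() { RelationshipId = "rId56" };
    // Nest the elements: run > picture > shape > image data.
    shape.Append(imageData); picture.Append(shape); run.Append(picture);
    var paragraph = newDoc.MainDocumentPart.Document.Body.Elements<Paragraph>().FirstOrDefault();
    paragraph.Append(run);
    newDoc.MainDocumentPart.AddExternalRelationship(
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image",
        new System.Uri("<url to your picture>", System.UriKind.Absolute), "rId56");
}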
In the code above I've omitted the code to define the shape type. I advise you to use a
tool like the Open XML SDK Productivity Tool
to inspect a Word document with an external relationship to an image.

