Using Sitecore 7.5, I am trying to store several html files inside of the Media Library. Then in my sublayout codebehind I am attempting to grab the inner content of those html files.
I had this working when I was storing the html file on the server. I would upload the file into the Media Library using 'upload as file', and then use the following code to read the content:
string filename = htmlMediaItem.Fields["File Path"].ToString();
string path = Server.MapPath(filename);
string content = System.IO.File.ReadAllText(path);
However, I would now like to do this without storing the files on the server, keeping them only inside the Media Library. Is there any way I can do this?
So far I have had a hard time trying to find information on the subject.
Thank you.
From what I understand you want to read content of an html file stored in Media Library.
// Requires: using System.IO; using Sitecore.Resources.Media;
Sitecore.Data.Items.Item sampleItem = Sitecore.Context.Database.GetItem("/sitecore/media library/Files/yourhtmlfile");
Sitecore.Data.Items.MediaItem sampleMedia = new Sitecore.Data.Items.MediaItem(sampleItem);
// GetStream() returns a MediaStream; its Stream property exposes the underlying data.
using (var reader = new StreamReader(MediaManager.GetMedia(sampleMedia).GetStream().Stream))
{
    string text = reader.ReadToEnd();
}
I want to find whether a given text is present in an uploaded PDF file in ASP.NET C#.
using (MemoryStream str = new MemoryStream(this.docUploadField.FileBytes))
{
    using (StreamReader sr = new StreamReader(str, Encoding.UTF8))
    {
        string line = sr.ReadToEnd();
    }
}
I am getting the below as the file content when I read the contents of the file.
Please help me with this.
You will need a PDF reading library for this. A PDF is a binary format, so reading its bytes as UTF-8 text returns the raw file structure rather than the visible text.
The best known are:
iText (formerly iTextSharp): https://github.com/itext/itext7-dotnet
PDFsharp: https://github.com/empira/PDFsharp
and there are many other free options.
With one of these you open the PDF file, read it, and extract the text you need.
They usually give you a collection of the PDF elements (paragraphs, images, and so on), which you can loop through or search for what you need.
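As a rough illustration, here is a minimal sketch using iText 7 for .NET (the itext7 NuGet package; which library you pick is up to you). The PdfTextSearch class and its ContainsText method are names I made up for the example, and fileBytes would be something like docUploadField.FileBytes from the question.

```csharp
using System;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

// Hypothetical helper, not part of iText.
public static class PdfTextSearch
{
    // Returns true if searchText occurs in the extracted text of any single page.
    public static bool ContainsText(byte[] fileBytes, string searchText)
    {
        using (var stream = new MemoryStream(fileBytes))
        using (var pdf = new PdfDocument(new PdfReader(stream)))
        {
            // iText page numbers start at 1.
            for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
            {
                string pageText = PdfTextExtractor.GetTextFromPage(pdf.GetPage(page));
                if (pageText.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Note that this checks each page separately, so a phrase split across two pages is not found, and a scanned PDF without a text layer yields no text at all.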
I have a component library that uses JS code to generate an image as a base64 string, and the image needs to be transferred to C#. The image is larger than MaximumReceiveMessageSize.
Can I get the value of the MaximumReceiveMessageSize property in C#? I need a way to split the picture into correctly sized chunks, or some other way to transfer it.
My component can be used in a Wasm or Server application. I can't change the value of the MaximumReceiveMessageSize property.
Thanks
Using a stream, as described in the "Stream from JavaScript to .NET" section of the docs, solved my problem.
From Microsoft docs:
In JavaScript:
function streamToDotNet() {
    return new Uint8Array(10000000);
}
In C# code:
var dataReference = await JS.InvokeAsync<IJSStreamReference>("streamToDotNet");
using var dataReferenceStream = await dataReference.OpenReadStreamAsync(maxAllowedSize: 10_000_000);
var outputPath = Path.Combine(Path.GetTempPath(), "file.txt");
using var outputFileStream = File.OpenWrite(outputPath);
await dataReferenceStream.CopyToAsync(outputFileStream);
In the preceding example: JS is an injected IJSRuntime instance. The dataReferenceStream is written to disk (file.txt) at the current user's temporary folder path (GetTempPath).
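For completeness, the chunking idea from the question could be handled on the C# side roughly as below. This is only a sketch and not from the docs: Base64Assembler is a hypothetical helper, and it assumes the JS side cuts the base64 string into pieces smaller than the message limit and sends them in order.

```csharp
using System;
using System.Text;

// Hypothetical helper: collects base64 chunks sent one call at a time.
public sealed class Base64Assembler
{
    private readonly StringBuilder _buffer = new StringBuilder();

    // Call once per chunk, in the order the chunks were cut.
    public void Append(string chunk)
    {
        if (chunk == null) throw new ArgumentNullException(nameof(chunk));
        _buffer.Append(chunk);
    }

    // Call after the last chunk; decodes the reassembled base64 string.
    public byte[] ToBytes()
    {
        return Convert.FromBase64String(_buffer.ToString());
    }
}
```

The chunks are decoded only after they are all joined, because an individual piece of a base64 string is not necessarily valid base64 on its own.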
I am trying to download the contents of this webpage into my program. I have tried using WebClient.DownloadString and WebClient.DownloadFile, and then saving the result to a file and reading it back from the local file, but none of this is working. When I use breakpoints in Visual Studio, I see that the string is correctly saved, but when I try to print it to a file or to the console, nothing is displayed.
What I am aiming to do is download this webpage's content into a String then parse it with JSON.NET.
Here is my attempt to save it to a file:
WebClient webpage = new WebClient();
var html = webpage.DownloadString("https://api.fixer.io/latest");
String k = html;
File.WriteAllText(@"C:\Users\JCena\Desktop\Hell1o.txt", k);
Your code is almost fine.
First, get rid of the "new" keyword.
Second, make sure you don't get a permissions exception for the folder specified.
Try this code:
WebClient webpage = new WebClient();
var html = webpage.DownloadString("https://api.fixer.io/");
String k = html;
File.WriteAllText(@"Hello.txt", k);
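To rule out path problems, here is a small sketch that builds the target path explicitly, writes the text, and returns the full path so you can see exactly where the file ended up. The TextSaver class and its Save method are made up for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper for checking where a file is actually written.
public static class TextSaver
{
    // Writes content to fileName inside directory and returns the full path written.
    public static string Save(string directory, string fileName, string content)
    {
        Directory.CreateDirectory(directory); // does nothing if the folder already exists
        string fullPath = Path.GetFullPath(Path.Combine(directory, fileName));
        File.WriteAllText(fullPath, content);
        return fullPath;
    }
}
```

For example, TextSaver.Save(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hello.txt", html) writes to the current user's desktop. A bare relative name such as @"Hello.txt" is resolved against the process's current directory, which is often the bin\Debug folder when running from Visual Studio, so the file may simply be somewhere you are not looking.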
We need to export an entire page of an MVC application to PDF. For that we need all of the page's HTML, including dynamically generated content.
To get the contents of the page we used the following code:
string contents = File.ReadAllText(path);
but it gives only the static content of the page (i.e. the page source), not the nodes added to the DOM afterwards.
We then tried the following code, but it also returns only the static content:
// WebClient object
WebClient client = new WebClient();
// Retrieve resource as a stream
Stream data = client.OpenRead(new Uri("xxxx.html"));
// Retrieve the text
StreamReader reader = new StreamReader(data);
string htmlContent = reader.ReadToEnd();
So I want to get the entire outerHTML of the document in C# without using any third-party DLL. I have searched through many links, and everyone suggests using the WebBrowser control to get the content.
I don't know how that would be useful for our application, which is MVC 4. We need to export the entire page to PDF, so we need the entire HTML content (dynamic content too).
How can I use the code below in our MVC application to get the document's outerHTML?
mshtml.HTMLDocument doc = webBrowser1.Document.DomDocument as mshtml.HTMLDocument;
string html = doc.documentElement.outerHTML;
or
var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowser.Document.DomDocument;
StringReader sr = new StringReader(documentAsIHtmlDocument3.documentElement.outerHTML);
htmlDoc.Load(sr);
Any help on this would be appreciated.
You haven't mentioned what the PDF is intended for. Most likely it is for the visitor of the page to download. If so, you could use jsPDF on the client. That way you get around the problem of not having access to the entire rendered page on the server side.
Using Sitecore 6.5, when images are rendered on a web page, a URL such as the one below is used
~/media/OSS/Images/MyImage
But if you add an image from the library in a content editor a path such as below is used
~/media/1CFDDC34C94E460FAA2B1518DCA22360.PNG
This makes sense, as Sitecore uses a meaningful path when rendering for the web.
We would like to use the first path format when adding images in the content editor's HTML view, rather than the default second format. We are importing HTML files into Sitecore automatically via a script, and with the first format we can rewrite the image paths to a media library location by convention, so the images appear in the newly created items. We have no idea what the media library ID of an image is.
The first format does appear to work: images are rendered both in the content editor's design view and when the page is rendered. However, Sitecore marks them as broken links in the Content Editor. Any ideas on whether we are safe to use this format?
You may want to avoid hard coding paths to media in the rich text field. The second "dynamic link" is an important feature of Sitecore in that it keeps a connection between the media and item in the Links database. This safeguards you if you ever delete or move the media.
Since it sounds like you are importing content from an external source and you already have a means of detecting the image paths, I would recommend (if possible) that you upload the images programmatically and insert the dynamic links.
Below is a function that you can call for uploading to the Media Library and getting back the media item:
Example usage:
var file = AddFile("/assets/images/my-image.jpg", "/sitecore/media library/images/example", "my-image");
The code:
private MediaItem AddFile(string relativeUrl, string sitecorePath, string mediaItemName)
{
    var extension = Path.GetExtension(relativeUrl);
    var localFilename = @"c:\temp\" + mediaItemName + extension;
    using (var client = new WebClient())
    {
        client.DownloadFile("http://yourdomain.com" + relativeUrl, localFilename);
    }
    // Create the options
    var options = new MediaCreatorOptions
    {
        FileBased = false,
        IncludeExtensionInItemName = false,
        KeepExisting = false,
        Versioned = false,
        Destination = sitecorePath + "/" + mediaItemName,
        Database = Factory.GetDatabase("master")
    };
    // Now create the file
    var creator = new MediaCreator();
    var mediaItem = creator.CreateFromFile(localFilename, options);
    return mediaItem;
}
As for generating the dynamic link to the media, I actually haven't found a Sitecore method to do this, so I resorted to the following code:
var extension = !String.IsNullOrEmpty(Settings.Media.RequestExtension)
    ? Settings.Media.RequestExtension
    : ((MediaItem)item).Extension;
var dynamicMediaUrl = String.Format(
    "{0}{1}.{2}",
    MediaManager.MediaLinkPrefix,
    item.ID.ToShortID(),
    extension);
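To make the resulting format concrete, the same string can be built without any Sitecore types. This sketch assumes that a short ID is the item GUID written as 32 uppercase hex digits with no dashes or braces (which matches the ~/media/1CFDDC34C94E460FAA2B1518DCA22360.PNG example in the question) and that the prefix passed in is "~/media/". DynamicMediaLink.Build is a hypothetical helper, not a Sitecore API.

```csharp
using System;

// Hypothetical helper that mirrors the String.Format call above using plain .NET types.
public static class DynamicMediaLink
{
    // prefix: e.g. "~/media/" (assumed); extension: without the leading dot, e.g. "ashx".
    public static string Build(string prefix, Guid itemId, string extension)
    {
        // "N" formats the GUID as 32 hex digits without dashes or braces.
        string shortId = itemId.ToString("N").ToUpperInvariant();
        return string.Format("{0}{1}.{2}", prefix, shortId, extension);
    }
}
```

This can be handy for checking the output of the import script against the links the content editor itself produces.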
No, it will not cause any rendering issues apart from the broken links notification you noted. Also, when you select an image in the editor and choose to edit it, the media folder will open at the root rather than at the image itself. But as Derek has noted, the use of dynamic links is an important feature to make sure your links do not break if something is moved or deleted.
I would add to his answer that, since you are adding the text via a script, you can detect images in the text using HtmlAgilityPack (already used in Sitecore) or FizzlerEx (more similar to jQuery syntax), use the code he provided to upload the images to the media library, grab the GUID, and replace the src. Something along the lines of:
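string content = "<whatever your html to go in the rich text field>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(content); // LoadHtml parses a string; Load expects a file path or stream
// SelectNodes returns null when nothing matches
var images = doc.DocumentNode.SelectNodes("//img[starts-with(@src, '/media/')]");
if (images != null)
{
    foreach (HtmlNode img in images)
    {
        HtmlAttribute attr = img.Attributes["src"];
        // UploadLocalMedia and GetDynamicMediaUrl stand for the upload and link-building code above
        Item scMediaItem = UploadLocalMedia(attr.Value);
        attr.Value = GetDynamicMediaUrl(scMediaItem);
    }
}
// doc.DocumentNode.OuterHtml now holds the rewritten HTML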