Search Blob storage files content using azure search - c#

I want to do a full-text search on HTML files in blob storage. I have created an Azure Search service, added a data source to it, and created an index and an indexer through the Azure portal.
I tested the service in the portal using Search explorer, and it works fine.
But I want to display the search results in a console window using C# code instead of testing in Search explorer.
Do I have to write a POCO class for the data source even if the data source for the service was created through the Azure portal?
Following is the code snippet:
SearchServiceClient serviceClient = new SearchServiceClient(searchServiceName, new SearchCredentials(searchServiceKey));
ISearchIndexClient indexClient = serviceClient.Indexes.GetClient(indexName);
DocumentSearchResult searchResults = indexClient.Documents.Search(searchText);
I want to convert the search results object to readable text and display it in the console window. I tried a Base64Decode method but did not get the expected result. Please help me with this issue.
Thanks in advance!!!

The Document you receive will be JSON that contains each of the fields of the search document.
Your question is not clear as to whether you want to display the original HTML or the text extracted from the HTML document.
If you only care about the text (without HTML formatting), take a look at the content field. It will have the information that you need. Make sure the content field is retrievable in your search index so you get it as part of the result.
If you want the document with its actual HTML formatting, that is usually not part of the result document, because the markup is not indexed. In these cases, people usually add the metadata_storage_path field to the index and make sure that it is retrievable. Using that path, you can then read the original file from blob storage. If you used the metadata_storage_path field as the key of your index and encoded it using base64, make sure to decode the path.
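Decoding such a key could look like the sketch below. It assumes the indexer's base64Encode mapping produced a URL-safe token in the HttpServerUtility.UrlTokenEncode style: '-' and '_' in place of '+' and '/', and a trailing digit giving the number of stripped '=' padding characters. Whether that applies depends on how the field mapping of your indexer is configured, so check the indexer definition first. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class BlobPathDecoder
{
    // Decodes a URL-token style base64 string (assumed format, see the note above)
    // back into the original blob path.
    public static string DecodeUrlToken(string token)
    {
        if (string.IsNullOrEmpty(token))
            throw new ArgumentException("token is required", nameof(token));

        // The last character is assumed to be the count of removed '=' characters.
        int padCount = token[token.Length - 1] - '0';
        if (padCount < 0 || padCount > 2)
            throw new FormatException("Unexpected padding marker in token.");

        // Restore the standard base64 alphabet and padding.
        string body = token.Substring(0, token.Length - 1)
                           .Replace('-', '+')
                           .Replace('_', '/')
                      + new string('=', padCount);

        return Encoding.UTF8.GetString(Convert.FromBase64String(body));
    }
}
```

The decoded value would then be the blob URL stored in metadata_storage_path, which can be used to fetch the original HTML file.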

Related

Getting the attachment contents of an rtf mail in vsto c#

We are trying to get the content of the attachments in an RTF mail. I have searched using different terms but have not found any reliable solution. Can someone please help me get the source of the attachments, as we get them in the HTML format?
The Outlook object model doesn't provide any property or method for getting the attachment content. To get the file attached you need to save it to the disk and then read the content from the there.
Also you may consider using Extended MAPI, the low-level API on which Outlook is based. It allows getting the binary data of the attached file. Try using the Attachment.PropertyAccessor.GetProperty method while passing in the value "http://schemas.microsoft.com/mapi/proptag/0x37010102" (PR_ATTACH_DATA_BIN).
set msg = Application.ActiveExplorer.Selection(1)
set attach = msg.Attachments(1)
set ps = attach.PropertyAccessor
v = ps.GetProperty("http://schemas.microsoft.com/mapi/proptag/0x37010102")
debug.print ps.BinaryToString(v)
On the low level (Extended MAPI in C++ or Delphi), you need to open the PR_ATTACH_DATA_OBJ property as IStorage and extract the data from there (it depends on the actual type of the attachment). You can see the data in the streams in OutlookSpy (I am its author) - select the message, click IMessage button on the OutlookSpy ribbon, go to the GetAttachmentTable tab, double click on the attachment to open it, select the PR_ATTACH_DATA_OBJ property, right click, select IMAPIProp::OpenProperty, then IStorage. Raw data will be there as well as an image representing the attachment (so that Outlook won't have to start the host app when rendering the message).
If using Redemption is an option (I am also its author, it can be used from any language including C# and VBA), its version of RDOAttachment.SaveAsFile method handles OLE attachments for most popular formats (Word, Excel, bitmap, PowerPoint, Adobe PDF, etc.) - create an instance of the RDOSession object (using either CreateOleObject or RedemptionLoader) and use RDOSession.GetRDOObjectFromOutlookObject method (pass either MailItem or Attachment object) to get back RDOMail or RDOAttachment object respectively.
I found a way to get the image content from the RTF mail directly, without using the low-level API or anything else.
The solution is not a straightforward one:
Save the mail to disk using oDoc.SaveAs2(filepath, WdSaveFormat.wdFormatFilteredHTML);
After saving the mail, the folder you saved to will contain a .htm document.
Read the .htm document.
Get all the image nodes of the .htm document.
From each image node, get its src attribute.
Using the src value, load the image directly from disk and use it.

InsertInlineImage or ReplaceImage URI

I'm writing a WinForms application. I created a Google Doc template file that contains placeholders like {{name}} for various text elements. I can successfully make a copy of this document and use the BatchUpdateDocumentRequest to modify them just fine.
However, I also have an embedded image in the document. I can obtain the objectId for this image just fine. I either want to replace this image with another or remove it from my template and then append my new image to the end of the document. In both cases, the InsertInlineImage or ReplaceImage classes require a URI of the image to insert or replace with. This is where I have an issue.
The image itself has been captured from a control on the WinForms form. It's actually a chart. I've saved the image in PNG format since I know that is one of the formats supported by Google Drive/Docs. I figured that in order to use it in the batch update I would need to upload it first, so I did, and got its file ID and webContentLink back in the response.
I'm not locked into any particular way of doing this. I originally tried creating an HTML file, uploading but then it would strip the image from it, so became useless, so I switched gears to using a Google Doc as my template and just try to replace elements in it instead. This went well until I got to the image.
Essentially, no matter what I try to specify as the URI, it says the file is not in a supported format.
As far as I can tell, Google expects the URI to actually end in .png or be a real link versus a download URL you'd get from Google Drive.
Here is an example of the code I'm using to attempt to replace the image. The strImageObjectId is the objectId of the Embedded Object image in the template document copy that I want to replace. The Uri is what Google needs to pull the new image from. I'm happy to pull it from my local computer or Google Drive if only I could get it to accept it somehow.
BatchUpdateDocumentRequest batchUpdateRequest = new BatchUpdateDocumentRequest {
    Requests = new List<Google.Apis.Docs.v1.Data.Request>()
};
// Replace the embedded image in the copied template with the image at the given URI
var request = new Google.Apis.Docs.v1.Data.Request {
    ReplaceImage = new ReplaceImageRequest() {
        ImageObjectId = strImageObjectId,
        Uri = strChartWebContentLink
    }
};
batchUpdateRequest.Requests.Add(request);
DocumentsResource.BatchUpdateRequest updateRequest =
    sDocsService.Documents.BatchUpdate(batchUpdateRequest, strCopyFileId);
BatchUpdateDocumentResponse updateResponse = updateRequest.Execute();
I'm happy to use whatever method will get me to a point where I can end up with a Google Doc on Google Drive that is based on a template in which I can replace various text elements but, most importantly, add or replace an image.
Thanks so much for the advice.
I got to the point where I believe I was specifying the URI correctly, but then I started getting an access forbidden error instead.
I didn't have time to hunt this one down, so I went back to creating an HTML template with my image, uploading as a Google Doc, exporting to PDF, and then uploading as a PDF. This ended up working because originally I was using a BMP as the file format and that is not supported by Google Docs, so I changed to a PNG instead and it worked just fine.
I think Google Docs needs to add the ability to add an image using a MemoryStream or some other programmatic base64 resource instead of purely being based on URIs and running into temporary upload or permission issues.
Hey, I'm doing the same thing as you,
and I got this working by modifying the download link format.
From this:
https://docs.google.com/uc?export=download&id={{YOUR GDRIVE IMAGE ID}}
to this:
https://docs.google.com/uc?export=view&id={{YOUR GDRIVE IMAGE ID}}
e.g.:
uri: "https://docs.google.com/uc?export=view&id=1cjgyHqtYSgS0CBT4x-9eQIHRzOIfGgv-"
But the image must be shared publicly.
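In C#, building that link from the uploaded file's ID might look like the following sketch. The helper name is hypothetical, and the URL pattern is simply the one reported in this answer; it is not taken from Google's documentation, so treat it as something to try rather than a guaranteed format.

```csharp
using System;

public static class DriveImageUri
{
    // Builds the "export=view" style link described above from a Drive file ID.
    public static string ForView(string fileId)
    {
        if (string.IsNullOrWhiteSpace(fileId))
            throw new ArgumentException("fileId is required", nameof(fileId));

        return "https://docs.google.com/uc?export=view&id=" + Uri.EscapeDataString(fileId);
    }
}
```

The result would be assigned to the Uri property of the ReplaceImageRequest in place of the webContentLink value.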

Data import via Management API successful, but data for custom dimensions does not show

I am trying to import data for custom dimension in Google Analytics through the .NET client library. In Google Analytics, when I view the uploads for a data set from Admin > Data Import > Manage Uploads, it says my uploads are successful, but the data for the custom dimension doesn't seem to show up in my report. Right now, I am just using my custom dimension to set the category for an article.
Here is how I am uploading through the .Net client library.
string accountId = "***";
string webPropertyId = "***";
string customDataSourceId = "***";
string contentType = "application/octet-stream";
IUploadProgress progress;
using (var dataStream = CreateArticleCsvStream(articles))
{
    var fs = File.Create("test.csv");
    dataStream.CopyTo(fs);
    fs.Close();
    progress = service.Management.Uploads.UploadData(accountId, webPropertyId, customDataSourceId, dataStream, contentType).Upload();
}
if (progress.Status == UploadStatus.Failed)
{
    throw progress.Exception;
}
Here is the output for test.csv
ga:pagePath,ga:dimension1
/path/to/page/,"MyCategory"
When I download the file from the data set, I get the same file as test.csv; it just has a random filename assigned to it.
I found this other question similar to mine, but there was no solution posted. Any help would be appreciated.
I have also waited over 24 hours, but still nothing.
It took a few days of trial and error but I finally found the solution.
First thing to check is that your Website's URL is correct under Admin > View Settings. We had ours set up like my.domain.com/path/to/site when it should have just been my.domain.com. (We are using SharePoint, which is why path/to/site was appended to the site URL)
Second thing to check is that your key/pagePath entries are all correct. In our case, we had an extra forward slash at the end of the URL. For some reason, Google Analytics displays the trailing forward slash in reports, but does not actually store it for the pagePath.
Another error may be capitalization. It seems like GA applies filters after the data has been processed. If you add the lowercase/uppercase filter, notice that it only affects how the URLs display in your reports. Behind the scenes, it seems that GA still stores the URL with whatever capitalization the hit initially came in with. For example if the URL on your site is my.domain.com/path/to/PAGE.aspx and you apply the lowercase filter, the pagePath will display in your reports as /path/to/page.aspx. But, if you use the lowercase value in your csv import, the data will not join. You must use the pagePath that appears on your site (/path/to/PAGE.aspx in this case).
It would be nice if Google gave some log files when it tries to process and join the uploaded data with the existing data, rather than just saying the upload was successful even though the processing/joining stage may fail.
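The key checks from this answer could be applied while generating the CSV, roughly as sketched below. This assumes, as observed above, that the stored pagePath has no trailing slash and keeps the casing of the original hit; verify both against your own property before relying on it. The class and method names are hypothetical.

```csharp
using System;

public static class PagePathKey
{
    // Drops trailing slashes (but keeps "/" for the site root) and
    // deliberately leaves the casing untouched.
    public static string Normalize(string pagePath)
    {
        if (string.IsNullOrEmpty(pagePath))
            throw new ArgumentException("pagePath is required", nameof(pagePath));

        string trimmed = pagePath.TrimEnd('/');
        return trimmed.Length == 0 ? "/" : trimmed;
    }

    // Produces one CSV row in the ga:pagePath,ga:dimension1 layout shown in the question.
    public static string ToCsvLine(string pagePath, string category)
    {
        string escaped = (category ?? string.Empty).Replace("\"", "\"\"");
        return Normalize(pagePath) + ",\"" + escaped + "\"";
    }
}
```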

Retrieve the Url from an Html Img Tag

Background Info
I am currently working on a C# Web API that will return selected img URLs as base64. I already have the functionality that performs the base64 conversion; however, I am receiving a large amount of text that also includes img URLs, which I need to extract from the string and pass to my function to convert the image to base64. I read up on a library (HtmlAgilityPack) that should make this task easy, but when I use it I get "HtmlDocument.cs" not found. However, I am not submitting a document; I am sending it a string of HTML. I read the docs, and it is supposed to work with a string as well, but it is not working for me. This is the code using HtmlAgilityPack.
NON WORKING CODE
foreach(var item in returnList)
{
if (item.Content.Contains("~~/picture~~"))
{
HtmlDocument doc = new HtmlDocument();
doc.Load(item.Content);
Error Message From HtmlAgilityPack
Question
I am receiving a string of HTML from SharePoint. This HTML string may be tokenized with heading tokens and/or picture tokens. I am trying to isolate and retrieve the URL from the img src attribute. I understand that regex may be impractical, but I would consider a regular expression if one is available to retrieve the URL from img src.
Sample String
Bullet~~Increased Cash Flow</li><li>~~/Document Text Bullet~~Tax Efficient Organizational Structures</li><li>~~/Document Text Bullet~~Tax Strategies that Closely Align with Business Strategies</li><li>~~/Document Text Bullet~~Complete Knowledge of State and Local Tax Obligations</li></ul><p>~~/Document Heading 2~~is the firm of choice</p><p>~~/Document Text~~When it comes to accounting and advisory services is the unique firm of choice. As a trusted advisor to our clients, we bring an integrated client service approach with dedicated industry experience. Dixon Hughes Goodman respects the value of every client relationship and provides clients throughout the U.S. with an unwavering commitment to hands-on, personal attention from our partners and senior-level professionals.</p><p>~~/Document Text~~of choice for clients in search of a trusted advisor to deal with their state and local tax needs. Through our leading best practices and experience, our SALT professionals offer quality and ease to the client engagement. We are proud to provide highly comprehensive services.</p>
<p>~~/picture~~<br></p><p>
<img src="/sites/ContentCenter/Graphics/map-al.jpg" alt="map al" style="width:611px;height:262px;" /> 
<br></p><p><br></p><p>
~~/picture~~<br></p><p>
<img src="/sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg" alt="Firm_Telescope_Illustration.jpg" style="margin:5px;width:155px;height:155px;" /> </p><p></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div>
Important
I am working with an HTML string, not a file.
The issue you are having is that a file is being looked for and, since it is not found, you are told so. This is not an error that will break your app; it just tells you that the file was not found. Note that Load expects a file path, whereas LoadHtml parses an HTML string directly, so use LoadHtml for your content. The documentation can be found here https://htmlagilitypack.codeplex.com/SourceControl/latest#Trunk/HtmlAgilityPackDocumentation.shfbproj. The code below is a cookie-cutter model that anyone can use.
Important
The message appears because a file is being looked for, while what you supply is a string. According to the documentation linked above, your code will still work, and the message will not affect it.
Example Code
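HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
htmlDocument.LoadHtml("YourContent"); // LoadHtml takes the HTML string itself; Load takes a file path
var imgNodes = htmlDocument.DocumentNode.SelectNodes("//img[@src]"); // null when nothing matches
if (imgNodes != null)
    foreach (HtmlAgilityPack.HtmlNode url in imgNodes)
    {
        HtmlAgilityPack.HtmlAttribute att = url.Attributes["src"];
        Uri imgUrl = new System.Uri("Url" + att.Value); // build your url; "Url" stands for your site's base address
    }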
string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].+?>", RegexOptions.IgnoreCase).Groups[1].Value;
It has been asked multiple times here.
also here
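Since the sample string contains more than one img tag, and Regex.Match returns only the first hit, a variant that collects every src value might look like this. It is a sketch that uses only the .NET base class library; the class name is hypothetical, and, like any regex over HTML, it will miss unusual markup such as unquoted attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgSrcExtractor
{
    // Matches <img ... src="..."> or <img ... src='...'> and captures the attribute value.
    private static readonly Regex ImgSrc = new Regex(
        "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the src value of every img tag found, in document order.
    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ImgSrc.Matches(html ?? string.Empty))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}
```

Each returned value is a site-relative path in the sample above, so it still has to be combined with the SharePoint host name before the image can be downloaded.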

Displaying pdf files in a web page from a sql database directly without needing to save them to the server file system

I'm currently using an HTML embed tag to display a PDF file that is saved on the local server. Is there a way to display a PDF file on my page without having to save it to the server's local file system? I just want to pass it to the view from the controller in such a way that it can be displayed as a PDF in the page without having it stored on the file system.
Alternatively, is there a way to call a method to delete the PDF file from the server once the user has navigated away from the page they are viewing? How do I tell if the user has navigated away from the page, and how do I cause that to trigger a method that will delete the file?
I created a MVC class called PdfResult that returns a byte array as a PDF file.
The purpose is as follows (can't upload the source code, sorry):
PdfResult inherits from FileStreamResult
Set the Content-Type header to application/pdf
Set the Content-Disposition to either attachment or inline, and set an appropriate file name
Convert your data to a Stream -- if your data is a byte array, then write it to a MemoryStream.
See https://stackoverflow.com/a/16673120/272072 for a good example of how to do this.
Then, your embed code just needs to point to the action method, as if it was a PDF file.
Here's an example:
public ActionResult ShowPdf() {
    // Note: the view should contain a tag like <embed src='MyController/GetPdf'>
    return View();
}
public ActionResult GetPdf() {
    byte[] pdfBytes = dataRepo.GetPdf(...);
    return new PdfResult(pdfBytes, "Filename.pdf", false);
}
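The PdfResult source is not shown in this answer. A minimal sketch of what such a class might look like follows, assuming ASP.NET MVC 5 (System.Web.Mvc) and assuming the third constructor argument means "serve as attachment"; both are guesses based on the description and the call above, not the author's actual code.

```csharp
using System.IO;
using System.Web.Mvc;

// Hypothetical reconstruction of the PdfResult class described above.
public class PdfResult : FileStreamResult
{
    private readonly string _fileName;
    private readonly bool _asAttachment;

    public PdfResult(byte[] pdfBytes, string fileName, bool asAttachment)
        : base(new MemoryStream(pdfBytes), "application/pdf")
    {
        _fileName = fileName;
        _asAttachment = asAttachment;

        // When FileDownloadName is set, the base class emits
        // "Content-Disposition: attachment" with this file name.
        if (asAttachment)
        {
            FileDownloadName = fileName;
        }
    }

    public override void ExecuteResult(ControllerContext context)
    {
        if (!_asAttachment)
        {
            // Inline display: add the header ourselves, since the base class
            // only knows how to write the attachment form.
            // Assumes the file name contains no quotes or non-ASCII characters.
            context.HttpContext.Response.AddHeader(
                "Content-Disposition", "inline; filename=\"" + _fileName + "\"");
        }
        base.ExecuteResult(context);
    }
}
```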
Here is a link to a CodeProject article and code sample titled Download and Upload Images from SQL Server via ASP.NET MVC. This gives an example of an efficient method to stream content to and from SQL Server via MVC.
You can easily adapt the code to stream your PDF file downloads.
UPDATE
The article uses a DataReader, but it can easily be adapted to Linq2Sql or EF. As an example, here is the Read method where I am reading from the database and copying to the stream:
public override int Read(byte[] buffer, int offset, int count)
{
    result = _attachments.ExecuteStoreQuery<byte[]>(
        "SELECT SUBSTRING(AttachmentBytes, " + position.ToString() +
        ", " + count.ToString() + ") FROM Attachments WHERE Id = {0}",
        id).First();
    var bytesRead = result.Length;
    Buffer.BlockCopy(result, 0, buffer, 0, bytesRead);
    position += bytesRead;
    return (int)bytesRead;
}
You can read the PDF as a bytestream from the database and save it to the http response stream. If you have set the content type correctly to application/pdf, then the browser will load the document in the PDF plugin.
Update (14/Oct/2011): You need to write the bytestream to the Response.OutputStream object. How you create and write the byte stream is dependent on how you have stored in the database and how you are retrieving it. The following code snippet is from an article we have on our website - Generate PDF Forms In ASP.NET Using PDFOne .NET v3.
// Get the page's output stream ready
Response.Clear();
Response.BufferOutput = true;
// Make the browser display the forms document
// using a PDF plug-in. (If no plug in is available,
// the browser will show the File -> Save As dialog box.
Response.ContentType = "application/pdf";
// Write the forms document to the browser
doc.Save(Response.OutputStream);
doc.Close();
doc.Dispose();
The doc object is from our component. You need not use that. This code snippet is only for your understanding. For your requirement, you may have to do something like bytestream.save(Response.OutputStream), I guess. BTW, this code is for ordinary ASP.NET, not MVC.
DISCLAIMER: I work for Gnostice.
If you want to create the PDF 100% dynamically, you would generate it completely in memory then stream it out directly to the requesting web browser without saving it as a file. This is very easy to do with the right tools. I would recommend AspPDF from Persits.com as a way to do this very easily. Take a look at their online documentation to see how simple this is to do without creating a bunch of rendered PDF files all over your server.
If you cannot do something like that, then simply incorporate a process to cleanup your "expired" PDF files from your server's filesystem based on their age. For example, after you have created your local PDF file, you just look through the folder containing your temporary PDF's and delete any you find over a certain age. You cannot reliably tell if or when a user has navigated away from your page or site.
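The age-based cleanup described above could be sketched as follows. The class name, the "*.pdf" pattern and the use of the last write time as the file's age are assumptions made for illustration; the method would be called from wherever the temporary PDFs are created, or from a scheduled job.

```csharp
using System;
using System.IO;

public static class TempPdfCleaner
{
    // Deletes *.pdf files in the folder whose last write time is older than maxAge.
    // Returns the number of files that were deleted.
    public static int DeleteExpired(string folder, TimeSpan maxAge)
    {
        DateTime cutoff = DateTime.UtcNow - maxAge;
        int deleted = 0;

        foreach (string path in Directory.GetFiles(folder, "*.pdf"))
        {
            if (File.GetLastWriteTimeUtc(path) >= cutoff)
            {
                continue; // still fresh, keep it
            }

            try
            {
                File.Delete(path);
                deleted++;
            }
            catch (IOException)
            {
                // The file is probably still being served; leave it for the next pass.
            }
        }

        return deleted;
    }
}
```

For example, TempPdfCleaner.DeleteExpired(tempFolder, TimeSpan.FromHours(1)) would remove PDFs that have not been written to for more than an hour.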
For the first part of your question, like mentioned in the comments, use some type of stream object to pass the PDF data around. Right now, you are streaming the file to the local file system, then streaming it once again to the embedded tag for display. Just do away with the intermediate step of saving to the file system, and do the whole thing in memory (although, that's not really a model of efficiency, and might not scale well).
Regarding the second part of your question, that's not as straightforward. MVC really has no concept of state (viewstate, etc.), so it doesn't have events that can be fired from a state change (say, navigating away from a page).
You could use JavaScript to detect a user navigating away from your page (window.onunload) and have it call a (C#/VB) method that removes the file from the file system. You would probably have to use AJAX to communicate back to the server, using an HTTP POST, and have something listening at that URL endpoint to fire the method that removes the file.
