Accessing Microsoft Access 2003 and 2007 BLOB fields from .Net

I need to be able to extract BLOBs from both Access 2003 and Access 2007. Access 2003 stores BLOBs as "OLE Objects", and Access 2007 gives you another option, "Attachment". The major difference is that multiple attachments can be added to a single row, whereas there can be only one BLOB per "OLE Object" field.
I have to be able to do this without using interop, as I cannot force the dependency of having Office installed. That leaves me with either DAO or ADO, so I have written code that uses both technologies to pull the BLOBs out of a test database with loads of different file types embedded.
The problem I have is that Access seems to wrap the embedded files in some type of metadata. The net result is that the file, once extracted, is no longer the same and cannot be opened by the associated application because it's "corrupted". Access stores things like the original file name in this metadata. I need to be able to strip that metadata off the files to get each file back in its original state.
Is there some dark voodoo magic which can do this? There is very little by way of documentation on this subject. Any help would be appreciated.
Thanks in advance.

This occurs because OLE objects are stored like "images" in Access. This leads to performance issues, as well as problems like yours. To deal with these limitations, Microsoft introduced Attachment fields in Access 2007/2010, which don't need OLE servers to handle the content. Attachments, which can hold more than one file per record, are managed automatically behind the scenes by MS Access. Maybe you should use Attachments, or switch your database to SQL Server, MySQL, or Firebird.

Related

How does SharePoint versioning engine store only changes to files and not the whole file?

One of the many things that SharePoint does extremely well is that when you have versioning enabled for files uploaded to a Document Library, every time you save changes to a file it only saves the difference from the previous version of the file to the Content Database but NOT the whole file again.
I am trying to duplicate that same behavior with standard C# code on either a File System folder in Windows or a SQL Database blob field. Does anyone have any idea or pointers on how SharePoint accomplishes this and how it can be done outside of SharePoint?

SharePoint uses a technique called data "shredding" to contain each change to a given file. Unfortunately, I don't think you will find enough technical details to truly reproduce what they are doing, but you might be able to devise a reasonable approximation using your own design.
When shredded, the data associated with a file such as Document.docx is distributed across a set of BLOBs associated with the file. The independent BLOBs are each assigned a unique ID (offset) to enable reconstruction in the correct order when requested by a user.
Each document "shred" is stored in a SQL database table named DocStreams. Each BLOB contains a numerical ID representative of the source BLOB when coalesced. When a client updates a file, only the shredded BLOB that corresponds to the change is updated, with the update occurring on the database server as opposed to the web server.
For more details on shredding, see:
http://download.microsoft.com/download/9/6/6/9661DAC2-393D-445A-BDC1-E60743B1231E/Shredded%20Storage%20in%20SharePoint%202013.pdf
https://jeremythake.com/the-truth-behind-shredded-storage-in-sharepoint-2013-a84ec047f28e
https://www.c-sharpcorner.com/UploadFile/91b369/shredded-storage-in-sharepoint-2013/
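As a rough illustration of the "own design" idea above, and not a description of how SharePoint actually shreds files, one approximation is to split each version into fixed-size chunks, key each chunk by its hash, and store only the chunks that have not been seen before. Everything in this sketch (the ChunkStore class, its method names, and the in-memory dictionary standing in for a database table) is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical chunk store: a sketch of hash-keyed, fixed-size chunking.
// This is an assumption-laden approximation, not SharePoint's algorithm.
public sealed class ChunkStore
{
    private readonly int _chunkSize;

    // Stands in for a database table keyed by chunk hash.
    private readonly Dictionary<string, byte[]> _chunks = new Dictionary<string, byte[]>();

    public ChunkStore(int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        _chunkSize = chunkSize;
    }

    public int StoredChunkCount
    {
        get { return _chunks.Count; }
    }

    // Splits the content into chunks, stores only chunks not seen before,
    // and returns the ordered list of chunk keys describing this version.
    public List<string> SaveVersion(byte[] content)
    {
        var manifest = new List<string>();
        using (var sha = SHA256.Create())
        {
            for (int offset = 0; offset < content.Length; offset += _chunkSize)
            {
                int length = Math.Min(_chunkSize, content.Length - offset);
                string key = Convert.ToBase64String(sha.ComputeHash(content, offset, length));
                if (!_chunks.ContainsKey(key))
                {
                    var chunk = new byte[length];
                    Buffer.BlockCopy(content, offset, chunk, 0, length);
                    _chunks[key] = chunk;
                }
                manifest.Add(key);
            }
        }
        return manifest;
    }

    // Rebuilds a version by concatenating its chunks in manifest order.
    public byte[] LoadVersion(IEnumerable<string> manifest)
    {
        using (var output = new MemoryStream())
        {
            foreach (string key in manifest)
            {
                byte[] chunk = _chunks[key];
                output.Write(chunk, 0, chunk.Length);
            }
            return output.ToArray();
        }
    }
}
```

Each saved version is then just its ordered list of chunk keys, so an in-place edit adds only the chunks that changed. Note that fixed-size chunking only helps with in-place edits: inserting a byte near the start of a file shifts every later chunk, so a real design would probably need smarter chunk boundaries.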

How to persist data in Excel (VSTO) shared among few users?

I have to load a huge amount of data, pre-process it, share it among a few users, and finally gather updates back from the users.
This is what I did in my previous project:
I created an Excel add-in using C++. The add-in code loaded the data into memory and processed it. For each type of data, I sent the processed data to a sheet and saved a new Excel file, so with three types of data I ended up with three new Excel workbooks. My users then opened those new workbooks, made their changes, and dropped a text file containing their changes (through a button). The main Excel instance kept polling for those updates (text files) and loaded them as soon as they were found. That's how I got the updates back from my users.
I am not a fan of what I did in my previous project, because it produces too many temporary files (of course, I can delete those). In my current project I want to use a C# VSTO workbook so I can have more control over Excel. I was hoping that once I load the data, I could ask my users to open the same Excel file in read-only mode and make their changes. While testing this, I realized that the user's Excel instance (opened in read-only mode) does not see the loaded data, and their changes do not update the data held in memory. This probably means I have no idea what I am doing.
Do you have any idea how to achieve this? I would really appreciate any help or hint.

Excel supports a so-called "co-authoring" mode, in which many people can edit the same document at the same time. But there might be a catch: afaik, you need SharePoint, Office Online Server, or OneDrive for Business to support this scenario (that is, a non-free Office document server product).
Using VSTO, you can do just what you did with the C++ add-in, but in C# (meaning the set of capabilities is 1:1; it basically just wraps the C++ COM Excel API for .NET).
For the online version of Excel, there may be yet another alternative: JavaScript add-ins (now called "Office Add-ins", afaik). But I doubt you'd want to process your "huge amounts of data" with JavaScript.
So I would say there is a good rule: don't fix something that isn't broken :)
If the problem is the number of temporary files, files are not the only way to transfer data between applications. You can connect two applications directly, so that they exchange data through messages/updates. Use the network, Luke :)
Of course, if your 3 users live on 3 deserted islands, totally disconnected from anything, exchanging text files on a USB stick may still be the only viable option...
I think the "web" solution could be: store your file in some "co-authoring"-capable service (SharePoint, Google Sheets, OneDrive, Office Online, whatever), and set up a web job that updates the file in that storage automatically, just like a "fourth" user would.

C#: the best way to edit strings in MS Word from a database?

I will be building a desktop application that interacts with a database. Logically, I will need to build an API to contact the database remotely and retrieve data from it.
I was given a Word file, and I will be updating values in it, where the black parts are the values I'm getting from the database. I will sometimes have to print the file.
However, I'm not sure what the best way to do this is. Do I need to modify the Word file and then return it to its default values each time? Should I use reports instead, or something else?

I think there are no 'best of the best' practices.
You may use DocX from NuGet.
You may also get direct document access using the Microsoft.Office.Interop.Word namespace.
(afaik)

What is the standard way for dealing with PowerPoint (.PPTX) files on the server?

I've been tasked with a feature that can generate PowerPoint files on the server using C#. I'd basically start with a template, and programmatically replace some text with live data from the database. I've been doing some research into this area for the past day and here's what I've found:
PowerPoint has this sort of thing built in, meaning it can connect to external data sources and pull in data. Most examples of this, I've found, either use PowerPoint automation done on the server (I've been advised against this) or assume a SQL Server backend. Our company uses Oracle for our RDBMS needs. Oracle has a solution for this called Oracle BI, but it requires a whole new web server setup to run various Java EE components and whatnot. I didn't look at the price, but knowing Oracle it's not cheap. It also requires new software to be installed on the end user's machine, which we really want to avoid.
Generating PowerPoint files on the fly is possible. The company that is basically the go-to guys for this problem (every help forum points to them, and they get all the rave reviews) is Aspose. They have .NET components for dealing with just about any Office format you can think of. The problem is, they are astronomically expensive. Just the PowerPoint component (a site license for up to 10 developers) would cost $3,995.
The third possibility is building a solution in-house. After all, a PPTX file is just XML, right? Well, looking closer, a PPTX appears to be a ZIP archive. It contains many folders, each containing many XML files. Modifying a PPTX file would, correct me if I'm wrong, entail unzipping the file to a temporary directory, reading the XML file and modifying the contents, then packaging everything up again and writing the file out to the response stream. Perhaps there are libraries that can work with ZIP streams on the fly without extracting everything.
My Question: Are there easier ways to work with a PPTX file using .NET that don't require working with compressed XML files or buying very expensive software? Basically, we need to modify a PowerPoint file, change some text, and allow the user to download that generated file from a web server.

The Open XML SDK is Microsoft's .NET library that lets you manipulate Office documents. It lets you open a PPTX file and provides an object model that wraps the XML contents.
Here's the link to the Open XML SDK and the MSDN documentation.
I've used the Open XML SDK to let an ASP.NET page dynamically generate Word documents from a database.

Don't use Office Interop on a web server. It's an all-around bad idea.
If you are only replacing text placeholders in files that will not otherwise change, the home-grown solution that finds the placeholders in the XML files inside the ZIP archive should be doable. .NET has had ZIP support for some time, and it is greatly improved if you are able to use .NET 4.5, so you shouldn't need to extract the archive to a temporary location at all.
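A minimal sketch of that in-memory approach, using the ZipArchive class from System.IO.Compression (added in .NET 4.5). The PptxTemplate class, the ReplaceText method, and the placeholder convention are hypothetical, and the sketch assumes each placeholder sits in a single text run of a slide part under ppt/slides/:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

// Hypothetical helper: replaces a placeholder in the slide XML of a PPTX
// package without extracting anything to disk.
public static class PptxTemplate
{
    public static byte[] ReplaceText(byte[] templateBytes, string placeholder, string value)
    {
        // XML-escape the value so characters like & and < do not break the slide XML.
        string escaped = SecurityElement.Escape(value);

        using (var buffer = new MemoryStream())
        {
            // Copy into an expandable stream; Update mode needs read, write and seek.
            buffer.Write(templateBytes, 0, templateBytes.Length);

            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Update, true))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    // Assumption: only slide parts contain placeholders.
                    if (!entry.FullName.StartsWith("ppt/slides/", StringComparison.OrdinalIgnoreCase) ||
                        !entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                    {
                        continue;
                    }

                    using (Stream stream = entry.Open())
                    {
                        string xml;
                        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 4096, true))
                        {
                            xml = reader.ReadToEnd();
                        }

                        if (!xml.Contains(placeholder)) continue;

                        // Truncate the entry and write the modified XML back.
                        stream.SetLength(0);
                        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 4096, true))
                        {
                            writer.Write(xml.Replace(placeholder, escaped));
                        }
                    }
                }
            } // Disposing the archive flushes the rewritten package into the buffer.

            return buffer.ToArray();
        }
    }
}
```

The returned byte array can be written straight to the response stream. One caveat: PowerPoint may split text you typed as one word across several runs in the XML, in which case a plain string search will not find the placeholder, so it is worth checking the template's slide XML first.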
PowerPoint should also support connecting directly to Oracle in the same way it supports connecting to SQL Server (just play around with the connection options), without needing the special Oracle BI stuff. However, I'd still prefer the home-grown solution, as the direct connection will only work while the PowerPoint file is able to reach your database directly, which is typically only possible in your local LAN environment or with an active VPN.
If you want anything fancier than simple text replacement, perhaps look for an Aspose competitor.

Ways of searching an excel file in asp.net

Suppose I have an Excel file with more than 200,000 rows in it. What is the fastest way to search for a particular column value in this file using C# and ASP.NET? Any suggestions?

Assuming 1) you can cache the file contents fine (not too large, file doesn't change, etc.) and 2) you don't already have a mechanism for reading the file, I would just read the file into memory once (at application start, or lazy-loaded on demand, or whatever). I have used and really like the FileHelpers libs from http://www.filehelpers.com/ (see their Excel example at http://www.filehelpers.com/example_exceldatalink.html).
As part of the 'read in the file' step, you'd likely also create some indexes for the later queries. If you only care about the one column, you could just push it all into a HashSet, for instance, so you can do a quick Contains later.
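A small sketch of that indexing step, assuming the column values have already been read into memory by whatever reader you use. The ColumnIndex class and its methods are hypothetical, and the case-insensitive comparison is an assumption you may not want:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for indexing one column after the file has been read.
public static class ColumnIndex
{
    // Membership-only index: answers "does this value occur in the column?"
    public static HashSet<string> BuildSet(IEnumerable<string> columnValues)
    {
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string value in columnValues)
        {
            if (value != null) set.Add(value); // skip empty cells
        }
        return set;
    }

    // Lookup index: maps each value to the zero-based rows where it occurs.
    public static Dictionary<string, List<int>> BuildRowLookup(IEnumerable<string> columnValues)
    {
        var lookup = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);
        int row = 0;
        foreach (string value in columnValues)
        {
            if (value != null)
            {
                List<int> rows;
                if (!lookup.TryGetValue(value, out rows))
                {
                    rows = new List<int>();
                    lookup[value] = rows;
                }
                rows.Add(row);
            }
            row++; // empty cells still count as rows
        }
        return lookup;
    }
}
```

Build the index once when the file is loaded and keep it in a static field or the application cache. After that, each search is a hash lookup instead of a scan over 200,000 rows.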

You cannot access an Excel file from ASP.NET at all if you are using the Excel Automation APIs. These were written for use in a desktop application, not in a server application like ASP.NET. They will not work, are not supported, and may very well violate your license agreement with Microsoft.
There are third-party libraries that can access an Excel file safely from ASP.NET. These do not use the Automation APIs.

You may want to consider using an "OLE DB for Jet 4.0" connection, which you can query via ADO.NET. OLE DB access to Excel is provided via the MDAC component, which comes standard on versions of Windows after 2000. ConnectionStrings.com has OLE DB connection strings for connecting to Excel, as well as information on using Jet in a 64-bit environment.

Use EPPlus and read the file into a DataTable.
It could take some time, as that file is a little bit big...
