
PowerShell Tip: Extract Images from a DOCX File

Summary

It’s an immense pleasure to explore $objects! We do IT automation tasks in C# or PowerShell, depending on the requirement and business needs. Recently our team was engaged to extract images from DOCX files. This is possible by simply renaming a file to .zip and extracting it to get the images folder, but that was not an optimal solution in our client environment. So we used the Open XML SDK 2.5 (supported for Office 2016) to accomplish the task.
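The rename trick works because a DOCX file is a ZIP-based Open Packaging Conventions package, and Word normally stores images embedded in the document body under word/media/. As a minimal sketch (the function name Get-DocxMediaEntry is hypothetical, and it assumes .NET Framework 4.5 or later for System.IO.Compression), those images can be listed straight from the package without renaming or extracting anything:

```powershell
# Hypothetical helper (not part of the original solution): lists the files stored
# under word/media/ inside a DOCX package, without renaming or extracting it.
# Assumes .NET Framework 4.5+ (or PowerShell 7) for System.IO.Compression.
Add-Type -AssemblyName System.IO.Compression.FileSystem

function Get-DocxMediaEntry {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # ZipFile.OpenRead resolves relative paths against the process directory,
    # so convert the PowerShell path to an absolute one first.
    $resolved = Convert-Path -LiteralPath $Path
    $zip = [System.IO.Compression.ZipFile]::OpenRead($resolved)
    try {
        foreach ($entry in $zip.Entries) {
            # Word normally stores embedded images in the word/media/ folder.
            if ($entry.FullName -like 'word/media/*') {
                [pscustomobject]@{ Name = $entry.Name; Length = $entry.Length }
            }
        }
    }
    finally {
        $zip.Dispose()   # release the file handle
    }
}
```

For example, Get-DocxMediaEntry -Path 'C:\Temp\Demo.docx' lists each image name with its size in bytes. Note that word/media/ is Word's usual convention rather than a rule of the format; the Open XML SDK locates image parts through the package relationships instead of folder names.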

Requirement

Extract all the images from a given DOCX file. Example: Get-Image -FilePath "C:\Temp\Document.docx"

Solution

  • Open XML SDK 2.5
  • Windows PowerShell
  • Visual Studio 2015 (C# Class Library – Binary Module)
  • PowerShell Code

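    # Load the Open XML SDK 2.5 assembly. The path is the SDK installer's default
    # location (an assumption); adjust it to wherever DocumentFormat.OpenXml.dll lives.
    Add-Type -Path 'C:\Program Files (x86)\Open XML SDK\V2.5\lib\DocumentFormat.OpenXml.dll'
    Add-Type -AssemblyName WindowsBase   # System.IO.Packaging, used by the SDK

    Function Get-Image
    {
        Param
        (
            [Parameter(Mandatory,
                       ValueFromPipeline=$true,
                       ValueFromPipelineByPropertyName=$true,
                       Position=0,
                       HelpMessage="Please enter the full path of the DOCX file")]
            [Alias('FilePath')]
            [string]
            $FullName,

            # Folder the images are written to; created if it does not exist.
            [string]
            $Destination = 'C:\Temp\Images'
        )
        Begin
        {
            # Create the output folder and resolve it to an absolute path for .NET calls.
            $Destination = (New-Item -ItemType Directory -Path $Destination -Force).FullName
        }
        Process
        {
            # .NET resolves relative paths against the process directory, so resolve first.
            $Path = Convert-Path -LiteralPath $FullName
            # Open read-only; Dispose() in the finally block releases the file lock.
            $Document = [DocumentFormat.OpenXml.Packaging.WordprocessingDocument]::Open($Path, $false)
            try
            {
                foreach ($Image in $Document.MainDocumentPart.ImageParts)
                {
                    # Part URIs look like /word/media/image1.png; keep only the file name.
                    $ImageFileName = [System.IO.Path]::GetFileName($Image.Uri.OriginalString)
                    $Target = Join-Path $Destination $ImageFileName
                    # Copy the stored bytes unchanged rather than re-encoding through
                    # System.Drawing.Bitmap, so every format (PNG, JPEG, EMF...) is kept as-is.
                    $Source = $Image.GetStream([System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
                    $Output = [System.IO.File]::Create($Target)
                    try     { $Source.CopyTo($Output) }
                    finally { $Output.Dispose(); $Source.Dispose() }
                    Get-Item -LiteralPath $Target
                }
            }
            finally
            {
                $Document.Dispose()
            }
        }
    }

    Get-Item 'C:\Temp\Demo.docx' | Get-Image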

Cool, here is the C# binary cmdlet that does the same:

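    using System.IO;
    using System.Management.Automation;
    using DocumentFormat.OpenXml.Packaging;

    [Cmdlet(VerbsCommon.Get, "Image")]
    [OutputType(typeof(FileInfo))]
    public class GetImage : PSCmdlet
    {
        // Alias FullName lets Get-Item / Get-ChildItem output bind by property name.
        [Parameter(Mandatory = true,
                   Position = 0,
                   ValueFromPipeline = true,
                   ValueFromPipelineByPropertyName = true)]
        [Alias("FullName")]
        public string FilePath { get; set; }

        // Output folder; created if it does not exist.
        [Parameter]
        public string Destination { get; set; } = @"C:\Temp\Images";

        protected override void ProcessRecord()
        {
            // Resolve PowerShell-relative paths against the current location.
            string path = GetUnresolvedProviderPathFromPSPath(FilePath);
            string folder = GetUnresolvedProviderPathFromPSPath(Destination);
            Directory.CreateDirectory(folder);

            // Open read-only; 'using' closes the package and releases the file.
            using (WordprocessingDocument document = WordprocessingDocument.Open(path, false))
            {
                foreach (ImagePart image in document.MainDocumentPart.ImageParts)
                {
                    // "/word/media/image1.png" -> "image1.png"
                    string fileName = Path.GetFileName(image.Uri.OriginalString);
                    string target = Path.Combine(folder, fileName);

                    // Copy the stored bytes unchanged instead of re-encoding via Bitmap.
                    using (Stream source = image.GetStream(FileMode.Open, FileAccess.Read))
                    using (FileStream output = File.Create(target))
                    {
                        source.CopyTo(output);
                    }
                    WriteObject(new FileInfo(target));
                }
            }
        }
    }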

Conclusion

We prefer to pick the tool that costs less to deliver the solution, so at our client's site we opted for PowerShell for a few automation tasks involving data manipulation. To loop through multiple files, we simply used the snippet below:

    Get-ChildItem 'C:\Temp\ProjectFolder\' -Filter *.docx -File | Get-Image

Enjoy PowerShell!

Note: You can change the image file names as required. In our case they had to stay the same as in the document because they are customized.
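If the names do not need to match, one option worth considering is prefixing each image with its source document's name: when a whole folder is processed into one output folder, image1.png from one document would otherwise overwrite image1.png from another. A minimal sketch, where the helper name Get-ImageTargetName and the DocumentName_ImageName pattern are assumptions for illustration only:

```powershell
# Hypothetical naming helper (an assumption, not the scheme used in our project):
# prefixes the image's package file name with the source document's base name.
function Get-ImageTargetName {
    param(
        [Parameter(Mandatory)] [string] $DocumentPath,   # path of the .docx file
        [Parameter(Mandatory)] [string] $PartUri         # image part URI inside the package
    )
    $documentName = [System.IO.Path]::GetFileNameWithoutExtension($DocumentPath)
    $imageName    = [System.IO.Path]::GetFileName($PartUri)
    '{0}_{1}' -f $documentName, $imageName
}

# On Windows this returns Demo_image1.png
Get-ImageTargetName -DocumentPath 'C:\Temp\Demo.docx' -PartUri '/word/media/image1.png'
```

In Get-Image, the returned name would be used in place of the plain image file name when building the output path.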