Parsing Power Point Files
daenku32
10-21-2005, 02:28 PM
We have hundreds of PowerPoint documents, many of which contain slides (pages) that apply only under certain conditions. These documents are saved in HTML format for viewing with IE. I am now looking into parsing the PowerPoint files to remove specific slides based on those conditions.
Right now I'm looking at a couple of options:
1. Insert a text field containing the condition into each slide, and have the web programmers run a regular expression search through each slide, looking for the condition field.
2. Convert to a new standard in which I could define a condition field at a standard location on the slide/page. However, the new standard would need an easy-to-use GUI (for non-programmers) for creating content for the slides/pages.
Not sure which would be better in the long run. Any suggestions?
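For what it's worth, option 1 could look roughly like the sketch below. It is only an illustration in Python (not whatever the IT department actually uses), and the `[[COND: name]]` marker format, the function names, and the "keep the slide only if all its conditions are active" rule are all assumptions made up for the example.

```python
import re

# Hypothetical convention: the slide author types "[[COND: <name>]]" into a text box.
MARKER = re.compile(r"\[\[COND:\s*([^\]]+?)\s*\]\]")
TAG = re.compile(r"<[^>]+>")


def slide_conditions(html: str) -> set[str]:
    """Return the condition names found in one exported slide file."""
    # Drop the tags and collapse whitespace first, because the exporter
    # may wrap the text of a field across several lines.
    text = re.sub(r"\s+", " ", TAG.sub(" ", html))
    return {m.group(1) for m in MARKER.finditer(text)}


def keep_slide(html: str, active: set[str]) -> bool:
    """Keep unmarked slides, and marked slides whose conditions are all active."""
    return slide_conditions(html) <= active
```

A slide whose text contains `[[COND: model-A]]` would then be kept only when `"model-A"` is in the active set. Stripping tags with a regular expression is crude: a marker that the exporter splits with inline formatting tags would be missed, so the marker text should be typed without any formatting changes in the middle.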
darksidepuffin
10-21-2005, 04:28 PM
Option one may be the easiest if you -need- to use PowerPoint. I'm not familiar with PowerPoint's HTML output, but I imagine that's the easiest you'll get, although it's still going to be quite tedious.
What language?
daenku32
10-21-2005, 05:12 PM
I believe they (the regional IT department, which controls all the programming aspects of the site) use one of the VB-related languages (.NET, Script, etc.).
I would be open to a non-PowerPoint standard, but like I said, the publishing must be simple, without any programming involved.
darksidepuffin
10-21-2005, 05:20 PM
You could use a standard input-based script with form fields, much like the ones this forum uses for posts, which may be easier or more difficult depending on the nature of the content of the PowerPoints. These can often be made easier to maintain for people who prefer a GUI by using a client-side WYSIWYG scripting solution. Quite often, though, even the most development-shy person finds a standard form input easy to use.
If you could give us some idea of the nature of the slides' content (e.g., is it mainly text? Mainly imagery? Does it require combinations of both? Does it require certain features that may be difficult to emulate outside of PowerPoint?), it may help us give you some ideas.
daenku32
10-22-2005, 11:53 AM
The actual content is technical assembly instructions: pictures, arrows, text, etc. None of those objects has a strict location on the sheet. Picture sizes and dimensions vary quite a bit, as do the text locations. All of this 'freestyle' content is then surrounded by some predefined areas (name, date, etc.). We use the 'freestyle' layout so we can get the most usage (minimal white space) out of a single page.
But even the predefined areas are filled in using PowerPoint alone. We don't touch the source files again after they are saved by PP.
I suppose parsing PP files or adding an XML element to the slide during its creation would be more of a question for MS; I just hate trying to get help from them.
darksidepuffin
10-22-2005, 12:07 PM
In that case, you may be better off exporting the PP files as you're doing. As far as parsing them goes, I have no familiarity with what exported PowerPoints look like (I have had no use for the program outside of PowerPoint projects in school), so if you could make up, say, a 2-slide presentation and export/post it here so I can see the output HTML, it may help give you some ideas as to the parsing.
daenku32
10-22-2005, 12:49 PM
Even a two-page presentation seemed to give a few kilobytes of code, including separate XML files...
But here are the contents of one file that contains the actual information placed onto that particular slide. The only text I entered on that slide was
"Build ITER complex",
"Operator note: " and
"Task2."
And a picture: ITER_site_2002.jpg
All the rest of it was automatically created by PP when it saved it in HTML format.
So if I wrote a program to look for an identifier, it would be like searching for one of the text fields I entered.
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:p="urn:schemas-microsoft-com:office:powerpoint"
xmlns:oa="urn:schemas-microsoft-com:office:activation"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=PowerPoint.Slide>
<meta name=Generator content="Microsoft PowerPoint 11">
<link id=Main-File rel=Main-File href="../Example.htm">
<link rel=Preview href=preview.wmf>
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
p\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
v\:textbox {display:none;}
</style>
<![endif]-->
<title>BuildingITER</title>
<meta name=Description content="10/22/2005: Build ITER complex">
<link rel=Stylesheet href="master03_stylesheet.css">
<![if !ppt]>
<style media=print>
<!--.sld
{left:0px !important;
width:6.0in !important;
height:4.5in !important;
font-size:107% !important;}
-->
</style>
<script src=script.js></script><script><!--
if( !IsNts() ) Redirect( "PPTSld" );
//--></script><!--[if vml]><script>g_vml = 1;
</script><![endif]--><![endif]><o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="3"/>
</o:shapelayout>
</head>
<body lang=EN-US style='margin:0px;background-color:black'
onclick="DocumentOnClick()" onresize="_RSW()" onload="LoadSld()"
onkeypress="_KPH()">
<div id=SlideObj class=sld style='position:absolute;top:0px;left:0px;
width:534px;height:400px;font-size:16px;background-color:white;clip:rect(0%, 101%, 101%, 0%);
visibility:hidden'><p:slide coordsize="720,540"
colors="#ffffff,#000000,#808080,#000000,#bbe0e3,#333399,#009999,#99cc00"
masterhref="master03.xml">
<p:shaperange href="master03.xml#_x0000_s1025"/><![if !ppt]><p:shaperange
href="master03.xml#_x0000_s1028"/><p:shaperange
href="master03.xml#_x0000_s1029"/><![endif]><v:shapetype id="_x0000_t75"
coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe"
filled="f" stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/>
<v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/>
</v:formulas>
<v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
<o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype><v:shape id="_x0000_s3082" type="#_x0000_t75" style='position:absolute;
left:30pt;top:186pt;width:456pt;height:298.375pt'>
<v:imagedata src="slide0002_image003.jpg" o:title="ITER_site_2002"/>
</v:shape><![if !vml]><img border=0 v:shapes="_x0000_s3082"
src="slide0002_image004.jpg" style='position:absolute;top:34.5%;left:4.11%;
width:63.29%;height:55.25%'><![endif]><v:shapetype id="_x0000_t202"
coordsize="21600,21600" o:spt="202" path="m,l,21600r21600,l21600,xe">
<v:stroke joinstyle="miter"/>
<v:path gradientshapeok="t" o:connecttype="rect"/>
</v:shapetype><v:shape id="_x0000_s3074" type="#_x0000_t202" style='position:absolute;
left:7in;top:162pt;width:192pt;height:28.875pt;mso-wrap-style:square;
v-text-anchor:top' filled="f" fillcolor="#bbe0e3 [4]" stroked="f"
strokecolor="black [1]">
<v:fill color2="white [0]"/>
<v:shadow color="gray [2]"/>
<v:textbox style='mso-fit-shape-to-text:t'/>
</v:shape><v:shape id="_x0000_s3075" type="#_x0000_t202" style='position:absolute;
left:546pt;top:324pt;width:96pt;height:28.875pt;mso-wrap-style:square;
v-text-anchor:top' filled="f" fillcolor="#bbe0e3 [4]" stroked="f"
strokecolor="black [1]">
<v:fill color2="white [0]"/>
<v:shadow color="gray [2]"/>
<v:textbox style='mso-fit-shape-to-text:t'/>
</v:shape><p:shaperange href="master03.xml#_x0000_m1026"/><v:shape id="_x0000_s3079"
type="#_x0000_m1026" style='position:absolute;left:36pt;top:21.625pt;width:9in;
height:90pt'>
<v:fill o:detectmouseclick="f"/>
<v:stroke o:forcedash="f"/>
<o:lock v:ext="edit" text="f"/>
<p:placeholder type="title" position="-1"/></v:shape>
<div v:shape="_x0000_s3074" class=O style='mso-line-spacing:"100 50 0";
position:absolute;top:31.0%;left:70.97%;width:24.9%;height:4.0%'>Operator
note:</div>
<div v:shape="_x0000_s3075" class=O style='mso-line-spacing:"100 50 0";
position:absolute;top:61.0%;left:76.77%;width:11.42%;height:4.0%'>Task2.</div>
<div v:shape="_x0000_s3079" class=T style='position:absolute;top:8.0%;
left:5.99%;width:88.2%;height:9.25%'>Build ITER complex</div>
</p:slide></div>
</body>
</html>
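In a file like the one above, the text I typed ends up in the `<div v:shape="...">` elements near the bottom, so a program looking for an identifier would only need to read those. Below is a rough Python sketch of that idea using the standard library `html.parser` module; the class and function names are made up for the example, and it has not been run against a large set of real PowerPoint exports.

```python
from html.parser import HTMLParser


class SlideText(HTMLParser):
    """Collect the text of every <div v:shape="..."> in an exported slide."""

    def __init__(self):
        super().__init__()
        self.texts = {}      # shape id -> raw text typed into that shape
        self._shape = None   # id of the text div currently being read
        self._depth = 0      # nesting depth of divs inside that text div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._shape is not None:
            self._depth += 1  # a div nested inside a text div
            return
        shape = dict(attrs).get("v:shape")
        if shape:
            self._shape = shape
            self._depth = 1
            self.texts[shape] = ""

    def handle_endtag(self, tag):
        if tag == "div" and self._shape is not None:
            self._depth -= 1
            if self._depth == 0:
                self._shape = None

    def handle_data(self, data):
        if self._shape is not None:
            self.texts[self._shape] += data


def slide_texts(html):
    """Return {shape id: text}, with the exporter's line wraps collapsed."""
    parser = SlideText()
    parser.feed(html)
    parser.close()
    return {shape: " ".join(text.split()) for shape, text in parser.texts.items()}
```

Checking for an identifier is then a lookup such as `"Task2." in slide_texts(html).values()`. The outer `SlideObj` div has no `v:shape` attribute, so it is skipped, and the whitespace collapsing handles fields that the exporter wraps across lines, like "Operator note:" above.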