using sed to parse html?

    I have got the biggest headache!

    I hope I am posting this to the right forum... I am trying to parse an HTML file using sed in a bash script. My problem is that I need to strip everything up to a particular tag, but the "everything" in this case spans several lines and includes tabs, spaces and newlines.

    The HTML that I am working on is:
        <div id="filters">
            <input type="hidden" name="filter-search-previous" value="">
            <input type="text" class="form-control" name="filter-search" placeholder="Search..." value="">
            <select class="form-control" name="filter-sort">
                <option >Newest products first</option>
                <option >Sort by expiry ascending</option>
                <option >Sort by expiry descending</option>
                <option >Sort by price (lowest first)</option>
                <option >Sort by price (highest first)</option>
            <button type="button" class="btn btn-warning" style="top: -1px; position: relative; margin-left: 10px;" onclick="refresh_products(0)">Refresh / Search</button>
        <div id="box-container-inner" style="position: relative">
            <div class="box" id="product_1208750">
            <div class="img-container">
    Now, what I want to do is strip out everything up to and including the tag <div id="box-container-inner" style="position: relative">. I have tried the following commands:

    sed -i 's/.*<div id="box-container-inner" style="position: relative">//' output.txt
    sed -i 's/[\s\S]*<div id="box-container-inner" style="position: relative">//' output.txt
    The first one just deletes that one line, and the second one doesn't do anything. Can someone help me with this? Either that or send me a bottle of aspirin! Thanks
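
For reference, a minimal sketch of one possible approach, not a definitive fix. sed reads its input one line at a time, so `.*` can never reach back across earlier lines. `[\s\S]` is a Perl-style idiom that sed's bracket expressions do not treat as "any character". An address range avoids both problems by deleting whole lines, from line 1 through the first line containing the tag. The temporary file and its abridged contents below are hypothetical stand-ins for output.txt, and the sketch assumes the tag sits on a line of its own and is not on line 1.

```shell
# Hypothetical sample file standing in for output.txt (contents abridged from the post).
tmp=$(mktemp)
printf '%s\n' \
  '<div id="filters">' \
  '    <select class="form-control" name="filter-sort">' \
  '        <option >Newest products first</option>' \
  '<div id="box-container-inner" style="position: relative">' \
  '    <div class="box" id="product_1208750">' \
  '    <div class="img-container">' > "$tmp"

# Address range: delete from line 1 through the first line that matches the tag.
# Whole lines are removed, so anything after the tag on its own line goes too.
stripped=$(sed '1,/<div id="box-container-inner" style="position: relative">/d' "$tmp")

# Show what is left: only the lines after the tag.
printf '%s\n' "$stripped"
rm -f "$tmp"
```

To edit the file in place as in the original commands, the same expression can be used with `-i` (GNU sed syntax). If the tag could appear on the very first line, GNU sed's `0,/regex/` form of the range handles that case, whereas `1,/regex/` would skip past it.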