
Topic: Help with awk and sed


Renfield

    Topic Starter


    Greenhorn

    • Experience: Beginner
    • OS: Mac OS
    Help with awk and sed
    « on: January 20, 2012, 12:51:29 PM »
    Beginner here.  I'm trying to print certain lines from an XML file with awk and/or sed, and I need help.

    I have an xml file like this:

    Code: [Select]
        <item id="26141427">
          <properties>
            <name>233D_camB_take02.mov</name>
            <path>/Dailies Released/VT096_DAY41_2011_10_27</path>
            <description>HI  CU AARON PREPPING CAMERA</description>
            <status></status>
            <approved />
            <created_by id="20184437">
              <name>Movie</name>
            </created_by>
            <created_timestamp>2011-10-28T21:04:51Z</created_timestamp>
            <modified_by id="17929743">
              <name>Some dude</name>
            </modified_by>
            <modified_timestamp>2011-10-31T14:59:54Z</modified_timestamp>
            <width>1280</width>
            <height>720</height>
            <timebase>24</timebase>
            <mime_type>video/quicktime</mime_type>
          </properties>
          <attributes>
            <attribute key="Camera">B</attribute>
            <attribute key="Description">HI  CU AARON PREPPING CAMERA</attribute>
            <attribute key="End">16:40:32:00</attribute>
            <attribute key="Name">233D-2B</attribute>
            <attribute key="Notes"></attribute>
            <attribute key="Scene">233D</attribute>
            <attribute key="Shoot_Date">10/27/2011</attribute>
            <attribute key="Shoot_Day">41</attribute>
            <attribute key="Start">16:37:52:00</attribute>
            <attribute key="Take">2</attribute>
            <attribute key="Tape">VT096</attribute>
          </attributes>
          <tags />
          <notes />
        </item>

    What I need is to print the lines:

    Code: [Select]
        <item id="26141427">
            <name>233D_camB_take02.mov</name>
            <attribute key="Name">233D-2B</attribute>
    In the end I need this in a document:

    Code: [Select]
    item id="26141427"
    233D_camB_take02.mov
    233D-2B
    Followed by a blank line and then the next item.  There are multiple items in the document.

    Some things to note: there may be multiple <name> </name> tags, but I only need the one containing the string ".mov".  That string will always be present in every item, but only once per item.

    However, as can be seen in the example above, there may or may not be other lines like <name>Movie</name> and <name>Some dude</name>.  These need to be ignored.  So while the other entries I'm looking for can be found by searching for their tags, it is probably better to find that entry by looking for the ".mov" string.

    Also, there may or may not be an <attribute key="Name">some value</attribute> entry.  If it is there, I need it.  All other <attribute key="something"> tags need to be ignored.

    Lastly, because each item may or may not have certain entries, this cannot be done by a line number algorithm but needs to be done by search for patterns.

    So, in summary:

    <item id="123456"> - Will always be present and will only be present once per item.  I need the output: item id="123456"
    <name>something.mov</name> - Will always be present, but only once with the ".mov" string.  May or may not be present with other strings.  I need the output: something.mov.  Other instances should be ignored.
    <attribute key="Name">something</attribute> - May or may not be present.  If it is there, I need the output: something

    What I have so far is this:

    Code: [Select]
    sed -n '/<item id="/,/>/p' marcherdailiescopy.xml |
    awk '{sub("<properties>",""); print}' |
    awk '{sub("<",""); print}' |
    awk '{sub(">",""); print}'

    My first problem is that the sed command returns the item id, but it also returns the tags and the following <properties> tag, followed by the next item, like this:

    Code: [Select]
        <item id="27385774">
          <properties>

    So I'm using awk to strip out the extra strings and characters there, but I know there is a more efficient way to do this.  I also don't know how to get awk or sed to grab the strings I need in order so it places them together.  I can get:

    item id
    item id
    item id
    ...
    ...

    value.mov
    value.mov
    value.mov
    ...
    ...
    ...

    But I need:

    item id
    value.mov
    name (if it is there)

    item id
    value.mov
    name (if it is there)

    ...
    ...

    I also don't know whether it would be more efficient to delete everything other than what I need or grab only what I need.  Any help would be Kool and the Gang!  :D

    Thanks,

    Dan
    « Last Edit: January 20, 2012, 01:09:16 PM by Renfield »

    Renfield
      Re: Help with awk and sed
      « Reply #1 on: January 20, 2012, 01:27:33 PM »
      I should also add that each item will follow the format:
      Code: [Select]
      <item id="123456">
      a bunch of tags and values
      </item>
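Because every item is wrapped in <item ...> ... </item>, one possible approach is a single awk pass that prints the three wanted values as it meets them and emits a blank line at each </item>. The following is only a minimal sketch, assuming each tag sits on its own line as in the sample above. The file name items.xml and the second item's file name (234A_camA_take01.mov) are made up for illustration.

```shell
# Hypothetical sample file: a trimmed copy of the item from the first post,
# plus a second, invented item that has no "Name" attribute.
cat > items.xml <<'EOF'
<item id="26141427">
  <properties>
    <name>233D_camB_take02.mov</name>
    <created_by id="20184437">
      <name>Movie</name>
    </created_by>
  </properties>
  <attributes>
    <attribute key="Camera">B</attribute>
    <attribute key="Name">233D-2B</attribute>
  </attributes>
</item>
<item id="27385774">
  <properties>
    <name>234A_camA_take01.mov</name>
  </properties>
  <attributes>
    <attribute key="Camera">A</attribute>
  </attributes>
</item>
EOF

result=$(awk '
  /<item id="/ {                      # start of an item
    line = $0
    sub(/^[ \t]*</, "", line)         # drop indentation and the opening <
    sub(/>[ \t]*$/, "", line)         # drop the closing >
    print line
  }
  /<name>.*\.mov<\/name>/ {           # only the <name> that holds a .mov file
    line = $0
    sub(/^.*<name>/, "", line)
    sub(/<\/name>.*$/, "", line)
    print line
  }
  /<attribute key="Name">/ {          # optional Name attribute
    line = $0
    sub(/^.*<attribute key="Name">/, "", line)
    sub(/<\/attribute>.*$/, "", line)
    print line
  }
  /<\/item>/ { print "" }             # blank line after each item
' items.xml)

printf '%s\n' "$result"
```

Each rule works on a copy of the line, so the rules do not interfere with one another. An empty <attribute key="Name"></attribute> would print an empty line, which may or may not be what is wanted.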

      vpalukuru



        Newbie

        • informatica, Oracle, Unix
      • Experience: Beginner
      • OS: Unknown
      Re: Help with awk and sed
      « Reply #2 on: January 23, 2012, 02:08:22 AM »
      Try something like this:

      Code: [Select]
      grep -E '\.mov|<item id=|attribute key="Name"' xmlfile
      You will get the required lines.  Note the -E: without it, grep treats | as a literal character rather than "or".

      Then use sed to extract whatever data you need.
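The sed step could look roughly like the following. This is a sketch rather than a tested solution for the real file: it assumes one tag per line, and it also keeps the </item> lines so they can be turned into the blank separator lines. The file name sample.xml and the second item's file name are invented for the example.

```shell
# Hypothetical sample file: a trimmed copy of the item from the first post,
# plus a second, invented item that has no "Name" attribute.
cat > sample.xml <<'EOF'
<item id="26141427">
  <properties>
    <name>233D_camB_take02.mov</name>
    <created_by id="20184437">
      <name>Movie</name>
    </created_by>
  </properties>
  <attributes>
    <attribute key="Camera">B</attribute>
    <attribute key="Name">233D-2B</attribute>
  </attributes>
</item>
<item id="27385774">
  <properties>
    <name>234A_camA_take01.mov</name>
  </properties>
  <attributes>
    <attribute key="Camera">A</attribute>
  </attributes>
</item>
EOF

# 1) grep keeps only the wanted lines plus the closing </item> tags.
# 2) sed strips the indentation and the markup around each value,
#    and empties the </item> lines so they become blank separators.
result=$(grep -E '<item id=|\.mov</name>|<attribute key="Name">|</item>' sample.xml |
  sed -E \
    -e 's/^[[:space:]]*//' \
    -e 's/^<(item id="[^"]*")>$/\1/' \
    -e 's/^<name>(.*)<\/name>$/\1/' \
    -e 's/^<attribute key="Name">(.*)<\/attribute>$/\1/' \
    -e 's/^<\/item>$//')

printf '%s\n' "$result"
```

Matching on '\.mov</name>' instead of a bare '.mov' keeps grep from picking up other lines that merely contain those characters.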

      Renfield
        Re: Help with awk and sed
        « Reply #3 on: January 23, 2012, 11:38:28 AM »
        That worked!  Thanks so much.