Extracting CDATA from RSS using BASH

The following general command structure extracts data from the RSS feed published by the NDBC (National Data Buoy Center).

wget https://www.ndbc.noaa.gov/data/latest_obs/<station_ID>.rss
html2text <station_ID>.rss > <station_ID>.txt

These two commands give a text file with the following structure:

<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="/rss/
ndbcrss.xsl"?>
CDATA[This feed shows recent marine weather observations from Station EBEF1.]]>
https://www.ndbc.noaa.gov/ Sat, 02 Jul 2022 09:30:39 +0000 Sat, 02 Jul 2022 09:
30:39 +0000 30 en-us webmaster.ndbc@noaa.gov (NDBC Webmaster)
webmaster.ndbc@noaa.gov (NDBC Webmaster)  https://www.ndbc.noaa.gov/images/
noaa_nws_xml_logo.gif
https://www.ndbc.noaa.gov/    Sat, 02 Jul 2022 09:30:39 +0000
CDATA[ July 2, 2022 4:54 am EDT
Location: 27.923N 82.421W
Atmospheric Pressure: 30.01 in (1016.4 mb)
Air Temperature: 77.4°F (25.2°C)
Water Temperature: 90.0°F (32.2°C)
]]>
https://www.ndbc.noaa.gov/station_page.php?station=ebef1 NDBC-EBEF1-
20220702085400 27.923 -82.421

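The task is now to extract the information contained in the second CDATA portion and discard everything else. The second block opens with "CDATA[ " (note the space after the bracket), which distinguishes it from the first one.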

html2text ebef1.rss | grep -A 100 "CDATA\[ " | grep -B 100 "]]>"


CDATA[ July 2, 2022 4:54 am EDT
Location: 27.923N 82.421W
Atmospheric Pressure: 30.01 in (1016.4 mb)
Air Temperature: 77.4°F (25.2°C)
Water Temperature: 90.0°F (32.2°C)
]]>
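
The grep -A/-B pair above depends on the block and its surroundings staying under 100 lines. As a hedged alternative, here is a minimal sketch that uses a sed address range instead. The function name extract_block is my own invention, and it assumes the observation block still opens with "CDATA[ " (bracket followed by a space) and closes with "]]>".

```shell
# Hypothetical helper (not part of the original commands):
# read html2text output on stdin and print only the lines from the
# first line containing "CDATA[ " through the next line containing "]]>".
extract_block() {
    # -n suppresses default printing; the /start/,/end/p range prints the block.
    sed -n '/CDATA\[ /,/]]>/p'
}
```

It would be used as: html2text ebef1.rss | extract_block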

Now I need to get rid of the delimiters.

html2text ebef1.rss | grep -A 100 "CDATA\[ " | grep -B 100 "]]>" | sed "s/CDATA\[ //g" | sed "s/]]>//g"

July 2, 2022 4:54 am EDT
Location: 27.923N 82.421W
Atmospheric Pressure: 30.01 in (1016.4 mb)
Air Temperature: 77.4°F (25.2°C)
Water Temperature: 90.0°F (32.2°C)
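
Once the delimiters are gone, each remaining line has the form "Name: value", so individual readings can be pulled out by name. The sketch below is only an illustration: strip_delims and get_field are names I made up. It assumes every field is separated from its value by a colon and a space, as in the output above. Unlike the original two sed calls, strip_delims also deletes the empty line left behind by "]]>".

```shell
# Hypothetical helper: remove the "CDATA[ " and "]]>" markers from stdin
# and delete any line that ends up empty.
strip_delims() {
    sed -e 's/CDATA\[ //' -e 's/]]>//' -e '/^$/d'
}

# Hypothetical helper: print the value of one "Name: value" line.
# $1 is the exact field name, e.g. "Air Temperature".
get_field() {
    awk -F': ' -v name="$1" '$1 == name { print $2 }'
}
```

For example, html2text ebef1.rss | grep -A 100 "CDATA\[ " | grep -B 100 "]]>" | strip_delims | get_field "Water Temperature" would print only the water temperature reading.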