This is useful if one wants to keep the start and end lines/keywords while extracting the lines between two strings.
Below is the code snippet that I used to extract SQL statements from a shell script:
def process_lines(in_filename, out_filename, start_kw, end_kw):
    """Copy each block that starts with one of start_kw and ends with end_kw."""
    try:
        inp = open(in_filename, 'r', encoding='utf-8', errors='ignore')
        out = open(out_filename, 'w+', encoding='utf-8', errors='ignore')
    except FileNotFoundError as err:
        print(f"File {in_filename} not found", err)
        raise
    except OSError as err:
        print(f"OS error occurred trying to open {in_filename}", err)
        raise
    except Exception as err:
        print(f"Unexpected error opening {in_filename} is", repr(err))
        raise
    else:
        with inp, out:
            copy = False
            for line in inp:
                # the start check is case-insensitive, the end check is not
                starts = line.lstrip().lower().startswith(start_kw)
                ends = line.rstrip().endswith(end_kw)
                if starts and ends:
                    # the statement starts and ends on the same line
                    out.write(line)
                    copy = False
                elif starts:
                    # keep the line that holds the start keyword
                    out.write(line)
                    copy = True
                elif ends:
                    # keep the line that holds the end keyword
                    if copy:
                        out.write(line)
                    copy = False
                elif copy:
                    # a line inside a statement
                    out.write(line)


if __name__ == '__main__':
    infile = "/Users/testuser/Downloads/testdir/BTEQ_TEST.sh"
    outfile = f"{infile}.sql"
    statement_start_list = ['database', 'create', 'insert', 'delete', 'update', 'merge']
    statement_end = ";"
    # str.startswith accepts a tuple of prefixes, hence the conversion
    process_lines(infile, outfile, tuple(statement_start_list), statement_end)
A common approach to this is a state machine that reads the text until the start marker is encountered, then enters a “recording mode” and extracts the text until the end marker is encountered. This process can repeat if multiple sections appear in the file and have to be extracted.
inRecordingMode = False
for line in file:
    if not inRecordingMode:
        if line.startswith(''):  # the start marker goes in the quotes
            inRecordingMode = True
    elif line.startswith(''):  # the end marker goes in the quotes
        inRecordingMode = False
    else:
        yield line
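As a minimal runnable sketch of the state machine described above: the function name extract_sections and the BEGIN/END markers are made up for illustration and are not from the original answer. The marker lines themselves are left out of the output.

```python
def extract_sections(lines, start_marker, end_marker):
    """Yield the lines that sit between a start-marker line and an end-marker line."""
    recording = False
    for line in lines:
        if not recording:
            # Idle state: wait for a line that begins with the start marker.
            if line.startswith(start_marker):
                recording = True
        elif line.startswith(end_marker):
            # Recording state: the end marker switches back to idle.
            recording = False
        else:
            # Recording state: pass the line through.
            yield line


# Hypothetical sample input with two sections.
sample = ["a\n", "BEGIN\n", "b\n", "c\n", "END\n", "d\n", "BEGIN\n", "e\n", "END\n"]
result = list(extract_sections(sample, "BEGIN", "END"))
```

Because the function only iterates over its input, an open file object can be passed in place of the list, so the file never has to be loaded into memory at once.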
For simple cases, this could also be solved with a regular expression.
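A rough sketch of the regular-expression variant, assuming the whole text fits in memory and the markers are plain strings. The function name extract_with_regex and the BEGIN/END markers are placeholders chosen for illustration.

```python
import re


def extract_with_regex(text, start_marker, end_marker):
    """Return every piece of text found between start_marker and end_marker."""
    pattern = re.compile(
        # re.escape treats the markers literally; (.*?) is non-greedy so each
        # match stops at the nearest end marker instead of the last one.
        re.escape(start_marker) + r"(.*?)" + re.escape(end_marker),
        re.DOTALL,  # let "." match newlines so sections may span several lines
    )
    return pattern.findall(text)


text = "x BEGIN one END y BEGIN two\nlines END z"
matches = extract_with_regex(text, "BEGIN", "END")
```

Unlike the line-based state machine, this matches the markers anywhere in the text rather than only at the start of a line.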
- Extract Values between two strings in a text file using python
- printing lines between start and end point
- python - Read file from and to specific lines of text
I have a log file which is output by a script, the log file is rotated daily. It will contain the strings "Transfer started at timestamp" and "Transfer completed successfully at timestamp" repeatedly, as the mentioned transfer will take place hourly. The timestamps will have been previously created with date.
- I want to capture the last instance of these two strings, and everything in between, into a separate file.
- If the started string is found near the end of the log file, with no following completed string, I want to capture everything up to EOF and output an error message to say that the end string was not found.
I'm guessing I'll need to use sed or awk but am really inexperienced with them. I want to use the command in a bash script, and understand what each part is doing, so some explanation would be very useful.
An example chunk of log file:
ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2016
...
The desired output:
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016
However, if the log file was like this:
ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
I would want to output:
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
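For comparison with the Python snippets elsewhere on this page, the logic asked for here can be sketched in Python rather than sed or awk. This is only an illustration of the requirement, not a bash answer: the function name last_transfer_block and the shortened sample log lines are assumptions, and the two marker strings are taken from the log excerpt above.

```python
def last_transfer_block(lines,
                        start="Transfer started at",
                        end="Transfer completed successfully at"):
    """Return (block, complete) for the last block that begins with `start`."""
    block = []
    complete = False
    for line in lines:
        if line.startswith(start):
            # A newer transfer begins: discard whatever was collected before.
            block = [line]
            complete = False
        elif block and not complete:
            block.append(line)
            if line.startswith(end):
                # Stop collecting until the next start line shows up.
                complete = True
    return block, complete


# Hypothetical, shortened log: the last transfer never completes.
log = [
    "Transfer started at A\n",
    "Logs transferred successfully.\n",
    "Transfer completed successfully at B\n",
    "--------\n",
    "Transfer started at C\n",
    "Hashes transferred successfully.\n",
]
block, complete = last_transfer_block(log)
output = "".join(block)
if not complete:
    output += "ERROR: transfer not complete by end of log file\n"
```

Only the most recent block is kept in memory, so the whole log can be read in a single pass; `output` can then be written to the separate file.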