I found a list of most English words online, but the files use Unix-style line breaks (and are encoded in Unicode: UTF-8). I found it on this website: //dreamsteep.com/projects/the-english-open-word-list.html
How do I convert the line breaks to CRLF so I can iterate over the lines? The program I will be using them in goes through each line in the file, so the words have to be one per line.
This is a portion of the file: bitbackbitebackbiterbackbitersbackbitesbackbitingbackbittenbackboard
It should be:
bit
backbite
backbiter
backbiters
backbites
backbiting
backbitten
backboard
How can I convert my files to this format? Note: there are 26 files (one per letter) with 80,000 words or so in total (so the program should be very fast).
I don't know where to start because I've never worked with Unicode. Thanks in advance!
Using rU as the parameter (as suggested), with this in my code:
    with open(my_file_name, 'rU') as my_file:
        for line in my_file:
            new_words.append(str(line))
    my_file.close()
I get this error:
    Traceback (most recent call last):
      File "", line 1, in <module>
        addWords('B Words')
      File "D:\my_stuff\Google Drive\documents\SCHOOL\Programming\Python\Programming Class\hangman.py", line 138, in addWords
        for line in my_file:
      File "C:\Python3.3\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7488: character maps to <undefined>
Can anyone help me with this?
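One possible reading of the traceback above: the decoder in use is cp1252, which suggests that open() fell back to the Windows locale encoding instead of UTF-8. Assuming the files really are UTF-8 as described, a minimal sketch like the following passes the encoding explicitly. It also relies on the fact that Python's text mode already treats LF as a line ending, so converting to CRLF first should not be needed just to iterate. The helper name read_words and the sample words in the demonstration are hypothetical, not part of the original program.

```python
import os
import tempfile


def read_words(path, encoding="utf-8"):
    """Return one word per line from `path`, without line endings.

    `read_words` is a hypothetical helper name used for illustration.
    """
    # An explicit encoding keeps Python from using the locale default
    # (cp1252 in the traceback above).
    with open(path, encoding=encoding) as word_file:
        # In text mode "\n", "\r\n" and "\r" all count as line endings,
        # so an LF-only file can be iterated line by line as it is.
        return [line.strip() for line in word_file if line.strip()]


if __name__ == "__main__":
    # Demonstration on a temporary LF-only file (made-up sample data).
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"bit\nbackbite\nbackbiter\n")
    try:
        print(read_words(tmp.name))
    finally:
        os.remove(tmp.name)
```

With 26 files and roughly 80,000 words in total, reading everything this way should be fast enough, since each file is read only once.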
Change LF line endings to CRLF (Unix to Windows)
"""
PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
--------------------------------------------

1. This LICENSE AGREEMENT is between the Python Software Foundation
("PSF"), and the Individual or Organization ("Licensee") accessing and
otherwise using this software ("Python") in source or binary form and
its associated documentation.

2. Subject to the terms and conditions of this License Agreement, PSF
hereby grants Licensee a nonexclusive, royalty-free, world-wide
license to reproduce, analyze, test, perform and/or display publicly,
prepare derivative works, distribute, and otherwise use Python
alone or in any derivative version, provided, however, that PSF's
License Agreement and PSF's notice of copyright, i.e., "Copyright (c)
2001, 2002, 2003, 2004 Python Software Foundation; All Rights Reserved"
are retained in Python alone or in any derivative version prepared
by Licensee.

3. In the event Licensee prepares a derivative work that is based on
or incorporates Python or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python.

4. PSF is making Python available to Licensee on an "AS IS"
basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.

5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.

6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.

7. Nothing in this License Agreement shall be deemed to create any
relationship of agency, partnership, or joint venture between PSF and
Licensee. This License Agreement does not grant permission to use PSF
trademarks or trade name in a trademark sense to endorse or promote
products or services of Licensee, or any third party.

8. By copying, installing or otherwise using Python, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.
"""

#! /usr/bin/env python

"Replace LF with CRLF in argument files. Print names of changed files."

import sys, re, os

def main():
    for filename in sys.argv[1:]:
        if os.path.isdir(filename):
            print filename, "Directory!"
            continue
        data = open(filename, "rb").read()
        if '\0' in data:
            print filename, "Binary!"
            continue
        newdata = re.sub("\r?\n", "\r\n", data)
        if newdata != data:
            print filename
            f = open(filename, "wb")
            f.write(newdata)
            f.close()

if __name__ == '__main__':
    main()
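The script above uses Python 2 print statements, while the traceback in the question comes from Python 3.3. As a rough sketch only, and not the original author's code, the same idea could be expressed in Python 3 by working on bytes throughout. The function names lf_to_crlf and convert_file are hypothetical, chosen here for illustration.

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Return `data` with every LF or CRLF line ending written as CRLF."""
    # The optional "\r" means existing CRLF endings are not doubled up.
    return re.sub(rb"\r?\n", b"\r\n", data)


def convert_file(filename: str) -> bool:
    """Rewrite `filename` in place; return True if its contents changed."""
    with open(filename, "rb") as f:
        data = f.read()
    if b"\0" in data:
        # Same heuristic as the script above: a NUL byte suggests a binary
        # file, so it is left untouched.
        return False
    newdata = lf_to_crlf(data)
    if newdata == data:
        return False
    with open(filename, "wb") as f:
        f.write(newdata)
    return True
```

Because the data is handled as raw bytes, the UTF-8 encoding of the word lists does not matter for this step; calling convert_file once per word file would be enough for the 26 files mentioned in the question.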