I've built a crawler that has to run on about 5M pages (by incrementing the URL ID) and then parse the pages which contain the info I need.
After running an algorithm over 200K of the URLs and saving the good and bad results, I found that I was wasting a lot of time. I could see that there are a few recurring subtrahends which I can use to check the next valid URL.
You can spot the subtrahends quite quickly (a small example of the first few "good IDs"):
```
510000011   # +8
510000029   # +18
510000037   # +8
510000045   # +8
510000052   # +7
510000060   # +8
510000078   # +18
510000086   # +8
510000094   # +8
510000102   # +8
510000110   # etc.
510000128
510000136
510000144
510000151
510000169
510000177
510000185
510000193
510000201
```
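For illustration, these subtrahends can be computed straight from a list of good IDs (a minimal sketch; `good_ids` below is just the sample above):

```python
from collections import Counter

# The first few "good IDs" from the sample above
good_ids = [510000011, 510000029, 510000037, 510000045, 510000052,
            510000060, 510000078, 510000086, 510000094, 510000102]

# Differences between consecutive good IDs
diffs = [b - a for a, b in zip(good_ids, good_ids[1:])]

# Most frequent offsets first, e.g. [(8, 6), (18, 2), (7, 1)]
print(Counter(diffs).most_common())
```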
After crawling about 200K URLs, which gave me only 14K good results, I knew I was wasting my time and needed to optimize, so I ran some statistics and built a function that checks the URLs while increasing the ID by 8\18\17\8 (the top recurring subtrahends), etc.
This is the function:
```python
def checkNextID(ID):
    global numOfRuns, curRes, lastResult
    while ID < lastResult:
        try:
            numOfRuns += 1
            if numOfRuns % 10 == 0:
                time.sleep(3)  # sleep every 10 iterations
            if isValid(ID + 8):
                parseHTML(curRes)
                checkNextID(ID + 8)
                return 0
            if isValid(ID + 18):
                parseHTML(curRes)
                checkNextID(ID + 18)
                return 0
            if isValid(ID + 7):
                parseHTML(curRes)
                checkNextID(ID + 7)
                return 0
            if isValid(ID + 17):
                parseHTML(curRes)
                checkNextID(ID + 17)
                return 0
            if isValid(ID + 6):
                parseHTML(curRes)
                checkNextID(ID + 6)
                return 0
            if isValid(ID + 16):
                parseHTML(curRes)
                checkNextID(ID + 16)
                return 0
            else:
                checkNextID(ID + 1)
                return 0
        except Exception, e:
            print "somethin went wrong: " + str(e)
```
What it basically does: checkNextID(ID) receives the first ID I know contains the data, minus 8, so the first iteration will match the first "if isValid" clause (isValid(ID + 8) will return True).
lastResult is a variable which holds the last known URL ID, so the loop runs until ID reaches lastResult.
isValid() is a function that gets an ID plus one of the subtrahends and returns True if the URL contains what I need, saving a soup object of the page to a global variable named curRes; it returns False if the URL doesn't contain the data I need.
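As a side note, the curRes global can be avoided by having the validity check return the page itself. A minimal sketch with an injectable fetch function (BASE_URL, fetch, and the "price" marker are illustrative assumptions, not the asker's actual code):

```python
BASE_URL = "http://example.com/item/{}"  # hypothetical URL pattern

def fetch_if_valid(page_id, fetch, marker="price"):
    """Return the page body for page_id if it contains marker, else None."""
    html = fetch(BASE_URL.format(page_id))
    return html if marker in html else None
```

The caller then writes `page = fetch_if_valid(ID + 8, fetch)` and passes `page` to the parser instead of reading a global.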
parseHTML is a function that gets the soup object (curRes), parses the data I need, saves the data to a CSV, and then returns True.
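The saving step can be done with the standard csv module in append mode, so each parsed page adds its rows to the same file (a sketch; the row contents are hypothetical):

```python
import csv

def save_rows(rows, csv_path):
    """Append already-extracted rows to a CSV file and report success."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerows(rows)
    return True
```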
If isValid() returns True, we call parseHTML() and then try to check the next ID plus the subtrahends (by calling checkNextID(ID + subtrahend)); if none of them returns what I'm looking for, I increase the ID by 1 and check again until I find the next valid URL.
You can see the rest of the code here.
After running the code I got about 950 good results, when suddenly an exception was raised:
"somethin went wrong: maximum recursion depth exceeded while calling a Python object"
I could see in Wireshark that the script was stuck on ID 510009541 (I started my script with 510000003); the script tried getting the URL with that ID a few times before I noticed the error and stopped it.
I was really excited to see that I got the same results but 25x-40x faster than my old script, with fewer HTTP requests. It's very precise: I missed only 1 result per 1000 good results, which is fine by me, since it's impossible to run it 5M times. I had my old script running for 30 hours and got 14-15K results, while my new script gave me about 960 results in 5-10 minutes.
I read about stack limitations, but there must be a way to implement the algorithm I'm after in Python (I can't go back to my old "algorithm"; it will never end).
Thanks!
The default maximum recursion depth in Python is 1000.
You can verify this by calling the sys.getrecursionlimit() function:

```python
import sys

print(sys.getrecursionlimit())  # Prints 1000
```
You can change the limit by calling the sys.setrecursionlimit() function. For example:

```python
import sys

sys.setrecursionlimit(2000)
print(sys.getrecursionlimit())  # Prints 2000
```
Consider this a dangerous action!
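If you do raise the limit, make sure it gets restored afterwards; a context manager makes that automatic (a sketch using contextlib, not part of the standard library's sys API):

```python
import sys
from contextlib import contextmanager

@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring it on exit."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old)  # always restore, even after an exception
```

Usage: `with recursion_limit(3000): deep_function()` keeps the higher limit confined to the block that needs it.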
If possible, instead of tweaking the recursion limit, try to implement your algorithm iteratively to avoid deep recursion.
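For the crawler above, the recursive calls can simply be replaced by updating ID inside the existing while loop. A sketch, with isValid and parseHTML passed in as callables for testability (the parser here receives the matching ID rather than a soup object, which is a simplification of the question's code):

```python
import time

OFFSETS = (8, 18, 7, 17, 6, 16)  # top recurring subtrahends; +1 is the fallback

def check_ids(start_id, last_id, is_valid, parse_html):
    """Iterative version of checkNextID: the loop advances ID directly
    instead of recursing, so the call stack never grows."""
    ID = start_id
    runs = 0
    while ID < last_id:
        runs += 1
        if runs % 10 == 0:
            time.sleep(3)  # pause every 10 iterations, as in the original
        for offset in OFFSETS:
            if is_valid(ID + offset):
                parse_html(ID + offset)
                ID += offset      # jump straight to the valid ID
                break
        else:
            ID += 1               # no offset matched: advance by one
```

Because each step only mutates ID, the function can run over millions of IDs without ever touching the recursion limit.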
Python Maximum Recursion Depth Exceeded in Comparison
Whenever you exceed the recursion depth of 1000, Python raises an error.
For example, if we try to compute too large a Fibonacci number recursively, we get the recursion depth error.
```python
# A function for computing Fibonacci numbers
def fibonacci(n):
    if n
```