Memoization library for Python 2.7


I noticed that Python 3.2 provides a memoization decorator, lru_cache, in the functools library:
http://docs.python.org/py3k/library/functools.html#functools.lru_cache

Unfortunately, it has not yet been backported to 2.7. Is there a specific reason why it is not available in 2.7? Is there a third-party library that provides the same feature, or do I need to write my own?


The reason for writing a separate memoization library for Python 2.7 is that the Python 3.x implementation has some problems. First, it cannot keep a separate cache per instance: if the cached function is an instance method, all instances share one cache, so if I set maxsize to 100 and have 100 equally active instances, the cache is effectively useless. Second, calling cache_clear clears the cache for every instance at once.
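This shared-cache behavior can be reproduced with the standard decorator itself. The sketch below uses Python 3 syntax for brevity (the functools32 backport behaves the same way on 2.7); the Counter class is purely illustrative:

```python
import functools

class Counter(object):
    def __init__(self):
        self.calls = 0

    @functools.lru_cache(maxsize=2)  # ONE cache shared by all instances
    def square(self, n):
        self.calls += 1
        return n * n

a, b = Counter(), Counter()
a.square(1)
b.square(1)
# Each call created its own (self, n) entry, but both entries share the
# single maxsize budget, so active instances evict each other's entries:
print(Counter.square.cache_info().currsize)  # 2
a.square.cache_clear()  # clears the cache for b as well
print(Counter.square.cache_info().currsize)  # 0
```

Note also that the shared cache holds a reference to each `self` it has seen, so instances stay alive for as long as their entries remain in the cache.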

To solve these problems, the library keeps a separate cache per instance. Here is the implementation of an lru_cache function for Python 2.7:

import time
import functools
import collections
def lru_cache(maxsize = 255, timeout = None):
    """lru_cache(maxsize = 255, timeout = None) --> returns a decorator which returns an instance (a descriptor).
        Purpose         - This decorator factory will wrap a function / instance method and will supply a caching mechanism to the function.
                            For every given input params it will store the result in a queue of maxsize size, and will return a cached ret_val
                            if the same parameters are passed.
        Params          - maxsize - int, the cache size limit; anything added beyond that deletes the first values entered (FIFO).
                            This size is per instance, so 1000 instances with a maxsize of 255 will hold at most 255K elements.
                        - timeout - int / float / None; every n seconds the cache is cleared, regardless of usage. If None, the cache is never refreshed.
        Notes           - If an instance method is wrapped, each instance will have its own cache and its own timeout.
                        - The wrapped function will have a cache_clear variable inserted into it, which may be called to clear its specific cache.
                        - The wrapped function will maintain the original function's docstring and name (wraps).
                        - The type of the wrapped function will no longer be that of a function but either an instance of _LRU_Cache_class or a functools.partial type.
        On Error        - No error handling is done, in case an exception is raised - it will permeate up.
    """
    class _LRU_Cache_class(object):
        def __init__(self, input_func, max_size, timeout):
            self._input_func        = input_func
            self._max_size          = max_size
            self._timeout           = timeout
            # This will store the cache for this function, format - {caller1 : [OrderedDict1, last_refresh_time1], caller2 : [OrderedDict2, last_refresh_time2]}.
            #   In case of an instance method - the caller is the instance, in case called from a regular function - the caller is None.
            self._caches_dict        = {}
        
        def cache_clear(self, caller = None):
            # Reset the cache for the caller, only if it exists:
            if caller in self._caches_dict:
                self._caches_dict[caller] = [collections.OrderedDict(), time.time()]
        
        def __get__(self, obj, objtype):
            """ Called for instance methods """
            return_func = functools.partial(self._cache_wrapper, obj)
            return_func.cache_clear = functools.partial(self.cache_clear, obj)
            # Return the wrapped function and wraps it to maintain the docstring and the name of the original function:
            return functools.wraps(self._input_func)(return_func)
        
        def __call__(self, *args, **kwargs):
            """ Called for regular functions """
            return self._cache_wrapper(None, *args, **kwargs)
        
        # Set the cache_clear function in the __call__ operator:
        __call__.cache_clear = cache_clear
        
        def _cache_wrapper(self, caller, *args, **kwargs):
            # Create a unique key including the types (in order to differentiate between 1 and '1'):
            kwargs_key = "".join(map(lambda x : str(x) + str(type(kwargs[x])) + str(kwargs[x]), sorted(kwargs)))
            key = "".join(map(lambda x : str(type(x)) + str(x) , args)) + kwargs_key
            # Check if caller exists, if not create one:
            if caller not in self._caches_dict:
                self._caches_dict[caller] = [collections.OrderedDict(), time.time()]
            else:
                # Validate in case the refresh time has passed:
                if self._timeout != None:
                    if time.time() - self._caches_dict[caller][1] > self._timeout:
                        self.cache_clear(caller)
            # Check if the key exists, if so - return it:
            cur_caller_cache_dict = self._caches_dict[caller][0]
            if key in cur_caller_cache_dict:
                return cur_caller_cache_dict[key]
            # Validate we didn't exceed the max_size:
            if len(cur_caller_cache_dict) >= self._max_size:
                # Delete the first item in the dict:
                cur_caller_cache_dict.popitem(False)
            # Call the function and store the data in the cache (call it with the caller in case it's an instance function - Ternary condition):
            cur_caller_cache_dict[key] = self._input_func(caller, *args, **kwargs) if caller != None else self._input_func(*args, **kwargs)
            return cur_caller_cache_dict[key]
    
    # Return the decorator wrapping the class (also wraps the instance to maintain the docstring and the name of the original function):
    return (lambda input_func : functools.wraps(input_func)(_LRU_Cache_class(input_func, maxsize, timeout)))

Here is the unit-test code for the lru_cache function:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import time
import random
import unittest
import lru_cache
class Test_Decorators(unittest.TestCase):
    def test_decorator_lru_cache(self):
        class LRU_Test(object):
            """class"""
            def __init__(self):
                self.num = 0
            @lru_cache.lru_cache(maxsize = 10, timeout = 3)
            def test_method(self, num):
                """test_method_doc"""
                self.num += num
                return self.num
        
        @lru_cache.lru_cache(maxsize = 10, timeout = 3)
        def test_func(num):
            """test_func_doc"""
            return num
        
        @lru_cache.lru_cache(maxsize = 10, timeout = 3)
        def test_func_time(num):
            """test_func_time_doc"""
            return time.time()
        
        @lru_cache.lru_cache(maxsize = 10, timeout = None)
        def test_func_args(*args, **kwargs):
            return random.randint(1,10000000)
        
        # Init vars:
        c1 = LRU_Test()
        c2 = LRU_Test()
        m1 = c1.test_method
        m2 = c2.test_method
        f1 = test_func
        
        # Test basic caching functionality:
        self.assertEqual(m1(1), m1(1)) 
        self.assertEqual(c1.num, 1)     # c1.num now equals 1 - once cached, once real
        self.assertEqual(f1(1), f1(1))
        
        # Test caching is different between instances - once cached, once not cached:
        self.assertNotEqual(m1(2), m2(2))
        self.assertNotEqual(m1(2), m2(2))
        
        # Validate the cache_clear functionality on only one instance:
        prev1 = m1(1)
        prev2 = m2(1)
        prev3 = f1(1)
        m1.cache_clear()
        self.assertNotEqual(m1(1), prev1)
        self.assertEqual(m2(1), prev2)
        self.assertEqual(f1(1), prev3)
        
        # Validate the docstring and the name are set correctly:
        self.assertEqual(m1.__doc__, "test_method_doc")
        self.assertEqual(f1.__doc__, "test_func_doc")
        self.assertEqual(m1.__name__, "test_method")
        self.assertEqual(f1.__name__, "test_func")
        
        # Test the limit of the cache, cache size is 10, fill 15 vars, the first 5 will be overwritten for each and the other 5 are untouched. Test that:
        c1.num = 0
        c2.num = 10
        m1.cache_clear()
        m2.cache_clear()
        f1.cache_clear()
        temp_list = map(lambda i : (test_func_time(i), m1(i), m2(i)), range(15))
        for i in range(5, 10):
            self.assertEqual(temp_list[i], (test_func_time(i), m1(i), m2(i)))
        for i in range(0, 5):
            self.assertNotEqual(temp_list[i], (test_func_time(i), m1(i), m2(i)))
        
        # With the last run the next 5 vars were overwritten, now it should have only 0..4 and 10..14:
        for i in range(5, 10):
            self.assertNotEqual(temp_list[i], (test_func_time(i), m1(i), m2(i)))
        
        # Test different vars don't collide:
        self.assertNotEqual(test_func_args(1), test_func_args('1'))
        self.assertNotEqual(test_func_args(1.0), test_func_args('1.0'))
        self.assertNotEqual(test_func_args(1.0), test_func_args(1))
        self.assertNotEqual(test_func_args(None), test_func_args('None'))
        self.assertEqual(test_func_args(test_func), test_func_args(test_func))
        self.assertEqual(test_func_args(LRU_Test), test_func_args(LRU_Test))
        self.assertEqual(test_func_args(object), test_func_args(object))
        self.assertNotEqual(test_func_args(1, num = 1), test_func_args(1, num = '1'))
        
        # Test the sorting of kwargs:
        self.assertEqual(test_func_args(1, aaa = 1, bbb = 2), test_func_args(1, bbb = 2, aaa = 1))
        self.assertNotEqual(test_func_args(1, aaa = '1', bbb = 2), test_func_args(1, bbb = 2, aaa = 1))
        
        # Sanity validation of values
        c1.num = 0
        c2.num = 10
        m1.cache_clear()
        m2.cache_clear()
        f1.cache_clear()
        self.assertEqual((f1(0), m1(0), m2(0)), (0, 0, 10))
        self.assertEqual((f1(0), m1(0), m2(0)), (0, 0, 10))
        self.assertEqual((f1(1), m1(1), m2(1)), (1, 1, 11))
        self.assertEqual((f1(2), m1(2), m2(2)), (2, 3, 13))
        self.assertEqual((f1(2), m1(2), m2(2)), (2, 3, 13))
        self.assertEqual((f1(3), m1(3), m2(3)), (3, 6, 16))
        self.assertEqual((f1(3), m1(3), m2(3)), (3, 6, 16))
        self.assertEqual((f1(4), m1(4), m2(4)), (4, 10, 20))
        self.assertEqual((f1(4), m1(4), m2(4)), (4, 10, 20))
        
        # Test timeout - sleep, it should refresh cache, and then check it was cleared:
        prev_time = test_func_time(0)
        self.assertEqual(test_func_time(0), prev_time)
        self.assertEqual(m1(4), 10)
        self.assertEqual(m2(4), 20)
        time.sleep(3.5)
        self.assertNotEqual(test_func_time(0), prev_time)
        self.assertNotEqual(m1(4), 10)
        self.assertNotEqual(m2(4), 20)
if __name__ == '__main__':
    unittest.main()

This memoization library provides a decorator that adds a caching mechanism to both plain functions and instance methods. For every set of input parameters it stores the result in a queue of at most maxsize entries, and returns the cached return value when the same parameters are passed again. It also exposes a cache_clear function for clearing the cache of a specific instance, preserves the original function's docstring and name, and gives each instance its own cache and its own timeout.

The library was unit-tested under Python 2.7 and passes all of the test cases above.

If you are interested in this library, it would be worth publishing and maintaining it on GitHub; that would be more appealing than a four-year-old code snippet on Stack Overflow.


For Python 2.7 and PyPy there is functools32, a backport of the functools module from Python 3.2.3; it includes the lru_cache decorator. So which performs better, functools32 or repoze.lru? pylru has no lock, so comparing against it would not be fair.

Memoization is an optimization technique that stores a function's results to avoid repeating the computation: when the function is called again with the same inputs, the result is returned straight from the cache instead of being recomputed, which can greatly speed up a program.
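As a concrete example (shown with Python 3's built-in functools.lru_cache; under 2.7 the same idea works via the functools32 backport), the naive recursive Fibonacci takes exponential time, while the memoized version computes each value only once:

```python
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; every repeat is a cache hit,
    # turning an O(2^n) recursion into O(n) work.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
```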

In Python, the functools module provides the lru_cache decorator for memoization. In Python 2.7, however, lru_cache is not available, so the functools32 library was created as a backport of the functools module, making lru_cache usable on 2.7.

When comparing performance, repoze.lru has been suggested as a baseline. Note, however, that pylru does not take a lock, so comparing it against the other two libraries is not fair. A lock protects shared state from concurrent modification; a cache without one can suffer race conditions and data-consistency problems under multiple threads. A fair comparison should therefore be made between libraries with equivalent functionality.
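To make the point about locks concrete, here is a hypothetical minimal memoizing decorator that guards its cache with a threading.Lock; the names are illustrative and not taken from any of the libraries above:

```python
import functools
import threading

def locked_memoize(func):
    cache = {}
    lock = threading.Lock()

    @functools.wraps(func)
    def wrapper(*args):
        # The lock makes check-then-insert atomic, so two threads racing
        # on the same uncached key cannot corrupt the dict or observe a
        # half-written entry.
        with lock:
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]
    return wrapper

@locked_memoize
def square(n):
    return n * n

print(square(12))  # 144
```

The lock is also the cost: every call, hit or miss, must acquire it, and in this sketch the wrapped function even runs while the lock is held. That overhead is exactly what a lock-free cache avoids, which is why benchmarking one against the other is not an apples-to-apples comparison.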

functools32 is one way to get memoization on Python 2.7. When benchmarking it, choose comparison libraries that also take a lock, so that the comparison is fair and meaningful.


What exactly is the reason there is no memoization library in Python 2.7?

As stated: unfortunately, the 2.x line only receives bug fixes; new features are developed for 3.x only.

Is there a third-party library providing the same feature?

repoze.lru is an LRU cache implementation for Python 2.6, Python 2.7, and Python 3.2.

Its documentation and source code are available on GitHub.

Simple usage:

from repoze.lru import lru_cache
@lru_cache(maxsize=500)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

Which performs better, functools32 or repoze.lru?

functools32 works on 2.7.

Both repoze.lru and functools32 are rather slow; writing your own decorator is faster.
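That claim is plausible because a special-purpose decorator can drop everything it does not need. A minimal sketch with no size limit, no eviction order, and no lock (these omissions are exactly where the speed comes from):

```python
def memoize(func):
    cache = {}
    def wrapper(*args):
        # A hit costs a single dict lookup: no LRU bookkeeping,
        # no timeout check, no lock acquisition.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

call_log = []

@memoize
def add(a, b):
    call_log.append((a, b))
    return a + b

print(add(2, 3), add(2, 3))  # 5 5
print(len(call_log))  # 1 -- the second call was served from the cache
```

The trade-off is that the cache grows without bound and is not thread safe, so this is only appropriate when those properties do not matter.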
