Truly Automatic Refactoring of Python - a possible recipe?

This could be a useful thing to automate:

  1. Finding duplicate code with PMD, or even BETTER - rewrite code to use unified names of variables (think Prolog-like unification) in order to find more duplicates, e.g. def foo(name=”“) and def bar(zoo=”“) would actually be dups if they were unified to e.g. def function0(var0=”“) (assuming the function bodies are equal - when unified - though).
  2. Reduce duplicate code by using truly automated refactoring, this could be done by using the Rope refactoring library and empirically trying to do extract method on dups (i.e. if it succeeds the other corresponding dup(s) could be replaced by function calls). This algorithm that tries to extract methods from variable line spans of code can measure the efficiency of the refactoring in number of lines saved (this is probably a dynamic programming problem since there can be overlaps between potential function extractions). But naming of the new functions and variables are way harder to do, any suggestions? 
  3. Result: a lot of dups removed?

Related - Example of method extraction with Rope:


import unittest

import rope.base.codeanalyze
import rope.base.exceptions
import ropetest.testutils as testutils
from rope.refactor import extract
from ropetest import testutils

class ExtractMethodTest(unittest.TestCase):

    def setUp(self):
        super(ExtractMethodTest, self).setUp()
        self.project = testutils.sample_project()
        self.pycore = self.project.pycore

    def tearDown(self):
        super(ExtractMethodTest, self).tearDown()

    def test_simple_extract_function(self):
        code = "def a_func():\n    print('one')\n    print('two')\n"
        start, end = self._convert_line_range_to_offset(code, 2, 2)
        refactored = self.do_extract_method(code, start, end, 'extracted')
        expected = "def a_func():\n    extracted()\n    print('two')\n\n" \
                   "def extracted():\n    print('one')\n"
        self.assertEquals(expected, refactored)

What do you think of this (potential) approach?

Best regards,

Amund Tveit