Tech blog by @dizballanze
<h1>Blazing fast tests in Django</h1>
<p>Published 2018-05-24T11:27:00+03:00 by Admin</p>
<p>Slow tests not only waste developers' time but also make it difficult to follow TDD best practices
(such as red-green testing). If running the test suite takes minutes or even longer, the whole suite gets run
infrequently, which in turn leads to late discovery and fixing of bugs.</p>
<p>In this post, I'll show how to speed up the tests of your Django application and describe what kills test
performance. As an example, I'll use a simple test suite, which you can find
<a href="https://github.com/dizballanze/blazing-fast-django-tests-example/tree/initial">on GitHub</a>.</p>
<h2 id="parallel-testing">Parallel testing</h2>
<p>The simplest way to speed up your tests without changing any code is to run them in parallel.
Django provides the <code>--parallel</code> option for this. The option also accepts an optional number of
processes. If the number isn't provided, Django starts as many processes as there are CPU cores, which is optimal
in most cases.</p>
<p>Running the tests from <a href="https://github.com/dizballanze/blazing-fast-django-tests-example/tree/initial">the example</a>
sequentially on my machine:</p>
<div class="highlight"><pre><span></span># python manage.py test
...........
----------------------------------------------------------------------
Ran 11 tests in 8.012s
</pre></div>
<p>Now let's try the <code>--parallel</code> option:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 2.628s
</pre></div>
<p>As you can see, the tests completed more than 3 times faster.</p>
<blockquote>
<p>It's worth noting that Django distributes whole test cases (subclasses of <code>unittest.TestCase</code>), not individual
test methods, between processes. Consequently, if you have fewer test cases than processor cores, Django reduces the number
of processes to match the number of test cases. Our example has only 3 test cases, so parallelism is limited to 3 processes.
Real projects usually have hundreds or even thousands of test cases, so this limitation won't affect you.</p>
<p>In parallel mode, Django sometimes can't collect tracebacks for failed tests. In that case, you need to re-run
the tests sequentially.</p>
</blockquote>
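<p>The following minimal stdlib sketch shows what per-test-case distribution means in practice. Tests are grouped by their
<code>TestCase</code> class, and the number of worker processes is capped at the number of groups. Django's test runner does
something similar internally. The helper names (<code>group_by_test_case</code>, <code>effective_processes</code>) and the three
test classes are hypothetical illustrations, not Django APIs.</p>

```python
import os
import unittest


def iter_tests(suite):
    """Yield individual tests from a (possibly nested) TestSuite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def group_by_test_case(suite):
    """Group tests by their TestCase class; each group goes to one worker."""
    groups = {}  # dicts keep insertion order, so groups follow suite order
    for test in iter_tests(suite):
        groups.setdefault(type(test), []).append(test)
    return list(groups.values())


def effective_processes(requested, groups):
    """Default to the CPU count, but never exceed the number of groups."""
    requested = requested or os.cpu_count() or 1
    return max(1, min(requested, len(groups)))


# Three hypothetical test cases, like the three in the example project.
class ArticleTests(unittest.TestCase):
    def test_title(self):
        self.assertTrue("title")

    def test_body(self):
        self.assertTrue("body")


class AuthorTests(unittest.TestCase):
    def test_username(self):
        self.assertTrue("username")


class CommentTests(unittest.TestCase):
    def test_text(self):
        self.assertTrue("text")


def build_suite():
    loader = unittest.defaultTestLoader
    return unittest.TestSuite(
        loader.loadTestsFromTestCase(cls)
        for cls in (ArticleTests, AuthorTests, CommentTests)
    )


if __name__ == "__main__":
    groups = group_by_test_case(build_suite())
    # Asking for 64 workers still yields 3: only 3 groups exist.
    print(len(groups), effective_processes(64, groups))  # prints: 3 3
```

<p>Four test methods end up in three groups, so no more than three processes can be kept busy. This is the effect described in the note above.</p>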
<h2 id="use-weak-passwords-hashing-algorithm">Use a weak password hashing algorithm</h2>
<p>By default, Django uses a computationally expensive password hashing algorithm, and new Django versions regularly
make it even more expensive. This is done for security: an intruder should need a lot of computing power to brute-force passwords.</p>
<p>We don't need such a strong algorithm in tests, so we can use something faster, such as MD5.
Let's make <code>settings.py</code> switch to MD5 when running tests:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span>
<span class="n">TESTING</span> <span class="o">=</span> <span class="s1">'test'</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span>
<span class="k">if</span> <span class="n">TESTING</span><span class="p">:</span>
    <span class="n">PASSWORD_HASHERS</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'django.contrib.auth.hashers.MD5PasswordHasher'</span><span class="p">,</span>
    <span class="p">]</span>
</pre></div>
<p>And run the tests again:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 0.564s
</pre></div>
<p>That's 4.65 times faster 🚀 I've seen this simple hack speed up a huge test suite by an order of magnitude.</p>
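<p>The speed-up comes from the difference in hashing cost. The sketch below uses only the standard library to imitate the two
hash formats. One is a PBKDF2-SHA256 hash shaped like Django's default <code>pbkdf2_sha256$iterations$salt$hash</code> strings.
The other is an MD5 hash shaped like <code>MD5PasswordHasher</code> output. The iteration count of 100,000 is an assumption
for illustration, since Django's default differs between versions. The helper functions are hypothetical, not Django APIs.
In a test, every <code>create_user()</code> or <code>set_password()</code> call pays the hashing cost.</p>

```python
import base64
import hashlib
import secrets
import time


def pbkdf2_sha256_hash(password, salt, iterations=100_000):
    """Deliberately slow: `iterations` rounds of HMAC-SHA256 (assumed count)."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), iterations)
    encoded = base64.b64encode(digest).decode("ascii").strip()
    return "pbkdf2_sha256$%d$%s$%s" % (iterations, salt, encoded)


def md5_hash(password, salt):
    """Fast: a single MD5 over salt + password. Never use this in production."""
    hexdigest = hashlib.md5((salt + password).encode()).hexdigest()
    return "md5$%s$%s" % (salt, hexdigest)


def time_calls(func, *args, repeat=20):
    """Total wall-clock seconds for `repeat` calls of func(*args)."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    salt = secrets.token_hex(6)  # 12 hex characters
    slow = time_calls(pbkdf2_sha256_hash, "s3cret", salt)
    fast = time_calls(md5_hash, "s3cret", salt)
    print("pbkdf2: %.3fs, md5: %.6fs, ratio: ~%.0fx" % (slow, fast, slow / fast))
```

<p>Exact timings depend on the machine. The point is that PBKDF2 is slow by design, while MD5 costs almost nothing per call.</p>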
<h2 id="create-data-only-when-we-need-it">Create data only when we need it</h2>
<p>A frequent mistake that slows down test execution is a base test case with a huge <code>setUp</code> method that creates data
for the whole test suite, with all other test cases inheriting from it. At first sight, this may seem convenient, but it kills
test performance: <strong>all</strong> the data is created before <strong>each</strong> test, whether or not that particular test needs it.</p>
<p>To fix this problem, simplify the shared <code>setUp</code> method as much as possible, ideally removing it from the base test case entirely.
Test data should be created in specific test cases, and only the data that is actually needed.</p>
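<p>The following minimal stdlib sketch makes the cost visible. It uses a hypothetical <code>create_object()</code> factory
that stands in for an expensive database INSERT and simply records what it creates. <code>FatBaseTestCase</code>
follows the anti-pattern described above, while <code>LeanAuthorTests</code> creates only what its test uses.</p>

```python
import io
import unittest

CREATED = []  # stands in for rows written to the test database


def create_object(kind):
    """Hypothetical factory: pretend every call is an expensive INSERT."""
    CREATED.append(kind)
    return kind


class FatBaseTestCase(unittest.TestCase):
    """Anti-pattern: every test pays for every object, needed or not."""

    def setUp(self):
        self.author = create_object("author")
        self.articles = [create_object("article") for _ in range(50)]
        self.comments = [create_object("comment") for _ in range(200)]


class FatAuthorTests(FatBaseTestCase):
    def test_author_exists(self):
        self.assertEqual(self.author, "author")


class LeanAuthorTests(unittest.TestCase):
    """Each test creates only the data it actually uses."""

    def test_author_exists(self):
        author = create_object("author")
        self.assertEqual(author, "author")


def count_created(test_case_cls):
    """Run one TestCase class quietly and report how many objects it created."""
    CREATED.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_cls)
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return len(CREATED)


if __name__ == "__main__":
    print(count_created(FatAuthorTests), count_created(LeanAuthorTests))  # 251 1
```

<p>Both classes check the same thing. The inherited <code>setUp</code>, however, makes the first one create 251 objects for a single test.</p>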
<p>I made the <a href="https://github.com/dizballanze/blazing-fast-django-tests-example/commit/45d89a2ee690e5077a7d957acd77aa4ceb1c41b8">corresponding changes</a>
to our example. Let's see how this affects the running time:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 0.353s
</pre></div>
<p>That's another 60% speed-up.</p>
<h2 id="setuptestdata">setUpTestData</h2>
<p>Django's <code>TestCase</code> allows creating test data once per test case (class) instead of once per test method.
This can vastly speed up test execution. To use it, move data creation into the <code>setUpTestData</code> class method.</p>
<blockquote>
<p>Tests shouldn't modify objects created in <code>setUpTestData</code>. Otherwise the tests are no longer fully
isolated, which can make them unstable.</p>
</blockquote>
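<p>Django's <code>TestCase.setUpTestData()</code> is a class method that runs once per test case class. Django rolls back
its database changes when the class finishes. The following stdlib sketch imitates only the "once per class" part, using
<code>unittest</code>'s <code>setUpClass</code> with a hypothetical <code>create_object()</code> counter. It has no
database and no transactions. The Django form appears in a comment, with <code>Author</code> as a hypothetical model.</p>

```python
import io
import unittest

# The Django version of the idea (needs a Django project, so not run here):
#
#     from django.test import TestCase
#
#     class AuthorTests(TestCase):
#         @classmethod
#         def setUpTestData(cls):
#             cls.author = Author.objects.create(username="john")  # hypothetical model

CREATED = []  # stands in for rows written to the test database


def create_object(kind):
    """Hypothetical factory: pretend every call is an expensive INSERT."""
    CREATED.append(kind)
    return kind


class AuthorChecks:
    """Three read-only tests shared by both variants below."""

    def test_username(self):
        self.assertEqual(self.author, "author")

    def test_not_empty(self):
        self.assertTrue(self.author)

    def test_is_string(self):
        self.assertIsInstance(self.author, str)


class PerTestData(AuthorChecks, unittest.TestCase):
    def setUp(self):
        # Runs before every test method: 3 tests, 3 creations.
        self.author = create_object("author")


class PerClassData(AuthorChecks, unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, like setUpTestData: 1 creation.
        super().setUpClass()
        cls.author = create_object("author")


def count_created(test_case_cls):
    """Run one TestCase class quietly and report how many objects it created."""
    CREATED.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_cls)
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return len(CREATED)


if __name__ == "__main__":
    print(count_created(PerTestData), count_created(PerClassData))  # 3 1
```

<p>In <code>PerClassData</code>, all three tests share <code>cls.author</code>. A test that modified it would affect the
others, which is the isolation problem the note above warns about.</p>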
<p>I added the changes to the example
<a href="https://github.com/dizballanze/blazing-fast-django-tests-example/commit/5594cac4fd93c699b572f6946ce0a04ad96495f5">here</a>.
Let's run the tests again:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 0.349s
</pre></div>
<p>We don't seem to get a significant speed-up. Let's see what happens if we add
<a href="https://github.com/dizballanze/blazing-fast-django-tests-example/commit/c865ff7f72b04a0f72e8edfe3c523f1ee00b923e">some more tests</a>.
Without <code>setUpTestData</code> the run takes <code>0.536s</code>, and with <code>setUpTestData</code> it takes <code>0.348s</code>.
As you can see, with <code>setUpTestData</code> the duration barely grows as tests are added (apart from the time to run each test itself),
because the test data isn't recreated before each test.</p>
<h2 id="conclusion">Conclusion</h2>
<p>It's a good idea to pay attention to test speed from the very beginning of development. With a few simple techniques,
you can get very fast tests and the maximum benefit from automated testing.</p>
<p>live long and prosper 🖖</p>
<h1>Fast tests in Django</h1>
<p>Published 2018-05-24T11:27:00+03:00 by Admin</p>
<p>Slow tests not only waste developers' time on waiting but also make it harder to follow TDD best practices
(red-green testing). When a test suite takes several minutes or longer to run, the whole suite ends up being run rarely,
and bugs that could have been fixed earlier and faster get postponed.</p>
<p>In this post, I'll show how to speed up the tests of your Django application and look at what kills the speed of your tests.
As an example, I'll use a simple test suite, which you can find
<a href="https://github.com/dizballanze/blazing-fast-django-tests-example/tree/initial">on GitHub</a>.</p>
<h2 id="parallelnye-testy">Parallel tests</h2>
<p>The simplest way to speed up test execution without making any code changes is to run the tests in parallel.
In Django, pass the <code>--parallel</code> option when running tests. The option also optionally accepts
the number of processes. If the number of processes isn't specified, it equals the number of CPU cores, which is optimal
in most cases.</p>
<p>Sequential execution of the tests <a href="https://github.com/dizballanze/blazing-fast-django-tests-example/tree/initial">from the example</a>
on my machine takes:</p>
<div class="highlight"><pre><span></span># python manage.py test
...........
----------------------------------------------------------------------
Ran 11 tests in 8.012s
</pre></div>
<p>If we run them with the <code>--parallel</code> option:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 2.628s
</pre></div>
<p>The tests ran more than 3 times faster.</p>
<blockquote>
<p>It's worth noting that Django distributes different test cases (subclasses of <code>unittest.TestCase</code>) between
different processes. Consequently, if you have fewer test cases than CPU cores, Django
reduces the number of processes to the number of test cases. Our example has only 3 test cases, which limits
parallelism to 3 processes. Real projects usually have hundreds or even thousands of test cases,
so this problem won't be relevant.</p>
<p>Also, when tests run in parallel, Django sometimes can't collect error tracebacks. In that case, you'll have to
re-run the whole test suite sequentially.</p>
</blockquote>
<h2 id="ispolzovanie-slabogo-algoritma-kheshirovaniia-parolei">Using a weak password hashing algorithm</h2>
<p>By default, Django uses a computationally expensive password hashing algorithm, and new Django versions regularly
make it even more expensive. This is needed for security, so that brute-forcing passwords requires a huge amount
of computing resources.</p>
<p>For testing purposes we don't need such an expensive hashing algorithm, so we can use something fast, for example, MD5.
Let's make <code>settings.py</code> switch to MD5 when the application runs in test mode:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span>
<span class="n">TESTING</span> <span class="o">=</span> <span class="s1">'test'</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span>
<span class="k">if</span> <span class="n">TESTING</span><span class="p">:</span>
    <span class="n">PASSWORD_HASHERS</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'django.contrib.auth.hashers.MD5PasswordHasher'</span><span class="p">,</span>
    <span class="p">]</span>
</pre></div>
<p>Let's measure the test execution time after this change:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 0.564s
</pre></div>
<p>4.65 times faster 🚀 I've seen this simple hack speed up a huge test suite by an order of magnitude.</p>
<h2 id="sozdaem-dannye-kogda-oni-nuzhny">Create data when it's needed</h2>
<p>A common mistake that slows down test execution is a base test case that creates a huge amount of test data
in its <code>setUp</code> method, with all other test cases inheriting from it.
At first glance this approach may seem convenient, but it completely kills the speed of your tests, because
<strong>all</strong> the data is created before <strong>each</strong> test, even data the given test doesn't need.</p>
<p>To solve this problem, simplify the shared <code>setUp</code> method as much as possible; ideally, remove it altogether. Move
test data creation into specific test cases and create only the data that is really needed.</p>
<p>I made the <a href="https://github.com/dizballanze/blazing-fast-django-tests-example/commit/45d89a2ee690e5077a7d957acd77aa4ceb1c41b8">corresponding changes</a>
to our example. Let's see how this affects the test execution time:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 0.353s
</pre></div>
<p>That's another 60% speed-up.</p>
<h2 id="setuptestdata">setUpTestData</h2>
<p>Django's base test case makes it possible to create test data at the level of the test case rather than for each test.
This can significantly speed up test execution. To do this, move data creation into the
<code>setUpTestData</code> class method.</p>
<blockquote>
<p>Objects created in <code>setUpTestData</code> must not change during testing.
Otherwise the tests can become flaky, because they won't be fully isolated.</p>
</blockquote>
<p><a href="https://github.com/dizballanze/blazing-fast-django-tests-example/commit/5594cac4fd93c699b572f6946ce0a04ad96495f5">Here</a>
I added the changes to the example. Let's look at the test execution time:</p>
<div class="highlight"><pre><span></span># python manage.py test --parallel
...........
----------------------------------------------------------------------
Ran 11 tests in 0.349s
</pre></div>
<p>We didn't get a significant advantage, but let's check what happens if we
<a href="https://github.com/dizballanze/blazing-fast-django-tests-example/commit/c865ff7f72b04a0f72e8edfe3c523f1ee00b923e">add a few more tests</a>.
Without <code>setUpTestData</code> I got <code>0.563s</code>, and with <code>setUpTestData</code> <code>0.348s</code>. That is, with <code>setUpTestData</code>
the time barely grows as new tests are added (except for the time to run the test itself), because the data
doesn't have to be recreated for each new test.</p>
<h2 id="zakliuchenie">Conclusion</h2>
<p>It's advisable to pay attention to test execution speed from the very beginning of development. Using a few simple techniques,
you can achieve very fast test execution and get the maximum benefit from automated testing.</p>
<p>live long and prosper 🖖</p>
<h1>Pay attention to the code coverage report</h1>
<p>Published 2018-04-29T12:28:00+03:00 by Admin</p>
<p>If you are reading this post, you probably write unit tests (and that's a good thing). You have also most likely
heard about the <strong>code coverage</strong> metric, which shows what code was executed during testing. But how often do you actually
look at the code coverage report? If not very often, then this post is for you. I'll try to show how the code coverage
report reveals a lot of useful information for developers, which ultimately helps improve code quality.</p>
<h2 id="testing">Testing</h2>
<h3 id="code-without-tests">Code without tests</h3>
<p>The most obvious benefit of a code coverage report is the ability to detect which code isn't tested. Often this means
that you need to add a test (or tests) for this code to make sure that it works properly and, no less important, that it
will keep working properly as the application evolves.</p>
<p>Sometimes you may think that some piece of code is too simple to have any bugs, so you can skip
writing tests for it. At such moments, don't forget that code changes, and every
change may cause a <a href="https://en.wikipedia.org/wiki/Software_regression">regression</a>. Each test can save you hours of
debugging.</p>
<h3 id="running-of-not-expected-code-bug">Bug where unexpected code runs</h3>
<p>Another case is when a test passes, but the coverage report shows that the code you expected to run actually didn't run.</p>
<p>This may be caused by a bug in the test: a different function is called instead of the right one, and it happens to return the same result.
A bug like this gives a false sense of confidence that the code works and regressions are caught, which isn't true.
Eventually, broken code may be deployed to production, and you'll have a hard time debugging it.</p>
<p>The same result can be caused by a bug in the code. For example, misconfigured web application routing sends requests
to the wrong method, which happens to return the expected response. This may also lead to broken code being deployed to
production, and the bug would be hard to find by relying on tests alone.</p>
<p>By reviewing the code coverage report before pushing changes to the repository, you can find such bugs and fix
them before they cause damage, not to mention the time saved.</p>
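<p>Here is a small sketch of the routing case. It uses the standard library's <code>trace</code> module as a stand-in
for a real coverage tool such as coverage.py. The routes and views are hypothetical. The route for <code>/articles/</code>
points at the wrong view, yet a test-style check on the response still passes. Only the record of executed functions
reveals that the intended view never ran.</p>

```python
import trace


def article_list(path):
    """The view that *should* handle /articles/ (hypothetical)."""
    return {"status": 200, "path": path}


def article_detail(path):
    """A different view that happens to return an equally valid-looking response."""
    return {"status": 200, "path": path}


# Bug: the route points at the wrong view.
ROUTES = {"/articles/": article_detail}


def dispatch(path):
    return ROUTES[path](path)


def run_and_record_calls(func, *args):
    """Call func(*args) and return (result, names of all functions that ran)."""
    tracer = trace.Trace(count=0, trace=0, countfuncs=1)
    result = tracer.runfunc(func, *args)
    # Keys are (filename, modulename, funcname); keep only the bare name.
    names = {key[2].rsplit(".", 1)[-1] for key in tracer.results().calledfuncs}
    return result, names


if __name__ == "__main__":
    response, ran = run_and_record_calls(dispatch, "/articles/")
    assert response["status"] == 200  # the "test" passes...
    print("article_list ran:", "article_list" in ran)  # ...but this prints False
```

<p>A coverage report would show the same thing as uncovered lines in the body of <code>article_list</code>.</p>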
<h2 id="dead-code_1">Dead code</h2>
<h3 id="unused-code">Unused code</h3>
<p>In some cases, the code coverage report can help you find code that isn't used anymore, for example, private methods
that aren't called anywhere.</p>
<p>There is nothing pleasant about wasting time reading dead code and trying to understand why it is needed. Code
like this should be removed promptly. If you think you may need this code in the future, remove it anyway: you can restore
it from the version control system if needed.</p>
<h3 id="dead-code-in-tests">Dead code in tests</h3>
<p>Another, less obvious example of dead code is dead code in tests. Suppose a test method loops over
a list of objects and makes assertions on each of them. If the list turns out to be empty for some reason, the test
passes even though none of the assertions actually ran. Bugs like this are easy to discover with a code coverage report,
because the loop body will be shown as not covered.</p>
<h2 id="conclusion_1">Conclusion</h2>
<p>The code coverage report is a very important tool for a developer. You should look at the report after each change.
Ideally, this should be part of your CI pipeline: if the number of uncovered lines increases, the build should fail, or at
least you should get a warning about it. This helps avoid many bugs and keeps the code healthier overall.</p>
<h1>Pay attention to the test code coverage report</h1>
<p>Published 2018-04-29T12:28:00+03:00 by Admin</p>
<p>If you're reading this post, you most likely write unit tests (and rightly so). Most likely you also
know about the <strong>code coverage</strong> metric, which shows what code was executed during testing.
But how often do you look at the code coverage report? If the answer is "not often", then this post is for you.
I'll try to show how the code coverage report reveals a lot of interesting and
useful data for the developer, which ultimately helps improve code quality.</p>
<h2 id="testirovanie">Testing</h2>
<h3 id="kod-bez-testov">Code without tests</h3>
<p>The most obvious benefit of the code coverage report is the ability to determine which code isn't tested.
Often this means that you need to add a test (or tests) for this code to make sure that it works as expected
and, no less important, that it will keep working correctly as the application evolves.</p>
<p>Right now it may seem to you that some piece of code is too simple to contain bugs, so there's no need
to spend time writing tests for it. At such moments it's important to remember that code changes, and every
change can lead to a <a href="https://ru.wikipedia.org/wiki/%D0%A0%D0%B5%D0%B3%D1%80%D0%B5%D1%81%D1%81%D0%B8%D0%BE%D0%BD%D0%BD%D0%BE%D0%B5_%D1%82%D0%B5%D1%81%D1%82%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5">regression</a>,
and a test written in advance can save hours of debugging.</p>
<h3 id="bag-pri-kotorom-ispolniaetsia-ne-ozhidaemyi-kod">A bug where unexpected code is executed</h3>
<p>A rarer case is when a test passes, but the code coverage report shows that the code you expect to be
executed actually isn't executed.</p>
<p>This may be due to a bug in the test: a different method is called instead of the right one, and it returns the same result.
Such a bug creates the confidence that the code works and regressions are tracked, even though that isn't true.
Eventually this can lead to bugs reaching production, with all the consequences, as well as to a lengthy
search for the cause.</p>
<p>The same result can be caused by a bug in the code. For example, the web application's routing is misconfigured and the request
is sent to another method, which by chance returns the expected result. This leads to broken code
reaching production, and in this case the bug is quite hard to find if you rely only on tests.</p>
<p>Studying the code coverage report before pushing changes to the repository lets you find such bugs fairly quickly
and fix them before they cause damage, not to mention the time saved.</p>
<h2 id="mertvyi-kod_1">Dead code</h2>
<h3 id="staryi-kod-kotoryi-uzhe-nigde-ne-ispolzuetsia">Old code that is no longer used anywhere</h3>
<p>In some cases, the coverage report can help detect code that is no longer used anywhere. For example, private
methods that are no longer called anywhere, including in tests, since only the public interface should be tested.</p>
<p>There's nothing pleasant about spending time studying code that turns out to be dead. Such code should be removed
right away so that it doesn't get in the way of further maintenance of the application. If you think this code might come in handy someday,
remove it anyway if it isn't needed right now. Retrieving it from the version control system when needed
won't be difficult.</p>
<h3 id="mertvyi-kod-v-testakh">Dead code in tests</h3>
<p>Another, trickier example of dead code is dead code in tests. You loop over a list of objects and
perform assertions on each of them. If the list turns out to be empty for some reason, the test passes, although
not a single assertion was performed. Such bugs are also easy to find with the code coverage report, because the loop
body will be shown as uncovered code.</p>
<h2 id="zakliuchenie_1">Conclusion</h2>
<p>The code coverage report is a very important tool in a developer's hands. After each change, you should look at
how the report changes. Ideally, this should happen automatically as part of the CI pipeline. If the number of uncovered
lines increases, the build should fail, or at least a warning about it should be sent.
This helps avoid a number of bugs and keep the code in a healthier state overall.</p>
<h1>Django project optimization guide (part 3)</h1>
<p>Published 2017-09-18T09:00:00+03:00 by Yuri Shikanov</p>
<p>Other parts of this guide:</p>
<ul>
<li><a href="/en/django-project-optimization-part-1">Part 1. Profiling and Django settings</a></li>
<li><a href="/en/django-project-optimization-part-2/">Part 2. Working with database</a></li>
<li>Part 3. Caching</li>
</ul>
<p>In this part of the guide, I will cover the most valuable approach to achieving high performance: caching.
The essence of caching is that you put the most frequently used data into fast storage to speed up access to it.
It's important to understand that fast storage (i.e. memory) often has limited capacity, so we should use it only for
data that will be used often.</p>
<h2 id="django-cache-framework">Django cache framework</h2>
<p>Django has several built-in caching features. The cache storage is configured with the <code>CACHES</code> dict in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">CACHES</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"default"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">"BACKEND"</span><span class="p">:</span> <span class="s2">"django.core.cache.backends.db.DatabaseCache"</span><span class="p">,</span>
        <span class="s2">"LOCATION"</span><span class="p">:</span> <span class="s2">"my_cache_table"</span><span class="p">,</span>
    <span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Django ships with several cache backends. Let's look at some of them:</p>
<ul>
<li><code>DummyCache</code> doesn't cache anything; it's used in development or test environments to temporarily disable caching;</li>
<li><code>DatabaseCache</code> stores the cache in the database. It isn't very fast, but it can be useful for storing the results
of long calculations or complex database queries;</li>
<li><code>MemcachedCache</code> uses <a href="http://memcached.org/">Memcached</a> as the cache storage. You need running Memcached
server(s) to use this backend.</li>
</ul>
<p><code>MemcachedCache</code> is the most suitable backend for production use, and <code>DatabaseCache</code> can be useful in specific cases.
Django also supports third-party backends; for example, Redis is a good option for a cache backend. Redis
provides <a href="http://antirez.com/news/94">more features</a> than Memcached, and quite possibly you're already using
it in your project. You can install the <code>django-redis</code> package and
<a href="http://niwinz.github.io/django-redis/latest/#_configure_as_cache_backend">configure</a> it as a cache backend.</p>
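<p>For reference, a <code>settings.py</code> fragment in the style of the django-redis documentation linked above
might look like the following. The host, port and database number are placeholders for your own Redis setup.
Option names can change between django-redis releases, so check the documentation for the version you install.</p>

```python
# settings.py fragment: Redis as the default cache via django-redis
# (pip install django-redis). Host, port and DB number are placeholders.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
```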
<h3 id="the-per-site-cache">The per-site cache</h3>
<p>If your project has no dynamic content, you can solve the caching problem simply by enabling
per-site caching. To do so, make a few changes to your <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">MIDDLEWARE</span> <span class="o">=</span> <span class="p">[</span>
    <span class="s1">'django.middleware.cache.UpdateCacheMiddleware'</span><span class="p">,</span>
    <span class="c1"># place all other middlewares here</span>
    <span class="s1">'django.middleware.cache.FetchFromCacheMiddleware'</span><span class="p">,</span>
<span class="p">]</span>
<span class="c1"># Key in `CACHES` dict</span>
<span class="n">CACHE_MIDDLEWARE_ALIAS</span> <span class="o">=</span> <span class="s1">'default'</span>
<span class="c1"># Additional prefix for cache keys</span>
<span class="n">CACHE_MIDDLEWARE_KEY_PREFIX</span> <span class="o">=</span> <span class="s1">''</span>
<span class="c1"># Cache key TTL in seconds</span>
<span class="n">CACHE_MIDDLEWARE_SECONDS</span> <span class="o">=</span> <span class="mi">600</span>
</pre></div>
<p>As you can see, <code>UpdateCacheMiddleware</code> must go at the beginning of the <code>MIDDLEWARE</code> list and
<code>FetchFromCacheMiddleware</code> at the end. After that, responses to GET and HEAD requests will be cached
for <code>CACHE_MIDDLEWARE_SECONDS</code> seconds.</p>
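<p>Here is a deliberately simplified model in plain Python of what the two middlewares do together. It is a toy, not
Django's implementation. Django builds cache keys from the URL plus the headers listed in <code>Vary</code> and takes
response headers such as <code>Cache-Control</code> into account. It also splits the work between
<code>FetchFromCacheMiddleware</code>, which looks up a stored response before the view runs, and
<code>UpdateCacheMiddleware</code>, which stores the response afterwards. The class name <code>ToyPageCache</code> and its
call signature are hypothetical.</p>

```python
import time


class ToyPageCache:
    """Toy per-site cache: responses to GET/HEAD are reused for `seconds`."""

    CACHEABLE_METHODS = {"GET", "HEAD"}

    def __init__(self, get_response, seconds=600, clock=time.monotonic):
        self.get_response = get_response  # the "view" being wrapped
        self.seconds = seconds            # plays the role of CACHE_MIDDLEWARE_SECONDS
        self.clock = clock                # injectable so expiry can be tested
        self._store = {}                  # (method, path) -> (expires_at, response)

    def __call__(self, method, path):
        if method not in self.CACHEABLE_METHODS:
            return self.get_response(method, path)  # e.g. POST is never cached
        key = (method, path)
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # fetch step: the view is not called at all
        response = self.get_response(method, path)
        self._store[key] = (now + self.seconds, response)  # update step
        return response


if __name__ == "__main__":
    renders = []

    def view(method, path):
        renders.append(path)
        return "page %s, render #%d" % (path, len(renders))

    fake_now = [0.0]
    cache = ToyPageCache(view, seconds=600, clock=lambda: fake_now[0])
    print(cache("GET", "/articles/"))  # page /articles/, render #1
    print(cache("GET", "/articles/"))  # page /articles/, render #1 (cached)
    fake_now[0] = 601.0
    print(cache("GET", "/articles/"))  # page /articles/, render #2 (expired)
```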
<p>You can also clear the cache programmatically:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.cache</span> <span class="kn">import</span> <span class="n">caches</span>
<span class="n">cache</span> <span class="o">=</span> <span class="n">caches</span><span class="p">[</span><span class="s1">'default'</span><span class="p">]</span> <span class="c1"># `default` is a key from CACHES dict in settings.py</span>
<span class="n">cache</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span>
</pre></div>
<p>Or, if necessary, you can clear the cache directly in the cache storage. An example for Redis:</p>
<div class="highlight"><pre><span></span>$ redis-cli -n <span class="m">1</span> FLUSHDB <span class="c1"># 1 is a DB number specified in settings.py</span>
</pre></div>
<h3 id="the-per-view-caching">The per-view caching</h3>
<p>If caching the whole site isn't appropriate, you can enable caching only for specific views (e.g. the most heavily used ones).
Django provides the <code>cache_page</code> decorator for this:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">get_object_or_404</span><span class="p">,</span> <span class="n">render</span>
<span class="kn">from</span> <span class="nn">django.views.decorators.cache</span> <span class="kn">import</span> <span class="n">cache_page</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Author</span>  <span class="c1"># assumes an Author model in this app</span>

<span class="nd">@cache_page</span><span class="p">(</span><span class="mi">600</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="s1">'default'</span><span class="p">,</span> <span class="n">key_prefix</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
    <span class="n">author</span> <span class="o">=</span> <span class="n">get_object_or_404</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span>
    <span class="n">show_articles_link</span> <span class="o">=</span> <span class="n">author</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span>
        <span class="n">request</span><span class="p">,</span> <span class="s1">'blog/author.html'</span><span class="p">,</span>
        <span class="n">context</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">,</span> <span class="n">show_articles_link</span><span class="o">=</span><span class="n">show_articles_link</span><span class="p">))</span>
</pre></div>
<p><code>cache_page</code> accepts the following arguments:</p>
<ul>
<li>the first, required argument is the cache time-to-live in seconds,</li>
<li><code>cache</code> - key from <code>CACHES</code> dict,</li>
<li><code>key_prefix</code> - cache key prefix.</li>
</ul>
<p>You can also apply this decorator in <code>urls.py</code>, which is convenient for class-based views:</p>
<div class="highlight"><pre><span></span><span class="n">urlpatterns</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^$'</span><span class="p">,</span> <span class="n">cache_page</span><span class="p">(</span><span class="mi">600</span><span class="p">)(</span><span class="n">ArticlesListView</span><span class="o">.</span><span class="n">as_view</span><span class="p">()),</span> <span class="n">name</span><span class="o">=</span><span class="s1">'articles_list'</span><span class="p">),</span>
    <span class="o">...</span>
<span class="p">]</span>
</pre></div>
<p>If the page content depends on, for example, the authenticated user, this approach won't work. In that case,
use one of the approaches described below.</p>
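<p>To see why per-user content is a problem, here is a minimal stdlib-only sketch of a page cache in the spirit of <code>cache_page</code>. Responses are stored under a key built from <code>key_prefix</code> and the request path and reused until the TTL expires. The names <code>page_cache</code>, <code>Request</code> and <code>render_greeting</code> are hypothetical. Django's real cache keys also take the headers listed in <code>Vary</code> into account, and this sketch ignores that, so treat it as an illustration only.</p>

```python
import time
from dataclasses import dataclass
from functools import wraps


@dataclass
class Request:
    # Hypothetical minimal request: only what the sketch needs.
    path: str
    username: str


def page_cache(timeout, key_prefix=""):
    """Cache a view's output per URL path for `timeout` seconds (sketch)."""
    store = {}  # key -> (expires_at, response)

    def decorator(view):
        @wraps(view)
        def wrapper(request):
            # The key ignores the user, just like a purely URL-based page cache.
            key = f"{key_prefix}:{request.path}"
            hit = store.get(key)
            if hit is not None and hit[0] > time.monotonic():
                return hit[1]  # fresh cached response: the view is not called
            response = view(request)
            store[key] = (time.monotonic() + timeout, response)
            return response
        return wrapper
    return decorator


@page_cache(600, key_prefix="blog")
def render_greeting(request):
    return f"Hello, {request.username}!"


print(render_greeting(Request("/", "alice")))  # Hello, alice!
print(render_greeting(Request("/", "bob")))    # Hello, alice!  (bob sees alice's page)
```

<p>Adding the username to the key would fix the leak, but it would store one copy of the page per user and sharply reduce the hit rate.</p>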
<h3 id="template-fragment-caching">Template fragment caching</h3>
<p>In the previous part of this guide, I mentioned that QuerySet objects are lazy and SQL queries are delayed as long as possible.
We can take advantage of this and cache template fragments, which lets us skip the corresponding SQL queries for the cache TTL. Django provides the <code>cache</code> template
tag for this:</p>
<div class="highlight"><pre><span></span>{% load cache %}
<span class="p"><</span><span class="nt">h1</span><span class="p">></span>Articles list<span class="p"></</span><span class="nt">h1</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Authors count: {{ authors_count }}<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span>Top authors<span class="p"></</span><span class="nt">h2</span><span class="p">></span>
{% cache 500 top_author %}
<span class="p"><</span><span class="nt">ul</span><span class="p">></span>
{% for author in top_authors %}
<span class="p"><</span><span class="nt">li</span><span class="p">></span>{{ author.username }} ({{ author.articles_count }})<span class="p"></</span><span class="nt">li</span><span class="p">></span>
{% endfor %}
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
{% endcache %}
{% cache 500 articles_list %}
{% for article in articles %}
<span class="p"><</span><span class="nt">article</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span>{{ article.title }}<span class="p"></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span><span class="p">></span>{{ article.created_at }}<span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Author: <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"{% url 'author_page' username=article.author.username %}"</span><span class="p">></span>{{ article.author.username }}<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Tags:
{% for tag in article.tags.all %}
{{ tag }}{% if not forloop.last %}, {% endif %}
{% endfor %}
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
{% endfor %}
{% endcache %}
</pre></div>
<p>Here is the effect of adding the <code>cache</code> template tags to our template (before and after, respectively):</p>
<p><img alt="django-templates-caching-results" src="/media/2017/8/templates-caching.png"/></p>
<p><code>cache</code> accepts the following arguments:</p>
<ul>
<li>the first, required argument is the cache TTL in seconds,</li>
<li>the second, required argument is the fragment name,</li>
<li>optional additional variables that identify the fragment by dynamic data,</li>
<li>the optional keyword argument <code>using='default'</code>, which should correspond to a key in the <code>CACHES</code> dict.</li>
</ul>
<p>Suppose we need to cache a template fragment separately for each user.
To do this, pass an additional variable that identifies the user to the <code>cache</code> template tag:</p>
<div class="highlight"><pre><span></span>{% cache 500 personal_articles_list request.user.username %}
<span class="c"><!-- ... --></span>
{% endcache %}
</pre></div>
<p>You can pass several such variables. The cache key is then built from the combination of their values.</p>
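<p>Conceptually, the tag derives the cache key from the fragment name plus the values of the extra variables. The stdlib sketch below shows the idea with a hypothetical <code>fragment_key</code> helper. Its key format is an assumption and is not Django's. To invalidate a real fragment by hand, Django offers <code>make_template_fragment_key</code> in <code>django.core.cache.utils</code>, for example <code>cache.delete(make_template_fragment_key('personal_articles_list', [username]))</code>.</p>

```python
import hashlib
from urllib.parse import quote


def fragment_key(fragment_name, *vary_on):
    """Build a cache key from a fragment name and extra variables (sketch)."""
    # Quote each value so that ("a:b",) and ("a", "b") cannot collide,
    # then hash the result to keep the key short and backend-safe.
    joined = ":".join(quote(str(value), safe="") for value in vary_on)
    digest = hashlib.md5(joined.encode("utf-8")).hexdigest()
    return f"fragment:{fragment_name}:{digest}"


alice_key = fragment_key("personal_articles_list", "alice")
bob_key = fragment_key("personal_articles_list", "bob")
print(alice_key != bob_key)  # True: each user gets a separate cache entry
```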
<h3 id="the-low-level-caching">The low-level caching</h3>
<p>Django also provides a low-level caching API, which you can use to store, retrieve, and delete data by cache key.
Here is a small example:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.cache</span> <span class="kn">import</span> <span class="n">cache</span>
<span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">authors_count</span> <span class="o">=</span> <span class="n">cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'authors_count'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">authors_count</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">authors_count</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
<span class="n">cache</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s1">'authors_count'</span><span class="p">,</span> <span class="n">authors_count</span><span class="p">)</span>
<span class="n">context</span><span class="p">[</span><span class="s1">'authors_count'</span><span class="p">]</span> <span class="o">=</span> <span class="n">authors_count</span>
<span class="o">...</span>
<span class="k">return</span> <span class="n">context</span>
</pre></div>
<p>Here we look up the authors count in the cache under the <code>authors_count</code> key. <code>cache.get</code> returns
<code>None</code> if the key is missing or has expired. Otherwise, it returns the stored value, which we can use without querying the database.
On a miss, the code fetches the count from the database and saves it in the cache, so subsequent requests skip the database until
the entry expires. Because no timeout is passed to <code>cache.set</code>, the backend's default <code>TIMEOUT</code> applies.</p>
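<p>The same check-then-populate pattern, often called cache-aside, can be sketched without Django. <code>TTLCache</code> and <code>get_or_compute</code> below are hypothetical stand-ins for a cache backend and the view logic. In Django itself, <code>cache.get_or_set(key, default, timeout)</code> wraps similar logic, and <code>default</code> may be a callable. The injectable <code>clock</code> exists only so that expiry can be tested.</p>

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= self._clock():
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, timeout=300):
        self._data[key] = (value, self._clock() + timeout)

    def delete(self, key):
        self._data.pop(key, None)


def get_or_compute(cache, key, compute, timeout=300):
    value = cache.get(key)
    if value is None:  # miss: hit the "database" and remember the result
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def count_authors():
    calls.append(1)  # stands in for Author.objects.count()
    return 42


cache = TTLCache()
get_or_compute(cache, "authors_count", count_authors, timeout=60)
get_or_compute(cache, "authors_count", count_authors, timeout=60)
print(len(calls))  # 1: the second call was served from the cache
```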
<p>Besides database query results, it makes sense to cache the results of complex calculations or requests to external services.
Keep in mind that the underlying data can change while the cache still holds stale information. There are several ways to minimize
the chance of serving stale data from the cache:</p>
<ul>
<li>choose a cache TTL that matches how often the data changes,</li>
<li>add cache invalidation.</li>
</ul>
<p>Cache invalidation means deleting the cached value whenever the underlying data changes. Let's add invalidation to the authors count
example:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models.signals</span> <span class="kn">import</span> <span class="n">post_delete</span><span class="p">,</span> <span class="n">post_save</span>
<span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">receiver</span>
<span class="kn">from</span> <span class="nn">django.core.cache</span> <span class="kn">import</span> <span class="n">cache</span>
<span class="k">def</span> <span class="nf">clear_authors_count_cache</span><span class="p">():</span>
<span class="n">cache</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="s1">'authors_count'</span><span class="p">)</span>
<span class="nd">@receiver</span><span class="p">(</span><span class="n">post_delete</span><span class="p">,</span> <span class="n">sender</span><span class="o">=</span><span class="n">Author</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_post_delete_handler</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">clear_authors_count_cache</span><span class="p">()</span>
<span class="nd">@receiver</span><span class="p">(</span><span class="n">post_save</span><span class="p">,</span> <span class="n">sender</span><span class="o">=</span><span class="n">Author</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_post_save_handler</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="k">if</span> <span class="n">kwargs</span><span class="p">[</span><span class="s1">'created'</span><span class="p">]:</span>
<span class="n">clear_authors_count_cache</span><span class="p">()</span>
</pre></div>
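<p>Here we register two signal handlers, one for creating and one for deleting authors. Whenever the number of authors changes, they delete the
<code>authors_count</code> key, so the next request fetches the new count from the database. The <code>post_save</code> handler checks <code>created</code> because editing an existing author doesn't change the count. Note that receivers are connected only once the module defining them has been imported. Django's documentation recommends importing it in your app's <code>AppConfig.ready()</code> method.</p>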
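<p>Stripped of Django specifics, the idea is that every write path that can change the count also deletes the cached value, so the next read recomputes it. In this stdlib sketch, the hypothetical <code>AuthorStore</code> calls its invalidation hook directly in the places where Django would send <code>post_save</code> or <code>post_delete</code>. Keep in mind that Django does not send <code>post_save</code> for bulk operations such as <code>QuerySet.update()</code> or <code>bulk_create()</code>, so those would bypass signal-based invalidation.</p>

```python
class AuthorStore:
    """Hypothetical in-memory 'table' with a cached count."""

    def __init__(self):
        self._authors = set()
        self._cache = {}
        self.count_queries = 0  # how often the "database" was asked

    def _clear_count_cache(self):
        # Plays the role of the post_save/post_delete receivers.
        self._cache.pop("authors_count", None)

    def create(self, username):
        self._authors.add(username)
        self._clear_count_cache()

    def delete(self, username):
        self._authors.discard(username)
        self._clear_count_cache()

    def count(self):
        if "authors_count" not in self._cache:
            self.count_queries += 1
            self._cache["authors_count"] = len(self._authors)
        return self._cache["authors_count"]


store = AuthorStore()
store.create("alice")
print(store.count(), store.count())  # 1 1  (one query, then a cache hit)
store.create("bob")                  # invalidates the cached count
print(store.count())                 # 2
print(store.count_queries)           # 2
```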
<h2 id="cached_property_1">cached_property</h2>
<p>Besides the cache framework, Django can cache the result of a method call right in process memory. This only works
for methods that take no arguments other than <code>self</code>, and the cached value lives as long as the corresponding instance does.</p>
<p><code>cached_property</code> is a decorator included in Django (<code>django.utils.functional</code>). It turns a method into a property whose value is computed on first access and then stored on the instance. Let's look at
an example:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">username</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">db_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">email</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">EmailField</span><span class="p">()</span>
<span class="n">bio</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
<span class="nd">@cached_property</span>
<span class="k">def</span> <span class="nf">articles_count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
<p>Let's check how the <code>articles_count</code> property works with SQL logging enabled:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Author</span>
<span class="o">>>></span> <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
<span class="p">(</span><span class="mf">0.002</span><span class="p">)</span> <span class="n">SELECT</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"id"</span><span class="p">,</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"username"</span><span class="p">,</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"email"</span><span class="p">,</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"bio"</span> <span class="n">FROM</span> <span class="s2">"blog_author"</span> <span class="n">ORDER</span> <span class="n">BY</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"id"</span> <span class="n">ASC</span> <span class="n">LIMIT</span> <span class="mi">1</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">author</span><span class="o">.</span><span class="n">articles_count</span>
<span class="p">(</span><span class="mf">0.001</span><span class="p">)</span> <span class="n">SELECT</span> <span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="n">AS</span> <span class="s2">"__count"</span> <span class="n">FROM</span> <span class="s2">"blog_article"</span> <span class="n">WHERE</span> <span class="s2">"blog_article"</span><span class="o">.</span><span class="s2">"author_id"</span> <span class="o">=</span> <span class="mi">142601</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">142601</span><span class="p">,)</span>
<span class="mi">28</span>
<span class="o">>>></span> <span class="n">author</span><span class="o">.</span><span class="n">articles_count</span>
<span class="mi">28</span>
</pre></div>
<p>As you can see, repeated access to the <code>articles_count</code> property doesn't trigger any SQL queries. However, if we create another
instance of <code>Author</code>, its property won't be cached until it is first accessed, because the cache is tied
to a specific instance of the <code>Author</code> class.</p>
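<p>Since Python 3.8 the standard library ships a similar decorator, <code>functools.cached_property</code>, so the per-instance behaviour is easy to try outside Django. The <code>Author</code> class below is a plain-Python stand-in, and a counter replaces the SQL query. Deleting the attribute (<code>del author.articles_count</code>) drops the cached value. This also works with Django's <code>cached_property</code>.</p>

```python
from functools import cached_property


class Author:
    """Plain-Python stand-in for the model; `queries` counts fake SQL calls."""

    def __init__(self, articles):
        self._articles = articles
        self.queries = 0

    @cached_property
    def articles_count(self):
        self.queries += 1  # would be self.articles.count() in Django
        return len(self._articles)


first = Author(["a"] * 28)
print(first.articles_count, first.articles_count, first.queries)  # 28 28 1

second = Author(["a"] * 28)  # a new instance starts with an empty cache
print(second.articles_count, second.queries)  # 28 1

del first.articles_count  # drop the cached value ...
print(first.articles_count, first.queries)  # 28 2  ... and recompute it
```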
<h2 id="cacheops">Cacheops</h2>
<p><a href="https://github.com/Suor/django-cacheops">django-cacheops</a> is a 3rd party package, that allows you quickly enable caching
of database requests almost without code changes. You can solve most of the caching cases just by setting up the package in
<code>settings.py</code>.</p>
<p>Let's check an example of simple cacheops usage. As a test project, I will use the
<a href="https://github.com/dizballanze/django-optimization-guide-2-sample">sample project</a> from the previous part of this guide.</p>
<p>Cacheops is using Redis as a cache storage, so we need to setup Redis connection parameters in <code>settings.py</code>.</p>
<div class="highlight"><pre><span></span><span class="n">CACHEOPS_REDIS</span> <span class="o">=</span> <span class="s2">"redis://localhost:6379/1"</span>
<span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">[</span>
<span class="o">...</span>
<span class="s1">'cacheops'</span><span class="p">,</span>
<span class="p">]</span>
<span class="n">CACHEOPS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'blog.*'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'all'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span><span class="o">*</span><span class="mi">15</span><span class="p">},</span>
<span class="s1">'*.*'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>With just these settings, all database queries to all models of the <code>blog</code> application are cached for 15 minutes.
As a bonus, cacheops invalidates the cache automatically, not only by time but also on events, by subscribing to the signals
of the corresponding models.</p>
<p>If necessary, you can configure caching more precisely, per model and per query type.
A few examples:</p>
<div class="highlight"><pre><span></span><span class="n">CACHEOPS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'blog.author'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'all'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span><span class="p">},</span> <span class="c1"># cache all queries to `Author` model for an hour</span>
<span class="s1">'blog.article'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'fetch'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">10</span><span class="p">},</span> <span class="c1"># cache `Article` fetch queries for 10 minutes</span>
<span class="c1"># Or</span>
<span class="s1">'blog.article'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'get'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">15</span><span class="p">},</span> <span class="c1"># cache `Article` get queries for 15 minutes</span>
<span class="c1"># Or</span>
<span class="s1">'blog.article'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'count'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">3</span><span class="p">},</span> <span class="c1"># cache `Article` fetch queries for 3 hours</span>
<span class="s1">'*.*'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span><span class="p">},</span>
<span class="s1">'*.*'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span><span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>Cacheops also has several other useful features:</p>
<ul>
<li>manual caching via a <code>QuerySet</code> method: <code>Article.objects.filter(tag=2).cache()</code>,</li>
<li>caching of function results, tied to models, with automatic invalidation,</li>
<li>caching of views, tied to models, with automatic invalidation,</li>
<li>template fragment caching, and more.</li>
</ul>
<p>You should check out <a href="https://github.com/Suor/django-cacheops/blob/master/README.rst">cacheops' README</a> for details.</p>
<h2 id="http-caching">HTTP caching</h2>
<p>If your project is served over HTTP, consider using the caching capabilities built into the HTTP protocol. They allow responses to
safe requests (GET and HEAD) to be cached on the client (e.g., in the browser) and on intermediate proxy servers.</p>
<p>Caching is controlled by HTTP headers, which you can set either in the application or in the web server (Nginx, Apache, etc.).</p>
<p>Django provides several convenient middleware classes and view decorators to control HTTP caching.</p>
<h3 id="vary">Vary</h3>
<p>The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary"><code>Vary</code></a> header allows to specify a list of header names,
which values will be used to create cache keys. Django provides view decorator
<a href="https://docs.djangoproject.com/en/1.11/topics/cache/#using-vary-headers"><code>vary_on_headers</code></a> for control of this header.</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.views.decorators.vary</span> <span class="kn">import</span> <span class="n">vary_on_headers</span>
<span class="nd">@vary_on_headers</span><span class="p">(</span><span class="s1">'User-Agent'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="o">...</span>
</pre></div>
<p>In this case, responses are cached separately for each distinct value of the <code>User-Agent</code> header.</p>
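<p>Roughly speaking, a cache that honours <code>Vary</code> adds the request's values of the listed headers to the key under which it stores the response. The hypothetical <code>vary_cache_key</code> function below sketches this with the standard library. Real caches, including Django's own key generation, differ in details such as normalisation and hashing.</p>

```python
import hashlib


def vary_cache_key(url, vary_header, request_headers):
    """Build a cache key from the URL plus the headers named in Vary (sketch)."""
    names = sorted(h.strip().lower() for h in vary_header.split(",") if h.strip())
    # HTTP header names are case-insensitive, so normalise them before lookup.
    headers = {k.lower(): v for k, v in request_headers.items()}
    parts = [url] + [f"{name}={headers.get(name, '')}" for name in names]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


firefox = vary_cache_key("/author/alice/", "User-Agent", {"User-Agent": "Firefox"})
chrome = vary_cache_key("/author/alice/", "User-Agent", {"user-agent": "Chrome"})
print(firefox == chrome)  # False: different User-Agent, different cache entry
```

<p>Varying on a header with many distinct values, such as <code>User-Agent</code>, multiplies the number of cache entries and lowers the hit rate.</p>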
<h3 id="cache-control">Cache-Control</h3>
<p>The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control"><code>Cache-Control</code></a> header is used to control
how caching is performed.
<a href="https://docs.djangoproject.com/en/1.11/topics/cache/#controlling-cache-using-other-headers"><code>cache_control</code></a>
view decorator allows setting this header directives.</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.views.decorators.cache</span> <span class="kn">import</span> <span class="n">cache_control</span>
<span class="nd">@cache_control</span><span class="p">(</span><span class="n">private</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">max_age</span><span class="o">=</span><span class="mi">3600</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="o">...</span>
</pre></div>
<p>Let's look at some <code>Cache-Control</code> directives:</p>
<ul>
<li><code>public</code>, <code>private</code> - allow or forbid caching in shared caches (proxy servers, etc.). This is an important directive
because <code>private</code> keeps content intended for a specific user out of shared caches.</li>
<li><code>no-cache</code> - a cache may store the response, but must revalidate it with the origin server before each reuse (to forbid storing it at all, use <code>no-store</code>).</li>
<li><code>max-age</code> - the number of seconds after which a cached copy is considered stale.</li>
</ul>
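<p>The decorator turns its keyword arguments into directives: underscores become hyphens, <code>True</code> produces a bare directive, and other values are written as <code>name=value</code>. The hypothetical <code>build_cache_control</code> below mimics that formatting with the standard library. It comes with a simple freshness check based on <code>max-age</code> and is a sketch, not Django's implementation.</p>

```python
def build_cache_control(**directives):
    """Format keyword arguments as a Cache-Control header value (sketch)."""
    parts = []
    for name, value in directives.items():
        name = name.replace("_", "-")  # max_age -> max-age
        if value is True:
            parts.append(name)  # e.g. "private", "no-cache"
        elif value is not False and value is not None:
            parts.append(f"{name}={value}")
    return ", ".join(parts)


def is_fresh(age_seconds, max_age):
    """A cached copy may be reused while its age is below max-age."""
    return age_seconds < max_age


header = build_cache_control(private=True, max_age=3600)
print(header)                # private, max-age=3600
print(is_fresh(1800, 3600))  # True
print(is_fresh(3600, 3600))  # False: the copy is now stale
```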
<h3 id="last-modified-etag">Last-Modified & Etag</h3>
<p>HTTP also provides more advanced caching capabilities that let the server verify data freshness via
conditional requests. For them to work, the server should send one or both of the following headers:</p>
<ul>
<li><code>Last-Modified</code> - the date and time of the resource's last modification,</li>
<li><code>ETag</code> - a resource version identifier (a hash or a version number).</li>
</ul>
<p>On subsequent requests, the client sends these values back in the <code>If-Modified-Since</code> and <code>If-None-Match</code> headers, respectively.
The server checks whether the resource has changed since then and, if it hasn't, returns a 304 response without a body.
This way, the resource is downloaded again only when it has actually changed, which saves time and server resources.</p>
<p>Besides caching, these headers are also used for precondition checks in unsafe requests (POST, PUT, etc.),
but that is beyond the scope of this guide.</p>
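<p>On the server side, the decision comes down to comparing the validators sent by the client with the current ones. Below is a simplified stdlib sketch. The function name is hypothetical, and the sketch ignores weak ETags, <code>*</code> and other details that RFC 7232 specifies. Following that RFC, <code>If-None-Match</code> takes precedence over <code>If-Modified-Since</code> when both are present.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200 (sketch).

    Header names are assumed to arrive in canonical case for simplicity.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When the client sent ETags, compare them and ignore the date.
        client_tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if etag in client_tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and last_modified is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second precision, so drop microseconds first.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


updated = datetime(2017, 8, 3, 10, 0, tzinfo=timezone.utc)
http_date = format_datetime(updated, usegmt=True)  # "Thu, 03 Aug 2017 10:00:00 GMT"

print(conditional_status({"If-None-Match": '"v2"'}, '"v2"', updated))        # 304
print(conditional_status({"If-Modified-Since": http_date}, '"v3"', updated))  # 304
print(conditional_status({}, '"v3"', updated))                               # 200
```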
<p>Django provides several ways to control the <code>ETag</code> and <code>Last-Modified</code> headers. The simplest one is
<code>ConditionalGetMiddleware</code>. This middleware adds an <code>ETag</code> header, computed from the response content, to responses to GET requests.
It also checks the request headers and returns 304 if the resource hasn't changed.</p>
<p>This approach has several drawbacks:</p>
<ul>
<li>the middleware applies to all views of the project, which isn't always necessary,</li>
<li>the full response still has to be generated to determine the resource version, which costs a lot of server resources,</li>
<li>it works only for GET requests.</li>
</ul>
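<p>The second drawback follows from how such an ETag is derived. It is a digest of the response body, so the page must be fully rendered before the server can tell whether anything changed. The stdlib sketch below uses a hypothetical <code>render_page</code> to stand in for the view and hashes its output. It illustrates the idea and is not Django's exact algorithm.</p>

```python
import hashlib

render_calls = 0


def render_page(articles):
    """Stands in for an expensive view: queries plus template rendering."""
    global render_calls
    render_calls += 1
    return "".join(f"<article>{title}</article>" for title in articles).encode()


def body_etag(body):
    # A strong ETag is a quoted opaque string; here, an MD5 hex digest of the body.
    return '"%s"' % hashlib.md5(body).hexdigest()


first = body_etag(render_page(["Caching", "Profiling"]))
again = body_etag(render_page(["Caching", "Profiling"]))
print(first == again, render_calls)  # True 2: same tag, but the page was rendered twice
```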
<p>A more precise approach is the <code>condition</code> view decorator, which lets you specify custom
functions that compute the <code>ETag</code> and/or <code>Last-Modified</code> values. In these functions, you can determine
the resource version much more cheaply, for example, by fetching a single field from the database instead of generating
the full response.</p>
<div class="highlight"><pre><span></span><span class="c1"># models.py</span>
<span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="o">...</span>
<span class="n">updated_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">auto_now</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.views.decorators.http</span> <span class="kn">import</span> <span class="n">condition</span>
<span class="k">def</span> <span class="nf">author_updated_at</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="n">updated_at</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'updated_at'</span><span class="p">,</span> <span class="n">flat</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">if</span> <span class="n">updated_at</span><span class="p">:</span>
<span class="k">return</span> <span class="n">updated_at</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="nd">@condition</span><span class="p">(</span><span class="n">last_modified_func</span><span class="o">=</span><span class="n">author_updated_at</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="o">...</span>
</pre></div>
<p>The <code>author_updated_at</code> function performs a simple database query that returns the resource's last update date. This is much more
efficient than fetching all the data the page needs and rendering a template. Whenever the author is saved, the function
returns a new date, which invalidates the client's cached copy.</p>
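<p>The benefit comes from the order of operations. The cheap function runs first, and the expensive view runs only when the client's copy is out of date. The stdlib sketch below imitates that flow with a hypothetical <code>conditional_on_last_modified</code> decorator. It uses tuples and dictionaries in place of Django's request and response objects. Django's real <code>condition</code> decorator also handles ETags, unsafe methods and more.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from functools import wraps

# Hypothetical data: each author's last update time, as the DB would return it.
UPDATED_AT = {"alice": datetime(2017, 8, 3, 12, 0, tzinfo=timezone.utc)}
view_calls = []


def conditional_on_last_modified(last_modified_func):
    """Run a cheap freshness check before the (expensive) view (sketch)."""
    def decorator(view):
        @wraps(view)
        def wrapper(headers, username):
            last_modified = last_modified_func(username)  # one cheap lookup
            since = headers.get("If-Modified-Since")
            if last_modified is not None and since is not None:
                if last_modified <= parsedate_to_datetime(since):
                    return 304, {}, ""  # the view is never called
            status, out_headers, body = view(headers, username)
            if last_modified is not None:
                out_headers["Last-Modified"] = format_datetime(last_modified, usegmt=True)
            return status, out_headers, body
        return wrapper
    return decorator


def author_updated_at(username):
    return UPDATED_AT.get(username)  # stands in for the values_list() query


@conditional_on_last_modified(author_updated_at)
def author_page_view(headers, username):
    view_calls.append(username)  # stands in for queries + template rendering
    return 200, {}, f"<h1>{username}</h1>"


status, headers, _ = author_page_view({}, "alice")
print(status, headers["Last-Modified"])  # 200 Thu, 03 Aug 2017 12:00:00 GMT
status, _, _ = author_page_view({"If-Modified-Since": headers["Last-Modified"]}, "alice")
print(status, len(view_calls))  # 304 1
```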
<h2 id="static-files-caching_1">Static files caching</h2>
<p>Cache static files to speed up repeated page loads. This prevents the browser from downloading the same
scripts, styles, images, etc. again.</p>
<p>Most likely you won't serve static files with Django in production, because it's slow and insecure.
Usually a web server such as Nginx handles this task. Let's see how to set up caching of static files with Nginx:</p>
<div class="highlight"><pre><span></span><span class="k">server</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="kn">location</span> <span class="s">/static/</span> <span class="p">{</span>
<span class="kn">expires</span> <span class="s">360d</span><span class="p">;</span>
<span class="kn">alias</span> <span class="s">/home/www/proj/static/</span><span class="p">;</span>
<span class="p">}</span>
<span class="kn">location</span> <span class="s">/media/</span> <span class="p">{</span>
<span class="kn">expires</span> <span class="s">360d</span><span class="p">;</span>
<span class="kn">alias</span> <span class="s">/home/www/proj/media/</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Here:</p>
<ul>
<li><code>/static/</code> - the URL prefix for static files; its <code>alias</code> points to the directory where <code>collectstatic</code> copies them,</li>
<li><code>/media/</code> - the URL prefix for user-uploaded files.</li>
</ul>
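<p>A 360-day <code>expires</code> is only safe if a file gets a new URL whenever its content changes. One common way to guarantee this is to put a hash of the content into the file name. Django's own <code>ManifestStaticFilesStorage</code> works along these lines. The stdlib sketch below illustrates the idea with a hypothetical <code>hashed_name</code> helper.</p>

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path, content, length=12):
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.md5(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = hashed_name("js/script.js", b"console.log(1);")
v2 = hashed_name("js/script.js", b"console.log(2);")
print(v1 != v2)  # True: new content, new URL, so browsers fetch it again
```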
<p>Here we cache all static files for 360 days. For this to be safe, the URL of a static file must change whenever the file
changes, so that browsers load the new version. You can add a version number as a GET parameter, e.g.
<code>script.js?version=123</code>, but I prefer <a href="https://django-compressor.readthedocs.io/en/latest/">Django Compressor</a>,
which generates a unique file name for every script and stylesheet each time it changes.</p>Django project performance optimization (part 3)2017-08-03T13:53:00+03:002017-08-03T13:53:00+03:00Yuri Shikanovtag:None,2017-08-03:/../ru/django-project-optimization-part-3/<p>Other articles in the series:</p>
<ul>
<li><a href="/ru/django-project-optimization-part-1/">Part 1. Profiling and Django settings</a></li>
<li><a href="/ru/django-project-optimization-part-2/">Part 2. Working with the database</a></li>
<li>Part 3. Caching</li>
</ul>
<p>In this part of the series, we'll look at the most important approach to achieving high performance: caching. The idea of caching
is to put frequently used data into fast storage to speed up access to it. It's important to understand that
fast storage (for example, RAM) often has a very limited size, so it should be used only for
data that is highly likely to be requested.</p>
<h2 id="kesh-freimvork-django">Django cache framework</h2>
<p>Django provides a number of caching tools out of the box. The cache storage is configured with the <code>CACHES</code> dictionary
in <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">CACHES</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"default"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"BACKEND"</span><span class="p">:</span> <span class="s2">"django.core.cache.backends.db.DatabaseCache"</span><span class="p">,</span>
<span class="s2">"LOCATION"</span><span class="p">:</span> <span class="s2">"my_cache_table"</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Django ships with several built-in cache backends. Let's look at some of them:</p>
<ul>
<li><code>DummyCache</code> - doesn't cache anything; used during development/testing when you need to disable caching temporarily,</li>
<li><code>DatabaseCache</code> - stores the cache in the database (the table is created with <code>python manage.py createcachetable</code>); not the fastest option, but it can be useful for storing the results of long
computations or complex SQL queries,</li>
<li><code>MemcachedCache</code> - uses <a href="http://memcached.org/">Memcached</a> as the storage; to use this backend
you'll need to run one or more Memcached servers.</li>
</ul>
<p><code>MemcachedCache</code> is the best fit for production, and in some cases <code>DatabaseCache</code> can be useful.
Django also lets you use third-party backends. For example, Redis can be a good choice of cache
storage. Redis <a href="http://antirez.com/news/94">offers more features</a> than Memcached, and you are
probably already using it in your project. You can install the <code>django-redis</code> package
and <a href="http://niwinz.github.io/django-redis/latest/#_configure_as_cache_backend">configure</a> it as your cache backend.</p>
<h3 id="keshirovanie-vsego-saita">Caching the entire site</h3>
<p>If your site has no frequently changing dynamic content, you can solve the caching problem
simply by enabling caching for the entire site. To do this, add a few settings to <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">MIDDLEWARE</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'django.middleware.cache.UpdateCacheMiddleware'</span><span class="p">,</span>
<span class="c1"># place all other middlewares here</span>
<span class="s1">'django.middleware.cache.FetchFromCacheMiddleware'</span><span class="p">,</span>
<span class="p">]</span>
<span class="c1"># Key in `CACHES` dict</span>
<span class="n">CACHE_MIDDLEWARE_ALIAS</span> <span class="o">=</span> <span class="s1">'default'</span>
<span class="c1"># Additional prefix for cache keys</span>
<span class="n">CACHE_MIDDLEWARE_KEY_PREFIX</span> <span class="o">=</span> <span class="s1">''</span>
<span class="c1"># Cache key TTL in seconds</span>
<span class="n">CACHE_MIDDLEWARE_SECONDS</span> <span class="o">=</span> <span class="mi">600</span>
</pre></div>
<p>With the middleware shown above placed first and last in the list, responses to all GET and HEAD requests will be cached for
the number of seconds specified in <code>CACHE_MIDDLEWARE_SECONDS</code>.</p>
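<p>Why must <code>UpdateCacheMiddleware</code> come first and <code>FetchFromCacheMiddleware</code> last? Django runs middleware top to bottom on the way in and bottom to top on the way out. The stdlib-only sketch below models that "onion" order with plain functions (all names are hypothetical, not Django's API) and records when each layer runs: the fetch layer is the last thing before the view, and the update layer is the last thing to see the response, after every other middleware has had a chance to modify it (for example, by adding a <code>Vary</code> header).</p>

```python
from typing import Callable, List

Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]


def make_logging_middleware(name: str, log: List[str]) -> Middleware:
    """Build a hypothetical middleware that logs when it sees the request and the response."""
    def middleware(get_response: Handler) -> Handler:
        def handler(request: str) -> str:
            log.append(f"{name}: request")
            response = get_response(request)
            log.append(f"{name}: response")
            return response
        return handler
    return middleware


def build_chain(middlewares: List[Middleware], view: Handler) -> Handler:
    """Wrap the view so that middlewares[0] is the outermost layer, like MIDDLEWARE[0]."""
    handler = view
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler


log: List[str] = []
chain = build_chain(
    [
        make_logging_middleware("UpdateCache", log),
        make_logging_middleware("Other", log),
        make_logging_middleware("FetchFromCache", log),
    ],
    view=lambda request: f"page for {request}",
)
chain("/articles/")
```

<p>After the call, <code>log</code> shows <code>FetchFromCache</code> as the last layer on the request path and <code>UpdateCache</code> as the last layer on the response path, which is exactly the position each of them needs.</p>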
<p>If necessary, you can even clear the cache programmatically:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.cache</span> <span class="kn">import</span> <span class="n">caches</span>
<span class="n">cache</span> <span class="o">=</span> <span class="n">caches</span><span class="p">[</span><span class="s1">'default'</span><span class="p">]</span> <span class="c1"># `default` is a key from CACHES dict in settings.py</span>
<span class="n">cache</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span>
</pre></div>
<p>Alternatively, you can flush the cache directly in the storage you use. For example, for Redis:</p>
<div class="highlight"><pre><span></span>$ redis-cli -n <span class="m">1</span> FLUSHDB <span class="c1"># 1 is a DB number specified in settings.py</span>
</pre></div>
<h3 id="keshirovanie-view">Caching views</h3>
<p>If caching the entire site isn't practical in your case, you can enable caching only for the specific views that
create the most load. Django provides the <code>cache_page</code> decorator for this:</p>
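<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">get_object_or_404</span><span class="p">,</span> <span class="n">render</span>
<span class="kn">from</span> <span class="nn">django.views.decorators.cache</span> <span class="kn">import</span> <span class="n">cache_page</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Author</span>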
<span class="nd">@cache_page</span><span class="p">(</span><span class="mi">600</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="s1">'default'</span><span class="p">,</span> <span class="n">key_prefix</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="n">author</span> <span class="o">=</span> <span class="n">get_object_or_404</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span>
<span class="n">show_articles_link</span> <span class="o">=</span> <span class="n">author</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
<span class="k">return</span> <span class="n">render</span><span class="p">(</span>
<span class="n">request</span><span class="p">,</span> <span class="s1">'blog/author.html'</span><span class="p">,</span>
<span class="n">context</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">,</span> <span class="n">show_articles_link</span><span class="o">=</span><span class="n">show_articles_link</span><span class="p">))</span>
</pre></div>
<p><code>cache_page</code> takes the following parameters:</p>
<ul>
<li>the first, required argument sets the cache TTL in seconds,</li>
<li><code>cache</code> - a key in the <code>CACHES</code> dictionary,</li>
<li><code>key_prefix</code> - a prefix for the cache keys.</li>
</ul>
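<p>Conceptually, <code>cache_page</code> wraps a view, builds a key from the request and serves the stored response until the TTL expires. The stdlib-only sketch below illustrates that idea under simplifying assumptions: <code>toy_cache_page</code>, the <code>Request</code> tuple and the injectable <code>clock</code> are hypothetical, the key here is just prefix, method and path, and Django's real key also takes other details (such as headers listed in <code>Vary</code>) into account.</p>

```python
import time
from functools import wraps
from typing import Callable, Dict, NamedTuple, Tuple


class Request(NamedTuple):
    method: str
    path: str


def toy_cache_page(timeout: float, key_prefix: str = "",
                   clock: Callable[[], float] = time.monotonic):
    """Hypothetical page-cache decorator: caches GET/HEAD responses per path for `timeout` seconds."""
    store: Dict[str, Tuple[str, float]] = {}

    def decorator(view: Callable[..., str]) -> Callable[..., str]:
        @wraps(view)
        def wrapper(request: Request, *args, **kwargs) -> str:
            if request.method not in ("GET", "HEAD"):
                return view(request, *args, **kwargs)  # never cache unsafe methods
            key = f"{key_prefix}:{request.method}:{request.path}"
            hit = store.get(key)
            if hit is not None and clock() < hit[1]:
                return hit[0]  # fresh entry: the view is not called
            response = view(request, *args, **kwargs)
            store[key] = (response, clock() + timeout)
            return response
        return wrapper
    return decorator
```

<p>Applied as <code>@toy_cache_page(600)</code>, the first GET to a path runs the view, repeat GETs within 600 seconds are answered from <code>store</code>, and POST requests always reach the view.</p>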
<p>This decorator can also be applied in <code>urls.py</code>, which is convenient for class-based views:</p>
<div class="highlight"><pre><span></span><span class="n">urlpatterns</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^$'</span><span class="p">,</span> <span class="n">cache_page</span><span class="p">(</span><span class="mi">600</span><span class="p">)(</span><span class="n">ArticlesListView</span><span class="o">.</span><span class="n">as_view</span><span class="p">()),</span> <span class="n">name</span><span class="o">=</span><span class="s1">'articles_list'</span><span class="p">),</span>
<span class="o">...</span>
<span class="p">]</span>
</pre></div>
<p>If part of the site's content varies, for example, depending on which user is authenticated, this approach
won't work. To solve this problem, you can use one of the options described below.</p>
<h3 id="keshirovanie-chasti-shablona">Caching template fragments</h3>
<p>The previous part of this series explained that QuerySet objects are lazy and SQL queries aren't executed until
they are actually needed. We can take advantage of this and cache template fragments, which avoids those SQL queries
for the lifetime of the cache. To do this, use the <code>cache</code> template tag:</p>
<div class="highlight"><pre><span></span>{% load cache %}
<span class="p"><</span><span class="nt">h1</span><span class="p">></span>Articles list<span class="p"></</span><span class="nt">h1</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Authors count: {{ authors_count }}<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span>Top authors<span class="p"></</span><span class="nt">h2</span><span class="p">></span>
{% cache 500 top_author %}
<span class="p"><</span><span class="nt">ul</span><span class="p">></span>
{% for author in top_authors %}
<span class="p"><</span><span class="nt">li</span><span class="p">></span>{{ author.username }} ({{ author.articles_count }})<span class="p"></</span><span class="nt">li</span><span class="p">></span>
{% endfor %}
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
{% endcache %}
{% cache 500 articles_list %}
{% for article in articles %}
<span class="p"><</span><span class="nt">article</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span>{{ article.title }}<span class="p"></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span><span class="p">></span>{{ article.created_at }}<span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Author: <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"{% url 'author_page' username=article.author.username %}"</span><span class="p">></span>{{ article.author.username }}<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Tags:
{% for tag in article.tags.all %}
{{ tag }}{% if not forloop.last %}, {% endif %}
{% endfor %}<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
{% endfor %}
{% endcache %}
</pre></div>
<p>The result of adding <code>cache</code> tags to the template (before and after, respectively):</p>
<p><img alt="django-templates-caching-results" src="/media/2017/8/templates-caching.png"/></p>
<p><code>cache</code> takes the following arguments:</p>
<ul>
<li>the first, required argument is the cache TTL in seconds,</li>
<li>a required fragment name,</li>
<li>optional additional variables that identify the fragment by dynamic data,</li>
<li>the optional keyword argument <code>using='default'</code>, whose value must match a key of the <code>CACHES</code> dictionary in <code>settings.py</code>.</li>
</ul>
<p>For example, if the fragment should be cached separately for each user, pass a variable
that identifies the user to the <code>cache</code> tag:</p>
<div class="highlight"><pre><span></span>{% cache 500 personal_articles_list request.user.username %}
<span class="c"><!-- ... --></span>
{% endcache %}
</pre></div>
<p>If needed, you can pass several such variables to build keys from the combination of their values.</p>
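<p>The point of those extra variables is that they become part of the cache key. The stdlib-only sketch below builds a key from the fragment name plus a hash of the vary-on values, so each user gets a separate entry while the same user always hits the same one. <code>fragment_key</code> is a hypothetical helper; the exact key format Django produces (see <code>django.core.cache.utils.make_template_fragment_key</code>) may differ.</p>

```python
import hashlib
from typing import Iterable


def fragment_key(fragment_name: str, vary_on: Iterable[object] = ()) -> str:
    """Hypothetical helper: derive a cache key for a template fragment.

    The vary-on values are hashed, which keeps keys short and free of
    characters that some backends (such as Memcached) do not accept.
    """
    hasher = hashlib.md5()
    for value in vary_on:
        hasher.update(str(value).encode("utf-8"))
        hasher.update(b":")  # separator, so ("ab", "c") and ("a", "bc") differ
    return f"template.cache.{fragment_name}.{hasher.hexdigest()}"
```

<p>Passing several variables (for example, the username and the current language) simply feeds more values into the hash, which is how keys for combinations of values are produced.</p>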
<h3 id="nizkourovnevoe-keshirovanie">Low-level caching</h3>
<p>Django provides access to the low-level API of the cache framework. You can use it to
store, retrieve and delete data under a specific cache key. Let's look at a small example:</p>
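<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.cache</span> <span class="kn">import</span> <span class="n">cache</span>
<span class="kn">from</span> <span class="nn">django.views.generic</span> <span class="kn">import</span> <span class="n">ListView</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Author</span>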
<span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">authors_count</span> <span class="o">=</span> <span class="n">cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'authors_count'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">authors_count</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">authors_count</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
<span class="n">cache</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s1">'authors_count'</span><span class="p">,</span> <span class="n">authors_count</span><span class="p">)</span>
<span class="n">context</span><span class="p">[</span><span class="s1">'authors_count'</span><span class="p">]</span> <span class="o">=</span> <span class="n">authors_count</span>
<span class="o">...</span>
<span class="k">return</span> <span class="n">context</span>
</pre></div>
<p>In this snippet, we check whether the cache already holds the number of authors under the <code>authors_count</code> key.
If it does (<code>cache.get</code> returned something other than <code>None</code>), we use the cached value. Otherwise, we query the database and store
the value in the cache. As a result, we won't hit the database again for as long as the key lives in the cache.</p>
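<p>The same cache-aside pattern can be written as a reusable helper. The stdlib-only sketch below uses a plain dictionary as the cache and omits TTLs for brevity; the <code>get_or_compute</code> name is hypothetical. It also avoids a subtle issue of the <code>is None</code> check: if the computed value can legitimately be <code>None</code>, it would be recomputed on every call, so a private sentinel object tells "missing" apart from "cached <code>None</code>". Django's cache API also offers <code>cache.get_or_set(key, default, timeout)</code>, which covers the common case.</p>

```python
from typing import Any, Callable, Dict

_MISSING = object()  # sentinel: distinguishes "not cached" from a cached None


def get_or_compute(cache: Dict[str, Any], key: str, compute: Callable[[], Any]) -> Any:
    """Return cache[key]; on a miss, call `compute()`, store the result and return it."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute()
        cache[key] = value
    return value


db_queries = 0


def count_authors() -> int:
    """Stand-in for Author.objects.count(); counts how often the "database" is hit."""
    global db_queries
    db_queries += 1
    return 42


cache: Dict[str, Any] = {}
first = get_or_compute(cache, "authors_count", count_authors)
second = get_or_compute(cache, "authors_count", count_authors)  # served from the cache
```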
<p>Besides database query results, it also makes sense to cache the results of complex computations or calls to external services.
Keep in mind, though, that the data may change, leaving stale information in the cache. To minimize
the chance of serving stale data from the cache, you need to:</p>
<ul>
<li>set a sensible TTL that matches how often the cached data changes,</li>
<li>implement cache invalidation.</li>
</ul>
<p>Cache invalidation should be triggered by data changes. Let's see how to implement invalidation for the example
with the number of authors:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db.models.signals</span> <span class="kn">import</span> <span class="n">post_delete</span><span class="p">,</span> <span class="n">post_save</span>
<span class="kn">from</span> <span class="nn">django.dispatch</span> <span class="kn">import</span> <span class="n">receiver</span>
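<span class="kn">from</span> <span class="nn">django.core.cache</span> <span class="kn">import</span> <span class="n">cache</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Author</span>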
<span class="k">def</span> <span class="nf">clear_authors_count_cache</span><span class="p">():</span>
<span class="n">cache</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="s1">'authors_count'</span><span class="p">)</span>
<span class="nd">@receiver</span><span class="p">(</span><span class="n">post_delete</span><span class="p">,</span> <span class="n">sender</span><span class="o">=</span><span class="n">Author</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_post_delete_handler</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">clear_authors_count_cache</span><span class="p">()</span>
<span class="nd">@receiver</span><span class="p">(</span><span class="n">post_save</span><span class="p">,</span> <span class="n">sender</span><span class="o">=</span><span class="n">Author</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_post_save_handler</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="k">if</span> <span class="n">kwargs</span><span class="p">[</span><span class="s1">'created'</span><span class="p">]:</span>
<span class="n">clear_authors_count_cache</span><span class="p">()</span>
</pre></div>
<p>We added two signal handlers: one for author creation and one for author deletion. Now, whenever the number of authors changes,
the value under the <code>authors_count</code> key is removed from the cache, and the view fetches the new number of authors from the database.</p>
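<p>The mechanics behind this are simple: the save/delete code emits a signal, and every connected receiver runs. The stdlib-only sketch below imitates that with a tiny hypothetical <code>Signal</code> class (not Django's implementation) and shows that the cached count is dropped only when an author is created or deleted, not on every update.</p>

```python
from typing import Any, Callable, Dict, List


class Signal:
    """Hypothetical minimal signal: a list of receivers called with keyword arguments."""

    def __init__(self) -> None:
        self._receivers: List[Callable[..., None]] = []

    def connect(self, receiver: Callable[..., None]) -> Callable[..., None]:
        self._receivers.append(receiver)
        return receiver  # returning it lets connect() be used as a decorator

    def send(self, sender: Any, **kwargs: Any) -> None:
        for receiver in self._receivers:
            receiver(sender=sender, **kwargs)


post_save = Signal()
post_delete = Signal()
cache: Dict[str, int] = {}


@post_delete.connect
def author_post_delete_handler(sender: Any, **kwargs: Any) -> None:
    cache.pop("authors_count", None)


@post_save.connect
def author_post_save_handler(sender: Any, **kwargs: Any) -> None:
    if kwargs["created"]:  # plain updates don't change the count, so keep the cache
        cache.pop("authors_count", None)
```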
<h2 id="cached_property_1">cached_property</h2>
<p>Besides the cache framework, Django also lets you cache the result of a method call directly in process memory.
This kind of caching is only possible for methods that take no arguments other than <code>self</code>. Such a cache lives
as long as the corresponding object exists.</p>
<p><code>cached_property</code> is a decorator that ships with Django (<code>django.utils.functional.cached_property</code>). Besides caching the result, it turns
the method into a property, which is accessed without parentheses. Let's look at an example:</p>
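<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.utils.functional</span> <span class="kn">import</span> <span class="n">cached_property</span>
<span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>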
<span class="n">username</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">db_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">email</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">EmailField</span><span class="p">()</span>
<span class="n">bio</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
<span class="nd">@cached_property</span>
<span class="k">def</span> <span class="nf">articles_count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
<p>Let's check how the <code>articles_count</code> property works with SQL logging enabled:</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Author</span>
<span class="o">>>></span> <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
<span class="p">(</span><span class="mf">0.002</span><span class="p">)</span> <span class="n">SELECT</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"id"</span><span class="p">,</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"username"</span><span class="p">,</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"email"</span><span class="p">,</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"bio"</span> <span class="n">FROM</span> <span class="s2">"blog_author"</span> <span class="n">ORDER</span> <span class="n">BY</span> <span class="s2">"blog_author"</span><span class="o">.</span><span class="s2">"id"</span> <span class="n">ASC</span> <span class="n">LIMIT</span> <span class="mi">1</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">author</span><span class="o">.</span><span class="n">articles_count</span>
<span class="p">(</span><span class="mf">0.001</span><span class="p">)</span> <span class="n">SELECT</span> <span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="n">AS</span> <span class="s2">"__count"</span> <span class="n">FROM</span> <span class="s2">"blog_article"</span> <span class="n">WHERE</span> <span class="s2">"blog_article"</span><span class="o">.</span><span class="s2">"author_id"</span> <span class="o">=</span> <span class="mi">142601</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">142601</span><span class="p">,)</span>
<span class="mi">28</span>
<span class="o">>>></span> <span class="n">author</span><span class="o">.</span><span class="n">articles_count</span>
<span class="mi">28</span>
</pre></div>
<p>As you can see, accessing <code>articles_count</code> a second time doesn't trigger an SQL query. However, if we create another
<code>Author</code> instance, the property won't be cached on it until we access it for the first time, because in this case
the cache is bound to a particular instance of the <code>Author</code> class.</p>
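<p>How does this work without any cache backend? A minimal version is a non-data descriptor: on first access it calls the method and stores the result in the instance's <code>__dict__</code> under the same name, and since instance attributes take precedence over non-data descriptors, later lookups never reach the descriptor again. The sketch below (<code>simple_cached_property</code> and the plain <code>Author</code> stand-in are hypothetical) illustrates the idea; Python 3.8+ also ships a comparable <code>functools.cached_property</code>.</p>

```python
from typing import Any, Callable, List, Optional


class simple_cached_property:
    """Hypothetical minimal cached_property: computes the value once per instance."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func
        self.name = func.__name__

    def __get__(self, instance: Optional[Any], owner: type) -> Any:
        if instance is None:
            return self  # accessed on the class, not on an instance
        value = self.func(instance)
        # Store under the same name: the instance attribute now shadows this
        # non-data descriptor, so the next access skips __get__ entirely.
        instance.__dict__[self.name] = value
        return value


class Author:
    """Plain stand-in for the model; `queries` counts simulated SQL queries."""

    def __init__(self, articles: List[str]) -> None:
        self.articles = articles
        self.queries = 0

    @simple_cached_property
    def articles_count(self) -> int:
        self.queries += 1  # stand-in for SELECT COUNT(*) ...
        return len(self.articles)
```

<p>Deleting the attribute (<code>del author.articles_count</code>) removes the stored value, so the next access recomputes it; this is a simple way to reset such a cache on a long-lived object.</p>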
<h2 id="cacheops">Cacheops</h2>
<p><a href="https://github.com/Suor/django-cacheops">django-cacheops</a> is a third-party package that lets you add
caching of database queries very quickly, with almost no changes to the project's code. Most cases can be handled simply by
configuring a few of the package's settings in <code>settings.py</code>.</p>
<p>Let's look at a simple example of using this package. As a test project, we'll use the
<a href="https://github.com/dizballanze/django-optimization-guide-2-sample">example</a> from the previous part of the series.</p>
<p>Cacheops uses Redis as its cache storage, so you need to specify the Redis server connection parameters in <code>settings.py</code>.</p>
<div class="highlight"><pre><span></span><span class="n">CACHEOPS_REDIS</span> <span class="o">=</span> <span class="s2">"redis://localhost:6379/1"</span>
<span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">[</span>
<span class="o">...</span>
<span class="s1">'cacheops'</span><span class="p">,</span>
<span class="p">]</span>
<span class="n">CACHEOPS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'blog.*'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'all'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span><span class="o">*</span><span class="mi">15</span><span class="p">},</span>
<span class="s1">'*.*'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>That's all it takes to cache all queries to the models of the <code>blog</code> app for 15 minutes. In addition, cacheops
automatically sets up cache invalidation not only by time but also on data updates, using the signals of
the corresponding models.</p>
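<p>The core idea of event-driven invalidation can be modelled in a few lines: remember which cached results belong to which model and drop them when that model changes. The sketch below is deliberately coarse and hypothetical (<code>ModelQueryCache</code> is not a cacheops API); cacheops itself invalidates more selectively, based on the query conditions, so treat this only as an intuition for why no manual <code>cache.delete</code> calls are needed.</p>

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, Hashable, Set, Tuple


class ModelQueryCache:
    """Hypothetical coarse query cache: any write to a model drops all its cached queries."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, Hashable], Any] = {}
        self._keys_by_model: DefaultDict[str, Set[Tuple[str, Hashable]]] = defaultdict(set)

    def fetch(self, model: str, query: Hashable, run: Callable[[], Any]) -> Any:
        """Return the cached result of `query` on `model`, running it on a miss."""
        key = (model, query)
        if key not in self._results:
            self._results[key] = run()
            self._keys_by_model[model].add(key)
        return self._results[key]

    def on_model_changed(self, model: str) -> None:
        """Meant to be called from post_save/post_delete handlers of `model`."""
        for key in self._keys_by_model.pop(model, set()):
            self._results.pop(key, None)
```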
<p>If needed, you can configure caching not only for all models of an app at once, but also for each model separately and for different
kinds of queries. A few examples:</p>
<div class="highlight"><pre><span></span><span class="n">CACHEOPS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'blog.author'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'all'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span><span class="p">},</span> <span class="c1"># cache all queries to `Author` model for an hour</span>
<span class="s1">'blog.article'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'fetch'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">10</span><span class="p">},</span> <span class="c1"># cache `Article` fetch queries for 10 minutes</span>
<span class="c1"># Or</span>
<span class="s1">'blog.article'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'get'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">15</span><span class="p">},</span> <span class="c1"># cache `Article` get queries for 15 minutes</span>
<span class="c1"># Or</span>
<span class="s1">'blog.article'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'ops'</span><span class="p">:</span> <span class="s1">'count'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">3</span><span class="p">},</span> <span class="c1"># cache `Article` count queries for 3 hours</span>
<span class="s1">'*.*'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'timeout'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span><span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>Cacheops also has a number of other features, including:</p>
<ul>
<li>manual caching: <code>Article.objects.filter(tag=2).cache()</code>,</li>
<li>caching function results tied to models, with automatic invalidation,</li>
<li>caching views tied to models, with automatic invalidation,</li>
<li>caching template fragments, and much more.</li>
</ul>
<p>I recommend reading the <a href="https://github.com/Suor/django-cacheops/blob/master/README.rst">cacheops README</a> for
the details.</p>
<h2 id="http-keshirovanie">HTTP caching</h2>
<p>If your project is served over HTTP, then in addition to server-side caching you can also use the caching mechanisms
built into the HTTP protocol. They let you cache the results of safe requests (GET and HEAD) on the client
(for example, in the browser) and on intermediate proxy servers.</p>
<p>Caching is controlled through HTTP headers. These headers can be set in the application
or, for example, in the web server (Nginx, Apache, etc.).</p>
<p>Django provides middleware and several handy decorators for managing the HTTP cache.</p>
<h3 id="vary">Vary</h3>
<p>The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary"><code>Vary</code></a> header lets you specify a list of header
names whose values are taken into account when building the cache key. Django provides the
<a href="https://docs.djangoproject.com/en/1.11/topics/cache/#using-vary-headers"><code>vary_on_headers</code></a> view decorator for managing this header.</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.views.decorators.vary</span> <span class="kn">import</span> <span class="n">vary_on_headers</span>
<span class="nd">@vary_on_headers</span><span class="p">(</span><span class="s1">'User-Agent'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="o">...</span>
</pre></div>
<p>In this case, different values of the <code>User-Agent</code> header produce different cache keys.</p>
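<p>In other words, a cache that honours <code>Vary</code> stores one copy per combination of the listed request header values. Here is a stdlib-only sketch of such a key function; the name <code>vary_cache_key</code> is hypothetical, and real caches normalise header values in their own ways.</p>

```python
import hashlib
from typing import Iterable, Mapping


def vary_cache_key(url: str, request_headers: Mapping[str, str],
                   vary: Iterable[str]) -> str:
    """Hypothetical key: the URL plus the values of the headers named in Vary."""
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [url]
    for name in sorted(h.strip().lower() for h in vary):
        parts.append(f"{name}={headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

<p>This also shows why <code>Vary: User-Agent</code> should be used with care: there are very many distinct User-Agent strings, so each cached copy is shared by fewer clients and the hit rate can drop sharply.</p>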
<h3 id="cache-control">Cache-Control</h3>
<p>The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control"><code>Cache-Control</code></a> header lets you set
various parameters that control the caching mechanism. To set this header, you can use Django's built-in
view decorator
<a href="https://docs.djangoproject.com/en/1.11/topics/cache/#controlling-cache-using-other-headers"><code>cache_control</code></a>.</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.views.decorators.cache</span> <span class="kn">import</span> <span class="n">cache_control</span>
<span class="nd">@cache_control</span><span class="p">(</span><span class="n">private</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">max_age</span><span class="o">=</span><span class="mi">3600</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="o">...</span>
</pre></div>
<p>Let's look at some of the <code>Cache-Control</code> directives:</p>
<ul>
<li><code>public</code>, <code>private</code> - allow or forbid caching in shared caches (proxy servers, etc.). These are important directives
that help protect private content that should be available only to specific users.</li>
<li><code>no-cache</code> - the response may be stored, but the cache must revalidate it with the server before every use, so the client always contacts the server (to forbid storing the response at all, use <code>no-store</code>).</li>
<li><code>max-age</code> - the time in seconds after which the content is considered stale and must be requested again.</li>
</ul>
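<p>To see how a cache might interpret these directives, here is a small stdlib-only sketch that parses a <code>Cache-Control</code> value and answers two questions: may a shared cache (such as a proxy) store the response, and may a stored copy be reused without contacting the server. The function names are hypothetical and the rules are simplified; for example, <code>s-maxage</code> and heuristic freshness are ignored.</p>

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Parse e.g. 'private, max-age=3600' into {'private': None, 'max-age': '3600'}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_may_store(directives: Dict[str, Optional[str]]) -> bool:
    """A proxy must not store private responses or anything marked no-store."""
    return "private" not in directives and "no-store" not in directives


def may_reuse_without_revalidation(directives: Dict[str, Optional[str]], age: int) -> bool:
    """Simplified freshness check: no-cache always forces a round trip to the server."""
    if "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and age < int(max_age)
```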
<h3 id="last-modified-etag">Last-Modified & Etag</h3>
<p>HTTP also provides a more advanced caching mechanism that lets the client ask the server, via conditional requests, whether
its cached version of the content is still current. For this mechanism to work, the server must send the following
headers (one or both of them):</p>
<ul>
<li><code>Last-Modified</code> - the date and time the resource was last modified.</li>
<li><code>Etag</code> - an identifier of the resource version (a unique hash or a version number).</li>
</ul>
<p>After that, when requesting the resource again, the client should send the <code>If-Modified-Since</code> and <code>If-None-Match</code> headers,
respectively. If the resource hasn't changed (based on the Etag and/or Last-Modified values), the server
returns status 304 with no response body. This way the resource is downloaded again only if it has changed,
which saves time and server resources.</p>
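<p>The server-side decision can be sketched as a pure function. This stdlib-only version (the name <code>evaluate_conditional_get</code> is hypothetical) follows the HTTP rule that <code>If-None-Match</code> takes precedence over <code>If-Modified-Since</code> when both are present, and it compares ETags weakly; other details of the specification are ignored.</p>

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _parse_http_date(value: str) -> Optional[datetime]:
    """Parse an HTTP date into a UTC-aware datetime; None if malformed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # A "-0000" zone yields a naive datetime; HTTP dates are always UTC.
    return parsed if parsed.tzinfo is not None else parsed.replace(tzinfo=timezone.utc)


def _strip_weak(tag: str) -> str:
    """Weak comparison: W/"abc" and "abc" are treated as equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def evaluate_conditional_get(request_headers: Mapping[str, str],
                             etag: Optional[str],
                             last_modified: Optional[datetime]) -> int:
    """Return 304 if the client's cached copy is still valid, else 200.

    `last_modified` must be timezone-aware. Header lookup is case-sensitive
    here for brevity; real servers normalise header names.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match wins: If-Modified-Since is ignored when it is present.
        if if_none_match.strip() == "*":
            return 304  # "*" matches any current representation
        if etag is None:
            return 200
        client_tags = {_strip_weak(t) for t in if_none_match.split(",")}
        return 304 if _strip_weak(etag) in client_tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and last_modified is not None:
        since = _parse_http_date(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds first.
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


example_last_modified = datetime(2015, 10, 21, 7, 28, 0, 500000, tzinfo=timezone.utc)
example_status = evaluate_conditional_get(
    {"If-Modified-Since": "Wed, 21 Oct 2015 07:28:00 GMT"}, None, example_last_modified)
```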
<p>Besides caching, these headers are used to check preconditions in requests that modify a resource (POST, PUT, etc.).
That topic, however, is beyond the scope of this article.</p>
<p>Django provides several ways to set the <code>Etag</code> and <code>Last-Modified</code> headers. The simplest one is
<code>ConditionalGetMiddleware</code>. This middleware adds an <code>Etag</code> header, computed from the view's response, to the responses
of all GET requests in the application. It also checks the request headers and returns 304 if the resource hasn't changed.</p>
<p>This approach has several drawbacks:</p>
<ul>
<li>the middleware applies to all views in the project at once, which isn't always desirable,</li>
<li>to check whether the resource is up to date, the full view response has to be generated,</li>
<li>it only works for GET requests.</li>
</ul>
<p>For fine-grained control, use the <code>condition</code> decorator, which lets you specify custom functions for
generating the <code>Etag</code> and/or <code>Last-Modified</code> headers. In these functions, you can determine the resource version
more cheaply, for example, from a database field, without generating the full view response.</p>
<div class="highlight"><pre><span></span><span class="c1"># models.py</span>
<span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="o">...</span>
    <span class="n">updated_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">auto_now</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

<span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.views.decorators.http</span> <span class="kn">import</span> <span class="n">condition</span>

<span class="k">def</span> <span class="nf">author_updated_at</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
    <span class="n">updated_at</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'updated_at'</span><span class="p">,</span> <span class="n">flat</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">updated_at</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">updated_at</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">return</span> <span class="bp">None</span>

<span class="nd">@condition</span><span class="p">(</span><span class="n">last_modified_func</span><span class="o">=</span><span class="n">author_updated_at</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
    <span class="o">...</span>
</pre></div>
<p>The <code>author_updated_at</code> function runs a simple database query that returns the date the resource was last updated,
which takes far fewer resources than fetching all the data the view needs from the database and rendering the template.
When the author is changed, the function returns a new date, which invalidates the cache.</p>
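<p>To illustrate the mechanism, here is a simplified plain-Python sketch of the comparison behind a conditional GET: if the client's <code>If-Modified-Since</code> header is not older than the date returned by a function like <code>author_updated_at</code>, a <code>304 Not Modified</code> response can be sent without running the view. The helpers <code>http_date</code> and <code>not_modified</code> are hypothetical; this is not Django's actual <code>condition</code> implementation, which also handles ETags and other cases. It assumes timezone-aware datetimes (as with <code>USE_TZ = True</code>).</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def http_date(dt: datetime) -> str:
    """Format an aware datetime as an HTTP date, e.g. for Last-Modified (hypothetical helper)."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def not_modified(if_modified_since: Optional[str], last_modified: Optional[datetime]) -> bool:
    """Return True if a 304 could be sent instead of rendering (hypothetical helper).

    Assumes last_modified is timezone-aware.
    """
    # No header from the client, or no date for the resource (e.g. unknown author): render normally.
    if if_modified_since is None or last_modified is None:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Malformed header: ignore it and render normally.
        return False
    if since is None:
        return False
    if since.tzinfo is None:
        # Treat dates without an explicit zone as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return last_modified.replace(microsecond=0) <= since
```

A browser that cached the page echoes the earlier <code>Last-Modified</code> value back; as long as the author has not been updated since, the comparison succeeds and the expensive rendering is skipped.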
<h2 id="keshirovanie-staticheskikh-failov_1">Static files caching</h2>
<p>To speed up repeated page loads, it is recommended to enable caching of static files, so that the browser does not
request the same scripts, stylesheets, images, etc. again.</p>
<p>In a production environment you will most likely not serve static files through Django, since it is slow and insecure.
This task is usually handled by Nginx or another web server. Let's look at how to configure static file caching using Nginx as an example:</p>
<div class="highlight"><pre><span></span><span class="k">server</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="kn">location</span> <span class="s">/static/</span> <span class="p">{</span>
<span class="kn">expires</span> <span class="s">360d</span><span class="p">;</span>
<span class="kn">alias</span> <span class="s">/home/www/proj/static/</span><span class="p">;</span>
<span class="p">}</span>
<span class="kn">location</span> <span class="s">/media/</span> <span class="p">{</span>
<span class="kn">expires</span> <span class="s">360d</span><span class="p">;</span>
<span class="kn">alias</span> <span class="s">/home/www/proj/media/</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Here:</p>
<ul>
<li><code>/static/</code> is the base path for static files, including the files copied there by <code>collectstatic</code>;</li>
<li><code>/media/</code> is the base path for user-uploaded files.</li>
</ul>
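<p>For reference, the nginx documentation states that a positive <code>expires</code> time makes nginx add an <code>Expires</code> header (current time plus the interval) and a <code>Cache-Control: max-age=t</code> header with the interval in seconds. The plain-Python sketch below only computes those values for <code>360d</code> to show what the browser receives; <code>expires_headers</code> is a hypothetical helper, not part of nginx or Django.</p>

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_headers(now: datetime, days: int) -> dict:
    """Approximate the headers nginx adds for `expires <days>d` (hypothetical helper)."""
    lifetime = timedelta(days=days)
    seconds = int(lifetime.total_seconds())
    # nginx sends "no-cache" for a negative time and "max-age=<seconds>" otherwise.
    cache_control = "no-cache" if seconds < 0 else f"max-age={seconds}"
    expires_at = (now + lifetime).astimezone(timezone.utc)
    return {
        "Expires": format_datetime(expires_at, usegmt=True),
        "Cache-Control": cache_control,
    }
```

For <code>360d</code> this gives <code>max-age=31104000</code> (360 × 86400 seconds), so the browser will not re-request the file for almost a year.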
<p>In this example, all static files are cached for 360 days. It is important that whenever a static file changes, its
URL changes as well, which forces the browser to download the new version. To do this, you can append a GET parameter with
a version number to the file URL: <code>script.js?version=123</code>. However, I prefer to use
<a href="https://django-compressor.readthedocs.io/en/latest/">Django Compressor</a>, which, among other things, generates
a unique name for scripts and stylesheets whenever they change.</p>Django project optimization guide (part 2)2017-06-29T17:02:00+03:002017-06-29T17:02:00+03:00Yuri Shikanovtag:None,2017-06-29:/django-project-optimization-part-2/<p>Other parts of this guide:</p>
<ul>
<li><a href="/en/django-project-optimization-part-1">Part 1. Profiling and Django settings</a></li>
<li>Part 2. Working with database</li>
<li><a href="/en/django-project-optimization-part-3/">Part 3. Caching</a></li>
</ul>
<p>Table of Contents:</p>
<ul>
<li><a href="#mass-edit">Mass edit</a><ul>
<li><a href="#mass-insertion">Mass insertion</a></li>
<li><a href="#mass-m2m-insertion">Mass M2M insertion</a></li>
<li><a href="#mass-update">Mass update</a></li>
<li><a href="#mass-delete">Mass delete</a></li>
</ul>
</li>
<li><a href="#iterator_1">Iterator</a></li>
<li><a href="#foreign-keys">Foreign keys</a></li>
<li><a href="#retrieving-of-related-objects">Retrieving of related objects</a></li>
<li><a href="#defer-fields-retrieving">Defer fields retrieving</a></li>
<li><a href="#database-indexes">Database indexes</a></li>
<li><a href="#lenqs-vs-qscount">len(qs) vs qs.count</a></li>
<li><a href="#count vs exists">count vs exists</a></li>
<li><a href="#Lazy QuerySet">Lazy QuerySet</a></li>
</ul>
<p>This is the second part of the Django project optimization series. The first part, about profiling and Django settings,
is available <a href="/django-project-optimization-part-1/">here</a>. This part covers database optimization
(Django models).</p>
<p>We will use SQL logging and Django Debug Toolbar, described in the first part of this series. I will use
PostgreSQL in all examples, but most of this guide applies to other databases too.</p>
<p>Examples in this part are based on a simple blog application that we will build and optimize throughout the guide. Let's
begin with the following models:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>


<span class="k">class</span> <span class="nc">Tag</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span>


<span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">username</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">EmailField</span><span class="p">()</span>
    <span class="n">bio</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">username</span>


<span class="k">class</span> <span class="nc">Article</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
    <span class="n">created_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateField</span><span class="p">()</span>
    <span class="n">author</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>
    <span class="n">tags</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="n">Tag</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">title</span>
</pre></div>
<p>All code is available on <a href="https://github.com/dizballanze/django-optimization-guide-2-sample/tree/initial">GitHub</a>
with <a href="https://github.com/dizballanze/django-optimization-guide-2-sample/tags">tags</a>.</p>
<h2 id="mass-edit">Mass edit</h2>
<h3 id="mass-insertion">Mass insertion</h3>
<p>Let's imagine that our new blog application replaces an old one and we need to transfer the data to the new models.
We have exported the data from the old application to large JSON files. The file with authors has the following structure:</p>
<div class="highlight"><pre><span></span><span class="p">[</span>
<span class="p">{</span>
<span class="nt">"username"</span><span class="p">:</span> <span class="s2">"mackchristopher"</span><span class="p">,</span>
<span class="nt">"email"</span><span class="p">:</span> <span class="s2">"dcortez@yahoo.com"</span><span class="p">,</span>
<span class="nt">"bio"</span><span class="p">:</span> <span class="s2">"Vitae mollitia in modi suscipit similique. Tempore sunt aliquid porro. Molestias tempora quos corporis quam."</span>
<span class="p">}</span>
<span class="p">]</span>
</pre></div>
<p>Let's create a Django management command to import authors from the JSON file:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="kn">from</span> <span class="nn">django.core.management.base</span> <span class="kn">import</span> <span class="n">BaseCommand</span>

<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Author</span>


<span class="k">class</span> <span class="nc">Command</span><span class="p">(</span><span class="n">BaseCommand</span><span class="p">):</span>
    <span class="n">help</span> <span class="o">=</span> <span class="s1">'Load authors from `data/old_authors.json`'</span>
    <span class="n">DATA_FILE_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">settings</span><span class="o">.</span><span class="n">BASE_DIR</span><span class="p">,</span> <span class="s1">'..'</span><span class="p">,</span> <span class="s1">'data'</span><span class="p">,</span> <span class="s1">'old_authors.json'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">DATA_FILE_PATH</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
            <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json_file</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
        <span class="k">for</span> <span class="n">author</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_import_author</span><span class="p">(</span><span class="n">author</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_import_author</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">author_data</span><span class="p">):</span>
        <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="p">(</span>
            <span class="n">username</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'username'</span><span class="p">],</span>
            <span class="n">email</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'email'</span><span class="p">],</span>
            <span class="n">bio</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'bio'</span><span class="p">])</span>
        <span class="n">author</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div>
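<p>Now let's check how many SQL queries are performed when importing 200 authors. Run the following in <code>python manage.py shell</code> (<code>connection.queries</code> is only populated when <code>DEBUG = True</code>):</p>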
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.management</span> <span class="kn">import</span> <span class="n">call_command</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="n">call_command</span><span class="p">(</span><span class="s1">'load_data'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>This code prints a lot of SQL queries (because SQL logging is enabled), and the last line shows the number <code>200</code>.
This means that a separate <code>INSERT</code> query is executed for every author. With a large amount of data, this
approach can be very slow. Let's use the <code>bulk_create</code> method of the <code>Author</code> model manager:</p>
<div class="highlight"><pre><span></span>    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">DATA_FILE_PATH</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
            <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json_file</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
        <span class="n">author_instances</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">author</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
            <span class="n">author_instances</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_import_author</span><span class="p">(</span><span class="n">author</span><span class="p">))</span>
        <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">bulk_create</span><span class="p">(</span><span class="n">author_instances</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_import_author</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">author_data</span><span class="p">):</span>
        <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="p">(</span>
            <span class="n">username</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'username'</span><span class="p">],</span>
            <span class="n">email</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'email'</span><span class="p">],</span>
            <span class="n">bio</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'bio'</span><span class="p">])</span>
        <span class="k">return</span> <span class="n">author</span>
</pre></div>
<p>This version sends a single large <code>INSERT</code> query for all authors.</p>
<blockquote>
<p>If you have a really large amount of data, you will probably need to split the insertion into several SQL queries.
You can use the <code>batch_size</code> argument of the <code>bulk_create</code> method for this.
If we insert 200 objects (rows) into the database with <code>batch_size=50</code>, Django will generate 4 queries.</p>
<p>The <code>bulk_create</code> method has several caveats; you can read about them in the <a href="https://docs.djangoproject.com/en/1.11/ref/models/querysets/#bulk-create">documentation</a>.</p>
</blockquote>
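<p>The difference is visible at the SQL level even without Django. The sketch below uses Python's built-in <code>sqlite3</code> module instead of PostgreSQL and hand-written SQL instead of the ORM, so it only approximates what per-object <code>save()</code> and <code>bulk_create</code> send: one single-row <code>INSERT</code> per author versus multi-row <code>INSERT</code> statements, optionally split into batches the way <code>batch_size</code> does. The <code>author</code> table and all helper names are hypothetical.</p>

```python
import sqlite3


def make_db():
    # In-memory database in autocommit mode, so no implicit BEGIN statements are issued.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, username TEXT, email TEXT, bio TEXT)")
    return conn


def count_inserts(conn, action):
    """Run action(conn) and count the INSERT statements SQLite executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        action(conn)
    finally:
        conn.set_trace_callback(None)
    return sum(1 for sql in statements if sql.lstrip().upper().startswith("INSERT"))


def insert_one_by_one(conn, rows):
    # Roughly what calling save() on each new instance does: one INSERT per row.
    for row in rows:
        conn.execute("INSERT INTO author (username, email, bio) VALUES (?, ?, ?)", row)


def insert_in_batches(conn, rows, batch_size=None):
    # Roughly what bulk_create does: one multi-row INSERT per batch
    # (a single batch when batch_size is None).
    if not rows:
        return
    size = batch_size or len(rows)
    for start in range(0, len(rows), size):
        batch = rows[start:start + size]
        placeholders = ", ".join(["(?, ?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(f"INSERT INTO author (username, email, bio) VALUES {placeholders}", params)
```

With 200 rows, the first approach executes 200 statements, a single batch executes 1, and <code>batch_size=50</code> executes 4. Batching also matters because databases limit statement size; SQLite, for example, caps the number of <code>?</code> parameters per statement.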
<h3 id="mass-m2m-insertion">Mass M2M insertion</h3>
<p>Now we need to import articles and tags. They are available in a separate JSON file with the following structure:</p>
<div class="highlight"><pre><span></span><span class="p">[</span>
<span class="p">{</span>
<span class="nt">"created_at"</span><span class="p">:</span> <span class="s2">"2016-06-11"</span><span class="p">,</span>
<span class="nt">"author"</span><span class="p">:</span> <span class="s2">"nichole52"</span><span class="p">,</span>
<span class="nt">"tags"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"ab"</span><span class="p">,</span>
<span class="s2">"iure"</span><span class="p">,</span>
<span class="s2">"iusto"</span>
<span class="p">],</span>
<span class="nt">"title"</span><span class="p">:</span> <span class="s2">"..."</span><span class="p">,</span>
<span class="nt">"content"</span><span class="p">:</span> <span class="s2">"..."</span>
<span class="p">}</span>
<span class="p">]</span>
</pre></div>
<p>Let's write another command for this:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="kn">from</span> <span class="nn">django.core.management.base</span> <span class="kn">import</span> <span class="n">BaseCommand</span>

<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span><span class="p">,</span> <span class="n">Author</span><span class="p">,</span> <span class="n">Tag</span>


<span class="k">class</span> <span class="nc">Command</span><span class="p">(</span><span class="n">BaseCommand</span><span class="p">):</span>
    <span class="n">help</span> <span class="o">=</span> <span class="s1">'Load articles from `data/old_articles.json`'</span>
    <span class="n">DATA_FILE_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">settings</span><span class="o">.</span><span class="n">BASE_DIR</span><span class="p">,</span> <span class="s1">'..'</span><span class="p">,</span> <span class="s1">'data'</span><span class="p">,</span> <span class="s1">'old_articles.json'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">DATA_FILE_PATH</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
            <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json_file</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
        <span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_import_article</span><span class="p">(</span><span class="n">article</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_import_article</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">article_data</span><span class="p">):</span>
        <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'author'</span><span class="p">])</span>
        <span class="n">article</span> <span class="o">=</span> <span class="n">Article</span><span class="p">(</span>
            <span class="n">title</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'title'</span><span class="p">],</span>
            <span class="n">content</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'content'</span><span class="p">],</span>
            <span class="n">created_at</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'created_at'</span><span class="p">],</span>
            <span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">)</span>
        <span class="n">article</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
        <span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">article_data</span><span class="p">[</span><span class="s1">'tags'</span><span class="p">]:</span>
            <span class="n">tag_instance</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Tag</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">tag</span><span class="p">)</span>
            <span class="n">article</span><span class="o">.</span><span class="n">tags</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tag_instance</span><span class="p">)</span>
</pre></div>
<p>After running this command, the database received 3349 SQL queries! Most of them look like this:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">2319</span> <span class="k">AND</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">67</span><span class="p">));</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">67</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article_tags"</span> <span class="p">(</span><span class="ss">"article_id"</span><span class="p">,</span> <span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">67</span><span class="p">)</span> <span class="n">RETURNING</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">67</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">WHERE</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="o">=</span> <span class="s1">'fugiat'</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'fugiat'</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">2319</span> <span class="k">AND</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">68</span><span class="p">));</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">68</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article_tags"</span> <span class="p">(</span><span class="ss">"article_id"</span><span class="p">,</span> <span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">68</span><span class="p">)</span> <span class="n">RETURNING</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">68</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">WHERE</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="o">=</span> <span class="s1">'repellat'</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'repellat'</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">2319</span> <span class="k">AND</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">58</span><span class="p">));</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">58</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article_tags"</span> <span class="p">(</span><span class="ss">"article_id"</span><span class="p">,</span> <span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">58</span><span class="p">)</span> <span class="n">RETURNING</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">58</span><span class="p">)</span>
</pre></div>
<p>Each tag is added to the article with separate queries. We can improve this command by calling the <code>article.tags.add</code>
method once with all tags of the current article:</p>
<div class="highlight"><pre><span></span>    <span class="k">def</span> <span class="nf">_import_article</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">article_data</span><span class="p">):</span>
        <span class="c1"># ...</span>
        <span class="n">tags</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">article_data</span><span class="p">[</span><span class="s1">'tags'</span><span class="p">]:</span>
            <span class="n">tag_instance</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Tag</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">tag</span><span class="p">)</span>
            <span class="n">tags</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">tag_instance</span><span class="p">)</span>
        <span class="n">article</span><span class="o">.</span><span class="n">tags</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="o">*</span><span class="n">tags</span><span class="p">)</span>
</pre></div>
<p>This version sends 1834 queries, almost half as many.</p>
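<p>Judging by the log above, adding a tag issues a <code>SELECT</code> on the through table to find existing links, followed by an <code>INSERT</code>; passing all tags at once lets those two statements cover the whole list. The sketch below mimics that pattern with the standard <code>sqlite3</code> module and hand-written SQL. It is an approximation for illustration, not Django's related-manager code, and <code>add_tags</code>, <code>count_statements</code> and the <code>article_tags</code> table are hypothetical.</p>

```python
import sqlite3


def make_db():
    # In-memory database in autocommit mode, so no implicit BEGIN statements are issued.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE article_tags (id INTEGER PRIMARY KEY, article_id INTEGER, tag_id INTEGER, "
        "UNIQUE (article_id, tag_id))"
    )
    return conn


def add_tags(conn, article_id, tag_ids):
    """Link tags to an article with one SELECT and at most one multi-row INSERT."""
    tag_ids = list(dict.fromkeys(tag_ids))  # drop duplicates, keep order
    if not tag_ids:
        return
    placeholders = ", ".join("?" * len(tag_ids))
    existing = {
        row[0]
        for row in conn.execute(
            f"SELECT tag_id FROM article_tags WHERE article_id = ? AND tag_id IN ({placeholders})",
            [article_id, *tag_ids],
        )
    }
    missing = [tag_id for tag_id in tag_ids if tag_id not in existing]
    if missing:
        values = ", ".join(["(?, ?)"] * len(missing))
        params = [value for tag_id in missing for value in (article_id, tag_id)]
        conn.execute(f"INSERT INTO article_tags (article_id, tag_id) VALUES {values}", params)


def count_statements(conn, action):
    """Run action(conn) and count the SELECT/INSERT statements SQLite executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        action(conn)
    finally:
        conn.set_trace_callback(None)
    return sum(1 for sql in statements if sql.lstrip().upper().startswith(("SELECT", "INSERT")))
```

With three tags, calling <code>add_tags</code> once per tag executes six statements, while a single call with all three executes two. The per-tag <code>get_or_create</code> lookups are untouched by this change, which is why the total above only roughly halves.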
<h3 id="mass-update">Mass update</h3>
<p>After the data import, we decided to disallow commenting on old articles (created before 2012). I added
the <code>comments_on</code> boolean field to the <code>Article</code> model. Now we need to set its values:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>

<span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">created_at__year__lt</span><span class="o">=</span><span class="mi">2012</span><span class="p">):</span>
    <span class="n">article</span><span class="o">.</span><span class="n">comments_on</span> <span class="o">=</span> <span class="bp">False</span>
    <span class="n">article</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>

<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>This code generates 179 requests like the following:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">UPDATE</span> <span class="ss">"blog_article"</span> <span class="k">SET</span> <span class="ss">"title"</span> <span class="o">=</span> <span class="s1">'Saepe eius facere magni et eligendi minima sint.'</span><span class="p">,</span> <span class="ss">"content"</span> <span class="o">=</span> <span class="s1">'...'</span><span class="p">,</span> <span class="ss">"created_at"</span> <span class="o">=</span> <span class="s1">'1992-03-01'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span> <span class="ss">"author_id"</span> <span class="o">=</span> <span class="mi">730</span><span class="p">,</span> <span class="ss">"comments_on"</span> <span class="o">=</span> <span class="k">false</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">3507</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'Saepe eius facere magni et eligendi minima sint.'</span><span class="p">,</span> <span class="s1">'...'</span><span class="p">,</span> <span class="n">datetime</span><span class="p">.</span><span class="nb">date</span><span class="p">(</span><span class="mi">1992</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="mi">730</span><span class="p">,</span> <span class="k">False</span><span class="p">,</span> <span class="mi">3507</span><span class="p">)</span>
</pre></div>
<p>This code sends a separate <code>UPDATE</code> for each article created before 2012. Moreover, each <code>UPDATE</code> rewrites every field of the article, so any change made by someone else between our <code>SELECT</code> and <code>UPDATE</code> is silently overwritten. So besides the performance issue, we also get a race condition.</p>
<p>Instead, we can use the <code>update</code> method of the <code>QuerySet</code>:</p>
<div class="highlight"><pre><span></span><span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">created_at__year__lt</span><span class="o">=</span><span class="mi">2012</span><span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">comments_on</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</pre></div>
<p>This code generates just one SQL request:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">004</span><span class="p">)</span> <span class="k">UPDATE</span> <span class="ss">"blog_article"</span> <span class="k">SET</span> <span class="ss">"comments_on"</span> <span class="o">=</span> <span class="k">false</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span> <span class="o"><</span> <span class="s1">'2012-01-01'</span><span class="p">::</span><span class="nb">date</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="k">False</span><span class="p">,</span> <span class="n">datetime</span><span class="p">.</span><span class="nb">date</span><span class="p">(</span><span class="mi">2012</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div>
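<p>To make the difference concrete without a Django project, here is a minimal sketch on an in-memory SQLite database using only the standard <code>sqlite3</code> module. It illustrates the SQL involved rather than what the ORM literally runs: the <code>blog_article</code> table is simplified (a hypothetical <code>created_year</code> column stands in for <code>created_at</code>), and the hypothetical <code>CountingConnection</code> wrapper plays the role of <code>connection.queries</code> by counting the statements we send.</p>

```python
import sqlite3


class CountingConnection:
    """Hypothetical helper: wraps sqlite3 and counts every statement we send,
    standing in for Django's connection.queries in this sketch."""

    def __init__(self):
        # isolation_level=None means autocommit, so no implicit BEGIN/COMMIT.
        self.conn = sqlite3.connect(":memory:", isolation_level=None)
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_db():
    db = CountingConnection()
    # Setup goes straight to db.conn so it is not counted.
    db.conn.execute(
        "CREATE TABLE blog_article (id INTEGER PRIMARY KEY, title TEXT, "
        "created_year INTEGER, comments_on INTEGER NOT NULL DEFAULT 1)"
    )
    db.conn.executemany(
        "INSERT INTO blog_article (title, created_year) VALUES (?, ?)",
        [(f"Article {year}", year) for year in range(2005, 2015)],
    )
    return db


def update_row_by_row(db):
    # Mirrors the loop with save(): one SELECT, then a full-row UPDATE per article.
    rows = db.execute(
        "SELECT id, title, created_year FROM blog_article WHERE created_year < 2012"
    ).fetchall()
    for article_id, title, year in rows:
        db.execute(
            "UPDATE blog_article SET title = ?, created_year = ?, comments_on = 0 "
            "WHERE id = ?",
            (title, year, article_id),
        )


def update_in_bulk(db):
    # Mirrors QuerySet.update(): a single UPDATE that touches only one column.
    db.execute("UPDATE blog_article SET comments_on = 0 WHERE created_year < 2012")


slow, fast = make_db(), make_db()
update_row_by_row(slow)
update_in_bulk(fast)
print(slow.queries, fast.queries)  # prints: 8 1
```

<p>With 7 articles older than 2012, the row-by-row version needs 1 <code>SELECT</code> plus 7 <code>UPDATE</code>s, and that number grows with the data; the bulk version stays at a single statement.</p>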
<p>If a field update requires complex logic that can't be expressed in a single <code>UPDATE</code> request, you can compute the field values
in Python and then use one of the following options:</p>
<div class="highlight"><pre><span></span><span class="n">Model</span><span class="o">.</span><span class="n">object</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="n">instance</span><span class="o">.</span><span class="n">id</span><span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">field</span><span class="o">=</span><span class="n">computed_value</span><span class="p">)</span>
<span class="c1"># or</span>
<span class="n">instance</span><span class="o">.</span><span class="n">field</span> <span class="o">=</span> <span class="n">computed_value</span>
<span class="n">instance</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">(</span><span class="s1">'fields'</span><span class="p">,))</span>
</pre></div>
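<p>Both options, however, are still prone to race conditions: the new value is computed from data read earlier, so a concurrent write to that data in the meantime can be lost.</p>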
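<p>When the new value can be derived from the current one inside the database, the race goes away because the read and the write happen in a single statement. The sketch below uses plain <code>sqlite3</code> and a hypothetical <code>views</code> counter column (the example model has no such field) to simulate two workers interleaving a read-modify-write, and compares that with an atomic <code>SET views = views + 1</code>. In Django the atomic form is written with an <code>F()</code> expression, e.g. <code>Article.objects.filter(id=article_id).update(views=F('views') + 1)</code> with <code>F</code> imported from <code>django.db.models</code>.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical table with a counter column, for illustration only.
conn.execute("CREATE TABLE blog_article (id INTEGER PRIMARY KEY, views INTEGER NOT NULL)")
conn.execute("INSERT INTO blog_article (id, views) VALUES (1, 0)")


def read_views():
    return conn.execute("SELECT views FROM blog_article WHERE id = 1").fetchone()[0]


def write_views(value):
    conn.execute("UPDATE blog_article SET views = ? WHERE id = 1", (value,))


# Read-modify-write in Python, with two "workers" interleaving.
a = read_views()      # worker A reads 0
b = read_views()      # worker B reads 0 before A has written
write_views(a + 1)    # A writes 1
write_views(b + 1)    # B also writes 1, so A's increment is lost
lost_update_result = read_views()

# Atomic update: the database computes the new value from the current one.
write_views(0)
for _ in range(2):
    conn.execute("UPDATE blog_article SET views = views + 1 WHERE id = 1")
atomic_result = read_views()

print(lost_update_result, atomic_result)  # prints: 1 2
```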
<h3 id="mass-delete">Mass delete</h3>
<p>Now we need to remove all articles tagged <code>minus</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">tags__name</span><span class="o">=</span><span class="s1">'minus'</span><span class="p">):</span>
<span class="n">article</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>This code generates 93 requests like these:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3510</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3510</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3510</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3510</span><span class="p">,)</span>
</pre></div>
<p>For each article, this code first removes the links between the article and its tags and then deletes the article itself. We can do
the same with far fewer requests using the <code>delete</code> method of the <code>QuerySet</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">tags__name</span><span class="o">=</span><span class="s1">'minus'</span><span class="p">)</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>This code does the same with only 3 requests to the database:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">004</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span><span class="p">)</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_tag"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="o">=</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="o">=</span> <span class="s1">'minus'</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span 
class="p">(</span><span class="s1">'minus'</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="mi">3722</span><span class="p">,</span> <span class="p">...);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="mi">3722</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="p">...);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="mi">3722</span><span class="p">,</span> <span class="p">...)</span><span class="o">``</span><span class="k">sql</span>
</pre></div>
<p>First, the articles tagged <code>minus</code> are selected to collect their ids. Then the second request removes all links
between these articles and their tags. Finally, the articles themselves are deleted.</p>
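<p>The same contrast can be sketched with the standard <code>sqlite3</code> module. The tables loosely mimic <code>blog_article</code>, <code>blog_tag</code>, and the <code>blog_article_tags</code> join table, and the hypothetical <code>CountingConnection</code> wrapper counts the statements sent; this illustrates the SQL pattern, not Django's implementation.</p>

```python
import sqlite3


class CountingConnection:
    """Hypothetical helper: wraps sqlite3 and counts every statement we send."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:", isolation_level=None)
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_db():
    db = CountingConnection()
    db.conn.executescript("""
        CREATE TABLE blog_article (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE blog_tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE blog_article_tags (article_id INTEGER, tag_id INTEGER);
        INSERT INTO blog_tag (id, name) VALUES (1, 'minus'), (2, 'plus');
    """)
    for article_id in range(1, 11):
        db.conn.execute("INSERT INTO blog_article VALUES (?, ?)",
                        (article_id, f"Article {article_id}"))
        # Even ids are tagged 'minus', odd ids 'plus'.
        tag_id = 1 if article_id % 2 == 0 else 2
        db.conn.execute("INSERT INTO blog_article_tags VALUES (?, ?)",
                        (article_id, tag_id))
    return db


SELECT_MINUS_IDS = """
    SELECT blog_article.id FROM blog_article
    JOIN blog_article_tags ON blog_article.id = blog_article_tags.article_id
    JOIN blog_tag ON blog_article_tags.tag_id = blog_tag.id
    WHERE blog_tag.name = 'minus'
"""


def delete_one_by_one(db):
    # Like calling article.delete() in a loop: two DELETEs per article.
    ids = [row[0] for row in db.execute(SELECT_MINUS_IDS).fetchall()]
    for article_id in ids:
        db.execute("DELETE FROM blog_article_tags WHERE article_id = ?", (article_id,))
        db.execute("DELETE FROM blog_article WHERE id = ?", (article_id,))


def delete_in_bulk(db):
    # Like QuerySet.delete(): one SELECT, then one DELETE per table with IN (...).
    ids = [row[0] for row in db.execute(SELECT_MINUS_IDS).fetchall()]
    if not ids:
        return
    placeholders = ", ".join("?" * len(ids))
    db.execute(f"DELETE FROM blog_article_tags WHERE article_id IN ({placeholders})", ids)
    db.execute(f"DELETE FROM blog_article WHERE id IN ({placeholders})", ids)


slow, fast = make_db(), make_db()
delete_one_by_one(slow)
delete_in_bulk(fast)
print(slow.queries, fast.queries)  # prints: 11 3
```

<p>One behavioral difference to keep in mind: <code>QuerySet.delete()</code> does not call the <code>delete()</code> method of each model instance, although it still sends the <code>pre_delete</code> and <code>post_delete</code> signals. Logic placed in an overridden <code>delete()</code> is skipped.</p>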
<h2 id="iterator_1">Iterator</h2>
<p>Suppose we need to export articles to a CSV file. Here is a management command that performs the export:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Command</span><span class="p">(</span><span class="n">BaseCommand</span><span class="p">):</span>
<span class="n">help</span> <span class="o">=</span> <span class="s1">'Export articles to csv'</span>
<span class="n">EXPORT_FILE_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">settings</span><span class="o">.</span><span class="n">BASE_DIR</span><span class="p">,</span> <span class="s1">'..'</span><span class="p">,</span> <span class="s1">'data'</span><span class="p">,</span> <span class="s1">'articles_export.csv'</span><span class="p">)</span>
<span class="n">COLUMNS</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'title'</span><span class="p">,</span> <span class="s1">'content'</span><span class="p">,</span> <span class="s1">'created_at'</span><span class="p">,</span> <span class="s1">'author'</span><span class="p">,</span> <span class="s1">'comments_on'</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">EXPORT_FILE_PATH</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">export_file</span><span class="p">:</span>
<span class="n">articles_writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">export_file</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s1">';'</span><span class="p">)</span>
<span class="n">articles_writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">COLUMNS</span><span class="p">)</span>
<span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">all</span><span class="p">():</span>
<span class="n">articles_writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">([</span><span class="nb">getattr</span><span class="p">(</span><span class="n">article</span><span class="p">,</span> <span class="n">column</span><span class="p">)</span> <span class="k">for</span> <span class="n">column</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">COLUMNS</span><span class="p">])</span>
</pre></div>
<p>For testing purposes, I generated around 100 MB of articles and loaded them into the DB. After that, I ran the CSV export command under
<a href="https://pypi.python.org/pypi/memory_profiler">memory profiler</a>.</p>
<div class="highlight"><pre><span></span>mprof run python manage.py export_articles
mprof plot
</pre></div>
<p>As a result, I received the following graph of memory consumption:</p>
<p><img alt="export articles profiling" src="/media/2017/6/export_articles_without_iterator.png"/></p>
<p>The command uses ~250 MB of memory because the <code>QuerySet</code> fetches all articles from the DB at once and caches them in memory
for subsequent accesses to the <code>QuerySet</code>. You can reduce memory consumption with the <code>iterator</code> method.
It fetches query results in chunks (using a <a href="http://initd.org/psycopg/docs/cursor.html">server-side cursor</a>)
instead of loading them all at once, and it disables caching.</p>
<div class="highlight"><pre><span></span><span class="c1"># ...</span>
<span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">iterator</span><span class="p">():</span>
<span class="c1"># ...</span>
</pre></div>
<p>This is the result of running the updated command in the memory profiler:</p>
<p><img alt="export articles profiling" src="/media/2017/6/export_articles_with_iterator.png"/></p>
<p>Now the command uses only 50 MB of memory. A pleasant side effect is that memory usage stays almost constant
regardless of the number of articles. Here are the results for ~200 MB of articles (without and with <code>iterator</code>):</p>
<p><img alt="huge export articles profiling" src="/media/2017/6/export_articles_huge_before_and_after.png"/></p>
<h2 id="foreign-keys">Foreign keys</h2>
<p>Now we need a Django admin action that makes a copy of an article:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">clone_article</span><span class="p">(</span><span class="n">modeladmin</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">queryset</span><span class="p">):</span>
<span class="k">if</span> <span class="n">queryset</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">modeladmin</span><span class="o">.</span><span class="n">message_user</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s2">"You could clone only one article at a time."</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="n">messages</span><span class="o">.</span><span class="n">ERROR</span><span class="p">)</span>
<span class="k">return</span>
<span class="n">origin_article</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
<span class="n">cloned_article</span> <span class="o">=</span> <span class="n">Article</span><span class="p">(</span>
<span class="n">title</span><span class="o">=</span><span class="s2">"{} (COPY)"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">origin_article</span><span class="o">.</span><span class="n">title</span><span class="p">),</span>
<span class="n">content</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">content</span><span class="p">,</span>
<span class="n">created_at</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">created_at</span><span class="p">,</span>
<span class="n">author</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">author</span><span class="p">,</span>
<span class="n">comments_on</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">comments_on</span><span class="p">)</span>
<span class="n">cloned_article</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="n">cloned_article</span><span class="o">.</span><span class="n">tags</span> <span class="o">=</span> <span class="n">origin_article</span><span class="o">.</span><span class="n">tags</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
<span class="n">modeladmin</span><span class="o">.</span><span class="n">message_user</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s2">"Article successfully cloned"</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="n">messages</span><span class="o">.</span><span class="n">SUCCESS</span><span class="p">)</span>
<span class="n">clone_article</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="s1">'Clone article'</span>
</pre></div>
<p>SQL logs:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="ss">"__count"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">31582</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">31582</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">31582</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span> <span class="k">DESC</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">DESC</span> <span class="k">LIMIT</span> <span class="mi">1</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">31582</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_author"</span> <span class="k">WHERE</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">2156</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2156</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article"</span> <span class="p">(</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"content"</span><span class="p">,</span> <span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"comments_on"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="s1">'Explicabo maiores nobis cum vel fugit. (COPY)'</span><span class="p">,</span> <span class="p">...</span>
</pre></div>
<p>The author's data is also fetched from the DB, because accessing <code>origin_article.author</code> makes Django load the related object. Yet we need nothing about the author besides the id,
which the article already stores as a foreign key. To avoid the extra request, refer to the foreign key directly through
<code>origin_article.author_id</code>. I rewrote the population of the cloned object as follows:</p>
<div class="highlight"><pre><span></span><span class="n">cloned_article</span> <span class="o">=</span> <span class="n">Article</span><span class="p">(</span>
<span class="n">title</span><span class="o">=</span><span class="s2">"{} (COPY)"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">origin_article</span><span class="o">.</span><span class="n">title</span><span class="p">),</span>
<span class="n">content</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">content</span><span class="p">,</span>
<span class="n">created_at</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">created_at</span><span class="p">,</span>
<span class="n">author_id</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">author_id</span><span class="p">,</span>
<span class="n">comments_on</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">comments_on</span><span class="p">)</span>
</pre></div>
<p>Now there is no author-related request in the logs.</p>
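<p>Why does <code>origin_article.author</code> cost a request while <code>origin_article.author_id</code> does not? A Django foreign key stores only the key value on the row and loads the related object lazily on first access, caching it on the instance. The toy descriptor below imitates that behavior in plain Python; <code>LazyForeignKey</code>, <code>load_author</code>, and the <code>queries</code> list standing in for the SQL log are hypothetical illustrations, not Django's actual implementation.</p>

```python
queries = []  # stands in for the SQL log


def load_author(author_id):
    # Hypothetical loader: records a fake query instead of hitting a database.
    queries.append(f"SELECT ... FROM blog_author WHERE id = {author_id}")
    return {"id": author_id, "username": f"user{author_id}"}


class LazyForeignKey:
    """Toy descriptor: loads the related object on first access, then caches it."""

    def __init__(self, loader):
        self.loader = loader

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        cache = instance.__dict__.setdefault("_related_cache", {})
        if self.name not in cache:
            # Only the stored key value (e.g. author_id) is needed for the lookup.
            cache[self.name] = self.loader(getattr(instance, self.name + "_id"))
        return cache[self.name]


class Article:
    author = LazyForeignKey(load_author)

    def __init__(self, title, author_id):
        self.title = title
        self.author_id = author_id


origin = Article("Explicabo maiores nobis cum vel fugit.", author_id=2156)

# Copying the raw foreign key value: no query.
clone_by_id = Article(origin.title + " (COPY)", author_id=origin.author_id)
queries_after_id_copy = len(queries)

# Going through the relation: triggers the lazy load.
clone_by_object = Article(origin.title + " (COPY)", author_id=origin.author["id"])
queries_after_object_copy = len(queries)

# A second access is served from the per-instance cache.
_ = origin.author
print(queries_after_id_copy, queries_after_object_copy, len(queries))  # prints: 0 1 1
```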
<h2 id="retrieving-of-related-objects">Retrieving of related objects</h2>
<p>It's time to make our articles public. I'll begin with a simple article list page. Here is the view:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
<span class="n">template_name</span> <span class="o">=</span> <span class="s1">'blog/articles_list.html'</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>
<span class="n">context_object_name</span> <span class="o">=</span> <span class="s1">'articles'</span>
<span class="n">paginate_by</span> <span class="o">=</span> <span class="mi">20</span>
</pre></div>
<p>The template shows each article along with its author and tags:</p>
<div class="highlight"><pre><span></span><span class="x"><article></span>
<span class="x"> <h2></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="x"></h2></span>
<span class="x"> <time></span><span class="cp">{{</span> <span class="nv">article.created_at</span> <span class="cp">}}</span><span class="x"></time></span>
<span class="x"> <p>Author: </span><span class="cp">{{</span> <span class="nv">article.author.username</span> <span class="cp">}}</span><span class="x"></p></span>
<span class="x"> <p>Tags:</span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags.all</span> <span class="cp">%}</span><span class="x"></span>
<span class="x"> </span><span class="cp">{{</span> <span class="nv">tag</span> <span class="cp">}}{%</span> <span class="k">if</span> <span class="k">not</span> <span class="nb">forloop</span><span class="nv">.last</span> <span class="cp">%}</span><span class="x">, </span><span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span><span class="x"></span>
<span class="x"> </span><span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span><span class="x"></span>
<span class="x"></article></span>
</pre></div>
<p>DDT shows that this page generates 45 SQL requests like these:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">LIMIT</span> <span class="mi">20</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_author"</span> <span class="k">WHERE</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">2043</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2043</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">20425</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">20425</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_author"</span> <span class="k">WHERE</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">2043</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2043</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">20426</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">20426</span><span class="p">,)</span>
</pre></div>
<p>First, we fetch all articles (taking pagination into account). Then the author and the tags are fetched separately for each article.
Our goal is to make Django fetch all this related data with as few database queries as possible.</p>
<p>Let's begin with authors. To make a <code>QuerySet</code> retrieve data related through foreign keys, we need to use the <code>select_related</code> method. I updated the <code>queryset</code> in the view as follows:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span>
</pre></div>
<p>After that, DDT shows that the number of SQL queries has dropped to 25. This is because article and author data are now
fetched by a single SQL query with a <code>JOIN</code>:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">004</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_author"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span> <span class="o">=</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">LIMIT</span> <span class="mi">21</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
</pre></div>
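<p>To see where the saving comes from, here is a minimal sketch that uses only Python's standard <code>sqlite3</code> module instead of Django. It runs the "one extra query per article" pattern and the single-<code>JOIN</code> pattern against a toy schema. The table names mirror the logs above, while the <code>CountingConnection</code> wrapper, the helper functions and the data are hypothetical, made up for this illustration.</p>
<div class="highlight"><pre>
import sqlite3


class CountingConnection:
    """Wraps an in-memory sqlite3 connection and counts the queries we send."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_db(n_articles):
    db = CountingConnection()
    # schema and fixture rows go through the raw connection, so they are not counted
    db.conn.executescript("""
        CREATE TABLE blog_author (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE blog_article (id INTEGER PRIMARY KEY, title TEXT,
                                   author_id INTEGER REFERENCES blog_author (id));
    """)
    for i in range(n_articles):
        db.conn.execute("INSERT INTO blog_author VALUES (?, ?)", (i, f"user{i}"))
        db.conn.execute("INSERT INTO blog_article VALUES (?, ?, ?)", (i, f"title{i}", i))
    return db


def list_n_plus_one(db):
    # 1 query for the articles + 1 query per article for its author
    articles = db.execute("SELECT title, author_id FROM blog_article ORDER BY id").fetchall()
    result = []
    for title, author_id in articles:
        row = db.execute("SELECT username FROM blog_author WHERE id = ?", (author_id,)).fetchone()
        result.append((title, row[0]))
    return result


def list_with_join(db):
    # a single query with a JOIN, the same idea as select_related('author')
    return db.execute(
        "SELECT blog_article.title, blog_author.username FROM blog_article "
        "INNER JOIN blog_author ON blog_article.author_id = blog_author.id "
        "ORDER BY blog_article.id"
    ).fetchall()


db = make_db(20)
slow = list_n_plus_one(db)
print(db.queries)  # 21
db.queries = 0
fast = list_with_join(db)
print(db.queries)  # 1
</pre></div>
<p>With 20 articles, the first function sends 21 queries (1 for the list plus 1 per article) and the second sends 1, and both return the same rows. The real query issued by <code>select_related</code> selects every column of both tables, as the log above shows; the sketch keeps only two columns to stay short.</p>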
<p>The <code>select_related</code> method only works with single-valued relations, such as foreign keys of the current model. To reduce
the number of queries when fetching multiple related objects (like tags in our example), we need to use the <code>prefetch_related</code>
method. The updated <code>queryset</code> looks like this:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span><span class="s1">'tags'</span><span class="p">)</span>
</pre></div>
<p>Now DDT shows only 7 queries, and only 2 of them are responsible for displaying the articles list:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_author"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span> <span class="o">=</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">LIMIT</span> <span class="mi">20</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span><span class="p">)</span> <span class="k">AS</span> <span class="ss">"_prefetch_related_val_article_id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">16352</span><span class="p">,</span> <span class="mi">16353</span><span class="p">,</span> <span class="mi">16354</span><span class="p">,</span> <span class="mi">16355</span><span class="p">,</span> <span class="mi">16356</span><span class="p">,</span> <span class="mi">16357</span><span class="p">,</span> <span class="mi">16358</span><span class="p">,</span> <span class="mi">16359</span><span class="p">,</span> <span class="mi">16360</span><span class="p">,</span> <span class="mi">16361</span><span class="p">,</span> <span class="mi">16362</span><span class="p">,</span> <span class="mi">16363</span><span class="p">,</span> <span class="mi">16344</span><span class="p">,</span> <span class="mi">16345</span><span 
class="p">,</span> <span class="mi">16346</span><span class="p">,</span> <span class="mi">16347</span><span class="p">,</span> <span class="mi">16348</span><span class="p">,</span> <span class="mi">16349</span><span class="p">,</span> <span class="mi">16350</span><span class="p">,</span> <span class="mi">16351</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">16352</span><span class="p">,</span> <span class="mi">16353</span><span class="p">,</span> <span class="mi">16354</span><span class="p">,</span> <span class="mi">16355</span><span class="p">,</span> <span class="mi">16356</span><span class="p">,</span> <span class="mi">16357</span><span class="p">,</span> <span class="mi">16358</span><span class="p">,</span> <span class="mi">16359</span><span class="p">,</span> <span class="mi">16360</span><span class="p">,</span> <span class="mi">16361</span><span class="p">,</span> <span class="mi">16362</span><span class="p">,</span> <span class="mi">16363</span><span class="p">,</span> <span class="mi">16344</span><span class="p">,</span> <span class="mi">16345</span><span class="p">,</span> <span class="mi">16346</span><span class="p">,</span> <span class="mi">16347</span><span class="p">,</span> <span class="mi">16348</span><span class="p">,</span> <span class="mi">16349</span><span class="p">,</span> <span class="mi">16350</span><span class="p">,</span> <span class="mi">16351</span><span class="p">)</span>
</pre></div>
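<p>Unlike <code>select_related</code>, <code>prefetch_related</code> runs a separate query for each prefetched relation and does the "joining" in Python. The following sketch reproduces that two-query pattern with the standard <code>sqlite3</code> module. It is a simplified model rather than Django's implementation, and the schema, data and <code>articles_with_tags</code> helper are hypothetical.</p>
<div class="highlight"><pre>
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE blog_tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog_article_tags (article_id INTEGER, tag_id INTEGER);
    INSERT INTO blog_article VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO blog_tag VALUES (1, 'django'), (2, 'sql');
    INSERT INTO blog_article_tags VALUES (1, 1), (1, 2), (2, 2);
""")


def articles_with_tags(conn):
    # query 1: the articles of the current page
    articles = conn.execute("SELECT id, title FROM blog_article ORDER BY id").fetchall()
    if not articles:
        return []
    ids = [article_id for article_id, _ in articles]
    # query 2: the tags of all those articles at once, via IN (...)
    placeholders = ", ".join("?" * len(ids))
    rows = conn.execute(
        "SELECT blog_article_tags.article_id, blog_tag.name FROM blog_tag "
        "INNER JOIN blog_article_tags ON blog_tag.id = blog_article_tags.tag_id "
        f"WHERE blog_article_tags.article_id IN ({placeholders}) ORDER BY blog_tag.name",
        ids,
    ).fetchall()
    # "join" the two result sets in Python, keyed by article id
    tags_by_article = defaultdict(list)
    for article_id, name in rows:
        tags_by_article[article_id].append(name)
    return [(title, tags_by_article[article_id]) for article_id, title in articles]


print(articles_with_tags(conn))
# [('first', ['django', 'sql']), ('second', ['sql']), ('third', [])]
</pre></div>
<p>The number of queries stays at two however many articles are on the page, which is why the <code>IN (...)</code> list in the log above contains the id of every article on the page.</p>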
<blockquote>
<p>Use <code>select_related</code> to retrieve objects referenced by a foreign key of the current model. To retrieve M2M objects
and objects of other models that refer to the current one, use <code>prefetch_related</code>.</p>
<p>You can also use <code>prefetch_related</code> to fetch related objects at arbitrary nesting levels:</p>
<p><code>Tag.objects.all().prefetch_related('article_set__author')</code></p>
<p>This code fetches all tags together with their articles and the authors of those articles.</p>
</blockquote>
<h2 id="defer-fields-retrieving">Defer fields retrieving</h2>
<p>If you look closer to the previous example you can see that we retrieve more fields than we needed.
This is the result of the request in DDT:</p>
<p><img alt="SQL query result for articles list" src="/media/2017/6/sql-queries-results.png"/></p>
<p>This SQL query retrieves all fields of the article and the author, including the potentially huge article text. You can
significantly reduce the amount of transferred data with the <code>defer</code> method, which defers retrieval of the given fields.
If some code later accesses a deferred field, Django fetches it on demand with a separate SQL query. Let's add a
<code>defer</code> call to the <code>queryset</code>:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span><span class="s1">'tags'</span><span class="p">)</span><span class="o">.</span><span class="n">defer</span><span class="p">(</span><span class="s1">'content'</span><span class="p">,</span> <span class="s1">'comments_on'</span><span class="p">)</span>
</pre></div>
<p>Now Django doesn't retrieve the unneeded fields, which reduces query processing time (before and after <code>defer</code>):</p>
<p><img alt="DDT - SQL speedup after defer" src="/media/2017/6/sql-speedup-defer.png"/></p>
<p>But this query still fetches more data than we need: we receive all author fields. It would be easier to list the fields
we actually need. We can use the <code>only</code> method to specify the required fields; all other fields will be deferred:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span><span class="s1">'tags'</span><span class="p">)</span><span class="o">.</span><span class="n">only</span><span class="p">(</span>
<span class="s1">'title'</span><span class="p">,</span> <span class="s1">'created_at'</span><span class="p">,</span> <span class="s1">'author__username'</span><span class="p">,</span> <span class="s1">'tags__name'</span><span class="p">)</span>
</pre></div>
<p>As a result, we receive only the data we need:</p>
<p><img alt="DDT - SQL after only" src="/media/2017/6/sql-after-only.png"/></p>
<p><code>defer</code> and <code>only</code> serve the same purpose: limiting the fields fetched by a query. The difference between these methods is:</p>
<ul>
<li><code>defer</code> defers only specified fields,</li>
<li><code>only</code> defers all fields except specified.</li>
</ul>
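<p>The effect of <code>defer</code> and <code>only</code> on the amount of transferred data can be sketched with the standard <code>sqlite3</code> module. This illustrates the SQL involved, not Django's code: the table, the 100,000-character article bodies and the <code>text_size</code> and <code>load_deferred_content</code> helpers are hypothetical.</p>
<div class="highlight"><pre>
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_article (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO blog_article VALUES (?, ?, ?)",
    [(i, f"title{i}", "x" * 100_000) for i in range(20)],  # 100,000 chars of text each
)


def text_size(rows):
    # rough measure of transferred data: total length of all text values in the rows
    return sum(len(value) for row in rows for value in row if isinstance(value, str))


# every column, as with the queryset without defer()/only()
full = conn.execute("SELECT id, title, content FROM blog_article").fetchall()
# like .only('title'): the primary key is always selected, content is left out
light = conn.execute("SELECT id, title FROM blog_article").fetchall()


def load_deferred_content(article_id):
    # roughly what touching a deferred field does: one more query for that row
    return conn.execute(
        "SELECT content FROM blog_article WHERE id = ?", (article_id,)
    ).fetchone()[0]


print(text_size(full), text_size(light))  # 2000130 130
</pre></div>
<p>Keep in mind that every access to a deferred field costs one extra query per object, as <code>load_deferred_content</code> models. Deferring a field that the template still reads in a loop brings back the one-query-per-row problem from the previous section.</p>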
<h2 id="database-indexes">Database indexes</h2>
<p>Now we have decided to create an author page, accessible at a URL like <code>/authors/&lt;username&gt;</code>.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="n">author</span> <span class="o">=</span> <span class="n">get_object_or_404</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span>
<span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'blog/author.html'</span><span class="p">,</span> <span class="n">context</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">))</span>
</pre></div>
<p>This code works fast enough on a small amount of data. But as the amount of data grows, performance inevitably
degrades, because the DBMS has to scan the entire table to find a row by the <code>username</code> field. A better
approach is to use a database index, which lets the DBMS find rows much faster. To add an index to a field, pass
the <code>db_index=True</code> argument to the corresponding model field, then create and apply a migration (<code>makemigrations</code> and <code>migrate</code>).</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">username</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">db_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># ...</span>
</pre></div>
<p>Let's compare the performance before and after adding the index, on a database containing 100K authors:</p>
<p>Without index:</p>
<p><img alt="select by username without index" src="/media/2017/6/ddt-select-by-username-without-index.png"/></p>
<p>With index:</p>
<p><img alt="select by username with index" src="/media/2017/6/ddt-select-by-username-with-index.png"/></p>
<p>The query is now 16x faster!</p>
<blockquote>
<p>Indexes are useful not only for filtering; they speed up sorting as well. Many DBMSs also support multi-column
indexes that speed up filtering and sorting by several fields. See your DBMS documentation for details.</p>
</blockquote>
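<p>You can watch a database switch from a full scan to an index lookup by asking it for its query plan. Below is a minimal sketch with the standard <code>sqlite3</code> module and SQLite's <code>EXPLAIN QUERY PLAN</code> (in PostgreSQL the equivalent is <code>EXPLAIN</code>). The table, the 100K generated authors and the index name are hypothetical, and the <code>CREATE INDEX</code> statement only approximates what a Django migration for <code>db_index=True</code> would run.</p>
<div class="highlight"><pre>
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_author (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO blog_author (username, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(100_000)],
)

QUERY = "SELECT id, username, email FROM blog_author WHERE username = ?"


def query_plan(sql, params):
    # each EXPLAIN QUERY PLAN row has 4 columns; the last one is the readable detail
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


before = query_plan(QUERY, ("user99999",))
# roughly what the migration for db_index=True does; the index name is made up
conn.execute("CREATE INDEX blog_author_username_idx ON blog_author (username)")
after = query_plan(QUERY, ("user99999",))

print(before)
print(after)
</pre></div>
<p>The exact wording of the plan depends on the SQLite version, but <code>before</code> should report a <code>SCAN</code> of the whole table and <code>after</code> a <code>SEARCH</code> using <code>blog_author_username_idx</code>.</p>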
<h2 id="lenqs-vs-qscount">len(qs) vs qs.count</h2>
<p>For some reason, we decided to display the total number of authors on the articles list page. Let's update the view:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">context</span><span class="p">[</span><span class="s1">'authors_count'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">())</span>
<span class="k">return</span> <span class="n">context</span>
</pre></div>
<p>This code generates the following SQL query:</p>
<p><img alt="DDT - len(qs)" src="/media/2017/6/ddt-authors-len-queryset.png"/></p>
<p>In the screenshot, you can see that we fetch all authors from the database, so the counting is done by Python
code in the view. The optimal approach is to retrieve only the number of authors from the database. We can use the <code>count</code>
method for this:</p>
<div class="highlight"><pre><span></span> <span class="n">context</span><span class="p">[</span><span class="s1">'authors_count'</span><span class="p">]</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
<p>This is the query generated by the updated code:</p>
<p><img alt="DDT - qs.count()" src="/media/2017/6/ddt-authors-count.png"/></p>
<p>Now Django generates a much more efficient query for our task.</p>
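<p>The difference between the two approaches is easy to reproduce outside Django. The sketch below uses the standard <code>sqlite3</code> module with a hypothetical table of 10,000 generated authors: the first query transfers every row so that Python can count them, while the second lets the database return a single number.</p>
<div class="highlight"><pre>
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_author (id INTEGER PRIMARY KEY, username TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO blog_author (username, bio) VALUES (?, ?)",
    [(f"user{i}", "lorem ipsum " * 50) for i in range(10_000)],
)

# len(Author.objects.all()): every row travels to Python, which then counts them
rows = conn.execute("SELECT id, username, bio FROM blog_author").fetchall()
count_in_python = len(rows)

# Author.objects.count(): the database counts and returns a single number
(count_in_db,) = conn.execute("SELECT COUNT(*) FROM blog_author").fetchone()

print(count_in_python, count_in_db)  # 10000 10000
</pre></div>
<p>Note that if the <code>QuerySet</code> has already been evaluated (for example, because the template iterates over it anyway), Django's <code>count()</code> uses the length of the cached results instead of running another query.</p>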
<h2 id="count-vs-exists">count vs exists</h2>
<p>We need to display a link to the author's articles on the author's page, but only if the author has any. One possible
solution is to retrieve the number of articles and check whether it is greater than 0, like this:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="n">author</span> <span class="o">=</span> <span class="n">get_object_or_404</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span>
<span class="n">show_articles_link</span> <span class="o">=</span> <span class="p">(</span><span class="n">author</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">render</span><span class="p">(</span>
<span class="n">request</span><span class="p">,</span> <span class="s1">'blog/author.html'</span><span class="p">,</span>
<span class="n">context</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">,</span> <span class="n">show_articles_link</span><span class="o">=</span><span class="n">show_articles_link</span><span class="p">))</span>
</pre></div>
<p>But with a huge number of articles this code will be slow. Since we don't need to know the exact number of articles,
we can use the <code>exists</code> method, which checks whether the <code>QuerySet</code> has at least one result.</p>
<div class="highlight"><pre><span></span> <span class="c1"># ...</span>
<span class="n">show_articles_link</span> <span class="o">=</span> <span class="n">author</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
<span class="c1"># ...</span>
</pre></div>
<p>Let's compare performance on a large number of articles (~10K):</p>
<p><img alt="DDT - exists vs count" src="/media/2017/6/ddt-exists-vs-count.png"/></p>
<p>So we reach the goal with a query that is 10x faster.</p>
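<p>At the SQL level, <code>exists</code> turns "count all matching rows" into "find one matching row", typically by adding <code>LIMIT 1</code>. A minimal sketch of both checks with the standard <code>sqlite3</code> module follows. The table, data and helper names are hypothetical, and the index on <code>author_id</code> mirrors the index Django creates for a <code>ForeignKey</code> by default.</p>
<div class="highlight"><pre>
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_article (id INTEGER PRIMARY KEY, author_id INTEGER)")
# Django indexes ForeignKey columns by default; mirror that here
conn.execute("CREATE INDEX blog_article_author_id_idx ON blog_article (author_id)")
conn.executemany("INSERT INTO blog_article (author_id) VALUES (?)", [(1,)] * 10_000)


def has_articles_via_count(author_id):
    # author.articles.count() > 0: every matching row has to be counted
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM blog_article WHERE author_id = ?", (author_id,)
    ).fetchone()
    return n > 0


def has_articles_via_exists(author_id):
    # author.articles.exists(): the database may stop at the first matching row
    row = conn.execute(
        "SELECT 1 FROM blog_article WHERE author_id = ? LIMIT 1", (author_id,)
    ).fetchone()
    return row is not None
</pre></div>
<p>If the page also needs the articles themselves, calling <code>exists()</code> and then fetching them costs two queries; in that case the Django documentation recommends evaluating the <code>QuerySet</code> once and reusing the result.</p>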
<h2 id="lazy-queryset">Lazy QuerySet</h2>
<p>Now we want the authors to compete with each other. For that, we will add a ranking of the top 20 authors by article count.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="n">context</span><span class="p">[</span><span class="s1">'top_authors'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span>
<span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-articles_count'</span><span class="p">))[:</span><span class="mi">20</span><span class="p">]</span>
<span class="c1"># ...</span>
</pre></div>
<p>Here we retrieve the list of all authors sorted by article count and take the first 20 from it. In this example,
<code>articles_count</code> is a denormalized field holding the number of articles by the author. In a real project, you would probably want
to add signals to update this field when the data changes.</p>
<p>I think it's clear that this approach is not ideal. DDT confirms this:</p>
<p><img alt="DDT - get top authors slice" src="/media/2017/6/ddt-top-authors-list.png"/></p>
<p>Of course, we want to receive an already truncated list of authors from the database. For that, you need to understand that
a <code>QuerySet</code> defers hitting the database for as long as possible. A <code>QuerySet</code> hits the database in the following cases:</p>
<ul>
<li>iteration (e.g., <code>for obj in Model.objects.all():</code>),</li>
<li>slicing with a step (e.g., <code>Model.objects.all()[::2]</code>),</li>
<li>calling <code>len</code> (e.g., <code>len(Model.objects.all())</code>),</li>
<li>calling <code>list</code> (e.g., <code>list(Model.objects.all())</code>),</li>
<li>calling <code>bool</code> (e.g., <code>bool(Model.objects.all())</code>),</li>
<li>serialization with <a href="https://docs.python.org/3/library/pickle.html">pickle</a>.</li>
</ul>
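<p>The rules above can be modeled with a toy class. The sketch below is not Django's implementation: <code>LazyQuery</code> and its <code>order_by_desc</code> method are hypothetical, and the data lives in an in-memory SQLite table. The object only builds SQL until it is iterated, and a <code>[:n]</code> slice becomes a <code>LIMIT</code> clause, so slicing after <code>list()</code> and slicing the lazy object produce very different queries.</p>
<div class="highlight"><pre>
import sqlite3


class LazyQuery:
    """Toy stand-in for a QuerySet (not Django's code): it only builds SQL
    and talks to the database when it is iterated."""

    def __init__(self, conn, table, order_by=None, limit=None):
        self.conn = conn
        self.table = table
        self.order_by = order_by
        self.limit = limit

    def order_by_desc(self, column):
        # returns a new lazy object; nothing is sent to the database yet
        return LazyQuery(self.conn, self.table, column, self.limit)

    def __getitem__(self, key):
        if isinstance(key, slice) and key.start is None and key.step is None:
            # qs[:n] stays lazy and becomes a LIMIT clause
            return LazyQuery(self.conn, self.table, self.order_by, key.stop)
        # simplification: any other key evaluates the query here
        # (Django keeps [a:b] lazy too and evaluates when a step is given)
        return list(self)[key]

    def sql(self):
        sql = f"SELECT id, articles_count FROM {self.table}"
        if self.order_by is not None:
            sql += f" ORDER BY {self.order_by} DESC"
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql

    def __iter__(self):
        # evaluation: the only place where a query actually runs
        return iter(self.conn.execute(self.sql()).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_author (id INTEGER PRIMARY KEY, articles_count INTEGER)")
conn.executemany("INSERT INTO blog_author (articles_count) VALUES (?)",
                 [(i % 97,) for i in range(1000)])
executed = []
conn.set_trace_callback(executed.append)  # record every statement SQLite runs

authors = LazyQuery(conn, "blog_author").order_by_desc("articles_count")
top_sliced_list = list(authors)[:20]  # fetches all 1000 rows, slices in Python
top_lazy = authors[:20]               # still no query here
top_rows = list(top_lazy)             # one query, with LIMIT 20
print(len(top_sliced_list), len(top_rows))  # 20 20
</pre></div>
<p><code>executed</code> records the SQL that SQLite actually ran: the first statement has no <code>LIMIT</code> and returns all 1,000 rows, while the second ends with <code>LIMIT 20</code>.</p>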
<p>So by calling <code>list</code> we forced the <code>QuerySet</code> to hit the database and return a list of all objects; the slicing was
then performed on the list, not on the <code>QuerySet</code>. To limit the number of authors in the SQL query, we should slice the <code>QuerySet</code> itself:</p>
<div class="highlight"><pre><span></span><span class="n">context</span><span class="p">[</span><span class="s1">'top_authors'</span><span class="p">]</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-articles_count'</span><span class="p">)[:</span><span class="mi">20</span><span class="p">]</span>
</pre></div>
<p><img alt="DDT - get top authors slice on queryset" src="/media/2017/6/ddt-top-authors-qs-slice.png"/></p>
<p>As you can see, the number of fetched rows is now limited in the SQL query itself: <code>...LIMIT 20</code>. DDT also shows that the <code>QuerySet</code> deferred
hitting the DB until the template was rendered.</p>Django project performance optimization (part 2)2017-06-27T16:00:00+03:002017-06-27T16:00:00+03:00Yuri Shikanovtag:None,2017-06-27:/../ru/django-project-optimization-part-2/<p>Other articles in the series:</p>
<ul>
<li><a href="/ru/django-project-optimization-part-1/">Часть 1. Профилирование и настройки Django</a></li>
<li>Часть 2. Работа с базой данных</li>
<li><a href="/ru/django-project-optimization-part-3/">Часть 3. Кэширование</a></li>
</ul>
<p>Содержание:</p>
<ul>
<li><a href="#massovye-izmeneniia">Массовые изменения</a><ul>
<li><a href="#massovaia-vstavka">Массовая вставка</a></li>
<li><a href="#massovaia-vstavka-m2m">Массовая вставка M2M</a></li>
<li><a href="#massovoe-izmenenie">Массовое изменение</a></li>
<li><a href="#massovoe-udalenie-obektov">Массовое удаление объектов</a></li>
</ul>
</li>
<li><a href="#iterator_1">Iterator</a></li>
<li><a href="#ispolzovanie-vneshnikh-kliuchei">Использование внешних ключей</a></li>
<li><a href="#poluchenie-sviazannykh-obektov">Получение связанных объектов</a></li>
<li><a href="#ogranichenie-polei-v-vyborkakh">Ограничение полей в выборках</a></li>
<li><a href="#indeksy-bd">Индексы БД</a></li>
<li><a href="#lenqs-vs-qscount">len(qs) vs qs.count</a></li>
<li><a href="#count-vs-exists">count vs exists</a></li>
<li><a href="#lenivyi-queryset">Ленивый QuerySet</a></li>
</ul>
<p>This is a continuation of the series of articles about optimizing Django applications. The first part is available
<a href="/ru/django-project-optimization-part-1/">here</a> and covers profiling and Django settings. In this part
we will look at optimizing database access (Django models).</p>
<p>This part makes frequent use of SQL query logging and DDT (Django Debug Toolbar), both covered in the first post.
All examples use PostgreSQL as the database, but most of the article also applies to users of other DBMSs.</p>
<p>The examples in this part are based on a simple blog application that we will develop and optimize over the course
of the article. Let's start with the following models:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="k">class</span> <span class="nc">Tag</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span>


<span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">username</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">EmailField</span><span class="p">()</span>
    <span class="n">bio</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">username</span>


<span class="k">class</span> <span class="nc">Article</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
    <span class="n">created_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateField</span><span class="p">()</span>
    <span class="n">author</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>
    <span class="n">tags</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="n">Tag</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">title</span>
</pre></div>
<p>All the code is available on <a href="https://github.com/dizballanze/django-optimization-guide-2-sample/tree/initial">GitHub</a>,
split into steps by <a href="https://github.com/dizballanze/django-optimization-guide-2-sample/tags">tags</a>.</p>
<h2 id="massovye-izmeneniia">Bulk changes</h2>
<h3 id="massovaia-vstavka">Bulk insert</h3>
<p>Suppose our new blog application replaces an old one, and we need to migrate the data into the new models.
We have exported the data from the old application into huge JSON files. The authors file looks like this:</p>
<div class="highlight"><pre><span></span><span class="p">[</span>
<span class="p">{</span>
<span class="nt">"username"</span><span class="p">:</span> <span class="s2">"mackchristopher"</span><span class="p">,</span>
<span class="nt">"email"</span><span class="p">:</span> <span class="s2">"dcortez@yahoo.com"</span><span class="p">,</span>
<span class="nt">"bio"</span><span class="p">:</span> <span class="s2">"Vitae mollitia in modi suscipit similique. Tempore sunt aliquid porro. Molestias tempora quos corporis quam."</span>
<span class="p">}</span>
<span class="p">]</span>
</pre></div>
<p>Let's write a Django management command that imports authors from the JSON file:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="kn">from</span> <span class="nn">django.core.management.base</span> <span class="kn">import</span> <span class="n">BaseCommand</span>

<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Author</span>


<span class="k">class</span> <span class="nc">Command</span><span class="p">(</span><span class="n">BaseCommand</span><span class="p">):</span>
    <span class="n">help</span> <span class="o">=</span> <span class="s1">'Load authors from `data/old_authors.json`'</span>
    <span class="n">DATA_FILE_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">settings</span><span class="o">.</span><span class="n">BASE_DIR</span><span class="p">,</span> <span class="s1">'..'</span><span class="p">,</span> <span class="s1">'data'</span><span class="p">,</span> <span class="s1">'old_data.json'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">DATA_FILE_PATH</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
            <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json_file</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
        <span class="k">for</span> <span class="n">author</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_import_author</span><span class="p">(</span><span class="n">author</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_import_author</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">author_data</span><span class="p">):</span>
        <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="p">(</span>
            <span class="n">username</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'username'</span><span class="p">],</span>
            <span class="n">email</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'email'</span><span class="p">],</span>
            <span class="n">bio</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'bio'</span><span class="p">])</span>
        <span class="c1"># save() sends a separate INSERT for every author</span>
        <span class="n">author</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div>
<p>Let's check how many SQL queries are executed when loading 200 authors, using <code>python manage.py shell</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.management</span> <span class="kn">import</span> <span class="n">call_command</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="n">call_command</span><span class="p">(</span><span class="s1">'load_data'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>This code prints lots of SQL queries (since we have query logging enabled), and the last line shows the number <code>200</code>.
That means a separate <code>INSERT</code> query is executed for each author (note that Django records queries in
<code>connection.queries</code> only when <code>DEBUG = True</code>). With a large amount of data this approach can be very slow.
Let's use the <code>bulk_create</code> method of the <code>Author</code> model manager instead:</p>
<div class="highlight"><pre><span></span>    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">DATA_FILE_PATH</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
            <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json_file</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
        <span class="n">author_instances</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">author</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
            <span class="n">author_instances</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_import_author</span><span class="p">(</span><span class="n">author</span><span class="p">))</span>
        <span class="c1"># a single INSERT for all collected authors</span>
        <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">bulk_create</span><span class="p">(</span><span class="n">author_instances</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_import_author</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">author_data</span><span class="p">):</span>
        <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="p">(</span>
            <span class="n">username</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'username'</span><span class="p">],</span>
            <span class="n">email</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'email'</span><span class="p">],</span>
            <span class="n">bio</span><span class="o">=</span><span class="n">author_data</span><span class="p">[</span><span class="s1">'bio'</span><span class="p">])</span>
        <span class="c1"># return the unsaved instance; bulk_create() inserts it later</span>
        <span class="k">return</span> <span class="n">author</span>
</pre></div>
<p>Running the command the same way as above, we can see that a single huge query was sent to the database for all authors.</p>
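<p>To see what changes at the SQL level, here is a minimal sketch using Python's built-in <code>sqlite3</code> module rather than the Django ORM. It assumes a cut-down <code>blog_author</code> table, and the helpers <code>make_db</code>, <code>insert_one_by_one</code> and <code>insert_in_one_statement</code> are hypothetical. The first helper sends one <code>INSERT</code> per author, like calling <code>save()</code> in a loop. The second builds one multi-row <code>INSERT ... VALUES (...), (...)</code>, which is roughly the kind of query <code>bulk_create</code> generates.</p>

```python
import sqlite3

# Plain sqlite3 sketch, not the Django ORM: it only mimics the statements
# that the two versions of the command send to the database.


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blog_author ("
        "id INTEGER PRIMARY KEY, username TEXT, email TEXT, bio TEXT)"
    )
    return conn


def insert_one_by_one(conn, rows):
    """One INSERT per author, like calling author.save() in a loop."""
    statements = 0
    for row in rows:
        conn.execute(
            "INSERT INTO blog_author (username, email, bio) VALUES (?, ?, ?)", row
        )
        statements += 1
    return statements


def insert_in_one_statement(conn, rows):
    """A single multi-row INSERT, roughly the kind of query bulk_create() builds."""
    if not rows:
        return 0
    placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
    params = [value for row in rows for value in row]  # flatten row tuples
    conn.execute(
        "INSERT INTO blog_author (username, email, bio) VALUES " + placeholders,
        params,
    )
    return 1  # number of statements sent


authors = [("user%d" % i, "user%d@example.com" % i, "bio") for i in range(200)]
print(insert_one_by_one(make_db(), authors))        # 200
print(insert_in_one_statement(make_db(), authors))  # 1
```

<p>With 200 rows of 3 columns, the single statement carries 600 bound parameters. That number is why very large imports have to be split into batches.</p>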
<blockquote>
<p>If you really need to insert a large amount of data, you may have to split the insert into several queries.
The <code>batch_size</code> parameter of <code>bulk_create</code> does exactly that: it sets the maximum number of objects
inserted by a single query. For example, with 200 objects and <code>batch_size=50</code> we get 4 queries.</p>
<p>The <code>bulk_create</code> method has a number of limitations, which are described in the <a href="https://docs.djangoproject.com/en/1.11/ref/models/querysets/#bulk-create">documentation</a>.</p>
</blockquote>
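<p>In Django the call would be <code>Author.objects.bulk_create(author_instances, batch_size=50)</code>. The arithmetic behind "200 objects, 4 queries" can be shown with a small sketch. The helpers <code>batches</code> and <code>insert_queries</code> are hypothetical: they reproduce the chunking idea, not Django's internal code.</p>

```python
import math


def batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def insert_queries(n_objects, batch_size):
    """One INSERT per batch, so ceil(n_objects / batch_size) queries."""
    return math.ceil(n_objects / batch_size)


chunks = batches(range(200), 50)
print(len(chunks))               # 4
print([len(c) for c in chunks])  # [50, 50, 50, 50]
print(insert_queries(210, 50))   # 5, the last batch holds 10 objects
```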
<h3 id="massovaia-vstavka-m2m">Bulk M2M insert</h3>
<p>Now we need to import articles and tags, which are stored in a separate JSON file with the following structure:</p>
<div class="highlight"><pre><span></span><span class="p">[</span>
<span class="p">{</span>
<span class="nt">"created_at"</span><span class="p">:</span> <span class="s2">"2016-06-11"</span><span class="p">,</span>
<span class="nt">"author"</span><span class="p">:</span> <span class="s2">"nichole52"</span><span class="p">,</span>
<span class="nt">"tags"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"ab"</span><span class="p">,</span>
<span class="s2">"iure"</span><span class="p">,</span>
<span class="s2">"iusto"</span>
<span class="p">],</span>
<span class="nt">"title"</span><span class="p">:</span> <span class="s2">"..."</span><span class="p">,</span>
<span class="nt">"content"</span><span class="p">:</span> <span class="s2">"..."</span>
<span class="p">}</span>
<span class="p">]</span>
</pre></div>
<p>Let's write another Django management command for this:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="kn">from</span> <span class="nn">django.core.management.base</span> <span class="kn">import</span> <span class="n">BaseCommand</span>

<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span><span class="p">,</span> <span class="n">Author</span><span class="p">,</span> <span class="n">Tag</span>


<span class="k">class</span> <span class="nc">Command</span><span class="p">(</span><span class="n">BaseCommand</span><span class="p">):</span>
    <span class="n">help</span> <span class="o">=</span> <span class="s1">'Load articles from `data/old_articles.json`'</span>
    <span class="n">DATA_FILE_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">settings</span><span class="o">.</span><span class="n">BASE_DIR</span><span class="p">,</span> <span class="s1">'..'</span><span class="p">,</span> <span class="s1">'data'</span><span class="p">,</span> <span class="s1">'old_articles.json'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">DATA_FILE_PATH</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
            <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json_file</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
        <span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_import_article</span><span class="p">(</span><span class="n">article</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_import_article</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">article_data</span><span class="p">):</span>
        <span class="n">author</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'author'</span><span class="p">])</span>
        <span class="n">article</span> <span class="o">=</span> <span class="n">Article</span><span class="p">(</span>
            <span class="n">title</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'title'</span><span class="p">],</span>
            <span class="n">content</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'content'</span><span class="p">],</span>
            <span class="n">created_at</span><span class="o">=</span><span class="n">article_data</span><span class="p">[</span><span class="s1">'created_at'</span><span class="p">],</span>
            <span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">)</span>
        <span class="n">article</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
        <span class="c1"># one get_or_create() and one add() call per tag</span>
        <span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">article_data</span><span class="p">[</span><span class="s1">'tags'</span><span class="p">]:</span>
            <span class="n">tag_instance</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Tag</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">tag</span><span class="p">)</span>
            <span class="n">article</span><span class="o">.</span><span class="n">tags</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tag_instance</span><span class="p">)</span>
</pre></div>
<p>Running it, I got 3349 SQL queries! Many of them looked like this:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">2319</span> <span class="k">AND</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">67</span><span class="p">));</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">67</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article_tags"</span> <span class="p">(</span><span class="ss">"article_id"</span><span class="p">,</span> <span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">67</span><span class="p">)</span> <span class="n">RETURNING</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">67</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">WHERE</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="o">=</span> <span class="s1">'fugiat'</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'fugiat'</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">2319</span> <span class="k">AND</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">68</span><span class="p">));</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">68</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article_tags"</span> <span class="p">(</span><span class="ss">"article_id"</span><span class="p">,</span> <span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">68</span><span class="p">)</span> <span class="n">RETURNING</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">68</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">WHERE</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="o">=</span> <span class="s1">'repellat'</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'repellat'</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">2319</span> <span class="k">AND</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">58</span><span class="p">));</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">58</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article_tags"</span> <span class="p">(</span><span class="ss">"article_id"</span><span class="p">,</span> <span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">58</span><span class="p">)</span> <span class="n">RETURNING</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2319</span><span class="p">,</span> <span class="mi">58</span>
</pre></div>
<p>Each tag is added to the article with a separate query. We can improve this by passing the whole list of tags
to <code>article.tags.add</code> at once:</p>
<div class="highlight"><pre><span></span>    <span class="k">def</span> <span class="nf">_import_article</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">article_data</span><span class="p">):</span>
        <span class="c1"># ...</span>
        <span class="n">tags</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">article_data</span><span class="p">[</span><span class="s1">'tags'</span><span class="p">]:</span>
            <span class="n">tag_instance</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Tag</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">tag</span><span class="p">)</span>
            <span class="n">tags</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">tag_instance</span><span class="p">)</span>
        <span class="c1"># link all tags to the article with a single add() call</span>
        <span class="n">article</span><span class="o">.</span><span class="n">tags</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="o">*</span><span class="n">tags</span><span class="p">)</span>
</pre></div>
<p>This version sends 1834 queries, almost half as many. Not bad, considering we changed only a couple
of lines of code.</p>
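<p>The log above shows one <code>SELECT ... IN</code> and one <code>INSERT</code> for every tag. Passing all tags at once lets that pair run once for the whole list. The remaining per-tag queries come from <code>get_or_create</code>. Below is a simplified <code>sqlite3</code> sketch of the idea, not Django's actual implementation. The helper <code>add_tags</code> is hypothetical, and the table mirrors <code>blog_article_tags</code> from the log. One <code>SELECT</code> finds links that already exist, and one multi-row <code>INSERT</code> creates the missing ones.</p>

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blog_article_tags ("
        "id INTEGER PRIMARY KEY, article_id INTEGER, tag_id INTEGER, "
        "UNIQUE (article_id, tag_id))"
    )
    return conn


def add_tags(conn, article_id, tag_ids):
    """Link several tags to an article using at most two statements.

    Hypothetical helper: one SELECT finds links that already exist,
    one multi-row INSERT creates the missing ones.
    """
    tag_ids = list(dict.fromkeys(tag_ids))  # drop duplicates, keep order
    if not tag_ids:
        return 0
    marks = ", ".join(["?"] * len(tag_ids))
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT tag_id FROM blog_article_tags "
            "WHERE article_id = ? AND tag_id IN (" + marks + ")",
            [article_id, *tag_ids],
        )
    }
    missing = [tag_id for tag_id in tag_ids if tag_id not in existing]
    statements = 1  # the SELECT above
    if missing:
        values = ", ".join(["(?, ?)"] * len(missing))
        params = [v for tag_id in missing for v in (article_id, tag_id)]
        conn.execute(
            "INSERT INTO blog_article_tags (article_id, tag_id) VALUES " + values,
            params,
        )
        statements += 1
    return statements


conn = make_db()
print(add_tags(conn, 2319, [67, 68, 58]))  # 2: one SELECT, one INSERT
print(add_tags(conn, 2319, [67, 68, 58]))  # 1: every link exists, no INSERT
```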
<h3 id="massovoe-izmenenie">Bulk update</h3>
<p>After the migration, we decided that comments should be disabled on old articles (published before 2012). To do this,
we added a boolean field <code>comments_on</code> to the <code>Article</code> model, and now we need to set its value:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">created_at__year__lt</span><span class="o">=</span><span class="mi">2012</span><span class="p">):</span>
    <span class="n">article</span><span class="o">.</span><span class="n">comments_on</span> <span class="o">=</span> <span class="bp">False</span>
    <span class="n">article</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>Running this code, I got 179 queries like this one:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">UPDATE</span> <span class="ss">"blog_article"</span> <span class="k">SET</span> <span class="ss">"title"</span> <span class="o">=</span> <span class="s1">'Saepe eius facere magni et eligendi minima sint.'</span><span class="p">,</span> <span class="ss">"content"</span> <span class="o">=</span> <span class="s1">'...'</span><span class="p">,</span> <span class="ss">"created_at"</span> <span class="o">=</span> <span class="s1">'1992-03-01'</span><span class="p">::</span><span class="nb">date</span><span class="p">,</span> <span class="ss">"author_id"</span> <span class="o">=</span> <span class="mi">730</span><span class="p">,</span> <span class="ss">"comments_on"</span> <span class="o">=</span> <span class="k">false</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">3507</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'Saepe eius facere magni et eligendi minima sint.'</span><span class="p">,</span> <span class="s1">'...'</span><span class="p">,</span> <span class="n">datetime</span><span class="p">.</span><span class="nb">date</span><span class="p">(</span><span class="mi">1992</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="mi">730</span><span class="p">,</span> <span class="k">False</span><span class="p">,</span> <span class="mi">3507</span><span class="p">)</span>
</pre></div>
<p>Not only is a separate SQL query executed for each matching article, but every field of these articles is
overwritten. This can wipe out changes made between the <code>SELECT</code> and <code>UPDATE</code> queries.
So besides the performance problem, we also get a race condition.</p>
<p>Instead, we can use the <code>update</code> method available on <code>QuerySet</code> objects:</p>
<div class="highlight"><pre><span></span><span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">created_at__year__lt</span><span class="o">=</span><span class="mi">2012</span><span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">comments_on</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</pre></div>
<p>This code generates just one SQL query:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">004</span><span class="p">)</span> <span class="k">UPDATE</span> <span class="ss">"blog_article"</span> <span class="k">SET</span> <span class="ss">"comments_on"</span> <span class="o">=</span> <span class="k">false</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span> <span class="o"><</span> <span class="s1">'2012-01-01'</span><span class="p">::</span><span class="nb">date</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="k">False</span><span class="p">,</span> <span class="n">datetime</span><span class="p">.</span><span class="nb">date</span><span class="p">(</span><span class="mi">2012</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div>
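<p>To make the race condition concrete, here is a <code>sqlite3</code> sketch, not Django code. It assumes a cut-down <code>blog_article</code> table with only <code>title</code> and <code>comments_on</code>. The hypothetical function <code>concurrent_edit</code> stands in for another request that renames the article between our <code>SELECT</code> and <code>UPDATE</code>. Writing every column back loses the rename. An <code>UPDATE</code> that touches only <code>comments_on</code> keeps it.</p>

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blog_article "
        "(id INTEGER PRIMARY KEY, title TEXT, comments_on INTEGER)"
    )
    conn.execute("INSERT INTO blog_article VALUES (1, 'Old title', 1)")
    return conn


def concurrent_edit(conn):
    # Stands in for another request that renames the article meanwhile.
    conn.execute("UPDATE blog_article SET title = 'New title' WHERE id = 1")


def save_all_fields(conn):
    # Like article.save(): read the row, change one attribute, write all columns.
    article_id, title = conn.execute(
        "SELECT id, title FROM blog_article WHERE id = 1"
    ).fetchone()
    concurrent_edit(conn)  # happens between our SELECT and UPDATE
    conn.execute(
        "UPDATE blog_article SET title = ?, comments_on = 0 WHERE id = ?",
        (title, article_id),
    )


def update_in_one_statement(conn):
    # Like QuerySet.update(comments_on=False): whenever the rename happens,
    # this UPDATE leaves the title column alone.
    concurrent_edit(conn)
    conn.execute("UPDATE blog_article SET comments_on = 0 WHERE id = 1")


def final_row(strategy):
    conn = make_db()
    strategy(conn)
    return conn.execute(
        "SELECT title, comments_on FROM blog_article WHERE id = 1"
    ).fetchone()


print(final_row(save_all_fields))          # ('Old title', 0): the rename is lost
print(final_row(update_in_one_statement))  # ('New title', 0)
```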
<p>If computing the new field values requires complex logic that can't be expressed entirely in an <code>UPDATE</code>
statement, you can compute the value in Python and then use one of the following options:</p>
<div class="highlight"><pre><span></span><span class="n">Model</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="n">instance</span><span class="o">.</span><span class="n">id</span><span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">field</span><span class="o">=</span><span class="n">computed_value</span><span class="p">)</span>
<span class="c1"># or</span>
<span class="n">instance</span><span class="o">.</span><span class="n">field</span> <span class="o">=</span> <span class="n">computed_value</span>
<span class="n">instance</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="p">(</span><span class="s1">'field'</span><span class="p">,))</span>
</pre></div>
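<p>However, both options are still prone to a race condition, though to a lesser degree. Only the listed fields are
written, but their new values are computed from data that may already be stale.</p>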
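<p>A <code>sqlite3</code> sketch of that remaining race follows. The <code>counter</code> table and its <code>hits</code> column are hypothetical, and this is not Django code. When the new value is computed in Python from an earlier read, a concurrent change to the same column is lost. When the arithmetic is done inside the <code>UPDATE</code> statement itself, the concurrent change is kept.</p>

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, hits INTEGER)")
    conn.execute("INSERT INTO counter VALUES (1, 10)")
    return conn


def other_request(conn):
    # Another request increments the counter concurrently.
    conn.execute("UPDATE counter SET hits = hits + 1 WHERE id = 1")


def computed_in_python(conn):
    # Like .update(field=computed_value) or save(update_fields=...):
    # the value is computed from a read that may already be stale.
    hits = conn.execute("SELECT hits FROM counter WHERE id = 1").fetchone()[0]
    other_request(conn)  # lands between our read and our write
    conn.execute("UPDATE counter SET hits = ? WHERE id = 1", (hits + 1,))


def computed_in_sql(conn):
    # The database computes the new value from the current row.
    other_request(conn)
    conn.execute("UPDATE counter SET hits = hits + 1 WHERE id = 1")


def final_hits(strategy):
    conn = make_db()
    strategy(conn)
    return conn.execute("SELECT hits FROM counter WHERE id = 1").fetchone()[0]


print(final_hits(computed_in_python))  # 11: one increment is lost
print(final_hits(computed_in_sql))     # 12
```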
<h3 id="massovoe-udalenie-obektov">Bulk delete</h3>
<p>Now we need to delete all articles tagged <code>minus</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">tags__name</span><span class="o">=</span><span class="s1">'minus'</span><span class="p">):</span>
    <span class="n">article</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>The code generated 93 queries like these:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3510</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3510</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3510</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3510</span><span class="p">,)</span>
</pre></div>
<p>First, the row linking the article to the tag is deleted from the intermediate table, and then the article itself.
We can do this with fewer queries by using the <code>delete</code> method of <code>QuerySet</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connection</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">tags__name</span><span class="o">=</span><span class="s1">'minus'</span><span class="p">)</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">connection</span><span class="o">.</span><span class="n">queries</span><span class="p">))</span>
</pre></div>
<p>This code does the same thing in just 3 queries:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">004</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span><span class="p">)</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_tag"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span> <span class="o">=</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="o">=</span> <span class="s1">'minus'</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span 
class="p">(</span><span class="s1">'minus'</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article_tags"</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="mi">3722</span><span class="p">,</span> <span class="p">...);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="mi">3722</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">DELETE</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="p">...);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">3713</span><span class="p">,</span> <span class="mi">3717</span><span class="p">,</span> <span class="mi">3722</span><span class="p">,</span> <span class="p">...)</span>
</pre></div>
<p>The first query fetches all articles tagged <code>minus</code>, the second one deletes the tag links of all these
articles at once, and the last one deletes the articles themselves.</p>
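<p>The gap between the two approaches can be reproduced without Django. The sketch below uses the standard-library
<code>sqlite3</code> module rather than the Django ORM, with a simplified, hypothetical schema (<code>article</code>, plus
<code>article_tags</code> holding one tag name per row) and hypothetical helper names. It counts executed statements
through <code>Connection.set_trace_callback</code>: deleting article by article costs one <code>SELECT</code> plus two
<code>DELETE</code>s per article, while the bulk variant always issues one <code>SELECT</code> and one
<code>DELETE ... IN (...)</code> per table, mirroring the logs above.</p>
<div class="highlight"><pre><span></span>import sqlite3

# Hypothetical, simplified schema: article_tags stores a tag name per row.
SELECT_MINUS = (
    "SELECT article.id FROM article "
    "INNER JOIN article_tags ON article.id = article_tags.article_id "
    "WHERE article_tags.tag = 'minus'"
)


def make_db(n_articles):
    """In-memory DB where every odd-numbered article is tagged 'minus'."""
    # isolation_level=None: autocommit, so no implicit BEGIN statements are traced
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);"
        "CREATE TABLE article_tags (article_id INTEGER, tag TEXT);"
    )
    for i in range(1, n_articles + 1):
        conn.execute("INSERT INTO article VALUES (?, ?)", (i, "Article %d" % i))
        tag = "minus" if i % 2 else "plus"
        conn.execute("INSERT INTO article_tags VALUES (?, ?)", (i, tag))
    return conn


def count_queries(conn, func):
    """Run func(conn) and return the number of SQL statements it executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        func(conn)
    finally:
        conn.set_trace_callback(None)
    return len(statements)


def delete_one_by_one(conn):
    # Like calling article.delete() in a loop: two DELETEs per article.
    ids = [row[0] for row in conn.execute(SELECT_MINUS)]
    for article_id in ids:
        conn.execute("DELETE FROM article_tags WHERE article_id IN (?)", (article_id,))
        conn.execute("DELETE FROM article WHERE id IN (?)", (article_id,))


def delete_in_bulk(conn):
    # Like QuerySet.delete(): one SELECT, then one DELETE ... IN (...) per table.
    ids = [row[0] for row in conn.execute(SELECT_MINUS)]
    placeholders = ", ".join("?" * len(ids))
    conn.execute("DELETE FROM article_tags WHERE article_id IN (%s)" % placeholders, ids)
    conn.execute("DELETE FROM article WHERE id IN (%s)" % placeholders, ids)
</pre></div>
<p>On a fresh <code>make_db(10)</code> (five articles tagged <code>minus</code>), the row-by-row variant runs 11 statements
and the bulk variant 3; the bulk count stays at 3 however many articles match. Django's <code>QuerySet.delete()</code>
may still need additional queries when other models cascade from the deleted rows.</p>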
<h2 id="iterator_1">Iterator</h2>
<p>Suppose we need to add the ability to export articles to CSV. Let's write a simple Django management command for this:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="kn">from</span> <span class="nn">django.core.management.base</span> <span class="kn">import</span> <span class="n">BaseCommand</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>


<span class="k">class</span> <span class="nc">Command</span><span class="p">(</span><span class="n">BaseCommand</span><span class="p">):</span>
    <span class="n">help</span> <span class="o">=</span> <span class="s1">'Export articles to csv'</span>
    <span class="n">EXPORT_FILE_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">settings</span><span class="o">.</span><span class="n">BASE_DIR</span><span class="p">,</span> <span class="s1">'..'</span><span class="p">,</span> <span class="s1">'data'</span><span class="p">,</span> <span class="s1">'articles_export.csv'</span><span class="p">)</span>
    <span class="n">COLUMNS</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'title'</span><span class="p">,</span> <span class="s1">'content'</span><span class="p">,</span> <span class="s1">'created_at'</span><span class="p">,</span> <span class="s1">'author'</span><span class="p">,</span> <span class="s1">'comments_on'</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">EXPORT_FILE_PATH</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">,</span> <span class="n">newline</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span> <span class="k">as</span> <span class="n">export_file</span><span class="p">:</span>
            <span class="n">articles_writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">export_file</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s1">';'</span><span class="p">)</span>
            <span class="n">articles_writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">COLUMNS</span><span class="p">)</span>
            <span class="c1"># Iterating the QuerySet fetches and caches every article in memory</span>
            <span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">all</span><span class="p">():</span>
                <span class="n">articles_writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">([</span><span class="nb">getattr</span><span class="p">(</span><span class="n">article</span><span class="p">,</span> <span class="n">column</span><span class="p">)</span> <span class="k">for</span> <span class="n">column</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">COLUMNS</span><span class="p">])</span>
</pre></div>
<p>To test this command, I generated about 100 MB of articles and loaded them into the DB. Then I ran the command under
the memory profiler <a href="https://pypi.python.org/pypi/memory_profiler">memory_profiler</a>.</p>
<div class="highlight"><pre><span></span>mprof run python manage.py export_articles
mprof plot
</pre></div>
<p>As a result, I got the following memory usage graph:</p>
<p><img alt="export articles profiling" src="/media/2017/6/export_articles_without_iterator.png"/></p>
<p>The command uses about 250 MB of memory because, when the query is executed, the <code>QuerySet</code> fetches all articles
from the DB at once and caches them in memory, so that later access to the same <code>QuerySet</code> does not run additional queries.
We can reduce memory usage with the <code>iterator</code> method of <code>QuerySet</code>: it reads results in chunks instead of
all at once (on PostgreSQL, through a <a href="http://initd.org/psycopg/docs/cursor.html">server-side cursor</a>) and disables
result caching in the <code>QuerySet</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># ...</span>
<span class="k">for</span> <span class="n">article</span> <span class="ow">in</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">iterator</span><span class="p">():</span>
<span class="c1"># ...</span>
</pre></div>
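<p>The idea behind <code>iterator</code> can be illustrated with the standard-library <code>sqlite3</code> module instead of
Django: the sketch below pulls rows with <code>Cursor.fetchmany</code> in batches of a hypothetical <code>CHUNK_SIZE</code>
and writes each row to CSV straight away, so only one batch is held in Python at a time. This is not Django's
implementation; the table and function names are assumptions, and whether the database driver buffers the full result on
its side depends on the backend.</p>
<div class="highlight"><pre><span></span>import csv
import io
import sqlite3

CHUNK_SIZE = 100  # hypothetical batch size, chosen only for this sketch


def iter_rows(conn, sql, chunk_size=CHUNK_SIZE, batch_sizes=None):
    """Yield rows one by one while holding at most chunk_size rows at a time."""
    cursor = conn.execute(sql)
    while True:
        batch = cursor.fetchmany(chunk_size)
        if not batch:
            break
        if batch_sizes is not None:
            batch_sizes.append(len(batch))  # record each batch size for inspection
        yield from batch


def export_csv(conn, out, chunk_size=CHUNK_SIZE, batch_sizes=None):
    """Write article rows to a CSV stream without building a full result list."""
    writer = csv.writer(out, delimiter=";")
    writer.writerow(["title", "content"])
    rows = iter_rows(conn, "SELECT title, content FROM article ORDER BY id",
                     chunk_size, batch_sizes)
    for row in rows:
        writer.writerow(row)
</pre></div>
<p>With 250 rows and <code>chunk_size=100</code>, <code>export_csv</code> reads the data in batches of 100, 100 and 50 rows,
so the peak number of rows held in Python depends on the batch size rather than on the table size.</p>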
<p>Running the updated example under the profiler, I got the following result:</p>
<p><img alt="export articles profiling" src="/media/2017/6/export_articles_with_iterator.png"/></p>
<p>Now the command uses only 50 MB. A pleasant side effect is that with <code>iterator</code> the command's memory usage stays
roughly constant regardless of the data size. Here are the graphs for ~200 MB of articles
(without and with <code>iterator</code>, respectively):</p>
<p><img alt="huge export articles profiling" src="/media/2017/6/export_articles_huge_before_and_after.png"/></p>
<h2 id="ispolzovanie-vneshnikh-kliuchei">Using foreign keys</h2>
<p>Next, we need to add an action to the article admin that creates a copy of an article:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">messages</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>


<span class="k">def</span> <span class="nf">clone_article</span><span class="p">(</span><span class="n">modeladmin</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">queryset</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">queryset</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">modeladmin</span><span class="o">.</span><span class="n">message_user</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s2">"You could clone only one article at a time."</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="n">messages</span><span class="o">.</span><span class="n">ERROR</span><span class="p">)</span>
        <span class="k">return</span>
    <span class="n">origin_article</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>
    <span class="n">cloned_article</span> <span class="o">=</span> <span class="n">Article</span><span class="p">(</span>
        <span class="n">title</span><span class="o">=</span><span class="s2">"{} (COPY)"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">origin_article</span><span class="o">.</span><span class="n">title</span><span class="p">),</span>
        <span class="n">content</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">content</span><span class="p">,</span>
        <span class="n">created_at</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">created_at</span><span class="p">,</span>
        <span class="c1"># Reading origin_article.author loads the related Author object</span>
        <span class="n">author</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">author</span><span class="p">,</span>
        <span class="n">comments_on</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">comments_on</span><span class="p">)</span>
    <span class="n">cloned_article</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="c1"># Direct assignment to a many-to-many field was removed in Django 2.0; use set()</span>
    <span class="n">cloned_article</span><span class="o">.</span><span class="n">tags</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">origin_article</span><span class="o">.</span><span class="n">tags</span><span class="o">.</span><span class="n">all</span><span class="p">())</span>
    <span class="n">modeladmin</span><span class="o">.</span><span class="n">message_user</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s2">"Article successfully cloned"</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="n">messages</span><span class="o">.</span><span class="n">SUCCESS</span><span class="p">)</span>
<span class="n">clone_article</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="s1">'Clone article'</span>
</pre></div>
<p>In the logs we can see the following DB queries:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="ss">"__count"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">31582</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">31582</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">WHERE</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">31582</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span> <span class="k">DESC</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">DESC</span> <span class="k">LIMIT</span> <span class="mi">1</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">31582</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_author"</span> <span class="k">WHERE</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">2156</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2156</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">INSERT</span> <span class="k">INTO</span> <span class="ss">"blog_article"</span> <span class="p">(</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"content"</span><span class="p">,</span> <span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"comments_on"</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="s1">'Explicabo maiores nobis cum vel fugit. (COPY)'</span><span class="p">,</span> <span class="p">...</span>
</pre></div>
<p>The author is queried even though we need none of its data except its ID: reading <code>origin_article.author</code>
makes Django load the related <code>Author</code> object. To fix this, access the foreign key value directly and use
<code>origin_article.author_id</code> to get the author's id. The article cloning code now looks like this:</p>
<div class="highlight"><pre><span></span><span class="n">cloned_article</span> <span class="o">=</span> <span class="n">Article</span><span class="p">(</span>
    <span class="n">title</span><span class="o">=</span><span class="s2">"{} (COPY)"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">origin_article</span><span class="o">.</span><span class="n">title</span><span class="p">),</span>
    <span class="n">content</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">content</span><span class="p">,</span>
    <span class="n">created_at</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">created_at</span><span class="p">,</span>
    <span class="c1"># author_id is a column of the article row itself, so no extra query is needed</span>
    <span class="n">author_id</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">author_id</span><span class="p">,</span>
    <span class="n">comments_on</span><span class="o">=</span><span class="n">origin_article</span><span class="o">.</span><span class="n">comments_on</span><span class="p">)</span>
</pre></div>
<p>And the logs no longer contain queries fetching author information.</p>
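<p>Reading <code>author_id</code> costs no query because a foreign key is stored as a plain column on the article row,
while the related object is loaded lazily on first access and then cached. The sketch below imitates that behavior with a
hypothetical mini model on top of the standard-library <code>sqlite3</code> module; it is not Django's descriptor code,
and the schema is an assumption.</p>
<div class="highlight"><pre><span></span>import sqlite3

_NOT_LOADED = object()  # marker meaning "related object not fetched yet"


class Article:
    """Hypothetical mini model: keeps author_id, loads the author lazily."""

    def __init__(self, conn, article_id):
        self._conn = conn
        self.id, self.title, self.author_id = conn.execute(
            "SELECT id, title, author_id FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        self._author = _NOT_LOADED  # cache for the related author row

    @property
    def author(self):
        # The first access runs an extra SELECT; later accesses reuse the cache.
        if self._author is _NOT_LOADED:
            self._author = self._conn.execute(
                "SELECT id, username FROM author WHERE id = ?", (self.author_id,)
            ).fetchone()
        return self._author


def make_db():
    """Hypothetical two-table schema with one author and one article."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)")
    conn.execute("INSERT INTO author VALUES (1, 'alice')")
    conn.execute("INSERT INTO article VALUES (1, 'Hello', 1)")
    return conn
</pre></div>
<p>Reading <code>author_id</code> adds no query; the first read of <code>author</code> adds one, and later reads are served
from the cache.</p>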
<h2 id="poluchenie-sviazannykh-obektov">Fetching related objects</h2>
<p>Finally, it's time to make our articles publicly available, starting with the page that lists them. Let's implement
the view using <code>ListView</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.views.generic</span> <span class="kn">import</span> <span class="n">ListView</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Article</span>


<span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">template_name</span> <span class="o">=</span> <span class="s1">'blog/articles_list.html'</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>
    <span class="n">context_object_name</span> <span class="o">=</span> <span class="s1">'articles'</span>
    <span class="n">paginate_by</span> <span class="o">=</span> <span class="mi">20</span>
</pre></div>
<p>In the template we display the article, its author and its tags:</p>
<div class="highlight"><pre><span></span><span class="x">&lt;article&gt;</span>
<span class="x">  &lt;h2&gt;</span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="x">&lt;/h2&gt;</span>
<span class="x">  &lt;time&gt;</span><span class="cp">{{</span> <span class="nv">article.created_at</span> <span class="cp">}}</span><span class="x">&lt;/time&gt;</span>
<span class="x">  &lt;p&gt;Author: </span><span class="cp">{{</span> <span class="nv">article.author.username</span> <span class="cp">}}</span><span class="x">&lt;/p&gt;</span>
<span class="x">  &lt;p&gt;Tags:</span>
<span class="x">  </span><span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags.all</span> <span class="cp">%}</span><span class="x"></span>
<span class="x">    </span><span class="cp">{{</span> <span class="nv">tag</span> <span class="cp">}}{%</span> <span class="k">if</span> <span class="k">not</span> <span class="nb">forloop</span><span class="nv">.last</span> <span class="cp">%}</span><span class="x">, </span><span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span><span class="x"></span>
<span class="x">  </span><span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span><span class="x"></span>
<span class="x">  &lt;/p&gt;</span>
<span class="x">&lt;/article&gt;</span>
</pre></div>
<p>When the article list is opened, DDT shows 45 SQL queries like these:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">LIMIT</span> <span class="mi">20</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_author"</span> <span class="k">WHERE</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">2043</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2043</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">20425</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">20425</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_author"</span> <span class="k">WHERE</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="mi">2043</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">2043</span><span class="p">,)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="o">=</span> <span class="mi">20426</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">20426</span><span class="p">,)</span>
</pre></div>
<p>That is, we first fetch all articles in one SQL query (taking pagination into account), and then the author and the
tags are queried separately for each of these articles. We need to make Django fetch all this data with fewer queries.</p>
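<p>This pattern (often called "N+1 queries") can be reproduced outside Django. The sketch below uses the standard-library
<code>sqlite3</code> module with a hypothetical schema and helper names, and renders a page of N articles by querying the
author and the tags of each article separately, which costs 1 + 2N queries. It illustrates the SQL access pattern only,
not Django's template engine.</p>
<div class="highlight"><pre><span></span>import sqlite3


def make_db(n_articles):
    """Hypothetical schema: every article has author 'alice' and tags 'plus', 'minus'."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        "CREATE TABLE author (id INTEGER PRIMARY KEY, username TEXT);"
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);"
        "CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);"
        "CREATE TABLE article_tags (article_id INTEGER, tag_id INTEGER);"
        "INSERT INTO author VALUES (1, 'alice');"
        "INSERT INTO tag VALUES (1, 'plus'), (2, 'minus');"
    )
    for i in range(1, n_articles + 1):
        conn.execute("INSERT INTO article VALUES (?, ?, 1)", (i, "Article %d" % i))
        conn.executemany("INSERT INTO article_tags VALUES (?, ?)", [(i, 1), (i, 2)])
    return conn


def render_naive(conn, limit=20):
    """Render one page like the unoptimized view: 1 + 2 * limit queries."""
    lines = []
    articles = conn.execute(
        "SELECT id, title, author_id FROM article ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    for article_id, title, author_id in articles:
        # One query per article for its author (like article.author.username) ...
        (username,) = conn.execute(
            "SELECT username FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        # ... and another one per article for its tags (like article.tags.all).
        tags = [name for (name,) in conn.execute(
            "SELECT tag.name FROM tag "
            "INNER JOIN article_tags ON tag.id = article_tags.tag_id "
            "WHERE article_tags.article_id = ? ORDER BY tag.id", (article_id,)
        )]
        lines.append("%s by %s: %s" % (title, username, ", ".join(tags)))
    return lines
</pre></div>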
<p>Let's start with the authors. To make a <code>QuerySet</code> fetch the objects behind certain foreign keys in advance, there is the <code>select_related</code> method. Let's update <code>queryset</code> in our view to use it:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span>
</pre></div>
<p>After this change, DDT shows 25 SQL queries, because articles and their authors are now fetched in a single
SQL query with a <code>JOIN</code>:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">004</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_author"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span> <span class="o">=</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">LIMIT</span> <span class="mi">21</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
</pre></div>
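<p>The same effect in plain SQL: the sketch below, using the standard-library <code>sqlite3</code> module and a
hypothetical two-table schema, returns each article together with its author's columns from a single
<code>INNER JOIN</code>, the kind of query <code>select_related</code> produces in the log above. It illustrates the SQL
pattern, not Django internals, and the function names are assumptions.</p>
<div class="highlight"><pre><span></span>import sqlite3


def make_db(n_articles):
    """Hypothetical schema: articles alternate between two authors."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        "CREATE TABLE author (id INTEGER PRIMARY KEY, username TEXT);"
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);"
        "INSERT INTO author VALUES (1, 'alice'), (2, 'bob');"
    )
    for i in range(1, n_articles + 1):
        conn.execute("INSERT INTO article VALUES (?, ?, ?)", (i, "Article %d" % i, 1 + i % 2))
    return conn


def fetch_page_with_authors(conn, limit=20):
    """One query: author columns come back in the same rows as the article."""
    return conn.execute(
        "SELECT article.id, article.title, author.username FROM article "
        "INNER JOIN author ON article.author_id = author.id "
        "ORDER BY article.id LIMIT ?", (limit,)
    ).fetchall()
</pre></div>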
<p>The <code>select_related</code> method only follows single-valued relations, such as the foreign keys of the current model.
To reduce the number of queries when fetching sets of related objects (such as the tags in our example), use the
<code>prefetch_related</code> method. Let's update the <code>queryset</code> attribute of <code>ArticlesListView</code> again:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span><span class="s1">'tags'</span><span class="p">)</span>
</pre></div>
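<p>Unlike <code>select_related</code>, <code>prefetch_related</code> runs a separate query for the related objects and
joins the results in Python. The sketch below reproduces that two-query pattern with the standard-library
<code>sqlite3</code> module: one query for the page of articles, one <code>IN (...)</code> query for all their tags, then
grouping by <code>article_id</code>. The schema and function names are hypothetical, and Django's own implementation
differs in detail.</p>
<div class="highlight"><pre><span></span>import sqlite3
from collections import defaultdict


def make_db(n_articles):
    """Hypothetical schema: odd articles get tags 'plus' and 'minus', even ones only 'plus'."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);"
        "CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);"
        "CREATE TABLE article_tags (article_id INTEGER, tag_id INTEGER);"
        "INSERT INTO tag VALUES (1, 'plus'), (2, 'minus');"
    )
    for i in range(1, n_articles + 1):
        conn.execute("INSERT INTO article VALUES (?, ?)", (i, "Article %d" % i))
        conn.execute("INSERT INTO article_tags VALUES (?, 1)", (i,))
        if i % 2:
            conn.execute("INSERT INTO article_tags VALUES (?, 2)", (i,))
    return conn


def fetch_page_with_tags(conn, limit=20):
    """Two queries total: one for the page of articles, one for all their tags."""
    articles = conn.execute(
        "SELECT id, title FROM article ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    ids = [article_id for article_id, _ in articles]
    placeholders = ", ".join("?" * len(ids))
    tags_by_article = defaultdict(list)
    # Fetch the tags of every article on the page at once with IN (...)
    for article_id, name in conn.execute(
        "SELECT article_tags.article_id, tag.name FROM tag "
        "INNER JOIN article_tags ON tag.id = article_tags.tag_id "
        "WHERE article_tags.article_id IN (%s) ORDER BY tag.id" % placeholders, ids
    ):
        tags_by_article[article_id].append(name)
    # Combine the two result sets in Python
    return [(title, tags_by_article[article_id]) for article_id, title in articles]
</pre></div>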
<p>Now DDT shows only 7 queries. If we ignore the queries made by the paginator and the session-related ones, only
2 queries remain for rendering the article list:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"created_at"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span><span class="p">,</span> <span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"comments_on"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"username"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"email"</span><span class="p">,</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"bio"</span> <span class="k">FROM</span> <span class="ss">"blog_article"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_author"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_article"</span><span class="p">.</span><span class="ss">"author_id"</span> <span class="o">=</span> <span class="ss">"blog_author"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">LIMIT</span> <span class="mi">20</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">()</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="p">(</span><span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span><span class="p">)</span> <span class="k">AS</span> <span class="ss">"_prefetch_related_val_article_id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"name"</span> <span class="k">FROM</span> <span class="ss">"blog_tag"</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="ss">"blog_article_tags"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"blog_tag"</span><span class="p">.</span><span class="ss">"id"</span> <span class="o">=</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"tag_id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"blog_article_tags"</span><span class="p">.</span><span class="ss">"article_id"</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">16352</span><span class="p">,</span> <span class="mi">16353</span><span class="p">,</span> <span class="mi">16354</span><span class="p">,</span> <span class="mi">16355</span><span class="p">,</span> <span class="mi">16356</span><span class="p">,</span> <span class="mi">16357</span><span class="p">,</span> <span class="mi">16358</span><span class="p">,</span> <span class="mi">16359</span><span class="p">,</span> <span class="mi">16360</span><span class="p">,</span> <span class="mi">16361</span><span class="p">,</span> <span class="mi">16362</span><span class="p">,</span> <span class="mi">16363</span><span class="p">,</span> <span class="mi">16344</span><span class="p">,</span> <span class="mi">16345</span><span 
class="p">,</span> <span class="mi">16346</span><span class="p">,</span> <span class="mi">16347</span><span class="p">,</span> <span class="mi">16348</span><span class="p">,</span> <span class="mi">16349</span><span class="p">,</span> <span class="mi">16350</span><span class="p">,</span> <span class="mi">16351</span><span class="p">);</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="mi">16352</span><span class="p">,</span> <span class="mi">16353</span><span class="p">,</span> <span class="mi">16354</span><span class="p">,</span> <span class="mi">16355</span><span class="p">,</span> <span class="mi">16356</span><span class="p">,</span> <span class="mi">16357</span><span class="p">,</span> <span class="mi">16358</span><span class="p">,</span> <span class="mi">16359</span><span class="p">,</span> <span class="mi">16360</span><span class="p">,</span> <span class="mi">16361</span><span class="p">,</span> <span class="mi">16362</span><span class="p">,</span> <span class="mi">16363</span><span class="p">,</span> <span class="mi">16344</span><span class="p">,</span> <span class="mi">16345</span><span class="p">,</span> <span class="mi">16346</span><span class="p">,</span> <span class="mi">16347</span><span class="p">,</span> <span class="mi">16348</span><span class="p">,</span> <span class="mi">16349</span><span class="p">,</span> <span class="mi">16350</span><span class="p">,</span> <span class="mi">16351</span><span class="p">)</span>
</pre></div>
<blockquote>
<p>Use <code>select_related</code> for foreign keys of the current model. To fetch M2M objects and objects from models
that reference the current one, use <code>prefetch_related</code>.</p>
<p><code>prefetch_related</code> can also fetch related objects several levels deep: </p>
<p><code>Tag.objects.all().prefetch_related('article_set__author')</code></p>
<p>Along with the tags, this code fetches all articles marked with each tag and the authors of those articles.</p>
</blockquote>
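<p>Here is a rough sketch of the SQL behind the two strategies, written with Python's standard <code>sqlite3</code> module. The schema, data and table aliases are simplified, hypothetical stand-ins for the blog models, and the SQL only approximates what Django generates (the real queries are shown above). <code>select_related</code> pulls the author in with a JOIN in the same query. <code>prefetch_related</code> runs a second query with <code>IN (...)</code> and matches the rows to articles in Python.</p>

```python
import sqlite3

# Hypothetical, simplified schema mirroring the blog example
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE blog_author (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE blog_article (id INTEGER PRIMARY KEY, title TEXT,
                           author_id INTEGER REFERENCES blog_author(id));
CREATE TABLE blog_tag (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE blog_article_tags (article_id INTEGER, tag_id INTEGER);
INSERT INTO blog_author VALUES (1, 'alice'), (2, 'bob');
INSERT INTO blog_article VALUES (10, 'First', 1), (11, 'Second', 2);
INSERT INTO blog_tag VALUES (100, 'django'), (101, 'sql');
INSERT INTO blog_article_tags VALUES (10, 100), (10, 101), (11, 101);
""")

# Like select_related('author'): the FK target comes in the same query via a JOIN
articles = conn.execute("""
    SELECT a.id, a.title, au.username
    FROM blog_article a INNER JOIN blog_author au ON a.author_id = au.id
    ORDER BY a.id
""").fetchall()

# Like prefetch_related('tags'): one extra query for the tags of all fetched articles
ids = [row[0] for row in articles]
placeholders = ', '.join('?' for _ in ids)
tag_rows = conn.execute(f"""
    SELECT m.article_id, t.name
    FROM blog_tag t INNER JOIN blog_article_tags m ON t.id = m.tag_id
    WHERE m.article_id IN ({placeholders})
    ORDER BY t.id
""", ids).fetchall()

# The "join" between articles and tags happens in Python, not in SQL
tags_by_article = {article_id: [] for article_id in ids}
for article_id, name in tag_rows:
    tags_by_article[article_id].append(name)

for article_id, title, username in articles:
    print(title, username, tags_by_article[article_id])
```

<p>Doing the matching in Python is what lets <code>prefetch_related</code> handle M2M and reverse relations. A single JOIN over them would repeat each article row once per related object.</p>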
<h2 id="ogranichenie-polei-v-vyborkakh">Ограничение полей в выборках</h2>
<p>Если мы присмотримся получше к SQL запросам в предыдущем примере, мы увидим, что мы получаем больше полей, чем нам нужно.
В DDT можно посмотреть результаты запроса и убедиться в этом:</p>
<p><img alt="SQL query result for articles list" src="/media/2017/6/sql-queries-results.png"/></p>
<p>We fetch all fields of both the author and the article, including the article text, which can be huge. We can
significantly reduce the amount of transferred data with the <code>defer</code> method, which postpones loading of the given fields.
If the code does access such a field, Django makes an additional query (one per object) to load it.
Let's add a <code>defer</code> call to the <code>queryset</code>:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span><span class="s1">'tags'</span><span class="p">)</span><span class="o">.</span><span class="n">defer</span><span class="p">(</span><span class="s1">'content'</span><span class="p">,</span> <span class="s1">'comments_on'</span><span class="p">)</span>
</pre></div>
<p>Now some of the unneeded fields are not fetched, which reduced the query time, as DDT shows
(before and after <code>defer</code>, respectively):</p>
<p><img alt="DDT - SQL speedup after defer" src="/media/2017/6/sql-speedup-defer.png"/></p>
<p>We still fetch many author fields that we don't use. It would be simpler to list only the fields
we actually need. That is what the <code>only</code> method does: pass it the names of the fields you need, and all other fields will be deferred:</p>
<div class="highlight"><pre><span></span><span class="n">queryset</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">select_related</span><span class="p">(</span><span class="s1">'author'</span><span class="p">)</span><span class="o">.</span><span class="n">prefetch_related</span><span class="p">(</span><span class="s1">'tags'</span><span class="p">)</span><span class="o">.</span><span class="n">only</span><span class="p">(</span>
<span class="s1">'title'</span><span class="p">,</span> <span class="s1">'created_at'</span><span class="p">,</span> <span class="s1">'author__username'</span><span class="p">,</span> <span class="s1">'tags__name'</span><span class="p">)</span>
</pre></div>
<p>As a result, we fetch only the data we need, as DDT confirms:</p>
<p><img alt="DDT - SQL after only" src="/media/2017/6/sql-after-only.png"/></p>
<p>So <code>defer</code> and <code>only</code> solve the same task, limiting the fields in a query. The only difference is that:</p>
<ul>
<li><code>defer</code> defers loading of the fields passed as arguments,</li>
<li><code>only</code> defers loading of all fields except the ones passed.</li>
</ul>
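<p>The effect of <code>defer</code>/<code>only</code> on the amount of transferred data can be approximated with a plain <code>sqlite3</code> sketch. The table and its contents are hypothetical: 20 articles with 100,000-character bodies. The only point is that a column left out of the <code>SELECT</code> list never leaves the database. The narrow query keeps the primary key because Django always loads it, even with <code>only()</code>.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE blog_article (id INTEGER PRIMARY KEY, title TEXT, content TEXT)')
# Hypothetical data: short titles, large article bodies
conn.executemany(
    'INSERT INTO blog_article (title, content) VALUES (?, ?)',
    [(f'Article {i}', 'x' * 100_000) for i in range(20)],
)


def fetched_chars(sql):
    """Total number of characters in all text values returned by the query."""
    return sum(len(v) for row in conn.execute(sql) for v in row if isinstance(v, str))


# All columns, as a plain queryset would load them
full = fetched_chars('SELECT id, title, content FROM blog_article')
# Only the needed columns, as with .defer('content') or .only('title')
narrow = fetched_chars('SELECT id, title FROM blog_article')
print(full, narrow)
```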
<h2 id="indeksy-bd">Индексы БД</h2>
<p>Нам нужно сделать страницу автора, которая будет доступна по такому URL: <code>/authors/<username></code>. Сделаем view
для этого:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="n">author</span> <span class="o">=</span> <span class="n">get_object_or_404</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span>
<span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'blog/author.html'</span><span class="p">,</span> <span class="n">context</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">))</span>
</pre></div>
<p>This code is fast enough while the amount of data is small, but as the data grows, performance will
keep degrading. The reason is that to find a row by the <code>username</code> field, the DBMS has to scan
the table row by row. There's a better option: add an index on this field, which lets the DBMS
search much more efficiently. To add an index, pass <code>db_index=True</code> in the declaration of the
<code>username</code> field, then create and apply migrations:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Author</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">username</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">db_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># ...</span>
</pre></div>
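<p>You can observe the change directly with SQLite's <code>EXPLAIN QUERY PLAN</code>. Below is a sketch with the standard <code>sqlite3</code> module. The table mirrors the <code>Author</code> model only roughly. The <code>CREATE INDEX</code> statement, with a made-up index name, stands in for what the migration generated from <code>db_index=True</code> does. The wording of the plan varies between SQLite versions (older ones print <code>SCAN TABLE</code> rather than <code>SCAN</code>), so the comments show only typical output.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Simplified stand-in for the blog_author table
conn.execute('CREATE TABLE blog_author (id INTEGER PRIMARY KEY, username VARCHAR(64), bio TEXT)')


def plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute('EXPLAIN QUERY PLAN ' + sql, params)]


query = 'SELECT * FROM blog_author WHERE username = ?'
before = plan(query, ('alice',))  # typically a full scan: 'SCAN blog_author'

# Roughly what the migration for db_index=True does (hypothetical index name)
conn.execute('CREATE INDEX blog_author_username_idx ON blog_author (username)')
after = plan(query, ('alice',))  # typically 'SEARCH blog_author USING INDEX ...'

print(before)
print(after)
```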
<p>Let's compare performance before and after adding the index, on an authors table with 100K rows.</p>
<p>Without the index:</p>
<p><img alt="select by username without index" src="/media/2017/6/ddt-select-by-username-without-index.png"/></p>
<p>With the index:</p>
<p><img alt="select by username with index" src="/media/2017/6/ddt-select-by-username-with-index.png"/></p>
<p>The query became 16 times faster!</p>
<blockquote>
<p>Indexes help not only with filtering data but also with sorting. Many DBMSs also support indexes on
several columns, which is useful if you filter data by a combination of fields. Check your DBMS
documentation for the details.</p>
</blockquote>
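<p>Both points from the note above, an index serving an <code>ORDER BY</code> and an index over several columns, can also be sketched with <code>sqlite3</code>. The table, the query and the index name are hypothetical. In a Django model, a similar two-column index could be declared with <code>models.Index(fields=['author', 'created_at'])</code> in <code>Meta.indexes</code> (available since Django 1.11). Without the index, SQLite scans the table and sorts the matches in a temporary B-tree. With it, one author's rows are read from the index already in <code>created_at</code> order.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE blog_article (id INTEGER PRIMARY KEY, author_id INTEGER, '
             'created_at TEXT, title TEXT)')


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute('EXPLAIN QUERY PLAN ' + sql)]


query = 'SELECT * FROM blog_article WHERE author_id = 1 ORDER BY created_at'
before = plan(query)  # full scan plus a separate sorting step

# Composite index: filter column first, then the sort column (hypothetical name)
conn.execute('CREATE INDEX article_author_created_idx ON blog_article (author_id, created_at)')
after = plan(query)  # index search; no extra sorting step needed

print(before)
print(after)
```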
<h2 id="lenqs-vs-qscount">len(qs) vs qs.count</h2>
<p>For some reason, we need to show the total number of authors on the articles list page. Let's update the view:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">context</span><span class="p">[</span><span class="s1">'authors_count'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">())</span>
<span class="k">return</span> <span class="n">context</span>
</pre></div>
<p>Let's see which SQL queries this code generates:</p>
<p><img alt="DDT - len(qs)" src="/media/2017/6/ddt-authors-len-queryset.png"/></p>
<p>The screenshot shows that all rows of the authors table are fetched, so the counting happens
in the view itself. This is clearly not optimal: all we need from the DB is a single number, the
number of authors. The <code>count</code> method does exactly that:</p>
<div class="highlight"><pre><span></span> <span class="n">context</span><span class="p">[</span><span class="s1">'authors_count'</span><span class="p">]</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
<p>Let's check the result in DDT:</p>
<p><img alt="DDT - qs.count()" src="/media/2017/6/ddt-authors-count.png"/></p>
<p>Now Django generates a much more efficient query for our task: a single <code>SELECT COUNT(*)</code>.</p>
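<p>The difference comes down to where the counting happens. Here is a minimal sketch with the standard <code>sqlite3</code> module and a hypothetical table of 1,000 authors. The first variant, like <code>len(Author.objects.all())</code>, transfers every row to Python. The second, like <code>Author.objects.count()</code>, gets back a single row holding the number.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE blog_author (id INTEGER PRIMARY KEY, username TEXT, bio TEXT)')
# Hypothetical data: 1,000 authors
conn.executemany('INSERT INTO blog_author (username, bio) VALUES (?, ?)',
                 [(f'user{i}', 'Some bio') for i in range(1000)])

# Like len(Author.objects.all()): every row is sent to Python and counted there
rows = conn.execute('SELECT id, username, bio FROM blog_author').fetchall()
via_len = len(rows)

# Like Author.objects.count(): the database counts and returns one number
(via_count,) = conn.execute('SELECT COUNT(*) FROM blog_author').fetchone()

print(via_len, via_count)
```

<p>One caveat: if you iterate over the same queryset anyway, calling <code>len()</code> on it after evaluation doesn't cost an extra query, because Django caches the fetched results.</p>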
<h2 id="count-vs-exists">count vs exists</h2>
<p>On the author page, we need to show a link to the list of the author's articles if the author has any. One solution is to
get the number of articles and check whether it is greater than 0, for example:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">author_page_view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">username</span><span class="p">):</span>
<span class="n">author</span> <span class="o">=</span> <span class="n">get_object_or_404</span><span class="p">(</span><span class="n">Author</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">)</span>
<span class="n">show_articles_link</span> <span class="o">=</span> <span class="p">(</span><span class="n">author</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">render</span><span class="p">(</span>
<span class="n">request</span><span class="p">,</span> <span class="s1">'blog/author.html'</span><span class="p">,</span>
<span class="n">context</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="n">author</span><span class="p">,</span> <span class="n">show_articles_link</span><span class="o">=</span><span class="n">show_articles_link</span><span class="p">))</span>
</pre></div>
<p>But with a large number of articles, this code will be slow. Since we don't need the exact number of the author's
articles, we can use the <code>exists</code> method, which checks whether the <code>QuerySet</code> contains at least one result:</p>
<div class="highlight"><pre><span></span> <span class="c1"># ...</span>
<span class="n">show_articles_link</span> <span class="o">=</span> <span class="n">author</span><span class="o">.</span><span class="n">articles</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
<span class="c1"># ...</span>
</pre></div>
<p>Let's compare performance with a large number of articles (~10K):</p>
<p><img alt="DDT - exists vs count" src="/media/2017/6/ddt-exists-vs-count.png"/></p>
<p>We achieved the same goal with a query that runs 10 times faster.</p>
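<p>A sketch of the same comparison with <code>sqlite3</code>. Django's <code>exists()</code> emits a query ending in <code>LIMIT 1</code>, so the database can stop at the first matching row instead of counting all of them. The table is hypothetical: author 1 has 10,000 articles and author 2 has none. The SQL is a simplified version of what Django generates.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical data: author 1 has 10,000 articles, author 2 has none
conn.execute('CREATE TABLE blog_article (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)')
conn.executemany('INSERT INTO blog_article (author_id, title) VALUES (?, ?)',
                 [(1, f'Article {i}') for i in range(10_000)])


def has_articles_via_count(author_id):
    # Like author.articles.count() > 0: every matching row has to be counted
    (n,) = conn.execute('SELECT COUNT(*) FROM blog_article WHERE author_id = ?',
                        (author_id,)).fetchone()
    return n > 0


def has_articles_via_exists(author_id):
    # Like author.articles.exists(): LIMIT 1 lets the database stop at the first match
    row = conn.execute('SELECT 1 FROM blog_article WHERE author_id = ? LIMIT 1',
                       (author_id,)).fetchone()
    return row is not None


print(has_articles_via_count(1), has_articles_via_exists(1))  # True True
print(has_articles_via_count(2), has_articles_via_exists(2))  # False False
```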
<h2 id="lenivyi-queryset">Ленивый QuerySet</h2>
<p>Теперь нам захотелось, чтобы авторы конкурировали между собой, для этого мы добавим рейтинг топ-20 авторов по количеству
статей.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ArticlesListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="c1"># ...</span>
<span class="n">context</span><span class="p">[</span><span class="s1">'top_authors'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span>
<span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-articles_count'</span><span class="p">))[:</span><span class="mi">20</span><span class="p">]</span>
<span class="c1"># ...</span>
</pre></div>
<p>Here we get a list of all authors sorted by the number of articles and take the first 20 elements of that list.
In our example, <code>articles_count</code> is a denormalized field that stores the author's number of articles.
In a real project, you would probably want to set up signals to keep this field up to date.</p>
<p>It's probably already clear that this is not the best approach, and DDT confirms it:</p>
<p><img alt="DDT - get top authors slice" src="/media/2017/6/ddt-top-authors-list.png"/></p>
<p>Of course, we want the limit to the first 20 authors to be applied on the database side. To do that, we need to
understand that a <code>QuerySet</code> postpones the database query as long as possible. The query is actually sent to
the DB in the following cases:</p>
<ul>
<li>iteration over a QuerySet (e.g. <code>for obj in Model.objects.all():</code>),</li>
<li>slicing with a step (e.g. <code>Model.objects.all()[::2]</code>),</li>
<li>calling <code>len()</code> (e.g. <code>len(Model.objects.all())</code>),</li>
<li>calling <code>list()</code> (e.g. <code>list(Model.objects.all())</code>),</li>
<li>calling <code>bool()</code> (e.g. <code>bool(Model.objects.all())</code>),</li>
<li>serialization with <a href="https://docs.python.org/3/library/pickle.html">pickle</a>.</li>
</ul>
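<p>The rules above can be illustrated with a toy class. <code>LazyQuery</code> below is hypothetical and far simpler than Django's <code>QuerySet</code>. It mimics only the behavior listed here. Composing the query, or slicing it without a step, just changes the SQL. Iteration, <code>len()</code>, <code>list()</code> and <code>bool()</code> send the query to an in-memory SQLite database once and cache the rows. The two logged queries show why the position of <code>[:20]</code> matters.</p>

```python
import sqlite3

executed = []  # log of every SQL statement actually sent to the database


class LazyQuery:
    """Toy model of QuerySet laziness (hypothetical, not Django's implementation)."""

    def __init__(self, conn, sql, limit=None):
        self.conn = conn
        self.sql = sql
        self.limit = limit
        self._cache = None  # filled on first evaluation, reused afterwards

    def order_by(self, clause):
        # Composing the query only builds a new SQL string; nothing runs yet
        return LazyQuery(self.conn, f'{self.sql} ORDER BY {clause}', self.limit)

    def __getitem__(self, key):
        unevaluated = self._cache is None
        if unevaluated and isinstance(key, slice) and key.start is None and key.step is None:
            # An unevaluated [:n] slice becomes a LIMIT clause and stays lazy
            # (the toy only handles [:n]; Django also turns offsets into SQL)
            return LazyQuery(self.conn, self.sql, key.stop)
        # Anything else here (e.g. a slice with a step) forces evaluation
        return self._fetch()[key]

    def _fetch(self):
        if self._cache is None:
            sql = self.sql if self.limit is None else f'{self.sql} LIMIT {self.limit}'
            executed.append(sql)
            self._cache = self.conn.execute(sql).fetchall()
        return self._cache

    # Evaluation triggers: iteration, len(), list() (via iteration) and bool()
    def __iter__(self):
        return iter(self._fetch())

    def __len__(self):
        return len(self._fetch())

    def __bool__(self):
        return bool(self._fetch())


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE blog_author (id INTEGER PRIMARY KEY, articles_count INTEGER)')
conn.executemany('INSERT INTO blog_author (articles_count) VALUES (?)',
                 [(i,) for i in range(100)])

authors = LazyQuery(conn, 'SELECT id, articles_count FROM blog_author')
ranked = authors.order_by('articles_count DESC')[:20]
print(executed)  # [] : nothing has been sent yet

# Like list(Author.objects.order_by(...))[:20]: all rows fetched, sliced in Python
slow_top = list(authors.order_by('articles_count DESC'))[:20]
# Like Author.objects.order_by(...)[:20]: the limit is part of the SQL
fast_top = list(ranked)

for sql in executed:
    print(sql)
```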
<p>So by calling <code>list</code> we forced the <code>QuerySet</code> to run the query and return a list of objects, and only then was the
slice applied to that list. For the limit to end up in the SQL query, apply the slice
to the <code>QuerySet</code> itself:</p>
<div class="highlight"><pre><span></span><span class="n">context</span><span class="p">[</span><span class="s1">'top_authors'</span><span class="p">]</span> <span class="o">=</span> <span class="n">Author</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-articles_count'</span><span class="p">)[:</span><span class="mi">20</span><span class="p">]</span>
</pre></div>
<p><img alt="DDT - get top authors slice on queryset" src="/media/2017/6/ddt-top-authors-qs-slice.png"/></p>
<p>Now the result size is limited in the query itself: <code>...LIMIT 20</code>. You can also see that sending the query
to the DB was postponed until the loop in the template iterated over it.</p>Django project optimization guide (part 1)2017-06-14T16:47:00+03:002017-06-14T16:47:00+03:00Admintag:None,2017-06-14:/django-project-optimization-part-1/<p>Other parts of this guide:</p>
<ul>
<li>Part 1. Profiling and Django settings</li>
<li><a href="/en/django-project-optimization-part-2/">Part 2. Working with database</a></li>
<li><a href="/en/django-project-optimization-part-3/">Part 3. Caching</a></li>
</ul>
<p>Django is a powerful framework used in many great projects. It comes with many batteries included that speed up
development and therefore reduce its cost. When a project grows large and is used by many people, you will inevitably run
into performance problems. In this guide, I will try to identify potential problems and show how to fix them.</p>
<p>This is the first part of a series about Django performance optimization. It will cover profiling and Django settings.</p>
<h2 id="profiling">Profiling</h2>
<p>Before making any optimizations, you should measure the current performance so that you can compare the results of
your optimizations against it. You should also be able to measure performance regularly after each change, so this process should be
automated.</p>
<p>Profiling is the process of measuring your project's metrics, such as server response time, CPU usage, memory usage, etc.
Python has its own <a href="https://docs.python.org/3/library/profile.html">profiler</a> in the standard library. It works quite
well for profiling individual chunks of code, but there are more convenient solutions for profiling a whole Django project.</p>
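<p>For a single chunk of code, the standard library profiler is often enough. Here is a minimal sketch with <code>cProfile</code> and <code>pstats</code>. <code>slow_sum</code> is a hypothetical function standing in for the code you want to measure.</p>

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical code chunk to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

# Collect the 5 most expensive entries, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```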
<h3 id="django-logging">Django logging</h3>
<p>One of the most common performance issues is needless and/or inefficient SQL queries. You can set up Django
logging to print all SQL queries to the console. Add the following to your <code>settings.py</code> file:</p>
<div class="highlight"><pre><span></span><span class="n">LOGGING</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'version'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">'disable_existing_loggers'</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span>
<span class="s1">'handlers'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'console'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'class'</span><span class="p">:</span> <span class="s1">'logging.StreamHandler'</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="s1">'loggers'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'django.db.backends'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'level'</span><span class="p">:</span> <span class="s1">'DEBUG'</span><span class="p">,</span>
<span class="s1">'handlers'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'console'</span><span class="p">],</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>Also, make sure that <code>DEBUG = True</code>: Django logs SQL queries only in debug mode. After reloading the server, you should see
the SQL queries and their execution time in the console for every request you make:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"size_type_id"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">LEFT</span> <span class="k">OUTER</span> <span class="k">JOIN</span> <span class="ss">"handbooks_size"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="o">=</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"color_id"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"color_id"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"row"</span><span class="p">,</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"size_type_id"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">LEFT</span> <span class="k">OUTER</span> <span class="k">JOIN</span> <span class="ss">"handbooks_size"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="o">=</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"season"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"season"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"state"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"state"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">MAX</span><span class="p">(</span><span class="ss">"__col1"</span><span class="p">),</span> <span class="k">MIN</span><span class="p">(</span><span class="ss">"__col2"</span><span class="p">)</span> <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">AS</span> <span class="n">Col1</span><span class="p">,</span> <span class="k">CASE</span> <span class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span> <span class="k">AS</span> <span class="ss">"x_order"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"price_sell"</span> <span class="k">AS</span> <span class="ss">"__col1"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"price_sell"</span> <span class="k">AS</span> <span class="ss">"__col2"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="k">CASE</span> <span 
class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span><span class="p">)</span> <span class="n">subquery</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">AS</span> <span class="n">Col1</span><span class="p">,</span> <span class="k">CASE</span> <span class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span> <span class="k">AS</span> <span class="ss">"x_order"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="k">CASE</span> <span class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span><span class="p">)</span> <span class="n">subquery</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span 
class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">[</span><span class="mi">15</span><span class="o">/</span><span class="n">Jun</span><span class="o">/</span><span class="mi">2017</span> <span class="mi">11</span><span class="p">:</span><span class="mi">03</span><span class="p">:</span><span class="mi">49</span><span class="p">]</span> <span class="ss">"GET /goods HTTP/1.0"</span> <span class="mi">200</span> <span class="mi">32583</span>
</pre></div>
<h3 id="django-debug-toolbar">Django Debug Toolbar</h3>
<p><a href="http://django-debug-toolbar.readthedocs.io/en/stable/">This</a> Django application provides a set of panels, some of
which are great for profiling. Its built-in SQL panel, enabled by default, shows an even more informative log of SQL queries
than Django's logging, with extras such as a query timeline, tracebacks, query results, and the output of <code>EXPLAIN</code>.</p>
<p><img alt="DDT" src="/media/2017/6/ddt.png"/></p>
<p>DDT also ships with a built-in profiling panel, which is disabled by default. It shows the profiling results of the current request in a web interface.
To enable it, add <code>debug_toolbar.panels.profiling.ProfilingPanel</code> to the <code>DEBUG_TOOLBAR_PANELS</code> list in <code>settings.py</code>.</p>
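<p>A minimal sketch of that setting, assuming django-debug-toolbar 1.x (current when this post was written). Defining <code>DEBUG_TOOLBAR_PANELS</code> replaces the toolbar's default panel list rather than extending it, so the sketch repeats the 1.x defaults and appends the profiling panel. Check the defaults of the version you actually install before copying it.</p>
<div class="highlight"><pre><span></span># settings.py (fragment): a sketch assuming django-debug-toolbar 1.x.
# Setting DEBUG_TOOLBAR_PANELS replaces the toolbar's default list, so the
# default panels are repeated here and the profiling panel is appended.
DEBUG_TOOLBAR_PANELS = [
    'debug_toolbar.panels.versions.VersionsPanel',
    'debug_toolbar.panels.timer.TimerPanel',
    'debug_toolbar.panels.settings.SettingsPanel',
    'debug_toolbar.panels.headers.HeadersPanel',
    'debug_toolbar.panels.request.RequestPanel',
    'debug_toolbar.panels.sql.SQLPanel',
    'debug_toolbar.panels.staticfiles.StaticFilesPanel',
    'debug_toolbar.panels.templates.TemplatesPanel',
    'debug_toolbar.panels.cache.CachePanel',
    'debug_toolbar.panels.signals.SignalsPanel',
    'debug_toolbar.panels.logging.LoggingPanel',
    'debug_toolbar.panels.redirects.RedirectsPanel',
    # Disabled by default: profiles the whole request with cProfile.
    'debug_toolbar.panels.profiling.ProfilingPanel',
]
</pre></div>
<p>Profiling every request adds noticeable overhead, so it is reasonable to keep this panel enabled only while you are investigating a slow page.</p>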
<p><img alt="DDT profiling panel" src="/media/2017/6/ddt-profiling-panel.png"/></p>
<h3 id="silk">Silk</h3>
<p>Another great package for profiling is Silk. It's especially useful if you have an API: DDT renders its toolbar into HTML pages, so you can't use it there.
Installation instructions can be found on <a href="https://github.com/django-silk/silk#installation">GitHub</a>.</p>
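<p>A rough sketch of the <code>settings.py</code> side of that setup, based on Silk's README: register the <code>silk</code> app and its middleware. The other entries below are placeholders for your project's own apps and middlewares. Silk's URLs also have to be included (under <code>/silk/</code>) as described in the same instructions, and the README notes that the middleware's position in the list matters, so follow its guidance on ordering.</p>
<div class="highlight"><pre><span></span># settings.py (fragment): sketch of the Silk setup. Entries other than
# 'silk' and SilkyMiddleware are placeholders for your own configuration.
INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'silk',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    # Records each request and its SQL queries for the /silk/ interface.
    'silk.middleware.SilkyMiddleware',
    'django.middleware.common.CommonMiddleware',
]
</pre></div>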
<p><img alt="silky-screenshot.png" src="/media/2017/6/silky-screenshot.png"/></p>
<p>After setting it up, restart the server and open <code>/silk/</code> in a browser. Silk's web interface provides:</p>
<ul>
<li>request statistics,</li>
<li>SQL queries,</li>
<li>profiling results.</li>
</ul>
<p>You can enable the profiler for the whole project by setting <code>SILKY_PYTHON_PROFILER = True</code> in <code>settings.py</code>, or
profile only certain functions or blocks of code with a decorator or a context manager:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">render</span>
<span class="kn">from</span> <span class="nn">silk.profiling.profiler</span> <span class="kn">import</span> <span class="n">silk_profile</span>

<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Post</span>  <span class="c1"># the blog app model (assumed)</span>


<span class="c1"># Profile a whole view with the decorator</span>
<span class="nd">@silk_profile</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'View Blog Post'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">post</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">post_id</span><span class="p">):</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">Post</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">post_id</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'post.html'</span><span class="p">,</span> <span class="p">{</span><span class="s1">'post'</span><span class="p">:</span> <span class="n">p</span><span class="p">})</span>


<span class="c1"># ...or profile only a block of code with the context manager</span>
<span class="k">def</span> <span class="nf">post</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">post_id</span><span class="p">):</span>
    <span class="k">with</span> <span class="n">silk_profile</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'View Blog Post #</span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">post_id</span><span class="p">):</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">Post</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">post_id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'post.html'</span><span class="p">,</span> <span class="p">{</span><span class="s1">'post'</span><span class="p">:</span> <span class="n">p</span><span class="p">})</span>
</pre></div>
<h3 id="profiling-data">Profiling data</h3>
<p>It's very important to use production-like data for profiling. Ideally, you should take a dump of the production database and use it
on your local machine. Measuring performance on an empty or small database can give misleading results: a query that is fast on a
few rows may be slow on millions, so you may end up optimizing the wrong things.</p>
<h2 id="load-testing_1">Load testing</h2>
<p>After optimizing, you should run load tests to make sure the project performs well enough to handle production
load. For this kind of testing you need a copy of your production environment. Fortunately, cloud services and
deployment automation let you set one up in minutes.</p>
<p>I recommend using <a href="http://locust.io/">Locust</a> for load testing. Its main feature is that you describe all your
tests in plain Python code, so you can build sophisticated load scenarios that closely mimic real user behavior.
An example <code>locustfile.py</code>:</p>
<div class="highlight"><pre><span></span><span class="c1"># Written for the pre-1.0 Locust API (HttpLocust, task_set, min_wait/max_wait);</span>
<span class="c1"># Locust 1.0+ uses HttpUser, tasks and wait_time instead.</span>
<span class="kn">from</span> <span class="nn">locust</span> <span class="kn">import</span> <span class="n">HttpLocust</span><span class="p">,</span> <span class="n">TaskSet</span><span class="p">,</span> <span class="n">task</span>


<span class="k">class</span> <span class="nc">UserBehavior</span><span class="p">(</span><span class="n">TaskSet</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">on_start</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">""" on_start is called when a Locust starts, before any task is scheduled """</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">login</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">login</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="s2">"/login"</span><span class="p">,</span> <span class="p">{</span><span class="s2">"username"</span><span class="p">:</span> <span class="s2">"ellen_key"</span><span class="p">,</span> <span class="s2">"password"</span><span class="p">:</span> <span class="s2">"education"</span><span class="p">})</span>

    <span class="nd">@task</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">index</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"/"</span><span class="p">)</span>

    <span class="nd">@task</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">profile</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"/profile"</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">WebsiteUser</span><span class="p">(</span><span class="n">HttpLocust</span><span class="p">):</span>
    <span class="n">task_set</span> <span class="o">=</span> <span class="n">UserBehavior</span>
    <span class="n">min_wait</span> <span class="o">=</span> <span class="mi">5000</span>  <span class="c1"># milliseconds</span>
    <span class="n">max_wait</span> <span class="o">=</span> <span class="mi">9000</span>
</pre></div>
<p>Locust also provides a web interface for running tests and viewing the results:</p>
<p><img alt="Locust web interface" src="/media/2017/6/locust-screenshot.png"/></p>
<p>Best of all, you can set up Locust once and reuse it to check the project's performance after every change. You could even
add it to your CI/CD pipeline!</p>
<h2 id="django-settings">Django settings</h2>
<p>In this section I'll describe Django settings that can affect performance.</p>
<h3 id="database-connection-lifetime">Database connection lifetime</h3>
<p>By default, Django closes the database connection at the end of each request, so every request has to open a new one. You can
keep connections open for longer by changing the <a href="https://docs.djangoproject.com/en/1.11/ref/settings/#conn-max-age"><code>CONN_MAX_AGE</code></a> value:</p>
<ul>
<li><code>0</code> - close connection at the end of each request,</li>
<li><code>> 0</code> - TTL in seconds,</li>
<li><code>None</code> - unlimited TTL.</li>
</ul>
<div class="highlight"><pre><span></span><span class="n">DATABASES</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'default'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'ENGINE'</span><span class="p">:</span> <span class="s1">'django.db.backends.postgresql'</span><span class="p">,</span>
        <span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'mydatabase'</span><span class="p">,</span>
        <span class="s1">'USER'</span><span class="p">:</span> <span class="s1">'mydatabaseuser'</span><span class="p">,</span>
        <span class="s1">'PASSWORD'</span><span class="p">:</span> <span class="s1">'mypassword'</span><span class="p">,</span>
        <span class="s1">'HOST'</span><span class="p">:</span> <span class="s1">'127.0.0.1'</span><span class="p">,</span>
        <span class="s1">'PORT'</span><span class="p">:</span> <span class="s1">'5432'</span><span class="p">,</span>
        <span class="s1">'CONN_MAX_AGE'</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">10</span><span class="p">,</span>  <span class="c1"># 10 minutes</span>
    <span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h3 id="templates-caching">Templates caching</h3>
<p>If you use a Django version older than 1.11, consider enabling template caching. By default, Django (&lt;1.11) reads templates
from the file system and compiles them every time they're rendered; starting with 1.11, the cached loader is enabled automatically
when <code>debug</code> is off and <code>loaders</code> isn't specified. On older versions, use <code>django.template.loaders.cached.Loader</code>
to cache compiled templates in memory. Add to <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">TEMPLATES</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span>
        <span class="s1">'BACKEND'</span><span class="p">:</span> <span class="s1">'django.template.backends.django.DjangoTemplates'</span><span class="p">,</span>
        <span class="s1">'DIRS'</span><span class="p">:</span> <span class="p">[</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">BASE_DIR</span><span class="p">,</span> <span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">),</span> <span class="p">],</span>
        <span class="s1">'OPTIONS'</span><span class="p">:</span> <span class="p">{</span>
            <span class="c1"># ...</span>
            <span class="s1">'loaders'</span><span class="p">:</span> <span class="p">[</span>
                <span class="p">(</span><span class="s1">'django.template.loaders.cached.Loader'</span><span class="p">,</span> <span class="p">[</span>
                    <span class="s1">'django.template.loaders.filesystem.Loader'</span><span class="p">,</span>
                    <span class="s1">'django.template.loaders.app_directories.Loader'</span><span class="p">,</span>
                <span class="p">]),</span>
            <span class="p">],</span>
        <span class="p">},</span>
    <span class="p">},</span>
<span class="p">]</span>
</pre></div>
<h3 id="redis-cache-backend">Redis cache backend</h3>
<p>Django provides several built-in cache backends, such as the database backend, the file-based backend, etc. I recommend storing
your cache in Redis, a popular in-memory data structure store that you are probably already using in your project.
To use Redis as a cache backend you need a third-party package, e.g. <code>django-redis</code>.</p>
<p>Install django-redis with pip:</p>
<div class="highlight"><pre><span></span>pip install django-redis
</pre></div>
<p>Add cache settings to <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">CACHES</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"default"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">"BACKEND"</span><span class="p">:</span> <span class="s2">"django_redis.cache.RedisCache"</span><span class="p">,</span>
        <span class="s2">"LOCATION"</span><span class="p">:</span> <span class="s2">"redis://127.0.0.1:6379/1"</span><span class="p">,</span>
        <span class="s2">"OPTIONS"</span><span class="p">:</span> <span class="p">{</span>
            <span class="s2">"CLIENT_CLASS"</span><span class="p">:</span> <span class="s2">"django_redis.client.DefaultClient"</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Read the full documentation <a href="http://niwinz.github.io/django-redis/latest/">here</a>.</p>
<h3 id="sessions-backend">Sessions backend</h3>
<p>By default, Django stores sessions in the database. To speed this up, you can store sessions in the cache instead. Add to <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">SESSION_ENGINE</span> <span class="o">=</span> <span class="s2">"django.contrib.sessions.backends.cache"</span>
<span class="n">SESSION_CACHE_ALIAS</span> <span class="o">=</span> <span class="s2">"default"</span>
</pre></div>
<h3 id="remove-unneeded-middlewares">Remove unneeded middlewares</h3>
<p>Check the list of middlewares (<code>MIDDLEWARE</code> in <code>settings.py</code>). Make sure you need all of them and remove the rest.
Django runs every middleware on every request, so unneeded ones can add significant overhead.</p>
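<p>One way to keep that overhead away from endpoints that don't need it is to move request-specific logic out of the global middleware list and into a view decorator. A minimal sketch, assuming a hypothetical custom middleware that times each request and adds an <code>X-View-Time</code> response header; <code>timed_view</code> and the header name are illustrative, not part of Django. For existing middleware classes, Django's <code>django.utils.decorators.decorator_from_middleware</code> offers a similar per-view conversion.</p>
<div class="highlight"><pre><span></span>import functools
import time


def timed_view(view_func):
    """Hypothetical decorator: per-request logic that could otherwise live in
    a global middleware, applied only to the views that actually need it."""
    @functools.wraps(view_func)
    def wrapper(request, *args, **kwargs):
        started = time.perf_counter()
        response = view_func(request, *args, **kwargs)
        # Django's HttpResponse supports dict-style header assignment.
        response['X-View-Time'] = '%.4f' % (time.perf_counter() - started)
        return response
    return wrapper


# Usage in views.py: only the decorated view pays for the extra work.
#
#     @timed_view
#     def goods_list(request):
#         ...
</pre></div>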
<p>If you have a custom middleware that is only needed for a subset of requests, consider moving its functionality
into a view mixin or decorator, so that other endpoints don't pay for its overhead.</p>Django project performance optimization (part 1)2017-06-14T16:47:00+03:002017-06-14T16:47:00+03:00Admintag:None,2017-06-14:/../ru/django-project-optimization-part-1/<p>Other articles in the series:</p>
<ul>
<li>Part 1. Profiling and Django settings</li>
<li><a href="/ru/django-project-optimization-part-2/">Part 2. Working with the database</a></li>
<li><a href="/ru/django-project-optimization-part-3/">Part 3. Caching</a></li>
</ul>
<p>Django is a powerful framework used in many great projects. Out of the box it includes lots of useful
batteries, which significantly speed up development and, consequently, reduce its cost. However, as a project
grows and gains an audience, you will inevitably run into performance problems. In this post I'll try to
describe the problems you may encounter and how to solve them.</p>
<p>This is the first article in the series; it covers profiling and Django settings.</p>
<h2 id="profilirovanie">Profiling</h2>
<p>Before optimizing anything, you need to measure the current performance so that you can compare the results afterwards.
You will need to take such measurements often, after every change, so the process should be automated.</p>
<p>Profiling is the process of measuring project metrics such as server response time, CPU usage,
memory usage, etc. Python provides a <a href="https://docs.python.org/3/library/profile.html">profiler</a> in its standard
library, which is quite convenient for measuring the performance of individual pieces of code.
But for profiling a whole project there are more convenient solutions.</p>
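<p>As an illustration of the standard-library profiler mentioned above, a minimal sketch that measures a single function with <code>cProfile</code> and prints the top entries with <code>pstats</code>; <code>slow_function</code> is a hypothetical stand-in for the code you want to measure.</p>
<div class="highlight"><pre><span></span>import cProfile
import io
import pstats


def slow_function():
    # Hypothetical stand-in for the code you want to measure.
    return sum(i * i for i in range(200000))


profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Collect the report in a string and keep only the 5 most expensive
# entries, sorted by cumulative time (time spent including callees).
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats('cumulative')
stats.print_stats(5)
print(stream.getvalue())
</pre></div>
<p>Sorting by cumulative time puts the functions whose call trees take the longest at the top, which is usually a good starting point when looking for a bottleneck.</p>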
<h3 id="logirovanie">Logging</h3>
<p>The most common performance problem is redundant and/or inefficient database queries. You can configure logging
to see all SQL queries executed while a request is being processed. Add to <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span><span class="n">LOGGING</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'version'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="s1">'disable_existing_loggers'</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span>
    <span class="s1">'handlers'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'console'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'class'</span><span class="p">:</span> <span class="s1">'logging.StreamHandler'</span><span class="p">,</span>
        <span class="p">},</span>
    <span class="p">},</span>
    <span class="s1">'loggers'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'django.db.backends'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'level'</span><span class="p">:</span> <span class="s1">'DEBUG'</span><span class="p">,</span>
            <span class="s1">'handlers'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'console'</span><span class="p">],</span>
        <span class="p">}</span>
    <span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>Make sure that <code>DEBUG = True</code> (Django only logs SQL queries in debug mode) and restart the server. All SQL queries,
along with the execution time of each, should now be printed to the console.</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"size_type_id"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">LEFT</span> <span class="k">OUTER</span> <span class="k">JOIN</span> <span class="ss">"handbooks_size"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="o">=</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"color_id"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"color_id"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"row"</span><span class="p">,</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"size_type_id"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">LEFT</span> <span class="k">OUTER</span> <span class="k">JOIN</span> <span class="ss">"handbooks_size"</span> <span class="k">ON</span> <span class="p">(</span><span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="o">=</span> <span class="ss">"handbooks_size"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">)</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"size_id"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"season"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"season"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">000</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"state"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"state"</span> <span class="k">ASC</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">002</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">MAX</span><span class="p">(</span><span class="ss">"__col1"</span><span class="p">),</span> <span class="k">MIN</span><span class="p">(</span><span class="ss">"__col2"</span><span class="p">)</span> <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">AS</span> <span class="n">Col1</span><span class="p">,</span> <span class="k">CASE</span> <span class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span> <span class="k">AS</span> <span class="ss">"x_order"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"price_sell"</span> <span class="k">AS</span> <span class="ss">"__col1"</span><span class="p">,</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"price_sell"</span> <span class="k">AS</span> <span class="ss">"__col2"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="k">CASE</span> <span 
class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span><span class="p">)</span> <span class="n">subquery</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">001</span><span class="p">)</span> <span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span> <span class="k">AS</span> <span class="n">Col1</span><span class="p">,</span> <span class="k">CASE</span> <span class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span> <span class="k">AS</span> <span class="ss">"x_order"</span> <span class="k">FROM</span> <span class="ss">"goods_goods"</span> <span class="k">WHERE</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">)</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span> <span class="k">CASE</span> <span class="k">WHEN</span> <span class="ss">"goods_goods"</span><span class="p">.</span><span class="ss">"status"</span> <span class="o">=</span> <span class="s1">'sold'</span> <span class="k">THEN</span> <span class="mi">1</span> <span class="k">ELSE</span> <span class="mi">0</span> <span class="k">END</span><span class="p">)</span> <span class="n">subquery</span><span class="p">;</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span 
class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'reserved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="s1">'approved'</span><span class="p">,</span> <span class="s1">'sold'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">[</span><span class="mi">15</span><span class="o">/</span><span class="n">Jun</span><span class="o">/</span><span class="mi">2017</span> <span class="mi">11</span><span class="p">:</span><span class="mi">03</span><span class="p">:</span><span class="mi">49</span><span class="p">]</span> <span class="ss">"GET /goods HTTP/1.0"</span> <span class="mi">200</span> <span class="mi">32583</span>
</pre></div>
<h3 id="django-debug-toolbar">Django Debug Toolbar</h3>
<p><a href="http://django-debug-toolbar.readthedocs.io/en/stable/">Django Debug Toolbar</a> (DDT) is a Django application that provides a set of panels,
some of which are handy for profiling. The SQL panel is enabled by default and gives even more information than Django's
standard query logging. Its extra features include a timeline of queries, a traceback for each query, query results, and the
<code>EXPLAIN</code> output for every query.</p>
<p><img alt="DDT" src="/media/2017/6/ddt.png"/></p>
<p>DDT also ships with a profiling panel that is disabled by default. It shows profiling results in a convenient
web interface. To enable it, add <code>debug_toolbar.panels.profiling.ProfilingPanel</code> to the
<code>DEBUG_TOOLBAR_PANELS</code> list in <code>settings.py</code>.</p>
<p><img alt="DDT profiling panel" src="/media/2017/6/ddt-profiling-panel.png"/></p>
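<p>A minimal sketch of that setting, assuming a django-debug-toolbar release from around the time of writing. Setting
<code>DEBUG_TOOLBAR_PANELS</code> replaces the default panel list rather than extending it, so every panel you still want
has to be listed. The selection below is illustrative; the exact default list, and which panels start disabled, vary between releases.</p>
<div class="highlight"><pre><span></span># settings.py (fragment): DDT shows only the panels listed here.
DEBUG_TOOLBAR_PANELS = [
    'debug_toolbar.panels.timer.TimerPanel',
    'debug_toolbar.panels.sql.SQLPanel',
    'debug_toolbar.panels.templates.TemplatesPanel',
    'debug_toolbar.panels.cache.CachePanel',
    # Off by default in the DDT releases this post targets; listed explicitly here.
    'debug_toolbar.panels.profiling.ProfilingPanel',
]
</pre></div>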
<h3 id="silk">Silk</h3>
<p>Another great package, especially useful if your project is an API, where DDT can't be used.
Installation and configuration instructions are on the <a href="https://github.com/django-silk/silk#installation">project's GitHub page</a>.</p>
<p><img alt="silky-screenshot.png" src="/media/2017/6/silky-screenshot.png"/></p>
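<p>For reference, the core of that setup, as described in the Silk README, is adding the app and its middleware to
<code>settings.py</code>. The fragment below is a sketch: the <code>...</code> comments stand in for your existing entries,
and the README (which may have changed since this was written) also covers adding Silk's URLs under <code>/silk/</code>
to your URLconf and running <code>migrate</code>.</p>
<div class="highlight"><pre><span></span># settings.py (fragment); the "..." comments stand for your existing entries.
INSTALLED_APPS = [
    'django.contrib.admin',
    # ... your other apps ...
    'silk',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    # ... your other middleware ...
    # Placement matters: see the README (e.g. GZipMiddleware must come before it).
    'silk.middleware.SilkyMiddleware',
]
</pre></div>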
<p>After installing and configuring it, restart the server and open the <code>/silk/</code> URL. It serves a
web interface that shows:</p>
<ul>
<li>request statistics (grouped by method/URL, with drill-down into individual requests),</li>
<li>SQL queries,</li>
<li>profiling results.</li>
</ul>
<p>The profiler can be enabled for the whole project by setting <code>SILKY_PYTHON_PROFILER = True</code> in <code>settings.py</code>,
or applied only where you need it by wrapping the code to profile in a decorator or a context manager:</p>
<div class="highlight"><pre><span></span>from django.shortcuts import render
from silk.profiling.profiler import silk_profile

from .models import Post  # hypothetical blog app model


# Option 1: profile the whole view with a decorator.
@silk_profile(name='View Blog Post')
def post(request, post_id):
    p = Post.objects.get(pk=post_id)
    return render(request, 'post.html', {
        'post': p
    })


# Option 2 (use instead of option 1): profile only a block with a context manager.
def post(request, post_id):
    # %s, not %d: post_id is a string when captured by a regex URL pattern.
    with silk_profile(name='View Blog Post #%s' % post_id):
        p = Post.objects.get(pk=post_id)
        return render(request, 'post.html', {
            'post': p
        })
</pre></div>
<h3 id="testovye-dannye">Test data</h3>
<p>It is very important to profile with data similar to what you have in production. Ideally, take a backup from the
production server, restore it on your local machine, and profile the project against that data. If you profile on an
empty or small database, you will probably get misleading results that don't reflect the real bottlenecks in production,
so they won't point you to the optimizations you actually need.</p>
<h2 id="nagruzochnoe-testirovanie_1">Load testing</h2>
<p>After optimizing, it's a good idea to run load tests to make sure the application's performance matches the real (or
expected) load or your SLA. This kind of testing needs an environment similar to production. Fortunately, cloud
services and automated deployment let you spin up such an environment in minutes.</p>
<p>I recommend <a href="http://locust.io/">Locust</a> for load testing. Its main advantage is that tests are written
as Python code, so you can script complex scenarios that closely mimic the load generated by real users. An example
<code>locustfile.py</code>:</p>
<div class="highlight"><pre><span></span># Locust 1.0+ API: HttpUser, tasks and wait_time replace the older
# HttpLocust, task_set and min_wait/max_wait attributes.
from locust import HttpUser, TaskSet, between, task


class UserBehavior(TaskSet):
    def on_start(self):
        """on_start is called when a simulated user starts this TaskSet, before any task is scheduled."""
        self.login()

    def login(self):
        self.client.post("/login", {"username": "ellen_key", "password": "education"})

    # Weight 2: picked twice as often as profile().
    @task(2)
    def index(self):
        self.client.get("/")

    @task(1)
    def profile(self):
        self.client.get("/profile")


class WebsiteUser(HttpUser):
    tasks = [UserBehavior]
    # Wait 5 to 9 seconds between tasks (formerly min_wait=5000, max_wait=9000 ms).
    wait_time = between(5, 9)
</pre></div>
<p>Locust also provides a web interface for starting tests and viewing the results:</p>
<p><img alt="Locust web interface" src="/media/2017/6/locust-screenshot.png"/></p>
<p>The best part is that you can set Locust up once and reuse it to check performance after every change.
You may even be able to add it to your CI/CD pipeline.</p>
<h2 id="nastroiki-django">Django settings</h2>
<p>In this section, we'll look at Django settings that can affect performance.</p>
<h3 id="ttl-soedineniia-s-bd">Database connection TTL</h3>
<p>By default, Django closes the database connection at the end of each request. You can keep connections open for reuse
by setting a connection TTL with the <a href="https://docs.djangoproject.com/en/1.11/ref/settings/#conn-max-age"><code>CONN_MAX_AGE</code></a> parameter:</p>
<ul>
<li><code>0</code>: close the connection at the end of each request (the default),</li>
<li><code>&gt; 0</code>: TTL in seconds,</li>
<li><code>None</code>: unlimited TTL.</li>
</ul>
<div class="highlight"><pre><span></span>DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydatabase',
        'USER': 'mydatabaseuser',
        'PASSWORD': 'mypassword',
        'HOST': '127.0.0.1',
        'PORT': '5432',
        'CONN_MAX_AGE': 60 * 10,  # 10 minutes
    }
}
</pre></div>
<h3 id="keshirovanie-shablonov">Template caching</h3>
<p>If you are stuck with a Django version older than 1.11, consider enabling template caching. By default, Django
(&lt;1.11) reads and compiles templates every time they are rendered. (Starting with 1.11, Django enables the cached loader
automatically when the template <code>debug</code> option is off and <code>loaders</code> isn't set.) The
<code>django.template.loaders.cached.Loader</code> loader keeps compiled templates in memory. Edit <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span>TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'foo', 'bar'), ],
        'OPTIONS': {
            # ...
            'loaders': [
                ('django.template.loaders.cached.Loader', [
                    'django.template.loaders.filesystem.Loader',
                    'django.template.loaders.app_directories.Loader',
                ]),
            ],
        },
    },
]
</pre></div>
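<p>One caveat: in these Django versions the cached loader doesn't pick up template edits until the server restarts, which
gets in the way during development. A common approach, sketched below under the assumption that <code>DEBUG</code> and
<code>TEMPLATES</code> live in the same <code>settings.py</code>, is to wrap the regular loaders in the cached loader only
when <code>DEBUG</code> is off. The <code>_template_loaders</code> name is just an illustrative local variable.</p>
<div class="highlight"><pre><span></span># settings.py (fragment): use the cached loader only outside development.
DEBUG = False  # normally already defined earlier in settings.py

_template_loaders = [
    'django.template.loaders.filesystem.Loader',
    'django.template.loaders.app_directories.Loader',
]

if not DEBUG:
    # Wrap the regular loaders so compiled templates are kept in memory.
    _template_loaders = [('django.template.loaders.cached.Loader', _template_loaders)]

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        # APP_DIRS is left unset: Django rejects it together with 'loaders'.
        'OPTIONS': {
            'loaders': _template_loaders,
        },
    },
]
</pre></div>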
<h3 id="redis-kak-khranilishche-kesha">Redis as a cache backend</h3>
<p>Django supports several cache backends, such as the database, the file system, and so on. I recommend storing the cache
in Redis, a popular in-memory data store that you are quite likely already using in your project.
To use Redis as a cache backend, you need a third-party package such as <code>django-redis</code>.</p>
<p>Install django-redis with pip:</p>
<div class="highlight"><pre><span></span>pip install django-redis
</pre></div>
<p>Add the cache settings to <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span>CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        }
    }
}
</pre></div>
<p>The full documentation is available <a href="http://niwinz.github.io/django-redis/latest/">here</a>.</p>
<h3 id="khranilishche-sessii">Session storage</h3>
<p>By default, Django stores sessions in the database. Storing them in the cache instead can speed things up. Add the following
to <code>settings.py</code>:</p>
<div class="highlight"><pre><span></span>SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"
</pre></div>
<h3 id="udalenie-nenuzhnykh-middleware">Removing unnecessary middleware</h3>
<p>Check the list of middleware in use (<code>MIDDLEWARE</code> in <code>settings.py</code>) and make sure there is nothing you don't need.
Django calls every middleware for every request it handles, so the overhead can be significant.</p>
<p>If you have a custom middleware that is only needed for some requests, try moving its functionality into a view mixin
or a decorator. That way, the other requests, which don't need this functionality, no longer pay for it.</p>
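<p>As a sketch of that refactoring, here is a hypothetical middleware-style feature (adding an <code>X-View-Time</code>
response header with the view's run time) packaged both as a decorator for function-based views and as a mixin for
class-based views, so only the views that opt in pay for it. The header name and both helpers are illustrative, not part of
Django. If you would rather reuse an existing middleware class as is, Django also provides
<code>django.utils.decorators.decorator_from_middleware</code> for turning it into a per-view decorator.</p>
<div class="highlight"><pre><span></span>import functools
import time


def timed_view(view_func):
    """Decorator for function-based views: adds a hypothetical X-View-Time header."""
    @functools.wraps(view_func)
    def wrapper(request, *args, **kwargs):
        start = time.perf_counter()
        response = view_func(request, *args, **kwargs)
        # Django's HttpResponse supports setting headers via item assignment.
        response['X-View-Time'] = '%.4f' % (time.perf_counter() - start)
        return response
    return wrapper


class TimedViewMixin:
    """Mixin for class-based views; list it before the view class in the bases."""

    def dispatch(self, request, *args, **kwargs):
        start = time.perf_counter()
        response = super().dispatch(request, *args, **kwargs)
        response['X-View-Time'] = '%.4f' % (time.perf_counter() - start)
        return response
</pre></div>
<p>Usage would be <code>@timed_view</code> on a function view, or something like
<code>class GoodsListView(TimedViewMixin, ListView)</code> (a hypothetical view) for a class-based one; requests to all
other views skip this code entirely. For views that return a lazily rendered <code>TemplateResponse</code>, as generic
class-based views do, the measured time excludes template rendering, because rendering happens after the view returns.</p>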