Django

Code

Ticket #3454 (closed: fixed)

Opened 2 years ago

Last modified 2 years ago

sqlite backend is using row_factory when it should be using text_factory

Reported by: Brian Harring <ferringb@gmail.com> Assigned to: adrian
Milestone: Component: Database layer (models, ORM)
Version: SVN Keywords: unicode-branch
Cc: Triage Stage: Accepted
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 1

Description

currently, sqlite has

def utf8rowFactory(cursor, row):
  def utf8(s):
    if type(s) == unicode: return s.encode("utf-8")
    return s
  return [utf8(r) for r in row]

for row_factory; problem here is that it's rebuilding each record regardless of whether or not the utf8 conversion is required. doing

Database.text_factory = lambda s:s.decode("utf-8")

limits the conversion to just TEXT objects.

This is a bit faster; that said, I'm wondering why the forced conversion- sqlite stores data in utf8, if

Database.text_factory = str

ware set, the whole decoding/encoding would be bypassed, and the native encoding (utf8) would be passed back.

In terms of performance, using Database.text_factory = lambda s:s.decode("utf-8") gains are dependant upon the column types; greater # of non-text fields, greater the gain.

Real gain is via turning off the encode/decode and using str directly (underlying utf8); same gain in terms of avoiding extra inspection, but avoids all the extra work.

Only downside to either change I can see is that raw sql queries would return str instead of sqlites unicode. Not really sure if this is an actual issue however (don't see any other such limitation in the backends).

Patch is attached for the encode/decode variant; unless there are good reasons, would just bypass the encoding/decoding entirely.

Attachments

patch (1.1 kB) - added by Brian Harring <ferringb@gmail.com> on 02/07/07 21:37:28.
use text_factory instead of row_factory for unicode conversion

Change History

02/07/07 21:37:28 changed by Brian Harring <ferringb@gmail.com>

  • attachment patch added.

use text_factory instead of row_factory for unicode conversion

02/10/07 04:37:27 changed by Simon G. <dev@simon.net.nz>

  • needs_better_patch changed.
  • stage changed from Unreviewed to Design decision needed.
  • needs_tests changed.
  • needs_docs changed.

02/10/07 05:02:40 changed by mtredinnick

  • needs_better_patch set to 1.
  • stage changed from Design decision needed to Accepted.

I don't think the use of decode() in this patch is correct. s.decode('utf-8') is for converting a UTF-8 encoded string into a unicode object, but I suspect you want to be converting unicode objects into UTF-8 for storage purposes (that was what we were doing previously).

We cannot bypass the encoding step entirely, because there is no guarantee at all that internal strings will be UTF-8 encoded streams of bytes (or immediately convertible to such by Python). We are going to move to unicode everywhere outside of the internal/external interfaces (which will be conversion points), but that hasn't happened yet.

Other than that, the motivation behind the patch looks like a good find. Thanks.

03/05/07 20:19:00 changed by ferringb@gmail.com

Yeah, that ought to be s.encode('utf-8').

Would suggest sticking a note in the sqlite backend code that when unicode is default, to just disable these conversions since sqlite already spits back unicode.

06/16/07 20:27:03 changed by mtredinnick

  • keywords set to unicode-branch.

This ticket has become a non-issue in the unicode branch (no converter is needed at all). Will close it when that branch is merged into trunk.

07/04/07 07:11:05 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [5609]) Merged Unicode branch into trunk (r4952:5608). This should be fully backwards compatible for all practical purposes.

Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702


Add/Change #3454 (sqlite backend is using row_factory when it should be using text_factory)




Change Properties
Action