Generating diagrams from relational databases

As a first step to understand a database schema people usually study a database schema diagram which depicts the relationships between different kinds of objects that are modeled in the database. Sometimes, though, looking at a database schema diagram is not enough due to the complexity of the relations between data. Then, it is useful to insert some test data and observe how the relationships are actualized on these test data. Unfortunately, I could not find a utility to transform a database filled with data into a diagram that displays the actual relations between the database’s data.

You can think of database schema diagrams as class diagrams in UML. There are plenty of utilities to generate those. But, actual database diagrams are more like object diagrams and I have not found utilities to generate them. So, below is the source code of a utility that generates a dot file, which can be transformed through GraphViz into a nice diagram, like the one below. Most of the specific user parametrization ought to be done to the tableInfo dictionary and to the links array.

Sample data diagram for a blog database

import sqlite3
conn = sqlite3.connect(<path to database>)
def dict_factory(cursor, row):
  d = {}
  for idx,col in enumerate(cursor.description):
    d[col[0]] = row[idx]
  return d
conn.row_factory = dict_factory

tableInfo = {
  'comment' : {'color':'#ff0000', 'spec':'comment: %(id)s'},
  'post'    : {'color':'#00ff00', 'spec':'post: %(title)s'},
  'tag'     : {'color':'#0000ff', 'spec':'tag: %(name)s'},
  'user'    : {'color':'#ffff00', 'spec':'user: %(username)s'},
}

links = [
  [{'table':'comment','keys':['postId']},
   {'table':'post','keys':['id']}],

  [{'table':'post','keys':['authorId']},
   {'table':'user','keys':['id']}],

  [{'table':'post','keys':['id']},
   {'table':'posttag','keys':['postId']},
   {'table':'posttag','keys':['tagId']},
   {'table':'tag','keys':['id']}],
]

def getDot(table1, row1, table2, row2, tableInfo):
  name1 = tableInfo[table1]['spec'] % row1
  name2 = tableInfo[table2]['spec'] % row2
  name1 = '"%s"' % name1;
  name2 = '"%s"' % name2;
  ret = '%s--%s\n' % (name1,name2)
  ret += '%s[fillcolor="%s"]\n' % (name1,tableInfo[table1]['color'])
  ret += '%s[fillcolor="%s"]' % (name2,tableInfo[table2]['color'])
  return ret

def isMultiEqual(row1, row2, fields1, fields2):
  for i in xrange(len(fields1)):
    if (row1[fields1[i]] != row2[fields2[i]]):
      return False
  return True

def getMatches(linkParticipant1, linkParticipant2):
  cursor = conn.cursor()
  cursor.execute('select * from %s' % linkParticipant1['table'])
  table1 = cursor.fetchall()
  cursor.execute('select * from %s' % linkParticipant2['table'])
  table2 = cursor.fetchall()

  matches = []
  fields1 = linkParticipant1['keys']
  fields2 = linkParticipant2['keys']
  for row1 in table1:
    for row2 in table2:
      if (isMultiEqual(row1, row2, fields1, fields2)):
        matches.append([row1, row2])
  return matches

print 'graph{'
print 'node [style="filled"]'
for link in links:
  if (len(link) == 2):
    for match in getMatches(link[0], link[1]):
      print getDot(link[0]['table'], match[0], link[1]['table'], match[1], tableInfo)
  elif (len(link) == 4):
    matches1 = getMatches(link[0], link[1])
    matches2 = getMatches(link[2], link[3])
    fields = link[1]['keys']
    fields.extend(link[2]['keys'])
    for match1 in matches1:
      for match2 in matches2:
        if (isMultiEqual(match1[1], match2[0], fields, fields)):
          print getDot(link[0]['table'], match1[0], link[3]['table'], match2[1], tableInfo)

print '}'

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: