Database Subsetting Tool
SourceForge.net Logo
java.net Member Button: 90px wide

Data flow

architecture

  • Datamodel. Describes tables and associations.
  • Restrictionmodel. Restricts associations and controls the exportation.
  • Extractionmodel. Defines the subject and determines the restriction models.
  • Freedom. The database from which data is exported or deleted.
  • Jail. The database to import data into.

The Entity Graph 

During exportation a graph of entities will be constructed inside the database. For this three tables are required:
  • EntityGraph the graph
  • Entity entity in the graph
  • Dependency dependencies between entities

For each entity in the graph the primary-key and the type (as table name) is stored. The birthday of an entity is the day of the insertion into the graph.


Finding the transitive closure 

The graph will be constructed day after day (step-by-step) starting at day 1:
  • at day 1 all subject entities are inserted into the graph
  • at day n all entities referenced by one of the entities born at day n-1 are inserted into the graph. Dependencies between entities are stored.

The progression set of a day is the set of all tables from which entities are inserted at that day. The progession set of day 1 is the subject set. The process stops if the progression set of a day is empty.

Algorithm
   1. insert subject into graph 
   2. progression-set := { subject } 
   3. while progression-set != {} 
          1. progression-of-today := {}
          2. for all (restricted) associations A between tables T1 and T2, if T1 or T2 in progression-set:
                1. if T2 (T1) depends on T1 (T2):
                        add dependencies from T2(T1)-entities born yesterday
                        to the entities they are associated with according to A
                2. insert all entities of T1 (T2), which are associated according to A with an entity of T2 (T1) born yesterday
                3. if at least one entity of T1 (T2) is inserted then progression-of-today := progression-of-today + {  T1 (T2) }
          3. progression-set := progression-of-today

  Example

      Subject = { A1, A2 }

  
desing




day progression set entities dependencies
1 A A1, A2
2 B, C + B1, B2, C3, C4 A2->C3, A2->C4
3 B, C + C1, C2, B3, B5 + B1->C1, B1->C2
4 B + B6 no change
5
no change no change


SQL-Statements

Line 1, insert subject

INSERT INTO Entity(PK, birthday, type)

  SELECT <subject>.PK, 1, '<subject>'
  FROM <subject>
    WHERE (<condition> AND 103
       NOT EXISTS (SELECT * FROM Entity WHERE <subject>.PK=Entity.PK)
    [fetch first <limit> rows only]


Line 3.2.1, adding dependencies

INSERT INTO Dependency(from_PK, to_PK)
  SELECT <T2>.PK, <T1>.PK
  FROM Entity E1, Entity E2,
       <T2> JOIN <T1> ON <join-condition>
  WHERE E1.type='<T1>' AND E2.type='<T2>'
        AND E1.PK=<T1>.PK
        AND E2.PK=<T2>.PK


Line 3.2.2, adding referenced entities

INSERT INTO Entity(PK, birthday, type)
  SELECT DISTINCT <T2>.PK, <today>, '<T2>'
  FROM <T1> JOIN <T2> ON <join-condition>,
       Entity
  WHERE <T1>.PK=Entity.PK AND Entity.birthday = <yesterday>
       AND NOT EXISTS
          (SELECT * FROM Entity WHERE <T2>.PK=Entity.PK)