
Bank Merge Feature

James Kennington requested to merge feature-bank-zip into main

This MR introduces several new features around template identifier assignment in the Bank class:

  1. Reserved identifiers: IDs that are removed from the identifier sample space prior to assignment
  2. Identifier reassignment: ignore the current IDs and reassign from a new sample space
  3. Bank merging: combine several banks to produce a new bank
  4. A CLI script for running two workflows, single-output or multi-output (see the Console API Examples section below)

Fixes:

Console API Examples

The full examples are under /path/to/manifold/examples/bank-merge

Single-Output Mode

This example demonstrates how to use the merge function to merge several template banks into one. See the example directory manifold/examples/bank-merge/single-output. Merging these banks in "single output" mode produces one new bank object, "bank_all", with the following properties:

  • Union: bank_all.rectangles contains the union of all input banks' rectangles.
  • No Collision: No two rectangles in bank_all share an identifier.
  • Preservation: The templates in bank_all that come from the first input bank keep the identifiers they had in the first bank.

To run this example (from this directory) without the config file, run the following command:

manifold_cbc_bank_merge --input-banks inp1.h5 inp2.h5 inp3.h5 --output-bank bank_all.h5 --mode single

The table below summarizes the template identifier behavior in this example (note: the IDs here are chosen arbitrarily to illustrate the principles above; in practice they are selected randomly from a much larger range):

Bank File      Template IDs
inp1.h5        0, 1
inp2.h5        0, 1
inp3.h5        0, 1
bank_all.h5    0, 1, 2, 3, 5, 6
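
The three properties above can be illustrated with a small standalone sketch. This is a toy model, not the manifold implementation: banks are plain dicts mapping template ID to a label, and `merge_single`, `max_id`, and `seed` are hypothetical names introduced only for illustration.

```python
import random


def merge_single(banks, max_id, seed=None):
    """Toy model of single-output merging (NOT the manifold API).

    Each bank is a dict mapping template id -> template label. The first
    bank keeps its ids; every other template receives a fresh id drawn
    without replacement from the unused part of range(max_id).
    """
    rng = random.Random(seed)
    first, *others = banks
    merged = dict(first)  # Preservation: first bank keeps its ids
    n_new = sum(len(bank) for bank in others)
    free = sorted(set(range(max_id)) - set(merged))
    if n_new > len(free):
        raise ValueError("sample space too small for a collision-free merge")
    new_ids = iter(rng.sample(free, n_new))  # No Collision: ids are distinct
    for bank in others:
        for template in bank.values():
            merged[next(new_ids)] = template  # Union: every template is kept
    return merged


# Three input banks whose ids collide, as in the table above
inp1 = {0: "a0", 1: "a1"}
inp2 = {0: "b0", 1: "b1"}
inp3 = {0: "c0", 1: "c1"}
bank_all = merge_single([inp1, inp2, inp3], max_id=8, seed=0)
```

Here the merged dict has six entries, IDs 0 and 1 still point at the first bank's templates, and the remaining four templates land on distinct IDs outside {0, 1}.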

Multi-Output Mode

This example demonstrates how to use the merge function to merge multiple template banks into separate files. See the example directory manifold/examples/bank-merge/multi-output. Merging these banks in "multi output" mode produces one new bank object per input bank, with the following properties:

  • Disjoint Union: bank_i.rectangles contains exactly the original rectangles of the i-th input.
  • No Collision: No two rectangles across all output banks share an identifier.
  • Preservation: The templates in the first output bank keep the identifiers they had in the first input bank.

To run this example (from this directory) without the config file, run the following command:

manifold_cbc_bank_merge --input-banks inp1.h5 inp2.h5 inp3.h5 --output-bank out1.h5 out2.h5 out3.h5 --mode multi

The table below summarizes the template identifier behavior in this example (note: the IDs here are chosen arbitrarily to illustrate the principles above; in practice they are selected randomly from a much larger range):

Bank File      Template IDs
inp1.h5        0, 1
inp2.h5        0, 1
inp3.h5        0, 1
out1.h5        0, 1
out2.h5        2, 3
out3.h5        4, 5
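
The multi-output behavior can be sketched in the same spirit. Again this is a toy model under stated assumptions, not the manifold implementation: banks are plain dicts from template ID to label, and `merge_multi`, `max_id`, and `seed` are hypothetical names.

```python
import random


def merge_multi(banks, max_id, seed=None):
    """Toy model of multi-output merging (NOT the manifold API).

    Returns one output dict per input bank. The first output is a copy of
    the first input; the others keep their templates but receive ids that
    are unique across all outputs.
    """
    rng = random.Random(seed)
    first, *others = banks
    outputs = [dict(first)]  # Preservation: first bank is unchanged
    n_new = sum(len(bank) for bank in others)
    free = sorted(set(range(max_id)) - set(first))
    if n_new > len(free):
        raise ValueError("sample space too small for collision-free ids")
    new_ids = iter(rng.sample(free, n_new))  # distinct ids, none from bank 1
    for bank in others:
        # Disjoint Union: output i holds only the templates of input i
        outputs.append({next(new_ids): t for t in bank.values()})
    return outputs


inp1 = {0: "a0", 1: "a1"}
inp2 = {0: "b0", 1: "b1"}
inp3 = {0: "c0", 1: "c1"}
out1, out2, out3 = merge_multi([inp1, inp2, inp3], max_id=6, seed=1)
```

With `max_id=6` the sample space is exactly {0, ..., 5}, so the second and third outputs split {2, 3, 4, 5} between them, matching the shape of the table above (the exact assignment depends on the random draw).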

Python API Examples

Reserved IDs

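Note that in the example below, the sample space for identifiers has been limited to {0, 1, 2}. The limiting step is not shown here; the Merging Banks example further down restricts the space by patching Bank.MAX_TEMPLATE_ID. The _sample_rectangle helper is likewise not shown.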

# Build the bank repeatedly and collect every ID that gets assigned
sampled_ids = set()
for _ in range(10):
    bank = cbc.Bank(rectangles=[_sample_rectangle([1.0, 1.0, 1.0], [1.1, 1.1, 1.1]),
                                _sample_rectangle([1.1, 1.1, 1.1], [1.2, 1.2, 1.2])])
    sampled_ids |= set(bank.ids)
assert sampled_ids == {0, 1, 2}

# Now sample with a "reserved" ID to ensure it is not used
sampled_ids = set()
for _ in range(10):
    bank = cbc.Bank(rectangles=[_sample_rectangle([1.0, 1.0, 1.0], [1.1, 1.1, 1.1]),
                                _sample_rectangle([1.1, 1.1, 1.1], [1.2, 1.2, 1.2])],
                    reserved_ids=[2])
    sampled_ids |= set(bank.ids)
assert sampled_ids == {0, 1}
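
The idea behind reserved IDs, removing them from the sample space before drawing, can be shown with a minimal standalone sketch. This is not how Bank is necessarily implemented; `sample_ids`, `max_id`, and `seed` are hypothetical names used only to illustrate the principle.

```python
import random


def sample_ids(n, max_id, reserved_ids=(), seed=None):
    """Draw n distinct ids from range(max_id), skipping reserved ids.

    Toy illustration only (NOT the manifold API).
    """
    rng = random.Random(seed)
    # Remove reserved ids from the sample space before drawing
    space = sorted(set(range(max_id)) - set(reserved_ids))
    if n > len(space):
        raise ValueError("not enough free ids in the sample space")
    return rng.sample(space, n)  # sampling without replacement


# With id 2 reserved, only 0 and 1 can ever be assigned
seen = set()
for trial in range(50):
    seen |= set(sample_ids(2, max_id=3, reserved_ids=[2], seed=trial))
```

Because the reserved ID is excluded before sampling rather than rejected afterwards, it cannot appear regardless of how many draws are made.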

Merging Banks

from unittest import mock

from manifold.sources import cbc

# Create some sample rectangles (_sample_rectangle is a test helper not shown here)
r1 = _sample_rectangle([1.0, 1.0, 1.0], [1.1, 1.1, 1.1])
r2 = _sample_rectangle([1.1, 1.1, 1.1], [1.2, 1.2, 1.2])
r3 = _sample_rectangle([1.2, 1.2, 1.2], [1.3, 1.3, 1.3])
r4 = _sample_rectangle([0.9, 0.9, 0.9], [1.0, 1.0, 1.0])
r5 = _sample_rectangle([1.4, 1.4, 1.4], [1.5, 1.5, 1.5])
r6 = _sample_rectangle([1.5, 1.5, 1.5], [1.6, 1.6, 1.6])

with mock.patch('manifold.sources.cbc.Bank.MAX_TEMPLATE_ID', 6):
    # The above limits the possible IDs to {0, 1, 2, 3, 4, 5}
    special_bank = cbc.Bank(rectangles=[r1, r2], ids=[0, 1])
    bank2 = cbc.Bank(rectangles=[r3, r4], ids=[2, 3])
    bank3 = cbc.Bank(rectangles=[r5, r6], ids=[4, 5])
    bank_all = special_bank.merge(bank2, bank3)

# Check that all rectangles are present
assert len(bank_all.rectangles) == 6
for r in [r1, r2, r3, r4, r5, r6]:
    assert r in bank_all.rectangles

# Check that original IDs of special bank are preserved
assert bank_all.rectangles[bank_all.ids_map[0]] == r1
assert bank_all.rectangles[bank_all.ids_map[1]] == r2

# Check that original IDs of other banks are not preserved:
# pair each rectangle from bank2 and bank3 with the ID it had before the merge
matches = []
for r, oid in zip([r3, r4, r5, r6], [2, 3, 4, 5]):
    if bank_all.rectangles[bank_all.ids_map[oid]] == r:
        matches.append(oid)
# At least one of the four rectangles should have received a new ID
assert len(matches) < 4
Edited by James Kennington
