Skip to content
Snippets Groups Projects

Bank Merge Feature

Merged James Kennington requested to merge feature-bank-zip into main
2 unresolved threads

This MR introduces a few new features to the Bank class related to template identifier assignment.

  1. Reserved identifiers, that are removed from the identifier sample space prior to assignment
  2. Reassign identifiers, to ignore current ids and reassign from a new sample space
  3. Merge banks together to produce a new bank
  4. CLI script for running two workflows: single-output or multi-output (see below section)

Fixes:

Console API Examples

See the examples explicitly under /path/to/manifold/examples/bank-merge

Single-Output Mode

This example demonstrates how to use the merge function to merge two template banks. See the example directory manifold/examples/bank-merge/single-output. Merging these banks in the "single output" mode will produce a new bank object "bank_all" with the following properties:

  • Union: bank_all.rectangles will contain the union of the all input banks' rectangles.
  • No Collision: No two rectangles in bank_all will have the same identifier.
  • Preservation: The subset of templates in bank_all coming from the first input bank will have the same identifiers as they did in the first bank.

To run this example (from this directory) without the config file, run the following command:

manifold_cbc_bank_merge --input-banks inp1.h5 inp2.h5 inp3.h5 --output-bank bank_all.h5 --mode single

The below table summarizes the template identifier behavior in this example (note, the ids are chosen here arbitrarily to illustrate the above principles, but in practice will be randomly selected from a much larger range):

Bank File Template IDs
inp1.h5 0, 1
inp2.h5 0, 1
inp3.h5 0, 1
bank_all.h5 0, 1, 2, 3, 5, 6

Multi-Output Mode

This example demonstrates how to use the merge function to merge multiple template banks into separate files. See the example directory manifold/examples/bank-merge/multi-output. Merging these banks in the "multi output" mode will produce one new bank object per input bank object with the following properties:

  • Disjoint Union: bank_i.rectangles will contain the original rectangles of the i-th input.
  • No Collision: No two rectangles in any output banks will have the same identifier.
  • Preservation: The templates in the first output bank will have the same identifiers as they did in the first bank.

To run this example (from this directory) without the config file, run the following command:

manifold_cbc_bank_merge --input-banks inp1.h5 inp2.h5 inp3.h5 --output-bank out1.h5 out2.h5 out3.h5 --mode multi

The below table summarizes the template identifier behavior in this example (note, the ids are chosen here arbitrarily to illustrate the above principles, but in practice will be randomly selected from a much larger range):

Bank File Template IDs
inp1.h5 0, 1
inp2.h5 0, 1
inp3.h5 0, 1
out1.h5 0, 1
out2.h5 2, 3
out3.h5 4, 5

Python API Examples

Reserved IDs

Note that in the example below, the sample space for identifiers has been limited to {0, 1, 2}

sampled_ids = set()
for i in range(10):
    bank = cbc.Bank(rectangles=[_sample_rectangle([1.0, 1.0, 1.0], [1.1, 1.1, 1.1]),
                                _sample_rectangle([1.1, 1.1, 1.1], [1.2, 1.2, 1.2]), ])
    sampled_ids = sampled_ids.union(set(bank.ids))
assert sampled_ids == {0, 1, 2}

# Now sample with a "reserved" ID to ensure it is not used
sampled_ids = set()
for i in range(10):
    bank = cbc.Bank(rectangles=[_sample_rectangle([1.0, 1.0, 1.0], [1.1, 1.1, 1.1]),
                                _sample_rectangle([1.1, 1.1, 1.1], [1.2, 1.2, 1.2]), ],
                    reserved_ids=[2])
    sampled_ids = sampled_ids.union(set(bank.ids))
assert sampled_ids == {0, 1}

Merging Banks

# Create some sample rectangles
r1 = _sample_rectangle([1.0, 1.0, 1.0], [1.1, 1.1, 1.1])
r2 = _sample_rectangle([1.1, 1.1, 1.1], [1.2, 1.2, 1.2])
r3 = _sample_rectangle([1.2, 1.2, 1.2], [1.3, 1.3, 1.3])
r4 = _sample_rectangle([0.9, 0.9, 0.9], [1.0, 1.0, 1.0])
r5 = _sample_rectangle([1.4, 1.4, 1.4], [1.5, 1.5, 1.5])
r6 = _sample_rectangle([1.5, 1.5, 1.5], [1.6, 1.6, 1.6])

with mock.patch('manifold.sources.cbc.Bank.MAX_TEMPLATE_ID', 6):
    # The above limits the possible IDs to {0, 1, 2, 3, 4, 5}
    special_bank = cbc.Bank(rectangles=[r1, r2], ids=[0, 1])
    bank2 = cbc.Bank(rectangles=[r3, r4], ids=[2, 3])
    bank3 = cbc.Bank(rectangles=[r5, r6], ids=[4, 5])
    bank_all = special_bank.merge(bank2, bank3)

# Check that all rectangles are present
assert len(bank_all.rectangles) == 6
for r in [r1, r2, r3, r4, r5, r6]:
    assert r in bank_all.rectangles

# Check that original IDs of special bank are preserved
assert bank_all.rectangles[bank_all.ids_map[0]] == r1
assert bank_all.rectangles[bank_all.ids_map[1]] == r2

# Check that original IDs of other banks are not preserved
matches = []
for r, oid in zip():
    if bank_all.rectangles[bank_all.ids_map[oid]] == r:
        matches.append(oid)
assert len(matches) < 4
Edited by James Kennington

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • added 1 commit

    Compare with previous version

  • James Kennington changed the description

    changed the description

  • requested review from @chad-hanna and @shio.sakon

  • added 1 commit

    Compare with previous version

    • @james.kennington I think this looks good. I only really looked at the reserved ids and merge features. I am curious in practice how we will ensure that the constraints are same in banks that are to be merged. Generally they won't be if it is e.g., an SSM and All Sky bank. What are your thoughts on that? Does the user just have to update the constraints on each bank to be the superset first?

    • I was 50/50 when adding that part. Figured it might be good practice to have some specific condition for constraints. Also, I just realized that the constraints attr of the Bank is actually just a dict of coordinate -> (min, max), (I thought it was the functions themselves). It would be possible for the merge function to effectively "union" the boundaries of constraints automatically. For instance:

      new_constraints = {coord: (min([b.constraints[coord] for b in banks]), 
                                 max([b.constraints[coord] for b in banks]))
                         for coord in self.constraints}

      Thoughts?

    • Please register or sign in to reply
    • @james.kennington We can discuss this on zoom, but my questions are the following:

      • Can the assign_ids function still be used as it currently is? I.e., that function randomly assigns template ids to templates after templates are generated since we need template ids. Does the new assign_ids function still do that when a bank is being generated, not only when we give it the "reserved" bank and "reassign" bank?
      • In the example you have in the overview of this MR, the max template id value is assumed to be equal to the number of templates. (assert len(bank_all.rectangles) == 6) However, this line would mean that the max template id value is not necessarily equal to the number of templates in a bank, right? Does the max template id value have to be equal to the number of templates in a bank?
      • I might be misunderstanding how we want to do mass models and pastros so I would like @chad-hanna 's opinion here. I thought what needs to happen is that we have the AllSky bank (= "reserved" bank) and the online SSM bank (= "reassign" bank), assign mass models and pastros to them given they have different temp ids in both banks, but in searches we run the banks separately. Then, my question would be: will the two individual banks that go into the merged bank still be distinguishable even after assigning different template IDs?
    • Good questions, will answer here and we can discuss further if helpful:

      1. All default behavior has been preserved. Yes, the assign_ids function can still be used (and is still used on Bank creation) the same way as it was before, given the same arguments.
      2. Enforcing MAX_ID = len(templates) was only done for testing, to ensure that certain test criteria were attributable to proper behavior and not random sampling on a larger space. Specifically, this was useful in making sure the non-reserved ids get reassigned.
      3. Great question, "will input banks be distinguishable after merge." The answer: we can choose either. Currently, no. It's easy to add a feature that "keeps track" of the source bank of each template. Under the hood, this amounts to two different operations: union, or disjoint-union. The latter generates a bundle, with an "index" (projection map) to track the source set. We could easily achieve this by adding a bundle attr to the Bank class, something like the below:
      class Bank:
      
          def __init__(..., source_map: dict = None):
              ...
              self.source_map = source_map
      
          def merge(self, *others, ...):
              ...
              source_map = {}
              for other in others:
                  ...
                  source_map.update({r: other for r in other.rectangles})
      
              ...
              return Bank(..., source_map=source_map)

      Note, we could also invert the source map and keep track of source equivalence classes (perhaps more efficient depending on downstream usage). Example:

      source_map[other] = other.rectangles 
    • @james.kennington

      1. Sounds good!
      2. Oh ok!
      3. Yeah, the bundle would be great, I think. 4.I'd want @chad-hanna 's opinion, but I think we'd want to be able to run the two banks separately, after assigning unique template ids to all the templates in the two banks.
    • Please register or sign in to reply
  • added 1 commit

    Compare with previous version

  • added 1 commit

    • 5091ab97 - add script for profiling existing bank

    Compare with previous version

  • James Kennington added 6 commits

    added 6 commits

    Compare with previous version

  • James Kennington changed the description

    changed the description

  • added 1 commit

    Compare with previous version

  • James Kennington added 2 commits

    added 2 commits

    • 10e7ecff - fix constraint reconciliation and add test
    • 3d4f0b6e - wip change to example multi

    Compare with previous version

  • @shio.sakon confirmed the CLI worked as expected, all unit tests pass

  • James Kennington mentioned in commit fa5a5178

    mentioned in commit fa5a5178

  • Wanting Niu mentioned in epic gstlal&179

    mentioned in epic gstlal&179

Please register or sign in to reply
Loading