Object

Merging of OIFits files

OIFits format

This format carries observation data. It contains two kinds of data: observation data and referential data.

Referential data contains descriptions of:

  • observed objects (table OI_TARGET),
  • type of light observed (table OI_WAVELENGTH),
  • stations used for the observation (table OI_ARRAY).

Observation data (tables referred to generically as OI_DATA) contains the observed values, with links to the referential tables.

These data come in several kinds: tables OI_VIS, OI_VIS2 and OI_T3.

Merging issues

The delicate point of this operation is the ambiguity in the way data are linked to the referential tables.

There is no absolute referential for the target name, the type of light or the station index. All are relative to the partial referential of each file.

To perform the merge, a good overall approach is to create the result file by copying the first file and then integrating each part of the second file into it.

Process of OI_TARGET

Data point to a target by an id (column TARGET_ID) that is relative to the file, so the id cannot be used to find a common target in the two files.

Merging this part requires building a coherent final referential by changing some ids and propagating those changes to the data part.

It may be possible to use the target name (column TARGET), but with no absolute guarantee, because it is a weak key with no normalisation.

A normalisation could be applied, for example lowercasing and removing insignificant characters (space, tab, etc.).
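As an illustration, such a normalisation could be sketched as follows. The function name normalize_target_name is hypothetical, and the rule shown (lowercase, then drop every whitespace character) is only one possible reading of "insignificant characters"; it is not a rule taken from the OIFITS standard.

```python
def normalize_target_name(name: str) -> str:
    """Return a comparison key for a TARGET name (hypothetical rule).

    Assumption: only whitespace (spaces, tabs, newlines) is treated as
    insignificant; other separators would need an explicit decision.
    """
    # str.split() with no argument splits on any run of whitespace,
    # so joining the pieces removes all of it.
    return "".join(name.split()).lower()
```

With this rule, "HD 1234" and " hd1234\t" produce the same key, while "HD-1234" does not.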

The procedure may be:

  • copy the OI_TARGET content of the first file into the result file to start the table (this partial referential is coherent on its own, so it serves as the base of the future referential)
  • browse all its lines to build a map (name_id1) giving the TARGET_ID of each name (TARGET), and to get the greatest id (max_id) in this table
  • create a map (id_map) linking the old ids of the second file to the future ids
  • for each line of OI_TARGET of the second file:
    • search whether the target is already present in the new OI_TARGET: same or similar name (TARGET), or position, ...
    • if the search succeeds:
      • add to id_map a link from the old target id in the second file to the id of the target found in the new table (via name_id1)
    • else:
      • increment max_id to get a new unused id
      • create a new line in the new table by copying the line of the second file, with the new id as TARGET_ID
      • add to id_map a link from the old target id in the second file to this new id
  • return the map linking old ids to new ids (target_ids_map)
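The steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: each OI_TARGET row is modelled as a plain dict holding only TARGET_ID and TARGET, the function name merge_targets is hypothetical, and the "same or similar name" test is reduced to a whitespace-insensitive, case-insensitive comparison (matching by position is left out).

```python
def merge_targets(targets1, targets2):
    """Merge two OI_TARGET row lists; return (merged_rows, id_map).

    id_map links each TARGET_ID of the second file to its id in the
    merged table. Rows are dicts with 'TARGET_ID' and 'TARGET' keys
    (a simplification of the real table).
    """
    def key(name):
        # Assumed similarity rule: ignore case and whitespace.
        return "".join(name.split()).lower()

    # Start from a copy of the first file's table.
    merged = [dict(row) for row in targets1]
    name_id1 = {key(row["TARGET"]): row["TARGET_ID"] for row in merged}
    max_id = max((row["TARGET_ID"] for row in merged), default=0)

    id_map = {}
    for row in targets2:
        k = key(row["TARGET"])
        if k in name_id1:
            # Target already known: reuse the id from the merged table.
            id_map[row["TARGET_ID"]] = name_id1[k]
        else:
            # New target: allocate an unused id and copy the row.
            max_id += 1
            merged.append(dict(row, TARGET_ID=max_id))
            name_id1[k] = max_id
            id_map[row["TARGET_ID"]] = max_id
    return merged, id_map
```

The returned id_map is what the data tables of the second file would use to rewrite their TARGET_ID column before being appended to the result.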

[Could the same object, positioned in different coordinate systems, appear on several lines in OI_TARGET?] -- GillesDuvert - 24 May 2018 : There are no "different coordinate systems" permitted, only the RA,DEC coordinate system. The values however may differ, for the same star, if the EPOCH is different. I suggest treating an EPOCH different from 2000.0 as an error (even if the standard says it is only recommended) and, in that rare case, just treating this as two objects. Until we have a big problem, which may never happen.

Process of OI_WAVELENGTH

For this part the problem lies in the name (KEYWORD_INSNAME), as there is no id.

Some processing could be applied to the name to determine whether a wavelength table is already present in the new referential.

The process could be the same as for OI_TARGET, returning a map (wlnames_map) that links each old name from file 2 to its name in the new referential.

[Should the same INSNAME give several lines in the new table if the wave band is different?] -- GillesDuvert - 24 May 2018 : the only case I see that will be frequent is the following:

  • An instrument produces OIFITS with always the same INSNAME.
  • However, it regularly recalibrates its spectrograph, and both EFF_WAVE and EFF_BAND change with time. Eventually the number of wavelengths may change.
  • This should be handled as several different instruments (INSNAME_date1, INSNAME_date2, etc.) if the differences are significant. A different number of wavelengths clearly means a different INSNAME. The EFF_BAND, which is notoriously badly measured, should be ignored in these comparisons. A robust EFF_BAND estimate is just the increment between two successive EFF_WAVE values. Then a difference in EFF_WAVE of 1/3 of EFF_BAND is still, IMHO, compatible with having only one INSNAME, not 2.
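The criterion suggested in this comment can be sketched as below. The function names and the handling of edge cases are assumptions: the band of the last channel reuses the previous increment, and tables with fewer than two channels are only considered compatible when identical. EFF_WAVE values are taken as plain lists of floats.

```python
def robust_bands(eff_wave):
    """Estimate a band per channel as the increment between successive
    EFF_WAVE values (the last channel reuses the previous increment)."""
    steps = [abs(b - a) for a, b in zip(eff_wave, eff_wave[1:])]
    return steps + steps[-1:]


def same_insname(eff_wave1, eff_wave2, tolerance=1 / 3):
    """Return True if both wavelength tables can share one INSNAME.

    A different number of channels means different instruments.
    Otherwise every channel must differ by at most `tolerance` times
    the robust band estimate. EFF_BAND itself is deliberately ignored.
    """
    if len(eff_wave1) != len(eff_wave2):
        return False
    if len(eff_wave1) < 2:
        # Assumption: no band estimate is possible, require equality.
        return eff_wave1 == eff_wave2
    bands = robust_bands(eff_wave1)
    return all(
        abs(w1 - w2) <= tolerance * band
        for w1, w2, band in zip(eff_wave1, eff_wave2, bands)
    )
```

For three channels spaced by 0.1 µm, a shift of 0.02 µm keeps a single INSNAME, while a shift of 0.05 µm would call for two.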

Process of OI_ARRAY

This one is more complicated, as the link is made through several fields with some processing. OI_DATA points to OI_ARRAY by the name of the installation (OI_ARRAY) and the id of the station (STA_INDEX). However, the station id is a single number in OI_ARRAY and a combination of several in OI_DATA.

-- GillesDuvert - 24 May 2018 : That is not a problem as, for each individual OIFITS file, there is a bijection between the STA_INDEX and the indexes in the OI_ARRAY
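To illustrate the bijection mentioned in the comment, the sketch below resolves the STA_INDEX combination of one data row against the rows of its OI_ARRAY table. The function name resolve_stations is hypothetical, and rows are modelled as dicts holding only the STA_INDEX and STA_NAME columns.

```python
def resolve_stations(array_rows, sta_indexes):
    """Map the STA_INDEX combination of a data row to station names.

    array_rows: OI_ARRAY rows as dicts with 'STA_INDEX' and 'STA_NAME'.
    sta_indexes: tuple of indexes from a data row (2 for a baseline,
    3 for a triplet).
    """
    by_index = {row["STA_INDEX"]: row["STA_NAME"] for row in array_rows}
    if len(by_index) != len(array_rows):
        # The lookup is only a bijection if each index appears once.
        raise ValueError("duplicate STA_INDEX in OI_ARRAY")
    return tuple(by_index[i] for i in sta_indexes)
```

Because the lookup is local to one file, a merge that keeps each OI_ARRAY table intact does not need to renumber STA_INDEX values.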

Command line

first argument: the command (merge, list, filter, normalize, ...)

-o: output, name of result file

list of input files
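The argument layout described above could be prototyped as follows. This is only a sketch of the described interface, not the actual tool: the program name oifits-tool and the function build_parser are hypothetical, and the command list is limited to the four names given explicitly (the original list is open-ended).

```python
import argparse


def build_parser():
    """Build a parser for: <command> [-o OUTPUT] <input files...>."""
    parser = argparse.ArgumentParser(prog="oifits-tool")  # hypothetical name
    parser.add_argument(
        "command",
        choices=["merge", "list", "filter", "normalize"],
        help="operation to perform",
    )
    parser.add_argument("-o", dest="output", help="name of the result file")
    parser.add_argument("inputs", nargs="+", help="input files")
    return parser
```

Calling build_parser().parse_args(["merge", "-o", "merged.fits", "a.fits", "b.fits"]) yields the command, the output name and the two input files as separate attributes.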

Development

Sprint 1

  • utility method which takes 2 OIFitsFile structures and returns a new one created from scratch ... DONE
  • merge of 2 files only if they handle the same target (error otherwise) ... DONE
  • OI_WAVELENGTH tables are not merged; those having the same name get an index added as a suffix: _<%idx> ... DONE
  • OI_ARRAY tables are not merged; those having the same name get an index added as a suffix: _<%idx> ... DONE
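The suffixing rule for tables that share a name could look like the sketch below. The exact numbering used by the real implementation is not stated here, so the scheme shown (first occurrence unchanged, later ones numbered from 1, skipping any name already taken) is an assumption, as is the function name unique_names.

```python
def unique_names(names):
    """Return names made unique by appending an index suffix.

    Assumed scheme: the first occurrence keeps its name, later
    duplicates get _1, _2, ... and never collide with a name
    that is already in use.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name
        idx = 0
        while candidate in seen:
            idx += 1
            candidate = f"{name}_{idx}"
        seen.add(candidate)
        result.append(candidate)
    return result
```

The same helper would apply to INSNAME values of OI_WAVELENGTH tables and to the names of OI_ARRAY tables, provided the data tables that point to a renamed table are updated accordingly.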

Sprint 2 : SPLIT AND MERGE (June 2018)

Run 1 :

Goals (discussed on 22nd May)

  • setup a first command line program to:
    • list content of given files ... DONE
      • reuse OIFitsViewer code (TSV)
    • convert an input file to an output file (clean / normalize)
      • (next option could be to give specific version)
    • possibility to merge more than 2 files... DONE

Result (discussed on 29th May)

  • the first CLI covers the initial goals
  • current limitation of merge: only a single target is supported

Run 2 :

Goals (discussed on 29th May)

  • Support parameters to filter multiple files before merging
    • by insname
    • by target
  • Validate before saving (provide an option?)
  • Support also OIFITS V2 as output
    • how to merge header (mandatory vs specific to the content of one or more files), OI_FLUX (should be ok, to verify), OI_INSPOL (postponed), OI_CORR (may be like OI_WAVELENGTH/OI_ARRAY)
    • implement tests

Result (to be discussed on 5th June)

  • Support parameters to filter multiple files before merging
    • by insname: DONE but limited to exact values
    • by target: TODO but medium priority
  • Validate before saving (provide an option?)
  • Support also OIFITS V2 as output
    • how to merge header (mandatory vs specific to the content of one or more files), OI_FLUX (should be ok, to verify), OI_INSPOL (postponed), OI_CORR (may be like OI_WAVELENGTH/OI_ARRAY)
    • implement tests

Notes:

  • we will have to figure out how to handle GRAVITY dual-feed data, which can contain 2 targets
    • the goal is to split fringe tracker data from science data

Run 3 :

Goals (discussed on 5th June)

  • technical refactoring
    • define objects to handle data selection ... related to granules
    • add more information messages during the data processing
  • try to handle dual-feed, GRAVITY-like data (only one target per OIData table) to split each target's data using the insname filter
    • i.e. do not throw an error when analysing data for merge

Run 4 :

Result (discussed on 15th June)

  • still thinking about the refactoring

Goals (discussed on 15th June, next expected on 21st June)

  • Main action:
    • Enhance the OIE's GUI so that we can save a selection (probably done in the current bottom-left tree). The instrument level must be added so we can choose a specific one.
  • technical refactoring continues
  • TO CONTINUE? try to handle dual-feed, GRAVITY-like data (only one target per OIData table) to split each target's data using the insname filter
    • i.e. do not throw an error when analysing data for merge

Run 5 :

Result (discussed on 21st June)

  • still thinking about the refactoring, but it seems to be more obvious

Questions:

  • Can we get two OIData tables with the same OI_WAVELENGTH but distinct OI_ARRAY? YES!
  • Should we always be able to provide the original OIFITS file (highlighted) for a given subset? NO

Notes:

  • There is no routine that gathers close OI_WAVELENGTH tables
  • We can also think about merging multiple OI_ARRAY tables, which would solve the first question above
  • advanced filters (formula criteria, ...) could come in the future

Goals

  • Main action:
    • Enhance the OIE's GUI so that we can save a selection (probably done in the current bottom-left tree). The instrument level must be added so we can choose a specific one.
  • technical refactoring continues
  • TO CONTINUE? try to handle dual-feed, GRAVITY-like data (only one target per OIData table) to split each target's data using the insname filter
    • i.e. do not throw an error when analysing data for merge

Future Sprints...

  • merge, on criteria yet to be precisely established, the referential data of OI_TARGET, OI_WAVELENGTH, OI_ARRAY, dates (nightIds)
    • Handle wildcards for given insname / target names?
    • do not repeat duplicated tables (OI_WAVELENGTH, OI_ARRAY)?
      • yes: by default, do not repeat tables only when their values are exactly the same; the user can also request deduplication of close data (close targets, close backend configuration)
      • 1/10 for delta lambda?
      • what criterion for OI_ARRAY?
      • what criterion for OI_TARGET?
    • do not support target selection by position.
  • provide such processing in the OIFitsExplorer's GUI
Topic revision: r22 - 2018-06-21 - GuillaumeMella
 