Auxiliary classes for image slicing#

Working with astronomical images in FITS (Flexible Image Transport System) format is a fundamental task in modern astronomy. However, one of the most common and persistent sources of programming errors when handling FITS files in Python stems from the different indexing and slicing conventions used across various systems. These differences, rooted in historical computing decisions, continue to confuse astronomers and programmers alike, leading to subtle bugs that can compromise scientific results.

Historical background

The FITS format was developed in the late 1970s by astronomers who needed a standardized way to exchange astronomical data between different institutions and computer systems. The first FITS standard was defined in 1981, at a time when FORTRAN was the dominant programming language in scientific computing. FORTRAN, developed by IBM in the 1950s, used 1-based indexing—arrays started at index 1, not 0. This was a natural choice for scientists and mathematicians who were accustomed to numbering things starting from 1. When the FITS standard was established, it inherited this convention, defining the first pixel of an image as having coordinates (1, 1).

In parallel, the C programming language, developed in the early 1970s, adopted 0-based indexing. This choice was influenced by the way arrays work at the memory level: the index represents an offset from the base address. An index of 0 means “zero offset from the start,” making it the most natural choice for low-level programming. As C became dominant in systems programming and eventually influenced many modern languages (including Python), 0-based indexing became the de facto standard in computer science. However, the astronomical community had already standardized on 1-based indexing through FITS.

Impact on Scientific Work

These indexing errors can have serious consequences:

  1. Wrong Regions Extracted: Analyzing the wrong part of an image leads to incorrect measurements

  2. Misaligned Data: When combining multiple images, index errors cause misalignment

  3. Coordinate Transformation Errors: Converting between pixel and world coordinates fails

  4. Reproducibility Issues: Published coordinates don’t match the extracted regions

  5. Subtle Bugs: Errors may not be obvious, especially in symmetric images or when differences are small

Summary of differences betwen Python-style and FITS-style indexing#

There are important differences between Python-style and FITS-style indices. For example, in the case of 2D images:
  • Python-style indexing:
    • intervals are given for NAXIS2 (first index) and NAXIS1 (second index)
    • indexes start at 0
    • the range number1:number2 runs from number1 to number2-1 when both number1 and number2 are > 0
  • FITS-style indexing (inherited from the FORTRAN programming language):
    • intervals are given for NAXIS1 (first index) and NAXIS2 (second index)
    • indexes start at 1
    • the range number1:number2 runs from number1 to number2, including both number1 and number2

These differences are an important source of errors, which in many cases are difficult to detect. To safeguard when writing code that handles FITS images (which have historically used the FORTRAN convention) and NumPy arrays (which naturally use the Python convention), teareduce defines classes that require explicitly specifying the convention being used. Subsequently, the corresponding instances have attributes that clearly indicate which convention is applied.

Auxiliary classes#

In particular, the following three classes have been defined: SliceRegion1D, SliceRegion2D, and SliceRegion3D. Each of these classes is intended for use with arrays of 1, 2, or 3 dimensions, respectively.

Instances of these classes are defined by specifying two parameters: region and mode.

The region parameter#

In all cases, this parameter can be provided using np.s[], slice() or as a string.

  • SliceRegion1D:

    • np.s_[num1:num2]

    • slice(num1, num2)

    • a string '[num1:num2]'

  • SliceRegion2D:

    • np.s_[num1:num2, num3:num4]

    • (slice(num1, num2), slice(num3, num4))

    • a string '[num1:num2, num3:num4]'

  • SliceRegion3D:

    • np.s_[num1:num2, num3:num4, num5:num6]

    • slice((num1, num2), slice(num3, num4), slice(num5, num6))

    • a string '[num1:num2, num3:num4, num5:num6]'

The mode parameter#

This parameter is a simple string that defines the convention assumed for region. It must take one of the following two values: 'fits' or 'python'.

Example of use#

Let’s illustrate its use for the two-dimensional case. The other two cases are similar.

import numpy as np
import teareduce as tea

Let’s define a region following the FITS convention. We use the three different ways to define the same region.

underscan_region1 = tea.SliceRegion2D(np.s_[6:45, 1:250], mode='fits')
underscan_region2 = tea.SliceRegion2D((slice(6, 45), slice(1, 250)), mode='fits')
underscan_region3 = tea.SliceRegion2D('[6:45, 1:250]', mode='fits')
underscan_region1 == underscan_region2
True
underscan_region1 == underscan_region3
True

Once a SliceRegion2D object is created, we can display the indices of the corresponding region using either the FITS or Python convention. To do this, we just need to use the .fits and .python attributes of these objects.

underscan_region1.fits
(slice(6, 45, None), slice(1, 250, None))
underscan_region1.python
(slice(0, 250, None), slice(5, 45, None))

We can verify that if we use the Python convention to define the same region, the result is the same.

underscan_region4 = tea.SliceRegion2D(np.s_[0:250, 5:45], mode='python')
underscan_region4 == underscan_region1
True