Reading structures of an Excel workbook

Posted6 April 2021

Updated27 August 2025

ByGathering Tools

Overview

This article attempts to provide the GT developer with an outline of how to build a GT questionnaire design workbook from a raw Excel workbook submitted by the functional designer (person or organisation issuing the request to build a GT questionnaire).

The raw Excel workbook should comply with the recommendations made in the project sheet. The cell texts, row heights, column widths, formatting, conditional formatting and finally, the formulas should be operational in Excel.

It often happens that, for existing collection processes using Excel workbooks, the switch to a GT questionnaire standardises certain workings of the workbook or data flows: the workbook can be modified with respect to the initial workbook to improve, clarify or simplify the input, data and/or flows.

The next step is to go through the information displayed in the raw Excel workbook to identify the underlying structures and prepare the corresponding tables in the client database.

However, even before going through the data in the workbook, it is necessary to specify the elements questioned in the questionnaires sent out: the entities.

Finally, reading this article assumes that you have assimilated the notions of compartments as well as the launch and integration mechanisms.

Determining the questionnaire’s query entities

Questionnaire entities are the subjects of the questionnaire.

The entities allow to globally distinguish the questionnaires sent from each other.

CAUTION: generally, the entities are not the correspondents or even a data linked only to the correspondent.

The questionnaires are sent to correspondents by email. However, in many cases, the correspondent is not asked to provide information about himself but about an entity (a department, a subsidiary, an agency, etc.). The correspondent provides this information ‘on behalf’ of that entity.

When the correspondent is asked personal questions, in this case, the entity and the correspondent are confused, but this is not usually the case and there is very often an underlying entity to distinguish the subjects of the query.

Entities are represented by the items/modalities of one or more axes: the diffusion axes.

The correct determination of entities can be done by asking the question: What if my respondent changes?

Or, more precisely, if the data is archived, what is the right information to store in order to minimise the impact of a change of correspondent on the data history? If only the correspondent’s email address is stored in the history, the change of correspondent will have major impacts. If the entity identifier is stored in the history, the change of correspondent in the assignment relationship alone can be carried out transparently for the history.

The choice of entities determines the follow-up work of the campaign: the follow-up managers visualise the last response per entity, validate a response per entity and re-contact the entities.

Data integration will also be facilitated by the correct determination of the entities (by default, the integration concerns the last response per entity or the validated response per entity).

Corrective responses from correspondents (successive responses correcting previous responses) will be all the more easily managed if the distribution axes have been correctly defined. This is all the more true in the case of additional validation axes.

Additional validation axes

In some cases, it is not the sender who precisely determines the entity. The correspondent may respond for several entities from a list or the correspondent is completely free to provide information on a particular subject.

The “response subjects” (from the additional validation axes) are differentiated from each other with the additional validation axis(es): these are axes whose values are usually entered directly or indirectly (via formulas) by the correspondent and which allow the response subjects to be differentiated from each other.

However, in all cases, one or more distribution axes are needed: an extreme distribution axis might contain only one value for all correspondents.

Although it is always preferable to identify one or more axes to differentiate responses from each other, at least to allow corrective responses from the correspondent, in some cases, it is not possible to find an “identifier” to distinguish responses from each other. The transmission date information (e.g. a control containing the current date and time) can then be used as a last resort, this information will, however, not allow corrective responses from the correspondent (each correspondent response will be the one and only response for a new entity).

Path of the Excel workbook

The operation consists of scanning the Excel workbook several times to identify the variable elements and attach them to data structures: the compartments of the questionnaire and the tables of the client database.

For an experienced GT developer, the scans will be done together.

From a practical point of view, it may be useful to construct two tables: one to draw/sketch the data template and the other to record the changes to be implemented in the questionnaire and the additional questions to be asked to the functional designer (when going through the Excel workbook, there are usually questions about how the questionnaire works and what the data presented means or how it is linked).

For certain information in the workbook whose components are obvious, these can be created in the Excel workbook in the first few runs.

Path 1: Determining the variable elements of the questionnaire

In the life of a questionnaire, its data is only modified at three points:

– during design

– during personalisation (pre-filling of the questionnaire on the server before sending)

– during data entry by the correspondent

These three stages must be constantly borne in mind when designing the questionnaire.

It is generally not advisable to initialise data in the questionnaire during design (except in the specific case of transposition): it is more flexible to define the initial values during customisation.

This first path of the workbook consists of locating the components/information displayed and entered, as well as all the variable information of the questionnaire in the following ways

variables from one entity to another
variables from one correspondent to another
variable from one survey session/campaign to another
common to different questionnaires and which we want to be able to reuse from one questionnaire to another

Items that may be common to different questionnaires include entities and correspondents if these are used as the basis for sending multiple questionnaires.

The following elements should not be omitted

– direct variable elements: labels, numbers displayed, attachments

– dependencies: drop-down list contents, numbers used in the calculation of a formula, reference lists used to display certain elements to be chosen, lists or numbers used in the context of validation constraints/checking the consistency of data entered

Path 2: Outline of the data template

This step essentially consists of determining the key axes for each level of information in the questionnaire.

An outline of the logical data template will usually be developed in conjunction with the determination of the keys for the data: an attempt will be made to relate each of the information displayed or entered in the questionnaire to one of the keys determined in this step.

The first key fields for the questionnaire are the diffiusion axes defining the entities.

The additional validation axes, if they exist, are obviously to be included for the integrated data. The diffusion axes will remain key fields for the information to be diffused.

The variable data identified by the previous step must now be attached to the different keys of the questionnaire.

For each data item displayed, the question of its meaning must be asked to link it to an already identified key of the questionnaire or consider the development of a new key.

New keys must be identified when an implicit or explicit dimension is added to the questionnaire (multiplication of a sheet, list of rows or cross-table).

The other variable information is then attached to one of the identified keys.

These keys may or may not include the keys of the questionnaire. For example, in the case of a diffusion by subsidiary, a price list which would be the same for all the subsidiaries would not include the diffusion axes as a key (the subsidiary axis in our case, which is the key of the questionnaire).

Nevertheless, if the key axes of the questionnaire are not necessarily included, the keys of the upper compartments should be: if an item of information is found in a multi-tab, its key should be attached to the multi-tab axis (same thing for the pattern). However, the information can be deposited at the root of the questionnaire and recalled (by a formula) in the multi-tab.

The data template is, therefore, drawn up during this key development process.

The GT compartments in the form of multi-tabs, patterns and transposition are also sketched out.

Path 3: Choice of GT structures (multi-tab, pattern, transposition)

Generally, the compartments will be imposed by the form of the questionnaire and the arrangement of the requested information. Homogeneous areas will stand out in the questionnaire (identical sheets, lists, and cross-tables associated with multi-tabs, lists and transpositions).

However, in some cases, it may be interesting to switch from one type of compartment to another. For example, from a multi-tab to a pattern or transposition if you want correspondents to be able to add terms or for other reasons (conciseness of the workbook, checking global constraints, etc.). In other cases, switching to a multi-tab can make the questionnaire more open and easier to enter.

Finally, when the sheets are almost identical, it may be interesting to use a multi-tab to make the initial sheets completely identical, even if it means making some concessions on the layout or operation.

The following table attempts to highlight the comparative characteristics of each of the containers/transformations.

	Multi-tab	Pattern	Transposition
Brief description	Tabs	Lines	Dimensions cross-tabulation in rows, in columns, or even in tabs
Number of dimensions	1 but concatenation possible	1 The keys can be used and / or constraints can be defined	Several
Elements multiplied in the qstx	1 or more contiguous tabs Multiplication in a single direction for the multi-tab	Rectangular range of contiguous cells Multiplication in a single direction for pattern	Contiguous cells or not in different tabs or not. Cells defined when the qstx was designed
Modalities of the dimensions defined during	customization but if use of anonymous items => pay attention to checking key constraints	customization or input. pay attention to the case without modalities (empty pattern)	the design but if use of anonymous items => pay attention to the verification of key constraints
Adding / removing items	No	Yes The ability to add or remove lines in a pattern is a global option for the pattern.	No
Behaviors / relative dates	No direct management	No direct management	Anonymization of dates
Differentiated behaviors directly according to the items (formatting, constraints, formulas)	No Conditional formatting, gtlock ( prev. LockGT), global validation constraints and formulas are possible but must manage the item turn	No Conditional formatting, gtlock ( prev. LockGT), global validation constraints and formulas are possible but must manage the item turn	Yes Conditional formatting, gtlock ( prev. LockGT), global validation constraints and formulas are possible
Minimum number of modalities	1	1 at least one row exists and will be returned, even if not filled	1
Max number of modalities defined during	customization	customization or input	design
Underlying key	Multi-tab axis	Not defined, constraints can be placed in the questionnaire to check the keys	Axes transposed outside the value axis i.e. all the axes mentioned at least once outside the value axes
Inside score (multiplication particular item)	Yes possible by pointing each item to encrypted on unencrypted data but with performance constraints (see multi-tab Data summary)	Yes, indirectly via index / match or VLOOKUP [3]	Yes, directly on the cell, indirect searches also possible
Possible aggregations	Sum but others possibilities via pointing of each item on encrypted or unencrypted data but with performance constraints	All Excel aggregation formulas including possibly offset / indirect except matrix formulas	All Excel aggregation formulas including possibly offset / indirect except matrix formulas
Conditional aggregations	Yes, possible by pointing each item to encrypted or unencrypted data but with performance constraints	Yes via the sumif,countif, sumifs functions	Directly via the sumif, countif, sumifs functions
Excel input matrix	No	Yes Provide a large number of lines to simplify the handling of the Excel workbook by callers => this implies management of empty lines in the pattern	Yes
Preferred integration	Update on key field or pivot	Pivot if the caller can add / delete lines or modify Update keys on key fields otherwise	Update on key fields if axes out of value in limited number Pivot on higher level axes + axes
Campaign launch	Multiplication axis key	Beware of the Motif + Transposition association: the key may be absorbed by transposition	Modalities other than those defined at design time are ignored [4]
Checking keys for integration	General case: OK The keys cannot be modified by the caller If manual anonymization of items, only in certain cases and difficult	To be built systematically to ensure data consistency if the caller can add or delete lines or modify existing keys	General case: OK (The keys cannot be modified by the caller) If manual anonymization of items, verification by Excel formulas
Checking constraints	Tab by tab, checking constraints on all possible but only on quantified data	basic range checking, global verifications are possible using aggregation functions	All checks with Excel formulas are possible
Import Excel-Answer	Import Excel in "point to point" by updating the corresponding tab	Global Excel import on pattern based on row number in Excel and qstx	Excel import based on the Excel names corresponding to the components
for valid components (alphanumeric headers) see Answer Excel Import

Launching the campaign

Important points in the launch include the choice of diffusion axes and the articulation between tables associated with different compartments.

As described in this document, each compartment has an explicit key (for the multi-tab or transposition) or an implicit key (row number in the case of the pattern). The tables associated with the different compartments should have the same key.

The following articles describe the options of the launch action and the mechanics of assembling the data:

Launching a campaign
Assembly of data at campaign launch
Informal collection workflows

Data integration

The following articles describe the options for the integration action and the mechanics of data integration:

Campaign integration
Advanced integration
Campaign integration modes

Integrating formula results

The results of formulas are extracted from the questionnaire using the gtcontrol function, which allows you to create a control component (a component that cannot be edited by the respondent) and to initialise it with a formula (second argument to the gtcontrol function).

The integration of formula results is necessary in cases where this information is not redundant with other data in the questionnaire.
For example, the code corresponding to an item label chosen from a drop-down list should be produced in the questionnaire (using

In most cases, integrating a formula such as, a sum of numbers while each of the numbers is integrated elsewhere is more problematic. Indeed, beyond the issue of redundant information, it means that the data cannot be easily corrected directly in the tables.
However, in cases where the formulas expressed in Excel would require a significant amount of work in terms of SQL sentences in the database, it may still be interesting to integrate these formulas to facilitate the work of restitution (in particular). Typical cases arise for formulas whose equivalents would be non-identical operations between indicators or between rows in the database, or incremental calculations (SQL is set-based).
If certain formulas are integrated, it should be borne in mind that the data thus integrated will retain a certain dependence on the questionnaire.

Repeated integrations

One point to be taken into consideration during the design of the project is whether or not the manager is allowed to carry out the integration action repeatedly: this cannot generally be prohibited and the consequences must therefore be assessed.

Successive repeated integrations for the same campaign/query session require fine-tuning of the integration parameters.

It is crucial to know whether, when archiving, the integrated data only modifies the data for a single date or a set of dates: in the case of a single date, it will always be possible, as a last resort, to create a stored procedure to delete the “current” data from the history. If several dates are affected by the integration action, it will be necessary to use pivots or to mark records as to be deleted in order to delete them later with a stored procedure.

A case of interest arises when a response with an empty list for a pattern is allowed (in fact only one row exists in the response, but this row is blank). As the integration does not currently handle deletions of existing records, this case will have to be handled by marking elements to be deleted or by appending the data of this pattern to the data of another pattern which cannot be empty (in the latter case, importing by pivot will be sufficient).

How can we help?