Reporting

Reporting the research process transparently will allow others to reproduce the study and build on your work. We recommend maximising transparency of the whole research process. For a lexical decision study, the following are the most relevant aspects:

Rationale for the study

Research questions: These can revolve around novel exploratory questions, specific phenomena, or existing theories of reading.
Hypothesis: A pre-registration is highly recommended if the study is confirmatory and tests a specific hypothesis. The pre-registration should include the power calculation used to determine the number of words and the number of participants (if applicable) and the statistical model, including the covariates and the random-effects structure. If a model is labelled as exploratory in the pre-registration, it can be modified post hoc, but all modifications must be reported transparently.
Exploratory analyses: If one cannot determine a hypothesis a priori, one should transparently report that the analyses are exploratory.
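As a concrete illustration of a power calculation, the sketch below computes the approximate number of participants for a simple paired (within-participant) word-versus-non-word comparison. It uses the normal approximation n = ((z_{1-α/2} + z_{1-β}) / d_z)², where d_z is an assumed standardised within-participant effect size. This is only an illustration: it slightly underestimates the n that an exact paired t-test would require and says nothing about the number of items. For mixed-effects models, a simulation-based power analysis (e.g., with the R package simr) is more appropriate. The function name `participants_for_paired_design` is hypothetical.

```python
import math
from statistics import NormalDist


def participants_for_paired_design(d_z, alpha=0.05, power=0.80):
    """Approximate participant count for a two-sided paired comparison.

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) / d_z) ** 2,
    rounded up. d_z is the standardised within-participant effect size.
    Illustrative only; it does not model item variability.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d_z) ** 2)


# Hypothetical effect size, for illustration only.
print(participants_for_paired_design(d_z=0.5))  # → 32
```

Halving the assumed effect size roughly quadruples the required sample, which is why the effect size used should be justified in the pre-registration.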

Documentation of contributions and sources

Keep track of contributions to the study, relevant papers and discussions with collaborators.
We recommend specifying author contributions using the CRediT system, both to keep track of them and to determine whether a contribution warrants co-authorship (Holcombe, 2019).

Methodology

Keep track of the relevant choices and the rationale for the selection (e.g., "We used an inter-stimulus interval of 500 ms, as recommended by the TRUST guidelines").
For the reproducibility of a lexical decision study, you need to specify:
- Variable(s) of interest, along with the rationale (if applicable) and transparent documentation of how they were calculated
- Non-word generation process (with code as a supplement, if applicable)
- Selection criteria for words and non-words (with code as a supplement, if applicable)
- Specification of how word- and non-word-level characteristics were determined (with code as a supplement)
- A table with all item characteristics as part of the appendix or supplementary materials
- Participant recruitment procedure, including how the sample size was determined and the justification for it
- Participant inclusion and exclusion criteria, including justification and definition (e.g., if participants with dyslexia are excluded: Why, and how was this determined?)
- Number of participants, including an a priori power analysis, if applicable
- Pilot studies (including raw data) that were used to inform decisions, along with justifications (as appendix or supplementary materials)
- Experimental setup: e.g., online, lab, field, or similar
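Selection criteria are easiest to reproduce when they are written as code. The sketch below filters a hypothetical candidate pool by word length and a Zipf-frequency range and then samples a fixed number of items with a seeded random generator, so that rerunning the script returns the same item set. The candidate words, frequency values, criteria, and the function name `select_items` are assumptions for illustration; in practice, frequencies would come from a corpus such as SUBTLEX.

```python
import random

# Hypothetical candidate pool: (word, Zipf frequency). Values are made up.
CANDIDATES = [
    ("house", 5.4), ("table", 5.0), ("plant", 4.6), ("cloud", 4.4),
    ("cat", 5.0), ("elephant", 4.2), ("broom", 3.4), ("river", 4.9),
]


def select_items(candidates, length, zipf_range, n, seed):
    """Keep candidates of the given length within the Zipf range, then draw n with a fixed seed."""
    low, high = zipf_range
    eligible = [(w, f) for w, f in candidates if len(w) == length and low <= f <= high]
    if len(eligible) < n:
        raise ValueError(f"only {len(eligible)} eligible items, {n} requested")
    rng = random.Random(seed)  # local generator: the same seed gives the same selection
    return sorted(rng.sample(eligible, n))


selected = select_items(CANDIDATES, length=5, zipf_range=(4.0, 5.5), n=3, seed=2024)
print(selected)
```

Reporting the seed together with the criteria lets others regenerate exactly the same item list.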

Experimental procedure

Note that the following are suggestions for what to report, rather than an exhaustive list.

Equipment:
- In-lab hardware: display size, display refresh rate, viewing distance (approximate, or controlled with a chinrest), and anything else you controlled
- In-lab technical specifications: include information such as the operating system and computer specifications (CPU, GPU, RAM, etc.) in the shared data
- Online studies: which devices were allowed, the proportion of responses from each device, and the web browser used; include device and browser as variables in the trial-level data
Software:
- Report the program used to run the experiment, including its version(s)
- Share the code used to run the experiment, whether developed internally or using existing tools.
- Ideally, shared code should run independently, with as little additional software as possible. In the best case, the code is available in a repository together with a list of all dependencies (i.e., programming languages, libraries, environments, etc.). For more complex set-ups, it can make sense to connect the code to a cloud computing service so that others can run it without installing anything (e.g., see this speechless reader implementation).
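Software versions and system details can be recorded automatically at the start of each session rather than reconstructed later. The sketch below uses only the Python standard library (`platform` and `importlib.metadata`) to collect the operating system, CPU architecture, Python version, and installed package versions, which could then be saved alongside the data. The package names and the function name `describe_environment` are placeholders; GPU model, RAM, and display settings are not covered by these modules and would need to be recorded separately.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect OS, hardware architecture, Python and package versions for reporting."""
    info = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),      # CPU architecture, e.g. x86_64
        "processor": platform.processor(),  # may be an empty string on some systems
        "python": platform.python_version(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = None  # not installed in this environment
    return info


# Placeholder package list; replace with the libraries your experiment imports.
print(json.dumps(describe_environment(["numpy", "psychopy"]), indent=2))
```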
Trial sequence:
- The fixation marker used (e.g., +)
- Durations of everything shown to the participants (inter-stimulus intervals, inter-trial intervals if relevant)
- Stimulus onset asynchrony (i.e., mean, range, standard deviation, distribution type, etc.)
- Whether participant responses terminate stimulus presentation
- Timeouts used for the response
- Other relevant information: whether there were breaks, the randomisation method and trial order (potentially including transition probabilities), the number of trials, whether stimuli or stimulus types were repeated, and whether attention checks were used.
Instructions:
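The randomisation method and the stimulus onset asynchrony distribution are easier to report, and to reproduce, if trial lists are generated by a seeded script. The sketch below shuffles a hypothetical item list with a fixed seed, draws an integer SOA for each trial from an assumed uniform distribution between 400 and 600 ms, and estimates lexicality transition probabilities from consecutive trials. The items, the SOA range, and the function names are illustrative assumptions, not recommended values.

```python
import random
import statistics
from collections import Counter


def make_trial_list(items, seed, soa_range_ms=(400, 600)):
    """Shuffle items with a fixed seed and draw a uniformly jittered SOA for each trial."""
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    return [
        {"trial": i + 1, "stimulus": stim, "lexicality": lex,
         "soa_ms": rng.randint(*soa_range_ms)}  # inclusive integer range
        for i, (stim, lex) in enumerate(order)
    ]


def transition_probabilities(labels):
    """P(next label | current label), estimated from consecutive trial pairs."""
    pair_counts = Counter(zip(labels, labels[1:]))
    from_counts = Counter(labels[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical items, for illustration only.
items = [("house", "word"), ("blint", "nonword"), ("table", "word"),
         ("frope", "nonword"), ("river", "word"), ("dranth", "nonword")]
trials = make_trial_list(items, seed=7)
soas = [t["soa_ms"] for t in trials]
print("SOA mean, SD, min, max:", statistics.mean(soas), statistics.stdev(soas), min(soas), max(soas))
print(transition_probabilities([t["lexicality"] for t in trials]))
```

The seed, the SOA summary statistics, and the transition probabilities printed here are the kinds of values the list above asks to be reported.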
- Information given about the purpose and the procedure of the study
- Overall duration of testing sessions, including when to expect breaks
Stimulus parameters:
- Font (including font file if possible)
- If the font is relevant to your experiment, consider using an open-source font.
- Size (in degrees of visual angle if viewing distance was controlled; otherwise, give the approximate visual angle if possible; see here for a tool to calculate stimulus size in visual angle)
- Luminance (consider measuring it)
- Background colour
- Text colour
- Kerning if not default
- Graphics card settings, such as anti-aliasing
“Macro”-level procedure:
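When viewing distance is controlled, stimulus size in degrees of visual angle follows from simple trigonometry: θ = 2·arctan(s / (2d)), where s is the stimulus size and d the viewing distance in the same units. A minimal sketch, assuming a hypothetical set-up and hypothetical function names:

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of size_cm viewed from distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg, distance_cm):
    """Inverse: physical size (cm) needed to subtend angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Hypothetical set-up: a 6 cm wide letter string viewed from 60 cm with a chinrest.
print(round(visual_angle_deg(6, 60), 2))  # → 5.72
```

Converting a size in centimetres into pixels additionally requires the display's physical width and resolution, which is another reason to report them.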
- Order of events: entering the lab, information about the study, experiment, debrief, payment, etc.

Results & Analysis

Note that most of the issues below are discussed HERE.

Data & code
- Share the raw trial-level data and code with the necessary information: item, anonymised participant ID, accuracy, reaction time, trial order, and key metadata (relevant item and participant characteristics). The dataset description in the BIDS specification can serve as guidance.
- Share the analysis script, including the libraries used and their versions, as supplementary materials
- Report in the manuscript which software and libraries were used. If possible, also share a containerised environment (e.g., Docker or, preferably, Podman files) for reproducing the computing environment, in case libraries behave differently across operating systems and computer architectures (see tutorials HERE or HERE).
- Record who wrote which code (e.g., in standardised file headers or a separate document containing this metadata) and specify this in the CRediT statement
- Comment code well to increase readability and understanding.
- Use relative rather than absolute paths.
- Report and share the data cleaning and preprocessing pipelines you implemented.
- Give summary statistics of the trials, items, participants, etc., lost through each criterion applied.
- Justify all data-cleaning decisions.
- If one implements a multiverse/robustness analysis to decide on rules for data cleaning, one should report this and share the code to reproduce it.
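Counting what each cleaning criterion removes is simplest when the criteria are applied one after another in code. The sketch below applies a list of named exclusion rules in order and logs how many trials and participants remain after each one. The data, the thresholds (participant accuracy below .80, RTs below 200 ms or above 2000 ms), and the function names are hypothetical; they only illustrate the bookkeeping and are not recommended cut-offs.

```python
from collections import defaultdict


def participant_accuracy(trials):
    """Proportion of correct responses per participant."""
    totals = defaultdict(lambda: [0, 0])  # participant -> [n_correct, n_trials]
    for t in trials:
        totals[t["participant"]][0] += t["correct"]
        totals[t["participant"]][1] += 1
    return {p: correct / n for p, (correct, n) in totals.items()}


def apply_exclusions(trials, criteria):
    """Apply (name, predicate) criteria in order; predicate(trial) is True to exclude."""
    remaining = list(trials)
    report = []
    for name, exclude in criteria:
        kept = [t for t in remaining if not exclude(t)]
        report.append({
            "criterion": name,
            "trials_removed": len(remaining) - len(kept),
            "trials_remaining": len(kept),
            "participants_remaining": len({t["participant"] for t in kept}),
        })
        remaining = kept
    return remaining, report


# Hypothetical trial-level data, for illustration only.
raw_trials = [
    {"participant": "p01", "item": "house", "rt": 150, "correct": 1},
    {"participant": "p01", "item": "blint", "rt": 620, "correct": 1},
    {"participant": "p01", "item": "table", "rt": 540, "correct": 1},
    {"participant": "p02", "item": "house", "rt": 2600, "correct": 0},
    {"participant": "p02", "item": "blint", "rt": 700, "correct": 0},
    {"participant": "p02", "item": "table", "rt": 580, "correct": 1},
]
acc = participant_accuracy(raw_trials)
criteria = [
    ("participant accuracy < .80", lambda t: acc[t["participant"]] < 0.80),
    ("RT < 200 ms", lambda t: t["rt"] < 200),
    ("RT > 2000 ms", lambda t: t["rt"] > 2000),
]
clean, report = apply_exclusions(raw_trials, criteria)
for row in report:
    print(row)
```

In this example the slow 2600 ms trial is already removed by the participant-level rule, so the RT > 2000 ms rule removes nothing. The counts depend on the order of the criteria, so that order should be reported as well.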
Data analysis
- Give descriptives that make sense for the data, in terms of central tendency and dispersion/spread.
- Visualise the data and results, especially when this helps to communicate them (e.g., complex interaction effects)
- Explain the criteria used to choose one modelling approach over another
- Describe the model structure (e.g., fixed, random effects, distributional parameters, etc.) and the software used to fit it
- If a model takes a long time to fit, share the output (e.g., share a .rds file for a Bayesian model if it takes more than a couple of hours to fit)
- Describe and justify the model diagnostics and the model selection procedure. For example, report the model comparisons used to determine which variables (both fixed and random effects) were included in the model, and the reasons for relying on a particular diagnostic.
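Response-time distributions are typically right-skewed, so reporting the median and interquartile range alongside the mean and standard deviation can be informative. The sketch below computes accuracy and these RT descriptives per condition with Python's `statistics` module. The data, the column names, the function name, and the choice to compute RT statistics on correct trials only are assumptions for illustration.

```python
import statistics
from collections import defaultdict


def describe_by_condition(trials, condition_key="lexicality"):
    """Accuracy plus RT central tendency and spread (correct trials only) per condition."""
    groups = defaultdict(list)
    for t in trials:
        groups[t[condition_key]].append(t)
    summary = {}
    for condition, rows in groups.items():
        rts = [r["rt"] for r in rows if r["correct"] == 1]
        q1, _, q3 = statistics.quantiles(rts, n=4)  # quartiles; needs at least 2 RTs
        summary[condition] = {
            "n_trials": len(rows),
            "accuracy": sum(r["correct"] for r in rows) / len(rows),
            "rt_mean": statistics.mean(rts),
            "rt_median": statistics.median(rts),
            "rt_sd": statistics.stdev(rts),
            "rt_iqr": q3 - q1,
        }
    return summary


# Hypothetical cleaned trials, for illustration only.
data = [
    {"lexicality": "word", "rt": 520, "correct": 1},
    {"lexicality": "word", "rt": 560, "correct": 1},
    {"lexicality": "word", "rt": 600, "correct": 1},
    {"lexicality": "word", "rt": 900, "correct": 0},
    {"lexicality": "nonword", "rt": 640, "correct": 1},
    {"lexicality": "nonword", "rt": 700, "correct": 1},
    {"lexicality": "nonword", "rt": 760, "correct": 1},
    {"lexicality": "nonword", "rt": 800, "correct": 0},
]
for condition, stats in describe_by_condition(data).items():
    print(condition, stats)
```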

Data Storage

Repositories & Platforms
- Use a platform that offers long-term storage and persistent identifiers (e.g., a Digital Object Identifier, DOI), and link to the data directly from the paper, e.g., in a Data Availability statement.
- Choose a platform that complies with the regulations that apply in your jurisdiction (e.g., the GDPR); some options are listed below.
- Decentralise data storage by providing back-ups: designate one main repository, but link to mirrors in its README.
- Be aware of any conflicts between the data requirements of different countries.
- Use data formats that follow the FAIR principles (Findable, Accessible, Interoperable, Reusable).
Specific Platforms
- ZPID (Trier University, Germany)
- GIN (LMU Munich, Germany)
- NFDI (Germany)
- OSF (US/Germany)
- GitHub (San Francisco, US)
- Note that non-European platforms can be problematic for data protection reasons; storing data at least within the EU is therefore highly recommended.
Sensitive Personal Information & Data Protection
- We strongly advise consulting an ethics committee (either a local one or that of the German Society of Psychology, DGPs).
Documentation recommendations
- Data dictionary: provide one for the raw data (see example here)
- We recommend sharing the data in BIDS or a similar structured format; BIDS includes a specification for behavioural data (see here)
- Include informative READMEs that give data users important context that is not necessarily apparent from the data and metadata alone.
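A data dictionary can double as the machine-readable sidecar that BIDS uses for tabular files, in which each column is described by keys such as `LongName`, `Description`, `Levels`, and `Units`. The sketch below defines such a dictionary for a hypothetical trial-level file and checks that every column in the data header is documented before sharing. The column names, the file name, the `lexdec` task label, and the helper `undocumented_columns` are assumptions; consult the BIDS specification for the exact requirements for behavioural data.

```python
import json

# Hypothetical column descriptions for a trial-level file (e.g., sub-01_task-lexdec_beh.tsv).
DATA_DICTIONARY = {
    "participant_id": {"Description": "Anonymised participant identifier"},
    "trial": {"Description": "Trial position within the session, starting at 1"},
    "item": {"Description": "Letter string presented on the trial"},
    "lexicality": {
        "Description": "Whether the item is a word or a non-word",
        "Levels": {"word": "Existing word", "nonword": "Pronounceable non-word"},
    },
    "correct": {
        "Description": "Response accuracy",
        "Levels": {"0": "Incorrect", "1": "Correct"},
    },
    "rt": {
        "LongName": "Response time",
        "Description": "Time from stimulus onset to key press",
        "Units": "ms",
    },
}


def undocumented_columns(header, dictionary):
    """Return column names that appear in the data but not in the data dictionary."""
    return [col for col in header if col not in dictionary]


# Check a (hypothetical) data header before sharing, then serialise the sidecar.
header = ["participant_id", "trial", "item", "lexicality", "correct", "rt", "device"]
print(undocumented_columns(header, DATA_DICTIONARY))  # → ['device']
sidecar = json.dumps(DATA_DICTIONARY, indent=2)
# To save: open("sub-01_task-lexdec_beh.json", "w", encoding="utf-8").write(sidecar)
```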
Sharing Metadata and Protocols
- Consider sharing protocols, such as ethics applications and lab-specific protocols or routines.
Copyright & licenses
- Check any shared materials for copyright and license restrictions (e.g., corpora and published texts, font files, image, video, or sound stimuli, code).
- We recommend looking into the consequences of the different open licenses before choosing one.
Manuscripts / Publications
We recommend posting a preprint to a server such as PsyArXiv before the first submission. If that is not possible, image resources can be shared openly with DOIs so that the authors retain the right to use the illustrations.
We recommend publishing the article as Open Access if possible.