Jump to…

About the Open Science Lab Notebook
Current Projects
Lab Notes

About the Open Science Lab Notebook

What is the Open Science Lab Notebook? This space is where I will keep notes, make decisions, and record my research process. While I am still debating the exact format of this record (I suspect written logs, audio files, coding documents, and other tools will all be included at times), the goal is to make a complete record of my workflow.

What does the lab book do? I have seen discussions online and listened to debates about the need to make psychology an open science. Scientific discovery is a garden of forking decisions, many of which we are not always even aware we are making. By recording my thought process and the questions I ask, I can keep a record of that process. Thus, I, and anyone interested, can go back and understand how decisions were made.

Why not just use the Open Science Framework or a similar preregistration website? While I do intend to post these documents online in some format, possibly to OSF, I have found that the questions and exact formatting of an OSF preregistration require that many decisions already be made by the time of preregistration. The effect, in my experience, is a timestamp showing that some decisions were made at an early point in the research – though rarely before significant decisions have already been made. This lab book does not replace a formal preregistration. Rather, it explains the steps that led up to it and documents what follows, up through publication.

Implementation in my lab. I am currently in the process of establishing a new lab at Ohio Wesleyan University, where I teach and conduct research. I have 4 primary lab students and intend to recruit additional research assistants over the course of the next year. In my experience working with undergraduate students, though many are quite capable of conducting high-quality research, details or small questions often get lost in the process. The effect is that those later working on a project are left filling in the gaps or reconstructing the decision process of a student weeks, months, or semesters after that student has graduated. One hope for this log is that by implementing it for myself, I can teach others how to do the same. In so doing, my students will be able to provide a written record of their thought process, what they worked on, and where they encountered issues. It will also serve as an early introduction to scientific writing for those students.

Current Projects

Below I list my current research projects, where they are, and their next steps. These projects range from those with data collected to those in the ideation stage. I am listing these projects here so that I can log where I am on each at the start of this log.

  • Data Quality In-Person v. Online: In this project we are comparing the quality of data collected in a lab versus online. An earlier version of this project finished collecting data in the spring of 2025. The short version of our theory is that when participants are under direct supervision, they will provide higher quality data than when less directly observed. We are currently collecting data at UW-Madison and will wrap data collection at the end of the spring semester. In the most recent version of the project, we have participants in a lab under direct supervision, less direct supervision, or online.

  • Bike Path Norms: In this project, we are testing how social norms messages and a newly identified type of message (behavioral requests) affect a target behavior (whether bikers on public bike paths audibly signal to pedestrians before passing them). Two previous studies on this topic were published as part of my dissertation and can be found on my website. Briefly, our first field study found that all messages were equally effective at changing behavior relative to a no-message control condition. In a follow-up, we attempted to measure norm perception prior to observing behavior. The intention of the second field study was to assess the mediating effect of norm perception on the relationship between message condition and behavior. However, we did not find the same direct or total effect from the first field study. We theorized that the difference in effects arose because we observed bikers' behavior farther from the messages in the second field study than in the first. In a recent third study, we found that all messages had a diminishing effect on behavior. That is, all messages outperformed the control condition close to the signs, but farther from the signs this effect diminished. Of particular interest, the behavioral requests were most effective in the first few seconds after seeing the sign, but had a faster drop-off in effectiveness compared to the descriptive and injunctive norms message conditions. We have completed those analyses, though I may conduct one additional test where we calculate the difference from the control condition (the result would be to display the control condition as a flat line in our graph – currently it fluctuates down and then up, likely in accordance with the necessity to signal along the bike path). A fourth field study will be conducted in the spring of 2026 in an attempt to replicate these findings.
Long term, I would like to explore the dimension of time as it relates to the effect of social norms messages (and social influence more broadly).

  • Social Norms Messaging Meta-Analysis: One of the challenges of studying the effects of social norms messages as a means to better understand how social norms affect behavior is that different authors use different (and overlapping) labels for the same messages. For example, in one study an author might label a statement reading “please turn off the lights” a prescriptive injunctive norms message, while another calls it a control condition. The effect of these overlapping labels is that when one attempts to meta-analyze the social norms literature, direct comparisons between studies are meaningless. That is, unless we go back and re-categorize these messages using a standardized process, we cannot make meaningful comparisons between studies. This meta-analysis attempts to do just that. While an early draft of this process was included in my dissertation (see my website), I wanted to conduct a more systematic version of the same procedure. Starting in March of 2025, a team of research assistants and I began the process of finding field studies that compared the effects of social norms messages. We are only including papers that test the effects of messages on real-world, directly measurable behaviors. Since that time, I have been delayed pulling together the results (a dissertation, a cross-country move, and 3 new course preps will do that). At present, I am working with my RAs to confirm that we have all the papers we intended to include. A second team of RAs is collecting Ns and means from each condition in each paper. I am planning to post the formal preregistration later this week. I have spent some time confirming how best to meta-analyze results that include between-subjects designs with continuous outcomes, between-subjects designs with categorical outcomes, and pretest-posttest designs with each type of outcome.
This being my first formal meta-analysis, I have been working to understand the underlying process behind the calculations that we will conduct. Once these decisions have been made, I will post the pre-analysis plan and (only then) will I begin analyzing the results. One note that I would add here is that I am still debating whether to include demographic data in our meta-analysis. We have started collecting this information, and though some studies provide it, many do not. The lack of demographic information is unsurprising because we are analyzing field studies. Often these studies do not or cannot collect such information. Thus, it may not be possible to report or control for those factors when conducting the analysis.

  • Perceptions of Social Norms: Several studies have already been reported on how social norms messages affect the perception of social norms. These results can be found in my published dissertation. The general conclusion has been that all types of messages we tested affected the perception of norms relative to a no-message control condition (particularly for messages that were uncommon in the control condition). This project is currently paused while we decide on the next study, but (following advice from a colleague) we plan to test other types of messages that go beyond our original design.

  • Environmental Racism Project: One of my goals in studying social norms is to determine interventions that might mitigate environmental harms. Of particular concern is environmental racism. In a series of survey studies, I have determined that many people fail to understand how issues of systemic racial injustice relate to environmental concerns. I have designed a podcast-type discussion that will serve as our informational intervention. While a draft script was written by a research assistant, it still requires revision. The current plan is to formally record this podcast in the winter of 2026 for a spring launch of the survey.

  • Data Quality with International Samples: I would like to attempt to replicate a previous study on data quality collected from a series of online survey panels, but this time comparing those samples with different international audiences. I have not yet determined how to translate the surveys, nor which countries we will recruit from. The survey will be designed with the assistance of a research assistant in the spring of 2026.

  • Tangibility and Climate: This is a project I am still vaguely thinking about. I wonder if people who can more clearly imagine the personal effects of climate change are more likely to take actions to address the issue and support policies that address it.

  • Hostile Attribution Bias Lit. Review: I am currently working on a theory paper about the hostile attribution bias. The basic idea is that the bias has exceptional value for the study of social psychology; however, an over-emphasis on its effects on aggression has meant that less attention has been given to the range of fight-flight-freeze responses, alternative biases like the benign attribution bias, and how this bias may influence the formation of social norms. This is an ongoing writing project and should be shared with a collaborator over the next few weeks.

  • Social Norms in Crowds: I have pre-registered and am in the process of designing a study to measure crowd movement in theaters. The main goal of this project is to determine how the actions of one person influence those of another, and the speed at which they do so. By placing infrared cameras in theaters, we will record standing ovations as they occur. We will use a combination of logistic regression and social network analysis to model the rate at which social influence occurs. A pilot recording will be captured in the spring of 2025. Simulated data will also be generated over the same time span to determine the optimal analysis plan. The long-term goal is to expand the recording procedure to other universities. 

  • Water Protection Analysis: A colleague and I have discussed the possibility of analyzing the results of a messaging campaign to encourage individuals to donate to a clean-water non-profit organization. The exact messages we will compare are as yet unknown to me, so the project will likely not have more details until I can meet with another representative of the organization.

Lab Notes

1.26.2026 

Started today by updating the preregistration. I have added additional notes that came up when going through the papers to find the relevant statistics, new categories for messages, and new decisions about excluding papers based on data availability. I have also preregistered two sets of exploratory analyses (1 for pre-post data and 1 for the many other categories of messages).

I realized that I may not have preregistered that I will not be analyzing demographic data due to not having enough of it. I will go back and do that shortly.

As of today, I have begun writing the introduction, using some parts of my dissertation as a guide.

 

1.25.2026

I have completed my pass through all articles. I am waiting on a few final checks from my RA team, but will make all updates to the preregistration tomorrow before writing the introduction and method sections of the paper. After I have the RA confirmations of the data, I will begin the analysis using CMA (YAY!!!).

 

1.24.2026

No research progress today.

1.23.2026

More time spent recording data. No major updates.

1.22.2026 

An RA flagged an example of a message where a justification was buried in the descriptive norm: “Most people use hand sanitizer to prevent getting sick.” We decided this counts as justification and will check other messages for the same. We found another hand-sanitizer message where the behavioral request just said “stay healthy.” We decided this and similar messages fall within the definition of a behavioral request.

We also decided to code for please/thank you when they appeared in messages. We also coded other additional but topic-relevant content (neutral content). In this process, we have begun debating the limits of a behavioral request. Does a statement like “to do x, please y” count? It is more a description of how to do something than a request to do an action. For now, these will be left as how-to messages and not behavioral requests.

As for the analyses, I have been thinking about how to include some of the pre-post data. If I imagine a circumstance where I had 10 studies comparing treatment A to treatment B, 10 comparing A to C, and 10 comparing B to C, I would just test each comparison separately and discuss d for each analysis. If I had a pre-post design where pre-A -> A was compared to pre-B -> B, I would just use the post-test data. The question remains, though, about C. Because C is a no-treatment control group, it is no different from pre-treatment A or B. But there could be a concern about biasing results by treating a pre-A v. A comparison as a C v. A type analysis.

I have consulted several resources [1, 2, 3] about pre-post designs. The Doing Meta-Analysis guide notes that computing within-group effect sizes requires the pre-post correlation r (as CMA also requires). However, the guide (as well as Cuijpers et al., 2017) notes that r is rarely reported, so that calculation often cannot be made. Instead, the recommendation is to compare the time-2 groups (essentially as if they were a between-subjects design). Thus, for the above scenario, I would just look at post-test data and not include the pre-test data as part of my analyses.
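To make sure I stay consistent about this, here is a quick sketch (in Python, with made-up numbers; the actual analysis will run in CMA) of the between-subjects SMD computed from post-test statistics alone, with the small-sample correction applied:

```python
import math

def smd_posttest(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference between two independent groups
    (post-test only), using the pooled SD, with Hedges' small-sample
    correction applied."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    return d * j

# Made-up post-test values: treatment (A) vs. control (C)
g = smd_posttest(3.4, 1.1, 40, 2.9, 1.2, 42)
```

If the pre-post correlation r ever does turn up in a paper, I could switch to a within-group effect size for that study, but the sketch above matches the time-2 comparison recommended by the guide.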

I also spoke with a colleague about this project. One point that came from that conversation was that many of these “pre-tests” are based on using the same location (not always the same participants). Thus, the assumptions surrounding biased estimates may not be applicable in this scenario. For the sake of doing what is recommended in the field, I will analyze the data as described above. I will, however, do an exploratory analysis treating the pre-tests as a control group (C). Again, the logic is that while the authors may have called these pre-post designs, some are highly unlikely to introduce the biased estimates that we are otherwise attempting to avoid.

References
[1] Doing Meta-Analysis in R: A Hands-On Guide
[2] Westfall, J. (2016). Five different “Cohen’s d” statistics for within-subjects designs. https://web.archive.org/web/20160829095224/http://jakewestfall.org/blog/index.php/2016/03/25/five-different-cohens-d-statistics-for-within-subject-designs/
[3] Cuijpers, P., Weitz, E., Cristea, I. A., & Twisk, J. (2017). Pre-post effect sizes should be avoided in meta-analyses. Epidemiology and Psychiatric Sciences, 26(4), 364–368.

  

1.21.2026

I am still coding more data for the meta-analysis. I spent part of my time also double checking that I am entering the pre-post data correctly into the CMA in my examples (I use fake data for these to confirm I know the data entry format and record the correct information).

 

1.20.2026

I have found an article where the authors report data both pre and post the intervention. They also include a no-message control condition at the time of conducting the study. Based on how the results are reported (I do not see sufficient information for a pre-post analysis), I am selecting the data at the time of the intervention only and using means and SDs with sample sizes to determine the SMD.

One thing I have noticed in my own coding is that the use of the “irrelevant” category has shifted somewhat to include additional (potentially relevant) information. I am going to have an RA go through the papers and check my coding. I will also do a pass through the messages for irrelevant information. I have noticed a similar pattern with personal normative feedback. I am not yet sure how to code this, but I will have an RA check for it. I have also been noting instances of personal normative feedback in the notes column of the spreadsheet. I will also note here that I am using incentive just for monetary rewards or other external (non-psychological) rewards for doing the behavior (e.g., feeling good for helping others would not be an incentive, but being paid for doing good would).

Note for self: there are several places where I will need to enter a direction into the CMA software. Make sure to note the correct direction when doing so. There are not too many of these examples, but I will need to do a double check as I work through them.

1.19.2026 

Work progresses steadily on the meta-analysis. A few more days and we will have all the data we need. One thing I am taking from this entire process is how best to report my own results. Another is that including more studies and drawing stronger conclusions requires some trade-offs: getting meaningful results from each paper while still maintaining the integrity of the original study designs.

1.18.2026

I am working today through the meta-analysis. I have run into an issue with one study, Gossling et al. (2019), where the authors analyze towel and linen reuse in a hotel based on social norms messages. The challenge is that while they report the number of re-uses per condition, they do not report the total number of towels used in the study. Based on the information provided, that each condition looks at 10 rooms for 100 days in 7 hotels, I will assume there are 7000 room-nights analyzed per condition. But the analysis is not at the room level, it is at the individual person level. The authors provide the average number of guests per room in each condition, which I will multiply by my 7000 estimate to create a value to use as the total N. It is not a perfect solution, but it is a solution nonetheless. The authors also include two behavioral outcomes; I am focusing on towel reuse because that is most comparable to other studies. Additionally, one condition does not mention the reuse of linens, which is the other outcome, and using that outcome would make for an odd comparison between studies.
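The arithmetic behind that estimate, written out for my own records (the guests-per-room averages below are placeholders, not values from the paper):

```python
# Per-condition room-nights implied by the design reported in
# Gossling et al. (2019): 10 rooms x 100 days x 7 hotels.
room_nights = 10 * 100 * 7  # = 7000

# Hypothetical guests-per-room averages (stand-ins for the reported values)
avg_guests = {"control": 1.8, "descriptive_norm": 1.7}

# Working total N per condition = room-nights x average guests per room
estimated_n = {cond: round(room_nights * g) for cond, g in avg_guests.items()}
```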

I have another study where I have the proportion of individuals who completed a behavior and the SE, but not the N. Using the formula SE = sqrt[p(1-p)/N], where SE is the standard error, p is the proportion, and N is the sample size, I can rearrange to N = [p(1-p)]/SE^2 to determine the sample size. I attempted this method with mixed success: because the authors rounded their SE, I am getting much lower Ns than I would anticipate. Fortunately, the authors do report F tests, which I can use to get the total N. I will then assume that the N per condition is equal and use the proportions to find the number of events. This method was used for Van de Vyver & John (2017).
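The back-calculation, written out with hypothetical numbers (the sensitivity to a rounded SE shows up immediately):

```python
def n_from_se(p, se):
    """Recover N from a proportion and its standard error:
    SE = sqrt(p(1-p)/N)  =>  N = p(1-p)/SE^2."""
    return p * (1 - p) / se**2

# Rounding the SE to two decimals shifts the recovered N substantially:
n_from_se(0.30, 0.05)   # -> 84.0
n_from_se(0.30, 0.045)  # -> ~103.7

def events_from_total(total_n, k_conditions, proportions):
    """Fallback: split the F-test total N evenly across conditions,
    then convert each condition's proportion into an event count."""
    n_per = total_n / k_conditions
    return [round(n_per * p) for p in proportions]

events_from_total(400, 4, [0.30, 0.42, 0.55, 0.61])  # -> [30, 42, 55, 61]
```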

I had some additional trouble with Bergquist and Nilsson (2017) in determining the needed N. I did upload the paper to Gemini to confirm the calculation of the effect size. Part of my issue, confirmed by AI, is that the authors report two different total samples. One appears to be a typo (a 3 was reported where an 8 should have appeared). Using the provided information and chi-square tests, I was able to go back and determine close estimates for the effect size and number of events.
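For reference, the conversions I leaned on for a 2x2 chi-square (the numbers here are illustrative, not Bergquist and Nilsson's):

```python
import math

def phi_from_chi2(chi2, n):
    """Effect size phi from a 2x2 chi-square: phi = sqrt(chi2 / N).
    For a 2x2 table this equals Pearson's r between the two
    dichotomous variables."""
    return math.sqrt(chi2 / n)

def events_per_group(n_per_group, p1, p2):
    """Once a per-group N is settled on, turn reported proportions
    back into event counts for data entry."""
    return round(n_per_group * p1), round(n_per_group * p2)

phi_from_chi2(6.25, 100)          # -> 0.25
events_per_group(80, 0.40, 0.55)  # -> (32, 44)
```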

Note for self: I write these notes out in Word and then move them here. My nice formatting of the equations above did not translate to this platform. If I keep running into that issue frequently, I will have to find a workaround or move this to another site.

1.17.2026 

I came across one study that combined multiple interventions run on the same day into one “intervention” day category, which was compared against a control day. For this study, I reached out to the corresponding author for data because only one of the interventions should be included in the meta-analysis. If I cannot access the original data and determine the results for each condition, I will be forced to remove the study.

Another study I looked at today had a design where new icons were added over a series of weeks, leading up to a week where the norms message was added. I only focused on the comparison between that week with a message and the baseline. I will note that while this was a pre/post design, I do not have the correlation that I would want for the analysis, nor the odds ratio. I will record the total sample and events, and may have to treat it the same as I would for a between-subjects design.

For a study using Facebook as data, I used unique click-through rate as the primary outcome (because this represents an actual behavior of clicking). I used reach (which is unique impressions) rather than impressions as the total N. I am noting this for any future studies with Facebook ad data.

I have been working on another data entry for a regression discontinuity design. After reading a bit online, I have found that I can enter this information into CMA using the option for continuous outcomes and then “Raw difference (independent groups) SE.” Given this information, I can now record the RD information as my difference in means and (if available) use sample size before and after the cutoff as my group sizes.

In the case of Bonan et al. (2020), I do not think that we can include the study in the meta-analysis. While the authors do examine the effects of normative messages (both descriptive and injunctive) there is also personalized normative feedback. The authors pull from the OPOWER designs where thumbs up and “need to improve,” “good,” and “excellent” are attached to electricity bills. While sometimes we can still work with this type of data, the message design ends up overlapping the normative information with the personalized feedback. At present, I do not know how we would be able to code this information accurately or to have a meaningful comparison with other studies while maintaining the integrity of the original design (or even anything close to it).

1.16.2026 

I spent some time earlier today discussing the meta-analysis with a colleague. For now, my goal will be to fill in what information I can. There are some studies where I may have to decide to exclude them because we cannot make meaningful comparisons between studies, but for now we will continue to record the data.

1.15.2026

Starting today with more work on the meta-analysis. I have assigned a team of research assistants to go through and confirm my findings and flag any differences between their work and what I observe.

I have run into another example (Ayal, 2021) of a pre-post design at each location where the intervention was run (but presumably with different people at each time point). Yet the authors only provide the total N and N events for each condition. I suspect that is enough information to gather the missing pieces, but I need to check how to do that. For now, I have only recorded those pieces of information and will decide how to proceed from there. Essentially, this question comes down to how I will handle nested designs in the CMA program. From the reading I have done, I do not think that CMA can directly handle nested designs. An inability to handle nested designs will mean I either have to work with biased estimates or find an alternative way to analyze these data. Pustejovsky & Tipton (2021) potentially provide a method for doing this, and they have simulated data. However, I will note that their simulation does not include finding the SMD (standardized mean difference) for the nested design, which was the impetus for looking into their resources. There may be an easy way to calculate this, but a brief search online did not reveal an obvious answer.

Note that for Salazar et al. (2021) the outcome was the absence of a behavior. Thus, I have taken the number of events the authors intended to prevent and subtracted it from the total number of observations. In essence, the events count for this study is actually the number of non-events, which was the desired outcome. Further, the event in question that we are trying to avoid was not an option for the average person. That is, the authors wanted to limit straw usage with drinks, but participants already had to ask to use the straws. Thus, the authors measured the number of times a person went out of their way to violate the desired behavior. The main concern here is that doing the desired behavior is the default, so we will want to consider whether this paper qualifies as testing how to change people's behavior.
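The flip itself is trivial, but worth pinning down so future data entry stays consistent (the counts below are placeholders, not the paper's values):

```python
# Salazar et al. (2021): the reported count is the behavior we want to
# prevent (asking for a straw), so the "events" entered into the
# meta-analysis are its complement. Counts below are placeholders.
total_obs = 250
straw_requests = 40                  # undesired behavior observed
events = total_obs - straw_requests  # desired "non-events" for data entry
```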

Another new challenge I have come across in one study is the use of Bayesian analyses reporting the mode and 95% credible intervals in combination with difference scores. This might be a time where I need to reach out to the researchers directly and ask for the raw data. Note, however, that their OSF page indicates that they may not be able to share the data because the study is about student performance in classrooms.

1.14.2026 

No research – first day of classes. 

1.13.2026 

Starting today by checking progress on the data-quality (in-person v. online) project. We had an issue at the start of the semester where we overscheduled people in the online condition and have been trying to correct the balance of in-person and online participants since. We have 125 in the at-home condition, 81 in the main-lab condition, and 75 in the separate-lab condition. My expectation is that we will continue to collect data through spring 2026.

Some additions to add to my preregistration based on the initial data collection from the articles:

  1. Some studies do not provide exact numbers of events. There are several studies where I can only find (or estimate based on a graph) the % of people doing a behavior. In these cases, I will use the provided sample size and the percentage to determine the number of events (rounded to the nearest whole number).

  2. There are other studies where I can find the total N, but not the sample size per condition. In these instances, I will reach out to the authors for the exact information. If I cannot determine the exact N, I will take the total N and divide it by the number of conditions.

  3. Note that for classifying messages, some include norms related to unmeasured behaviors. For example, I have found that Guichard et al. (2021) use a control message that reads “most students here sort their tray after lunch” in a study about eating vegetables. I will classify this as irrelevant information.

One open question that I am facing is how to handle some of these pre-post designs. Many of them are only pre-post in the sense that the same location was used twice, but there is no information about the individuals included within the samples. Traditionally, I think of pre-post in terms of treatment on an individual, but I only have it for location. And, with only one location, we lack the needed information to check for things like a random slope.

1.12.2026

No research progress today; today was for class prep.

1.10.2026 & 1.11.2026

This was the weekend and I did very little work. I found the needed information for a handful of articles in the meta-analysis, then I went for a walk. 

1.9.2026 

Started today by sending out the requests for studies to SPSP and SPSSI. We will see if we get any responses; I will check again on Monday.

I am spending most of today categorizing articles. One issue that I have to decide how to handle is what to do about extra text in a message. I think I need to make additional categories. This is because in one study I might have a directive “Please reuse your towel” (a behavioral request) paired with information such as “If you want to re-use your towel, hang it up. If you want to replace it, place your towel on the ground.” Another study might present irrelevant information in one condition; maybe a study tells hotel guests to “check out fun activities in our city.” Yet another study might have additional contextual information. This latter type of information is relevant to the target behavior, but is not a directive. For example, “reusing towels helps save water” is not a directive, but it does justify the action. Other messages also promote incentives (such as discounts) unrelated to the behavior. Based on these types, I will note whether the study used each of the following:

  1. A no message control condition

  2. A behavioral request

  3. A descriptive norms message

  4. An injunctive norms message

  5. Additional directions (“how to” information)

  6. Additional justification (focuses on benefits of the action; central route processing)

  7. Additional incentives (getting money that one would not expect for doing the behavior)

  8. Irrelevant information unrelated to the target behavior

After making these categories, I have flagged one more paper that has a “deterrence condition” that says they have already identified people who have not paid their taxes. It could be a justification message based on the way it is phrased, but I am not entirely sure. I am flagging it in my document and will come back to it later. Might have an RA or two look at it and see what they think.

Other notes:

  • For Melendez-Jimenez et al. (2022), I have selected the completed-survey rate, as this is the outcome that seems most directly related to the initial question posed by the authors.

  • Capps et al. (2022): I need to see if I can find a better way to determine the sample size. I have the number of total observations, but not the breakdown by condition. I could just divide by 4, but I'd like to be more accurate than that if possible.

1.8.2026 

Picking up where I left off yesterday. First, I have to say that the Comprehensive Meta-Analysis program (CMA) is fantastic. I am tempted to just use this software for the remainder of the project, but I do want to make sure I can get my own results and post them online with R code.

I have done a little poking around with the Doing Meta-Analysis guide and a previous attempt from my dissertation when attempting to replicate the Rhodes et al. (2020) findings. What I had previously done was calculate weights based on previously calculated standard errors and use existing SMD calculations to work through the formulas. However, Rhodes et al. (2020) used the CMA file to calculate these values. Given that, I am going to move ahead with using the CMA program for the meta-analysis. I have come to this decision because CMA provides the equations for calculating the SMD within its program, so I am satisfied that I can explain those equations if needed when I go to write up the results of the meta-analysis. Below, I crossed out 3 steps that I am intentionally skipping at present (though I may return to them in the future).

Here I will list the information that I need to get for each study:

  • The text of the messages for each condition

  • Means for each condition (continuous outcome)

  • SD for each condition (continuous outcome)

  • Sample size for each condition (continuous outcome)

  • Events that occurred by condition (dichotomous outcome)

  • Sample size for each condition (dichotomous outcome)

  • Pre-post correlation (continuous outcome in pre-post design)

  • External correlation (dichotomous outcome in pre-post design)

Note that if we come across a design that is not among the 4 design types listed above, I will check the CMA file for the information needed to calculate the results.

I am now ready to write and post the preregistration to OSF.

I am spending the remaining portion of today compiling the messages and statistical information from the existing list of studies. I will send out the call for additional papers tomorrow. 

1.7.2026 

I got back 3 of the 4 materials requests for the meta-analysis today. It seems that of those 3 papers, none qualified (2 had self-reported behavior, 1 was not a field experiment).

The main goal of today was to test how we will actually pool our effect sizes with different designs. I am largely pulling from 2 sources: https://doing-meta.guide/ and a meta-analysis by Rhodes and colleagues (2020) which used the Comprehensive Meta-Analysis Software. I have previously read the doing-meta guide and today started by watching the how-to video for the CMA software.

Rhodes et al. (2020) very helpfully provided all of their data and, as part of my dissertation, I have already confirmed that I can replicate their findings in R using my own code. All I am doing today is confirming that my code will get the same results as these other methods.

I will write out my steps for this process here so that I can follow them. I will use an open bullet for the steps I have not done, then fill it in once I have done it. When reading this later, it should just look like a typical bulleted list. [Note: Squarespace does not allow for open bullets, so I have instead crossed out the items I ended up not completing, because my approach with CMA led me to explore that approach in greater depth.]

  • Create fake data with continuous and binary outcomes, some from pre-test/post-test designs and some from between-subjects designs. In these “data” I have four numeric columns (ctrl_m, ctrl_n, exp_m, and exp_n). The ctrl_m column holds the mean in the control group, the number who did the behavior, or the mean at pretest (depending on the design); the _n columns always hold sample sizes; and the exp_ columns refer to the experimental or post-treatment group. An outcome column tells me whether the outcome is continuous or dichotomous, and a type column tells me whether the data are between or within subjects. Author and year are made up as “example_a” and “2025”

  • Download the CMA software

  • Do a practice meta-analysis with the CMA software. (Completed this step and have a better sense of the data I will need to collect, but still somewhat unclear how I will handle pre-post data with dichotomous outcomes.)

  • Use the doing-meta.guide to replicate the results in R (with their packages)

  • Use R on my own (base R and tidyverse) to replicate their results

  • Write out the formulas I am using so that I can explain them later in the meta-analysis

  • Write down the information that I will need to collect from the studies (note that in an earlier version RAs already collected Means, Ns, % doing behavior, and study type for some – but not all – of the articles) 
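The fake-data step in the first bullet above can be sketched concretely. This is in Python rather than the R/CMA tools I will actually use, and every value is invented; it only illustrates the column layout (ctrl_m, ctrl_n, exp_m, exp_n, plus outcome, type, author, and year).

```python
# Invented example rows illustrating the fake-data layout described above.
# ctrl_m / exp_m hold a group mean, an event count, or a pretest/posttest
# mean depending on the design; ctrl_n / exp_n always hold sample sizes.
fake_studies = [
    {"author": "example_a", "year": 2025, "outcome": "continuous",
     "type": "between", "ctrl_m": 4.2, "ctrl_n": 50, "exp_m": 5.1, "exp_n": 50},
    {"author": "example_b", "year": 2025, "outcome": "dichotomous",
     "type": "between", "ctrl_m": 12, "ctrl_n": 100, "exp_m": 25, "exp_n": 100},
    {"author": "example_c", "year": 2025, "outcome": "continuous",
     "type": "within", "ctrl_m": 3.9, "ctrl_n": 80, "exp_m": 4.4, "exp_n": 80},
    {"author": "example_d", "year": 2025, "outcome": "dichotomous",
     "type": "within", "ctrl_m": 30, "ctrl_n": 120, "exp_m": 48, "exp_n": 120},
]

# Each of the four design/outcome combinations appears exactly once
designs = {(row["type"], row["outcome"]) for row in fake_studies}
```

Keeping one row per design/outcome combination means every branch of the effect-size code gets exercised by the practice analysis.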

1.6.2026 

Writing as I am traveling today. I had planned to work on the introduction and formal preregistration for the meta-analysis, but I would like internet access for those. Instead, we are back to the hostile attribution bias paper. Today’s focus was looking through papers and past writing to see whether we have a solid argument that anxiety results in more than just an aggressive response from individuals high in hostile attribution bias. At present, I think this argument has not been made particularly well, but there are clear places, and work from the ADHD literature, that can help with this argument.

1.5.2026

As of this morning, I have finished reviewing the articles that we initially identified for possible inclusion in the meta-analysis. I am writing my summary below. Italicized text marks passages that I want to add directly to a paper (with minor revisions for spelling, grammar, or clarity).

Of the 1765 articles identified from either our initial search on PsycINFO (search term “social norms messages,” conducted in March 2025) or articles that we had identified for this project while working on separate research projects, 616 were identified as testing the effects of social norms messages. Of those 616, 411 were removed because they did not test real-world behavior or were not field experiments. An additional 71 were removed because the messages included were not relevant to the questions we were testing in the present meta-analysis. A total of 2 studies were removed because the types of messages tested were too similar for a meaningful comparison (e.g., the comparison would have been between two descriptive norms messages). We were not able to access 2 more studies. A further 5 studies were removed because the manipulation of social norms did not involve message testing. A total of 3 articles were removed because no results were reported, the article was retracted, or the article was irrelevant to the present meta-analysis. We identified 4 articles that appeared multiple times on our list. After these removals, 118 articles remained as eligible for potential inclusion in the present meta-analysis.

We conducted a second check on the remaining 118 articles to ensure that the correct types of messages and field interventions were included in the meta-analysis. During this process, 3 additional studies were removed as duplicates, 3 were removed because they tested dynamic norms messages, 4 were removed because they tested neither any type of social norms message nor a behavioral request, 30 were removed because they were not field studies, 1 was removed because the message was exclusively an icon, 4 were removed because the messages were too similar for meaningful comparison, 5 were removed because they only included self-reported behaviors, and 2 were removed because they only tested personal normative feedback (a related but not strictly relevant intervention). Additionally, we requested access to the materials or data for 2 other studies. As of 1.5.26, we are waiting for responses from the authors. We will wait until 1.12.26. This process identified 62 articles for inclusion in the meta-analysis (with the potential for up to 64).

Next step: I am going to run one additional search with the term “social norms messages” on PsycINFO, limited to the years 2025–2026, to make sure we have fully captured newly published papers.

The new search on 1.5.2026 identified an additional 75 potential articles for inclusion in the meta-analysis. Of these 75 additional articles, only 1 met the requirements for inclusion in our meta-analysis (note that it was my dissertation). We are waiting to gain access to 2 additional studies that may qualify.

Note for self: while going through the articles I came across one about VR and exploring museums. The study (Shen et al., 2025) made me think of the signs in the MFA in Boston that read “please do not touch.” One could imagine a study using different types of social norms around touching art (a commonly held norm) compared to a less common norm in a different context, to compare the effectiveness of these messages across contexts.

1.4.2026 

More meta-analysis work today. Hooray! I came across another personalized normative feedback article (Allcott, 2011). I do not think I should include this article. Its use of personalized normative feedback is a comparison of one person’s energy usage to their neighbors’, and the comparative dimension seems to add another component that would make for an odd comparison with other norms messaging interventions.

I am running into a few examples where I cannot determine whether a message is a behavioral request (e.g., Oceja & Berenguer, 2009). The prototypical behavioral request is “please do x” such as was used in this study (“Before leaving, turn the lights off” – what the authors call a correct behavior message). There are other messages that have a directive but are focused on another element of persuasion beyond the direct request (“Save electricity for the benefit of all”). The latter of these messages is a far less explicit request and is focused on the social benefit behind the behavior. Only the former of the two messages seems to qualify as a behavioral request.

I have noticed a few other messages that are not behavioral requests but should be noted for the next steps, when we collect and categorize the messages. Some studies have open-ended statements about the desired behavior. For example, Schultz and Zaleski (2008) have a statement reading “if you want your towels replaced, please leave your used towels in the basket on the bathroom floor.” This is not a direct request, though similar statements in other studies heavily imply a desired behavior. By our definition these are not behavioral requests.

I have found another example of a field study that might not qualify under an exceptionally strict definition of a field study. Wenzel (2005) mailed people a survey about tax compliance opinions, but this was then used to establish an injunctive norm. In this instance, the survey is part of the intervention and was used to create the messages. This example still meets our definition of a field experiment. I contrast it with an example from yesterday where the authors recruited people for a study about soap dispensers and then posted messages above the dispensers. The tax compliance study still occurred in a naturalistic context; the soap dispenser study did not. Arpan (2015) was selected for inclusion for similar reasons.

Noting here that I did not include Campbell et al. (2024) because while they do use social norms messages and measure a behavior (as indicated by GPA), the messages are about inclusion and the behavior in question does not assess inclusive behavior (that target outcome is measured via self-report).

One other thought: after this meta-analysis concludes, another one could be conducted about the role of iconography in social norms studies. 

1.3.2026 

Meta-analysis work continues today with additional checks on articles. I have found another example where iconography is used to communicate an injunctive social norm. In this newest case, there was also a text version of the messages. I will include the study but only look at the main effects of text, and not the effects of iconography.

I found a study that looked at the consumption of fish. What was challenging in determining whether this study qualified was that it was translated from Finnish. In the translation, the messages read “we reduce together the eutrophication of the Baltic Sea” as a result of doing the desired behavior (Salmivaara & Lankoski, 2021). This message does not quite read as a typical descriptive norms message (who “we” refers to is somewhat unclear, and the proportion doing the behavior is vague). However, after thinking through other examples of descriptive norms messages, I think this message does qualify – particularly considering that my judgement is based on a translation. I am making an assumption (I think a reasonable one) that this message reads more like a typical descriptive norms message in the original language.

Following up on the note about personalized normative feedback from yesterday, I have found a study where the only norms information is personalized normative feedback, that is, a comparison between a household’s water usage and the usage of their neighbors (Schultz et al., 2016). This type of normative feedback appears to be different from what I am trying to study. In cases where the only norms information is comparative between the message recipient and a peer group, we will not include the message. I am making this decision because I know of work (I can’t remember the citation) showing that one can create boomerang effects by making it appear that a person is outperforming the group. I think these studies are doing something fundamentally different from a simple “majority of x group does y behavior” style descriptive norms message. That said, I will likely need to discuss this with my collaborator on the project. We may opt to include these messages later.

One study (Richetin et al., 2016) gave me some pause when trying to determine whether it qualified. In this particular study, the researchers recruited participants to evaluate a hand soap dispenser. The evaluation of the dispenser was a cover to see if a norms message in a bathroom would reduce water usage. While this study did evaluate a behavior in its real-world environment (water-usage in a bathroom), participants were also recruited outside a naturalistic context (i.e., they were aware they were in a study). For the purpose of this meta-analysis, I am going to limit our definition of field study to studies where participants are not aware they are being observed. That is, we want to understand the effects of social norms messages on real-world behaviors when desirability demands are not present. Again, I will want to confirm this with my collaborator.

1.2.2026

More work on the meta-analysis. I had two interesting articles that I needed to make decisions about. For the first, the behavior was whether a person completed a survey that was emailed to them. Since this behavior (taking a survey) did occur in its naturalistic context, I decided to include it in the meta-analysis. For the second, the authors included both static and dynamic norms messages in their study. Because we have the results of the static norms messages v. no-message control, I have decided to include it (only looking at the static comparison). In instances where only dynamic norms messages are tested (or static are compared only to dynamic), I will not include those studies.

Some open questions remain that I need to answer: how will I handle pre-test/post-test designs, what adjustments need to be made for binary outcomes, and how do I handle pre-test/post-test designs with binary outcomes?
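For the pre-test/post-test question, the standard conversion (again the Borenstein-style formulas that CMA also documents, not necessarily the final choice for this project; the helper name and numbers are mine, sketched in Python for illustration) uses the pre-post correlation to put the change score on the between-subjects SMD metric:

```python
import math

def smd_prepost(m_pre, m_post, sd_diff, n, r):
    """SMD for a one-group pre-post design: the SD of the change
    scores is converted back to a within-person SD using the
    pre-post correlation r, and the variance scales by 2(1 - r)."""
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    d = (m_post - m_pre) / sd_within
    var_d = (1 / n + d**2 / (2 * n)) * 2 * (1 - r)
    return d, var_d

# Invented example: pretest mean 4.0, posttest mean 5.0,
# SD of change scores 2.0, n = 50, assumed pre-post correlation 0.5
d, var_d = smd_prepost(m_pre=4.0, m_post=5.0, sd_diff=2.0, n=50, r=0.5)
```

This is also why the pre-post correlation appears on my data-collection list from 1.8: without it (or a defensible assumed value), the change-score SD cannot be put on the same scale as the between-subjects studies.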

I have found a third paper that has raised a new question: what if the original authors have 3 conditions – no-message control, dynamic norms message, and injunctive norms message – but the injunctive norms message does not meet our definition of an injunctive norms message? That is, when we re-categorize these messages, some will become a no-norms control condition. Will we want this comparison? I am inclined to say no, because it is not part of our central question about norms messages. I will, however, make a note in my spreadsheet to record these papers so that we can return to them in the future (in case a reviewer wants to see them included).

Note on multiple outcomes: For some studies, there are multiple outcome behaviors. For example, a study where the outcome involves donating to a charitable cause may include both “did the person donate” and “the amount that a person donated” as outcomes. In these instances, I will use the binary outcome variable as it is most analogous to the other studies included in the meta-analysis. That is, for many studies we can only answer “did a person do the behavior” and not “to what extent” did they do the behavior. Deciding to donate v. not is the more analogous outcome.
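For folding those binary outcomes into the same analysis as the continuous ones, one common route (which I would still need to confirm matches CMA's handling; the helper name and counts below are mine) is to compute a log odds ratio from the event counts and convert it to the d metric with the standard logistic-distribution factor, sqrt(3)/pi:

```python
import math

def events_to_d(events_exp, n_exp, events_ctrl, n_ctrl):
    """Convert 2x2 event counts to a log odds ratio, then to the
    SMD (d) metric via the logistic conversion log(OR) * sqrt(3)/pi;
    the variance converts by the same factor squared."""
    a, b = events_exp, n_exp - events_exp      # experimental: did / did not
    c, e = events_ctrl, n_ctrl - events_ctrl   # control: did / did not
    log_or = math.log((a * e) / (b * c))
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / e
    factor = math.sqrt(3) / math.pi
    return log_or * factor, var_log_or * factor**2

# Invented example: 25/100 did the behavior after the message vs 12/100 in control
d, var_d = events_to_d(events_exp=25, n_exp=100, events_ctrl=12, n_ctrl=100)
```

With cell counts of zero the odds ratio is undefined, so the real extraction step will need a rule (e.g., a continuity correction) before a study like that can be pooled.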

I have come across a few articles that have “positive” and “negative” versions of descriptive and injunctive social norms messages. I am also aware that we have studies where we have prescriptive and proscriptive versions of each message. I do not think we will have enough studies to look at the differences of positive v. negative or prescriptive v. proscriptive. In these instances, we will have to look at main effects.

There are a number of studies I keep finding that use visual cues to communicate an injunctive norms message. I am not sure that these really count as an injunctive norms message as I think of them, but a future study or comparison might be interested in looking into that question.

Personal normative feedback is another type of intervention that I am sorting through. In these interventions, researchers provide updates at the person-specific level about a behavior. For example, how much water is a household using compared to its neighbors. The personalized normative information is then often paired with additional requests or messages encouraging a behavior. I will include these studies, but it will likely be the case that we have to compare the presentation of that normative information or the accompanying messages rather than the personalized normative feedback itself. 

1.1.2026

Meta-analysis work. We have at this point compiled a list of articles (115 in total) that should be included. I am doing a second pass through them to make sure that the messages tested are the right kind (e.g., are static and not dynamic norms, with meaningful comparisons between conditions), that the behavior is actually measured (not behavioral intentions or downstream effects of messages – more on this in a moment), and that we can actually access the messages tested in the study. I have decided to exclude studies that use GPA as an outcome in cases where the intervention was focused on behaviors other than studying. For example, an intervention to improve equity in the classroom might improve GPA, but that is not the direct behavioral effect of the intervention – just a downstream consequence of other people’s behaviors. As of today, I have evaluated 9/115 articles, but had to pause due to a slow internet connection.

12.31.2025

Worked for a while writing updates to the hostile attribution bias paper. Today’s focus was on a section about clinical applications of hostile attribution bias and how it predicts both fight responses (elevated rates of antisocial disorders) and flight responses (elevated rates of anxiety). Having written these sections, I am considering removing a few paragraphs on ADHD. While the ADHD work is interesting, I think it does not add much to the overall argument of the paper. It might be useful for future directions, but it distracts from the larger argument that there are many outcomes of having a hostile attribution bias.

12.30.2025 

On vacation with family. No research conducted today.

12.29.2025 – 2

Worked on updates to the hostile attribution bias literature review. I was focused on a section about the role of bullying in the development of a hostile attribution bias. Discovered 1 paper I need to re-read and two paragraphs I need to streamline when I next resume this work. I then worked on a short section about media psychology, though this has been largely unedited since we received peer reviews. I also worked on a section of the paper that examined clinical applications of the hostile attribution bias as evidence that HAB affects all parts of a person’s fight, flight, and freeze response.

12.29.2025 – 1

Today is the first day I am attempting to keep an open, active log book of my research. A few thoughts about what this is, and what I intend for it to do.

What is the Open Science Lab Notebook? This space will be where I keep notes, make decisions, and record my research process. While I am still debating the exact format of this record (I suspect written logs, audio files, coding documents, and other tools will at times be included), the goal is to make a complete record of my workflow.

What does the lab book do? I have seen discussions online and listened to debates about the need to make psychology an open science. That is, scientific discovery is a garden of forking decisions, many of which we are not even aware we are making. By recording my thought process and the questions I ask, I can keep a record of that process. Thus, I, and anyone interested, can go back and understand the decision process.

Why not just use the Open Science Framework or a similar preregistration website? While I do intend to post these documents online in some format, possibly to OSF, I have found that the questions and exact formatting of an OSF preregistration require that many decisions already be made by the time of preregistration. The effect, in my experience, is a timestamp of some decisions made at an early point in the research – though rarely before significant decisions have been made. This lab book does not replace a formal preregistration. Rather, it explains the steps that led up to it and documents what follows, up through publication.

Implementation in my lab. I am currently in the process of establishing a new lab at Ohio Wesleyan University, where I teach and conduct research. I have 4 primary lab students and intend to recruit additional research assistants over the course of the next year. In my experience working with undergraduate students, though many are quite capable of conducting high-quality research, details or small questions often get lost in the process. The effect is that those working on a project are left filling in the gaps, or reconstructing decisions made several weeks, months, or semesters earlier, after the original student has graduated. One hope for this log is that by implementing it for myself, I can teach others how to do the same. In so doing, my students will be able to provide a written record of their thought process, what they worked on, and where they encountered issues. It will also serve as an early introduction to scientific writing for those students.

Current projects

Below I am listing my current research projects, where they are, and their next steps. These projects range from those with data collected to those in the ideation stage. I am listing these projects here so that I can log where I am on each at the start of this log.

  • Data Quality In-Person v. Online: In this project we are comparing the quality of data collected in a lab versus online. An earlier version of this project finished collecting data in the spring of 2025. The short version of our theory is that when participants are under direct supervision, they will provide higher quality data than when less directly observed. We are currently collecting data at UW-Madison and will wrap data collection at the end of the spring semester. In the most recent version of the project, we have participants in a lab under direct supervision, less direct supervision, or online.

  • Bike Path Norms: In this project, we are testing how social norms messages and a newly identified type of message (behavioral requests) affect a target behavior (whether bikers on public bike paths audibly signal to pedestrians before passing them). Two previous studies on this topic were published as part of my dissertation and can be found on my website. Briefly, our first field study found that all messages were equally effective at changing behavior relative to a no-message control condition. In a follow-up we attempted to measure norm perception prior to observing behavior. The intention of the second field study was to assess the mediating effect of norm perception on the relationship between message condition and behavior. However, we did not find the same direct or total effects as in the first field study. We theorized that the difference in effects was because we observed bikers’ behavior farther from the messages in the second field study than in the first. In a recent third study, we found that all messages had a diminishing effect on behavior. That is, all messages outperformed the control condition close to the signs, but farther from the signs this effect diminished. Of particular interest, the behavioral requests were most effective in the first few seconds after seeing the sign, but had a faster drop-off in effectiveness compared to the descriptive and injunctive norms message conditions. We have completed those analyses, though I may conduct one additional test where we calculate the difference from the control condition (the result would be to display the control condition as a flat line in our graph – currently it fluctuates down and then up, likely in accordance with the necessity to signal along the bike path). A fourth field study will be conducted in the spring of 2026 in an attempt to replicate these findings.
Long term, I would like to explore the dimension of time as it relates to the effect of social norms messages (and social influence more broadly).

  • Social Norms Messaging Meta-Analysis: One of the challenges of studying the effects of social norms messages as a means to better understand how social norms affect behavior is that different authors use different (and overlapping) labels for the same messages. For example, in one study an author might call a statement reading “please turn off the lights” a prescriptive injunctive norms message, while another calls it a control condition. The effect of these overlapping labels is that when one attempts to meta-analyze the social norms literature, direct comparisons between studies are meaningless. That is, unless we go back and re-categorize these messages using a standardized process, we cannot make meaningful comparisons between studies. This meta-analysis attempts to do just that. While an early draft of this process was included in my dissertation (see my website), I wanted to conduct a more systematic version of the same procedure. Starting in March of 2025, a team of research assistants and I began the process of finding field studies that compared the effects of social norms messages. We are only including papers that test the effects of messages on real-world, directly measurable behaviors. Since that time, I have been delayed pulling together the results (a dissertation, a cross-country move, and 3 new course preps will do that). At present, I am working with my RAs to confirm that we have all the papers we intended to include. A second team of RAs is collecting Ns and means from each condition in each paper. I am planning to post the formal preregistration later this week. I have spent some time confirming how to best meta-analyze results that include between-subjects designs with continuous outcomes, between-subjects designs with categorical outcomes, and pretest-posttest designs with each type of outcome.
This being my first formal meta-analysis, I have been working to understand the underlying process behind the calculations that we will conduct. Once these decisions have been made, I will post the pre-analysis plan and (only then) will I begin analyzing the results. One note that I would add here is that I am still debating whether to include demographic data in our meta-analysis. We have started collecting this information, and though some studies provide it, many do not. The lack of demographic information is unsurprising because we are analyzing field studies. Often these studies do not or cannot collect such information. Thus, it may not be possible to report or control for those factors when conducting the analysis.

  • Perceptions of Social Norms: Several studies have already been reported on how social norms messages affect the perception of social norms. These results can be found in my published dissertation. The general conclusion has been that all types of messages we tested affected the perception of norms relative to a no message control condition (particularly for messages that were uncommon in the control condition). This project is currently paused while we decide on the next study, but (following on advice from a colleague) we plan to test other types of messages that go beyond our original design.

  • Environmental Racism Project: One of my goals in studying social norms is to determine interventions that might mitigate environmental harms. Of particular concern is environmental racism. In a series of survey studies, I have determined that many people fail to understand how issues of systemic racial injustice relate to environmental concerns. I have designed a podcast-type discussion that will serve as our informational intervention. While a draft script was written by a research assistant, it still requires revision. The current plan is to formally record this podcast in the winter of 2026 for a spring launch of the survey.

  • Data Quality with International Samples: I would like to attempt to replicate a previous study on data quality collected on a series of online survey panels but this time compare these samples with different international audiences. I have not yet determined how to translate the surveys, nor which countries we will recruit from. The survey will be designed with the assistance of a research assistant in the spring of 2026.

  • Tangibility and Climate: this is just a project I am vaguely thinking about. I wonder if people who can more clearly imagine personal effects of climate change are more likely to take actions to address the issue/support policies that address the issue.

  • Hostile Attribution Bias Lit. Review: I am currently working on a theory paper about the hostile attribution bias. The basic idea is that the bias has exceptional value for the study of social psychology; however, an over-emphasis on its effects on aggression has meant that less attention has been given to the range of fight-flight-freeze responses, alternative biases like the benign attribution bias, and how this bias may influence the formation of social norms. This is an ongoing writing project and should be shared with a collaborator over the next few weeks.

  • Social Norms in Crowds: I have pre-registered and am in the process of designing a study to measure crowd movement in theaters. The main goal of this project is to determine how the actions of one person influence those of another, and the speed at which they do so. By placing infrared cameras in theaters, we will record standing ovations as they occur. We will use a combination of logistic regression and social network analysis to model the rate at which social influence occurs. A pilot recording will be captured in the spring of 2026. Simulated data will also be generated over the same timespan to determine the optimal analysis plan. The long-term goal is to expand the recording procedure to other universities.

  • Water Protection Analysis: A colleague and I have discussed the possibility of analyzing the results of a messaging campaign to encourage individuals to donate to a clean-water non-profit organization. The exact messages we will compare are as yet unknown to me, and thus the project will likely not have more details until I can meet with another representative of the organization.