
Creating a Community Organisation Roster using Microsoft 365 Copilot


(Also published on LinkedIn)


This worked better than I expected.


Recently I was at a gathering of people from a community organisation. The person who had been working out a roster of volunteers is taking a break (after 8 years!). I don’t have capacity to take on manually building the roster. However, I was inspired by demos on a recent PMI seminar (Global Study on the AI-Powered Project Management) and energised from the recent Global Power Platform Bootcamp Sydney. So I said I’d see if I could make it easier for the next volunteer.


I must admit I had low expectations.


Here is my story (scroll to the bottom to fast-forward to reflections):


The SKU:


Microsoft 365 Copilot on Business Premium


The scenario:


Run the volunteers on the current roster through Copilot to see what it would do.


The spreadsheet:

4 tabs:



Roster:

A list of dates in the row headings and a list of positions in the column headings.  The table was blank:


People:

A list of people and the roles they can fill:


People Rules:

Natural language sentences of preferences:


Skill Rules:

Natural language sentences about priorities and skills:


The prompt, complete with poor grammar:

You are a master scheduler

The attached excel file contains 4 sheets:

* the Roster sheet contains a table of dates and the skills needed on each table

* the People sheet contains a list of people and the skills that each person can do

* the People Rules sheet contains a list of rules (constraints) regarding scheduling of people

* the Skill Rules sheets contains a list of rules (constraints) regarding which skills a person can fill on a given day and also some days where a particular skill isn't needed

Your tasks is to populate the Roster sheet table with the people to undertake each skill on each date, subject to the skills each person has and the constraints advised on the People Rules and Skill Rules sheets.

Each person should have a minimal number of dates where they are on the roster.

Provide 3 different possible rosters and give an evaluation of each roster based on the constraints and the amount of dates that each person is rostered.


The result:

About 7 minutes of processing…


The reasoning engine repeatedly referred to resolving “the header issue”, which is my habit of leaving blank column(s) to the left and above my spreadsheet working tables to allow room for margin notes.
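For what it's worth, the "header issue" is straightforward to avoid programmatically: strip fully empty margin rows and columns before treating the first remaining row as headers. A minimal pure-Python sketch (the grid values here are illustrative, not my actual roster data):

```python
# Illustrative grid with blank margin rows/columns (None), mimicking my
# habit of leaving space for margin notes that caused the "header issue".
grid = [
    [None, None, None, None],
    [None, "Date", "Setup", "AV"],
    [None, "2025-03-02", "Alice", "Bob"],
    [None, "2025-03-09", "Alice", "Dan"],
]

def trim_margins(grid):
    """Drop rows and columns that are entirely empty, then split
    the first remaining row off as the header."""
    rows = [r for r in grid if any(c is not None for c in r)]
    keep = [i for i in range(len(rows[0]))
            if any(r[i] is not None for r in rows)]
    rows = [[r[i] for i in keep] for r in rows]
    return rows[0], rows[1:]   # header row, data rows

header, data = trim_margins(grid)
```

The agent worked this out on its own; a fixed script would need a pre-processing step like this to be robust to the same layout habits.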








Then “yeah, wow.”  It actually worked.



And lots of additional details, headings reproduced below:

  • “What I ingested from your workbook”

  • “Deliverable (download)”, including descriptors of the sheets in each book:

      ◦ Excel — Roster Build – Results.xlsx (3 options + summaries + checks). Contents:

      ◦ Roster Option 1/2/3 — fully populated tables (Date,…).

      ◦ Summary 1/2/3 — count of dates per person.

      ◦ Violations 1/2/3 — any hard rule breaks detected by an automated validator.

      ◦ Soft Scores 1/2/3 — how often non-mandatory preferences are satisfied (e.g., together).

  • “Hard constraints I enforced”

  • “Quick comparison of the three options”

      ◦ Fairness target: keep people’s rostered dates as low as possible while fully covering each date.

      ◦ Coverage & violations

      ◦ Fairness (dates per person)

          ▪ Bottleneck roles

      ◦ How the soft preferences were handled

      ◦ A few detailed checks

  • “Recommendation”

  • “What’s inside each option (at a glance)”

  • “Want me to push one back into your original “Roster” sheet?”


The Verdict:

“Yeah, wow.” Indeed.


There were 18 dates, 8 unique positions (but some with multiple people per day), and 31 people.


It gave three pretty good roster suggestions (good even without all the additional commentary and analysis), and they received a thumbs up on a smoke-test check from other volunteers.


Reflections:


Processing:

The agent broke the task into multiple steps, each generally involving creating and running a Python script, evaluating the results, and then proceeding to the next step.  At least two scripts returned errors, so the agent needed to retry.


I didn’t review the steps in detail and I don’t understand Python, but the broad approach seemed to be to parse the spreadsheet, then create and execute a series of Python scripts to generate and evaluate optimised rosters, and lastly generate output spreadsheets with final checks.  It’s one of many approaches a person could follow, but it would take a person at least a day instead of 7 minutes.
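I can't reproduce the agent's generated scripts here, but the core assignment step (fill each date-and-skill slot with an eligible person while keeping everyone's rostered dates low) can be sketched as a simple greedy heuristic in plain Python. The names, skills and dates below are illustrative, and a real solver would also need to score the natural-language preference rules:

```python
from collections import defaultdict

# Illustrative inputs, echoing the People and Roster sheets
# (not the real volunteer list).
skills_by_person = {
    "Alice": {"Setup", "Welcome"},
    "Bob": {"Welcome", "AV"},
    "Carol": {"Setup", "AV"},
    "Dan": {"AV"},
}
slots = [  # (date, skill) pairs, as in the blank Roster table
    ("2025-03-02", "Setup"),
    ("2025-03-02", "AV"),
    ("2025-03-09", "Welcome"),
    ("2025-03-09", "AV"),
]

def build_roster(slots, skills_by_person):
    """Greedy pass: for each slot, pick an eligible person with the
    fewest assignments so far, and at most one slot per person per day."""
    load = defaultdict(int)          # person -> slots assigned so far
    dates_worked = defaultdict(set)  # person -> dates already assigned
    roster = {}
    for date, skill in slots:
        eligible = [p for p, s in skills_by_person.items()
                    if skill in s and date not in dates_worked[p]]
        if not eligible:
            roster[(date, skill)] = None   # unfillable slot, flag for review
            continue
        pick = min(eligible, key=lambda p: load[p])
        roster[(date, skill)] = pick
        load[pick] += 1
        dates_worked[pick].add(date)
    return roster

roster = build_roster(slots, skills_by_person)
```

Even this naive pass shows why "cover every slot" and "minimise dates per person" pull against each other, which is presumably why the agent produced a separate Violations check for each of its three options.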


Based on my reading of the steps, it’s not an approach that can be directly copied and easily transferred into a standalone computer program.  I’m sure the agent will be able to reuse some of the approach when repeating the same exercise, but it can’t be identical, due to changes in constraints and dates.


Accolades:

I’ve done some pilot work with linear optimisation models in different contexts.  These are challenging to configure: working out the parameters and constraints in a way the model can process takes real effort.  This also creates a gap between the administrator doing the configuration and the people using the tool every day, who find it difficult to understand the inputs to the tooling and why it behaves in certain ways.


By contrast, there were some real strengths to the Copilot method:

  • All the preferences and constraints are in natural language, and are very easy for a non-technical person to verify and correct

  • A non-technical person could submit new constraints (although see the caveats below)

  • There is a level of robustness to, for example, bad grammar and typos.  I used an abbreviation of a role in one place and the full name in another, and the agent correctly matched them.  It also listed where I had used a wrong role number and excluded it.

  • It is a huge advantage to be able to express constraints qualitatively, even if the agent is distilling intent and using a quantitative scoring method behind the scenes.  It also made troubleshooting unexpected results much simpler, because I could simply read the constraints to see whether the scenario was covered.

  • Qualitative, natural language constraints also allow inclusion of ad hoc or specific rules that the agent can dynamically process into the outputs, whereas a traditional modelling process involves a heavy upfront time investment that makes it difficult to include transient or ad hoc constraints in the optimisation.

  • Related to qualitative constraints, the agent also applies semantic awareness to relationships that aren’t explicitly written.  For example, the output referred to applying a particular team structure for consistency of volunteers across related roles on some weeks, based on the agent identifying that some of the skill sets generally work together.  Some might say there is a risk of the agent inventing an incorrect relationship, but if the roster works according to the constraints, the worst case is no harm done and the best case is happier volunteers.

  • The ability to mix the prompt with spreadsheet tables creates more opportunities to scale the tooling, for example to generate the input spreadsheet from database queries rather than requiring a person to have the ability to manipulate the data elements into a text prompt.


Caveats:

I don’t think any of these apply to a small-to-medium-scale deployment run intermittently.  To schedule 31 people over 18 days in 7 minutes is pretty good, in my opinion anyway.


However:

  • I assume this process was compute-expensive.  Seven minutes is longer than any other prompt I’ve run, acknowledging that carelessness in the input (empty rows and columns around the tables) increased the cost.  I am not sure about the cost/capacity feasibility if I were to run a similar query daily.

  • Compute normally increases exponentially with complexity and, while the same is true of traditional optimisation models, the flexibility and accessibility inherent in submissions to an agent mean that problems can get very complex very quickly, for example if the scenario involved rostering across two locations with some level of overlap between them.  This can be mitigated by breaking problems down into groups with the most commonality, as in traditional optimisation, and there is also the option to give the agent specific guidance where it isn’t natively segmenting the problem space.

  • While it is friendly to non-technical users, there is still a skill set to building these types of input spreadsheets, especially in defining the constraints.  The agent was surprisingly tolerant of errors, but one erroneous duplicate was ignored instead of reported, so one person didn’t get scheduled.  Similarly, precision in constraint wording and completeness of the list are important to the results.  As soon as I opened the results sheet, I identified at least three constraints I had missed.

  • There were some patterns in the skill-people pairings that were apparent in the generated rosters and made intuitive sense in hindsight.  These may have helped the agent perform the task compared to a more random skill distribution, so it may be harder to repeat the result once I include volunteer availability.  However, I wouldn’t necessarily have picked up the patterns if I was doing the roster manually; patterns in people’s skills are more reflective of reality anyway; they are patterns rather than consistent rules, so the agent still handled the exceptions; and 7 minutes is still faster than manually sorting it out.


Summary:


I am impressed and excited about the possibilities.  If the results can be replicated on a new roster, it will turn the role of roster builder from luckless hard worker to speedy scheduling wizard.


However, it’s not a coding agent in the sense of a tool that is built once and then run many times.  It may be that a tool could be derived or built from the actions taken, but that’s not something I’m able to evaluate. 


Similarly, the level and cost of compute required to run the model may erode some of the benefits.  This isn’t a relevant concern for a quarterly volunteer roster but may impact weekly or daily re-evaluation of rosters and schedules or as complexity grows.


Finally, there is also a level of skill to building the inputs and defining the constraints, even though the inputs can then be very accessible for review by someone else.  In theory, prompting can be used to help generate the constraints, but this approach will still yield rapidly diminishing returns without the right optimisation skill set.  For medium and large organisations needing to cater to diverse constraint groups or running complex schedules, this can be an obstacle to scalability unless a technique or model can be derived to reliably create constraint lists.


Based on this small test, I expect this style of schedule optimisation to rapidly become a feature of project management and field service software, going well beyond my limited prompt above.  Subject to cost, its flexibility and ease of implementation may be enough to displace traditional optimisation engines altogether.  As far as I’m aware, optimisation with qualitative, ad hoc and transient constraints hasn’t been easily achievable by either person or computer until now.  I look forward to seeing how the use cases evolve.





    © 2025 by CSM Business and Mobility.  ABN 56 816 884 882.  Privacy Policy.
