Artificial Intelligence and Food Safety
The year 2023 has been to artificial intelligence (AI) what 1993 was to the internet: the year it became available to the masses. While the public debate on the impact of AI on society has just started, one of its most fascinating aspects is its potential to generate elaborate predictions based on an analysis of immense volumes of data.
For the past few years, researchers and regulators have been trying to apply this ability to food safety. FDA has made data analytics a part of its New Era of Smarter Food Safety Blueprint, an initiative the agency launched in 2020 that seeks to reduce the number of foodborne illnesses by leveraging technology to create a safer, more digital, and more easily traceable food system.
DATA SHARING IN THE FIELD
Food safety organizations have also joined the AI movement. One online platform developed by the Western Growers Association, a trade organization comprising more than 2,200 farmers, allows members to share food safety data. This network, called GreenLink, started in 2021 in partnership with Creme Global, an Ireland-based data analytics company, and six participating members; it has since grown to 140 growers and 6 million data points. “Our goal is to capture and analyze field food safety data so that each operation can view it individually and compare it with the aggregated data of other operations,” says De Ann Davis, PhD, senior vice president of science for the Western Growers Association.
The GreenLink platform plans to use both descriptive and predictive models for analysis. “For example, if a water test comes back high in E. coli, we would like to be able to use descriptive analytics to explain what’s likely causing that, and predictive analytics to understand [whether] that value is expected to be high in that period of the year,” says Dr. Davis. Predictive analytics, however, has not yet been implemented: GreenLink’s datasets are not consistent enough to start making predictions. “That doesn’t mean that in six months we won’t be able to do that, though,” she adds.
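To make the distinction concrete, the seasonal side of that analysis might look something like the minimal sketch below; it is not GreenLink’s implementation, and the readings, months, and thresholds are all invented:

```python
# A toy version of the seasonal check, with invented readings: "predictive"
# here means asking whether a value is unusual for the time of year, based
# on historical monthly statistics.
from statistics import mean, stdev

# Hypothetical E. coli water-test history (CFU/100 mL) keyed by month.
history = {
    6: [12, 18, 25, 30, 22],   # June readings from past seasons
    7: [40, 55, 62, 48, 70],   # July tends to run higher
}

def expected_range(month):
    """The band a reading is expected to fall in for that month."""
    readings = history[month]
    mu, sigma = mean(readings), stdev(readings)
    return mu - 2 * sigma, mu + 2 * sigma

def is_unusual(value, month):
    low, high = expected_range(month)
    return not (low <= value <= high)

# A reading of 60 is seasonally normal in July, but in June it would be a
# flag worth a descriptive, root-cause look.
print(is_unusual(60, month=7))  # False
print(is_unusual(60, month=6))  # True
```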
This inconsistency stems from the freedom the project gives participants to decide which data to share: field location, water or pathogen testing results, or bird activity, for example. That flexibility is meant to encourage members to share information that is normally treated as confidential.
The challenge of collecting non-public data is an aspect of AI in which the human factor is very much present. When sensitive company data is essential for developing AI tools, sharing it is not a spontaneous act done for the sake of the algorithm; rather, it is a business decision that weighs risk against reward.
Dr. Davis says this is a chicken-and-egg problem: “People want to know what you’re going to deliver before they go all the way in with the data, but you can’t deliver anything if they don’t provide data first. So, it’s also a matter of balancing the value they’re getting out with the amount of data they’re putting in.”
WHY THE PRODUCE INDUSTRY IS RIPE FOR AI
Indeed, growers may be receptive to the idea of sharing data. Matt Stasiewicz, PhD, an associate professor of applied food safety at the University of Illinois Urbana-Champaign, says, “While the produce industry is well controlled, we’re still seeing outbreaks. Yet, no single company is going to observe enough contamination events to understand truly what’s driving that risk. People are starting to realize that sharing data across companies may be the way to find answers to those questions.”
Dr. Stasiewicz is one of his university’s site leads for the AI Institute for Food Systems (AIFS), a consortium formed by six universities and USDA. One of the group’s aims is to create an AI-powered database based on information gathered from public research projects, with a specific focus on microbiological testing data from growing fields: “Just knowing that a test was positive or negative is not really predictive,” says Dr. Stasiewicz. “It’s much more useful to find out what else about that sample could help predict the result, such as how the sample was taken, its size, the assay method, or the size of the field. That can be combined with publicly available data such as weather patterns, the presence of migratory birds, or a specific wind pattern that may be blowing dust in from somewhere else.”
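As a rough illustration of how such an enriched record could become predictive, here is a minimal sketch using a simple classifier; this is not the AIFS pipeline, and every field, value, and label is invented:

```python
# A sketch of the enriched testing record Dr. Stasiewicz describes: not just
# positive/negative, but sampling metadata plus public environmental data.
from sklearn.linear_model import LogisticRegression

# Each row: [sample volume (mL), field size (acres), rain in last 72 h (mm),
#            migratory birds observed nearby (0/1)] -- all values invented.
X = [
    [100, 40,  0, 0],
    [250, 40, 35, 1],
    [100, 12,  5, 0],
    [250, 80, 50, 1],
    [100, 80,  0, 0],
    [250, 12, 40, 1],
]
y = [0, 1, 0, 1, 0, 1]  # test result: 1 = pathogen detected

model = LogisticRegression().fit(X, y)

# Probability that a new sample, taken after heavy rain near bird activity,
# comes back positive.
print(model.predict_proba([[250, 40, 45, 1]])[0][1])
```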
FEDERATED LEARNING
Getting growers and researchers to share data can be a challenge, one Dr. Stasiewicz knows well. “Nobody is going to share with me, as an academic, a bunch of data,” he says. “Even if it’s not clear what the risk is, if you can’t define a benefit, it’s not worth doing it. If we want to show a path to share food safety information in a non-competitive and non-risky way, we need to find a way to provide more value than the standard root cause analysis.”
One way to lower the perceived risk of sharing data is to remove personally identifying information: “We don’t necessarily need a firm name, a facility location, and a sample date. What we need is the relationships: knowing, for example, that two samples came from the same facility,” says Dr. Stasiewicz.
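In practice, that kind of de-identification can be done with a keyed hash: the same facility always maps to the same opaque token, but the token cannot be reversed without a secret key. The sketch below is illustrative only, not a method used by AIFS or any grower network; the salt and record layout are assumptions:

```python
# Replace firm names and locations with opaque keys so analysts can still
# tell that two samples came from the same facility.
import hmac, hashlib

SALT = b"rotate-me-per-dataset"  # kept by the data owner, never shared

def pseudonym(value: str) -> str:
    """Deterministic keyed hash: same input gives the same token, but the
    token is not reversible without the salt."""
    return hmac.new(SALT, value.encode(), hashlib.sha256).hexdigest()[:12]

records = [  # hypothetical raw records
    {"facility": "Acme Packing, Salinas CA", "result": "negative"},
    {"facility": "Acme Packing, Salinas CA", "result": "positive"},
]
shared = [{"facility_id": pseudonym(r["facility"]), "result": r["result"]}
          for r in records]

# Both rows carry the same facility_id, preserving the relationship
# without revealing the name or location.
print(shared[0]["facility_id"] == shared[1]["facility_id"])  # True
```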
Another method would be not to require data sharing in the first place. This approach is called federated learning. Bas van der Velden, PhD, head of data science at Wageningen Food Safety Research (WFSR), a research organization based in Utrecht, Netherlands, says, “In the traditional model, you collect data in a centralized place and use it to train the algorithm. In federated learning, it’s the algorithm that goes to the data stations—which can be a computer, a smartphone, or a server—but, instead of coming back with the data, it just takes the optimized model back. The data never leaves its original location.”
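A toy version of that loop, in the style of federated averaging, might look like the sketch below; the three data stations, their data, and the linear model are invented, and this is not WFSR’s implementation:

```python
# Only model parameters ever leave each "data station"; the raw data stays put.
import numpy as np

def local_update(weights, X, y, lr=0.01, steps=50):
    """Run a few gradient steps on one station's private data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w  # only the updated weights travel back

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
stations = []
for _ in range(3):  # three stations, each holding private data
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    stations.append((X, y))

global_w = np.zeros(2)
for _ in range(10):  # each round: send the model out, average what returns
    updates = [local_update(global_w, X, y) for X, y in stations]
    global_w = np.mean(updates, axis=0)

print(global_w)  # approaches [2.0, -1.0] without pooling any raw data
```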
Through this model, WFSR and partner companies and research institutions are contributing to an EU-funded project called Extreme Food Risk Analytics (EFRA). The project’s goal is to develop AI-powered food risk prevention tools using what it calls “extreme data mining.”
Dr. van der Velden explains that the next phase of the project will be to take this model into a real food production environment, working with a large European poultry producer. “We plan to apply the federated learning approach to train the AI tool with all sorts of internal and external data. A possible use case could be an early warning system that tells you there’s a pattern indicative of microbiological hazard in the short or long term,” he says.
Another crucial element WFSR is working on, and one that machine learning normally lacks, is a concept called “explainability,” says Dr. van der Velden. “If you simply tell a farmer not to harvest or not to irrigate today because the algorithm says so, you likely won’t have a successful adoption. Explainable AI tells why a certain action matters in a language that is tailored to each user, whether it’s policymakers, farmers, researchers, or average citizens,” he adds.
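As a rough sketch of how such tailored explanations could be generated, assume a model’s per-feature risk contributions are available (from SHAP values or linear coefficients, say); the factors, numbers, and wording below are invented:

```python
# Turn a bare recommendation into an audience-specific explanation.
contributions = {  # feature -> share of today's "do not irrigate" risk score
    "heavy rain in the last 48 hours": 0.45,
    "upstream water test trending up": 0.35,
    "warm weather forecast": 0.20,
}

TEMPLATES = {
    "farmer": "Hold off on irrigating today: {top} is raising contamination risk.",
    "policymaker": "Model advisory: irrigation paused; dominant driver is {top}.",
}

def explain(audience: str) -> str:
    top = max(contributions, key=contributions.get)
    return TEMPLATES[audience].format(top=top)

print(explain("farmer"))
print(explain("policymaker"))
```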
CONNECTING INFORMATION
One type of AI that makes massive use of public information is a model developed by Agroknow, a data and analytics company based in Athens, Greece. The company uses AI technology to collect public food safety data, such as product recalls, border rejections, or facility inspections, and combine it with the internal information of food companies. “Part of our work is to discover announcements hidden in the websites of public authorities around the world and translate them into English,” says Nikos Manouselis, CEO of Agroknow. “When the municipality of Athens inspects a food facility in the region and discovers an issue, they announce it in Greek on their website. Similarly, the FDA publishes its most important announcements in one or two pages, but there are also other pages that nobody looks at.”
Once all of this public data is mined, Agroknow uses AI to connect pieces of information that, though seemingly unrelated, likely refer to the same event: “There may be a news article about five people who got sick from Salmonella after consuming a chicken product in Crete, and a public announcement about a recall of the same product, in the same area and on the same days, where the serotype is specified. The algorithm would match them and provide a complete description of the event, assigning a reliability score,” says Manouselis.
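A stripped-down version of that matching step might score candidate pairs on text similarity and date proximity, as in the sketch below; the weights, fields, and example records are invented and not Agroknow’s algorithm:

```python
# Score how likely a news report and an official recall describe the same
# incident, blending string similarity with date proximity.
from difflib import SequenceMatcher
from datetime import date

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(news, recall, max_days=14):
    product = similarity(news["product"], recall["product"])
    place = similarity(news["location"], recall["location"])
    gap = abs((news["date"] - recall["date"]).days)
    recency = max(0.0, 1.0 - gap / max_days)
    # The weighted blend acts as a reliability score for the linked event.
    return 0.4 * product + 0.3 * place + 0.3 * recency

news = {"product": "chicken souvlaki", "location": "Crete",
        "date": date(2023, 7, 14)}
recall = {"product": "Chicken Souvlaki 500g", "location": "Crete, Greece",
          "date": date(2023, 7, 12)}

print(round(match_score(news, recall), 2))  # a high score suggests one event
```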
When all this data is analyzed and harmonized with the use of AI, it gives food companies an accurate idea of the current risks in the supply chain. When their internal data, such as results of inspections, audits, and lab tests, is added, the picture is complete.
Manouselis says that this information can be used to assess the risk related to ingredients or suppliers almost in real time. “If there’s a spike in contaminations of ethylene oxide in sesame seeds and it’s one of my ingredients, I will know I have to test more. If one of my suppliers or other suppliers in the same area were involved in food safety or food fraud incidents, I will source from a different region.”
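In its simplest form, that testing rule could compare recent incident counts against a baseline; the ingredients, counts, and threshold below are invented for illustration:

```python
# Flag ingredients whose recent incident counts spike above baseline.
baseline = {"sesame seeds": 2, "paprika": 1}  # hypothetical avg incidents/month
recent = {"sesame seeds": 9, "paprika": 1}    # hypothetical last-month counts

my_ingredients = {"sesame seeds", "black pepper"}

for ingredient in my_ingredients & set(recent):
    if recent[ingredient] >= 3 * baseline[ingredient]:
        print(f"Spike for {ingredient}: increase sampling frequency.")
```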
The most interesting and impactful use of this model, however, is to anticipate trends to better allocate testing and auditing resources, which is especially important for large food companies with extensive supply chains. “When we were in the middle of the ethylene oxide crisis, everyone was testing much more. At some point, our forecasting models showed that the risk was decreasing. For our clients, that was a signal that they could start testing less for ethylene oxide treatment and redirect resources to other areas.”
Right now, the accuracy score of Agroknow’s typical forecasting model ranges between 80% and 95%. But for Manouselis, even a lower level could be useful: “We’re not going to keep it locked up until it reaches 100%. We prefer to put it in the hands of our clients and let them decide if it is useful or not; very often they tell us that even 40% would be enough for them to make better decisions.”
Manouselis cautions that an important part of making AI tools useful and accessible is to demystify them: “AI is not black magic; it’s a scientific model,” he says. “You train it with data, it gives back results; you validate these results and improve the model with more data. It’s a constant cycle.”
By Andrea Tolu