CHARLOTTESVILLE, Va. (CBS19 NEWS) -- A new tool could make it easier for doctors to diagnose cancer by detecting cancerous cells.

Researchers at the University of Virginia Health System have developed a new tool that can speed up the understanding of the genetic causes of cancer and other diseases.

According to a release, this new tool is a mathematical model that will help ensure the integrity of “big data” about chromatin, the genetic material that forms the building blocks of chromosomes.

Chromatin is a combination of DNA and protein that has an important role in directing what genes do, but when it goes wrong, it can turn a healthy cell into a cancerous one or contribute to other diseases.

Scientists can already study chromatin within individual cells using a technology called single-cell ATAC-seq, but this new tool will be able to cut through the enormous amount of data that the technique generates.

That means it can keep scientists from following false leads and wasting effort by eliminating some of the noise and bias in the data.

“Using the traditional way of analyzing the data, you might see some patterns that look like real signals of a particular chromatin state, but they are actually fake due to the bias of the experimental technology itself. Such fake signals can confuse scientists,” said Chongzhi Zang, PhD, a computational biologist with UVA’s Center for Public Health Genomics and UVA Health Cancer Center. “We developed a model to better capture and filter out such fake signals, so that the real needle we are looking for can more easily stand out of the hay.”

The release says the new tool adapts a model from number theory and cryptology called “simplex encoding,” which is used to code DNA sequences into mathematical forms.

It converts complex genome sequences into a simpler mathematical form that can then be compared to different forms to find bias and noise in the data that cannot be found easily through conventional methods.
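To make the idea concrete, here is a minimal sketch of how a DNA sequence can be turned into numbers. It is not the UVA team's software. It assumes a commonly described form of simplex encoding in which each of the four DNA letters is placed at one corner of a regular tetrahedron; the coordinates and the function names `simplex_encode` and `squared_distance` are illustrative assumptions, not details from the release.

```python
from itertools import combinations

# Assumed mapping (illustrative): each base sits at one corner of a
# regular tetrahedron, so no base is numerically "closer" to another.
SIMPLEX = {
    "A": (1, -1, -1),
    "C": (-1, 1, -1),
    "G": (-1, -1, 1),
    "T": (1, 1, 1),
}


def simplex_encode(seq):
    """Encode a DNA string as a flat list with three numbers per base."""
    vec = []
    for base in seq.upper():
        vec.extend(SIMPLEX[base])
    return vec


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


if __name__ == "__main__":
    print(simplex_encode("ACGT"))
    # Every pair of distinct bases is the same distance apart.
    for a, b in combinations("ACGT", 2):
        print(a, b, squared_distance(SIMPLEX[a], SIMPLEX[b]))
```

Because every pair of bases ends up equally far apart under this mapping, the distance between two encoded sequences of the same length depends only on how many positions differ. That is one reason an encoding of this kind is convenient for comparing sequences mathematically.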

“The DNA sequences’ complexity increases exponentially when they get longer. They are difficult to model because a typical dataset has millions of sequences from thousands of cells,” said Shengen Shawn Hu, PhD, a research scientist in Zang’s lab and the lead author of this work. “But the simplex encoding model can give an accurate estimation of sequence biases because of its beautiful mathematical property.”

Testing has shown the new tool is significantly better at analyzing single-cell data to characterize different cell types. That is important for biological research and disease diagnosis because doctors can use it to find tiny cell samples within larger specimens.

“The biases were not easy to find because they were tangled with real signals and hidden in the big data. It might not be a big deal if people are only going to pick the strongest signals from a large number of cells,” said Zang. “But when you look at single-cell data, there are no low-hanging fruits anymore. The signals are always weak on the individual cell level, and the effect of noise and biases can be catastrophic. Bias correction is often ignored but can be vital in single-cell data analysis.”

The new tool is available as free, open-source software.

The findings have been published in the scientific journal Nature Communications, where the paper is free to read.