Shawna Applequist

Sisyphus and the Data Boulder

Sisyphus, once a king of cunning and guile, now lives in Hades with an eternal burden. In the midst of the bleakness that is Hades stands a hill, steep and unforgiving, its summit shrouded in the mists of impossibility. And at the base, Sisyphus stares at his eternal burden: a colossal boulder, massive and unyielding.


Every muscle strains, every sinew tightens as he begins his journey upwards. The boulder, a symbol of his eternal struggle, rolls grudgingly forward, inch by painstaking inch, defying the efforts of the man doomed to push it.


The ascent is a torture designed not just to punish the body, but to break the spirit. Sweat beads on his brow, muscles scream in agony, and Sisyphus has no choice but to push on. With every step, the summit seems to recede, a horizon that forever eludes his grasp.


Just when victory seems within reach, the boulder, as if possessed by the spite of Olympus itself, slips, teetering on the edge before it succumbs to gravity. Down it rolls, thundering back to the valley below, leaving Sisyphus to gaze in despair at its retreating form.


With a heart heavier than the boulder he pushes, Sisyphus descends. As he reaches the base, and once more sets his hands against the stone, the cycle begins anew.


If this classic Greek myth were written today, Sisyphus might have been eternally doomed to classify documents. Don't believe me? According to the latest estimates, 328.77 million terabytes of data are created every day. Where does this data go? How is it being stored? How much of it is lost to data silos?


Image: A modern Sisyphus-type character pushing a giant ball up a hill. The ball is made up of computers and papers, disorganized and falling off of the ball as the worker pushes.

These are questions that companies globally struggle with. It is a problem that needs to be solved, so many companies turn to applications like SharePoint to help store and organize their data. But this often creates a new problem: who or what is going to tag all that information so it is findable?


Manually tagging documents would be a Sisyphean fate for many data workers. In a hypothetical scenario, let's assume that manual categorization takes five minutes per item. With a dataset of 10,000 items, it would take 34.72 days of nonstop tagging. This does not take into consideration that the same dataset may be collecting even a modest 100 new items a day, or that humans do not work 34.72 days at a time with no breaks.
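To make the arithmetic in this scenario concrete, here is a minimal Python sketch. The five minutes per item, the 10,000-item backlog, and the 100 new items a day come from the scenario above; the eight-hour workday and the function names are my own assumptions, added only for illustration.

```python
MINUTES_PER_ITEM = 5       # from the scenario above
BACKLOG_ITEMS = 10_000     # from the scenario above
NEW_ITEMS_PER_DAY = 100    # from the scenario above
WORKDAY_HOURS = 8          # assumption: one person tagging 8 hours a day


def nonstop_days(items, minutes_per_item=MINUTES_PER_ITEM):
    """Days needed to tag `items` around the clock, with no breaks."""
    return items * minutes_per_item / 60 / 24


def working_days(items, minutes_per_item=MINUTES_PER_ITEM,
                 workday_hours=WORKDAY_HOURS):
    """Workdays needed to tag `items` at `workday_hours` per day."""
    return items * minutes_per_item / 60 / workday_hours


def daily_intake_hours(new_items_per_day, minutes_per_item=MINUTES_PER_ITEM):
    """Hours per day needed just to keep up with newly arriving items."""
    return new_items_per_day * minutes_per_item / 60


print(round(nonstop_days(BACKLOG_ITEMS), 2))            # 34.72
print(round(working_days(BACKLOG_ITEMS), 2))            # 104.17
print(round(daily_intake_hours(NEW_ITEMS_PER_DAY), 2))  # 8.33
```

Under these assumptions, the 100 new items alone take about 8.33 hours a day to tag. That is more than a single eight-hour workday, so one full-time tagger would never even start on the backlog.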


But what is the solution? How do we get this proverbial data boulder to the top of the hill? Taxonomies. And no, we don't mean stuffing dead animals. Taxonomies are hierarchical sets of terms that, combined with AI, can automatically tag all of your documents. Even by a conservative estimate, the potential time savings could be up to 90%. With time savings like that, knowledge workers could get back to doing what they are good at (working with organized data) while the taxonomy works in the background getting all that data organized. The taxonomy and auto-tagger combination is far faster than a manual tagger, and it doesn't get tired or need a break.
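For readers who want to picture what "a hierarchical set of terms that tags documents" looks like, here is a deliberately tiny Python sketch. Everything in it is hypothetical: the taxonomy terms, the keywords, and the `tag_document` function are invented for illustration, and plain keyword matching stands in for the AI component, which in a real auto-tagger would be far more capable.

```python
# Hypothetical toy taxonomy: parent term -> child term -> trigger keywords.
TAXONOMY = {
    "Finance": {
        "Invoices": ["invoice", "payment due"],
        "Budgets": ["budget", "forecast"],
    },
    "Human Resources": {
        "Recruiting": ["candidate", "interview"],
        "Benefits": ["health plan", "vacation"],
    },
}


def tag_document(text, taxonomy=TAXONOMY):
    """Return sorted 'Parent > Child' tags whose keywords appear in the text.

    Simple substring matching is used here as a stand-in for an AI tagger.
    """
    lowered = text.lower()
    tags = set()
    for parent, children in taxonomy.items():
        for child, keywords in children.items():
            if any(keyword in lowered for keyword in keywords):
                tags.add(f"{parent} > {child}")
    return sorted(tags)


doc = "Please review the Q3 budget before the candidate interview on Friday."
print(tag_document(doc))
# ['Finance > Budgets', 'Human Resources > Recruiting']
```

The point of the hierarchy is that each tag carries its parent with it, so a document tagged "Budgets" is also findable under "Finance" without anyone tagging it twice.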


What sets a good taxonomy apart from a bad one? How can your company ensure it is getting that 90% time savings? Here are four factors to look out for when it comes to time savings:


Complexity and Volume of Data: The larger and more complex a dataset is, the more time it will require to manually categorize and analyze. A well-structured taxonomy can expedite this process by providing the framework for automated categorization.


Specificity and Depth of the Taxonomy: A taxonomy that closely matches the nuances in the content can lead to faster and more accurate categorization, which reduces the need for manual corrections.


Alignment with Project Goals: A taxonomy saves time only if its terms are relevant to the specific objectives of the project it supports.


Efficiency of Tools and Processes: The integration of a taxonomy with data analysis tools and the efficiency of the existing processes for handling and analyzing data will contribute significantly to the extent of time saved.


Some companies choose to create their own taxonomies from scratch, while others choose to purchase a prebuilt taxonomy. Not sure what would be best for your situation? Consider the problem at hand: if your employees are finding themselves pushing that proverbial data boulder up the hill every day, and cannot find the things they need to do the work they need to be doing, why would you add to their load and ask them to create a taxonomy from scratch?


Furthermore, most people are experts at what they do, not experts at creating taxonomies.

By using a prebuilt taxonomy, you can ensure you are getting the highest quality dataset possible in the most time-efficient manner.


Sisyphus may have cheated death, but he ended up with a punishment far worse than the fate he tried so desperately to escape. Greek mythology left him with a cruel fate. Here in the real world, however, we don't have to get stuck pushing an endless data boulder. With a taxonomy and the accompanying AI, no one is cheating the system; they are simply using tools to do the seemingly endless, mindless tasks so that your knowledge workers' time can be spent making a difference in your company and in the world.
