Photo by Avel Chuklanov

The need for speed

People tend to be impatient. If a delay is introduced into any process, whether it be driving on the road or being placed on hold, a person’s satisfaction goes down. The same holds true for the internet.

According to research by Google, 53% of mobile site visits are abandoned if pages take longer than 3 seconds to load, and a 2-second delay in load time results in abandonment rates of up to 87%.

On top of customer satisfaction, search engine rankings are heavily impacted by site performance. Getting ranked higher than your competitors equates to more visibility and more revenue. …

TL;DR: Sometimes, writing an intermediate result to disk and then reading it back again is faster than checkpointing or unoptimized caching!

At Spokeo, we use Spark on Amazon EMR to build data from our data lake into documents that can be consumed by our backend services and eventually shown in the frontend. Therefore, many of our ETL jobs are concerned with producing one result at the end of a script. Getting to the end may involve many heavy calculations, e.g. windowing on id, before joining back to the main dataframe. …

We (like everyone else) want to ship easy-to-read code in a way that’s fast, maintainable, and free of errors. As our operations grow, so do the amount of infrastructure we manage and the size of the teams involved. At Spokeo, we have gone from clicking through the AWS GUI, to shell scripts that make aws-cli calls to create instances, to Terraform and infrastructure as code.

As operations teams grow, ground rules on code style and review can become tribal knowledge that is hard to communicate or standardize. The familiar arguments over tabs vs spaces and heredoc vs codified come into play…

Anyone who opens any software at all is familiar with versioning. It’s a concept people understand: newer versions of software have higher numbers. Think about all those iOS updates required for your iPhone. It might seem like a simple thing to come up with: each update gets a higher number, and the highest number marks the most current version. For software development teams, it is not as simple as it seems.

The Semantic Versioning Specification (SemVer) is a widely adopted convention for assigning and tracking software version numbers in a way that tooling can automate. This framework for specifying changes uses a three-part number…
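To make the three-part number concrete, here is a minimal Python sketch of parsing, comparing, and bumping versions written as MAJOR.MINOR.PATCH. The `Version` class and its `parse` and `bump` helpers are hypothetical names chosen for illustration, not part of any library, and the sketch assumes plain numeric versions only: pre-release tags and build metadata from the full specification are not handled.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A three-part version number: MAJOR.MINOR.PATCH (illustrative only)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Turn a string such as '1.4.2' into a Version."""
        parts = text.split(".")
        # Assumption: exactly three ASCII-digit parts, no pre-release suffix.
        if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
        return cls(*(int(p) for p in parts))

    def bump(self, part: str) -> "Version":
        """Return the next version; lower-order parts reset to zero."""
        if part == "major":
            return Version(self.major + 1, 0, 0)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown part: {part!r}")


# Tuples compare element by element, so ordering is numeric, not alphabetical.
newer = Version.parse("1.10.0")
older = Version.parse("1.9.0")
is_newer = newer > older                 # numeric comparison of the parts
naive_string_check = "1.10.0" > "1.9.0"  # plain string comparison disagrees
```

Comparing the parsed tuples rather than the raw strings is the design choice that matters here: as text, "1.10.0" sorts before "1.9.0", which is one reason "the highest number wins" is less simple than it sounds.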

Spokeo Engineering

Spokeo is a people search engine! We’ve organized over 12 billion records from thousands of sources into easy-to-understand profiles.
