I used to work closely with incredibly smart people who dealt with things like data sharding on a daily basis, and from them I learned a lot about that topic. Later I moved to a different role where that knowledge was not needed, and it faded away over time. Here I’m trying to reclaim that long-forgotten knowledge.
Sharding is the process of assigning an item to a shard: a smaller chunk of data carved out of a large database or other service. The general idea is that we can distribute data or a service across multiple locations to handle larger volumes of data and more requests, and with replication we can scale even further and make the system more resilient. But we need clear rules for how we assign partitions, aka shards, so that we can route requests to the right location.
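As a minimal illustration (my own sketch, not from any particular system), one common assignment rule is a stable hash of the item’s key modulo the shard count, so every router independently computes the same shard for the same key. The shard count and key names here are hypothetical:

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count

def shard_for(key: str) -> int:
    """Map a key to a shard via a stable hash, so every router
    node computes the same shard for the same key."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# The same key always routes to the same shard:
assert shard_for("user-42") == shard_for("user-42")
```

Note that a plain modulo scheme reshuffles most keys when the shard count changes; schemes like consistent hashing exist to limit that churn.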
One of the questions I often ask in my interviews is to design a log processing library:
You need to write a library for processing logs in the following format:
timestamp<TAB>message
The library will be handed over to a different team for further maintenance and improvements, so maintainability and extensibility are the most important requirements.
The library needs to support the following operations out of the box:
filtering
counting
histograms
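One way a candidate might start (this is my illustrative sketch, not a reference answer) is with small composable functions over a stream of parsed entries, which keeps the library easy to extend with new operations:

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[str, str]  # (timestamp, message)

def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Split each 'timestamp<TAB>message' line into an Entry."""
    for line in lines:
        ts, _, msg = line.rstrip("\n").partition("\t")
        yield ts, msg

def filter_entries(entries: Iterable[Entry],
                   pred: Callable[[Entry], bool]) -> Iterator[Entry]:
    return (e for e in entries if pred(e))

def count(entries: Iterable[Entry]) -> int:
    return sum(1 for _ in entries)

def histogram(entries: Iterable[Entry],
              key: Callable[[Entry], str]) -> Counter:
    """Bucket entries by an arbitrary key, e.g. log level or hour."""
    return Counter(key(e) for e in entries)

logs = ["1700000000\tERROR disk full", "1700000001\tINFO started"]
errors = count(filter_entries(parse(logs), lambda e: "ERROR" in e[1]))
```

Because every operation consumes the same iterator of entries, a future team can add new operations without touching the parser.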
The original version also included some language- and background-specific expectations that I never include in my assessment, because I feel they put the candidate into a position where they need to read my mind to meet them.
One of the questions I really love asking during coding interviews is this:
Given a continuous stream of words, a dictionary on disk, and a cost associated with reading from disk, create a stream processor that returns true when a word exists in the dictionary, while minimizing the cost of reading from disk.
Example:
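As one illustration of the idea (my own sketch, not the reference answer; `disk_lookup` is a hypothetical stand-in for the on-disk dictionary), memoizing lookups in memory means repeated words in the stream never touch the disk twice:

```python
from functools import lru_cache

disk_reads = 0  # the cost we are trying to minimize

def disk_lookup(word: str) -> bool:
    """Hypothetical stand-in for reading the on-disk dictionary."""
    global disk_reads
    disk_reads += 1
    return word in {"cat", "dog", "fish"}  # pretend this set lives on disk

@lru_cache(maxsize=4096)
def exists(word: str) -> bool:
    # Repeated words in the stream are answered from memory.
    return disk_lookup(word)

for word in ["cat", "cat", "bird", "cat", "bird"]:
    exists(word)
# Five stream items, but only two distinct words hit the disk.
```

How well this works depends entirely on how often words repeat in the stream, which is exactly the discussion the question is designed to provoke.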
Recently I was reading through a bunch of technical designs and noticed a common mistake when it comes to writing user stories and requirements: assuming a solution. The biggest issue for me when I write requirements myself is that whenever I include a part of the solution I’m thinking about, it limits my ability to innovate, since I’m bound to that specific solution. In many cases I observed improvements in my designs when I focused on what the customer needs rather than on fulfilling a requirement tied to my first, and probably not brightest, idea.
It is interesting to observe that any endeavor where attention is one of the key metrics or key drivers, regardless of the company’s size, ends up in the same hell pit of attention craving and optimizing for it. Even small single-person blogs that teach us to be a better person, engineer, or something else are prone to this. Many of them, the ones I used to follow, slowly became “Energy Vampires” to me, constantly seeking my attention.
Every so often I interview senior software engineers for Amazon, where I ask more or less the same questions in each interview. One of them requires adding caching logic to get better results. I’ve noticed that interviewees make one of two mistakes that block them from standing out as software engineers:
they don’t know, or don’t talk about, the conditions under which a cache performs best; primarily, how the request frequency distribution affects cache performance.
they don’t know the standard library of the programming language of their choice.
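The first point can be made concrete with a small simulation (my illustration, with made-up numbers): the same fixed-size LRU cache has a high hit rate when requests follow a skewed, Zipf-like distribution and is nearly useless when requests are uniform:

```python
import random
from collections import OrderedDict

random.seed(42)

class LRUCache:
    """Tiny LRU cache that records hits and misses."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)  # evict least recently used
            self.data[key] = True

def hit_rate(population, weights, requests=10_000, capacity=100):
    cache = LRUCache(capacity)
    for key in random.choices(population, weights=weights, k=requests):
        cache.get(key)
    return cache.hits / requests

keys = list(range(10_000))
zipf = [1 / (rank + 1) for rank in keys]  # a few hot keys dominate
uniform = [1.0] * len(keys)               # every key equally likely

# A cache sized at 1% of the key space shines under the skewed
# distribution and barely helps under the uniform one.
assert hit_rate(keys, zipf) > hit_rate(keys, uniform)
```

Being able to reason about this, e.g. that most real traffic is heavily skewed, which is why small caches work at all, is what separates a memorized answer from an engineering discussion.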
The same podcast, Joscha Bach: Life, Intelligence, Consciousness, AI & the Future of Humans | Lex Fridman Podcast #392, which I mentioned in the previous post, AI, people, trees, and mushrooms: the same software different hardware, triggered another chain of thought. Joscha was talking about how our neurons always operate on data available right here and right now. That is enough to build complex systems like the human brain. Working together, neurons form parts responsible for memories, image processing, data buses, etc. But ultimately, each of them individually works only with data provided by other neurons. In a similar fashion, neural networks in GPTs are just matrix multiplications connected with each other, forming memories, attention, generation, etc.
Exploring the idea that all living things have a spirit, or at least the ability to carry neurological signals.
Recently, I listened to Joscha Bach: Life, Intelligence, Consciousness, AI & the Future of Humans | Lex Fridman Podcast #392 where Joscha and Lex discussed different ideas about consciousness, neurology, and AI. At one point, they talked about the ability of all types of cells to process neurological signals. The key difference is that neurons can process data much faster, over longer distances, and interact with more neighbors at once.
It all started a while ago. In high school, I was into cyberpunk, reading and watching about hackers, virtual reality, etc. Neuromancer, Johnny Mnemonic by William Gibson, Labyrinth of Reflections by Sergei Lukyanenko, The Matrix, and The Lawnmower Man were my go-to entertainment. Then I forgot about it until the recent generative AI explosion and… a few episodes of Rick and Morty. In one episode, Morty plays a VR game where he starts as a newborn with no memories of life outside the game and lives an entire life until he dies at 60-80 years old. In another episode, he gets stuck in a game where his consciousness is fragmented into pieces, acting as an entire world of independent agents.
Amazon is famous for its writing culture, which I discovered later in my career. The more I wrote, the easier it was to apply a similar approach to other aspects of software development.
Introduction
When I joined Amazon, I transitioned from a company with a markedly different culture, particularly in its writing and development processes. Initially, I was skeptical of Amazon’s writing-centric culture. However, over the next seven years, I gradually embraced the Amazon style of writing and came to excel at it.