How to Read Unfamiliar Code


Edition e1.0.1

Updated August 7, 2023

It’s a common misconception among students and aspiring programmers that professional software engineers spend all of their time writing new code and building new systems from scratch. Many new developers face a rude awakening when they land their first job and find out that this is far from the truth. In fact, aside from planning and documenting, most of your early-career time will be spent maintaining, extending, and fixing bugs in legacy codebases. You’ll be tasked with making small- to medium-sized changes to the code that your team members wrote, and you may sometimes find yourself working on code written by someone who is no longer with your company.

Working on legacy code gives you the opportunity to get experience working on a mature codebase. In a way, it can be seen as a rite of passage on some teams because it allows you to get familiar with complex abstractions and business logic. There will be design patterns, coding standards, and test cases that the previous programmers established and that you’ll be able to follow when making your changes. Following established patterns when learning a new codebase will help you focus on the behavior of your code without getting too bogged down in details about the design and architecture of the code.

This is especially true when you join a new team, because you’ll be learning the nuances of the codebase and the business rules while getting up to speed. Your manager will probably start you off with some small bug fixes and enhancements before you graduate to larger projects. In many cases, it would actually be counterproductive for you to jump in and make large changes to a codebase that you don’t understand very well. That would be very risky, especially as a junior software engineer still learning the best practices.

Before you can run, you need to learn how to walk, which is why it’s so important to develop skills for reading and understanding unfamiliar code. The quicker you can read code and understand its intended behavior, the quicker you’ll be able to make changes, fix bugs, or identify edge cases that weren’t considered.

Your manager will give you projects that will require you to do some digging to identify the location of a bug or to determine the best way to extend a feature to enhance its functionality. At times, you’ll feel like an archeologist uncovering corners of the codebase that haven’t been touched in years, decoding what the previous engineers were thinking when they wrote the code on your screen, and piecing together a mental model of how the system works as a whole.

Even though you may read a piece of code and understand its behavior, you may not have all the information you need in order to fix certain bugs. Code can be very nuanced sometimes. You may read a piece of code and think there’s a better way it could have been written, or that perhaps the problem could have been solved in fewer lines of code, but there may be additional context that the original author had to consider but that you may not yet understand. Your job is to put yourself in their shoes and figure out what their code is doing and if there’s a reason why it was written the way it was. Oftentimes, the author had to accommodate specific edge cases that may not be apparent on a first reading of the code. You’ll need to put on your investigator hat and ask yourself some questions about their code.

Here are examples of things you’ll need to figure out as you read new code line by line:

  • What kind of inputs did the author expect? Are they validated?

  • Which edge cases did the author consider? Are there any that aren’t handled?

  • What do the data structures look like?

  • What assumptions did the author make about the data? Could any of those assumptions be wrong?

  • How did the code change over time? Were additional changes made after the code was shipped?

Reading other people’s code isn’t the most glamorous aspect of being a software engineer, but it’s an important skill to master if you want to excel in your career. It’s frustrating reading code that’s hard to follow, especially when there are layers of abstractions or it’s written differently from how you would have approached the problem.


Reading other people’s code might not have been what you had in mind when you decided to be a professional programmer, but it’s part of the job. What might surprise you, though, is that reading unfamiliar code is also one of the most important things you’ll do in your programming career. In fact, reading other people’s code is one of the best things you can do to improve your own coding skills.

As a child, you were assigned to read books and write essays about them, which was not so different. You had to read another person’s work and come to conclusions about what the author’s intentions were. Except in that case, you were dealing with literature and written language instead of computer programs and code.

Even the most successful authors in history didn’t create their work in a vacuum. Ask any famous author what their favorite books are, and you’ll receive title after title of books that inspired their own writing. In fact, some of the best writers often spend more time reading other authors’ works than writing their own. And they’re not just skimming through the books; they’re studying them: dissecting and analyzing the choice of words, sentence structure, style, tone, and vivid scenery. They notice which literary rules the original author followed and, perhaps more importantly, which ones they broke. By observing how great authors bend the rules of their language, writers become better at their craft, and they adopt similar techniques and styles in their own writing.

The same is true for software engineers. You must study other programmers’ code in order to understand how their programs work. You’ll learn new design patterns, ways to structure your codebase, optimization techniques, algorithms, novel solutions to complex problems, and so much more.

Reading code from better programmers will help you become a better programmer, plain and simple.

You’ll mostly be reading code written by your coworkers, which is great because you can ask them about specific details when you have questions about their code. You may be reviewing their code in a pull request or reading code in a specific part of the codebase you’re working on. Your team members are an excellent resource for learning, so don’t hesitate to ask them questions when you’re having trouble understanding a specific piece of code.

Additionally, with the rise of open-source software, you have an incredible amount of resources available to you online. Reading code from popular open-source projects is an excellent way to learn how other programs are structured, and you can follow along in the open issues, pull requests, and discussions to see how new features are built and how bugs are fixed before the changes are merged into the main branch. GitHub, GitLab, Bitbucket, and other websites have millions of open-source code repositories available online, so it’s easy to find some popular projects in your favorite language. You can even subscribe to get updated on all new issues if you find a project you want to follow along with.

So, now that we’ve gone over the benefits of reading code and why you should read other people’s code, let’s jump into some specific tools and techniques you can use to improve your code-reading skills.

Find the Entry Points

First things first: figure out where the program starts. To execute a program, the loader (typically an operating system) will pass control of the process to a program’s entry point, which begins the run-time execution of the application.

The entry point is the place where a program begins, and it’s important to know what the program is doing once it begins executing the code. When you follow a program from the entry point, you’ll be able to follow the application as it boots up and configures itself to do whatever work it was designed to do.

Some programming languages may enforce conventions for how or where a program should start, while others may give more freedom in how a program is executed.

  • Compiled languages such as C, C++, and Rust, and JVM languages such as Java, begin execution in a designated function called main.

  • Interpreted languages like JavaScript, Python, Ruby, and PHP will simply begin execution at the first statement of the file being run.
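Even in languages that start at the first statement, many codebases mark an entry point by convention. The following is a minimal sketch in Python; the file name and the body of main are hypothetical and only illustrate the pattern you are likely to find when you look for where a script starts.

```python
# app.py (hypothetical file name)

def main() -> int:
    """Hypothetical entry point: set things up, do the work, return an exit code."""
    print("application starting")
    return 0


# Python runs a file from its first statement to its last, so there is no
# built-in main function. This guard is a convention: the condition is true
# only when the file is run directly (python app.py), not when it is imported.
if __name__ == "__main__":
    main()
```

When you open an unfamiliar Python project, searching for this guard is often a quick way to find the files that are meant to be run directly.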

Once your program has control of the process and has begun execution, it will be able to access command-line arguments and environment variables that can be used to dynamically configure the behavior of your application during run time. The program may contain specific logic to check for these arguments or environment variables in order to change the run-time behavior of the application without needing to recompile or redeploy the application.
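As a rough illustration of what this configuration logic can look like, here is a small Python sketch. The flag names (--verbose, --port) and environment variables (APP_PORT, APP_ENV) are made up for the example; a real codebase will have its own names, and finding them is part of reading the entry point.

```python
import argparse


def load_config(argv, environ):
    """Build a settings dict from command-line arguments and environment variables.

    All option and variable names here are hypothetical examples.
    """
    parser = argparse.ArgumentParser(description="Hypothetical application")
    parser.add_argument("--verbose", action="store_true",
                        help="enable extra output")
    # The environment variable supplies the default; an explicit flag overrides it.
    parser.add_argument("--port", type=int,
                        default=int(environ.get("APP_PORT", "8080")))
    args = parser.parse_args(argv)

    return {
        "verbose": args.verbose,
        "port": args.port,
        "environment": environ.get("APP_ENV", "development"),
    }
```

A real program would typically call load_config(sys.argv[1:], os.environ) near its entry point. Passing the arguments and environment in as parameters, as this sketch does, also makes the precedence rules easy to check in isolation.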

It’s important to know where and how your program starts because that may give you valuable information as to how the program is configured, which could affect how the program behaves. If you don’t know what run-time configurations your program is using, you may not fully understand what it’s doing, so this is always a good first step.

Leverage Your IDE

Your integrated development environment (IDE) is one of the most important tools you will use when reading code. Your IDE gives you a set of tools to analyze and manipulate your codebase, so choosing a good IDE will help you navigate the code efficiently.

When reading code, you’ll want an IDE that lets you jump to a function definition. This feature is crucial for learning and studying a new codebase, and most modern development environments should support this functionality. This allows you to jump through the codebase to see where a function is defined, which is useful whenever you come across a function call you’re not familiar with.

This feature gives you the ability to step through the codebase and follow the execution path, which helps you build a mental model of the code and what it’s doing. It’s a great way to explore unfamiliar code and can help you get up to speed quickly.

When you jump to a function, take note of the file name and directory structure where the function lives. You can learn a lot about the structure of an application just by observing how things are organized.

Most IDEs that allow you to jump to function definitions should also give you the ability to move in the opposite direction. When you’re looking at a function, you might want to know all the places where it’s used within the codebase, which is helpful if you’re trying to track down a bug or refactor a piece of code. The ability to see all places where a function is called is equally powerful for learning and understanding a codebase.

If your IDE doesn’t offer these basic features, consider switching to one that does. Once you get in the habit of navigating around the codebase by jumping from function to function, you’ll wonder how you ever lived without it.

Dig Deeper

Development tools aren’t perfect, and sometimes our IDEs won’t be aware of the entire structure of the codebase. Perhaps you have some code that is called dynamically or your language supports metaprogramming, both of which can be difficult for IDEs to understand. In some cases, you may need to use other tools like grep or git grep instead, which give you the ability to search your codebase for specific patterns such as variables, functions, class names, or constants.

For example, you may come across a function called findNearbyLocations() while reading some code. In order to find all locations where that function is called, you can run the following command from your project’s root directory:

$ grep -r findNearbyLocations *

Most of the time, you’ll want to search recursively using the -r flag, although this means it will also search in folders you may not want to query, such as dependency directories that contain large amounts of third-party code. While grep gives you the ability to exclude certain directories from your search, it may be annoying to have to manually exclude them every time.

Fortunately, if you’re using git for version control, there is a command called git grep that works similarly, except that by default it searches only the files tracked by git. Anything you have excluded through a file called .gitignore, such as dependency directories, is normally untracked and is therefore skipped. This makes it much easier to query your codebase without having to sift through files and directories you’re not interested in.

With these tools, you have a way to query your codebase any time you come across a function you’re not familiar with. This will help you learn how a function works, what parameters it expects, what the return values are, and where else it’s used in the codebase. Using these tools will help you to better understand what the code is doing and how it is organized, and will ultimately help you build and refine your mental model of the codebase.
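To see why text search matters when code is called dynamically, consider the following sketch. The class and method names are hypothetical. The export method builds the name of the method it calls at run time, so the text export_csv appears only at its definition and never at a call site, and a tool that relies on static analysis may report no callers for it.

```python
class ReportExporter:
    """Hypothetical class that picks a method by name at run time."""

    def export_csv(self, rows):
        # Join each row with commas, one row per line.
        return "\n".join(",".join(str(value) for value in row) for row in rows)

    def export_tsv(self, rows):
        # Same as CSV, but tab-separated.
        return "\n".join("\t".join(str(value) for value in row) for row in rows)

    def export(self, fmt, rows):
        # The method name is assembled from a string, so nothing in the
        # source calls export_csv or export_tsv by their full names.
        handler = getattr(self, f"export_{fmt}", None)
        if handler is None:
            raise ValueError(f"unsupported format: {fmt}")
        return handler(rows)
```

In a case like this, searching for the full name export_csv finds only the definition, while searching for the shared prefix export_ also turns up the getattr line that does the dispatching. How well a given IDE handles this varies, so treat an empty list of callers with some suspicion before deleting a function.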


The Blame Game

When you’re reading through code, you may want to know when it was last changed. If you’re using git, there’s another tool called git-blame, which displays the last revision and the author who most recently modified each line of a file that you’re interested in. This is useful for determining when certain functions were last modified and by whom.

Use the command below to view the last revision and last person to touch each line of a file:

$ git blame <file>

It should be mentioned that the purpose of git-blame is not to actually blame someone for writing a bad piece of code, and hopefully you won’t use it for that purpose. It’s simply another tool at your disposal for understanding the code and how it evolved.

You should consider using git-blame when working on a bug you’ve been assigned to, or when you have questions about a specific function. Git-blame will give you clues as to who you should talk to first when you have a question regarding specific lines of code.

Depending on the age of the codebase, the most recent author may no longer be with your company. If that’s the case, you won’t be able to ask them any questions, but you’re not out of luck. With git-blame, you will still be able to find the commit hash, which you can use to view the full context of the changes. Oftentimes, being able to read the commit message and see all the other changes that were made in the same commit will give you more context for why the change was made.

If you’re still not able to find any developers who are familiar with the code you’re looking at, use git-blame to find the developers who made modifications to other parts of the file and ask them if they’re familiar with the code in question. Chances are you’ll be able to find someone who has worked in that part of the codebase before or reviewed the pull requests for the code in question.


Read the History

While git-blame shows you who made the most recent changes to each line in a file, sometimes you might be more interested in the history of a single file and how it’s changed over time. Git offers a useful tool called git-log that lets you inspect the commit logs for a given file.

Use the following command to view a reverse chronological list of commits where changes were made to a file:

$ git log <file_path>

This will give you a full history of all commits to the file so you’ll be able to see who made changes to it and, more importantly, when they made those changes. Just as with git-blame, you can use git-log to find the developers who made the most recent changes to a file, because they should be the ones you reach out to first.

If you suspect a bug is located in a certain file, use git-log to view when a file was changed and by whom. It’s extremely helpful if you know when a bug was first reported or when an error started popping up in your logs. You can use git-log to line up errors with changes made to specific files, which may help you pinpoint when bugs may have been introduced into the codebase.


Log Some Data

As you’re reading through code, you will need to hold a mental model of the data in the system and how it is manipulated as the business logic is applied. Some code may be easy to follow, but you may find yourself deep in the codebase without any idea what the data looks like when it reaches a certain function. In these situations, it’s sometimes useful to lean on your logging system to print some data to your log files so that you can inspect it.

Add a few log statements with data you’re interested in. This could be certain values of variables or object properties, or it could be an arbitrary text string that will give you some useful information if you see it in your logs. Either way, setting log statements throughout your code is a quick and easy way to get a snapshot of what your data structures look like at a point in time when the code is executing. Sometimes, a well-placed log statement can reveal a bug you’ve been tracking down, or it can expose certain things that help you understand what the code is doing.
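Here is a minimal sketch of that idea using Python’s standard logging module. The function, the logger name, and the shape of the order data are all hypothetical; the point is only to show temporary log statements placed at the input and output of a function you are trying to understand.

```python
import logging

# Show DEBUG-level messages while investigating; the format is an arbitrary choice.
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("orders")  # hypothetical logger name


def apply_discount(order: dict, percent: float) -> dict:
    """Hypothetical business logic: return a copy of the order with a discount applied."""
    # Snapshot of the data as it arrives in this function.
    logger.debug("apply_discount input: order=%r percent=%r", order, percent)

    discounted = {**order, "total": round(order["total"] * (1 - percent / 100), 2)}

    # Snapshot of the data as it leaves this function.
    logger.debug("apply_discount output: %r", discounted)
    return discounted
```

Using %r in the message prints the value’s representation, which makes it easier to tell a string "10" from a number 10. Remember to remove or downgrade exploratory log statements like these before you merge your change.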

All programmers rely on logging to gain insight into what their code is doing, so don’t feel like it’s the wrong way to debug your code. Even the most experienced engineers rely on logging when they’re developing new features or tracking down a hard-to-find bug.

Fire Up Your Debugger

Occasionally, you’ll come across code that you won’t understand no matter how many log statements you add. Wrapping your head around confusing code is frustrating, especially if you’re trying to figure out how some piece of data is being manipulated. While you may be able to figure it out with enough log statements, it’s messy to add them all over your codebase just to piece together what’s going on. Sometimes a debugger is the better tool for the job.

When you distill a program down to the simplest form, it’s really just taking some inputs, manipulating the data structures, and producing output somewhere. To really get a grasp on how everything works, you need to understand how the data changes as it moves through the system. While it’s helpful to read through code and build a mental model of what the data structure looks like, it’s sometimes easier to visualize the program with a debugger and observe how the data changes as it moves through the system.

If you have a debugger configured, you’ll be able to see what the data looks like at each breakpoint you set. As you step through the debugger, focus on the data and how it changes as you step in and out of functions.
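If you are working in Python, one low-setup way to try this is the built-in breakpoint() function, which pauses the program and opens the standard pdb debugger. The sketch below is an assumption-laden example: the function and data are hypothetical, and the PAUSE_IN_DEBUGGER environment variable is an invented switch so the code only pauses when you ask it to.

```python
import os


def normalize_locations(raw_rows):
    """Hypothetical function: clean up rows of location data."""
    cleaned = []
    for row in raw_rows:
        # Pause here only when the (made-up) PAUSE_IN_DEBUGGER variable is set,
        # so the function behaves normally the rest of the time.
        if os.environ.get("PAUSE_IN_DEBUGGER") == "1":
            breakpoint()

        name = row["name"].strip().title()
        cleaned.append({
            "name": name,
            "lat": float(row["lat"]),
            "lng": float(row["lng"]),
        })
    return cleaned
```

Once the debugger prompt appears, pdb commands such as p row (print a value), n (run the next line), s (step into a function call), and c (continue) let you watch the data change one line at a time. Most IDEs offer the same operations through buttons and a variables panel.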


Tests Contain Context

An underrated technique for studying an unfamiliar codebase is to read through the automated tests. While it’s not the most glamorous part of the codebase, there’s an enormous amount of institutional knowledge stored in the test files. Automated tests are where past and present developers have codified the specifications the application is expected to operate within.

Most young developers don’t realize that a mature test suite will show you exactly how a program should perform, because each test that’s added to the suite should be designed to validate a specific part of the program for a specific scenario. As you read through the test cases, you’ll see what edge cases the tests handle and what the expected outcomes should be.

Additionally, the assertions in automated tests will show you what the expected output should be when you call a function. Assuming the tests are passing, this gives you a clear picture of how the system works and what application states you should expect.
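To make this concrete, here is a small, hypothetical example written with Python’s built-in unittest module. Both the function and its tests are invented for illustration. Notice how much you can learn about parse_price from the test names and assertions alone, without reading its body.

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical function under test: turn a price string into a number."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned:
        raise ValueError("empty price")
    return float(cleaned)


class ParsePriceTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_price("42"), 42.0)

    def test_currency_symbol_and_thousands_separator(self):
        # Documents an input format the author expected to receive.
        self.assertEqual(parse_price("$1,299.50"), 1299.5)

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_price("  7.25 "), 7.25)

    def test_empty_string_is_rejected(self):
        # Documents an edge case and the intended failure mode.
        with self.assertRaises(ValueError):
            parse_price("   ")
```

Saved to a file, these tests can be run with python -m unittest. Each test answers one of the questions from earlier in this chapter: what inputs the author expected, which edge cases they considered, and what should happen when the input is invalid.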

Don’t Try to Understand It All

Codebases are complex, plain and simple. A codebase’s complexity can be roughly estimated as proportional to the number of engineers who have contributed to the codebase multiplied by its age. As more developers contribute to a codebase over time, the complexity continues to increase.

It’s almost impossible to understand every line of a codebase, especially if you didn’t write it yourself. In fact, even a solo developer who has written every single line of a codebase will forget the details and context of parts of the system over time. They may come back to a file they wrote months ago and struggle to remember how it works.

Setting the right expectations now will help reduce your frustrations in the future. It’s okay if you don’t understand how every line of code in a program works.

As developers, it’s our job to form a mental model of how a program works, and how the pieces fit together to form a complete system. You have a limited capacity in your brain to hold this mental model, and eventually, you’ll hit a saturation point where you’re not able to hold the entire mental model in your head at once. As you learn new parts of the system, you may forget other parts you haven’t visited in a while. It’s natural and common among all software engineers.

Depending on the size of the codebase, it may even take years to feel like you know your way around it. It certainly doesn’t help that the codebase is constantly changing as new features are added, bugs are fixed, tests are written, algorithms are optimized, and engineers come and go. Part of the system you understood months ago might have been refactored since then and now works completely differently. You’ll always be chasing a moving target, so don’t beat yourself up if you don’t understand every corner of a codebase.

The best thing to do is to accept that you won’t have a deep understanding of every single part of a codebase, and that’s okay. As long as you work hard to form a mental model about the parts you’re responsible for, things will start to make more sense. It won’t happen all at once, but given enough time, the picture will become clearer and clearer. The trick is to be patient and get comfortable with reading unfamiliar code, because you’ll be doing it for your entire career.


How to Add Value

As software engineers, we often get caught up in the day-to-day details of our job without even knowing it. We make hundreds of decisions each day, such as the architecture of our programs, what to name our variables, when to add a new function, which ticket to work on, how to design our database schema, and so much more.

While these are all fun decisions to make, they require us to consider the long-term implications of our choices, debate the pros and cons, and ultimately settle on a solution. There are so many choices to make that sometimes we fail to see how an individual decision fits into the grand scheme of things. We lose sight of the bigger picture because we’re so focused on the details of the current problem we’re trying to solve.

As you gain experience and progress in your career, you’ll learn how your decisions fit into the overall system, and your decision-making skills will evolve. You’ll start to comprehend the trade-offs between solutions and understand the positive and negative impacts your decisions could have on the business. You’ll start to understand the implications of changing one part of the system and how it affects other parts. Eventually, you’ll improve your ability to know which decisions add the most value to the customers and the business, and to prioritize those decisions above the others.
