Edition e1.0.1
Updated August 7, 2023
It's a common misconception among students and aspiring programmers that professional software engineers spend all of their time writing new code and building new systems from scratch. Many new developers face a rude awakening when they land their first job and find out that this is far from the truth. In fact, aside from planning and documenting, most of your early-career time will be spent maintaining, extending, and fixing bugs in legacy codebases. You'll be tasked with making small- to medium-sized changes to the code that your team members wrote, and you may sometimes find yourself working on code written by someone who is no longer with your company.
Working on legacy code gives you the opportunity to get experience working on a mature codebase. In a way, it can be seen as a rite of passage on some teams because it allows you to get familiar with complex abstractions and business logic. There will be design patterns, coding standards, and test cases that the previous programmers established and that you'll be able to follow when making your changes. Following established patterns when learning a new codebase will help you focus on the behavior of your code without getting too bogged down in details about the design and architecture of the code.
This is especially true when you join a new team, because you'll be learning the nuances of the codebase and the business rules while getting up to speed. Your manager will probably start you off with some small bug fixes and enhancements before you graduate to larger projects. In many cases, it would actually be counterproductive for you to jump in and make large changes to a codebase that you don't understand very well. That would be very risky, especially as a junior software engineer still learning the best practices.
Before you can run, you need to learn how to walk, which is why it's so important to develop skills for reading and understanding unfamiliar code. The quicker you can read code and understand its intended behavior, the quicker you'll be able to make changes, fix bugs, or identify edge cases that weren't considered.
Your manager will give you projects that will require you to do some digging to identify the location of a bug or to determine the best way to extend a feature to enhance its functionality. At times, you'll feel like an archeologist uncovering corners of the codebase that haven't been touched in years, decoding what the previous engineers were thinking when they wrote the code on your screen, and piecing together a mental model of how the system works as a whole.
Even though you may read a piece of code and understand its behavior, you may not have all the information you need in order to fix certain bugs. Code can be very nuanced sometimes. You may read a piece of code and think there's a better way it could have been written, or that the problem could have been solved in fewer lines of code, but the original author may have had to consider additional context that you don't yet understand. Your job is to put yourself in their shoes and figure out what their code is doing and whether there's a reason why it was written the way it was. Oftentimes, the author had to accommodate specific edge cases that may not be apparent on a first reading of the code. You'll need to put on your investigator hat and ask yourself some questions about their code.
Example: Here are examples of things you'll need to figure out as you read new code line by line:
What kind of inputs did the author expect? Are they validated?
Which edge cases did the author consider? Are there any that aren't handled?
What do the data structures look like?
What assumptions did the author make about the data? Could any of those assumptions be wrong?
How did the code change over time? Were additional changes made after the code was shipped?
Reading other people's code isn't the most glamorous aspect of being a software engineer, but it's an important skill to master if you want to excel in your career. It's frustrating reading code that's hard to follow, especially when there are layers of abstractions or it's written differently from how you would have approached the problem.
Reading other people's code might not have been what you had in mind when you decided to be a professional programmer, but it's part of the job. What might surprise you, though, is that reading unfamiliar code is also one of the best things you can do to improve your own coding skills.
As a child, you were assigned to read books and write essays, and this is no different. You had to read another person's work and come to conclusions about what the author's intentions were. Except in that case, you were dealing with literature and written language instead of computer programs and code.
Even the most successful authors in history didn't create their work in a vacuum. Ask any famous author what their favorite books are, and you'll receive title after title of books that inspired their own writing. In fact, some of the best writers spend more time reading other authors' works than writing their own. And they're not just skimming through the books; they're studying them: dissecting and analyzing the choice of words, sentence structure, style, tone, and vivid scenery. They notice which literary rules the original author followed and, perhaps more importantly, which ones they broke. By observing how great authors bend the rules of their language, writers become better at their craft, and they adopt similar techniques and styles in their own writing.
The same is true for software engineers. You must study other programmers' code in order to understand how their programs work. You'll learn new design patterns, ways to structure your codebase, optimization techniques, algorithms, novel solutions to complex problems, and so much more.
Reading code from better programmers will help you become a better programmer, plain and simple.
You'll mostly be reading code written by your coworkers, which is great because you can ask them about specific details when you have questions about their code. You may be reviewing their code in a pull request or reading code in a specific part of the codebase you're working on. Your team members are an excellent resource for learning, so don't hesitate to ask them questions when you're having trouble understanding a specific piece of code.
Additionally, with the rise of open-source software, you have an incredible amount of resources available to you online. Reading code from popular open-source projects is an excellent way to learn how other programs are structured, and you can follow along in the open issues, pull requests, and discussions to see how new features are built, how bugs are fixed, and how those changes are merged into the main branch. GitHub, GitLab, Bitbucket, and other websites host millions of open-source code repositories, so it's easy to find some popular projects in your favorite language. You can even subscribe to get updated on all new issues if you find a project you want to follow along with.
So, now that we've gone over the benefits of reading code and why you should read other people's code, let's jump into some specific tools and techniques you can use to improve your code-reading skills.
First things first: figure out where the program starts. To execute a program, the loader (typically an operating system) will pass control of the process to a program's entry point, which begins the run-time execution of the application.
The entry point is the place where a program begins, and it's important to know what the program is doing once it begins executing the code. When you follow a program from the entry point, you'll be able to follow the application as it boots up and configures itself to do whatever work it was designed to do.
Some programming languages may enforce conventions for how or where a program should start, while others may give more freedom in how a program is executed.
Compiled languages such as C, C++, and Rust, and JVM languages such as Java, designate a function (or method) named main as the entry point.
Interpreted languages like JavaScript, Python, Ruby, and PHP will simply begin execution at the first statement of the file that is run.
Once your program has control of the process and has begun execution, it will be able to access command-line arguments and environment variables that can be used to dynamically configure the behavior of your application during run time. The program may contain specific logic to check for these arguments or environment variables in order to change the run-time behavior of the application without needing to recompile or redeploy the application.
It's important to know where and how your program starts because that may give you valuable information as to how the program is configured, which could affect how the program behaves. If you don't know what run-time configurations your program is using, you may not fully understand what it's doing, so this is always a good first step.
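As a minimal sketch of the ideas above, the Python script below marks an explicit entry point and configures itself from one command-line argument and one environment variable. The names load_config, --verbose, and APP_ENV are hypothetical and chosen only for illustration; a real application will have its own.

```python
import os
import sys


def load_config(argv, environ):
    """Build the run-time configuration from arguments and environment."""
    return {
        # Hypothetical flag: pass --verbose on the command line to enable it.
        "verbose": "--verbose" in argv,
        # Hypothetical variable: APP_ENV falls back to "development" if unset.
        "environment": environ.get("APP_ENV", "development"),
    }


def main():
    config = load_config(sys.argv[1:], os.environ)
    print(f"Starting in {config['environment']} mode (verbose={config['verbose']})")


# Python starts executing at the first statement of the file. This guard is
# the conventional way to mark where the program's real work begins.
if __name__ == "__main__":
    main()
```

When you read an unfamiliar program, a function like load_config is worth finding early, because it tells you which settings can change the program's behavior without a code change.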
Your integrated development environment (IDE) is one of the most important tools you will use when reading code. Your IDE gives you a set of tools to analyze and manipulate your codebase, so choosing a good IDE will help you navigate the code efficiently.
When reading code, you'll want an IDE that lets you jump to a function definition. This feature is crucial for learning and studying a new codebase, and most modern development environments should support this functionality. This allows you to jump through the codebase to see where a function is defined, which is useful whenever you come across a function call you're not familiar with.
This feature gives you the ability to step through the codebase and follow the execution path, which helps you build a mental model of the code and what it's doing. It's a great way to explore unfamiliar code and can help you get up to speed quickly.
When you jump to a function, take note of the file name and directory structure where the function lives. You can learn a lot about the structure of an application just by observing how things are organized.
Most IDEs that allow you to jump to function definitions should also give you the ability to move in the opposite direction. When you're looking at a function, you might want to know all the places where it's used within the codebase, which is helpful if you're trying to track down a bug or refactor a piece of code. The ability to see all places where a function is called is equally powerful for learning and understanding a codebase.
If your IDE doesn't offer these basic features, consider switching to one that does. Once you get in the habit of navigating around the codebase by jumping from function to function, you'll wonder how you ever lived without it.
Development tools aren't perfect, and sometimes our IDEs won't be aware of the entire structure of the codebase. Perhaps you have some code that is called dynamically or your language supports metaprogramming, both of which can be difficult for IDEs to understand. In some cases, you may need to use other tools like grep or git grep instead, which give you the ability to search your codebase for specific patterns such as variables, functions, class names, or constants.
For example, you may come across a function called findNearbyLocations() while reading some code. In order to find all locations where that function is called, you can run the following command from your project's root directory:
$ grep -r findNearbyLocations *
That command will recursively search all directories in your codebase and output the lines where the term "findNearbyLocations" occurs. With this information you can pull up each file to see how that function is used. When you can see where a certain term is used throughout the codebase, you gain a better understanding of what the program is doing.
Most of the time, you'll want to search recursively using the -r flag, although this means it will also search in folders you may not want to query, such as dependency directories that contain large amounts of third-party code. While grep gives you the ability to exclude certain directories from your search, it may be annoying to have to manually exclude them every time.
Fortunately, if you're using git for version control, there is a command called git grep that works similarly, except that by default it searches only the files git is tracking. Anything excluded through a file called .gitignore, such as a dependency directory, is left out of the results automatically. This makes it much easier to query your codebase without having to sift through files and directories you're not interested in.
With these tools, you have a way to query your codebase any time you come across a function you're not familiar with. This will help you learn how a function works, what parameters it expects, what the return values are, and where else it's used in the codebase. Using these tools will help you to better understand what the code is doing and how it is organized, and will ultimately help you build and refine your mental model of the codebase.
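To make the idea of a recursive search that skips dependency directories concrete, here is a rough Python sketch of what such a search does. It is not how grep or git grep are implemented, and the directory names in SKIPPED_DIRS are assumptions you would adjust for your own project.

```python
import os

# Directory names to skip. These are assumptions, not a standard list.
SKIPPED_DIRS = {".git", "node_modules", "vendor"}


def search_codebase(root, term, skipped_dirs=SKIPPED_DIRS):
    """Return (path, line_number, line) for each line under root containing term."""
    matches = []
    for current_dir, dir_names, file_names in os.walk(root):
        # Pruning dir_names in place stops os.walk from descending into them.
        dir_names[:] = [name for name in dir_names if name not in skipped_dirs]
        for file_name in sorted(file_names):
            path = os.path.join(current_dir, file_name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        if term in line:
                            matches.append((path, line_number, line.rstrip("\n")))
            except OSError:
                continue  # Unreadable file: skip it rather than abort the search.
    return matches
```

Calling search_codebase(".", "findNearbyLocations") from a project's root directory returns every matching line outside the skipped directories, which is the same information the commands above print to your terminal.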
grep: Reference Page (man7.org)
git-grep: Reference Page (git-scm.com)
ripgrep: GitHub Repository (github.com)
When you're reading through code, you may want to know when it was last changed. If you're using git, there's another tool called git-blame, which displays the last revision and the author who most recently modified each line of a file that you're interested in. This is useful for determining when certain functions were last modified and by whom.
Use the command below to view the last revision and last person to touch each line of a file:
$ git blame <file>
Confusion: It should be mentioned that git-blame's intention is not to actually blame someone for writing a bad piece of code, and hopefully you won't use it for that purpose. It's simply another tool at your disposal for understanding the code and how it evolved.
You should consider using git-blame when working on a bug you've been assigned to, or when you have questions about a specific function. Git-blame will give you clues as to who you should talk to first when you have a question regarding specific lines of code.
Depending on the age of the codebase, the most recent author may no longer be with your company. If that's the case, you won't be able to ask them any questions, but you're not out of luck. With git-blame, you will still be able to find the commit hash, which you can use to view the full context of the changes. Oftentimes, being able to read the commit message and see all the other changes that were made in the same commit will give you more context for why the change was made.
If you're still not able to find any developers who are familiar with the code you're looking at, use git-blame to find the developers who made modifications to other parts of the file and ask them if they're familiar with the code in question. Chances are you'll be able to find someone who has worked in that part of the codebase before or reviewed the pull requests for the code in question.
While git-blame shows you who made the most recent changes to each line in a file, sometimes you might be more interested in the history of a single file and how it's changed over time. Git offers a useful tool called git-log that lets you inspect the commit logs for a given file.
Use the following command to view a reverse chronological list of commits where changes were made to a file:
$ git log <file_path>
This will give you a full history of all commits to the file so you'll be able to see who made changes to it and, more importantly, when they made those changes. Just as with git-blame, you can use git-log to find the developers who made the most recent changes to a file, because they should be the ones you reach out to first.
If you suspect a bug is located in a certain file, use git-log to view when a file was changed and by whom. It's extremely helpful if you know when a bug was first reported or when an error started popping up in your logs. You can use git-log to line up errors with changes made to specific files, which may help you pinpoint when bugs may have been introduced into the codebase.
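One way to line up an error with recent changes is to parse the log output and keep only the commits made shortly before the error first appeared. The Python sketch below assumes git is installed and that you run it inside a repository. The function names and the seven-day window are hypothetical choices for illustration, while the --date=short and --format options and the %h, %an, %ad, %s, and %x09 placeholders are standard git log features.

```python
import subprocess
from datetime import date

# Tab-separated fields: short hash, author name, author date, subject line.
LOG_FORMAT = "%h%x09%an%x09%ad%x09%s"


def read_file_history(file_path):
    """Run git log for one file and return its raw output (needs a git checkout)."""
    command = ["git", "log", "--date=short", f"--format={LOG_FORMAT}", "--", file_path]
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


def parse_history(log_output):
    """Turn raw log output into a list of dictionaries, in the order git printed them."""
    commits = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        commit_hash, author, day, subject = line.split("\t", 3)
        commits.append({
            "hash": commit_hash,
            "author": author,
            "date": date.fromisoformat(day),
            "subject": subject,
        })
    return commits


def commits_before_error(commits, first_seen, window_days=7):
    """Keep commits made within window_days before (or on) the day the error appeared."""
    return [c for c in commits if 0 <= (first_seen - c["date"]).days <= window_days]
```

For example, commits_before_error(parse_history(read_file_history("path/to/file")), date(2023, 5, 12)) would list the changes to that file from the week leading up to May 12, which gives you a short list of commits to inspect first.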
As you're reading through code, you will need to hold a mental model of the data in the system and how it is manipulated as the business logic is applied. Some code may be easy to follow, but you may find yourself deep in the codebase without any idea what the data looks like when it reaches a certain function. In these situations, it's sometimes useful to lean on your logging system to print some data to your log files so that you can inspect it.
Add a few log statements with data you're interested in. This could be certain values of variables or object properties, or it could be an arbitrary text string that will give you some useful information if you see it in your logs. Either way, setting log statements throughout your code is a quick and easy way to get a snapshot of what your data structures look like at a point in time when the code is executing. Sometimes, a well-placed log statement can reveal a bug you've been tracking down, or it can expose certain things that help you understand what the code is doing.
All programmers rely on logging to gain insight into what their code is doing, so don't feel like it's the wrong way to debug your code. Even the most experienced engineers rely on logging when they're developing new features or tracking down a hard-to-find bug.
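Here is a small sketch of what temporary log statements might look like in Python, using the standard logging module. The apply_discount function and the "orders" logger name are made up for this example; the point is only to show a snapshot of the data on the way into and out of a function.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("orders")  # Hypothetical logger name.


def apply_discount(order, percent):
    """Return a copy of order with the discount applied to its total."""
    # Snapshot of the inputs exactly as this function received them.
    logger.debug("apply_discount called with order=%r percent=%r", order, percent)
    discounted = dict(order)
    discounted["total"] = round(order["total"] * (1 - percent / 100), 2)
    # Snapshot of the result just before it leaves the function.
    logger.debug("apply_discount returning %r", discounted)
    return discounted
```

Logging at the DEBUG level keeps these statements quiet in environments configured for a higher level, but it is still good practice to remove exploratory log statements once you have learned what you needed.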
Occasionally, you'll come across code that you won't understand no matter how many log statements you add. Wrapping your head around confusing code is frustrating, especially if you're trying to figure out how some piece of data is being manipulated. While you may be able to figure it out with enough log statements, it's messy to add them all over your codebase just to piece together what's going on. Sometimes a debugger is the better tool for the job.
When you distill a program down to the simplest form, it's really just taking some inputs, manipulating the data structures, and producing output somewhere. To really get a grasp on how everything works, you need to understand how the data changes as it moves through the system. While it's helpful to read through code and build a mental model of what the data structure looks like, it's sometimes easier to visualize the program with a debugger and observe how the data changes as it moves through the system.
If you have a debugger configured, you'll be able to see what the data looks like at each breakpoint you set. As you step through the debugger, focus on the data and how it changes as you step in and out of functions.
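As a sketch of what setting a breakpoint can look like, the Python function below pauses in the built-in debugger (pdb) when called with debug=True. The summarize_scores function and its debug parameter are hypothetical; in practice you would more often set breakpoints from your IDE, or add a temporary breakpoint() call and remove it afterward.

```python
def summarize_scores(raw_scores, debug=False):
    """Clean a list of score strings and return (count, average)."""
    cleaned = [text.strip() for text in raw_scores if text.strip()]
    numbers = [float(text) for text in cleaned]
    if debug:
        # Pauses here in Python's built-in debugger (pdb). Useful commands:
        # "p numbers" prints a variable, "n" runs the next line, "s" steps
        # into a call, and "c" continues until the next breakpoint.
        breakpoint()
    average = sum(numbers) / len(numbers) if numbers else 0.0
    return len(numbers), average
```

Calling summarize_scores([" 90", "", "70 "], debug=True) from a terminal drops you into the debugger after the input has been cleaned, so you can compare raw_scores, cleaned, and numbers side by side before the average is computed.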
An underrated technique for studying an unfamiliar codebase is to read through the automated tests. While it's not the most glamorous part of the codebase, there's an enormous amount of institutional knowledge stored in the test files. Automated tests are where past and present developers have codified the specifications the application is expected to operate within.
Many new developers don't realize that a mature test suite will show you exactly how a program should perform, because each test that's added to the suite should be designed to validate a specific part of the program for a specific scenario. As you read through the test cases, you'll see what edge cases the tests handle and what the expected outcomes should be.
Additionally, the assertions in automated tests will show you what the expected output should be when you call a function. Assuming the tests are passing, this gives you a clear picture of how the system works and what application states you should expect.
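To illustrate how tests document behavior, here is a small, invented example in Python. Neither parse_price nor its tests come from a real codebase; they only show how much you can learn about a function's inputs, edge cases, and expected outputs by reading its tests before its implementation.

```python
def parse_price(text):
    """Convert a price string such as "$1,250.50" into a float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned:
        raise ValueError("price is empty")
    return float(cleaned)


def test_plain_number():
    assert parse_price("19.99") == 19.99


def test_currency_symbol_and_thousands_separator():
    assert parse_price("$1,250.50") == 1250.5


def test_surrounding_whitespace_is_ignored():
    assert parse_price("  7 ") == 7.0


def test_empty_input_is_rejected():
    try:
        parse_price("   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

The test names alone tell you that the author expected currency symbols, thousands separators, and stray whitespace, and that empty input is treated as an error rather than as zero. The tests are written as plain functions, so they can be collected by a test runner such as pytest or called directly.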
Codebases are complex, plain and simple. A codebase's complexity can be roughly estimated as proportional to the number of engineers who have contributed to the codebase multiplied by its age. As more developers contribute to a codebase over time, the complexity continues to increase.
It's almost impossible to understand every line of a codebase, especially if you didn't write it yourself. In fact, even a solo developer who has written every single line of a codebase will forget the details and context of parts of the system over time. They may come back to a file they wrote months ago and struggle to remember how it works.
Setting the right expectations now will help reduce your frustrations in the future. It's okay if you don't understand how every line of code in a program works.
As developers, it's our job to form a mental model of how a program works and how the pieces fit together to form a complete system. Your brain has a limited capacity to hold this mental model, and eventually you'll hit a saturation point where you're not able to hold the entire model in your head at once. As you learn new parts of the system, you may forget other parts you haven't visited in a while. This is natural and common among all software engineers.
Depending on the size of the codebase, it may even take years to feel like you know your way around it. It certainly doesn't help that the codebase is constantly changing as new features are added, bugs are fixed, tests are written, algorithms are optimized, and engineers come and go. Part of the system you understood months ago might have been refactored since then and now works completely differently. You'll always be chasing a moving target, so don't beat yourself up if you don't understand every corner of a codebase.
The best thing to do is to accept that you won't have a deep understanding of every single part of a codebase, and that's okay. As long as you work hard to form a mental model about the parts you're responsible for, things will start to make more sense. It won't happen all at once, but given enough time, the picture will become clearer and clearer. The trick is to be patient and get comfortable with reading unfamiliar code, because you'll be doing it for your entire career.
Learn to Read the Source, Luke (blog.codinghorror.com)
How to Read Code (Eight Things to Remember) (spin.atomicobject.com)
How to read a code (iamjonas.me)
Reading Code is a Skill (trishagee.com)
As software engineers, we often get caught up in the day-to-day details of our job without even knowing it. We make hundreds of decisions each day, such as the architecture of our programs, what to name our variables, when to add a new function, which ticket to work on, how to design our database schema, and so much more.
While these are all fun decisions to make, they require us to consider the long-term implications of our choices, debate the pros and cons, and ultimately settle on a solution. There are so many choices to make that sometimes we fail to see how an individual decision fits into the grand scheme of things. We lose sight of the bigger picture because we're so focused on the details of the current problem we're trying to solve.
As you gain experience and progress in your career, you'll learn how your decisions fit into the overall system, and your decision-making skills will evolve. You'll start to comprehend the trade-offs between solutions and understand the positive and negative impacts your decisions could have on the business. You'll start to understand the implications of changing one part of the system and how it affects other parts. Eventually, you'll improve your ability to know which decisions add the most value to the customers and the business, and to prioritize those decisions above the others.