In this mini-Fragment episode, Donn talks about Item #13 of the Effective Java series – Minimize the accessibility of classes and members. You’ll learn why it’s important to limit access to your public API, and how doing so helps with both development and performance. You’ll also learn how changing a public API can affect the consumers of your API, for good and bad.
Donn Felker: Today, I’m going to talk about Item #13 from Effective Java, by Joshua Bloch. For those of you who are just joining us, Kaushik and I are covering all of the items inside this book in relation to how they apply to Android developers. We’ve already gone through the first 12, so I’ll be talking about #13: Minimize the accessibility of classes and members.
Joshua puts this very well right out of the gate, so I’m going to read this verbatim:
The single most important factor that distinguishes a well-designed module from a poorly designed one is the degree to which the module hides its internal data and other implementation details from other modules. A well-designed module hides all of its implementation details, cleanly separating its API from its implementation. Modules then communicate only through their APIs and are oblivious to each others’ inner workings. This concept, known as information hiding or encapsulation, is one of the fundamental tenets of software design.
To be honest, I spent about ten minutes trying to think of a better way to say that, so that I didn’t have to quote it exactly, but it was too hard. Joshua hit this nail right on the head. The way that he said this was so succinct and perfect that I couldn’t come up with a better way to say it.
Why is it important to hide the implementation details and expose only parts of your API? There are many reasons, but the main one is that it decouples your modules from the way people consume them. It also speeds up system development, since the different modules under the hood can be developed in parallel.
Suppose you’re building a library or module that has a public API. The internals could be split among multiple developers, with all of that work done internally. Then the API itself (basically, an interface) could be delivered to a third party so that they can code against it. That API is the only thing they’re going to work with.
The great thing about this is that since you’re hiding all of the implementation details, you can build those modules internally without worrying about breaking anyone else’s code. As long as you adhere to the public API that you’ve put out, and you’re not exposing the internals of how you’re doing all of the work, you don’t need to wonder, “Hey, are we breaking anybody else?” You can design it, test it, make sure that it works, and move much more efficiently behind a wall. In short, it lets you ship faster: if you’re able to do all of these things without impacting the consumers of your API, you can deliver the product quicker.
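Here’s a minimal sketch of that split in Java. The names (`Greeters`, `Greeter`, `DefaultGreeter`) are hypothetical: the only exported pieces are an interface and a static factory, while the implementation class is private, so consumers can’t depend on it.

```java
// Greeters.java: a hypothetical module whose exported API is
// an interface plus a static factory method.
public final class Greeters {

    // The public API: consumers code against this interface only.
    public interface Greeter {
        String greet(String name);
    }

    // Implementation detail: private, so no consumer can name this class.
    private static final class DefaultGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name + "!";
        }
    }

    // Noninstantiable: the class only exists to hold the API.
    private Greeters() {}

    // The only way for a consumer to obtain a Greeter.
    public static Greeter newGreeter() {
        return new DefaultGreeter();
    }
}
```

Because callers only ever see `Greeter` and `newGreeter()`, the team can rewrite or replace `DefaultGreeter` without breaking anyone.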
Furthermore, while this does not give immediate performance benefits, it does enable effective performance tuning. Let’s assume that you’ve built your library. You have a public API of 5-10 methods or classes that people are using. Underneath it, you have 50 other files. At some point, you realize: “Wow, this one part of the library is super slow. How can I fix that? Well, I know that it’s in this component down here. Because it’s part of the internals of my library, and I haven’t exposed those internals, I can hop in there and change the way that works.”
Maybe you’re using an improper data structure, or maybe it’s something simple, like a memory leak. You can go in and change the internals of that underlying part of the library, and then ship an update to that library. The interface to it (the public API) doesn’t change, but the internals have. That’s effective performance tuning without affecting any consumers of the API.
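As a rough sketch, assume a hypothetical `CustomerDirectory` class whose first release stored customers in a list and scanned it on every lookup. Because the storage field is private, a later release can switch it to a `HashMap` while the public methods keep exactly the same signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical library class: only the public method signatures are the contract.
public final class CustomerDirectory {

    // Internal data structure. An earlier (hypothetical) version used a List
    // and scanned it linearly on every lookup; a HashMap gives average
    // constant-time lookups without changing any public signature.
    private final Map<String, Integer> agesById = new HashMap<>();

    public void addCustomer(String id, int age) {
        agesById.put(id, age);
    }

    // Returns the customer's age, or -1 if the id is unknown.
    public int ageOf(String id) {
        Integer age = agesById.get(id);
        return age == null ? -1 : age;
    }
}
```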
Okay, that makes sense, but how does Java support information hiding? Through an access control mechanism that we use every day: the access modifiers private, protected, and public, plus the default package-private level you get when you don’t write a modifier at all. To most of us, they’re second nature. We really don’t think about them at all. But understanding when to use them is key to developing software. The rule of thumb that Joshua clearly gives here is to make each class or member as inaccessible as possible. In other words, use the lowest access level you can while still letting your application or library do the work that it needs to do.
For top-level (non-nested) classes and interfaces, there are only two possible access levels: public and package-private. Many of us go with public almost by reflex, but Joshua’s advice is that if a top-level class or interface can be made package-private, it should be, because then it’s part of the implementation rather than the exported API. The top-level classes you do make public are the things people will use and implement. They’re wide open.
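To illustrate, a single Java source file can hold one public top-level class plus package-private top-level helpers. In this sketch, `TextFormatter` and `Padding` are hypothetical names; only `TextFormatter` is exported, so `Padding` can change or disappear in any release.

```java
// TextFormatter.java: one public top-level class (the exported API)
// plus a package-private top-level helper in the same file.
public final class TextFormatter {
    private TextFormatter() {}

    // Right-aligns text within the given width; longer text is returned unchanged.
    public static String padLeft(String text, int width) {
        return Padding.spaces(width - text.length()) + text;
    }
}

// No modifier: package-private. Code outside this package can't see it.
final class Padding {
    private Padding() {}

    // Returns count spaces, or an empty string if count <= 0.
    static String spaces(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```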
Here’s the thing, though: if you make something public, you’re obligated to support it forever to maintain compatibility. One of our other favorite podcasts is “Android Developers Backstage,” hosted by Chet Haase and Tor Norbye, and I’ve heard Chet talk about building APIs on various occasions. When you decide to implement a particular API, you make the best decision you can at that time. The Android team might make an API public because, at that point in time, it’s the right decision, but two months down the road (after they’ve shipped that Android OS update) they may realize it was actually a bad one. That API probably wasn’t the best choice. They’ve mentioned this kind of thing on the podcast, since hindsight is 20/20, but they still can’t go back and change it.
Why? Let’s assume that Android has shipped a public API. This has happened before. Once it’s released, all kinds of apps suddenly start using it: Facebook, Instagram, Twitter—you name it. Then the Android team determines: “Wow, that was not the best decision. We should not have made that one public.” Or, “We let that slip through a code review by accident.” Any number of things can happen. But now they have to support it anyway.
Suppose they decide to yank it in the next version of Android. What would happen? Let’s assume that a major application like Instagram is using the API. I’m not saying anything negative about Instagram. I’m just using them as an example. This can happen to any application, but I’m using Instagram since it’s a very popular application that would affect millions of people. Anyway, folks update their phones to the newest version of Android, open the Instagram app, and the app suddenly crashes. If the Android team removed an API that they’ve already released as a public API, they would not be maintaining backwards compatibility. In other words, as soon as a new version of Android comes out and someone updates to it, the app is going to crash. It’s going to look for that method, but it’s just not there (in Java terms, the call fails at runtime with a NoSuchMethodError).
What the Android team has done over time is to mark these classes or methods as deprecated and provide alternatives: “Please start using this one.” You’ve probably seen deprecation notices all throughout your applications. Android Lint and Android Studio do a good job of flagging these so that you can see them: deprecated calls show up with a little strike-through, as if to say, “That doesn’t look right.” That usually means that you’ll have to take a look at the implementation. You’ll hit Command+B (if you’re on a Mac) and look at the definition of that class. It will have some notes in the Javadoc, such as, “This class is now deprecated,” or “This method is deprecated. Go use X, Y, or Z.” Then you can decide which one to use.
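On the library side, a deprecation in Java looks roughly like this: the old method stays, gets the `@Deprecated` annotation (which IDEs such as Android Studio use to draw the strike-through) and a Javadoc `@deprecated` tag pointing to the replacement. The `Temperatures` class and its methods are hypothetical.

```java
// Hypothetical API class: the old method is retired, not removed.
public final class Temperatures {
    private Temperatures() {}

    /**
     * Converts Celsius to Fahrenheit, rounded to a whole degree.
     *
     * @deprecated Loses precision. Use {@link #toFahrenheit(double)} instead.
     */
    @Deprecated
    public static int cToF(int celsius) {
        // Kept for existing callers; delegates to the replacement so there's
        // only one conversion formula to maintain.
        return (int) Math.round(toFahrenheit(celsius));
    }

    /** Converts Celsius to Fahrenheit without rounding. */
    public static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }
}
```

Existing apps that call `cToF` keep working; new code gets a warning nudging it toward `toFahrenheit`.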
To kind of bring this full circle, this is important as an API developer because you have to make the best decisions possible. Realize that when you release something with a public API, you’re stuck with that API. You can’t just pull it, even if you hate it, don’t like it, or it just makes it painful. You have to keep it and work around it.
Of course, there will always be times when people remove public APIs anyway, which can cause pain. Is that something you would want to do? That’s up to each team to determine. Maybe you’re developing a shared internal library at your company that’s used by just ten other apps. Then you can make a judgment call and say, “Hey, we’re going to yank this API. It was a bad decision. Let all the other developers know so that we can update it within the company.” However, if you’re releasing a public SDK (for instance, the Facebook SDK) and you remove a method, that could become very problematic. All of a sudden, people’s applications break, and they have to fix them. This has happened to all of us before. We’ve used a third-party API, things change when versions change, and stuff suddenly breaks.
This is also why it’s very important to specify exact version numbers in your build.gradle file. Let’s say that the version of the library you’re depending on is 9.0.1, but in your build.gradle file you’ve written “9.0.+”, or even “9.+”. What that’s basically telling Gradle is, “You know what? As long as it’s some 9.0.x (or, with 9.+, any 9.x), it’s cool. Don’t worry about it. Just pull the newest one down.”
Many developers have had weird instances where they build something in the morning, do a bunch of development, rebuild, and it’s broken. Why is that? Perhaps it’s because some other developer has released a new version of that library, upgrading it to 9.0.2. That had a breaking change in it, which removed or changed an API call. Suddenly, an app that worked 10 minutes ago doesn’t work anymore, even though you didn’t change anything. Why is it breaking? Because you didn’t specify your exact version number in your build.gradle file. This is a good reason why you want to use a particular version number in build.gradle. We’ll go into this in detail here on Fragmented at a later time, but this is something that can affect you as an Android developer.
Let’s step back into this Effective Java item and talk about how you can lock down access to particular parts of your application or module. Let’s assume that you’re building a module with a bunch of internals. You’ve probably looked at the source of many open source projects and noticed that they have a package called something like “com.example.foo.internal”. It’s a very common pattern to put everything that should stay internal to the library inside an internal package, and often everything outside that package is public. This isn’t written in stone. It’s just a common practice that a lot of folks follow.
When they put those things in those packages, you’ll notice that they use different Java access modifiers. Let’s talk about this for a second. There are four access levels. The first one is “private”. Private means that a member is only accessible from the top-level class where it’s declared. Say we have a Customer class with an integer age field declared as “private int age”. That field is only accessible from code inside the Customer class. Nobody else can access it. If code outside the class tries to read “customer.age”, the compiler won’t allow it.
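A small sketch of a private field, using a hypothetical `Customer` class. Note that private access is checked per class, not per instance, so one `Customer` can read another `Customer`’s `age`.

```java
public class Customer {
    // Only code inside the Customer class can read or write this field.
    private int age;

    public Customer(int age) {
        this.age = age;
    }

    public int getAge() {
        return age;
    }

    // Access is per class, not per instance: this method may read
    // the private field of a different Customer object.
    public boolean isOlderThan(Customer other) {
        return this.age > other.age;
    }
}

// From any other class, even one in the same package:
//   customer.age   // compile error: age has private access in Customer
//   customer.getAge()  // fine
```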
The second level is “package-private”. This one says that the member is accessible from any class in the package where it is declared. It’s technically known as default access, because it’s the access level you get if you don’t provide any access modifier on a member. Let’s say you create a class called Customer without a modifier. That class is now package-private, and anything in the same package can access it. Again, let’s use that integer age field, but instead of writing “private int age”, we just write “int age”. Any class inside the same package can access that age, because it’s package-private. Classes outside the package can’t access any of those members.
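The same idea at the package-private level, as a sketch: neither `Customer` nor its `age` field has a modifier, so any class in the same package (here, a hypothetical `Store`) can use them directly.

```java
// Store.java: both classes below live in the same package.

// No access modifier: Customer and its age field are package-private.
class Customer {
    int age; // any class in this package can read or write it

    Customer(int age) {
        this.age = age;
    }
}

public class Store {
    // Store is in the same package, so it can use customer.age directly.
    // A class in any other package could not even see the Customer type.
    static String describe(Customer customer) {
        return "Customer aged " + customer.age;
    }
}
```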
The third modifier is “protected”. A protected member is accessible from subclasses of the class where it’s declared, and also from any class in the same package. Again, let’s say that we have a Customer class with an age field, declared as “protected int age”. At some point, we say, “You know what? We need to change our application to support platinum customers and gold customers.” So Customer becomes a superclass, and you create a subclass of it called PlatinumCustomer. Because Customer’s age is marked as protected, PlatinumCustomer can access the age. GoldCustomer, which is also a subclass of Customer, can access the age too. But anything else, perhaps an Employee class in a different package, which is completely unrelated, cannot access the age, because it’s not a subclass of Customer.
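A sketch of the protected case with hypothetical `Customer`, `PlatinumCustomer`, and `GoldCustomer` classes (the discount rule is made up for illustration). Keep in mind that Java’s protected also grants access to classes in the same package, so the field is hidden only from unrelated classes in other packages.

```java
public class Customer {
    // Visible to subclasses (in any package) and to classes in this package.
    protected int age;

    public Customer(int age) {
        this.age = age;
    }
}

class PlatinumCustomer extends Customer {
    PlatinumCustomer(int age) {
        super(age);
    }

    // Subclass code can use the inherited protected field.
    // Hypothetical rule: 20% off for customers 65 and older, else 10%.
    int discountPercent() {
        return age >= 65 ? 20 : 10;
    }
}

class GoldCustomer extends Customer {
    GoldCustomer(int age) {
        super(age);
    }

    String label() {
        return "Gold member, age " + age;
    }
}

// An unrelated class in another package, say a hypothetical
// com.example.hr.Employee, could not read customer.age.
```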
Finally, we have the very popular “public”. A public member is accessible from anywhere. This is one that we’re very familiar with. When you mark something as public, anyone can access it. If you have that Customer class again and declare the age as “public int age”, anyone can access that age. We can have a whole bunch of problems there, which I’ll talk about in a second, but it’s the most open option. As soon as you mark something public, you basically need to support it from that point forward.
Let’s talk about public classes for a second. For members of a public class, there’s a huge increase in accessibility when the access level goes from package-private (basically saying that you have to be part of this package to access it) to protected. A protected member is part of the class’s exported API and must also be supported forever. Let’s say again that we have a library which exposes a Customer class whose age field was package-private. No one could touch that field unless they were in the same package. Your application, since it’s consuming that third-party library, is not part of the same package, so you don’t have access to that integer.
Now let’s say that we changed that field from the default access level to protected, for some reason. As soon as we do that, we’ve opened it up as part of the public API. Now I can subclass Customer and create my own type of customer, and I have access to that age, because it’s protected. It’s part of the public API, so be very aware of that. As soon as you change something from package-private to protected, it becomes part of your public API. That’s something you need to think about quite a bit.
There’s one rule you need to be aware of when you’re trying to reduce the accessibility of methods. If you have a method which overrides a superclass method (maybe it calculates some type of value for your object, and you override it in the subclass), you cannot give the override a lower access level, saying, “Hey, this was protected before. Now I want it to be private.” If you do, the compiler is going to complain and won’t let you do it. The override has to keep the same access level or make it wider.
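A minimal sketch of the overriding rule, with hypothetical `Shape` and `Square` classes: the override may keep the superclass’s access level or widen it, but not narrow it.

```java
public class Shape {
    // Protected in the superclass.
    protected double area() {
        return 0.0;
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    // Allowed: the same access level (protected) or a wider one (public).
    @Override
    public double area() {
        return side * side;
    }

    // Not allowed: javac rejects a narrower level with an error along the
    // lines of "attempting to assign weaker access privileges; was protected".
    // @Override
    // private double area() { return side * side; }
}
```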
I’ve been on both sides of the field, as an API developer as well as an API consumer. Throughout the years, there have been many times when I wished a particular class or something of that nature was much more accessible. Maybe I wished it was public, so that I could have access to it. The main reason for that was that either I needed to provide some sort of implementation which the class didn’t allow, or I needed to facilitate testing. If you need to facilitate testing, you may be tempted to make everything more protected or more public so that you can have access to it. But realize that that’s not necessary. If you’re building the API yourself, you can provide an interface for folks to use. They can go ahead and mock things out, and with new versions of the mocking tools, you could even mock final classes, which is amazing.
But if you’re worried about testing a library that you’re going to publish to external folks, you don’t have to make all of those things public. Your tests live alongside your code anyway, so things that are package-private can stay that way. As long as your test classes are declared in the same package, they can still access those members. There’s not really as much need to open up your API for testing as you might think.
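As a sketch, assume a hypothetical `PriceCalculator` in some package such as `com.example.shop`. Its helper method is package-private, and a unit test declared in the same package (for example under `src/test/java/com/example/shop` in a Gradle project) can call it directly without it ever becoming public.

```java
// PriceCalculator.java, assumed to live in a hypothetical package
// such as com.example.shop (package declaration omitted here).
public final class PriceCalculator {

    // Public API: total price in cents for a quantity of items.
    public long totalCents(long unitCents, int quantity) {
        return applyBulkDiscount(unitCents * quantity, quantity);
    }

    // Package-private helper: not exported, but a test class in the same
    // package can call it directly, e.g. with JUnit:
    //   assertEquals(900, new PriceCalculator().applyBulkDiscount(1000, 10));
    long applyBulkDiscount(long cents, int quantity) {
        // Hypothetical rule: 10% off for 10 or more items.
        return quantity >= 10 ? cents * 90 / 100 : cents;
    }
}
```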
Joshua makes a couple of other statements inside this item which are very important. One of those is that instance fields should never be public. Say it’s that customer’s age again. He’s saying, “Never mark that public.” If you do, you give up the ability to take any action when the field is modified, so classes with public, mutable fields are not thread-safe. That was a big “aha” moment for a lot of folks, especially me, when I first read it. I wasn’t super aware of that, and had never really thought about it in much detail.
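Here’s a sketch of the difference, using hypothetical classes. With a public field, any caller can write any value at any time; with a private field and a setter, the class gets one place to validate changes (and, if it ever needs to, to add synchronization).

```java
// "Before": a public mutable field. The class can't validate writes
// or react to them, and can't add locking around them later.
class OpenCustomer {
    public int age;
}

// "After": the field is private, so every change goes through setAge.
public final class Customer {
    private int age;

    public Customer(int age) {
        setAge(age); // safe here because the class is final
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age must be >= 0: " + age);
        }
        this.age = age;
    }
}
```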
This also applies to static fields, with one exception: you can expose constants via public static final fields, assuming that the constants are basically primitive values or references to immutable objects themselves. You’ll want to be aware and stick to that rule, except for that one little variation there.
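A quick sketch of which public static final fields are safe to expose, using a hypothetical `Limits` class.

```java
public final class Limits {
    private Limits() {}

    // Fine to expose: a primitive constant can never change.
    public static final int MAX_RETRIES = 3;

    // Also fine: String is immutable, so sharing the reference is safe.
    public static final String DEFAULT_TAG = "fragmented";

    // Risky: the reference is final, but the list itself is mutable,
    // so any caller could add or remove elements.
    // public static final java.util.List<String> HOSTS = new java.util.ArrayList<>();
}
```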
Joshua also makes mention of something that’s very tricky when you think about it: non-zero length arrays are always mutable, so it’s wrong for a class to have a public static final array field or an accessor that returns such a field. What that means is that if you’re returning an array of customer values, that can always be mutated outside of the class. You’ll run into a bunch of problems, crazy bugs, and security holes when you expose a public static final array field or an accessor that returns such a field itself. You want to be careful around that as well.
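To see why, here’s a sketch with a hypothetical `Tiers` class. The `final` keyword only stops the field from being reassigned; it does nothing to protect the array’s contents.

```java
public final class Tiers {
    private Tiers() {}

    // Potential security hole: final prevents reassigning VALUES,
    // but any caller can still overwrite the elements of the array.
    public static final String[] VALUES = { "SILVER", "GOLD", "PLATINUM" };
}
```

Any code anywhere can write `Tiers.VALUES[0] = "HACKED";`, and every other user of the class sees the change.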
You may be wondering, “I need to return an array of things. How do I do that?” There are a couple of ways to fix the problem. First, you can make the public array private and add a public immutable list, for example by wrapping it with Collections.unmodifiableList(Arrays.asList(...)). Second, you can make the array private and add a public method that returns a copy of that private array, so that callers never get the original. Be aware that the copy approach can confuse folks who think, “Hey, I’m getting back an array. I can modify it, and that should modify the existing array in memory.” Their changes won’t show up anywhere else, which could add some confusion to your application.
In the end, if you’re returning an array, you need to choose between these alternatives by thinking about what the client is likely to do with the result. Which return type will be more convenient for them, and which one might perform better?
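Here’s a minimal sketch of the two fixes Joshua describes in the book, applied to a hypothetical `SafeTiers` class: keep the array private, then expose either an unmodifiable list or a method that returns a fresh copy.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class SafeTiers {
    private SafeTiers() {}

    // The array itself is now private and never modified.
    private static final String[] PRIVATE_VALUES = { "SILVER", "GOLD", "PLATINUM" };

    // Fix 1: a public, unmodifiable view of the array.
    // set/add/remove throw UnsupportedOperationException.
    public static final List<String> VALUES =
            Collections.unmodifiableList(Arrays.asList(PRIVATE_VALUES));

    // Fix 2: hand out a defensive copy. Callers can change their copy
    // freely without touching PRIVATE_VALUES.
    public static String[] values() {
        return PRIVATE_VALUES.clone();
    }
}
```

The list view is cheap and can’t be changed by callers; the copy costs an allocation per call but gives callers a real array they can modify.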
That kinda wraps it up for Item #13. It’s all about accessibility and minimizing your API footprint. Remember: if you mark something as public, or it’s available as part of the public API, you’ll have to support it, because people are going to rely on it. You want to hide the internals away. Why is it good to hide the internals away? It allows us to develop in parallel much quicker. We can iterate and build tests. It also enables effective performance tuning, because if the library we ship doesn’t function that well, we can hop in there, take a look at the internals, fix it, ship another version, and still not break the public API. I hope that helps. Talk to you next time.
Transcription and editing provided by 10:17 Transcription