A stream is a body of data organized to be read linearly from beginning to end. This is like a video tape: you start reading at one end and read straight through to the other. The data you read off the tape could be called the video stream.
The important thing about a stream is that to display the image at a given point you only need the data up to that point, and usually only a small amount of the data just before it.
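To make the idea of linear reading concrete, here is a minimal Python sketch. It is only an illustration: the function name `read_linearly`, the chunk size, and the `history` parameter are assumptions made for this example, and the in-memory byte string stands in for a real video stream. The reader moves from start to end and keeps only a small window of earlier data.

```python
import io
from collections import deque

def read_linearly(stream, chunk_size=4, history=2):
    """Yield (chunk, recent) pairs while reading the stream front to back.

    `recent` holds at most `history` earlier chunks, illustrating that
    only a small amount of preceding data needs to be kept around.
    """
    recent = deque(maxlen=history)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means the end of the stream
            break
        yield chunk, list(recent)
        recent.append(chunk)

# A stand-in for the "tape": 16 bytes read 4 at a time.
tape = io.BytesIO(b"AAAABBBBCCCCDDDD")
steps = list(read_linearly(tape))

for chunk, recent in steps:
    print(chunk, "with history", recent)
```

By the time the last chunk is read, the first chunk has already been dropped from the history. A real decoder works along the same lines, holding on to only the few preceding frames it still needs.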
Encapsulation means taking a set of streams and combining them into a single larger stream or file. The most obvious use is to combine a video stream and an audio stream so that they can be played back together. Subtitle streams and alternate audio and video streams can be combined in the same way.
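One common way to picture encapsulation is as interleaving packets from each stream in timestamp order. The Python sketch below assumes that model. The `Packet` type, its fields, and the `mux` and `demux` helpers are hypothetical names invented for illustration, and real container formats also add headers, indexes, and other metadata that are left out here.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple

class Packet(NamedTuple):
    timestamp: float  # presentation time in seconds (hypothetical field)
    stream_id: str    # which stream the packet belongs to
    payload: bytes    # the encoded data itself

def mux(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Combine several time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda p: p.timestamp)

def demux(container: Iterable[Packet], stream_id: str) -> List[Packet]:
    """Pull a single stream back out of the combined stream."""
    return [p for p in container if p.stream_id == stream_id]

video = [Packet(0.00, "video", b"v0"),
         Packet(0.04, "video", b"v1"),
         Packet(0.08, "video", b"v2")]
audio = [Packet(0.02, "audio", b"a0"),
         Packet(0.06, "audio", b"a1")]

container = list(mux(video, audio))
order = [p.payload for p in container]
print(order)  # [b'v0', b'a0', b'v1', b'a1', b'v2']
```

Because the packets are interleaved by time, a player reading the combined stream linearly finds the audio and video for each moment close together, which is what lets them be played back in sync.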