Mining Data Streams (Part 1)

Mining data streams is concerned with extracting knowledge structures, represented as models and patterns, from non-stopping streams of information. Data stream mining is a strategy for identifying and extracting information from an active data stream.

Data Streams
Examples of data streams include network traffic, sensor data, and call center records. Their sheer volume and speed pose a great challenge for the data mining community.

Why Stream Data
• In many data mining situations, we know the entire data set in advance.
• Sometimes the input rate is controlled externally, as with Google queries or Twitter or Facebook status updates.
• Yahoo wants to know which of its pages are getting an unusual number of hits in the past hour.

The Stream Model
• Data enters at a rapid rate from one or more input ports.
• How do you make critical calculations about the stream using a limited amount of (secondary) memory?

• Problem: given a stream of 0's and 1's, be prepared to answer queries of the form "how many 1's in the last k bits?" where k ≤ N.
• Interesting case: N is still so large that it cannot be stored on disk.
• Or, there are so many streams that windows for all cannot be stored.

Something That Doesn't (Quite) Work
• Summarize exponentially increasing regions of the stream, looking backward.
• Error in count no greater than the number of 1's in the "unknown" area.

• Constraint on buckets: number of 1's must be a power of 2.
• Each bucket records the number of 1's between its beginning and end [O(log log N) bits].
• When a new bit comes in, drop the last (oldest) bucket if its end-time is prior to N time units before the current time.
• Remember, we don't know how many 1's of the last bucket are still within the window.
• Since there is at least one bucket of each of the sizes less than 2^k, the true sum is no less than 2^k - 1.
A data stream is an ordered sequence of instances in time [1,2,4]. Data streams typically arrive continuously, at high speed, in huge volume, and with a changing data distribution. Data stream mining is the process of extracting knowledge from continuous, rapid data records that arrive at the system as a stream. In other words, data mining is mining knowledge from data.

Unlike mining static databases, mining data streams poses many new challenges. First, it is unrealistic to keep the entire stream in main memory or even in secondary storage, since a data stream arrives continuously and the amount of data is unbounded. Second, traditional methods of mining on stored datasets by multiple … Data streams also suffer from a scarcity of labeled data, since it is not possible to manually label all the data points in the stream. In this chapter, we introduce a general framework for mining concept-drifting data streams …

• When a new bit comes in, discard the (N+1)st bit.
• E.g., we are processing 1 billion streams and N = 1 billion, but we're happy with an approximate answer.
• If all the 1's are in the unknown area, the error is unbounded.

DGIM Method
• Store O(log² N) bits per stream.

Buckets
• A bucket in the DGIM method is a record consisting of:
  • The timestamp of its end [O(log N) bits].
• Buckets do not overlap in timestamps.
• Earlier buckets are not smaller than later buckets.
• A newly created bucket has end timestamp = current time.
• If the current bit is 0, no other changes are needed.
• If there are now three buckets of size 1, combine the oldest two into a bucket of size 2.
• When estimating the count, add in half the size of the last bucket.
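The exact approach noted above, where the (N+1)st bit is discarded whenever a new bit arrives, can be sketched as follows. This is only an illustrative baseline: the class name `ExactWindowCounter` and its methods are hypothetical and do not come from the source. It stores all N bits per stream, which is the cost the approximate method is meant to avoid.

```python
from collections import deque


class ExactWindowCounter:
    """Exact count of 1's in the last k bits, at the cost of storing N bits."""

    def __init__(self, n: int):
        self.n = n
        # A deque with maxlen=n drops its oldest element automatically,
        # which plays the role of "discard the (N+1)st bit".
        self.window = deque(maxlen=n)

    def add(self, bit: int) -> None:
        self.window.append(bit)

    def count_ones(self, k: int) -> int:
        if not 0 <= k <= self.n:
            raise ValueError("k must satisfy 0 <= k <= N")
        # The most recent k bits sit at the right end of the deque.
        recent = list(self.window)[-k:] if k else []
        return sum(recent)
```

With 1 billion streams and N = 1 billion, this layout would need about 10^18 bits in total, which is why an approximate summary is attractive.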
Data stream mining (also known as stream learning) is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications of data stream mining, can be read only once or a small number of times using limited computing and storage capabilities. Efficient knowledge discovery from such data streams is an emerging, active research area in data mining with broad applications. Data mining is a promising and relatively new technology, used in many fields such as marketing and retail, finance and banking, manufacturing, and government.

• Data arrives at a rapid rate from one or more ports.
• The system cannot store the entire stream.
• Who buys what where?
• Google wants to know what queries are more frequent today than yesterday.

Sliding Windows
• A useful model of stream processing is that queries are about a window of length N: the N most recent elements received.

Counting Bits --- (2)
• You can't get an exact answer without storing the entire window.

• Stores only O(log² N) bits.
• Drop small regions when they are covered by completed larger regions.

Fixup
• Instead of summarizing fixed-length blocks, summarize blocks with specific numbers of 1's.

Representing a Stream by Buckets
• Either one or two buckets with the same power-of-2 number of 1's.

Updating Buckets --- (2)
• If the current bit is 0, no other changes are needed.
• If the current bit is 1:
  • Create a new bucket of size 1, for just this bit.
  • If there are now three buckets of size 2, combine the oldest two into a bucket of size 4.

Querying
• To estimate the number of 1's in the most recent N bits:
  • Sum the sizes of all buckets but the last.
• If the last bucket has size 2^k, assuming half of its 1's are still within the window gives an error of at most 2^(k-1).
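The bucket update and query rules above can be put together in a short sketch. This is a minimal illustration under stated assumptions, not a definitive implementation: the class name `DGIM`, the list-of-tuples layout (newest bucket first), and the choice to return a float so that half of a size-1 bucket is 0.5 are all choices made here, not taken from the source. It also adds the usual rule of adding half the size of the last (oldest) bucket to the sum of the others.

```python
class DGIM:
    """Approximate count of 1's in the last n bits using DGIM-style buckets."""

    def __init__(self, n: int):
        self.n = n          # window length N
        self.time = 0       # timestamp of the most recent bit
        # Buckets as (end_timestamp, size), newest first, so sizes are
        # non-decreasing from index 0 toward the end of the list.
        self.buckets = []

    def add(self, bit: int) -> None:
        self.time += 1
        b = self.buckets
        # Drop the oldest bucket once its end-time falls outside the window.
        if b and b[-1][0] <= self.time - self.n:
            b.pop()
        if bit == 0:
            return  # no other changes are needed
        # New bucket of size 1 whose end timestamp is the current time.
        b.insert(0, (self.time, 1))
        # Whenever three buckets share a size, combine the oldest two
        # into one bucket of twice the size, then check the next size up.
        i = 0
        while i + 2 < len(b) and b[i][1] == b[i + 2][1]:
            end, size = b[i + 1]  # keep the later of the two end timestamps
            b[i + 1:i + 3] = [(end, 2 * size)]
            i += 1

    def estimate(self) -> float:
        """Estimated number of 1's among the most recent n bits."""
        if not self.buckets:
            return 0.0
        sizes = [size for _, size in self.buckets]
        # All buckets but the last (oldest), plus half of the last one.
        return sum(sizes[:-1]) + sizes[-1] / 2
```

For example, with N = 10 and five consecutive 1's, the buckets end up as (5, 1), (4, 2), (2, 2), and `estimate()` returns 4.0 against a true count of 5.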
In general, stream processing is important for applications where new data arrives frequently.

• The error factor can be reduced to any fraction > 0.
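As a sketch of where the error factor of the basic estimate comes from, suppose the estimate adds half the size of the last (oldest) bucket and that bucket has size $2^k$. The symbols $c$ (true count), $\hat{c}$ (estimate), and $j$ (number of the oldest bucket's 1's still within the window) are names introduced here for illustration. The bucket is kept only while its end, which is the position of a 1, is inside the window, so $j \ge 1$.

```latex
\begin{align*}
1 \le j \le 2^k
  &\;\Longrightarrow\; |\hat{c} - c| = \left| 2^{k-1} - j \right| \le 2^{k-1}, \\
c &\ge \underbrace{1 + 2 + \cdots + 2^{k-1}}_{\text{one bucket of each smaller size}} + j
   = 2^k - 1 + j \ge 2^k, \\
\frac{|\hat{c} - c|}{c} &\le \frac{2^{k-1}}{2^k} = \frac{1}{2}.
\end{align*}
```

Under these assumptions the basic scheme, with at most two buckets of each size, is off by at most 50%.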

